1: #LyX 1.3 created this file. For more info see http://www.lyx.org/
2: \lyxformat 221
3: \textclass article
4: \begin_preamble
5: \addtolength{\oddsidemargin}{-50pt}
6: \addtolength{\voffset}{-50pt}
7: \addtolength{\textwidth}{100pt}
8: \addtolength{\textheight}{100pt}
9:
10:
11: \usepackage{xspace}
12: \newcommand{\phipsi}{$\phi,\psi$\xspace}
13: \newcommand{\eg}{e.g.\xspace}
14: \newcommand{\etc}{etc.\xspace}
15: \newcommand{\dfs}{\textsc{dfs}~}
16: \newcommand{\cpp}{\textsc{c++}\xspace}
17: \newcommand{\python}{\textsc{python}\xspace}
18: \newcommand{\swig}{\textsc{swig}\xspace}
19: \newcommand{\pdb}{\textsc{PDB}\xspace}
20: \newcommand{\gabb}{\textsc{GABB}\xspace}
21: \newcommand{\scwrl}{\textsc{scwrl}\xspace}
22: \newcommand{\rapper}{\textsc{rapper}\xspace}
23: \newcommand{\rappertk}{\textit{Rapper}\textbf{tk}\xspace}
24: \newcommand{\probe}{\textsc{probe}\xspace}
25: \newcommand{\ie}{i.e.\xspace}
26: \newcommand{\ca}{$C_\alpha$\xspace}
27: \newcommand{\chir}{$\chi$}
28: \newcommand{\CA}[1]{${C_\alpha}^{#1}$~}
29: \newcommand{\CB}[1]{${C_\beta}^{#1}$~}
30: \newcommand{\C}[1]{$C^{#1}$~}
31: \newcommand{\N}[1]{$N^{#1}$\xspace}
32: \newcommand{\cO}[1]{$O^{#1}$\xspace}
33:
34:
35: \newcommand{\Ang}[1]{${#1}$\AA\xspace}
36:
37: \newcommand{\degr}[1]{${#1}^o$\xspace}
38:
39: \date{}
40: \end_preamble
41: \language english
42: \inputencoding auto
43: \fontscheme default
44: \graphics default
45: \paperfontsize 12
46: \spacing onehalf
47: \papersize Default
48: \paperpackage a4
49: \use_geometry 0
50: \use_amsmath 0
51: \use_natbib 1
52: \use_numerical_citations 0
53: \paperorientation portrait
54: \secnumdepth 3
55: \tocdepth 3
56: \paragraph_separation indent
57: \defskip medskip
58: \quotes_language english
59: \quotes_times 2
60: \papercolumns 1
61: \papersides 1
62: \paperpagestyle default
63:
64: \layout Title
65:
66: Identification of specificity determining residues in enzymes using environment
67: specific substitution tables
68: \layout Author
69:
70: Swanand Gore and Tom Blundell
71: \newline
72: {swanand,tom}@cryst.bioc.cam.ac.uk
73: \newline
74: Department of Biochemistry, University of Cambridge
75: \newline
76: Cambridge CB2 1GA England
77: \layout Abstract
78: \pagebreak_bottom
79: Environment specific substitution tables have been used effectively for
80: distinguishing structural and functional constraints on proteins and thereby
81: identify their active sites (
82: \begin_inset LatexCommand \citet{distinguish_str_func_restr}
83:
84: \end_inset
85:
86: ).
87: This work explores whether a similar approach can be used to identify specifici
88: ty determining residues (SDRs) responsible for cofactor dependence, substrate
89: specificity or subtle catalytic variations.
90: We combine structure-sequence information and functional annotation from
91: various data sources to create structural alignments for homologous enzymes
92: and functional partitions therein.
93: We develop a scoring procedure to predict SDRs and assess their accuracy
94: using information from bound specific ligands and published literature.
95: \layout Section
96:
97: Introduction
98: \layout Standard
99:
100: Enzymes are critical to cellular machinery.
101: Enzymes are believed to have developed different specificities following
102: gene duplication events that ease the evolutionary pressure on copies and
103: allow exploration of novel avenues to greater organismal fitness.
104: Each copy then develops its own niche, characterized by expression and
105: localization, catalytic mechanism, substrate specificity, cofactor dependence
106: and catalysis products.
107: Such paralogous enzymes should have an evolutionary imprint corresponding
108: to their specific niche, in addition to maintenance of structural fold.
109: Thus evolutionary analysis of available structural and sequence data should
110: enable identification of key residues responsible for specificity of various
111: kinds.
112: Enzyme specificity can be estimated with functional assays without structure
113: determination, but identification of SDRs (specificity determining residues)
114: remains difficult.
115: While ENZYME (
116: \begin_inset LatexCommand \citet{ENZYME}
117:
118: \end_inset
119:
120: ) - a database of enzyme sequences with detailed functional annotation -
121: exists, there is no such database of SDRs.
122: Time, cost and technical limitations slow down structure determination
123: and even when structure is known, it is not trivial to identify the residues
124: important for binding cofactors and substrates.
125: Hence it is important to be able to identify such residues computationally.
126: Reliable detection of such residues will aid in deciding whether a SNP
127: is deleterious or neutral and suggest mutation studies.
128: Function assignment to sequence could be done at a finer level, e.g.
129: by verifying that the SDRs necessary for a certain substrate are present.
130: Computational SDR identification has received a lot of attention and several
131: methods have been proposed.
132: Evolutionary trace (ET) is one of the most important methods (
133: \begin_inset LatexCommand \citet{evoltraceStrClust}
134:
135: \end_inset
136:
137: ,
138: \begin_inset LatexCommand \citet{EThybridMethods}
139:
140: \end_inset
141:
142: ).
143: It builds a phylogenetic tree based on sequence comparisons, such that
144: branch lengths are indicative of evolutionary divergence.
145: Functional subgroups consist of sequences in subtrees determined from this
146: tree using a divergence cutoff.
147: Residues conserved within a subtree, rather than those conserved across the
148: entire tree, are considered specificity-conferring.
149: Spatial cluster identification can be used with ET to reduce the number
150: of false positives.
151: Inferring phylogeny correctly remains the main cause of concern in this
152: approach, hence attempts have been made to use existing annotation with
153: various statistical techniques.
154: Another important direction is to use spatial proximity of residues.
155: \layout Standard
156:
157: The cornerstone of our approach is that structural environment influences residue
158: substitution patterns, illustrated by
159: \begin_inset LatexCommand \citet{earlyESST}
160:
161: \end_inset
162:
163: and later used effectively for structure-sequence alignment and fold recognitio
164: n (
165: \begin_inset LatexCommand \citet{fugue}
166:
167: \end_inset
168:
169: ).
170: Structural environment of a residue is described in terms of secondary
171: structure, solvent accessibility, sidechain-sidechain and sidechain-mainchain
172: hydrogen bonding.
173: Environment-specific substitution tables (ESSTs), derived from a set of high
174: quality sequence-structure alignments, give the expected substitution rate in
175: a structural environment.
176: Unexpected conservation of a residue is indicative of functional restraint
177: acting on it.
178: An advantage of using ESSTs is that the structurally conserved residues are
179: masked, which is why active sites of homologous enzymes can be identified
180: reliably with this approach.
181: This approach has been extended in the present work by using functional
182: annotation information.
183: \layout Standard
184:
185: A set of homologous enzymes is generally a union of smaller functionally
186: specific subsets, e.g.
187: substrate-specific subsets in serine proteinases (trypsin, chymotrypsin
188: etc.), cofactor-specific subsets in ferredoxin reductases (NAD and NADP
189: specific) and so on.
190: In multiple sequence alignment of a homologous protein family, SDRs generally
191: appear as differentially conserved subcolumns.
192: But not all such appearances are SDRs.
193: Our hypothesis is that SDRs would be identified by combining differential
194: conservation with ESST-based detection of functional restraint.
195: \layout Section
196:
197: Families, functional partitions and profiles
198: \layout Standard
199:
200: In order to test our hypothesis, we need to construct a dataset of homologous
201: enzyme families with reliable functional partitions in them.
202: While SCOP classification can be used in a straightforward way for making
203: families, identifying functionally specific subsets is not a trivial task.
204: Some automated approaches to detect functional shift, e.g.
205:
206: \begin_inset LatexCommand \citet{funshiftakker}
207:
208: \end_inset
209:
210: , exist to infer such partitions but manual annotation remains the most
211: reliable.
212: Additionally, protein function is not a precise and quantifiable entity.
213: This restricted our study to enzymes, which are the most well studied
214: and well annotated class of proteins.
215: Enzyme function is fairly well defined and well classified according to
216: hierarchical Enzyme Classification scheme (EC).
217: We use the mapping between SCOP domains and EC numbers (
218: \begin_inset LatexCommand \citet{scopec}
219:
220: \end_inset
221:
222: ) to make EC-specific subgroups within a SCOP domain family.
223: We generate profiles (multiple structure-sequence alignments) for SCOP
224: families and functional partitions.
225: Sequence homologs for structural families were found using PSIBLAST (
226: \begin_inset LatexCommand \citet{psiblast}
227:
228: \end_inset
229:
230: ) on nonredundant sequence database, whereas function-specific partitions
231: were enriched using PSIBLAST searches on ENZYME database (
232: \begin_inset LatexCommand \citet{ENZYME}
233:
234: \end_inset
235:
236: ).
237: A PSIBLAST hit on the ENZYME database is retained only if the EC number of the hit
238: matches that of the query.
239: All PSIBLAST searches were run for 5 rounds with an e-value cutoff of 0.01; hits
240: shorter than 75% of the query length were ignored.
241: All structure-sequence alignments were carried out with fugueseq (
242: \begin_inset LatexCommand \citet{fugue}
243:
244: \end_inset
245:
246: ) which has been shown to improve alignment quality over PSIBLAST.
247: This process is summarized in Fig.
248: \begin_inset LatexCommand \ref{workflow}
249:
250: \end_inset
251:
252: .
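\layout Standard

The hit-retention rules described above can be sketched as a small filter.
This is an illustrative sketch only, not the code used in this work: the Hit record, its field names and the function keep_hit are hypothetical, and the EC check is assumed to apply only to searches against ENZYME (pass query_ec=None for searches against the nonredundant database).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical record for one PSIBLAST hit (not a real parser type)."""
    ec_number: Optional[str]  # EC annotation of the hit, if any
    evalue: float             # e-value reported for the hit
    length: int               # length of the hit in residues


def keep_hit(hit, query_length, query_ec=None,
             max_evalue=0.01, min_fraction=0.75):
    """Apply the e-value, length and (optional) EC-match criteria."""
    if hit.evalue > max_evalue:
        return False
    if hit.length < min_fraction * query_length:
        return False
    # EC numbers are compared only when a query EC is supplied.
    if query_ec is not None and hit.ec_number != query_ec:
        return False
    return True
```

With a query of 300 residues annotated as EC 2.5.1.55, a hit of 280 residues with the same EC number and e-value 1e-5 is kept, whereas a hit annotated as EC 2.5.1.54 is rejected.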
253: \layout Standard
254:
255:
256: \begin_inset Float figure
257: wide false
258: collapsed false
259:
260: \layout Caption
261:
262: Workflow
263: \layout Standard
264: \align center
265:
266: \begin_inset Graphics
267: filename workflow.pdf
268: width 150mm
269:
270: \end_inset
271:
272:
273: \layout Standard
274:
275:
276: \begin_inset LatexCommand \label{workflow}
277:
278: \end_inset
279:
280:
281: \end_inset
282:
283:
284: \layout Standard
285:
286: Another constraint on the choice of dataset comes from the need for sufficient
287: functional diversity in a SCOP domain family.
288: In its absence, the contrast between the domain family and EC-specific
289: subgroup within it might not be detectable.
290: Hence we chose the SCOP families with at least two different EC annotations.
291: \layout Standard
292:
293: To be able to test the hypothesis quantitatively, a gold standard set of
294: SDRs for every enzyme is needed.
295: But SDRs are generally a topic of lively debate among researchers, partly
296: due to the infeasibility of performing all necessary mutation studies.
297: Thus there is no such dataset to our knowledge.
298: Hence we use the information of bound ligands and close-by residues to
299: assess the hypothesis.
300: Due to this, the dataset gets restricted to only those cases where at least
301: one EC-specific domain group has a relevant ligand bound.
302: A relevant ligand is the one unique to the reaction carried out by that
303: EC-group among all possible reactions in that domain family.
304: For example, in SCOP family c.1.10.4 there are two functional subgroups:
305: \layout Standard
306:
307: 3-deoxy-8-phosphooctulonate synthase (EC 2.5.1.55) : Phosphoenolpyruvate +
308: D-arabinose 5-phosphate + H(2)O = 2-dehydro-3-deoxy-D-octonate 8-phosphate
309: + phosphate
310: \layout Standard
311:
312: 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) : Phosphoenolpyruvate +
313: D-erythrose 4-phosphate + H(2)O = 3-deoxy-D-arabino-hept-2-ulosonate 7-phosphat
314: e + phosphate
315: \layout Standard
316:
317: Here D-arabinose 5-phosphate is unique to EC 2.5.1.55 and is present in domain
318: 1fxqA as A5P.
319: Hence it is taken as an indicator of SDR locations, and not phosphoenolpyruvate,
320: which is common to both reactions.
321: We sometimes use products also as such indicators.
322: A ligand is considered relevant if its name from the PDB file (HETNAM, HETSYN
323: records) matches its name in the reaction or PDBsum (
324: \begin_inset LatexCommand \citet{pdbsum}
325:
326: \end_inset
327:
328: ) finds it sufficiently similar to ideal ligand molecule.
329: Our final dataset consists of 97 examples drawn from 68 families.
330: Very few SDR identification studies have been carried out with this many examples.
331: \layout Section
332:
333: Profiles and substitution patterns
334: \layout Standard
335:
336: Structural and sequence information in MSSA can be misleading if dominated
337: by very close homologs, hence each MSSA was filtered with 90% sequence
338: identity cutoff to avoid redundancy.
339: \layout Standard
340:
341: Observed substitution pattern for a column in profile MSSA (multiple
342: structure-sequence alignment) was calculated after down-weighting contributions from
343: similar sequences (
344: \begin_inset Formula $>60\%$
345: \end_inset
346:
347: sequence identity).
348: Gaps were ignored while calculating the observed substitution pattern but
349: the ratio of gaps to amino acids in a column was computed.
350: Columns with high gap content are generally not functional, hence gap content
351: was used as a filtering criterion as described later.
352: Observed substitution patterns were normalized, and sequence entropy was
353: also calculated to get a measure of variability in the column as
354: \begin_inset Formula $-\sum_{i=1}^{20}f_{i}\log(f_{i})$
355: \end_inset
356:
357: , where
358: \begin_inset Formula $f_{i}$
359: \end_inset
360:
361: is the fraction of
362: \begin_inset Formula $i^{th}$
363: \end_inset
364:
365: amino acid in the distribution.
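\layout Standard

As an illustration of the column statistics defined above, the following sketch computes the weighted, normalized substitution pattern, the gap-to-residue ratio and the sequence entropy of one alignment column.
Several details are assumptions made for the example: the per-sequence weights are supplied by the caller (the exact down-weighting scheme is not specified here), the natural logarithm is used, and the function name column_profile is hypothetical.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column, weights):
    """Return (fractions, gap_ratio, entropy) for one alignment column.

    column  -- string of one-letter codes, '-' marking a gap
    weights -- one weight per sequence (smaller for redundant sequences)
    """
    counts = defaultdict(float)
    gaps = 0
    residues = 0
    for symbol, weight in zip(column, weights):
        if symbol == "-":
            gaps += 1            # gaps are counted but not scored
        elif symbol in AMINO_ACIDS:
            counts[symbol] += weight
            residues += 1
    total = sum(counts.values())
    fractions = {aa: c / total for aa, c in counts.items()} if total > 0 else {}
    # Shannon entropy, natural log (the base is an assumption)
    entropy = -sum(f * math.log(f) for f in fractions.values() if f > 0)
    # ratio of gaps to amino acids, as described in the text
    gap_ratio = gaps / residues if residues else float("inf")
    return fractions, gap_ratio, entropy
```

A fully conserved column has entropy 0, and a column split evenly between two residue types has entropy log 2 under this convention.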
366: \layout Standard
367:
368: Expected substitution patterns for a column were calculated using environment
369: specific substitution probability tables derived from high quality multiple
370: structure alignments from 371 families (
371: \begin_inset LatexCommand \citet{fugue}
372:
373: \end_inset
374:
375: ).
376: Substitution probabilities from every structure were averaged to get expected
377: substitution probabilities for each column in MSSA.
378: Again, sequence-based clustering was used to avoid expected substitution
379: pattern getting dominated by very similar structures.
380: \layout Standard
381:
382: Functional restraint is calculated as the city-block distance between normalized
383: observed and expected substitution patterns (
384: \begin_inset Formula $\sum_{i=1}^{20}|o_{i}-e_{i}|$
385: \end_inset
386:
387: ,
388: \begin_inset Formula $o_{i}$
389: \end_inset
390:
391: being observed fraction of
392: \begin_inset Formula $i^{th}$
393: \end_inset
394:
395: amino acid and
396: \begin_inset Formula $e_{i}$
397: \end_inset
398:
399: being the fraction of times it is expected to occur).
400: Thus, for both MSSAs (whole family and EC-specific) we have the following
401: quantities : functional restraint (
402: \begin_inset Formula $famF,ecF$
403: \end_inset
404:
405: ), gap content (
406: \begin_inset Formula $famG,ecG$
407: \end_inset
408:
409: ) and sequence entropy (
410: \begin_inset Formula $famE,ecE$
411: \end_inset
412:
413: ).
414: Moreover for each MSSA, number of sequences
415: \begin_inset Formula $<80\%$
416: \end_inset
417:
418: identical to each other was taken as an indicator of evolutionary information
419: available in it.
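\layout Standard

A minimal sketch of the functional restraint measure follows, assuming that the observed and expected patterns are given as dictionaries mapping one-letter amino acid codes to normalized fractions, with missing entries treated as zero.
The function name functional_restraint is introduced only for this illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def functional_restraint(observed, expected):
    """City-block (L1) distance between two normalized distributions.

    Both arguments map one-letter amino acid codes to fractions;
    amino acids absent from a dictionary count as zero.
    """
    return sum(abs(observed.get(aa, 0.0) - expected.get(aa, 0.0))
               for aa in AMINO_ACIDS)
```

The absolute value matters: without it the differences of two normalized distributions sum to zero.
The distance ranges from 0 (observed pattern identical to the expected one) to 2 (no overlap between them).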
420: \layout Section
421:
422: Benchmarking
423: \layout Standard
424:
425: In order to assess the differences in residues important for whole family
426: and EC partition, baseline predictions were made by choosing top-ranking
427: residues according to whole family functional constraint from residues
428: which are not highly gapped (
429: \begin_inset Formula $famG<0.5$
430: \end_inset
431:
432: ).
433: The number of baseline and SDR predictions is the same whenever they are compared
434: or an overlap between them is computed.
435: This helps in assessing whether information in the EC-specific MSSA is
436: distinct.
437: \layout Standard
438:
439: The likelihood of a residue to be an SDR is presumably proportional to its
440: proximity to the specific ligand.
441: Hence, to quantify the merit of a prediction, we define mean proximity
442: as the mean separation between the predicted residues and the ligand.
443: Mean relative proximity is defined as the ratio of mean proximity to the
444: mean separation between all residues in the domain and the ligand.
445: Distance between a residue and ligand is taken to be the closest distance
446: between residue sidechain (mainchain for glycine) and ligand atoms.
447: The smaller the mean relative proximity, the better the prediction.
448: Prediction quality will also depend on the number of distinct homologous
449: sequences available.
450: In case of multiple ligands close to a domain, a residue's proximity to
451: the ligand is calculated with respect to the closest ligand.
452: The basis for SDR prediction is that a site be sufficiently distinct between
453: the whole family and EC-specific MSSAs.
454: As
455: \begin_inset LatexCommand \citet{funshiftakker}
456:
457: \end_inset
458:
459: describe it, an SDR should be a rate-shifted or conservation-shifted site.
460: Additionally, an SDR should be sufficiently functionally constrained from
461: the ESST perspective (
462: \begin_inset Formula $ecF$
463: \end_inset
464:
465: ).
466: For a residue with low entropy in EC MSSA, if change in entropy
467: \begin_inset Formula $dE$
468: \end_inset
469:
470: (family MSSA sequence entropy - EC MSSA sequence entropy) is high, it indicates
471: that it could be an SDR.
472: Since each MSSA will be different in its variability, it is not advisable
473: to use the same functional constraint cutoff or entropy cutoff for all of them.
474: This immediately suggests two 2-step approaches : choose top
475: \begin_inset Formula $N1$
476: \end_inset
477:
478: residues with high difference in sequence entropy between whole and EC MSSAs,
479: then select top
480: \begin_inset Formula $N2$
481: \end_inset
482:
483: according to functional constraint in EC MSSA and vice versa.
484: But there could be a third and more attractive approach that combines functiona
485: l constraint from EC MSSA and sequence entropy difference.
486: We pursue the third approach.
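\layout Standard

The proximity measures defined above can be sketched as follows.
The data layout is an assumption made for the example: residues is a dictionary from residue identifier to a list of atom coordinates (side-chain atoms, or main-chain atoms for glycine, chosen by the caller), and ligands is a list of ligands, each a list of atom coordinates.
All function names are hypothetical.

```python
import math


def closest_distance(residue_atoms, ligand_atoms):
    """Smallest atom-atom distance between a residue and one ligand."""
    return min(math.dist(a, b) for a in residue_atoms for b in ligand_atoms)


def residue_ligand_distance(residue_atoms, ligands):
    """Distance to the closest of several ligands."""
    return min(closest_distance(residue_atoms, lig) for lig in ligands)


def mean_proximity(residue_ids, residues, ligands):
    """Mean separation between the given residues and the ligand(s)."""
    distances = [residue_ligand_distance(residues[r], ligands)
                 for r in residue_ids]
    return sum(distances) / len(distances)


def mean_relative_proximity(predicted_ids, residues, ligands):
    """Mean proximity of the prediction relative to all domain residues."""
    return (mean_proximity(predicted_ids, residues, ligands)
            / mean_proximity(list(residues), residues, ligands))
```

A value of mean relative proximity near 1 means the predicted residues are, on average, no closer to the ligand than the domain as a whole.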
487: \layout Standard
488:
489: We assume that SDR score of a residue is a linear combination of its functional
490: constraint, entropy and change in entropy, given that the residue passes
491: certain quality checks (
492: \begin_inset Formula $ecF>0.5$
493: \end_inset
494:
495: ,
496: \begin_inset Formula $ecG<0.5$
497: \end_inset
498:
499: ,
500: \begin_inset Formula $ecE<1$
501: \end_inset
502:
503: ,
504: \begin_inset Formula $dE>0.5$
505: \end_inset
506:
507: ):
508: \layout Standard
509:
510:
511: \begin_inset Formula $SDRscore=ecF+a*(famE-ecE)-b*ecE$
512: \end_inset
513:
514:
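\layout Standard

The scoring rule and its quality checks can be written compactly as below.
This is a sketch under stated assumptions rather than the original implementation: the function names and the dictionary layout of per-column quantities are hypothetical, and columns failing a quality check are simply excluded from ranking.

```python
def sdr_score(ecF, ecG, ecE, famE, a, b):
    """SDR score of one column, or None if a quality check fails."""
    dE = famE - ecE  # drop in entropy from family to EC-specific MSSA
    if not (ecF > 0.5 and ecG < 0.5 and ecE < 1 and dE > 0.5):
        return None
    return ecF + a * dE - b * ecE


def top_predictions(columns, a, b, n=10):
    """Rank columns by SDR score and return the identifiers of the top n.

    columns maps a column identifier to a dict with keys
    'ecF', 'ecG', 'ecE' and 'famE'.
    """
    scored = []
    for col_id, q in columns.items():
        score = sdr_score(a=a, b=b, **q)
        if score is not None:
            scored.append((score, col_id))
    scored.sort(reverse=True)
    return [col_id for _, col_id in scored[:n]]
```

Ranking within each MSSA, instead of applying one global score cutoff, follows the observation above that alignments differ in their variability.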
515: \layout Standard
516:
517: In order to optimize the parameters
518: \begin_inset Formula $a,b$
519: \end_inset
520:
521: and test the optimal ones, we created a high quality test set from our
522: examples, consisting of 23 examples drawn from SCOP families with at least
523: 2 EC groups, each with
524: \begin_inset Formula $>10$
525: \end_inset
526:
527: distinct sequence homologs from the ENZYME database.
528: Parameters
529: \begin_inset Formula $a,b$
530: \end_inset
531:
532: were varied from 0 to 5 in steps of
533: \begin_inset Formula $0.2$
534: \end_inset
535:
536: .
537: For each value of
538: \begin_inset Formula $a$
539: \end_inset
540:
541: and
542: \begin_inset Formula $b$
543: \end_inset
544:
545: , SDR and baseline predictions are made, each consisting of 10 residues.
546: Note that baseline predictions are not affected by values of
547: \begin_inset Formula $a,b$
548: \end_inset
549:
550: .
551: Optimization can be done with two objectives, either to minimize the mean
552: proximity or to maximize the number of close (
553: \begin_inset Formula $<$
554: \end_inset
555:
556:
557: \begin_inset ERT
558: status Collapsed
559:
560: \layout Standard
561:
562: \backslash
563: Ang{6}
564: \end_inset
565:
566: ) residues.
567:
568: \begin_inset Formula $a,b$
569: \end_inset
570:
571: values of
572: \begin_inset Formula $0.4,1.2$
573: \end_inset
574:
575: minimize the former objective to
576: \begin_inset ERT
577: status Collapsed
578:
579: \layout Standard
580:
581: \backslash
582: Ang{9.24}
583: \end_inset
584:
585: and yield
586: \begin_inset Formula $3.6$
587: \end_inset
588:
589: close residues per prediction, whereas
590: \begin_inset Formula $0,0.8$
591: \end_inset
592:
593: maximize the latter to
594: \begin_inset Formula $4.08$
595: \end_inset
596:
597: residues while yielding
598: \begin_inset ERT
599: status Collapsed
600:
601: \layout Standard
602:
603: \backslash
604: Ang{9.36}
605: \end_inset
606:
607: for the former.
608: Performance of these two
609: \begin_inset Formula $a,b$
610: \end_inset
611:
612: values on different sets of examples is shown in Table
613: \begin_inset LatexCommand \ref{evolABperf}
614:
615: \end_inset
616:
617: .
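\layout Standard

The parameter scan described above amounts to an exhaustive grid search.
The sketch below shows the idea; the function names are hypothetical, and the objective is left to the caller (for example, the mean proximity of the 10 top-scoring residues averaged over the test set, or the negated number of close residues).

```python
def parameter_grid(start=0.0, stop=5.0, step=0.2):
    """Values from start to stop inclusive, rounded to avoid float drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n + 1)]


def grid_search(objective, values):
    """Return (best_value, a, b) minimizing objective(a, b) over the grid."""
    return min((objective(a, b), a, b) for a in values for b in values)
```

With the default settings the grid holds 26 values per parameter, i.e. 676 combinations of a and b.
To maximize a quantity such as the number of close residues, pass its negative as the objective.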
618: \layout Standard
619:
620:
621: \begin_inset Float table
622: wide false
623: collapsed true
624:
625: \layout Caption
626:
627: Performance of the two optimal (a,b) settings for various levels of evolutionary information
628: available.
629: \layout Standard
630: \align center
631:
632: \begin_inset Tabular
633: <lyxtabular version="3" rows="8" columns="5">
634: <features>
635: <column alignment="center" valignment="top" leftline="true" width="0">
636: <column alignment="block" valignment="top" leftline="true" width="1in">
637: <column alignment="block" valignment="top" leftline="true" width="1in">
638: <column alignment="block" valignment="top" leftline="true" width="1in">
639: <column alignment="block" valignment="top" leftline="true" rightline="true" width="1in">
640: <row topline="true">
641: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
642: \begin_inset Text
643:
644: \layout Standard
645:
646: Criteria for
647: \end_inset
648: </cell>
649: <cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
650: \begin_inset Text
651:
652: \layout Standard
653:
654: Mean proximity
655: \end_inset
656: </cell>
657: <cell multicolumn="2" alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
658: \begin_inset Text
659:
660: \layout Standard
661:
662: \end_inset
663: </cell>
664: <cell multicolumn="1" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
665: \begin_inset Text
666:
667: \layout Standard
668:
669: #close (<
670: \begin_inset ERT
671: status Collapsed
672:
673: \layout Standard
674:
675: \backslash
676: Ang{6}
677: \end_inset
678:
679: ) residues
680: \end_inset
681: </cell>
682: <cell multicolumn="2" alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
683: \begin_inset Text
684:
685: \layout Standard
686:
687: \end_inset
688: </cell>
689: </row>
690: <row bottomline="true">
691: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
692: \begin_inset Text
693:
694: \layout Standard
695:
696: choice of examples
697: \end_inset
698: </cell>
699: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
700: \begin_inset Text
701:
702: \layout Standard
703:
704: (0,0.8)
705: \end_inset
706: </cell>
707: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
708: \begin_inset Text
709:
710: \layout Standard
711:
712: (0.4,1.2)
713: \end_inset
714: </cell>
715: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
716: \begin_inset Text
717:
718: \layout Standard
719:
720: (0,0.8)
721: \end_inset
722: </cell>
723: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
724: \begin_inset Text
725:
726: \layout Standard
727:
728: (0.4,1.2)
729: \end_inset
730: </cell>
731: </row>
732: <row topline="true">
733: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
734: \begin_inset Text
735:
736: \layout Standard
737:
738: >5 homologs
739: \end_inset
740: </cell>
741: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
742: \begin_inset Text
743:
744: \layout Standard
745:
746: 10.84
747: \end_inset
748: </cell>
749: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
750: \begin_inset Text
751:
752: \layout Standard
753:
754: 11.24
755: \end_inset
756: </cell>
757: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
758: \begin_inset Text
759:
760: \layout Standard
761:
762: 3.35
763: \end_inset
764: </cell>
765: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
766: \begin_inset Text
767:
768: \layout Standard
769:
770: 3.01
771: \end_inset
772: </cell>
773: </row>
774: <row>
775: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
776: \begin_inset Text
777:
778: \layout Standard
779:
780: (67 examples)
781: \end_inset
782: </cell>
783: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
784: \begin_inset Text
785:
786: \layout Standard
787:
788: \end_inset
789: </cell>
790: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
791: \begin_inset Text
792:
793: \layout Standard
794:
795: \end_inset
796: </cell>
797: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
798: \begin_inset Text
799:
800: \layout Standard
801:
802: \end_inset
803: </cell>
804: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
805: \begin_inset Text
806:
807: \layout Standard
808:
809: \end_inset
810: </cell>
811: </row>
812: <row topline="true">
813: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
814: \begin_inset Text
815:
816: \layout Standard
817:
818: >10 homologs
819: \end_inset
820: </cell>
821: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
822: \begin_inset Text
823:
824: \layout Standard
825:
826: 10.41
827: \end_inset
828: </cell>
829: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
830: \begin_inset Text
831:
832: \layout Standard
833:
834: 10.64
835: \end_inset
836: </cell>
837: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
838: \begin_inset Text
839:
840: \layout Standard
841:
842: 3.45
843: \end_inset
844: </cell>
845: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
846: \begin_inset Text
847:
848: \layout Standard
849:
850: 3.2
851: \end_inset
852: </cell>
853: </row>
854: <row>
855: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
856: \begin_inset Text
857:
858: \layout Standard
859:
860: (55 examples)
861: \end_inset
862: </cell>
863: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
864: \begin_inset Text
865:
866: \layout Standard
867:
868: \end_inset
869: </cell>
870: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
871: \begin_inset Text
872:
873: \layout Standard
874:
875: \end_inset
876: </cell>
877: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
878: \begin_inset Text
879:
880: \layout Standard
881:
882: \end_inset
883: </cell>
884: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
885: \begin_inset Text
886:
887: \layout Standard
888:
889: \end_inset
890: </cell>
891: </row>
892: <row topline="true">
893: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
894: \begin_inset Text
895:
896: \layout Standard
897:
898: >10 homologs, >1 EC
899: \end_inset
900: </cell>
901: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
902: \begin_inset Text
903:
904: \layout Standard
905:
906: 9.36
907: \end_inset
908: </cell>
909: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
910: \begin_inset Text
911:
912: \layout Standard
913:
914: 9.24
915: \end_inset
916: </cell>
917: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
918: \begin_inset Text
919:
920: \layout Standard
921:
922: 4.08
923: \end_inset
924: </cell>
925: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
926: \begin_inset Text
927:
928: \layout Standard
929:
930: 3.6
931: \end_inset
932: </cell>
933: </row>
934: <row bottomline="true">
935: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
936: \begin_inset Text
937:
938: \layout Standard
939:
940: (23 examples)
941: \end_inset
942: </cell>
943: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
944: \begin_inset Text
945:
946: \layout Standard
947:
948: \end_inset
949: </cell>
950: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
951: \begin_inset Text
952:
953: \layout Standard
954:
955: \end_inset
956: </cell>
957: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
958: \begin_inset Text
959:
960: \layout Standard
961:
962: \end_inset
963: </cell>
964: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
965: \begin_inset Text
966:
967: \layout Standard
968:
969: \end_inset
970: </cell>
971: </row>
972: </lyxtabular>
973:
974: \end_inset
975:
976:
977: \layout Standard
978:
979:
980: \begin_inset LatexCommand \label{evolABperf}
981:
982: \end_inset
983:
984:
985: \end_inset
986:
987:
988: \layout Standard
989:
990: This suggests that optimal
991: \begin_inset Formula $a,b$
992: \end_inset
993:
994: parameters are
995: \begin_inset Formula $0,0.8$
996: \end_inset
997:
998: .
999: It is surprising that the SDR score gives no weight to
1000: \begin_inset Formula $dE=famE-ecE$
1001: \end_inset
1002:
1003: .
1004: Perhaps this is due to the quality checks applied prior to calculation
1005: of SDR scores, which demand
1006: \begin_inset Formula $dE>0.5$
1007: \end_inset
1008:
1009: .
1010: \layout Standard
1011:
1012: Fig.
1013: \begin_inset LatexCommand \ref{proxDistrib}
1014:
1015: \end_inset
1016:
1017: shows the distribution of mean proximity in various sets derived according
1018: to number of distinct homologs in ENZYME.
1019: This shows that the quality of evolutionary information available has a great
1020: impact on quality of predictions.
1021: \layout Standard
1022:
1023:
1024: \begin_inset Float figure
1025: wide false
1026: collapsed false
1027:
1028: \layout Caption
1029:
1030: Frequency of observing a certain mean proximity of SDR predictions (binned
1031: in
1032: \begin_inset ERT
1033: status Collapsed
1034:
1035: \layout Standard
1036:
1037: \backslash
1038: Ang{1}
1039: \end_inset
1040:
1041: bins) for different qualities of evolutionary information available.
1042: \layout Standard
1043: \align center
1044:
1045: \begin_inset Graphics
1046: filename proxDistrib.pdf
1047: width 150mm
1048:
1049: \end_inset
1050:
1051:
1052: \layout Standard
1053:
1054:
1055: \begin_inset LatexCommand \label{proxDistrib}
1056:
1057: \end_inset
1058:
1059:
1060: \end_inset
1061:
1062:
1063: \layout Standard
1064:
Mean relative proximity indicates how far the prediction is from random.
Table
\begin_inset LatexCommand \ref{meanRelProxTable}

\end_inset

shows that mean relative proximity depends on the quality of evolutionary
information and is far from random for both SDR and baseline predictions.
1073: \layout Standard
1074:
1075:
1076: \begin_inset Float table
1077: wide false
1078: collapsed false
1079:
1080: \layout Caption
1081:
1082: Mean relative proximity in various datasets made according to number of
1083: available distinct homologs.
1084: \layout Standard
1085: \align center
1086:
1087: \begin_inset Tabular
1088: <lyxtabular version="3" rows="5" columns="4">
1089: <features>
1090: <column alignment="center" valignment="top" leftline="true" width="0">
1091: <column alignment="center" valignment="top" leftline="true" width="0">
1092: <column alignment="center" valignment="top" leftline="true" width="0">
1093: <column alignment="center" valignment="top" leftline="true" rightline="true" width="0">
1094: <row topline="true">
1095: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1096: \begin_inset Text
1097:
1098: \layout Standard
1099:
1100: Dataset
1101: \end_inset
1102: </cell>
1103: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1104: \begin_inset Text
1105:
1106: \layout Standard
1107:
1108: Mean Rel.
1109: Prox.
1110: \end_inset
1111: </cell>
1112: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1113: \begin_inset Text
1114:
1115: \layout Standard
1116:
1117: Mean Rel.
1118: Prox.
1119: \end_inset
1120: </cell>
1121: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
1122: \begin_inset Text
1123:
1124: \layout Standard
1125:
1126: Frequency of
1127: \end_inset
1128: </cell>
1129: </row>
1130: <row>
1131: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1132: \begin_inset Text
1133:
1134: \layout Standard
1135:
1136: \end_inset
1137: </cell>
1138: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1139: \begin_inset Text
1140:
1141: \layout Standard
1142:
(SDR)
1143: \end_inset
1144: </cell>
1145: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1146: \begin_inset Text
1147:
1148: \layout Standard
1149:
(baseline)
1150: \end_inset
1151: </cell>
1152: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
1153: \begin_inset Text
1154:
1155: \layout Standard
1156:
1157: MRP(SDR)
1158: \begin_inset Formula $\leq$
1159: \end_inset
1160:
1161: MRP(baseline)
1162: \end_inset
1163: </cell>
1164: </row>
1165: <row topline="true">
1166: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1167: \begin_inset Text
1168:
1169: \layout Standard
1170:
1171: >0 homologs
1172: \end_inset
1173: </cell>
1174: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1175: \begin_inset Text
1176:
1177: \layout Standard
1178:
1179: 0.67
1180: \end_inset
1181: </cell>
1182: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1183: \begin_inset Text
1184:
1185: \layout Standard
1186:
1187: 0.66
1188: \end_inset
1189: </cell>
1190: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
1191: \begin_inset Text
1192:
1193: \layout Standard
1194:
1195: 34% (33/97)
1196: \end_inset
1197: </cell>
1198: </row>
1199: <row topline="true">
1200: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1201: \begin_inset Text
1202:
1203: \layout Standard
1204:
1205: >5 homologs
1206: \end_inset
1207: </cell>
1208: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1209: \begin_inset Text
1210:
1211: \layout Standard
1212:
1213: 0.57
1214: \end_inset
1215: </cell>
1216: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1217: \begin_inset Text
1218:
1219: \layout Standard
1220:
1221: 0.66
1222: \end_inset
1223: </cell>
1224: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
1225: \begin_inset Text
1226:
1227: \layout Standard
1228:
1229: 60% (40/67)
1230: \end_inset
1231: </cell>
1232: </row>
1233: <row topline="true" bottomline="true">
1234: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1235: \begin_inset Text
1236:
1237: \layout Standard
1238:
1239: >10 homologs
1240: \end_inset
1241: </cell>
1242: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1243: \begin_inset Text
1244:
1245: \layout Standard
1246:
1247: 0.57
1248: \end_inset
1249: </cell>
1250: <cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
1251: \begin_inset Text
1252:
1253: \layout Standard
1254:
1255: 0.62
1256: \end_inset
1257: </cell>
1258: <cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
1259: \begin_inset Text
1260:
1261: \layout Standard
1262:
1263: 85% (47/55)
1264: \end_inset
1265: </cell>
1266: </row>
1267: </lyxtabular>
1268:
1269: \end_inset
1270:
1271:
1272: \layout Standard
1273:
1274:
1275: \begin_inset LatexCommand \label{meanRelProxTable}
1276:
1277: \end_inset
1278:
1279:
1280: \end_inset
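

\layout Standard

The last column of the mean relative proximity table can be reproduced with a few lines of code.
The sketch below is only an illustration: the function name and the toy per-protein values are hypothetical, whereas the counts 33/97, 40/67 and 47/55 are taken from the table and merely re-expressed as rounded percentages.

```python
def frequency_sdr_not_worse(pairs):
    """Return (count, total, percent) of proteins with MRP(SDR) <= MRP(baseline)."""
    count = sum(1 for sdr, base in pairs if sdr <= base)
    total = len(pairs)
    return count, total, round(100.0 * count / total)

# Hypothetical per-protein (MRP_SDR, MRP_baseline) pairs, for illustration only.
toy_pairs = [(0.40, 0.55), (0.70, 0.60), (0.52, 0.52), (0.61, 0.66)]
toy_result = frequency_sdr_not_worse(toy_pairs)

# Counts reported in the table, re-expressed as rounded percentages.
table_counts = {">0": (33, 97), ">5": (40, 67), ">10": (47, 55)}
table_percent = {k: round(100.0 * n / d) for k, (n, d) in table_counts.items()}
```

Ties are counted in favour of SDR here because the column header uses a non-strict inequality.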
1281:
1282:
1283: \layout Standard
1284:
1285: The fraction of SDRs present in baseline predictions is
1286: \begin_inset Formula $15\%$
1287: \end_inset
1288:
1289: in all
1290: \begin_inset Formula $>0,>5,>10$
1291: \end_inset
1292:
homologs classes, which suggests that SDR predictions differ substantially
from baseline predictions.
This also suggests that baseline and SDR predictions are complementary
to each other.
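\layout Standard

The overlap between the two prediction sets is a simple set computation.
The following sketch assumes that each prediction set is a list of residue identifiers; the function name and the residue labels are hypothetical and chosen only to mirror a case where 5 of 10 predictions are shared.

```python
def overlap_fraction(sdr_top, baseline_top):
    """Fraction of the SDR predictions that also occur in the baseline predictions."""
    sdr = set(sdr_top)
    if not sdr:
        raise ValueError("empty SDR prediction list")
    return len(sdr & set(baseline_top)) / len(sdr)

# Hypothetical residue labels: r1..r10 as Top10 SDR, r6..r15 as Top10 baseline.
sdr_top10 = ["r%d" % i for i in range(1, 11)]
baseline_top10 = ["r%d" % i for i in range(6, 16)]
shared = overlap_fraction(sdr_top10, baseline_top10)
```

A low fraction means the two rankings select largely different residues, which is the sense in which the predictions are called complementary.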
1297: \layout Section
1298:
1299: Some examples
1300: \layout Standard
1301:
When good-quality sequence information is available, SDR predictions are closer
to the specific ligand than baseline predictions, which in turn are closer than
random.
Here we compare our Top10 predictions with information from the literature
for some examples.
1307: \layout Subsection
1308:
1309: Aminotransferases
1310: \layout Standard
1311:
Aminotransferases, or transaminases, are important to amino acid biosynthesis
and unique due to their specificity for two substrates: a glutamate and
an amino-carrier.
Our dataset contains two SCOP families (c.67.1.1 and c.67.1.4) that contain
transaminases.
Of those, we focus on SCOP family c.67.1.1, which contains the functional
categories aspartate transaminase (AspAT, EC 2.6.1.1) and histidinol phosphate
transaminase (HspAT, EC 2.6.1.9).
Other non-transaminase members of this family include threonine aldolases
(EC 4.1.2.5) and alliin lyase (EC 4.4.1.4).
1322: When Top10 predictions were analyzed in 1gex, an HspAT, we found that SDR
1323: predictions are very well clustered around the ligands PLP and HSP, but
1324: 5 of the 10 predictions were shared with Top10 baseline predictions.
1325: This overlap can be attributed to degrees of functional diversity in the
1326: SCOP family, i.e.
1327: large entropy reduction in HspAT residues could be due to their importance
1328: to general transaminase mechanism (as opposed to aldolase mechanism) or
1329: for substrate specificity to histidinol phosphate (as opposed to aspartate
1330: in AspATs).
1331: In order to increase the number of distinct predictions, Top20 baseline
1332: and SDR predictions were used.
1333: Fig.
1334: \begin_inset LatexCommand \ref{figTransaminase}
1335:
1336: \end_inset
1337:
shows the predictions for 1gexA, an HspAT from E.
coli; 7 predictions are common to both sets.
1340: Catalytically important residues (
1341: \begin_inset LatexCommand \citet{Haruyama2001}
1342:
1343: \end_inset
1344:
) Asn-157, Tyr-187 and Lys-214 are identified as baseline, SDR and common,
respectively.
1347: Tyr-55, which interacts with substrate of the other subunit, is predicted
1348: as SDR
1349: \begin_inset Foot
1350: collapsed true
1351:
1352: \layout Standard
1353:
This is confirmed by a similar prediction in 1gc4, an AspAT.
1355: \end_inset
1356:
1357: .
Tyr-20, believed to be important for specificity, is not predicted as such
because it is conserved only 80% of the time, whereas a similarly placed Tyr-55
from the other subunit is much better conserved (98% of the time) and could be
equally important for specificity.
Ala-186, considered important for restricting rotation of PLP's pyridine
ring and thereby contributing to strain essential for enzyme function,
is predicted as both SDR and baseline.
Most other predicted SDRs lie close to the substrate.
Their location and AspAT counterparts suggest their role in conferring
specificity towards histidinol phosphate (see Table
1368: \begin_inset LatexCommand \ref{transaminaseTable}
1369:
1370: \end_inset
1371:
1372: ).
1373: \layout Standard
1374:
1375:
1376: \begin_inset Float table
1377: wide false
1378: collapsed false
1379:
1380: \layout Caption
1381:
Residues with speculated roles
1383: \begin_inset LatexCommand \citet{Haruyama2001}
1384:
1385: \end_inset
1386:
1387: for HspAT 1gex and how well they were predicted.
1388: The aligned residues in other subfamilies with transaminases are also shown.
1389: \layout Standard
1390: \align center
1391:
1392: \begin_inset Graphics
1393: filename transaminaseTable.pdf
1394: width 150mm
1395:
1396: \end_inset
1397:
1398:
1399: \layout Standard
1400:
1401:
1402: \begin_inset LatexCommand \label{transaminaseTable}
1403:
1404: \end_inset
1405:
1406:
1407: \end_inset
1408:
1409:
1410: \layout Standard
1411:
1412:
1413: \begin_inset Float figure
1414: wide false
1415: collapsed false
1416:
1417: \layout Caption
1418:
SDR (green) and functional residue (red) predictions for 1gex, an HspAT.
1420: Residues predicted both as functional and specificity-conferring are colored
1421: blue.
1422: Top left panel shows Top5 predictions, top right panel shows Top10 predictions
1423: and bottom panel zooms in on the region around ligand in the Top10 case.
1424: \layout Standard
1425: \align center
1426:
1427: \begin_inset Graphics
1428: filename transaminaseFig.jpg
1429: width 150mm
1430:
1431: \end_inset
1432:
1433:
1434: \layout Standard
1435:
1436:
1437: \begin_inset LatexCommand \label{figTransaminase}
1438:
1439: \end_inset
1440:
1441:
1442: \end_inset
1443:
1444:
1445: \layout Subsection
1446:
1447: Phosphoric monoester hydrolases
1448: \layout Standard
1449:
SCOP family e.7.1.1 in our dataset contains 4 classes of phosphoric monoester
hydrolases: 3'(2'),5'-bisphosphate nucleotidase (EC 3.1.3.7),
fructose-bisphosphatase (EC 3.1.3.11), inositolphosphate phosphatase
(EC 3.1.3.25) and inositol-1,4-bisphosphate 1-phosphatase (EC 3.1.3.57).
Here we look at the SDR and baseline predictions for 1cnq, a member of
the FBPase category.
FBPases are of key importance to regulation of the gluconeogenic pathway and
catalyze the hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate.
They are metal dependent and are allosterically controlled by AMP, which
triggers a conformational change and masks the fructose active site.
1460: Fig.
1461: \begin_inset LatexCommand \ref{figFBPase}
1462:
1463: \end_inset
1464:
shows the Top10 baseline and SDR predictions; the overlap in this case
is 2 residues.
The F6P molecule around which most predictions are clustered lies in the active
site, whereas the other F6P molecule occupies a position similar to that of AMP
(from comparison with PDB 1yyz).
Baseline predictions Tyr-279, Glu-280, Tyr-244, Met-244 and common prediction
Tyr-264 are within interacting distance of the F6P ligand in the active site.
Most predicted SDRs form the active site walls and differ between FBPase
and IMPase (1awb): Arg-276 to His, Ser-96 to Gly, Ser-123 to Thr, Ser-124
to Thr (see Table
1475: \begin_inset LatexCommand \ref{FBPaseTable}
1476:
1477: \end_inset
1478:
1479: ).
It is surprising that the allosteric site is only weakly detected.
Predictions Ala-161 (Top10 SDR), Lys-290 (Top10 baseline) and Val-178 (Top20
SDR) lie close to it and suggest some role in AMP binding.
1483: \layout Standard
1484:
1485:
1486: \begin_inset Float table
1487: wide false
1488: collapsed false
1489:
1490: \layout Caption
1491:
1492: Speculated roles of residues in FBPase for 1cnq from literature and how
1493: well they were predicted.
1494: Aligned residues in other subfamilies of hydrolases are also shown.
1495: \layout Standard
1496: \align center
1497:
1498: \begin_inset Graphics
1499: filename FBPaseTable.pdf
1500: width 150mm
1501:
1502: \end_inset
1503:
1504:
1505: \layout Standard
1506:
1507:
1508: \begin_inset LatexCommand \label{FBPaseTable}
1509:
1510: \end_inset
1511:
1512:
1513: \end_inset
1514:
1515:
1516: \layout Standard
1517:
1518:
1519: \begin_inset Float figure
1520: wide false
1521: collapsed false
1522:
1523: \layout Caption
1524:
SDR and functional residue predictions for 1cnq, an FBPase.
1526: Residue-coloring scheme same as Fig.
1527: \begin_inset LatexCommand \ref{figTransaminase}
1528:
1529: \end_inset
1530:
1531: .
1532: The bottom panel is a closer view of the region around ligand in the top
1533: panel.
1534: \layout Standard
1535: \align center
1536:
1537: \begin_inset Graphics
1538: filename figFBPase.jpg
1539: width 100mm
1540:
1541: \end_inset
1542:
1543:
1544: \layout Standard
1545:
1546:
1547: \begin_inset LatexCommand \label{figFBPase}
1548:
1549: \end_inset
1550:
1551:
1552: \end_inset
1553:
1554:
1555: \layout Subsection
1556:
1557: Dehydrogenases
1558: \layout Standard
1559:
L-3-hydroxyacyl-CoA dehydrogenase (HAD, EC 1.1.1.35) is the penultimate enzyme
in the
\begin_inset Formula $\beta$
\end_inset

-oxidation spiral and catalyzes the conversion of a hydroxy group to a keto
group while converting NAD+ to NADH.
1563: It consists of NAD-binding and C-terminal domains, which undergo relative
1564: movement between NAD binding and substrate binding events (
1565: \begin_inset LatexCommand \citet{activesiteSequestration}
1566:
1567: \end_inset
1568:
1569: ).
1570: Its SCOP family is c.2.1.6, other members of which are other NAD/NADP-dependent
1571: dehydrogenases (ECs 1.1.1.8, 1.1.1.22, 1.1.1.44).
HAD is represented in our dataset by the NAD-binding domain of 1f0y (residues
A-12 to A-203).
1574: Fig.
1575: \begin_inset LatexCommand \ref{figHAD}
1576:
1577: \end_inset
1578:
1579: shows Top10 baseline and SDR predictions.
The catalytically important pair Glu-170 and His-158 is identified as SDRs.
Ser-137, interesting due to its contact with the substrate as well as NAD,
is also identified as an SDR.
With the exceptions of Leu-122, Ala-35 (baseline) and Gly-29, Ala-107 (SDR),
all other predictions are within interacting distance of either NAD or
substrate.
Ser-61 and Lys-68 are not detected due to their high entropy.
1587: \layout Standard
1588:
1589:
1590: \begin_inset Float figure
1591: wide false
1592: collapsed false
1593:
1594: \layout Caption
1595:
1596: SDR and functional residue predictions for 1f0y, a HAD.
1597: Residue-coloring scheme same as Fig.
1598: \begin_inset LatexCommand \ref{figTransaminase}
1599:
1600: \end_inset
1601:
1602: .
1603: \layout Standard
1604: \align center
1605:
1606: \begin_inset Graphics
1607: filename figHAD.jpg
1608: width 100mm
1609:
1610: \end_inset
1611:
1612:
1613: \layout Standard
1614:
1615:
1616: \begin_inset LatexCommand \label{figHAD}
1617:
1618: \end_inset
1619:
1620:
1621: \end_inset
1622:
1623:
1624: \layout Subsection
1625:
1626: Tryptophan biosynthesis enzymes
1627: \layout Standard
1628:
1629: Phosphoribosylanthranilate (PRA) isomerase (TrpF) is a
1630: \begin_inset Formula $(\beta\alpha)_{8}$
1631: \end_inset
1632:
barrel enzyme; this barrel is the most common fold adopted by enzymes and is
also common among non-enzymes.
1635: TrpF (EC 5.3.1.24) shares its SCOP family (c.1.2.4) with indole-3-glycerol-phosphate
1636: synthase (EC 4.1.1.48) and tryptophan synthase (EC 4.2.1.20), which are all involved
1637: in Trp biosynthesis.
Top10 baseline and SDR predictions are shown in Fig.
1639: \begin_inset LatexCommand \ref{figTRPF}
1640:
1641: \end_inset
1642:
1643: .
1644: His-83 and Arg-36, considered important for catalysis, are predicted.
1645: Gln-81 (Glu in Trp synthase 1kfc), predicted as baseline and SDR, could
1646: be important for catalysis due to its location.
A few baseline predictions are far from the active site and their conservation
suggests a protein-protein binding interface.
1649: Predicted SDRs lie close to ligand and are either replaced by other residues
1650: in Trp synthase (Arg-36 to Asn) or deleted (Gln-184, Asp-178), which suggests
1651: that they could be specificity determining.
1652: \layout Standard
1653:
1654:
1655: \begin_inset Float figure
1656: wide false
1657: collapsed false
1658:
1659: \layout Caption
1660:
1661: SDR and functional residue predictions for TrpF.
1662: Residue-coloring scheme same as Fig.
1663: \begin_inset LatexCommand \ref{figTransaminase}
1664:
1665: \end_inset
1666:
1667: .
1668: \layout Standard
1669: \align center
1670:
1671: \begin_inset Graphics
1672: filename figTRPF.jpg
1673: width 100mm
1674:
1675: \end_inset
1676:
1677:
1678: \layout Standard
1679:
1680:
1681: \begin_inset LatexCommand \label{figTRPF}
1682:
1683: \end_inset
1684:
1685:
1686: \end_inset
1687:
1688:
1689: \layout Subsection
1690:
1691: tRNA synthetases
1692: \layout Standard
1693:
1694: Aminoacyl-tRNA synthetases catalyze the process of attaching an amino acid
1695: to its tRNA carrier so that it can be incorporated into a protein.
1696: SCOP family c.26.1.1 contains tyrosyl-tRNA synthetase (EC 6.1.1.1) along with
1697: other (Trp-, Glu-, Gln-) tRNA synthetases.
1698: Fig.
1699: \begin_inset LatexCommand \ref{figTyrTRNA}
1700:
1701: \end_inset
1702:
1703: shows baseline and SDR predictions for tyrosyl-tRNA synthetase 1h3e from
a thermophilic bacterium T.
1705: thermophilus (
1706: \begin_inset LatexCommand \citet{tyrTRNAclass12}
1707:
1708: \end_inset
1709:
1710: ).
1711: Residues important for catalysis from 51-HIGH and 233-KMSKS regions are
1712: predicted as baseline (His-52, Gly-54, His-55, Lys-235).
1713: Predicted SDRs lie close to the substrate and cofactor.
1714: Residues specific for L-tyrosine binding, according to
1715: \begin_inset LatexCommand \citet{tyrTRNAspecificity}
1716:
1717: \end_inset
1718:
1719: (e.g.
1720: Thr-80, Tyr-175, Gln-179, Asp-182, Glu-197), are detected.
1721: Note that substrate similarity makes 2 broad divisions in this family correspon
1722: ding to Trp/Tyr and Glu/Gln, each of which is subdivided into finer groups.
1723: Table
1724: \begin_inset LatexCommand \ref{tRnaTable}
1725:
1726: \end_inset
1727:
1728: shows residues structurally aligned to SDRs in these tRNA synthetases.
1729: \layout Standard
1730:
1731:
1732: \begin_inset Float table
1733: wide false
1734: collapsed false
1735:
1736: \layout Caption
1737:
Residues in other tRNA synthetases aligned to predicted SDRs in tyrosyl
tRNA synthetase.
1740: \layout Standard
1741: \align center
1742:
1743: \begin_inset Graphics
1744: filename tRNAtable.pdf
1745: width 150mm
1746:
1747: \end_inset
1748:
1749:
1750: \layout Standard
1751:
1752:
1753: \begin_inset LatexCommand \label{tRnaTable}
1754:
1755: \end_inset
1756:
1757:
1758: \end_inset
1759:
1760:
1761: \layout Standard
1762:
1763:
1764: \begin_inset Float figure
1765: wide false
1766: collapsed false
1767:
1768: \layout Caption
1769:
SDR and functional residue predictions for 1h3e (tyrosyl tRNA synthetase).
1771: Residue-coloring scheme same as Fig.
1772: \begin_inset LatexCommand \ref{figTransaminase}
1773:
1774: \end_inset
1775:
1776: .
1777: \layout Standard
1778: \align center
1779:
1780: \begin_inset Graphics
1781: filename figTYRtRNA.jpg
1782: width 100mm
1783:
1784: \end_inset
1785:
1786:
1787: \layout Standard
1788:
1789:
1790: \begin_inset LatexCommand \label{figTyrTRNA}
1791:
1792: \end_inset
1793:
1794:
1795: \end_inset
1796:
1797:
1798: \layout Standard
1799:
1800: Residues distinct for each substrate-group could be specific for it, e.g.
1801: Gln-179.
1802: Detection of residue Tyr-175 as SDR suggests that there could be more functions
1803: associated with this structural family than these four AATSs.
1804: Detection of residues close to cofactor indicates different/no cofactors
1805: used by other functions of this structural family.
1806: Some residues speculated by
1807: \begin_inset LatexCommand \citet{tyrTRNAspecificity}
1808:
1809: \end_inset
1810:
to be functional remain undetected, e.g.
1812: Asn-128 which is not predicted due to high entropy (Ser dominates the MSSA
1813: column, not Asn).
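\layout Standard

The effect of column entropy on detection can be illustrated with a small sketch.
It assumes plain Shannon entropy over the residue types in one alignment column, which is a simplification and not necessarily the entropy measure used for the SDR score; the function name and the two toy columns are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy columns, for illustration only.
conserved = "Y" * 49 + "F"        # one residue type dominates (98%)
mixed = "S" * 6 + "N" * 3 + "T"   # Ser dominates, Asn is a minority

low = column_entropy(conserved)
high = column_entropy(mixed)
```

Under this assumption a column such as the second one scores a much higher entropy than the first, so a minority residue in it is unlikely to rank among the top predictions.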
1814: \layout Section
1815:
1816: Conclusion
1817: \layout Standard
1818:
We have combined structural and sequence information, functional annotation,
1820: residue entropy and environment specific substitution tables to predict
1821: specificity determining residues.
1822: We tested the predictions by using information of specific ligands and
1823: in some cases, published literature.
1824: We found that the predictions are far from random and functionally relevant,
1825: which suggests that our approach is effective.
1826: Predictions obtained with functional annotation (SDRs) and without it (baseline
1827: ) are different, suggesting that available functional annotation is valuable.
1828: SDR and baseline predictions are complementary because they enlarge the
1829: set of functionally significant residues that can be computationally identified.
1830: We expected and found that our method cannot identify significant residues
1831: in absence of high quality evolutionary information, hence the importance
1832: of identifying chemically interesting patches remains undiminished.
A major concern is how to obtain functional partitions in the absence of
annotation, which is similar to establishing ortho/paralogy relationships.
1835: We plan to explore structure-sequence scoring schemes that would help establish
1836: functional partitions reliably.
1837: Alternatively, it would be useful to analyze the effects of constructing
1838: a functional partition based on sequence identity.
1839: We plan to use residue proximity information and residue contact conservation
1840: to detect clusters which may not be conserved in the obvious sense.
1841: We expect that cluster identification will alleviate the problem of not
1842: identifying structurally conserved residues.
1843: The most important purpose of SDR and catalytic residue identification
1844: is to help classify SNPs into normal/deleterious classes and this would
be an important avenue to explore in the near future.
1846: \layout Subsection*
1847:
1848: Acknowledgements
1849: \layout Standard
1850:
1851: We thank Dr Kenji Mizuguchi and Dr Vijayalakshmi Chelliah for helpful discussion
1852: s.
1853: Swanand Gore thanks Cambridge Commonwealth Trust and Universities UK Overseas
1854: Research Studentship for funding.
1855: \layout Standard
1856:
1857:
1858: \begin_inset LatexCommand \BibTeX[marko]{sdr}
1859:
1860: \end_inset
1861:
1862:
1863: \the_end
1864: