1: \documentclass[10pt]{article}
2:
3:
4: \usepackage{graphicx}
5: \usepackage{cite} % Make references as [1-4], not [1,2,3,4]
6:
7: \setlength{\topmargin}{0.0cm}
8: \setlength{\textheight}{21.5cm}
9: \setlength{\oddsidemargin}{0cm}
10: \setlength{\textwidth}{16.5cm}
11: \setlength{\columnsep}{0.6cm}
12:
13: \begin{document}
14:
20:
21: \title{CRNPRED: Highly Accurate Prediction of One-dimensional Protein Structures by Large-scale Critical Random Networks}
22:
35:
36:
37: \author{Akira R Kinjo$^{1,2}$ and Ken Nishikawa$^{1,2}$\\
38: $^1$Center for Information Biology and DNA Data Bank of Japan,\\
39: National Institute of Genetics, Mishima, 411-8540, Japan\\
40: $^2$Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), \\Mishima 411-8540, Japan
41: }
42:
43: \maketitle
44:
45: \begin{abstract}
46: \textbf{Background:}
47: One-dimensional protein structures such as secondary structures or contact
48: numbers are useful for three-dimensional structure prediction and helpful for
49: intuitive understanding of the sequence-structure relationship.
50: Accurate prediction methods will serve as a basis for these and other purposes.\\
51: \textbf{Results:} We implemented a program CRNPRED which predicts secondary
52: structures, contact numbers and residue-wise contact orders. This program is
53: based on a novel machine learning scheme called critical random
54: networks. Unlike most conventional one-dimensional structure prediction
55: methods which are based on local windows of an amino acid sequence, CRNPRED
56: takes into account the whole sequence. CRNPRED achieves, on average per chain,
57: $Q_3$ = 81\% for secondary structure prediction, and correlation coefficients
58: of 0.75 and 0.61 for contact number and residue-wise contact order
59: predictions, respectively.\\
60: \textbf{Conclusion:} CRNPRED will be a useful tool for computational as
61: well as experimental biologists who need accurate one-dimensional protein
62: structure predictions.
63: \end{abstract}
64:
65:
66:
67: \section*{Background}
68:
69: One-dimensional (1D) structures of a protein are residue-wise
70: quantities or symbols onto which some features of the native
71: three-dimensional (3D) structure are projected.
1D structures are of interest for several reasons. For example, predicted
secondary structures, a kind of 1D structure, are often used to limit the
conformational space to be searched in 3D structure prediction.
75: Furthermore, it has recently been shown that certain sets of the native
76: (as opposed to predicted) 1D structures of
77: a protein contain sufficient information to recover the native 3D
78: structure~\cite{PortoETAL2004,KinjoANDNishikawa2005}. These 1D structures are
79: either the principal eigenvector of the contact map~\cite{PortoETAL2004} or a set of secondary structures (SS), contact numbers (CN) and residue-wise contact orders (RWCO)~\cite{KinjoANDNishikawa2005}.
80: Therefore, it is possible, at least in principle, to predict the native 3D
81: structure by first predicting the 1D structures, and then by constructing
82: the 3D structure from these 1D structures. 1D structures are not only useful
83: for 3D structure predictions, but also helpful for intuitive understanding
84: of the correspondence between the protein structure and its amino acid sequence
85: due to the residue-wise characteristics of 1D structures. Therefore, accurate
86: prediction of 1D protein structures is of fundamental biological interest.
87:
Secondary structure prediction has a long history~\cite{Rost2003}.
Almost all modern predictors are based on position-specific scoring
matrices (PSSM) and some kind of machine learning technique such as neural
networks or support vector machines. Currently the best predictors achieve
$Q_3$ of 77--79\%~\cite{Jones1999,PollastriANDMcLysaght2005}.
The study of contact number prediction also started a long time
ago~\cite{NishikawaANDOoi1980,NishikawaANDOoi1986}, but further
improvements were made only recently~\cite{KinjoETAL2005,Yuan2005,KinjoANDNishikawa2005c}. These recent methods are based on the ideas developed in SS
prediction (i.e., PSSM and machine learning), and achieve a correlation
coefficient of 0.68--0.73.
98:
99: Recently, we have developed a new method for accurately predicting SS, CN,
100: and RWCO based on a novel machine learning scheme,
critical random networks (CRN)~\cite{KinjoANDNishikawa2005c}.
102: In this paper, we briefly describe the formulation of the method, and recent
103: improvements leading to even better predictions.
104: The computer program for SS, CN, and RWCO prediction named CRNPRED has been
105: developed for the convenience of the general user, and a web interface and
106: source code are made available online.
107:
108: \section*{Implementation}
109:
110: \subsection*{Definition of 1D structures}
111: \textit{Secondary structures (SS):}
112: Secondary structures were defined by the DSSP program \cite{DSSP}.
113: For three-state SS prediction, the simple encoding scheme (the so-called CK
114: mapping) was employed \cite{CrooksANDBrenner2004}.
115: That is, $\alpha$ helices ($H$), $\beta$ strands ($E$), and other structures
116: (``coils'') defined by DSSP were encoded as $H$, $E$, and $C$, respectively.
117: Note that we do not use the CASP-style conversion scheme (the so-called EHL
118: mapping) in which DSSP's $H$, $G$ ($3_{10}$ helix) and $I$ ($\pi$ helix) are encoded as $H$, and DSSP's $E$ and $B$ ($\beta$ bridge) as $E$.
119: We believe the CK mapping is more natural and useful for 3D structure
120: predictions (e.g., geometrical restraints should be different between an
121: $\alpha$ helix and a $3_{10}$ helix).
For SS prediction, we introduce feature variables $(y_i^H, y_i^E, y_i^C)$
to represent each type of secondary structure at the $i$-th residue position,
so that $H$ is represented as $(1,-1,-1)$, $E$ as $(-1,1,-1)$, and $C$ as
$(-1,-1,1)$.
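
The encoding described above can be sketched in a few lines of Python. This is only an illustration, not the CRNPRED source (which is written in C); the function names \texttt{ck\_map}, \texttt{encode} and \texttt{decode} are hypothetical, and the input is assumed to be a string of one-letter DSSP codes.

```python
# Three-state reduction (CK mapping): DSSP 'H' -> H, 'E' -> E, everything else -> C.
# Function names are illustrative only; they are not taken from CRNPRED.
FEATURES = {"H": (1, -1, -1), "E": (-1, 1, -1), "C": (-1, -1, 1)}


def ck_map(dssp_states):
    """Reduce a string of DSSP codes to the three classes H, E and C."""
    return "".join(s if s in ("H", "E") else "C" for s in dssp_states)


def encode(dssp_states):
    """Return one (y^H, y^E, y^C) target triple per residue."""
    return [FEATURES[s] for s in ck_map(dssp_states)]


def decode(triples):
    """Pick the class with the largest feature value at each residue."""
    return "".join("HEC"[max(range(3), key=lambda k: t[k])] for t in triples)


print(ck_map("HHGGEEBT-S"))
print(decode([(0.2, -0.5, 0.1), (-0.9, 0.3, 0.4)]))
```

Under the CK mapping the $3_{10}$ helix ($G$) and the $\beta$ bridge ($B$) fall into the coil class, whereas the EHL mapping would assign them to $H$ and $E$, respectively.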
126:
127: \textit{Contact numbers (CN):}
Let $C_{i,j}$ represent the contact map of a protein. Usually, the contact
map is defined so that $C_{i,j} = 1$ if the $i$-th and $j$-th residues are in
contact by some definition, and $C_{i,j} = 0$ otherwise. As in our
previous study, we slightly modify the definition using a sigmoid function.
132: That is,
133: \begin{equation}
134: C_{i,j} = 1/\{1+\exp[w(r_{i,j} - d)]\}
135: \end{equation}
136: where $r_{i,j}$ is the distance between $C_{\beta}$ ($C_{\alpha}$
137: for glycines) atoms of the $i$-th and $j$-th residues, $d = 12$\AA{} is a
138: cutoff distance, and $w$ is a sharpness parameter of the sigmoid function
139: which is set to 3 \cite{KinjoETAL2005,KinjoANDNishikawa2005}. The rather
140: generous cutoff length of 12\AA{} was shown to optimize the prediction
141: accuracy \cite{KinjoETAL2005}. The use of the sigmoid function enables us to
142: use the contact numbers in molecular dynamics
143: simulations \cite{KinjoANDNishikawa2005}.
144: Using the above definition of the contact map, the contact number of the
145: $i$-th residue of a protein is defined as
146: \begin{equation}
147: n_i = \sum_{j:|i-j|>2}C_{i,j}. \label{eq:defcn}
148: \end{equation}
149: The feature variable $y_i$ for CN is defined as $y_i = n_i / \log L$ where
150: $L$ is the sequence length of a target protein. The normalization
151: factor $\log L$ is introduced because we have observed that the contact
152: number averaged over a protein chain is roughly proportional to $\log L$,
153: and thus division by this value removes the size-dependence of predicted
154: contact numbers.
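
As a minimal sketch of the definitions above, the following Python fragment computes the smoothed contact map, the contact numbers $n_i$ and the normalized feature $y_i = n_i/\log L$. The function names are hypothetical, the coordinates are assumed to be $C_{\beta}$ ($C_{\alpha}$ for glycine) positions in \AA{}, and the natural logarithm is assumed because the base is not stated here.

```python
import math


def contact_strength(r, d=12.0, w=3.0):
    """Smoothed contact C = 1 / (1 + exp[w (r - d)]) for a distance r in angstroms."""
    z = w * (r - d)
    if z > 700.0:  # exp() would overflow; the contact is numerically zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def contact_numbers(coords, d=12.0, w=3.0):
    """n_i = sum of C_ij over all j with |i - j| > 2."""
    L = len(coords)
    return [
        sum(contact_strength(math.dist(coords[i], coords[j]), d, w)
            for j in range(L) if abs(i - j) > 2)
        for i in range(L)
    ]


def cn_features(n):
    """y_i = n_i / log L (natural logarithm assumed)."""
    log_len = math.log(len(n))
    return [ni / log_len for ni in n]


# Toy chain: five points on a line, 3.8 angstroms apart (not a real structure).
toy = [(3.8 * i, 0.0, 0.0) for i in range(5)]
n = contact_numbers(toy)
print([round(v, 3) for v in n])
print([round(v, 3) for v in cn_features(n)])
```

In the toy chain the central residue has no partner with $|i-j|>2$, so its contact number is zero, while a pair separated by exactly 12\AA{} would contribute 0.5.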
155:
156: \textit{Residue-wise contact orders (RWCO):}
157: RWCO was first introduced in \cite{KinjoANDNishikawa2005}.
158: This quantity measures the extent to which a residue makes long-range contacts
159: in a native protein structure.
160: Using the same notation as contact numbers,
161: the RWCO of the $i$-th residue in a protein structure is defined by
162: \begin{equation}
163: o_i = \sum_{j:|i-j|>2}|i-j|C_{i,j}. \label{eq:defrwco}
164: \end{equation}
The feature variable $y_i$ for RWCO is defined as $y_i = o_i / L$ where
$L$ is the sequence length. As with CN, the normalization
factor $L$ was introduced to remove the size-dependence of the predicted
RWCOs (the RWCO averaged over a protein chain is roughly proportional to the
chain length).
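
The RWCO definition differs from the contact number only in the weight $|i-j|$ given to each contact. A minimal Python sketch, under the same assumptions as for the contact map (hypothetical function names, coordinates in \AA{}), is:

```python
import math


def rwco(coords, d=12.0, w=3.0):
    """o_i = sum over j with |i - j| > 2 of |i - j| * C_ij (sigmoid contact map)."""
    L = len(coords)
    orders = [0.0] * L
    for i in range(L):
        for j in range(L):
            sep = abs(i - j)
            if sep > 2:
                z = w * (math.dist(coords[i], coords[j]) - d)
                c = 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))
                orders[i] += sep * c  # distant sequence partners weigh more
    return orders


def rwco_features(orders):
    """y_i = o_i / L removes the dependence on chain length."""
    L = len(orders)
    return [o / L for o in orders]


# Toy chain: five points on a line, 3.8 angstroms apart (not a real structure).
toy = [(3.8 * i, 0.0, 0.0) for i in range(5)]
o = rwco(toy)
print([round(v, 3) for v in o])
print([round(v, 3) for v in rwco_features(o)])
```

A contact between residues far apart along the sequence therefore raises $o_i$ much more than a contact at separation 3, which is what makes RWCO a measure of long-range contacts.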
170:
171: \subsection*{Critical random networks}
Here we briefly describe the critical random network (CRN) method introduced
in \cite{KinjoANDNishikawa2005c}, to which the reader is referred for details.
174: Unlike most conventional methods for 1D structure prediction [except for
175: some including the bidirectional recurrent neural networks \cite{BaldiETAL1999,PollastriANDMcLysaght2005,ChenANDChaudhari2006}], the CRN method
176: takes the whole amino acid sequence into account. In the CRN method,
177: an $N$-dimensional state vector $\mathbf{x}_i$ is assigned to the $i$-th
178: residue of the target sequence (we use $N = 5000$ throughout this paper).
Neighboring state vectors along the sequence
are connected via a random $N\times N$ orthogonal matrix $W$. This matrix is
also block-diagonal, with block sizes drawn uniformly at random
between 2 and 50. The input to the CRN is the position-specific scoring matrix
183: (PSSM), $U = (\mathbf{u}_1, \cdots, \mathbf{u}_L)$
184: of the target sequence obtained by PSI-BLAST~\cite{AltschulETAL1997} ($L$ is the sequence length of the target protein).
We require the state vectors to satisfy the following equation of state:
186: \begin{equation}
187: \label{eq:eos}
188: \mathbf{x}_i = \tanh[\beta W (\mathbf{x}_{i-1} + \mathbf{x}_{i+1}) + \alpha V\mathbf{u}_i]
189: \end{equation}
190: for $i = 1, \cdots , L$ where $V$ is an $N\times 21$ random matrix
191: (the 21st component of $\mathbf{u}_i$ is always set to unity), and $\beta$ and $\alpha$ are scalar parameters. The fixed boundary condition is imposed ($\mathbf{x}_0 = \mathbf{x}_{L+1} = \mathbf{0}$). By setting $\beta = 0.5$,
192: the system of state vectors is made to be near a critical point in a certain
193: sense, and thus the range of site-site correlation is expected to be long
194: when $\alpha$ is sufficiently small but finite~\cite{KinjoANDNishikawa2005c}.
195: In this way, each state vector implicitly incorporates long-range correlations.
196: The 1D structure of the $i$-th residue is predicted as
197: a linear projection of a local window of the PSSM and the state vector obtained by solving Eq. \ref{eq:eos}:
198: \begin{equation}
199: \label{eq:pred}
200: y_i = \sum_{m=-M}^{M}\sum_{a=1}^{21}D_{m,a}u_{a,i+m} + \sum_{k=1}^{N}E_{k}x_{k,i}
201: \end{equation}
202: where $y_i$ is the predicted quantity, and $D_{m,a}$ and $E_k$ are the
203: regression parameters. In the first summation, each PSSM column is extended to
204: include the ``terminal'' residue.
205: Since Eq. \ref{eq:pred} is a simple linear equation once the equation of
206: state (Eq. \ref{eq:eos}) has been solved, learning the parameters $D_{m,a}$ and
207: $E_{k}$ reduces to an ordinary linear regression problem.
208: For SS prediction, the triple $(y^{H}_i, y^{E}_i, y^{C}_i)$ is
209: calculated simultaneously, and the SS class is predicted as
210: $\mathrm{arg}\max_{s\in \{H, E, C\}}y^{s}_i$. For the CN and RWCO prediction,
211: real values are predicted (2-state prediction is also made for CN using
212: the average CN for each residue type as the threshold for ``exposed''
213: or ``buried'' as in \cite{PollastriETAL2002}).
214: The half window size $M$ is set to 9 for SS and CN predictions, and to 26 for
215: RWCO.
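
Once the state vectors are known, Eq. \ref{eq:pred} is a plain linear projection. The sketch below evaluates it in Python for given parameters. It rests on one assumption that is not spelled out in the text: the 21st (``terminal'') component of a window column is taken to be 1 for positions beyond either end of the chain and 0 inside it. The function names are hypothetical.

```python
def window_column(pssm, pos):
    """20 PSSM scores plus a 21st 'terminal' flag (assumed 1 outside the chain)."""
    if 0 <= pos < len(pssm):
        return list(pssm[pos]) + [0.0]
    return [0.0] * 20 + [1.0]


def predict(pssm, states, D, E, M):
    """y_i = sum_{m,a} D[m][a] u[a][i+m] + sum_k E[k] x[k][i] for every residue i.

    pssm   : L rows of 20 scores
    states : L state vectors of length N (solutions of the equation of state)
    D      : (2M + 1) x 21 regression weights for the local window
    E      : N regression weights for the state vector
    """
    y = []
    for i in range(len(pssm)):
        local = 0.0
        for m in range(-M, M + 1):
            column = window_column(pssm, i + m)
            local += sum(D[m + M][a] * column[a] for a in range(21))
        global_part = sum(e * x for e, x in zip(E, states[i]))
        y.append(local + global_part)
    return y


# Toy input with made-up numbers: 3 residues, window half-size 1, N = 1.
pssm = [[1.0] * 20 for _ in range(3)]
states = [[0.5], [0.5], [0.5]]
D = [[1.0] * 21 for _ in range(3)]
E = [2.0]
print(predict(pssm, states, D, E, M=1))
```

Because the prediction is linear in $D_{m,a}$ and $E_k$, fitting these parameters to the training targets is an ordinary least-squares problem, as stated above.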
216:
217: \subsection*{Ensemble prediction}
218: Since the CRN-based prediction is parametrized by the random matrices $W$
219: and $V$,
220: slightly different predictions are obtained for different pairs of $W$ and $V$.
We can improve the prediction by taking the average over an ensemble of
such different predictions. Twenty CRN-based predictors were constructed using
20 different sets of random matrices $W$ and $V$. CN and RWCO are predicted
as uniform averages of these 20 predictions.
225:
For SS prediction, we employ further training. Let $s_{i}^{t,n}$ be the
prediction of the $n$-th predictor for 1D structure $t$
($H$, $E$, $C$, CN, or RWCO) at the $i$-th residue.
229: The second stage SS prediction is made by the following linear scheme:
230: \begin{equation}
231: \label{eq:ss2}
232: y_{i}^{ss} = \sum_{n=1}^{20}\sum_{t}\sum_{m=-3}^{3}w_{n,t,m}s_{i+m}^{t,n}
233: \end{equation}
234: where $ss = H, E, C$, and $w_{n,t,m}$ is the weight obtained from a training
235: set. Finally, the feature variable for each SS class of the
236: $i$-th residue is obtained by $(y_{i-1}^{ss} + 2y_{i}^{ss} + y_{i+1}^{ss})/4$.
This last procedure was found to be particularly effective for improving the
segment overlap (SOV) measure~\cite{SOV99}.
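
The two simplest steps of the ensemble scheme, the uniform average used for CN and RWCO and the final $(1,2,1)/4$ smoothing of the SS feature variables, can be sketched as follows. The treatment of the chain termini (the end value is repeated) is an assumption, since it is not specified above, and the function names are hypothetical.

```python
def ensemble_mean(predictions):
    """Uniform average over predictors; predictions[n][i] is predictor n at residue i."""
    count = len(predictions)
    return [sum(column) / count for column in zip(*predictions)]


def smooth_121(y):
    """(y[i-1] + 2 y[i] + y[i+1]) / 4, repeating the end values at the termini."""
    L = len(y)
    return [
        (y[max(i - 1, 0)] + 2.0 * y[i] + y[min(i + 1, L - 1)]) / 4.0
        for i in range(L)
    ]


# Made-up numbers: two predictors, then an isolated spike being smoothed.
print(ensemble_mean([[1.0, 2.0], [3.0, 4.0]]))
print(smooth_121([0.0, 4.0, 0.0]))
```

The smoothing damps isolated single-residue peaks, which plausibly reduces spurious one-residue segments and is consistent with the observed gain in SOV.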
239:
240: \subsection*{Additional input}
Another improvement is the addition of the amino acid composition of
the target sequence to the predictor~\cite{Yuan2005}.
The term $\sum_{a=1}^{20}F_af_a$ was added to Eq. \ref{eq:pred}, where $F_a$
is a regression parameter and $f_a$ is the fraction of amino acid
type $a$ in the target sequence.
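
A minimal sketch of the composition term, assuming the 20 standard amino acids in alphabetical one-letter order (the ordering and the function names are illustrative choices, not taken from CRNPRED):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def composition(sequence):
    """Fraction f_a of each standard amino acid; other symbols are ignored."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def composition_term(F, f):
    """The extra term sum_a F_a f_a; it is the same for every residue of a chain."""
    return sum(Fa * fa for Fa, fa in zip(F, f))


f = composition("AACD")
print(f[:3])
```

Since $f_a$ does not depend on the residue position, this term acts as a chain-specific offset added to every $y_i$.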
246:
247: \subsection*{Training and test data set}
We carried out a 15-fold cross-validation test using exactly the same
procedure and data set as in the previous
study~\cite{KinjoANDNishikawa2005c}. The data set contains 680 protein
domains, each of which represents a superfamily according to the SCOP
database (version 1.65)~\cite{SCOP}. This data set was randomly divided so
that 630 domains were used for training and the remaining 50 domains for
testing, and the random division was repeated 15 times.
No two of these domains belong to the same superfamily, and hence they are
not expected to be homologous. Thus, the present benchmark is a very
stringent one.
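
The repeated random division described above can be sketched as follows. The random number generator and the seed are arbitrary illustrative choices, so this does not reproduce the actual partitions used in the benchmark.

```python
import random


def random_splits(domains, n_test=50, n_repeats=15, seed=0):
    """Return n_repeats (training, test) pairs from independent random divisions."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    splits = []
    for _ in range(n_repeats):
        shuffled = list(domains)
        rng.shuffle(shuffled)
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits


# 680 placeholder identifiers standing in for the SCOP domains.
splits = random_splits(range(680))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Because each division is drawn independently, a domain may appear in the test set of more than one repetition; within a single repetition the training and test sets never overlap.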
258:
To obtain PSSMs with PSI-BLAST, we used the UniRef100
(version 6.8) amino acid sequence database~\cite{UniProt}, which contains some
3 million entries.
In addition, the number of iterations in the PSI-BLAST homology searches was
reduced from the 10 used in the previous study to 3, which increased the
accuracy of SS predictions in particular.
These results are consistent with the study of Przybylski and Rost~\cite{PrzybylskiANDRost2002}.
266:
267: \subsection*{Numerics}
268: One drawback of the CRN method is the computational time required for
269: numerically solving the equation of state (Eq. \ref{eq:eos}).
270: For that purpose, instead of the Gauss-Seidel-like
271: method previously used, we implemented a successive over-relaxation
272: method which was found to be much more efficient.
273:
274: Let $\nu$ denote the stage of iteration.
275: We set the initial value of the state vectors (with $\nu = 0$) as
276: \begin{equation}
277: \mathbf{x}_{i}^{(0)} = \tanh [\alpha V \mathbf{u}_{i}].\label{eq:init_eos}
278: \end{equation}
279: Then, for $i = 1, \cdots , L$ (in increasing order of $i$), we update
280: the state vectors by
\begin{eqnarray}
\mathbf{x}_{i}^{(2\nu+1)} \gets & \mathbf{x}_{i}^{(2\nu)} + \omega
\{\tanh [\beta W(\mathbf{x}_{i-1}^{(2\nu+1)}\nonumber\\
& +\mathbf{x}_{i+1}^{(2\nu)})
+ \alpha V \mathbf{u}_{i}] - \mathbf{x}_{i}^{(2\nu)}\}.
\label{eq:feos}
\end{eqnarray}
Next, we update them in the reverse order. That is, for $i = L, \cdots , 1$
(in decreasing order of $i$),
\begin{eqnarray}
\mathbf{x}_{i}^{(2\nu+2)} \gets & \mathbf{x}_{i}^{(2\nu+1)} + \omega
\{\tanh [\beta W(\mathbf{x}_{i-1}^{(2\nu+1)} \nonumber\\
& + \mathbf{x}_{i+1}^{(2\nu+2)})
+\alpha V \mathbf{u}_{i}] - \mathbf{x}_{i}^{(2\nu+1)}\}.
\label{eq:beos}
\end{eqnarray}
We then set $\nu \gets \nu + 1$, and iterate Eqs. (\ref{eq:feos}) and (\ref{eq:beos}) until $\{\mathbf{x}_{i}\}$ converges. An acceleration parameter of $\omega = 1.4$ was found to be effective.
298: The convergence criterion is
299: \begin{equation}
300: \sqrt{\sum_{i=1}^{L}||\mathbf{x}_{i}^{(2\nu+2)}-\mathbf{x}_{i}^{(2\nu+1)}||_{\mathbf{R}^{N}}^{2}/{NL}}<10^{-3}
301: \end{equation}
302: where $||\cdot||_{\mathbf{R}^{N}}$ denotes the Euclidean norm.
This criterion is much less stringent than that of the previous study
($10^{-7}$), but this does not affect the prediction accuracy significantly.
Convergence is typically achieved within 10 to 12 iterations for one protein.
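
The alternating forward and backward sweeps can be illustrated with a small pure-Python sketch. It is not the CRNPRED implementation: the dimensions are tiny ($N = 6$, $L = 8$), $\alpha = 0.01$ is an arbitrary value, $V$ and the PSSM are random numbers, and $W$ is built from $2\times 2$ reflection blocks, a special case of a block-diagonal orthogonal matrix chosen here only to keep the toy problem symmetric. The factor $\beta$ is included inside the $\tanh$ so that the fixed point satisfies Eq. \ref{eq:eos}. All function names are hypothetical.

```python
import math
import random


def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def reflection_blocks(N, rng):
    """Block-diagonal orthogonal matrix made of random 2x2 reflections (toy choice)."""
    W = [[0.0] * N for _ in range(N)]
    for k in range(0, N, 2):
        t = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(t), math.sin(t)
        W[k][k], W[k][k + 1], W[k + 1][k], W[k + 1][k + 1] = c, s, s, -c
    return W


def solve_state(U, W, V, alpha, beta=0.5, omega=1.4, tol=1e-3, max_sweeps=200):
    """Iterate forward/backward relaxation sweeps; returns (x, sweeps, converged)."""
    L, N = len(U), len(W)
    drive = [[alpha * s for s in matvec(V, u)] for u in U]   # alpha V u_i
    x = [[math.tanh(s) for s in d] for d in drive]           # initial state vectors
    zero = [0.0] * N                                          # x_0 = x_{L+1} = 0

    def relax(i):
        """Over-relaxed update of x_i; returns the squared change."""
        left = x[i - 1] if i > 0 else zero
        right = x[i + 1] if i < L - 1 else zero
        z = matvec(W, [a + b for a, b in zip(left, right)])
        new = [xi + omega * (math.tanh(beta * zk + dk) - xi)
               for xi, zk, dk in zip(x[i], z, drive[i])]
        change = sum((a - b) ** 2 for a, b in zip(new, x[i]))
        x[i] = new
        return change

    for sweep in range(1, max_sweeps + 1):
        for i in range(L):                                    # forward sweep
            relax(i)
        squared = sum(relax(i) for i in reversed(range(L)))   # backward sweep
        if math.sqrt(squared / (N * L)) < tol:                # RMS change criterion
            return x, sweep, True
    return x, max_sweeps, False


rng = random.Random(1)
N, L = 6, 8
W = reflection_blocks(N, rng)
V = [[rng.gauss(0.0, 1.0) for _ in range(21)] for _ in range(N)]
# Random stand-in for a PSSM; the 21st component is fixed to unity.
U = [[float(rng.randint(-3, 3)) for _ in range(20)] + [1.0] for _ in range(L)]
x, sweeps, converged = solve_state(U, W, V, alpha=0.01)
print(sweeps, converged)
```

The convergence test follows the criterion above: the root-mean-square change of the state vectors during the backward sweep is compared with $10^{-3}$.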
306:
307:
308: \section*{Results and Discussion}
There are two main ingredients for the improved one-dimensional protein
structure prediction in the present study. The first is the use of large-scale,
5000-dimensional critical random networks and 20 ensemble predictors.
The second is the use of a large sequence database (UniRef100) for PSI-BLAST
searches.
As demonstrated in Table~1, the CRN method achieves remarkably
accurate predictions.
316: In comparison with the previous study \cite{KinjoANDNishikawa2005c} based on
317: 2000-dimensional CRNs (10 ensemble predictors),
318: the $Q_3$ and $SOV$ measures in SS predictions improved from 77.8\% and 77.3\%
319: to 80.5\% and 80.0\%, respectively. Similarly, the average correlation
320: coefficient improved from 0.726 to 0.746 for CN predictions,
and from 0.601 to 0.613 for RWCO predictions. The 2-state predictions for
CN yield, on average, $Q_2$ = 76.8\% per chain and 76.7\% per residue, and
a Matthews' correlation coefficient of 0.533.
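
For reference, the accuracy measures quoted above can be computed as in the following sketch, which uses the standard textbook definitions (percentage of correctly assigned residues for $Q_3$ and $Q_2$, Pearson's correlation coefficient for $Cor$, and Matthews' correlation coefficient for the 2-state case). It is assumed, not confirmed here, that these coincide with the definitions used in the benchmark; the labels \texttt{B} and \texttt{E} for buried and exposed are illustrative.

```python
import math


def q_accuracy(predicted, observed):
    """Percentage of residues assigned to the correct class (Q3 or Q2)."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def pearson(x, y):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matthews(predicted, observed, positive="B"):
    """Matthews' correlation coefficient for a 2-state prediction."""
    tp = sum(p == positive and o == positive for p, o in zip(predicted, observed))
    tn = sum(p != positive and o != positive for p, o in zip(predicted, observed))
    fp = sum(p == positive and o != positive for p, o in zip(predicted, observed))
    fn = sum(p != positive and o == positive for p, o in zip(predicted, observed))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up toy inputs, not benchmark data.
print(q_accuracy("HHEC", "HHCC"))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(matthews("BBEE", "BEBE"))
```

Per-chain averages are obtained by evaluating a measure for each chain and averaging the results, whereas per-residue values pool all residues before evaluation.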
324:
325: A closer examination of the SS prediction results (Table 2)
326: reveals the drastic improvement of $\beta$ strand prediction from $Q_E$
327: = 61.9\% to 69.3\% (per residue). Although the values of $Q_C$ and $Q_E^{pre}$
328: are slightly lower than in the previous study by 0.6--1.0\%, the accuracies of
329: other classes have improved by 2.5--4\%.
330:
331: CRNPRED compares favorably with other secondary structure prediction methods.
332: The widely used PSIPRED program \cite{Jones1999,PSIPRED} which is based on conventional
333: feed-forward neural networks achieves $Q_3$ of 78\%.
334: A more recently developed method, Porter, \cite{PollastriANDMcLysaght2005}
335: which is based on bidirectional recurrent neural networks achieves $Q_3$ of
336: 79\%. An even more intricate method based on bidirectional segmented-memory
337: recurrent neural networks \cite{ChenANDChaudhari2006} shows an accuracy
338: of $Q_3$ = 73\% (this rather low accuracy may be attributed to the small size
of the training set used). However, it should be noted that these studies are
based on different data sets for both training and testing, as well as on
different definitions of
secondary structural categories. Therefore, these comparisons may not be
very informative, but only give a rough estimate of relative performance.
344:
Regarding contact number prediction, CRNPRED, achieving $Cor$ = 0.75,
is the most accurate method available today. The simple linear method~\cite{KinjoETAL2005}, which uses multiple
sequence alignments derived from the HSSP database~\cite{HSSP}, showed a
correlation coefficient of 0.63. A more advanced method based on support vector machines (local window-based) achieves a correlation of 0.68 per chain~\cite{Yuan2005}.
349:
350: It is known that the number of homologs found by the PSI-BLAST searches
351: significantly affects the prediction accuracies \cite{PrzybylskiANDRost2002}.
352: We have examined this effect by plotting the accuracy measures for a
353: given minimum number of homologs found by PSI-BLAST (Fig. 1).
354: For example, we see in Fig. 1 that, for those proteins with
355: more than 100 homologs, the average $Q_3$ for SS predictions is 82.2\%.
The effect of the number of homologs depends significantly on the type of
1D structure. For SS prediction, $Q_3$ steadily increases as the number of
homologs increases up to 100, but it stays in the range between 82.0\% and
82.4\% until the minimum number of homologs reaches around 400, and then it
starts to decrease. For CN prediction, $Cor$ also increases steadily but more
slowly, and it does not degrade when the minimum number of homologs reaches 500.
This tendency implies that CN is more conserved than SS during protein
evolution, which is consistent with previous observations~\cite{KinjoANDNishikawa2004,BastollaETAL2005}. In contrast, RWCO exhibits a peculiar behavior.
The value of $Cor$ reaches its peak at a minimum number of homologs of 80,
beyond which the value rapidly decreases. This indicates that RWCO is not
evolutionarily well conserved. It was observed that the accuracies of SS and
CN predictions consistently increased when the dimension of CRNs was increased
from 2000 to 5000, but such was not the case for RWCO (data not shown).
RWCO seems to be so delicate a quantity that it is very difficult to extract
relevant information from the amino acid sequence.
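
The analysis behind Fig. 1 amounts to averaging an accuracy measure over those chains for which PSI-BLAST found at least a given number of homologs. A minimal sketch follows; the function name is hypothetical, the records are made-up numbers rather than benchmark data, and the use of ``at least'' for the threshold is an assumption.

```python
def mean_accuracy_by_min_homologs(records, thresholds):
    """records: (n_homologs, accuracy) per chain.

    Returns {threshold: mean accuracy of chains with n_homologs >= threshold},
    or None for a threshold that no chain reaches.
    """
    result = {}
    for t in thresholds:
        selected = [acc for n, acc in records if n >= t]
        result[t] = sum(selected) / len(selected) if selected else None
    return result


# Made-up records for illustration only.
records = [(10, 70.0), (150, 80.0), (600, 84.0)]
curve = mean_accuracy_by_min_homologs(records, [0, 100, 500, 1000])
print(curve)
```

Note that the number of chains contributing to each point shrinks as the threshold grows, so the averages at high thresholds rest on fewer proteins.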
371:
Finally, we comment on the practical applicability of predicted 1D
structures. We do not believe, at present, that the construction of
a 3D structure purely from the predicted 1D structures is practical,
if possible at all, because of the limited accuracy of the RWCO prediction.
However, SS and CN predictions are very accurate for many proteins,
so that they may already serve as valuable restraints for 3D structure
predictions. Also, SS and CN predictions may be applied to domain
identification, which is often necessary for experimental determination of
protein structures. CRNPRED has proved useful for such a purpose~\cite{MinezakiETAL2006}.
Although of limited accuracy, predicted RWCOs still exhibit significant
correlations with the correct values. Since RWCOs reflect the extent to which
a residue is involved in long-range contacts, predicted RWCOs may be
useful for enumerating potentially structurally important residues.
385:
386: An interesting alternative application of the CRN framework is to regard the
387: solution of the equation of state (Eq. \ref{eq:eos}) as an extended sequence
388: profile. By so doing, it is straightforward to apply the solution to the
389: profile-profile comparison for fold recognition \cite{TomiiANDAkiyama2004}.
Such an application may also be pursued in the future.
391:
392: \section*{Availability and Requirements}
393:
394: \begin{description}
395: \item[Project name:] CRNPRED
396: \item[Project home page:] ~\\http://bioinformatics.org/crnpred/
397: \item[Operating system:] UNIX-like OS (including Linux and Mac OS X).
398: \item[Programming language:] C.
399: \item[Other requirements:] zsh, PSI-BLAST (blastpgp), The UniRef100 amino acid sequence database.
400: \item[License:] Public domain.
401: \item[Any restrictions to use by non-academics:] None.
402: \end{description}
403:
404: \section*{List of Abbreviations Used}
405: CRN, critical random network; SS, secondary structure; CN, contact number;
406: RWCO, residue-wise contact order; 1D, one-dimensional; 3D, three-dimensional.
407:
408: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Authors' contributions}
A. R. K. designed and implemented the method, carried out benchmarks, and wrote
the first draft of the manuscript. A. R. K. and K. N. analyzed the results and
improved the manuscript.
413:
414:
415: %%%%%%%%%%%%%%%%%%%%%%%%%%%
416: \section*{Acknowledgements}
417: We thank Yasumasa Shigemoto for helping construct the CRNPRED web interface.
418: This work was supported in part by the MEXT, Japan.
419:
420:
421:
422: %\bibliographystyle{bmc_article} % Style BST file
423: % \bibliography{refs,mypaper}
424: %% BioMed_Central_Bib_Style_v1.01
425:
426: \begin{thebibliography}{10}
427: \providecommand{\url}[1]{[#1]}
428: \providecommand{\urlprefix}{}
429:
430: \bibitem{PortoETAL2004}
431: Porto M, Bastolla U, Roman HE, Vendruscolo M: \textbf{Reconstruction of protein
432: structures from a vectorial representation}. \emph{Phys. Rev. Lett.} 2004,
433: \textbf{92}:218101.
434:
435: \bibitem{KinjoANDNishikawa2005}
436: Kinjo AR, Nishikawa K: \textbf{Recoverable one-dimensional encoding of protein
437: three-dimensional structures}. \emph{Bioinformatics} 2005,
438: \textbf{21}:2167--2170. [Doi:10.1093/bioinformatics/bti330].
439:
440: \bibitem{Rost2003}
441: Rost B: \textbf{Prediction in {1D}: secondary structure, membrane helices, and
442: accessibility}. In \emph{Structural Bioinformatics}. Edited by Bourne PE,
443: Weissig H, Hoboken, U.S.A.: Wiley-Liss, Inc. 2003:559--587.
444:
445: \bibitem{Jones1999}
446: Jones DT: \textbf{Protein secondary structure prediction based on
447: position-specific scoring matrices}. \emph{J. Mol. Biol.} 1999,
448: \textbf{292}:195--202.
449:
450: \bibitem{PollastriANDMcLysaght2005}
451: Pollastri G, {McLysaght} A: \textbf{Porter: a new, accurate server for protein
452: secondary structure prediction}. \emph{Bioinformatics} 2005,
453: \textbf{21}:1719--1720.
454:
455: \bibitem{NishikawaANDOoi1980}
456: Nishikawa K, Ooi T: \textbf{Prediction of the surface-interior diagram of
457: globular proteins by an empirical method}. \emph{Int. J. Peptide Protein
458: Res.} 1980, \textbf{16}:19--32.
459:
460: \bibitem{NishikawaANDOoi1986}
461: Nishikawa K, Ooi T: \textbf{Radial locations of amino acid residues in a
462: globular protein: Correlation with the sequence}. \emph{J. Biochem.} 1986,
463: \textbf{100}:1043--1047.
464:
465: \bibitem{KinjoETAL2005}
466: Kinjo AR, Horimoto K, Nishikawa K: \textbf{Predicting absolute contact numbers
467: of native protein structure from amino acid sequence}. \emph{Proteins} 2005,
468: \textbf{58}:158--165. [Doi:10.1002/prot.20300].
469:
470: \bibitem{Yuan2005}
471: Yuan Z: \textbf{Better prediction of protein contact number using a support
472: vector regression analysis of amino acid sequence}. \emph{BMC Bioinformatics}
473: 2005, \textbf{6}:248.
474:
475: \bibitem{KinjoANDNishikawa2005c}
476: Kinjo AR, Nishikawa K: \textbf{Predicting secondary structures, contact
477: numbers, and residue-wise contact orders of native protein structure from
478: amino acid sequence using critical random networks}. \emph{BIOPHYSICS} 2005,
479: \textbf{1}:67--74. [Doi:10.2142/biophysics.1.67].
480:
481: \bibitem{DSSP}
482: Kabsch W, Sander C: \textbf{Dictionary of Protein Secondary Structure: Pattern
483: recognition of hydrogen bonded and geometrical features}. \emph{Biopolymers}
484: 1983, \textbf{22}:2577--2637.
485:
486: \bibitem{CrooksANDBrenner2004}
487: Crooks GE, Brenner SE: \textbf{Protein secondary structure: entropy,
488: correlations and prediction}. \emph{Bioinformatics} 2004,
489: \textbf{20}:1603--1611.
490:
491: \bibitem{BaldiETAL1999}
492: Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G: \textbf{Exploiting the past
493: and the future in protein secondary structure prediction}.
494: \emph{Bioinformatics} 1999, \textbf{15}:937--946.
495:
496: \bibitem{ChenANDChaudhari2006}
497: Chen J, Chaudhari NS: \textbf{Bidirectional segmented-memory recurrent neural
498: network for protein secondary structure prediction}. \emph{Soft Computing}
499: 2006, \textbf{10}:315--324.
500:
501: \bibitem{AltschulETAL1997}
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ:
503: \textbf{Gapped Blast and {PSI}-Blast: A new generation of protein database
504: search programs}. \emph{Nucleic Acids Res.} 1997, \textbf{25}:3389--3402.
505:
506: \bibitem{PollastriETAL2002}
507: Pollastri G, Baldi P, Fariselli P, Casadio R: \textbf{Prediction of
508: coordination number and relative solvent accessibility in proteins}.
509: \emph{Proteins} 2002, \textbf{47}:142--153.
510:
511: \bibitem{SCOP}
512: Murzin AG, Brenner SE, Hubbard T, Chothia C: \textbf{{SCOP}: A structural
513: classification of proteins database for the investigation of sequences and
514: structures}. \emph{J. Mol. Biol.} 1995, \textbf{247}:536--540.
515:
516: \bibitem{UniProt}
517: Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E,
518: Huang H, Lopez R, Magrane M, Martin MJ, Natale D, {O'Donovan} C, Redaschi N,
519: Yeh LS: \textbf{The universal protein resource ({UniProt})}. \emph{Nucleic
520: Acids Res.} 2005, \textbf{33}:D154--D159.
521:
522: \bibitem{PrzybylskiANDRost2002}
523: Przybylski D, Rost B: \textbf{Alignments grow, secondary structure prediction
524: improves}. \emph{Proteins} 2002, \textbf{46}:197--205.
525:
526: \bibitem{PSIPRED}
527: McGuffin LJ, Bryson K, Jones DT: \textbf{The PSIPRED protein structure
528: prediction server}. \emph{Bioinformatics} 2000, \textbf{16}:404--405.
529:
530: \bibitem{HSSP}
531: Sander C, Schneider R: \textbf{Database of homology-derived protein
532: structures}. \emph{Proteins} 1991, \textbf{9}:56--68.
533:
534: \bibitem{KinjoANDNishikawa2004}
535: Kinjo AR, Nishikawa K: \textbf{Eigenvalue analysis of amino acid substitution
536: matrices reveals a sharp transition of the mode of sequence conservation in
537: proteins}. \emph{Bioinformatics} 2004, \textbf{20}:2504--2508.
538:
539: \bibitem{BastollaETAL2005}
540: Bastolla U, Porto M, Roman HE, Vendruscolo M: \textbf{Principal eigenvector of
541: contact matrices and hydrophobicity profiles in proteins}. \emph{Proteins}
542: 2005, \textbf{58}:22--30.
543:
544: \bibitem{MinezakiETAL2006}
545: Minezaki Y, Homma K, Kinjo AR, Nishikawa K: \textbf{Human transcription factors
546: contain a high fraction of intrinsically disordered regions essential for
547: transcriptional regulation}. \emph{J. Mol. Biol.} 2006. in press.
548:
549: \bibitem{TomiiANDAkiyama2004}
550: Tomii K, Akiyama Y: \textbf{{FORTE}: a profile-profile comparison tool for
551: protein fold recognition}. \emph{Bioinformatics} 2004, \textbf{20}:594--595.
552:
553: \bibitem{SOV99}
554: Zemla A, Venclovas C, Fidelis K, Rost B: \textbf{A modified definition of Sov,
555: a segment-based measure for protein secondary structure prediction
556: assessment}. \emph{Proteins} 1999, \textbf{34}:220--223.
557:
558: \end{thebibliography}
559:
560: \newcommand{\BMCxmlcomment}[1]{}
561:
562: \BMCxmlcomment{
563:
564: <refgrp>
565:
566: <bibl id="B1">
567: <title><p>Reconstruction of protein structures from a vectorial
568: representation</p></title>
569: <aug>
570: <au><snm>Porto</snm><fnm>M.</fnm></au>
571: <au><snm>Bastolla</snm><fnm>U.</fnm></au>
572: <au><snm>Roman</snm><fnm>H. E.</fnm></au>
573: <au><snm>Vendruscolo</snm><fnm>M.</fnm></au>
574: </aug>
575: <source>Phys. Rev. Lett.</source>
576: <pubdate>2004</pubdate>
577: <volume>92</volume>
578: <fpage>218101</fpage>
579: </bibl>
580:
581: <bibl id="B2">
582: <title><p>Recoverable one-dimensional encoding of protein three-dimensional
583: structures</p></title>
584: <aug>
585: <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
586: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
587: </aug>
588: <source>Bioinformatics</source>
589: <pubdate>2005</pubdate>
590: <volume>21</volume>
591: <fpage>2167</fpage>
592: <lpage>2170</lpage>
593: <note>doi:10.1093/bioinformatics/bti330</note>
594: </bibl>
595:
596: <bibl id="B3">
597: <title><p>Prediction in {1D}: secondary structure, membrane helices, and
598: accessibility</p></title>
599: <aug>
600: <au><snm>Rost</snm><fnm>B.</fnm></au>
601: </aug>
602: <source>Structural Bioinformatics</source>
603: <publisher>Hoboken, U.S.A.: Wiley-Liss, Inc.</publisher>
604: <editor>Bourne, P. E. and Weissig, H.</editor>
605: <section><title><p>28</p></title></section>
606: <pubdate>2003</pubdate>
607: <fpage>559</fpage>
608: <lpage>587</lpage>
609: </bibl>
610:
611: <bibl id="B4">
612: <title><p>Protein secondary structure prediction based on position-specific
613: scoring matrices</p></title>
614: <aug>
615: <au><snm>Jones</snm><fnm>D. T.</fnm></au>
616: </aug>
617: <source>J. Mol. Biol.</source>
618: <pubdate>1999</pubdate>
619: <volume>292</volume>
620: <fpage>195</fpage>
621: <lpage>202</lpage>
622: </bibl>
623:
624: <bibl id="B5">
625: <title><p>Porter: a new, accurate server for protein secondary structure
626: prediction</p></title>
627: <aug>
628: <au><snm>Pollastri</snm><fnm>G.</fnm></au>
629: <au><snm>{McLysaght}</snm><fnm>A.</fnm></au>
630: </aug>
631: <source>Bioinformatics</source>
632: <pubdate>2005</pubdate>
633: <volume>21</volume>
634: <fpage>1719</fpage>
635: <lpage>1720</lpage>
636: </bibl>
637:
638: <bibl id="B6">
639: <title><p>Prediction of the surface-interior diagram of globular proteins by
640: an empirical method</p></title>
641: <aug>
642: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
643: <au><snm>Ooi</snm><fnm>T.</fnm></au>
644: </aug>
645: <source>Int. J. Peptide Protein Res.</source>
646: <pubdate>1980</pubdate>
647: <volume>16</volume>
648: <fpage>19</fpage>
649: <lpage>32</lpage>
650: </bibl>
651:
652: <bibl id="B7">
653: <title><p>Radial locations of amino acid residues in a globular protein:
654: Correlation with the sequence</p></title>
655: <aug>
656: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
657: <au><snm>Ooi</snm><fnm>T.</fnm></au>
658: </aug>
659: <source>J. Biochem.</source>
660: <pubdate>1986</pubdate>
661: <volume>100</volume>
662: <fpage>1043</fpage>
663: <lpage>1047</lpage>
664: </bibl>
665:
666: <bibl id="B8">
667: <title><p>Predicting absolute contact numbers of native protein structure
668: from amino acid sequence</p></title>
669: <aug>
670: <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
671: <au><snm>Horimoto</snm><fnm>K.</fnm></au>
672: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
673: </aug>
674: <source>Proteins</source>
675: <pubdate>2005</pubdate>
676: <volume>58</volume>
677: <fpage>158</fpage>
678: <lpage>165</lpage>
679: <note>doi:10.1002/prot.20300</note>
680: </bibl>
681:
682: <bibl id="B9">
683: <title><p>Better prediction of protein contact number using a support vector
684: regression analysis of amino acid sequence</p></title>
685: <aug>
686: <au><snm>Yuan</snm><fnm>Z.</fnm></au>
687: </aug>
688: <source>BMC Bioinformatics</source>
689: <pubdate>2005</pubdate>
690: <volume>6</volume>
691: <fpage>248</fpage>
692: </bibl>
693:
694: <bibl id="B10">
695: <title><p>Predicting secondary structures, contact numbers, and residue-wise
696: contact orders of native protein structure from amino acid sequence using
697: critical random networks</p></title>
698: <aug>
699: <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
700: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
701: </aug>
702: <source>BIOPHYSICS</source>
703: <pubdate>2005</pubdate>
704: <volume>1</volume>
705: <fpage>67</fpage>
706: <lpage>74</lpage>
707: <note>doi:10.2142/biophysics.1.67</note>
708: </bibl>
709:
710: <bibl id="B11">
711: <title><p>Dictionary of Protein Secondary Structure: Pattern recognition of
712: hydrogen bonded and geometrical features</p></title>
713: <aug>
714: <au><snm>Kabsch</snm><fnm>W.</fnm></au>
715: <au><snm>Sander</snm><fnm>C.</fnm></au>
716: </aug>
717: <source>Biopolymers</source>
718: <pubdate>1983</pubdate>
719: <volume>22</volume>
720: <fpage>2577</fpage>
721: <lpage>2637</lpage>
722: </bibl>
723:
724: <bibl id="B12">
725: <title><p>Protein secondary structure: entropy, correlations and
726: prediction</p></title>
727: <aug>
728: <au><snm>Crooks</snm><fnm>G. E.</fnm></au>
729: <au><snm>Brenner</snm><fnm>S. E.</fnm></au>
730: </aug>
731: <source>Bioinformatics</source>
732: <pubdate>2004</pubdate>
733: <volume>20</volume>
734: <fpage>1603</fpage>
735: <lpage>1611</lpage>
736: </bibl>
737:
738: <bibl id="B13">
739: <title><p>Exploiting the past and the future in protein secondary structure
740: prediction</p></title>
741: <aug>
742: <au><snm>Baldi</snm><fnm>P.</fnm></au>
743: <au><snm>Brunak</snm><fnm>S.</fnm></au>
744: <au><snm>Frasconi</snm><fnm>P.</fnm></au>
745: <au><snm>Soda</snm><fnm>G.</fnm></au>
746: <au><snm>Pollastri</snm><fnm>G.</fnm></au>
747: </aug>
748: <source>Bioinformatics</source>
749: <pubdate>1999</pubdate>
750: <volume>15</volume>
751: <fpage>937</fpage>
752: <lpage>946</lpage>
753: </bibl>
754:
755: <bibl id="B14">
756: <title><p>Bidirectional segmented-memory recurrent neural network for protein
757: secondary structure prediction</p></title>
758: <aug>
759: <au><snm>Chen</snm><fnm>J.</fnm></au>
760: <au><snm>Chaudhari</snm><fnm>N. S.</fnm></au>
761: </aug>
762: <source>Soft Computing</source>
763: <pubdate>2006</pubdate>
764: <volume>10</volume>
765: <fpage>315</fpage>
766: <lpage>324</lpage>
767: </bibl>
768:
769: <bibl id="B15">
770: <title><p>Gapped Blast and {PSI}-Blast: A new generation of protein database
771: search programs</p></title>
772: <aug>
773: <au><snm>Altschul</snm><fnm>S. F.</fnm></au>
774: <au><snm>Madden</snm><fnm>T. L.</fnm></au>
775: <au><snm>Schaffer</snm><fnm>A. A.</fnm></au>
776: <au><snm>Zhang</snm><fnm>J.</fnm></au>
777: <au><snm>Zhang</snm><fnm>Z.</fnm></au>
778: <au><snm>Miller</snm><fnm>W.</fnm></au>
779: <au><snm>Lipman</snm><fnm>D. J.</fnm></au>
780: </aug>
781: <source>Nucleic Acids Res.</source>
782: <pubdate>1997</pubdate>
783: <volume>25</volume>
784: <fpage>3389</fpage>
785: <lpage>3402</lpage>
786: </bibl>
787:
788: <bibl id="B16">
789: <title><p>Prediction of coordination number and relative solvent
790: accessibility in proteins</p></title>
791: <aug>
792: <au><snm>Pollastri</snm><fnm>G.</fnm></au>
793: <au><snm>Baldi</snm><fnm>P.</fnm></au>
794: <au><snm>Fariselli</snm><fnm>P.</fnm></au>
795: <au><snm>Casadio</snm><fnm>R.</fnm></au>
796: </aug>
797: <source>Proteins</source>
798: <pubdate>2002</pubdate>
799: <volume>47</volume>
800: <fpage>142</fpage>
801: <lpage>153</lpage>
802: </bibl>
803:
804: <bibl id="B17">
805: <title><p>{SCOP}: A structural classification of proteins database for the
806: investigation of sequences and structures</p></title>
807: <aug>
808: <au><snm>Murzin</snm><fnm>A. G.</fnm></au>
809: <au><snm>Brenner</snm><fnm>S. E.</fnm></au>
810: <au><snm>Hubbard</snm><fnm>T.</fnm></au>
811: <au><snm>Chothia</snm><fnm>C.</fnm></au>
812: </aug>
813: <source>J. Mol. Biol.</source>
814: <pubdate>1995</pubdate>
815: <volume>247</volume>
816: <fpage>536</fpage>
817: <lpage>540</lpage>
818: </bibl>
819:
820: <bibl id="B18">
821: <title><p>The universal protein resource ({UniProt})</p></title>
822: <aug>
823: <au><snm>Bairoch</snm><fnm>A.</fnm></au>
824: <au><snm>Apweiler</snm><fnm>R.</fnm></au>
825: <au><snm>Wu</snm><fnm>C. H.</fnm></au>
826: <au><snm>Barker</snm><fnm>W. C.</fnm></au>
827: <au><snm>Boeckmann</snm><fnm>B.</fnm></au>
828: <au><snm>Ferro</snm><fnm>S.</fnm></au>
829: <au><snm>Gasteiger</snm><fnm>E.</fnm></au>
830: <au><snm>Huang</snm><fnm>H.</fnm></au>
831: <au><snm>Lopez</snm><fnm>R.</fnm></au>
832: <au><snm>Magrane</snm><fnm>M.</fnm></au>
833: <au><snm>Martin</snm><fnm>M. J.</fnm></au>
834: <au><snm>Natale</snm><fnm>D. A.</fnm></au>
835: <au><snm>{O'Donovan}</snm><fnm>C.</fnm></au>
836: <au><snm>Redaschi</snm><fnm>N.</fnm></au>
837: <au><snm>Yeh</snm><fnm>L. S.</fnm></au>
838: </aug>
839: <source>Nucleic Acids Res.</source>
840: <pubdate>2005</pubdate>
841: <volume>33</volume>
842: <fpage>D154</fpage>
843: <lpage>D159</lpage>
844: </bibl>
845:
846: <bibl id="B19">
847: <title><p>Alignments grow, secondary structure prediction
848: improves</p></title>
849: <aug>
850: <au><snm>Przybylski</snm><fnm>D.</fnm></au>
851: <au><snm>Rost</snm><fnm>B.</fnm></au>
852: </aug>
853: <source>Proteins</source>
854: <pubdate>2002</pubdate>
855: <volume>46</volume>
856: <fpage>197</fpage>
857: <lpage>205</lpage>
858: </bibl>
859:
860: <bibl id="B20">
861: <title><p>The PSIPRED protein structure prediction server</p></title>
862: <aug>
863: <au><snm>McGuffin</snm><fnm>L. J.</fnm></au>
864: <au><snm>Bryson</snm><fnm>K.</fnm></au>
865: <au><snm>Jones</snm><fnm>D. T.</fnm></au>
866: </aug>
867: <source>Bioinformatics</source>
868: <pubdate>2000</pubdate>
869: <volume>16</volume>
870: <fpage>404</fpage>
871: <lpage>405</lpage>
872: </bibl>
873:
874: <bibl id="B21">
875: <title><p>Database of homology-derived protein structures</p></title>
876: <aug>
877: <au><snm>Sander</snm><fnm>C.</fnm></au>
878: <au><snm>Schneider</snm><fnm>R.</fnm></au>
879: </aug>
880: <source>Proteins</source>
881: <pubdate>1991</pubdate>
882: <volume>9</volume>
883: <fpage>56</fpage>
884: <lpage>68</lpage>
885: </bibl>
886:
887: <bibl id="B22">
888: <title><p>Eigenvalue analysis of amino acid substitution matrices reveals a
889: sharp transition of the mode of sequence conservation in proteins</p></title>
890: <aug>
891: <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
892: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
893: </aug>
894: <source>Bioinformatics</source>
895: <pubdate>2004</pubdate>
896: <volume>20</volume>
897: <fpage>2504</fpage>
898: <lpage>2508</lpage>
899: </bibl>
900:
901: <bibl id="B23">
902: <title><p>Principal eigenvector of contact matrices and hydrophobicity
903: profiles in proteins</p></title>
904: <aug>
905: <au><snm>Bastolla</snm><fnm>U.</fnm></au>
906: <au><snm>Porto</snm><fnm>M.</fnm></au>
907: <au><snm>Roman</snm><fnm>H. E.</fnm></au>
908: <au><snm>Vendruscolo</snm><fnm>M.</fnm></au>
909: </aug>
910: <source>Proteins</source>
911: <pubdate>2005</pubdate>
912: <volume>58</volume>
913: <fpage>22</fpage>
914: <lpage>30</lpage>
915: </bibl>
916:
917: <bibl id="B24">
918: <title><p>Human transcription factors contain a high fraction of
919: intrinsically disordered regions essential for transcriptional
920: regulation</p></title>
921: <aug>
922: <au><snm>Minezaki</snm><fnm>Y.</fnm></au>
923: <au><snm>Homma</snm><fnm>K.</fnm></au>
924: <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
925: <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
926: </aug>
927: <source>J. Mol. Biol.</source>
928: <pubdate>2006</pubdate>
929: <inpress />
930: </bibl>
931:
932: <bibl id="B25">
933: <title><p>{FORTE}: a profile-profile comparison tool for protein fold
934: recognition</p></title>
935: <aug>
936: <au><snm>Tomii</snm><fnm>K.</fnm></au>
937: <au><snm>Akiyama</snm><fnm>Y.</fnm></au>
938: </aug>
939: <source>Bioinformatics</source>
940: <pubdate>2004</pubdate>
941: <volume>20</volume>
942: <fpage>594</fpage>
943: <lpage>595</lpage>
944: </bibl>
945:
946: <bibl id="B26">
947: <title><p>A modified definition of Sov, a segment-based measure for protein
948: secondary structure prediction assessment</p></title>
949: <aug>
950: <au><snm>Zemla</snm><fnm>A.</fnm></au>
951: <au><snm>Venclovas</snm><fnm>C.</fnm></au>
952: <au><snm>Fidelis</snm><fnm>K.</fnm></au>
953: <au><snm>Rost</snm><fnm>B.</fnm></au>
954: </aug>
955: <source>Proteins</source>
956: <pubdate>1999</pubdate>
957: <volume>34</volume>
958: <fpage>220</fpage>
959: <lpage>223</lpage>
960: </bibl>
961:
962: </refgrp>
963: } % end of \BMCxmlcomment
964:
965: \newpage
966: \section*{Figures}
967: \subsection*{Figure 1}
968: Average accuracy measures as a function of the minimum number of homologs found by PSI-BLAST. From top to bottom: $Q_3$ of secondary structure predictions, $Cor$ of contact number predictions, and $Cor$ of residue-wise contact order predictions.
969:
970: \includegraphics[width=6cm]{fig1.eps}
971:
972: \newpage
973: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
974: %% %%
975: %% Tables %%
976: %% %%
977: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
978:
979: \section*{Tables}
980: \subsection*{Table 1 - Summary of average prediction accuracies per chain (median in parentheses).}
981: \par
982: \mbox{
983: \begin{tabular}{lll}\hline
984: SS & $Q_3$ = 80.5\% (81.6) & $SOV$ = 80.0\% (81.1)\\
985: CN & $Cor$ = 0.746 (0.768) & $DevA$ = 0.686 (0.670)\\
986: RWCO & $Cor$ = 0.613 (0.646) & $DevA$ = 0.877 (0.812)\\\hline
987: \end{tabular}
988: }\\
989:
990: SS, Secondary structure prediction: $Q_3$ is the percentage of correctly predicted residues; $SOV$ is the segment overlap measure~\cite{SOV99}.\\
991: CN, Contact number prediction: $Cor$ is Pearson's correlation coefficient between the predicted and native CNs; $DevA$ is the RMS error normalized by the standard deviation of the native CN~\cite{KinjoETAL2005}.\\
992: RWCO, Residue-wise contact order prediction: $Cor$ and $DevA$ are defined as for
993: CN but calculated with predicted and native RWCOs.
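\par
As a hedged restatement of the measures above (a sketch only: the symbols $N$, $N_{\mathrm{correct}}$, $p_i$, $n_i$, $\bar{p}$ and $\bar{n}$ are introduced here for illustration, and the definition actually used for $DevA$ is the one given in~\cite{KinjoETAL2005}), consider a chain of $N$ residues with predicted values $p_i$ and native values $n_i$ of CN or RWCO, and let $N_{\mathrm{correct}}$ be the number of residues whose SS class is predicted correctly. The measures may then be written as
\[
Q_3 = 100 \times \frac{N_{\mathrm{correct}}}{N},
\]
\[
Cor = \frac{\sum_{i=1}^{N} (p_i - \bar{p})(n_i - \bar{n})}
{\sqrt{\sum_{i=1}^{N} (p_i - \bar{p})^2}\,\sqrt{\sum_{i=1}^{N} (n_i - \bar{n})^2}},
\]
\[
DevA = \frac{\sqrt{\sum_{i=1}^{N} (p_i - n_i)^2}}
{\sqrt{\sum_{i=1}^{N} (n_i - \bar{n})^2}},
\]
where $\bar{p}$ and $\bar{n}$ are the averages of $p_i$ and $n_i$ over the chain. In the expression for $DevA$ the factor $1/N$ of the RMS error cancels against that of the standard deviation. Under these assumptions, $DevA = 1$ corresponds to an error as large as the spread of the native values, so smaller values indicate better predictions.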
994:
995:
996: \subsection*{Table 2 - Summary of per-residue accuracies for SS predictions.}
997: \par
998: \mbox{
999: \begin{tabular}{lrrr}\hline
1000: measure & $H$ & $E$ & $C$ \\\hline
1001: $Q_s$ & 82.7 & 69.3 & 84.0 \\
1002: $Q_s^{pre}$ & 84.4 & 78.9 & 78.3\\
1003: $MC$ & 0.754 & 0.674 & 0.645 \\\hline
1004: \end{tabular}
1005: }\\
1006:
1007: $Q_s$: The number of correctly predicted residues of SS class $s$ ($s = H, E, C$)
1008: divided by the number of residues of that class in the native structures, expressed as a percentage.\\
1009: $Q_s^{pre}$: The number of correctly predicted residues of SS class $s$ ($s = H, E, C$)
1010: divided by the number of residues predicted as that class, expressed as a percentage.\\
1011: $MC$: Matthews' correlation coefficient.
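\par
The three per-class measures can be sketched in terms of residue counts. The symbols below are introduced here for illustration and do not appear in the original: for an SS class $s$, let $TP_s$ be the number of residues correctly predicted as $s$, $FN_s$ the number of native $s$ residues predicted as another class, $FP_s$ the number of residues wrongly predicted as $s$, and $TN_s$ the number of remaining residues. With these assumptions,
\[
Q_s = 100 \times \frac{TP_s}{TP_s + FN_s}, \qquad
Q_s^{pre} = 100 \times \frac{TP_s}{TP_s + FP_s},
\]
\[
MC_s = \frac{TP_s\,TN_s - FP_s\,FN_s}
{\sqrt{(TP_s + FP_s)(TP_s + FN_s)(TN_s + FP_s)(TN_s + FN_s)}}.
\]
In this notation $Q_s$ plays the role of a per-class sensitivity and $Q_s^{pre}$ that of a per-class precision, while $MC_s$ ranges from $-1$ to $1$ and combines both kinds of error in a single value.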
1012: \end{document}
1013: