q-bio0604013/kinjo.tex
1: \documentclass[10pt]{article}    
2: 
3: 
4: \usepackage{graphicx}
5: \usepackage{cite} % Make references as [1-4], not [1,2,3,4]
6:  
7: \setlength{\topmargin}{0.0cm}
8: \setlength{\textheight}{21.5cm}
9: \setlength{\oddsidemargin}{0cm} 
10: \setlength{\textwidth}{16.5cm}
11: \setlength{\columnsep}{0.6cm}
12: 
13: \begin{document}
14: 
15: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
16: %%                                          %%
17: %% Enter the title of your article here     %%
18: %%                                          %%
19: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
20: 
21: \title{CRNPRED: Highly Accurate Prediction of One-dimensional Protein Structures by Large-scale Critical Random Networks}
22:  
23: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
24: %%                                          %%
25: %% Enter the authors here                   %%
26: %%                                          %%
27: %% Ensure \and is entered between all but   %%
28: %% the last two authors. This will be       %%
29: %% replaced by a comma in the final article %%
30: %%                                          %%
31: %% Ensure there are no trailing spaces at   %% 
32: %% the ends of the lines                    %%     	
33: %%                                          %%
34: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
35: 
36: 
37: \author{Akira R Kinjo$^{1,2}$ and Ken Nishikawa$^{1,2}$\\
38:     $^1$Center for Information Biology and DNA Data Bank of Japan,\\
39:         National Institute of Genetics, Mishima, 411-8540, Japan\\
40:     $^2$Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), \\Mishima 411-8540, Japan
41:       }
42:       
43: \maketitle
44: 
45: \begin{abstract}
46: \textbf{Background:} 
47: One-dimensional protein structures such as secondary structures or contact 
48: numbers are useful for three-dimensional structure prediction and helpful for 
49: intuitive understanding of the sequence-structure relationship.
50: Accurate prediction methods will serve as a basis for these and other purposes.\\
51: \textbf{Results:} We have implemented a program, CRNPRED, that predicts secondary
52: structures, contact numbers and residue-wise contact orders. This program is
53: based on a novel machine learning scheme called critical random
54: networks. Unlike most conventional one-dimensional structure prediction
55: methods, which are based on local windows of an amino acid sequence, CRNPRED
56: takes into account the whole sequence. CRNPRED achieves, on average per chain, 
57: $Q_3$ = 81\% for secondary structure prediction, and correlation coefficients 
58: of 0.75 and 0.61 for contact number and residue-wise contact order 
59: predictions, respectively.\\
60: \textbf{Conclusion:} CRNPRED will be a useful tool for computational as 
61: well as experimental biologists who need accurate one-dimensional protein 
62: structure predictions. 
63: \end{abstract}
64: 
65: 
66: 
67: \section*{Background}
68: 
69: One-dimensional (1D) structures of a protein are residue-wise 
70: quantities or symbols onto which some features of the native 
71: three-dimensional (3D) structure are projected.
72: 1D structures are of interest for several reasons. For example, predicted 
73: secondary structures, one kind of 1D structure, are often used to limit the
74: conformational space to be searched in 3D structure prediction. 
75: Furthermore, it has recently been shown that certain sets of the native 
76: (as opposed to predicted) 1D structures of 
77: a protein contain sufficient information to recover the native 3D 
78: structure~\cite{PortoETAL2004,KinjoANDNishikawa2005}. These 1D structures are
79: either the principal eigenvector of the contact map~\cite{PortoETAL2004} or a set of secondary structures (SS), contact numbers (CN) and residue-wise contact orders (RWCO)~\cite{KinjoANDNishikawa2005}.
80: Therefore, it is possible, at least in principle, to predict the native 3D 
81: structure by first predicting the 1D structures, and then by constructing 
82: the 3D structure from these 1D structures. 1D structures are not only useful 
83: for 3D structure predictions, but also helpful for intuitive understanding 
84: of the correspondence between the protein structure and its amino acid sequence
85: due to the residue-wise characteristics of 1D structures. Therefore, accurate 
86: prediction of 1D protein structures is of fundamental biological interest. 
87: 
88: Secondary structure prediction has a long history \cite{Rost2003}. 
89: Almost all modern predictors are based on position-specific scoring
90: matrices (PSSM) and some kind of machine learning techniques such as neural 
91: networks or support vector machines. Currently the best predictors achieve 
92: $Q_3$ of 77--79\% \cite{Jones1999,PollastriANDMcLysaght2005}. 
93: The study of contact number prediction also began a long time
94: ago \cite{NishikawaANDOoi1980, NishikawaANDOoi1986}, but further 
95: improvements were made only recently \cite{KinjoETAL2005, Yuan2005, KinjoANDNishikawa2005c}. These recent methods are based on the ideas developed in SS 
96: predictions (i.e., PSSM and machine learning), and achieve a correlation 
97: coefficient of 0.68--0.73.
98: 
99: Recently, we have developed a new method for accurately predicting SS, CN, 
100: and RWCO based on a novel machine learning scheme, 
101: critical random networks (CRN)~\cite{KinjoANDNishikawa2005c}.
102: In this paper, we briefly describe the formulation of the method, and recent 
103: improvements leading to even better predictions.
104: The computer program for SS, CN, and RWCO prediction named CRNPRED has been 
105: developed for the convenience of the general user, and a web interface and 
106: source code are made available online.
107:  
108: \section*{Implementation}
109: 
110: \subsection*{Definition of 1D structures}
111: \textit{Secondary structures (SS):}
112: Secondary structures were defined by the DSSP program \cite{DSSP}.
113: For three-state SS prediction, the simple encoding scheme (the so-called CK 
114: mapping) was employed \cite{CrooksANDBrenner2004}.
115: That is, $\alpha$ helices ($H$), $\beta$ strands ($E$), and other structures
116: (``coils'') defined by DSSP were encoded as $H$, $E$, and $C$, respectively.
117: Note that we do not use the CASP-style conversion scheme (the so-called EHL 
118: mapping) in which DSSP's $H$, $G$ ($3_{10}$ helix) and $I$ ($\pi$ helix) are encoded as $H$, and DSSP's $E$ and $B$ ($\beta$ bridge) as $E$.
119: We believe the CK mapping is more natural and useful for 3D structure 
120: predictions (e.g., geometrical restraints should be different between an 
121: $\alpha$ helix and a $3_{10}$ helix).
122: For SS prediction, we introduce feature variables $(y_i^H, y_i^E, y_i^C)$ 
123: to represent each type of secondary structure at the $i$-th residue position,
124: so that $H$ is represented as $(1,-1,-1)$, $E$ as $(-1,1,-1)$, and $C$ as 
125: $(-1,-1,1)$.
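The CK mapping and the $\pm 1$ encoding of the SS feature variables can be illustrated with a short Python sketch. It is not taken from the CRNPRED source code; the helper names \texttt{ck\_map} and \texttt{ss\_features} are hypothetical, and we assume that the DSSP assignment is available as one character per residue, with coil written as a hyphen or a blank.

```python
# Minimal sketch (not CRNPRED source): CK mapping of DSSP codes to three
# states, followed by the +/-1 feature encoding (y^H, y^E, y^C).
# Function names are hypothetical.

SS_CLASSES = "HEC"


def ck_map(dssp):
    """Map a DSSP string to H/E/C.

    Only DSSP 'H' (alpha helix) and 'E' (beta strand) are kept; every other
    code (G, I, B, T, S, and coil written as '-' or ' ') becomes 'C'.
    """
    return "".join(c if c in "HE" else "C" for c in dssp)


def ss_features(ss3):
    """Return one (y^H, y^E, y^C) tuple per residue: +1 for the observed
    class and -1 for the other two classes."""
    return [tuple(1 if s == c else -1 for c in SS_CLASSES) for s in ss3]
```

For example, \texttt{ck\_map("HHHGGGEEBT-")} returns \texttt{"HHHCCCEECCC"}: the $3_{10}$ helix (G), the $\beta$ bridge (B) and the turn (T) all fall into class $C$ under the CK mapping, whereas the EHL mapping would have assigned G to $H$ and B to $E$.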
126: 
127: \textit{Contact numbers (CN):}
128: Let $C_{i,j}$ represent the contact map of a protein. Usually, the contact 
129: map is defined so that $C_{i,j} = 1$ if the $i$-th and $j$-th residues are in 
130: contact by some definition, or $C_{i,j} = 0$, otherwise. As in our 
131: previous study, we slightly modify the definition using a sigmoid function. 
132: That is, 
133: \begin{equation}
134:   C_{i,j} = 1/\{1+\exp[w(r_{i,j} - d)]\}
135: \end{equation}
136: where $r_{i,j}$ is the distance between $C_{\beta}$ ($C_{\alpha}$ 
137: for glycines) atoms of the $i$-th and $j$-th residues, $d = 12$\AA{} is a 
138: cutoff distance, and $w$ is a sharpness parameter of the sigmoid function 
139: which is set to 3\AA$^{-1}$ \cite{KinjoETAL2005,KinjoANDNishikawa2005}. The rather
140: generous cutoff length of 12\AA{} was shown to optimize the prediction 
141: accuracy \cite{KinjoETAL2005}. Because the sigmoid function is differentiable, the
142: resulting contact numbers can be used in molecular dynamics
143: simulations \cite{KinjoANDNishikawa2005}.
144: Using the above definition of the contact map, the contact number of the
145: $i$-th residue of a protein is defined as
146: \begin{equation}
147:   n_i = \sum_{j:|i-j|>2}C_{i,j}. \label{eq:defcn}
148: \end{equation}
149: The feature variable $y_i$ for CN is defined as $y_i = n_i / \log L$ where 
150: $L$ is the sequence length of a target protein. The normalization 
151: factor $\log L$ is introduced because we have observed that the contact 
152: number averaged over a protein chain is roughly proportional to $\log L$,
153: and thus division by this value removes the size-dependence of predicted
154: contact numbers.
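The following Python sketch computes the sigmoidal contact map, the contact numbers of Eq.~(\ref{eq:defcn}) and the normalized feature $y_i = n_i/\log L$ from a list of residue coordinates. It is an illustration only, not part of CRNPRED; the function names are hypothetical, coordinates are assumed to be in \AA, and $\log$ is assumed to be the natural logarithm (the base is not specified above).

```python
import math

# Minimal sketch (not CRNPRED source) of the sigmoidal contact map and of the
# size-normalized contact-number feature y_i = n_i / log L.
# Assumptions: coordinates in Angstrom, natural logarithm.


def sigmoid_contact(r, d=12.0, w=3.0):
    """C = 1 / (1 + exp[w (r - d)]), evaluated without overflow for large r."""
    z = w * (d - r)            # C is the logistic function of z
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)            # z < 0, so exp(z) <= 1 and cannot overflow
    return e / (1.0 + e)


def contact_map(coords, d=12.0, w=3.0):
    """coords: one (x, y, z) position per residue (C-beta, or C-alpha for Gly)."""
    L = len(coords)
    return [[sigmoid_contact(math.dist(coords[i], coords[j]), d, w)
             for j in range(L)] for i in range(L)]


def contact_numbers(C):
    """n_i = sum over j with |i - j| > 2 of C_ij."""
    L = len(C)
    return [sum(C[i][j] for j in range(L) if abs(i - j) > 2) for i in range(L)]


def cn_features(C):
    """y_i = n_i / log L (requires L >= 2 so that log L > 0)."""
    L = len(C)
    return [n / math.log(L) for n in contact_numbers(C)]
```

The sigmoid is evaluated in a numerically safe form, so that distant residue pairs give $C_{i,j} \approx 0$ instead of an overflow in $\exp[w(r_{i,j}-d)]$.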
155: 
156: \textit{Residue-wise contact orders (RWCO):}
157: RWCO was first introduced in \cite{KinjoANDNishikawa2005}.
158: This quantity measures the extent to which a residue makes long-range contacts
159: in a native protein structure.
160: Using the same notation as contact numbers, 
161: the RWCO of the $i$-th residue in a protein structure is defined by 
162: \begin{equation}
163:   o_i = \sum_{j:|i-j|>2}|i-j|C_{i,j}. \label{eq:defrwco}
164: \end{equation}
165: The feature variable $y_i$ for RWCO is defined as $y_i = o_i / L$ where 
166: $L$ is the sequence length. For the same reason as for CN, the normalization
167: factor $L$ was introduced to remove the size-dependence of the predicted
168: RWCOs (the RWCO averaged over a protein chain is roughly proportional to the 
169: chain length).
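Analogously, RWCO and its normalized feature variable can be sketched in Python as below. This is not CRNPRED code; the helper names are hypothetical and the same assumptions as for contact numbers apply. The only change from the contact number is the weight $|i-j|$ on each contact, which is what makes $o_i$ sensitive to long-range contacts.

```python
import math

# Minimal sketch (not CRNPRED source) of residue-wise contact orders (RWCO)
# and the size-normalized feature y_i = o_i / L, using the same sigmoidal
# contact definition as for contact numbers (d = 12 A, w = 3 per A).


def sigmoid_contact(r, d=12.0, w=3.0):
    z = w * (d - r)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rwco(coords, d=12.0, w=3.0):
    """o_i = sum over j with |i - j| > 2 of |i - j| * C_ij."""
    L = len(coords)
    out = []
    for i in range(L):
        total = 0.0
        for j in range(L):
            sep = abs(i - j)
            if sep > 2:
                total += sep * sigmoid_contact(math.dist(coords[i], coords[j]), d, w)
        out.append(total)
    return out


def rwco_features(coords, d=12.0, w=3.0):
    """y_i = o_i / L."""
    L = len(coords)
    return [o / L for o in rwco(coords, d, w)]
```

With four residues in which only residues 1 and 4 lie at the cutoff distance ($r = d = 12$\,\AA, so $C_{1,4} = 1/2$), \texttt{rwco} returns $o = (1.5, 0, 0, 1.5)$, i.e.\ $|1-4| \times 1/2$ for each of the two residues.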
170: 
171: \subsection*{Critical random networks}
172: Here we briefly describe the critical random network (CRN) method introduced
173: in \cite{KinjoANDNishikawa2005c}, to which the reader is referred for details.
174: Unlike most conventional methods for 1D structure prediction (with some exceptions,
175: such as bidirectional recurrent neural networks \cite{BaldiETAL1999,PollastriANDMcLysaght2005,ChenANDChaudhari2006}), the CRN method
176: takes the whole amino acid sequence into account. In the CRN method, 
177: an $N$-dimensional state vector $\mathbf{x}_i$  is assigned to the $i$-th 
178: residue of the target sequence (we use $N = 5000$ throughout this paper). 
179: Neighboring state vectors along the sequence 
180: are connected via a random $N\times N$ orthogonal matrix $W$. This matrix is 
181: also block-diagonal, with block sizes drawn uniformly at random
182: between 2 and 50. The input to the CRN is the position-specific scoring matrix 
183: (PSSM), $U = (\mathbf{u}_1, \cdots, \mathbf{u}_L)$ 
184: of the target sequence obtained by PSI-BLAST~\cite{AltschulETAL1997} ($L$ is the sequence length of the target protein). 
185: We impose that the state vectors satisfy the following equation of state:
186: \begin{equation}
187:   \label{eq:eos}
188:   \mathbf{x}_i = \tanh[\beta W (\mathbf{x}_{i-1} + \mathbf{x}_{i+1}) + \alpha V\mathbf{u}_i]
189: \end{equation}
190: for $i = 1, \cdots , L$ where $V$ is an $N\times 21$ random matrix 
191: (the 21st component of $\mathbf{u}_i$ is always set to unity), and $\beta$ and $\alpha$ are scalar parameters. The fixed boundary condition is imposed ($\mathbf{x}_0 = \mathbf{x}_{L+1} = \mathbf{0}$). By setting $\beta = 0.5$, 
192: the system of state vectors is made to be near a critical point in a certain 
193: sense, and thus the range of site-site correlation is expected to be long 
194: when $\alpha$ is sufficiently small but finite~\cite{KinjoANDNishikawa2005c}. 
195: In this way, each state vector implicitly incorporates long-range correlations.
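A random block-diagonal orthogonal matrix $W$ of the kind described above can be generated as in the following Python sketch. The construction is our assumption for illustration: the text specifies only that $W$ is orthogonal and block-diagonal with block sizes uniform between 2 and 50, so here each block is obtained by modified Gram-Schmidt orthonormalization of Gaussian random vectors, and a final block that would overrun $N$ is truncated. The function names are hypothetical.

```python
import random

# Minimal sketch (not CRNPRED source) of a random N x N orthogonal matrix W that
# is block-diagonal with block sizes drawn uniformly from [2, 50].
# Assumptions: each block comes from modified Gram-Schmidt on Gaussian vectors;
# if fewer rows than the drawn size remain, the last block is truncated.


def random_orthogonal(n, rng):
    """Return an n x n matrix whose rows form an orthonormal basis of R^n."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in rows:                          # remove components along earlier rows
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-8:                         # resample in the (unlikely) degenerate case
            rows.append([vi / norm for vi in v])
    return rows


def block_diagonal_orthogonal(N, rng, min_block=2, max_block=50):
    """Dense N x N block-diagonal matrix with random orthogonal blocks."""
    W = [[0.0] * N for _ in range(N)]
    start = 0
    while start < N:
        size = min(rng.randint(min_block, max_block), N - start)
        Q = random_orthogonal(size, rng)
        for a in range(size):
            for b in range(size):
                W[start + a][start + b] = Q[a][b]
        start += size
    return W
```

Because each block is orthogonal, $\|W\mathbf{v}\| = \|\mathbf{v}\|$ for any $\mathbf{v}$, so with $\beta = 0.5$ the coupling term $\beta W(\mathbf{x}_{i-1}+\mathbf{x}_{i+1})$ has norm at most $\max(\|\mathbf{x}_{i-1}\|, \|\mathbf{x}_{i+1}\|)$. A call such as \texttt{block\_diagonal\_orthogonal(5000, random.Random(0))} would produce a matrix of the size used in this paper, although a dense list-of-lists representation is wasteful at that size.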
196: The 1D structure of the $i$-th residue is predicted as 
197: a linear projection of a local window of the PSSM and of the state vector obtained by solving Eq.~(\ref{eq:eos}):
198: \begin{equation}
199:   \label{eq:pred}
200:   y_i = \sum_{m=-M}^{M}\sum_{a=1}^{21}D_{m,a}u_{a,i+m} + \sum_{k=1}^{N}E_{k}x_{k,i}
201: \end{equation}
202: where $y_i$ is the predicted quantity, and $D_{m,a}$ and $E_k$ are the 
203: regression parameters. In the first summation, each PSSM column is extended to 
204: include the ``terminal'' residue. 
205: Since Eq.~(\ref{eq:pred}) is linear in $D_{m,a}$ and $E_k$ once the equation of
206: state [Eq.~(\ref{eq:eos})] has been solved, learning these parameters
207: reduces to an ordinary linear regression problem.
208: For SS prediction, the triple $(y^{H}_i, y^{E}_i, y^{C}_i)$ is 
209: calculated simultaneously, and the SS class is predicted as 
210: $\arg\max_{s\in \{H, E, C\}}y^{s}_i$. For CN and RWCO prediction,
211: real values are predicted (a 2-state prediction is also made for CN, using
212: the average CN of each residue type as the threshold between ``exposed''
213: and ``buried'', as in \cite{PollastriETAL2002}).
214: The half window size $M$ is set to 9 for SS and CN predictions, and to 26 for 
215: RWCO. 
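The linear read-out of Eq.~(\ref{eq:pred}) and the subsequent class assignment can be sketched in Python as follows. This is not CRNPRED code: the function names are hypothetical, and the handling of positions beyond the chain ends (a 21st ``terminal'' component that is 0 inside the chain and 1 outside) is our reading of the extended PSSM columns, not a confirmed detail of the implementation. The regression parameters $D_{m,a}$ and $E_k$ are assumed to come from a separate least-squares fit.

```python
# Minimal sketch (not CRNPRED source) of the linear read-out in Eq. (pred) and
# of how its outputs are turned into class labels.
# Assumption: PSSM rows have 20 scores; a 21st "terminal" component is 0 inside
# the chain, and positions beyond either end are encoded as (0, ..., 0, 1).


def window_features(pssm, i, M):
    """Concatenate the extended PSSM columns at positions i-M .. i+M (0-based)."""
    L = len(pssm)
    feats = []
    for m in range(-M, M + 1):
        k = i + m
        if 0 <= k < L:
            feats.extend(list(pssm[k]) + [0.0])
        else:
            feats.extend([0.0] * 20 + [1.0])
    return feats


def predict(pssm, states, i, D, E, M):
    """y_i = sum_{m,a} D_{m,a} u_{a,i+m} + sum_k E_k x_{k,i}.

    D is flattened in the same (m, a) order as window_features; states[i] is
    the solved CRN state vector x_i.
    """
    u = window_features(pssm, i, M)
    return (sum(d * f for d, f in zip(D, u))
            + sum(e * x for e, x in zip(E, states[i])))


def ss_class(y_h, y_e, y_c):
    """arg max over s in {H, E, C} of y^s (ties resolved in the order H, E, C)."""
    scores = {"H": y_h, "E": y_e, "C": y_c}
    return max(scores, key=scores.get)


def cn_two_state(cn, residue, mean_cn):
    """'buried' if the contact number exceeds the mean CN of its residue type."""
    return "buried" if cn > mean_cn[residue] else "exposed"
```

For SS, \texttt{predict} would be called three times per residue, with the $D$ and $E$ trained for $y^H$, $y^E$ and $y^C$, and the three results passed to \texttt{ss\_class}.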
216: 
217: \subsection*{Ensemble prediction}
218: Since the CRN-based prediction is parametrized by the random matrices $W$ 
219: and $V$,
220: slightly different predictions are obtained for different pairs of $W$ and $V$. 
221: The prediction can be improved by averaging over an ensemble of
222: such predictions. Twenty CRN-based predictors were constructed using
223: 20 different sets of random matrices $W$ and $V$. CN and RWCO are predicted
224: as uniform averages of these 20 predictions. 
225: 
226: For SS prediction, we employ a second stage of training. Let $s_{i}^{t,n}$ be the
227: value predicted by the $n$-th predictor for 1D structure $t$
228: ($H$, $E$, $C$, CN, and RWCO) of the $i$-th residue.
229: The second stage SS prediction is made by the following linear scheme:
230: \begin{equation}
231:   \label{eq:ss2}
232:   y_{i}^{ss} = \sum_{n=1}^{20}\sum_{t}\sum_{m=-3}^{3}w_{n,t,m}s_{i+m}^{t,n}
233: \end{equation}
234: where $ss \in \{H, E, C\}$, and $w_{n,t,m}$ are weights obtained from a training
235: set. Finally, the feature variable for each SS class of the 
236: $i$-th residue is obtained by $(y_{i-1}^{ss} + 2y_{i}^{ss} + y_{i+1}^{ss})/4$. 
237: This last procedure was found particularly effective for improving the 
238: segment overlap (SOV) measure \cite{SOV99}.
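The two ensemble steps, namely the uniform average used for CN and RWCO and the second-stage SS combination of Eq.~(\ref{eq:ss2}) followed by the $(1,2,1)/4$ smoothing, are sketched below in Python. The treatment of chain ends (neighbors outside the chain contribute nothing to Eq.~(\ref{eq:ss2}), and a missing neighbor in the smoothing is replaced by the residue itself) is an assumption, as are the function names; this is not CRNPRED code.

```python
# Minimal sketch (not CRNPRED source) of the ensemble steps.
# Assumptions: residues outside the chain contribute nothing to Eq. (ss2), and
# at the chain ends the missing neighbour in the final smoothing is replaced
# by the residue's own value.


def ensemble_mean(predictions):
    """Uniform average of per-residue predictions from several CRN predictors
    (used for CN and RWCO)."""
    return [sum(col) / len(predictions) for col in zip(*predictions)]


def second_stage(s, w, half=3):
    """y_i = sum_n sum_t sum_{m=-half..half} w[n][t][m + half] * s[n][t][i + m].

    s[n][t][i]: value predicted by predictor n for 1D structure t at residue i
    (t runs over H, E, C, CN and RWCO); w: the weights for one SS class.
    """
    L = len(s[0][0])
    y = []
    for i in range(L):
        total = 0.0
        for s_n, w_n in zip(s, w):
            for s_nt, w_nt in zip(s_n, w_n):
                for m in range(-half, half + 1):
                    if 0 <= i + m < L:
                        total += w_nt[m + half] * s_nt[i + m]
        y.append(total)
    return y


def smooth(y):
    """(y_{i-1} + 2 y_i + y_{i+1}) / 4 for each residue i."""
    L = len(y)
    out = []
    for i in range(L):
        left = y[i - 1] if i > 0 else y[i]
        right = y[i + 1] if i < L - 1 else y[i]
        out.append((left + 2.0 * y[i] + right) / 4.0)
    return out
```

In the setting of this paper, \texttt{s} would hold 20 predictors and five 1D structure types, so Eq.~(\ref{eq:ss2}) has $20 \times 5 \times 7 = 700$ weights per SS class.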
239: 
240: \subsection*{Additional input}
241: Another improvement is the addition of the amino acid composition of 
242: the target sequence to the predictor \cite{Yuan2005}:
243: the term $\sum_{a=1}^{20}F_af_a$ was added to Eq.~(\ref{eq:pred}), where $F_a$
244: is a regression parameter and $f_a$ is the fraction of amino acid
245: type $a$ in the target sequence.
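The composition term is cheap to compute, as the following Python sketch illustrates. The function names are hypothetical and the handling of non-standard residue codes is an assumption. Because $f_a$ depends only on the whole sequence, the term shifts $y_i$ by the same amount at every residue of a given chain.

```python
# Minimal sketch (not CRNPRED source) of the amino acid composition term
# sum_a F_a f_a.  Assumption: only the 20 standard one-letter codes are
# counted; other letters (e.g. X) are ignored in numerator and denominator.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return [f_A, f_C, ..., f_Y], the fraction of each standard amino acid."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for c in seq.upper():
        if c in counts:
            counts[c] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def composition_term(seq, F):
    """sum_a F_a f_a: the same value is added to y_i for every residue i."""
    return sum(F_a * f_a for F_a, f_a in zip(F, composition(seq)))
```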
246: 
247: \subsection*{Training and test data set}
248: We carried out a 15-fold cross-validation test following exactly the same 
249: procedure and the same data set as the previous 
250: study \cite{KinjoANDNishikawa2005c}. In the data set, there are 680 protein 
251: domains, each of which represents a superfamily according to the SCOP 
252: database (version 1.65) \cite{SCOP}. This data set was randomly divided so 
253: that 630 domains were used for training and the remaining 50 domains for 
254: testing, and the random division was repeated 15 times. 
255: No two of these domains belong to the same superfamily, and hence they are
256: not expected to be homologous. Thus, the present benchmark is a very 
257: stringent one.
258: 
259: To obtain PSSMs with PSI-BLAST, we use the UniRef100
260: (version 6.8) amino acid sequence database \cite{UniProt}, containing some
261: 3 million entries.
262: In addition, the number of iterations in PSI-BLAST homology searches was reduced
263: from 10, as used in the previous study, to 3. This change particularly improved the
264: accuracy of SS predictions,
265: consistent with the findings of Przybylski and Rost \cite{PrzybylskiANDRost2002}.
266: 
267: \subsection*{Numerics}
268: One drawback of the CRN method is the computational time required for
269: numerically solving the equation of state (Eq. \ref{eq:eos}).
270: To reduce this cost, we replaced the Gauss-Seidel-like
271: method used previously with a successive over-relaxation (SOR)
272: method, which we found to be much more efficient.
273: 
274: Let $\nu$ denote the stage of iteration.
275: We set the initial value of the state vectors (with $\nu = 0$) as 
276: \begin{equation}
277:   \mathbf{x}_{i}^{(0)} = \tanh [\alpha V \mathbf{u}_{i}].\label{eq:init_eos}
278: \end{equation}
279: Then, for $i = 1, \cdots , L$ (in increasing order of $i$), we update 
280: the state vectors by
281: \begin{eqnarray}
282:   \mathbf{x}_{i}^{(2\nu+1)} & \gets & \mathbf{x}_{i}^{(2\nu)} + \omega
283: \{\tanh [\beta W(\mathbf{x}_{i-1}^{(2\nu+1)}\nonumber\\
284: & & +\, \mathbf{x}_{i+1}^{(2\nu)})
285: + \alpha V \mathbf{u}_{i}] - \mathbf{x}_{i}^{(2\nu)}\}.
286: \label{eq:feos}
287: \end{eqnarray}
288: Next, we update them in the reverse order. That is, for $i = L, \cdots , 1$
289: (in decreasing order of $i$),
290: \begin{eqnarray}
291:   \mathbf{x}_{i}^{(2\nu+2)} & \gets & \mathbf{x}_{i}^{(2\nu+1)} + \omega
292: \{\tanh [\beta W(\mathbf{x}_{i-1}^{(2\nu+1)}\nonumber\\
293: & & +\, \mathbf{x}_{i+1}^{(2\nu+2)})
294: + \alpha V \mathbf{u}_{i}] - \mathbf{x}_{i}^{(2\nu+1)}\}.
295: \label{eq:beos}
296: \end{eqnarray}
297: We then set $\nu \gets \nu + 1$ and iterate Eqs. (\ref{eq:feos}) and (\ref{eq:beos}) until $\{\mathbf{x}_{i}\}$ converges. An acceleration parameter of $\omega = 1.4$ was found to be effective.
298: The convergence criterion is 
299: \begin{equation}
300: \sqrt{\sum_{i=1}^{L}||\mathbf{x}_{i}^{(2\nu+2)}-\mathbf{x}_{i}^{(2\nu+1)}||_{\mathbf{R}^{N}}^{2}/{NL}}<10^{-3}
301: \end{equation}
302: where $||\cdot||_{\mathbf{R}^{N}}$ denotes the Euclidean norm.
303: This criterion is much less stringent than that of the previous study ($10^{-7}$), but
304: the relaxation does not significantly affect the prediction accuracy.
305: Convergence is typically achieved within 10 to 12 iterations per protein.
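The forward and backward sweeps can be sketched in Python as follows. This is a small-scale illustration, not the CRNPRED C implementation: matrices are dense lists, and \texttt{rotation\_blocks} is a hypothetical stand-in for $W$ built from $2\times2$ rotation blocks. The convergence test follows the criterion above. The relaxation parameter is left adjustable because convergence of over-relaxation with $\omega = 1.4$ is not guaranteed for an arbitrary random $W$; with $\omega = 1$, $\beta = 0.5$, the fixed boundary condition and an orthogonal $W$, each pair of sweeps is a contraction, so convergence is guaranteed in that case.

```python
import math
import random

# Minimal sketch (not CRNPRED source) of the forward/backward SOR sweeps for
#   x_i = tanh[beta W (x_{i-1} + x_{i+1}) + alpha V u_i],  x_0 = x_{L+1} = 0.
# Matrices are plain lists of lists, so this is only practical for small N.


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def solve_eos(W, V, U, alpha, beta=0.5, omega=1.4, tol=1e-3, max_iter=1000):
    """Return (X, iterations), where X[i] is the converged state vector of residue i.

    U holds one 21-component input vector per residue (last component 1).
    Stops when sqrt(sum_i ||x_i^(2nu+2) - x_i^(2nu+1)||^2 / (N L)) < tol.
    """
    L, N = len(U), len(W)
    drive = [[alpha * h for h in matvec(V, u)] for u in U]   # alpha V u_i
    X = [[math.tanh(h) for h in d] for d in drive]            # Eq. (init_eos)
    zero = [0.0] * N

    def relax(i):
        """Over-relax residue i in place; return the squared size of the change."""
        left = X[i - 1] if i > 0 else zero
        right = X[i + 1] if i < L - 1 else zero
        field = matvec(W, [a + b for a, b in zip(left, right)])
        new = [x + omega * (math.tanh(beta * f + d) - x)
               for x, f, d in zip(X[i], field, drive[i])]
        change = sum((n - x) ** 2 for n, x in zip(new, X[i]))
        X[i] = new
        return change

    for it in range(1, max_iter + 1):
        for i in range(L):                                    # forward sweep, Eq. (feos)
            relax(i)
        sq = sum(relax(i) for i in reversed(range(L)))       # backward sweep, Eq. (beos)
        if math.sqrt(sq / (N * L)) < tol:
            return X, it
    raise RuntimeError("equation of state did not converge")


def rotation_blocks(N, rng):
    """Hypothetical stand-in for W: block-diagonal 2x2 rotations (N even)."""
    W = [[0.0] * N for _ in range(N)]
    for b in range(0, N, 2):
        t = rng.uniform(0.0, 2.0 * math.pi)
        W[b][b], W[b][b + 1] = math.cos(t), -math.sin(t)
        W[b + 1][b], W[b + 1][b + 1] = math.sin(t), math.cos(t)
    return W
```

For instance, with \texttt{W = rotation\_blocks(8, random.Random(1))}, a random $8\times21$ matrix \texttt{V} and six input vectors \texttt{U}, the call \texttt{solve\_eos(W, V, U, alpha=0.3, omega=1.0, tol=1e-9)} returns state vectors that satisfy Eq.~(\ref{eq:eos}) to within about $10^{-6}$.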
306: 
307: 
308: \section*{Results and Discussion}
309: Two main ingredients underlie the improved one-dimensional protein
310: structure prediction in the present study. The first is the use of large-scale,
311: 5000-dimensional critical random networks with an ensemble of 20 predictors.
312: The second is the use of a large sequence database (UniRef100) for PSI-BLAST
313: searches.
314: As demonstrated in Table~1, the CRN method achieves remarkably 
315: accurate predictions.
316: In comparison with the previous study \cite{KinjoANDNishikawa2005c} based on
317: 2000-dimensional CRNs (10 ensemble predictors), 
318: the $Q_3$ and $SOV$ measures in SS predictions improved from 77.8\% and 77.3\% 
319: to 80.5\% and 80.0\%, respectively. Similarly, the average correlation 
320: coefficient improved from 0.726 to 0.746 for CN predictions, 
321: and from 0.601 to 0.613 for RWCO predictions. The 2-state predictions for 
322: CN yield, on average, $Q_2$ = 76.8\% per chain and 76.7\% per residue, and
323: Matthews' correlation coefficient of 0.533.
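The accuracy measures quoted above ($Q_3$, $Q_2$, the correlation coefficient $Cor$ and Matthews' correlation coefficient), together with the difference between per-chain and per-residue averaging, can be made concrete with the following Python sketch. It uses the standard textbook definitions, which we assume match those used for Table~1; the function names are hypothetical and the SOV measure is not reproduced.

```python
import math

# Minimal sketch (not CRNPRED source) of the accuracy measures quoted above.
# "Per chain" averages a measure over chains; "per residue" pools all residues
# of all chains first.  The SOV measure is not reproduced here.


def q_score(pred, obs):
    """Q3 (or Q2): percentage of residues whose class is predicted correctly."""
    return 100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)


def pearson(x, y):
    """Correlation coefficient Cor between predicted and observed values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matthews(pred, obs, positive="buried"):
    """Matthews' correlation coefficient for a 2-state prediction."""
    pairs = list(zip(pred, obs))
    tp = sum(p == positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    fp = sum(p == positive and o != positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def per_chain(measure, chains):
    """Average of measure(pred, obs) over a list of (pred, obs) chains."""
    return sum(measure(p, o) for p, o in chains) / len(chains)


def per_residue(measure, chains):
    """The measure applied once to all residues of all chains pooled together."""
    pred = [r for p, _ in chains for r in p]
    obs = [r for _, o in chains for r in o]
    return measure(pred, obs)
```

Per-chain and per-residue values differ whenever chains of different length are predicted with different accuracy, which is why both are reported for $Q_2$ above.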
324: 
325: A closer examination of the SS prediction results (Table~2)
326: reveals a marked improvement in $\beta$ strand prediction, from $Q_E$
327: = 61.9\% to 69.3\% (per residue). Although $Q_C$ and $Q_E^{pre}$
328: are 0.6--1.0\% lower than in the previous study, the other per-class
329: measures have improved by 2.5--4\%.
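The per-class measures $Q_s$ and $Q_s^{pre}$ can be sketched as follows. This assumes the usual definitions, in which $Q_s$ is the percentage of residues observed in class $s$ that are predicted correctly and $Q_s^{pre}$ is the percentage of residues predicted in class $s$ that are correct, with residues pooled over all chains; the function names are hypothetical.

```python
import math

# Minimal sketch (not CRNPRED source) of per-class measures for 3-state SS
# prediction, pooled over residues:
#   Q_s     = % of residues observed in class s that are predicted as s
#   Q_s^pre = % of residues predicted as s that are observed in class s


def q_class(pred, obs, s):
    hits = [p == s for p, o in zip(pred, obs) if o == s]
    return 100.0 * sum(hits) / len(hits) if hits else math.nan


def q_class_pre(pred, obs, s):
    hits = [o == s for p, o in zip(pred, obs) if p == s]
    return 100.0 * sum(hits) / len(hits) if hits else math.nan
```

A rise in $Q_E$ accompanied by a small drop in $Q_E^{pre}$, as reported above, is what one expects when a predictor assigns class $E$ somewhat more often.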
330: 
331: CRNPRED compares favorably with other secondary structure prediction methods.
332: The widely used PSIPRED program \cite{Jones1999,PSIPRED}, which is based on conventional
333: feed-forward neural networks, achieves $Q_3$ of 78\%.
334: A more recently developed method, Porter \cite{PollastriANDMcLysaght2005},
335: which is based on bidirectional recurrent neural networks, achieves $Q_3$ of
336: 79\%. An even more intricate method based on bidirectional segmented-memory
337: recurrent neural networks \cite{ChenANDChaudhari2006} shows an accuracy
338: of $Q_3$ = 73\% (this rather low accuracy may be attributable to the small size
339: of the training set used). However, it should be noted that these studies
340: used different data sets for training and testing, as well as different
341: definitions of
342: secondary structure categories. Therefore, these comparisons are not
343: very informative and give only a rough estimate of relative performance.
344: 
345: Regarding contact number prediction, CRNPRED, with $Cor$ = 0.75,
346: is, to our knowledge, the most accurate method available today. The simple linear method \cite{KinjoETAL2005} with multiple
347: sequence alignments derived from the HSSP database \cite{HSSP} showed a
348: correlation coefficient of 0.63. A more advanced, local window-based method using support vector machines achieves a correlation of 0.68 per chain~\cite{Yuan2005}.
349: 
350: It is known that the number of homologs found by the PSI-BLAST searches 
351: significantly affects the prediction accuracies \cite{PrzybylskiANDRost2002}. 
352: We have examined this effect by plotting the accuracy measures for a 
353: given minimum number of homologs found by PSI-BLAST (Fig. 1).
354: For example, we see in Fig. 1 that, for those proteins with 
355: more than 100 homologs, the average $Q_3$ for SS predictions is 82.2\%.  
356: The effect of the number of homologs significantly depends on the type of 
357: 1D structure. For SS prediction, $Q_3$ steadily increases as the number of 
358: homologs increases up to 100, but it stays in the range between 82.0 and 82.4
359: until the minimum number of homologs reaches around 400, and then it starts to
360: decrease. For CN prediction, $Cor$ also increases steadily but more slowly, 
361: and it does not degrade when the minimum number of homologs reaches 500. 
362: This tendency implies that CN is more conserved than SS during protein
363: evolution, which is consistent with previous observations \cite{KinjoANDNishikawa2004,BastollaETAL2005}. In contrast, RWCO exhibits a peculiar behavior.
364: Its $Cor$ value peaks at a minimum number of homologs of 80,
365: beyond which it decreases rapidly. This suggests that RWCO is not
366: evolutionarily well conserved. We also observed that the accuracies of SS and
367: CN predictions consistently increased when the dimension of the CRNs was increased
368: from 2000 to 5000, but this was not the case for RWCO (data not shown).
369: RWCO seems to be such a delicate quantity that it is very difficult to extract
370: relevant information about it from the amino acid sequence.
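A curve of the kind shown in Fig.~1 can be computed from per-chain results as in the following Python sketch. The function name is hypothetical, and whether Fig.~1 counts chains with ``more than'' or ``at least'' $k$ homologs is an assumption; the sketch follows the wording of the example given above (``more than 100 homologs'').

```python
# Minimal sketch (not CRNPRED source) of how a curve like Fig. 1 can be built:
# for each threshold k, average a per-chain score over the chains for which
# PSI-BLAST found more than k homologs.  The strict "more than" follows the
# example in the text; the exact convention behind Fig. 1 is an assumption.


def accuracy_vs_min_homologs(records, thresholds):
    """records: (n_homologs, score) pairs, one per chain.

    Returns (k, mean score) pairs; the mean is None when no chain qualifies.
    """
    curve = []
    for k in thresholds:
        scores = [score for n, score in records if n > k]
        curve.append((k, sum(scores) / len(scores) if scores else None))
    return curve
```

Because the set of qualifying chains shrinks as $k$ grows, the right-hand end of such a curve rests on few proteins, which should be kept in mind when interpreting the decrease at large $k$.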
371: 
372: Finally, we comment on the practical applicability of predicted 1D
373: structures. We do not believe, at present, that the construction of
374: a 3D structure purely from the predicted 1D structures is practical,
375: if possible at all, because of the limited accuracy of the RWCO prediction.
376: However, SS and CN predictions are very accurate for many proteins,
377: so they may already serve as valuable restraints for 3D structure
378: prediction. SS and CN predictions may also be applied to domain
379: identification, which is often necessary for the experimental determination of protein
380: structures; CRNPRED has already proved useful for this purpose \cite{MinezakiETAL2006}.
381: Despite their limited accuracy, predicted RWCOs still exhibit significant
382: correlations with the correct values. Since RWCOs reflect the extent to which
383: a residue is involved in long-range contacts, predicted RWCOs may be 
384: useful for enumerating potentially structurally important residues. 
385: 
386: An interesting alternative application of the CRN framework is to regard the 
387: solution of the equation of state (Eq. \ref{eq:eos}) as an extended sequence 
388: profile. In this way, the solution could readily be applied to
389: profile-profile comparison for fold recognition \cite{TomiiANDAkiyama2004}.
390: Such an application may also be pursued in the future.
391: 
392: \section*{Availability and Requirements}
393: 
394: \begin{description}
395: \item[Project name:] CRNPRED
396: \item[Project home page:] ~\\http://bioinformatics.org/crnpred/
397: \item[Operating system:] UNIX-like OS (including Linux and Mac OS X).
398: \item[Programming language:] C.
399: \item[Other requirements:] zsh, PSI-BLAST (blastpgp), the UniRef100 amino acid sequence database.
400: \item[License:] Public domain.
401: \item[Any restrictions to use by non-academics:] None.
402: \end{description}
403: 
404: \section*{List of Abbreviations Used}
405: CRN, critical random network; SS, secondary structure; CN, contact number; 
406: RWCO, residue-wise contact order; 1D, one-dimensional; 3D, three-dimensional.
407: 
408: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
409: \section*{Authors' contributions}
410: A. R. K. designed and implemented the method, carried out the benchmarks, and wrote
411: the first draft of the manuscript. A. R. K. and K. N. analyzed the results and 
412: improved the manuscript.
413:     
414: 
415: %%%%%%%%%%%%%%%%%%%%%%%%%%%
416: \section*{Acknowledgements}
417: We thank Yasumasa Shigemoto for helping construct the CRNPRED web interface.
418: This work was supported in part by the MEXT, Japan.
419: 
420: 
421:  
422: %\bibliographystyle{bmc_article}  % Style BST file
423: %  \bibliography{refs,mypaper} 
424: %% BioMed_Central_Bib_Style_v1.01
425: 
426: \begin{thebibliography}{10}
427: \providecommand{\url}[1]{[#1]}
428: \providecommand{\urlprefix}{}
429: 
430: \bibitem{PortoETAL2004}
431: Porto M, Bastolla U, Roman HE, Vendruscolo M: \textbf{Reconstruction of protein
432:   structures from a vectorial representation}. \emph{Phys. Rev. Lett.} 2004,
433:   \textbf{92}:218101.
434: 
435: \bibitem{KinjoANDNishikawa2005}
436: Kinjo AR, Nishikawa K: \textbf{Recoverable one-dimensional encoding of protein
437:   three-dimensional structures}. \emph{Bioinformatics} 2005,
438:   \textbf{21}:2167--2170. [Doi:10.1093/bioinformatics/bti330].
439: 
440: \bibitem{Rost2003}
441: Rost B: \textbf{Prediction in {1D}: secondary structure, membrane helices, and
442:   accessibility}. In \emph{Structural Bioinformatics}. Edited by Bourne PE,
443:   Weissig H, Hoboken, U.S.A.: Wiley-Liss, Inc. 2003:559--587.
444: 
445: \bibitem{Jones1999}
446: Jones DT: \textbf{Protein secondary structure prediction based on
447:   position-specific scoring matrices}. \emph{J. Mol. Biol.} 1999,
448:   \textbf{292}:195--202.
449: 
450: \bibitem{PollastriANDMcLysaght2005}
451: Pollastri G, {McLysaght} A: \textbf{Porter: a new, accurate server for protein
452:   secondary structure prediction}. \emph{Bioinformatics} 2005,
453:   \textbf{21}:1719--1720.
454: 
455: \bibitem{NishikawaANDOoi1980}
456: Nishikawa K, Ooi T: \textbf{Prediction of the surface-interior diagram of
457:   globular proteins by an empirical method}. \emph{Int. J. Peptide Protein
458:   Res.} 1980, \textbf{16}:19--32.
459: 
460: \bibitem{NishikawaANDOoi1986}
461: Nishikawa K, Ooi T: \textbf{Radial locations of amino acid residues in a
462:   globular protein: Correlation with the sequence}. \emph{J. Biochem.} 1986,
463:   \textbf{100}:1043--1047.
464: 
465: \bibitem{KinjoETAL2005}
466: Kinjo AR, Horimoto K, Nishikawa K: \textbf{Predicting absolute contact numbers
467:   of native protein structure from amino acid sequence}. \emph{Proteins} 2005,
468:   \textbf{58}:158--165. [Doi:10.1002/prot.20300].
469: 
470: \bibitem{Yuan2005}
471: Yuan Z: \textbf{Better prediction of protein contact number using a support
472:   vector regression analysis of amino acid sequence}. \emph{BMC Bioinformatics}
473:   2005, \textbf{6}:248.
474: 
475: \bibitem{KinjoANDNishikawa2005c}
476: Kinjo AR, Nishikawa K: \textbf{Predicting secondary structures, contact
477:   numbers, and residue-wise contact orders of native protein structure from
478:   amino acid sequence using critical random networks}. \emph{BIOPHYSICS} 2005,
479:   \textbf{1}:67--74. [Doi:10.2142/biophysics.1.67].
480: 
481: \bibitem{DSSP}
482: Kabsch W, Sander C: \textbf{Dictionary of Protein Secondary Structure: Pattern
483:   recognition of hydrogen bonded and geometrical features}. \emph{Biopolymers}
484:   1983, \textbf{22}:2577--2637.
485: 
486: \bibitem{CrooksANDBrenner2004}
487: Crooks GE, Brenner SE: \textbf{Protein secondary structure: entropy,
488:   correlations and prediction}. \emph{Bioinformatics} 2004,
489:   \textbf{20}:1603--1611.
490: 
491: \bibitem{BaldiETAL1999}
492: Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G: \textbf{Exploiting the past
493:   and the future in protein secondary structure prediction}.
494:   \emph{Bioinformatics} 1999, \textbf{15}:937--946.
495: 
496: \bibitem{ChenANDChaudhari2006}
497: Chen J, Chaudhari NS: \textbf{Bidirectional segmented-memory recurrent neural
498:   network for protein secondary structure prediction}. \emph{Soft Computing}
499:   2006, \textbf{10}:315--324.
500: 
501: \bibitem{AltschulETAL1997}
502: Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ:
503:   \textbf{Gapped Blast and {PSI}-Blast: A new generation of protein database
504:   search programs}. \emph{Nucleic Acids Res.} 1997, \textbf{25}:3389--3402.
505: 
506: \bibitem{PollastriETAL2002}
507: Pollastri G, Baldi P, Fariselli P, Casadio R: \textbf{Prediction of
508:   coordination number and relative solvent accessibility in proteins}.
509:   \emph{Proteins} 2002, \textbf{47}:142--153.
510: 
511: \bibitem{SCOP}
512: Murzin AG, Brenner SE, Hubbard T, Chothia C: \textbf{{SCOP}: A structural
513:   classification of proteins database for the investigation of sequences and
514:   structures}. \emph{J. Mol. Biol.} 1995, \textbf{247}:536--540.
515: 
516: \bibitem{UniProt}
517: Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E,
518:   Huang H, Lopez R, Magrane M, Martin MJ, Natale D, {O'Donovan} C, Redaschi N,
519:   Yeh LS: \textbf{The universal protein resource ({UniProt})}. \emph{Nucleic
520:   Acids Res.} 2005, \textbf{33}:D154--D159.
521: 
522: \bibitem{PrzybylskiANDRost2002}
523: Przybylski D, Rost B: \textbf{Alignments grow, secondary structure prediction
524:   improves}. \emph{Proteins} 2002, \textbf{46}:197--205.
525: 
526: \bibitem{PSIPRED}
527: McGuffin LJ, Bryson K, Jones DT: \textbf{The PSIPRED protein structure
528:   prediction server}. \emph{Bioinformatics} 2000, \textbf{16}:404--405.
529: 
530: \bibitem{HSSP}
531: Sander C, Schneider R: \textbf{Database of homology-derived protein
532:   structures}. \emph{Proteins} 1991, \textbf{9}:56--68.
533: 
534: \bibitem{KinjoANDNishikawa2004}
535: Kinjo AR, Nishikawa K: \textbf{Eigenvalue analysis of amino acid substitution
536:   matrices reveals a sharp transition of the mode of sequence conservation in
537:   proteins}. \emph{Bioinformatics} 2004, \textbf{20}:2504--2508.
538: 
539: \bibitem{BastollaETAL2005}
540: Bastolla U, Porto M, Roman HE, Vendruscolo M: \textbf{Principal eigenvector of
541:   contact matrices and hydrophobicity profiles in proteins}. \emph{Proteins}
542:   2005, \textbf{58}:22--30.
543: 
544: \bibitem{MinezakiETAL2006}
545: Minezaki Y, Homma K, Kinjo AR, Nishikawa K: \textbf{Human transcription factors
546:   contain a high fraction of intrinsically disordered regions essential for
547:   transcriptional regulation}. \emph{J. Mol. Biol.} 2006, in press.
548: 
549: \bibitem{TomiiANDAkiyama2004}
550: Tomii K, Akiyama Y: \textbf{{FORTE}: a profile-profile comparison tool for
551:   protein fold recognition}. \emph{Bioinformatics} 2004, \textbf{20}:594--595.
552: 
553: \bibitem{SOV99}
554: Zemla A, Venclovas C, Fidelis K, Rost B: \textbf{A modified definition of Sov,
555:   a segment-based measure for protein secondary structure prediction
556:   assessment}. \emph{Proteins} 1999, \textbf{34}:220--223.
557: 
558: \end{thebibliography}
559: 
560: \newcommand{\BMCxmlcomment}[1]{}
561: 
562: \BMCxmlcomment{
563: 
564: <refgrp>
565: 
566: <bibl id="B1">
567:   <title><p>Reconstruction of protein structures from a vectorial
568:   representation</p></title>
569:   <aug>
570:     <au><snm>Porto</snm><fnm>M.</fnm></au>
571:     <au><snm>Bastolla</snm><fnm>U.</fnm></au>
572:     <au><snm>Roman</snm><fnm>H. E.</fnm></au>
573:     <au><snm>Vendruscolo</snm><fnm>M.</fnm></au>
574:   </aug>
575:   <source>Phys. Rev. Lett.</source>
576:   <pubdate>2004</pubdate>
577:   <volume>92</volume>
578:   <fpage>218101</fpage>
579: </bibl>
580: 
581: <bibl id="B2">
582:   <title><p>Recoverable one-dimensional encoding of protein three-dimensional
583:   structures</p></title>
584:   <aug>
585:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
586:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
587:   </aug>
588:   <source>Bioinformatics</source>
589:   <pubdate>2005</pubdate>
590:   <volume>21</volume>
591:   <fpage>2167</fpage>
592:   <lpage>2170</lpage>
593:   <note>doi:10.1093/bioinformatics/bti330</note>
594: </bibl>
595: 
596: <bibl id="B3">
597:   <title><p>Prediction in {1D}: secondary structure, membrane helices, and
598:   accessibility</p></title>
599:   <aug>
600:     <au><snm>Rost</snm><fnm>B.</fnm></au>
601:   </aug>
602:   <source>Structural Bioinformatics</source>
603:   <publisher>Hoboken, U.S.A.: Wiley-Liss, Inc.</publisher>
604:   <editor>Bourne, P. E. and Weissig, H.</editor>
605:   <section><title><p>28</p></title></section>
606:   <pubdate>2003</pubdate>
607:   <fpage>559</fpage>
608:   <lpage>587</lpage>
609: </bibl>
610: 
611: <bibl id="B4">
612:   <title><p>Protein secondary structure prediction based on position-specific
613:   scoring matrices</p></title>
614:   <aug>
615:     <au><snm>Jones</snm><fnm>D. T.</fnm></au>
616:   </aug>
617:   <source>J. Mol. Biol.</source>
618:   <pubdate>1999</pubdate>
619:   <volume>292</volume>
620:   <fpage>195</fpage>
621:   <lpage>202</lpage>
622: </bibl>
623: 
624: <bibl id="B5">
625:   <title><p>Porter: a new, accurate server for protein secondary structure
626:   prediction</p></title>
627:   <aug>
628:     <au><snm>Pollastri</snm><fnm>G.</fnm></au>
629:     <au><snm>{McLysaght}</snm><fnm>A.</fnm></au>
630:   </aug>
631:   <source>Bioinformatics</source>
632:   <pubdate>2005</pubdate>
633:   <volume>21</volume>
634:   <fpage>1719</fpage>
  <lpage>1720</lpage>
636: </bibl>
637: 
638: <bibl id="B6">
639:   <title><p>Prediction of the surface-interior diagram of globular proteins by
640:   an empirical method</p></title>
641:   <aug>
642:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
643:     <au><snm>Ooi</snm><fnm>T.</fnm></au>
644:   </aug>
645:   <source>Int. J. Peptide Protein Res.</source>
646:   <pubdate>1980</pubdate>
647:   <volume>16</volume>
648:   <fpage>19</fpage>
649:   <lpage>32</lpage>
650: </bibl>
651: 
652: <bibl id="B7">
653:   <title><p>Radial locations of amino acid residues in a globular protein:
654:   Correlation with the sequence</p></title>
655:   <aug>
656:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
657:     <au><snm>Ooi</snm><fnm>T.</fnm></au>
658:   </aug>
659:   <source>J. Biochem.</source>
660:   <pubdate>1986</pubdate>
661:   <volume>100</volume>
662:   <fpage>1043</fpage>
663:   <lpage>1047</lpage>
664: </bibl>
665: 
666: <bibl id="B8">
667:   <title><p>Predicting absolute contact numbers of native protein structure
668:   from amino acid sequence</p></title>
669:   <aug>
670:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
671:     <au><snm>Horimoto</snm><fnm>K.</fnm></au>
672:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
673:   </aug>
674:   <source>Proteins</source>
675:   <pubdate>2005</pubdate>
676:   <volume>58</volume>
677:   <fpage>158</fpage>
678:   <lpage>165</lpage>
679:   <note>doi:10.1002/prot.20300</note>
680: </bibl>
681: 
682: <bibl id="B9">
683:   <title><p>Better prediction of protein contact number using a support vector
684:   regression analysis of amino acid sequence</p></title>
685:   <aug>
686:     <au><snm>Yuan</snm><fnm>Z.</fnm></au>
687:   </aug>
688:   <source>BMC Bioinformatics</source>
689:   <pubdate>2005</pubdate>
690:   <volume>6</volume>
691:   <fpage>248</fpage>
692: </bibl>
693: 
694: <bibl id="B10">
695:   <title><p>Predicting secondary structures, contact numbers, and residue-wise
696:   contact orders of native protein structure from amino acid sequence using
697:   critical random networks</p></title>
698:   <aug>
699:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
700:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
701:   </aug>
702:   <source>BIOPHYSICS</source>
703:   <pubdate>2005</pubdate>
704:   <volume>1</volume>
705:   <fpage>67</fpage>
706:   <lpage>74</lpage>
707:   <note>doi:10.2142/biophysics.1.67</note>
708: </bibl>
709: 
710: <bibl id="B11">
711:   <title><p>Dictionary of Protein Secondary Structure: Pattern recognition of
712:   hydrogen bonded and geometrical features</p></title>
713:   <aug>
714:     <au><snm>Kabsch</snm><fnm>W.</fnm></au>
715:     <au><snm>Sander</snm><fnm>C.</fnm></au>
716:   </aug>
717:   <source>Biopolymers</source>
718:   <pubdate>1983</pubdate>
719:   <volume>22</volume>
720:   <fpage>2577</fpage>
721:   <lpage>2637</lpage>
722: </bibl>
723: 
724: <bibl id="B12">
725:   <title><p>Protein secondary structure: entropy, correlations and
726:   prediction</p></title>
727:   <aug>
728:     <au><snm>Crooks</snm><fnm>G. E.</fnm></au>
729:     <au><snm>Brenner</snm><fnm>S. E.</fnm></au>
730:   </aug>
731:   <source>Bioinformatics</source>
732:   <pubdate>2004</pubdate>
733:   <volume>20</volume>
734:   <fpage>1603</fpage>
735:   <lpage>1611</lpage>
736: </bibl>
737: 
738: <bibl id="B13">
739:   <title><p>Exploiting the past and the future in protein secondary structure
740:   prediction</p></title>
741:   <aug>
742:     <au><snm>Baldi</snm><fnm>P.</fnm></au>
743:     <au><snm>Brunak</snm><fnm>S.</fnm></au>
744:     <au><snm>Frasconi</snm><fnm>P.</fnm></au>
745:     <au><snm>Soda</snm><fnm>G.</fnm></au>
746:     <au><snm>Pollastri</snm><fnm>G.</fnm></au>
747:   </aug>
748:   <source>Bioinformatics</source>
749:   <pubdate>1999</pubdate>
750:   <volume>15</volume>
751:   <fpage>937</fpage>
752:   <lpage>946</lpage>
753: </bibl>
754: 
755: <bibl id="B14">
756:   <title><p>Bidirectional segmented-memory recurrent neural network for protein
757:   secondary structure prediction</p></title>
758:   <aug>
759:     <au><snm>Chen</snm><fnm>J.</fnm></au>
760:     <au><snm>Chaudhari</snm><fnm>N. S.</fnm></au>
761:   </aug>
762:   <source>Soft Computing</source>
763:   <pubdate>2006</pubdate>
764:   <volume>10</volume>
765:   <fpage>315</fpage>
766:   <lpage>324</lpage>
767: </bibl>
768: 
769: <bibl id="B15">
770:   <title><p>Gapped Blast and {PSI}-Blast: A new generation of protein database
771:   search programs</p></title>
772:   <aug>
773:     <au><snm>Altschul</snm><fnm>S. F.</fnm></au>
774:     <au><snm>Madden</snm><fnm>T. L.</fnm></au>
775:     <au><snm>Schaffer</snm><fnm>A. A.</fnm></au>
776:     <au><snm>Zhang</snm><fnm>J.</fnm></au>
777:     <au><snm>Zhang</snm><fnm>Z.</fnm></au>
778:     <au><snm>Miller</snm><fnm>W.</fnm></au>
    <au><snm>Lipman</snm><fnm>D. J.</fnm></au>
780:   </aug>
781:   <source>Nucleic Acids Res.</source>
782:   <pubdate>1997</pubdate>
783:   <volume>25</volume>
784:   <fpage>3389</fpage>
785:   <lpage>3402</lpage>
786: </bibl>
787: 
788: <bibl id="B16">
789:   <title><p>Prediction of coordination number and relative solvent
790:   accessibility in proteins</p></title>
791:   <aug>
792:     <au><snm>Pollastri</snm><fnm>G.</fnm></au>
793:     <au><snm>Baldi</snm><fnm>P.</fnm></au>
794:     <au><snm>Fariselli</snm><fnm>P.</fnm></au>
795:     <au><snm>Casadio</snm><fnm>R.</fnm></au>
796:   </aug>
797:   <source>Proteins</source>
798:   <pubdate>2002</pubdate>
799:   <volume>47</volume>
800:   <fpage>142</fpage>
801:   <lpage>153</lpage>
802: </bibl>
803: 
804: <bibl id="B17">
805:   <title><p>{SCOP}: A structural classification of proteins database for the
806:   investigation of sequences and structures</p></title>
807:   <aug>
808:     <au><snm>Murzin</snm><fnm>A. G.</fnm></au>
809:     <au><snm>Brenner</snm><fnm>S. E.</fnm></au>
810:     <au><snm>Hubbard</snm><fnm>T.</fnm></au>
811:     <au><snm>Chothia</snm><fnm>C.</fnm></au>
812:   </aug>
813:   <source>J. Mol. Biol.</source>
814:   <pubdate>1995</pubdate>
815:   <volume>247</volume>
816:   <fpage>536</fpage>
817:   <lpage>540</lpage>
818: </bibl>
819: 
820: <bibl id="B18">
821:   <title><p>The universal protein resource ({UniProt})</p></title>
822:   <aug>
823:     <au><snm>Bairoch</snm><fnm>A.</fnm></au>
824:     <au><snm>Apweiler</snm><fnm>R.</fnm></au>
825:     <au><snm>Wu</snm><fnm>C. H.</fnm></au>
826:     <au><snm>Barker</snm><fnm>W. C.</fnm></au>
827:     <au><snm>Boeckmann</snm><fnm>B.</fnm></au>
828:     <au><snm>Ferro</snm><fnm>S.</fnm></au>
829:     <au><snm>Gasteiger</snm><fnm>E.</fnm></au>
830:     <au><snm>Huang</snm><fnm>H.</fnm></au>
831:     <au><snm>Lopez</snm><fnm>R.</fnm></au>
832:     <au><snm>Magrane</snm><fnm>M.</fnm></au>
833:     <au><snm>Martin</snm><fnm>M. J.</fnm></au>
    <au><snm>Natale</snm><fnm>D. A.</fnm></au>
835:     <au><snm>{O'Donovan}</snm><fnm>C.</fnm></au>
836:     <au><snm>Redaschi</snm><fnm>N.</fnm></au>
837:     <au><snm>Yeh</snm><fnm>L. S.</fnm></au>
838:   </aug>
839:   <source>Nucleic Acids Res.</source>
840:   <pubdate>2005</pubdate>
841:   <volume>33</volume>
842:   <fpage>D154</fpage>
843:   <lpage>D159</lpage>
844: </bibl>
845: 
846: <bibl id="B19">
847:   <title><p>Alignments grow, secondary structure prediction
848:   improves</p></title>
849:   <aug>
850:     <au><snm>Przybylski</snm><fnm>D.</fnm></au>
851:     <au><snm>Rost</snm><fnm>B.</fnm></au>
852:   </aug>
853:   <source>Proteins</source>
854:   <pubdate>2002</pubdate>
855:   <volume>46</volume>
856:   <fpage>197</fpage>
857:   <lpage>205</lpage>
858: </bibl>
859: 
860: <bibl id="B20">
861:   <title><p>The PSIPRED protein structure prediction server</p></title>
862:   <aug>
863:     <au><snm>McGuffin</snm><fnm>L. J.</fnm></au>
864:     <au><snm>Bryson</snm><fnm>K.</fnm></au>
865:     <au><snm>Jones</snm><fnm>D. T.</fnm></au>
866:   </aug>
867:   <source>Bioinformatics</source>
868:   <pubdate>2000</pubdate>
869:   <volume>16</volume>
870:   <fpage>404</fpage>
871:   <lpage>405</lpage>
872: </bibl>
873: 
874: <bibl id="B21">
875:   <title><p>Database of homology-derived protein structures</p></title>
876:   <aug>
877:     <au><snm>Sander</snm><fnm>C.</fnm></au>
878:     <au><snm>Schneider</snm><fnm>R.</fnm></au>
879:   </aug>
880:   <source>Proteins</source>
881:   <pubdate>1991</pubdate>
882:   <volume>9</volume>
883:   <fpage>56</fpage>
884:   <lpage>68</lpage>
885: </bibl>
886: 
887: <bibl id="B22">
888:   <title><p>Eigenvalue analysis of amino acid substitution matrices reveals a
889:   sharp transition of the mode of sequence conservation in proteins</p></title>
890:   <aug>
891:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
892:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
893:   </aug>
894:   <source>Bioinformatics</source>
895:   <pubdate>2004</pubdate>
896:   <volume>20</volume>
897:   <fpage>2504</fpage>
898:   <lpage>2508</lpage>
899: </bibl>
900: 
901: <bibl id="B23">
902:   <title><p>Principal eigenvector of contact matrices and hydrophobicity
903:   profiles in proteins</p></title>
904:   <aug>
905:     <au><snm>Bastolla</snm><fnm>U.</fnm></au>
906:     <au><snm>Porto</snm><fnm>M.</fnm></au>
907:     <au><snm>Roman</snm><fnm>H. E.</fnm></au>
908:     <au><snm>Vendruscolo</snm><fnm>M.</fnm></au>
909:   </aug>
910:   <source>Proteins</source>
911:   <pubdate>2005</pubdate>
912:   <volume>58</volume>
913:   <fpage>22</fpage>
914:   <lpage>30</lpage>
915: </bibl>
916: 
917: <bibl id="B24">
918:   <title><p>Human transcription factors contain a high fraction of
919:   intrinsically disordered regions essential for transcriptional
920:   regulation</p></title>
921:   <aug>
922:     <au><snm>Minezaki</snm><fnm>Y.</fnm></au>
923:     <au><snm>Homma</snm><fnm>K.</fnm></au>
924:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>
925:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>
926:   </aug>
927:   <source>J. Mol. Biol.</source>
928:   <pubdate>2006</pubdate>
929:   <inpress />
930: </bibl>
931: 
932: <bibl id="B25">
933:   <title><p>{FORTE}: a profile-profile comparison tool for protein fold
934:   recognition</p></title>
935:   <aug>
936:     <au><snm>Tomii</snm><fnm>K.</fnm></au>
937:     <au><snm>Akiyama</snm><fnm>Y.</fnm></au>
938:   </aug>
939:   <source>Bioinformatics</source>
940:   <pubdate>2004</pubdate>
941:   <volume>20</volume>
942:   <fpage>594</fpage>
943:   <lpage>595</lpage>
944: </bibl>
945: 
946: <bibl id="B26">
947:   <title><p>A modified definition of Sov, a segment-based measure for protein
948:   secondary structure prediction assessment</p></title>
949:   <aug>
    <au><snm>Zemla</snm><fnm>A.</fnm></au>
951:     <au><snm>Venclovas</snm><fnm>C.</fnm></au>
952:     <au><snm>Fidelis</snm><fnm>K.</fnm></au>
953:     <au><snm>Rost</snm><fnm>B.</fnm></au>
954:   </aug>
955:   <source>Proteins</source>
956:   <pubdate>1999</pubdate>
957:   <volume>34</volume>
958:   <fpage>220</fpage>
959:   <lpage>223</lpage>
960: </bibl>
961: 
962: </refgrp>
963: } % end of \BMCxmlcomment
964: 
965: \newpage
966: \section*{Figures}
967:   \subsection*{Figure 1}
Average accuracy measures as a function of the minimum number of homologs found by PSI-BLAST. From top to bottom: $Q_3$ of secondary structure predictions, $Cor$ of contact number predictions, and $Cor$ of residue-wise contact order predictions.
969: 
970: \includegraphics[width=6cm]{fig1.eps}
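The averaging behind Figure 1 can be sketched as follows. This is a minimal sketch, assuming that each point of a curve is the mean per-chain accuracy taken over all chains for which PSI-BLAST found at least the given number of homologs; the function name \texttt{accuracy\_vs\_min\_homologs} and its input format (one pair of homolog count and accuracy per chain) are hypothetical.

```python
from statistics import mean


def accuracy_vs_min_homologs(chains, thresholds):
    """Average a per-chain accuracy over chains with at least n homologs.

    chains: list of (n_homologs, accuracy) pairs, one per chain
            (hypothetical input format; accuracy may be Q3 or Cor).
    thresholds: minimum homolog counts n at which to evaluate the average.
    Returns {n: mean accuracy}; thresholds with no qualifying chain are skipped.
    """
    result = {}
    for n in thresholds:
        # Keep only chains whose PSI-BLAST search returned >= n homologs.
        selected = [acc for n_hom, acc in chains if n_hom >= n]
        if selected:
            result[n] = mean(selected)
    return result


chains = [(0, 70.0), (5, 78.0), (50, 82.0), (500, 85.0)]
print(accuracy_vs_min_homologs(chains, [0, 10, 100, 1000]))
# {0: 78.75, 10: 83.5, 100: 85.0}
```

Because the chain sets are nested (every chain counted at a high threshold is also counted at all lower ones), the curve summarizes how accuracy behaves as weakly covered chains are progressively excluded.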
971: 
972: \newpage
973: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
974: %%                               %%
975: %% Tables                        %%
976: %%                               %%
977: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
978: 
979: \section*{Tables}
980:   \subsection*{Table 1 - Summary of average prediction accuracies per chain (median in parentheses).}
981:     \par
982:     \mbox{
983: \begin{tabular}{lll}\hline
SS & $Q_3$ = 80.5\% (81.6\%) & $SOV$ = 80.0\% (81.1\%)\\
CN & $Cor$ = 0.746 (0.768) & $DevA$ = 0.686 (0.670) \\
RWCO & $Cor$ = 0.613 (0.646) & $DevA$ = 0.877 (0.812)\\\hline
987:   \end{tabular}
988:       }\\
989: 
SS, Secondary structure prediction: $Q_3$ is the percentage of correctly predicted residues; $SOV$ is the segment overlap measure~\cite{SOV99}.\\
CN, Contact number prediction: $Cor$ is Pearson's correlation coefficient between the predicted and native CNs; $DevA$ is the RMS error normalized by the standard deviation of the native CN \cite{KinjoETAL2005}.\\
992: RWCO, Residue-wise contact order prediction: $Cor$ and $DevA$ are defined as for
993: CN but calculated with predicted and native RWCOs.
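The per-chain measures of Table 1 can be illustrated with the following minimal Python sketch (standard library only). It assumes that each measure is computed for one chain at a time before averaging over chains, and that $DevA$ uses the population standard deviation of the native values (division by the chain length); the function names are hypothetical. $SOV$ is not reproduced here because its segment-based definition~\cite{SOV99} is considerably more involved.

```python
import math


def _check(pred, native):
    # Both sequences must describe the same chain, residue by residue.
    if len(pred) != len(native) or len(native) == 0:
        raise ValueError("pred and native must be non-empty and of equal length")


def q3(pred, native):
    """Percentage of residues whose predicted SS class (H, E or C) equals the native one."""
    _check(pred, native)
    correct = sum(p == n for p, n in zip(pred, native))
    return 100.0 * correct / len(native)


def pearson_cor(pred, native):
    """Pearson correlation coefficient between predicted and native values (e.g. CNs)."""
    _check(pred, native)
    n = len(native)
    mean_p, mean_n = sum(pred) / n, sum(native) / n
    cov = sum((p - mean_p) * (x - mean_n) for p, x in zip(pred, native))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_n = sum((x - mean_n) ** 2 for x in native)
    return cov / math.sqrt(var_p * var_n)  # undefined if either profile is constant


def dev_a(pred, native):
    """RMS error divided by the (population) standard deviation of the native values."""
    _check(pred, native)
    n = len(native)
    mean_n = sum(native) / n
    rmse = math.sqrt(sum((p - x) ** 2 for p, x in zip(pred, native)) / n)
    sd = math.sqrt(sum((x - mean_n) ** 2 for x in native) / n)
    return rmse / sd


print(q3("HHHEEC", "HHCEEC"))             # 5 of 6 residues correct
print(pearson_cor([1, 2, 3], [2, 4, 6]))  # 1.0
print(dev_a([4, 4, 4], [2, 4, 6]))        # 1.0
```

With this definition, a trivial predictor that always outputs the native mean of the chain gives $DevA = 1$, so values below 1, as in Table 1, indicate predictions that are better than this baseline.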
994: 
995: 
\subsection*{Table 2 - Summary of per-residue accuracies for SS predictions.}
997: \par
998: \mbox{
  \begin{tabular}{lrrr}\hline
1000: measure    & $H$ & $E$ & $C$ \\\hline
1001: $Q_s$      & 82.7 & 69.3 & 84.0 \\
1002: $Q_s^{pre}$ & 84.4 & 78.9 & 78.3\\
1003: $MC$       &  0.754 & 0.674 & 0.645 \\\hline
1004:   \end{tabular}
1005: }\\
1006: 
$Q_s$: The number of correctly predicted residues of SS class $s$ ($s = H, E, C$)
 divided by the number of residues in that class in native structures, in percent.\\
$Q_s^{pre}$: The number of correctly predicted residues of SS class $s$
 divided by the number of residues predicted as that class, in percent.\\
$MC$: Matthews' correlation coefficient for class $s$ versus all other classes.
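The three per-class measures of Table 2 can be computed from a two-by-two contingency table for each class. The sketch below is a minimal illustration, assuming that, as the heading suggests, counts are pooled over all residues of the test set; the function name \texttt{per\_class\_measures} is hypothetical, and returning $MC = 0$ when its denominator vanishes is a convention chosen here, not taken from the text. In this formulation $Q_s$ is the recall and $Q_s^{pre}$ the precision for class $s$.

```python
import math


def per_class_measures(pred, native, s):
    """Return (Q_s, Q_s^pre, MC) for SS class s from per-residue labels.

    pred, native: equal-length strings over the alphabet H, E, C.
    Q_s and Q_s^pre are percentages; MC is Matthews' correlation coefficient
    for the binary problem "class s" versus "not class s".
    """
    if len(pred) != len(native):
        raise ValueError("pred and native must have the same length")
    pairs = list(zip(pred, native))
    tp = sum(p == s and n == s for p, n in pairs)  # predicted s, native s
    fp = sum(p == s and n != s for p, n in pairs)  # predicted s, native other
    fn = sum(p != s and n == s for p, n in pairs)  # predicted other, native s
    tn = len(pairs) - tp - fp - fn
    q_s = 100.0 * tp / (tp + fn) if tp + fn else float("nan")    # recall
    q_pre = 100.0 * tp / (tp + fp) if tp + fp else float("nan")  # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention
    return q_s, q_pre, mc


print(per_class_measures("HHHEECCC", "HHEEECCH", "H"))
# Q_H = 66.67, Q_H^pre = 66.67, MC = 7/15 (about 0.467)
```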
1012: \end{document}
1013: