0604:q-bio0604013/kinjo.tex

1: \documentclass[10pt]{article}

2:

3:

4: \usepackage{graphicx}

5: \usepackage{cite} % Make references as [1-4], not [1,2,3,4]

6:

7: \setlength{\topmargin}{0.0cm}

8: \setlength{\textheight}{21.5cm}

9: \setlength{\oddsidemargin}{0cm}

10: \setlength{\textwidth}{16.5cm}

11: \setlength{\columnsep}{0.6cm}

12:

13: \begin{document}

14:


20:

21: \title{CRNPRED: Highly Accurate Prediction of One-dimensional Protein Structures by Large-scale Critical Random Networks}

22:


35:

36:

37: \author{Akira R Kinjo$^{1,2}$ and Ken Nishikawa$^{1,2}$\\

38:     $^1$Center for Information Biology and DNA Data Bank of Japan,\\

39:         National Institute of Genetics, Mishima, 411-8540, Japan\\

40:     $^2$Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), \\Mishima 411-8540, Japan

41:       }

42:

43: \maketitle

44:

45: \begin{abstract}

46: \textbf{Background:}

47: One-dimensional protein structures such as secondary structures or contact

48: numbers are useful for three-dimensional structure prediction and helpful for

49: intuitive understanding of the sequence-structure relationship.

50: Accurate prediction methods will serve as a basis for these and other purposes.\\

\textbf{Results:} We implemented a program, CRNPRED, which predicts secondary

52: structures, contact numbers and residue-wise contact orders. This program is

53: based on a novel machine learning scheme called critical random

54: networks. Unlike most conventional one-dimensional structure prediction

55: methods which are based on local windows of an amino acid sequence, CRNPRED

56: takes into account the whole sequence. CRNPRED achieves, on average per chain,

57: $Q_3$ = 81\% for secondary structure prediction, and correlation coefficients

58: of 0.75 and 0.61 for contact number and residue-wise contact order

59: predictions, respectively.\\

60: \textbf{Conclusion:} CRNPRED will be a useful tool for computational as

61: well as experimental biologists who need accurate one-dimensional protein

62: structure predictions.

63: \end{abstract}

64:

65:

66:

67: \section*{Background}

68:

69: One-dimensional (1D) structures of a protein are residue-wise

70: quantities or symbols onto which some features of the native

71: three-dimensional (3D) structure are projected.

72: 1D structures are of interest for several reasons. For example, predicted

secondary structures, a kind of 1D structure, are often used to limit the

74: conformational space to be searched in 3D structure prediction.

75: Furthermore, it has recently been shown that certain sets of the native

76: (as opposed to predicted) 1D structures of

77: a protein contain sufficient information to recover the native 3D

78: structure~\cite{PortoETAL2004,KinjoANDNishikawa2005}. These 1D structures are

79: either the principal eigenvector of the contact map~\cite{PortoETAL2004} or a set of secondary structures (SS), contact numbers (CN) and residue-wise contact orders (RWCO)~\cite{KinjoANDNishikawa2005}.

80: Therefore, it is possible, at least in principle, to predict the native 3D

81: structure by first predicting the 1D structures, and then by constructing

82: the 3D structure from these 1D structures. 1D structures are not only useful

83: for 3D structure predictions, but also helpful for intuitive understanding

84: of the correspondence between the protein structure and its amino acid sequence

85: due to the residue-wise characteristics of 1D structures. Therefore, accurate

86: prediction of 1D protein structures is of fundamental biological interest.

87:

88: Secondary structure prediction has a long history \cite{Rost2003}.

89: Almost all the modern predictors are based on position-specific scoring

90: matrices (PSSM) and some kind of machine learning techniques such as neural

91: networks or support vector machines. Currently the best predictors achieve

92: $Q_3$ of 77--79\% \cite{Jones1999,PollastriANDMcLysaght2005}.

The study of contact number prediction also started a long time

94: ago \cite{NishikawaANDOoi1980, NishikawaANDOoi1986}, but further

95: improvements were made only recently \cite{KinjoETAL2005, Yuan2005, KinjoANDNishikawa2005c}. These recent methods are based on the ideas developed in SS

96: predictions (i.e., PSSM and machine learning), and achieve a correlation

97: coefficient of 0.68--0.73.

98:

99: Recently, we have developed a new method for accurately predicting SS, CN,

100: and RWCO based on a novel machine learning scheme,

critical random networks (CRN)~\cite{KinjoANDNishikawa2005c}.

102: In this paper, we briefly describe the formulation of the method, and recent

103: improvements leading to even better predictions.

104: The computer program for SS, CN, and RWCO prediction named CRNPRED has been

105: developed for the convenience of the general user, and a web interface and

106: source code are made available online.

107:

108: \section*{Implementation}

109:

110: \subsection*{Definition of 1D structures}

111: \textit{Secondary structures (SS):}

112: Secondary structures were defined by the DSSP program \cite{DSSP}.

113: For three-state SS prediction, the simple encoding scheme (the so-called CK

114: mapping) was employed \cite{CrooksANDBrenner2004}.

115: That is, $\alpha$ helices ($H$), $\beta$ strands ($E$), and other structures

116: (``coils'') defined by DSSP were encoded as $H$, $E$, and $C$, respectively.

117: Note that we do not use the CASP-style conversion scheme (the so-called EHL

118: mapping) in which DSSP's $H$, $G$ ($3_{10}$ helix) and $I$ ($\pi$ helix) are encoded as $H$, and DSSP's $E$ and $B$ ($\beta$ bridge) as $E$.

119: We believe the CK mapping is more natural and useful for 3D structure

120: predictions (e.g., geometrical restraints should be different between an

121: $\alpha$ helix and a $3_{10}$ helix).

122: For SS prediction, we introduce feature variables $(y_i^H, y_i^E, y_i^C)$

to represent each type of secondary structure at the $i$-th residue position,

124: so that $H$ is represented as $(1,-1,-1)$, $E$ as $(-1,1,-1)$, and $C$ as

125: $(-1,-1,1)$.
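The encoding described above can be sketched as follows. This is a minimal illustration, not the CRNPRED source: the function names \texttt{ck\_map}, \texttt{ehl\_map} and \texttt{encode} and the example DSSP string are hypothetical, and the EHL mapping is included only to contrast it with the CK mapping used here.

```python
# Feature vectors (y^H, y^E, y^C) for the three classes, as given in the text.
FEATURES = {"H": (1, -1, -1), "E": (-1, 1, -1), "C": (-1, -1, 1)}

def ck_map(dssp):
    """CK mapping: keep DSSP's H and E, turn every other symbol into C."""
    return "".join(c if c in ("H", "E") else "C" for c in dssp)

def ehl_map(dssp):
    """EHL mapping, for contrast only: H, G, I -> H and E, B -> E, rest -> C."""
    table = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
    return "".join(table.get(c, "C") for c in dssp)

def encode(states):
    """Turn a three-state string into a list of feature triples."""
    return [FEATURES[s] for s in states]

dssp = "-HHHHGGGT-EEEB-S"  # hypothetical DSSP string, not taken from a real protein
print(ck_map(dssp))
print(ehl_map(dssp))
print(encode(ck_map(dssp))[:3])
```

The two mappings differ exactly on the $G$, $I$ and $B$ positions, which the CK mapping treats as coil.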

126:

127: \textit{Contact numbers (CN):}

128: Let $C_{i,j}$ represent the contact map of a protein. Usually, the contact

129: map is defined so that $C_{i,j} = 1$ if the $i$-th and $j$-th residues are in

130: contact by some definition, or $C_{i,j} = 0$, otherwise. As in our

131: previous study, we slightly modify the definition using a sigmoid function.

132: That is,

133: \begin{equation}

134:   C_{i,j} = 1/\{1+\exp[w(r_{i,j} - d)]\}

135: \end{equation}

136: where $r_{i,j}$ is the distance between $C_{\beta}$ ($C_{\alpha}$

137: for glycines) atoms of the $i$-th and $j$-th residues, $d = 12$\AA{} is a

138: cutoff distance, and $w$ is a sharpness parameter of the sigmoid function

139: which is set to 3 \cite{KinjoETAL2005,KinjoANDNishikawa2005}. The rather

140: generous cutoff length of 12\AA{} was shown to optimize the prediction

141: accuracy \cite{KinjoETAL2005}. The use of the sigmoid function enables us to

142: use the contact numbers in molecular dynamics

143: simulations \cite{KinjoANDNishikawa2005}.

144: Using the above definition of the contact map, the contact number of the

145: $i$-th residue of a protein is defined as

146: \begin{equation}

147:   n_i = \sum_{j:|i-j|>2}C_{i,j}. \label{eq:defcn}

148: \end{equation}

149: The feature variable $y_i$ for CN is defined as $y_i = n_i / \log L$ where

150: $L$ is the sequence length of a target protein. The normalization

151: factor $\log L$ is introduced because we have observed that the contact

152: number averaged over a protein chain is roughly proportional to $\log L$,

153: and thus division by this value removes the size-dependence of predicted

154: contact numbers.
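As a rough sketch of the definitions above, the following code computes the smooth contact value, the contact numbers $n_i$ and the feature variables $y_i = n_i/\log L$ from a list of coordinates. The function names and the toy coordinates are hypothetical; the natural logarithm is assumed because the text does not state the base, and the sigmoid is written in an equivalent $\tanh$ form only to avoid overflow at large distances.

```python
import math

def contact_value(r, d=12.0, w=3.0):
    """C = 1 / (1 + exp(w (r - d))), written as 0.5 (1 - tanh(w (r - d) / 2))."""
    return 0.5 * (1.0 - math.tanh(0.5 * w * (r - d)))

def contact_numbers(coords, d=12.0, w=3.0):
    """n_i = sum of C_ij over all j with |i - j| > 2."""
    n = []
    for i, p in enumerate(coords):
        total = 0.0
        for j, q in enumerate(coords):
            if abs(i - j) > 2:
                total += contact_value(math.dist(p, q), d, w)
        n.append(total)
    return n

def cn_features(coords):
    """y_i = n_i / log L, with L the number of residues (natural log assumed)."""
    L = len(coords)
    return [n_i / math.log(L) for n_i in contact_numbers(coords)]

# Hypothetical toy chain: six points on a straight line, 3.8 angstroms apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(6)]
print(contact_numbers(coords))
print(cn_features(coords))
```

With $w = 3$ the contact value falls from about 0.95 at 11\AA{} to 0.5 at 12\AA{} and to about 0.05 at 13\AA{}, so the sigmoid acts as a softened step at the cutoff.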

155:

156: \textit{Residue-wise contact orders (RWCO):}

157: RWCO was first introduced in \cite{KinjoANDNishikawa2005}.

158: This quantity measures the extent to which a residue makes long-range contacts

159: in a native protein structure.

160: Using the same notation as contact numbers,

161: the RWCO of the $i$-th residue in a protein structure is defined by

162: \begin{equation}

163:   o_i = \sum_{j:|i-j|>2}|i-j|C_{i,j}. \label{eq:defrwco}

164: \end{equation}

165: The feature variable $y_i$ for RWCO is defined as $y_i = o_i / L$ where

$L$ is the sequence length. For the same reason as for CN, the normalization

167: factor $L$ was introduced to remove the size-dependence of the predicted

168: RWCOs (the RWCO averaged over a protein chain is roughly proportional to the

169: chain length).
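The RWCO definition can be illustrated in the same way. The sketch below takes a contact map as a list of rows; the function names and the small binary contact map are hypothetical and serve only to make the weighting by sequence separation $|i-j|$ visible.

```python
def rwco(contact_map):
    """o_i = sum of |i - j| * C_ij over all j with |i - j| > 2."""
    L = len(contact_map)
    return [
        sum(abs(i - j) * contact_map[i][j] for j in range(L) if abs(i - j) > 2)
        for i in range(L)
    ]

def rwco_features(contact_map):
    """y_i = o_i / L, with L the sequence length."""
    L = len(contact_map)
    return [o_i / L for o_i in rwco(contact_map)]

# Hypothetical 6-residue contact map: residues 0 and 5 touch, as do 1 and 4.
L = 6
C = [[0.0] * L for _ in range(L)]
for i, j in [(0, 5), (1, 4)]:
    C[i][j] = C[j][i] = 1.0
print(rwco(C))
print(rwco_features(C))
```

Residues 0 and 1 each make a single contact, but the contact of residue 0 spans five positions and that of residue 1 only three, so residue 0 receives the larger RWCO.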

170:

171: \subsection*{Critical random networks}

Here we briefly describe the critical random network (CRN) method introduced
in \cite{KinjoANDNishikawa2005c}, to which the reader is referred for details.
Unlike most conventional methods for 1D structure prediction (except for
a few, such as bidirectional recurrent neural networks \cite{BaldiETAL1999,PollastriANDMcLysaght2005,ChenANDChaudhari2006}), the CRN method

176: takes the whole amino acid sequence into account. In the CRN method,

177: an $N$-dimensional state vector $\mathbf{x}_i$  is assigned to the $i$-th

178: residue of the target sequence (we use $N = 5000$ throughout this paper).

179: Neighboring state vectors along the sequence

180: are connected via a random $N\times N$ orthogonal matrix $W$. This matrix is

181: also block-diagonal with the size of blocks ranging uniformly randomly

182: between 2 and 50. The input to the CRN is the position-specific scoring matrix

183: (PSSM), $U = (\mathbf{u}_1, \cdots, \mathbf{u}_L)$

184: of the target sequence obtained by PSI-BLAST~\cite{AltschulETAL1997} ($L$ is the sequence length of the target protein).

We require that the state vectors satisfy the following equation of state:

186: \begin{equation}

187:   \label{eq:eos}

188:   \mathbf{x}_i = \tanh[\beta W (\mathbf{x}_{i-1} + \mathbf{x}_{i+1}) + \alpha V\mathbf{u}_i]

189: \end{equation}

190: for $i = 1, \cdots , L$ where $V$ is an $N\times 21$ random matrix

191: (the 21st component of $\mathbf{u}_i$ is always set to unity), and $\beta$ and $\alpha$ are scalar parameters. The fixed boundary condition is imposed ($\mathbf{x}_0 = \mathbf{x}_{L+1} = \mathbf{0}$). By setting $\beta = 0.5$,

192: the system of state vectors is made to be near a critical point in a certain

193: sense, and thus the range of site-site correlation is expected to be long

194: when $\alpha$ is sufficiently small but finite~\cite{KinjoANDNishikawa2005c}.

195: In this way, each state vector implicitly incorporates long-range correlations.

196: The 1D structure of the $i$-th residue is predicted as

197: a linear projection of a local window of the PSSM and the state vector obtained by solving Eq. \ref{eq:eos}:

198: \begin{equation}

199:   \label{eq:pred}

200:   y_i = \sum_{m=-M}^{M}\sum_{a=1}^{21}D_{m,a}u_{a,i+m} + \sum_{k=1}^{N}E_{k}x_{k,i}

201: \end{equation}

202: where $y_i$ is the predicted quantity, and $D_{m,a}$ and $E_k$ are the

203: regression parameters. In the first summation, each PSSM column is extended to

204: include the ``terminal'' residue.

205: Since Eq. \ref{eq:pred} is a simple linear equation once the equation of

206: state (Eq. \ref{eq:eos}) has been solved, learning the parameters $D_{m,a}$ and

207:  $E_{k}$ reduces to an ordinary linear regression problem.

208: For SS prediction, the triple $(y^{H}_i, y^{E}_i, y^{C}_i)$ is

209: calculated simultaneously, and the SS class is predicted as

210: $\mathrm{arg}\max_{s\in \{H, E, C\}}y^{s}_i$.  For the CN and RWCO prediction,

211: real values are predicted (2-state prediction is also made for CN using

212: the average CN for each residue type as the threshold for ``exposed''

213: or ``buried'' as in \cite{PollastriETAL2002}).

214: The half window size $M$ is set to 9 for SS and CN predictions, and to 26 for

215: RWCO.
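Once the state vectors are known, Eq. \ref{eq:pred} is a plain weighted sum. The sketch below evaluates it for one residue and picks the SS class by the largest score. All names are hypothetical, indices are 0-based, and the handling of the ``terminal'' residue (a 21st column entry set to 1 for window positions outside the chain and to 0 inside) is an assumption about the encoding rather than a detail confirmed by the text.

```python
def window_term(pssm, D, i, M):
    """First sum of the prediction equation for residue i (0-based).

    pssm[pos] holds the 20 profile scores of one position;
    D has 2M + 1 rows of 21 regression weights.
    """
    L = len(pssm)
    total = 0.0
    for m in range(-M, M + 1):
        pos = i + m
        if 0 <= pos < L:
            column = list(pssm[pos]) + [0.0]  # inside the chain: terminal flag off
        else:
            column = [0.0] * 20 + [1.0]       # assumed encoding of the "terminal" residue
        total += sum(d * u for d, u in zip(D[m + M], column))
    return total

def predict(pssm, x, D, E, i, M):
    """y_i: window term plus the projection of the state vector x[i] onto E."""
    return window_term(pssm, D, i, M) + sum(e * xk for e, xk in zip(E, x[i]))

def predict_ss(y_h, y_e, y_c):
    """Class with the largest score among H, E and C."""
    scores = {"H": y_h, "E": y_e, "C": y_c}
    return max(scores, key=scores.get)

# Hypothetical toy input: 3 residues, half window M = 1, 2-dimensional states.
pssm = [[1.0] + [0.0] * 19 for _ in range(3)]
D = [[2.0] * 21 for _ in range(3)]
x = [[1.0, -1.0] for _ in range(3)]
E = [0.5, 0.5]
print(predict(pssm, x, D, E, 0, 1))
print(predict_ss(0.2, -0.1, 0.9))
```

In practice $D_{m,a}$ and $E_k$ would come from the linear regression on the training set; the constant weights here are placeholders.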

216:

217: \subsection*{Ensemble prediction}

218: Since the CRN-based prediction is parametrized by the random matrices $W$

219: and $V$,

220: slightly different predictions are obtained for different pairs of $W$ and $V$.

221: We can improve the prediction by taking the average over an ensemble of

222: such different predictions. 20 CRN-based predictors were constructed using

223: 20 sets of different random matrices $W$ and $V$. CN and RWCO are predicted

224: as uniform averages of these 20 predictions.

225:

226: For SS prediction, we employ further training. Let $s_{i}^{t,n}$ be the

227: prediction results of the $n$-th predictor for 1D structure $t$

228: ($H$, $E$, $C$, CN, and RWCO) of the $i$-th residue.

229: The second stage SS prediction is made by the following linear scheme:

230: \begin{equation}

231:   \label{eq:ss2}

232:   y_{i}^{ss} = \sum_{n=1}^{20}\sum_{t}\sum_{m=-3}^{3}w_{n,t,m}s_{i+m}^{t,n}

233: \end{equation}

234: where $ss = H, E, C$, and $w_{n,t,m}$ is the weight obtained from a training

235: set. Finally, the feature variable for each SS class of the

236: $i$-th residue is obtained by $(y_{i-1}^{ss} + 2y_{i}^{ss} + y_{i+1}^{ss})/4$.

237: This last procedure was found particularly effective for improving the

238: segment overlap (SOV) measure.
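The uniform ensemble average used for CN and RWCO and the final three-point smoothing of the SS scores can be sketched as follows. The function names are hypothetical, and repeating the end values at the chain termini is an assumption, since the text does not say how $y_{0}^{ss}$ and $y_{L+1}^{ss}$ are treated.

```python
def ensemble_average(predictions):
    """Uniform average over predictors; predictions[n][i] is predictor n at residue i."""
    count = len(predictions)
    return [sum(values) / count for values in zip(*predictions)]

def smooth(y):
    """(y[i-1] + 2 y[i] + y[i+1]) / 4, with end values repeated at the termini."""
    padded = [y[0]] + list(y) + [y[-1]]
    return [
        (padded[k - 1] + 2 * padded[k] + padded[k + 1]) / 4
        for k in range(1, len(y) + 1)
    ]

print(ensemble_average([[1.0, 2.0], [3.0, 4.0]]))
print(smooth([0.0, 4.0, 0.0]))
```

The smoothing spreads an isolated peak over its neighbours, which tends to suppress single-residue segments and is in line with the reported benefit for the SOV measure.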

239:

240: \subsection*{Additional input}

241: Another improvement is the addition of the amino acid composition of

the target sequence to the predictor \cite{Yuan2005}.

243: The term $\sum_{a=1}^{20}F_af_a$ was added to Eq. \ref{eq:pred} where $F_a$

244: is a regression parameter, and $f_a$ is the fraction of the amino acid

245: type $a$.
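A minimal sketch of the composition term is given below. The alphabet ordering, the function names and the decision to ignore non-standard residue symbols are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residue types

def composition(sequence):
    """Fractions f_a of the 20 standard amino acids; other symbols are ignored."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for residue in sequence:
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def composition_term(sequence, F):
    """sum_a F_a f_a, the extra term added to the prediction equation."""
    return sum(F_a * f_a for F_a, f_a in zip(F, composition(sequence)))

print(composition("AACD")[:3])
print(composition_term("AACD", [1.0] * 20))
```

Because $f_a$ depends only on the whole sequence, this term shifts all residues of a chain by the same amount.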

246:

247: \subsection*{Training and test data set}

We carried out a 15-fold cross-validation test using exactly the same
procedure and data set as in the previous

250: study \cite{KinjoANDNishikawa2005c}. In the data set, there are 680 protein

251: domains, each of which represents a superfamily according to the SCOP

252: database (version 1.65) \cite{SCOP}. This data set was randomly divided so

253: that 630 domains were used for training and the remaining 50 domains for

254: testing, and the random division was repeated 15 times.

No pair of these domains belongs to the same superfamily, and hence they are

256: not expected to be homologous. Thus, the present benchmark is a very

257: stringent one.
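The repeated random division described above can be sketched as follows. The domain identifiers, the function name and the seed are hypothetical; the actual splits of the previous study are not reproduced here.

```python
import random

def random_splits(domains, n_test=50, n_repeats=15, seed=0):
    """Return n_repeats (training, test) pairs drawn by independent random shuffles."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        shuffled = list(domains)
        rng.shuffle(shuffled)
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits

domains = [f"domain_{k:03d}" for k in range(680)]  # hypothetical identifiers
splits = random_splits(domains)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each repeat is drawn independently, so a domain may appear in the test set of more than one repeat.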

258:

259: For obtaining PSSMs by running PSI-BLAST, we use the UniRef100

260: (version 6.8) amino acid sequence database \cite{UniProt} containing some

261: 3 million entries.

In addition, the number of iterations in PSI-BLAST homology searches was reduced
from the 10 used in the previous study to 3, which especially increased the
accuracy of SS predictions.

265: These results are consistent with the study of \cite{PrzybylskiANDRost2002}.

266:

267: \subsection*{Numerics}

268: One drawback of the CRN method is the computational time required for

269: numerically solving the equation of state (Eq. \ref{eq:eos}).

270: For that purpose, instead of the Gauss-Seidel-like

271: method previously used, we implemented a successive over-relaxation

272: method which was found to be much more efficient.

273:

274: Let $\nu$ denote the stage of iteration.

275: We set the initial value of the state vectors (with $\nu = 0$) as

276: \begin{equation}

277:   \mathbf{x}_{i}^{(0)} = \tanh [\alpha V \mathbf{u}_{i}].\label{eq:init_eos}

278: \end{equation}

279: Then, for $i = 1, \cdots , L$ (in increasing order of $i$), we update

280: the state vectors by

281: \begin{eqnarray}

  \mathbf{x}_{i}^{(2\nu+1)} \gets & \mathbf{x}_{i}^{(2\nu)} + \omega
\{\tanh [\beta W(\mathbf{x}_{i-1}^{(2\nu+1)}\nonumber\\
& +\mathbf{x}_{i+1}^{(2\nu)})
+ \alpha V \mathbf{u}_{i}] - \mathbf{x}_{i}^{(2\nu)}\}.

286: \label{eq:feos}

287: \end{eqnarray}

288: Next, we update them in the reverse order. That is, for $i = L, \cdots , 1$

289: (in decreasing order of $i$),

290: \begin{eqnarray}

  \mathbf{x}_{i}^{(2\nu+2)}  \gets & \mathbf{x}_{i}^{(2\nu+1)} + \omega
\{\tanh [\beta W(\mathbf{x}_{i-1}^{(2\nu+1)} \nonumber\\
& + \mathbf{x}_{i+1}^{(2\nu+2)})
+\alpha V \mathbf{u}_{i}] - \mathbf{x}_{i}^{(2\nu+1)}\}.

295: \label{eq:beos}

296: \end{eqnarray}

We then set $\nu \gets \nu + 1$, and iterate Eqs. (\ref{eq:feos}) and (\ref{eq:beos}) until $\{\mathbf{x}_{i}\}$ converges. An acceleration parameter of $\omega = 1.4$ was found to be effective.

298: The convergence criterion is

299: \begin{equation}

300: \sqrt{\sum_{i=1}^{L}||\mathbf{x}_{i}^{(2\nu+2)}-\mathbf{x}_{i}^{(2\nu+1)}||_{\mathbf{R}^{N}}^{2}/{NL}}<10^{-3}

301: \end{equation}

302: where $||\cdot||_{\mathbf{R}^{N}}$ denotes the Euclidean norm.

This criterion is much less stringent than that of the previous study ($10^{-7}$), but this

304: does not affect the prediction accuracy significantly.

305: Convergence is typically achieved within 10 to 12 iterations for one protein.
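The sweep scheme above can be sketched in a few lines of plain Python. This is a toy illustration under several assumptions, not the CRNPRED implementation: $W$ is built from random $2\times 2$ rotation blocks only (the paper uses block sizes between 2 and 50), the entries of $V$ and of the input columns are drawn uniformly from $[-1,1]$, the sizes are tiny ($N = 4$, $L = 6$), and the coupling $\beta$ of the equation of state (Eq. \ref{eq:eos}) is applied inside the $\tanh$ of each update. All function names are hypothetical.

```python
import math
import random

def matvec(A, v):
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def rotation_blocks(N, rng):
    """Orthogonal block-diagonal matrix made of random 2x2 rotations (N even)."""
    W = [[0.0] * N for _ in range(N)]
    for k in range(0, N, 2):
        t = rng.uniform(0.0, 2.0 * math.pi)
        W[k][k], W[k][k + 1] = math.cos(t), -math.sin(t)
        W[k + 1][k], W[k + 1][k + 1] = math.sin(t), math.cos(t)
    return W

def drive_terms(U, V, alpha):
    """alpha * V u_i for every residue."""
    return [[alpha * s for s in matvec(V, u)] for u in U]

def target(x, W, drive, i, beta):
    """tanh[beta W (x_{i-1} + x_{i+1}) + alpha V u_i]; x is padded with zero boundaries."""
    both = [a + b for a, b in zip(x[i - 1], x[i + 1])]
    return [math.tanh(beta * c + d) for c, d in zip(matvec(W, both), drive[i - 1])]

def solve_state(U, W, V, alpha, beta=0.5, omega=1.4, tol=1e-3, max_iter=1000):
    """Alternate forward and backward over-relaxed sweeps until the change is small."""
    N, L = len(W), len(U)
    drive = drive_terms(U, V, alpha)
    x = [[0.0] * N] + [[math.tanh(s) for s in d] for d in drive] + [[0.0] * N]
    for iteration in range(1, max_iter + 1):
        for order in (range(1, L + 1), range(L, 0, -1)):  # forward, then backward
            change = 0.0
            for i in order:
                f = target(x, W, drive, i, beta)
                new = [xi + omega * (fi - xi) for xi, fi in zip(x[i], f)]
                change += sum((a - b) ** 2 for a, b in zip(new, x[i]))
                x[i] = new
        if math.sqrt(change / (N * L)) < tol:  # change made by the backward sweep
            return x[1:-1], iteration
    raise RuntimeError("state vectors did not converge")

def residual(states, W, drive, beta):
    """Largest componentwise violation of the equation of state."""
    N = len(W)
    x = [[0.0] * N] + list(states) + [[0.0] * N]
    return max(abs(xi - fi)
               for i in range(1, len(states) + 1)
               for xi, fi in zip(x[i], target(x, W, drive, i, beta)))

rng = random.Random(0)
N, L = 4, 6
W = rotation_blocks(N, rng)
V = [[rng.uniform(-1.0, 1.0) for _ in range(21)] for _ in range(N)]
U = [[rng.uniform(-1.0, 1.0) for _ in range(20)] + [1.0] for _ in range(L)]
states, iterations = solve_state(U, W, V, alpha=0.1, beta=0.2)
print(iterations, residual(states, W, drive_terms(U, V, 0.1), 0.2))
```

The toy call lowers $\beta$ from the paper's 0.5 to 0.2. With $\omega = 1.4$ and an orthogonal $W$, each site update is then a contraction with factor $|1-\omega| + 2\beta\omega = 0.96$ in the block-maximum norm, so this small example is guaranteed to converge. The behaviour of the full-size networks at $\beta = 0.5$ is the one reported in the text and is not reproduced by this sketch.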

306:

307:

308: \section*{Results and Discussion}

309: There are two main ingredients for the improved one-dimensional protein

310: structure prediction in the present study. First is the use of large-scale

critical random networks of 5000 dimensions and 20 ensemble predictors.

312: Second is the use of a large sequence database (UniRef100) for PSI-BLAST

313: searches.

314: As demonstrated in Table~1, the CRN method achieves remarkably

315: accurate predictions.

316: In comparison with the previous study \cite{KinjoANDNishikawa2005c} based on

317: 2000-dimensional CRNs (10 ensemble predictors),

318: the $Q_3$ and $SOV$ measures in SS predictions improved from 77.8\% and 77.3\%

319: to 80.5\% and 80.0\%, respectively. Similarly, the average correlation

320: coefficient improved from 0.726 to 0.746 for CN predictions,

321: and from 0.601 to 0.613 for RWCO predictions. The 2-state predictions for

CN yield, on average, $Q_2$ = 76.8\% per chain and 76.7\% per residue, and

323: Matthews' correlation coefficient of 0.533.
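For readers unfamiliar with the accuracy measures quoted above, the following sketch uses their standard textbook definitions: $Q_3$ or $Q_2$ as the percentage of correctly predicted residues, the Pearson correlation coefficient, and the Matthews correlation coefficient. The function names and the label \texttt{b} for ``buried'' are hypothetical, and the exact evaluation scripts of this study may differ in detail.

```python
import math

def q_score(predicted, observed):
    """Percentage of positions whose predicted class equals the observed one."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0

def matthews(predicted, observed, positive="b"):
    """Matthews correlation coefficient for a 2-state prediction."""
    pairs = list(zip(predicted, observed))
    tp = sum(p == positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    fp = sum(p == positive and o != positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def per_chain_and_per_residue(chains):
    """chains: list of (predicted, observed) strings. Returns both averages of Q."""
    per_chain = sum(q_score(p, o) for p, o in chains) / len(chains)
    total = sum(len(o) for _, o in chains)
    hits = sum(a == b for p, o in chains for a, b in zip(p, o))
    return per_chain, 100.0 * hits / total

print(q_score("HHEC", "HHCC"))
print(per_chain_and_per_residue([("HH", "HH"), ("HHHH", "CCCC")]))
```

The per-chain average weights every protein equally, whereas the per-residue average weights long chains more heavily, which is why the two values can differ.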

324:

325: A closer examination of the SS prediction results (Table 2)

326: reveals the drastic improvement of $\beta$ strand prediction from $Q_E$

327: = 61.9\% to 69.3\% (per residue). Although the values of $Q_C$ and $Q_E^{pre}$

328: are slightly lower than in the previous study by 0.6--1.0\%, the accuracies of

329: other classes have improved by 2.5--4\%.

330:

331: CRNPRED compares favorably with other secondary structure prediction methods.

332: The widely used PSIPRED program \cite{Jones1999,PSIPRED} which is based on conventional

333: feed-forward neural networks achieves $Q_3$ of 78\%.

334: A more recently developed method, Porter, \cite{PollastriANDMcLysaght2005}

335: which is based on bidirectional recurrent neural networks achieves $Q_3$ of

336: 79\%. An even more intricate method based on bidirectional segmented-memory

337: recurrent neural networks \cite{ChenANDChaudhari2006} shows an accuracy

338: of $Q_3$ = 73\% (this rather low accuracy may be attributed to the small size

of the training set used). However, it should be noted that these studies
differ in the data sets used for both training and testing as well as in the
definition of secondary structural categories. Therefore, these comparisons may not be

343: very informative, but only give a rough estimation of relative performance.

344:

345: Regarding the contact number prediction, CRNPRED, achieving $Cor$ = 0.75,

346: is the most accurate method available today. The simple linear method \cite{KinjoETAL2005} with multiple

347: sequence alignment derived from the HSSP database \cite{HSSP} showed a

correlation coefficient of 0.63. A more advanced, local window-based method using support vector machines achieves a correlation of 0.68 per chain~\cite{Yuan2005}.

349:

350: It is known that the number of homologs found by the PSI-BLAST searches

351: significantly affects the prediction accuracies \cite{PrzybylskiANDRost2002}.

352: We have examined this effect by plotting the accuracy measures for a

353: given minimum number of homologs found by PSI-BLAST (Fig. 1).

354: For example, we see in Fig. 1 that, for those proteins with

355: more than 100 homologs, the average $Q_3$ for SS predictions is 82.2\%.

356: The effect of the number of homologs significantly depends on the type of

357: 1D structure. For SS prediction, $Q_3$ steadily increases as the number of

358: homologs increases up to 100, but it stays in the range between 82.0 and 82.4

359: until the minimum number of homologs reaches around 400, and then it starts to

360: decrease. For CN prediction, $Cor$ also increases steadily but more slowly,

361: and it does not degrade when the minimum number of homologs reaches 500.

This tendency implies that CN is more conserved than SS during protein
evolution, which is consistent with previous observations \cite{KinjoANDNishikawa2004,BastollaETAL2005}. In contrast, RWCO exhibits peculiar behavior.

364: The value of $Cor$ reaches its peak at the minimum number of homologs of 80

365: beyond which the value rapidly decreases. This indicates that RWCO is not

366: evolutionarily well conserved. It was observed that the accuracies of SS and

367: CN predictions constantly increased when the dimension of CRNs was increased

368: from 2000 to 5000, but such was not the case for RWCO (data not shown).

RWCO seems to be so delicate a quantity that it is very difficult to extract

370: relevant information from the amino acid sequence.
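The curves discussed above average an accuracy measure over all proteins that have at least a given number of homologs. A minimal sketch of that bookkeeping is shown below; the function name and the three data points are hypothetical and are not results of this study.

```python
def mean_accuracy_by_min_homologs(records, thresholds):
    """records: (number of homologs, accuracy) per protein.

    Returns {threshold: mean accuracy over proteins with at least that many
    homologs}, or None where no protein qualifies.
    """
    result = {}
    for t in thresholds:
        selected = [acc for n, acc in records if n >= t]
        result[t] = sum(selected) / len(selected) if selected else None
    return result

records = [(10, 70.0), (150, 80.0), (600, 84.0)]  # hypothetical values
print(mean_accuracy_by_min_homologs(records, [0, 100, 500, 1000]))
```

Because the subsets are nested, the number of proteins contributing to each point shrinks as the threshold grows, so values at very high thresholds rest on few chains.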

371:

Finally, we comment on the practical applicability of predicted 1D

373: structures. We do not believe, at present, that the construction of

374: a 3D structure purely from the predicted 1D structures is practical,

375: if possible at all, because of the limited accuracy of the RWCO prediction.

376: However, SS and CN predictions are very accurate for many proteins

377: so that they may already serve as valuable restraints for 3D structure

378: predictions. Also, SS and CN predictions may be applied to domain

379: identification often necessary for experimental determination of protein

structures. CRNPRED has proved useful for such a purpose \cite{MinezakiETAL2006}.
Although of limited accuracy, predicted RWCOs still exhibit significant

382: correlations with the correct values. Since RWCOs reflect the extent to which

383: a residue is involved in long-range contacts, predicted RWCOs may be

384: useful for enumerating potentially structurally important residues.

385:

386: An interesting alternative application of the CRN framework is to regard the

387: solution of the equation of state (Eq. \ref{eq:eos}) as an extended sequence

388: profile. By so doing, it is straightforward to apply the solution to the

389: profile-profile comparison for fold recognition \cite{TomiiANDAkiyama2004}.

Such an application may also be pursued in the future.

391:

392: \section*{Availability and Requirements}

393:

394: \begin{description}

395: \item[Project name:] CRNPRED

396: \item[Project home page:] ~\\http://bioinformatics.org/crnpred/

397: \item[Operating system:] UNIX-like OS (including Linux and Mac OS X).

398: \item[Programming language:] C.

\item[Other requirements:] zsh, PSI-BLAST (blastpgp), the UniRef100 amino acid sequence database.

400: \item[License:] Public domain.

401: \item[Any restrictions to use by non-academics:] None.

402: \end{description}

403:

404: \section*{List of Abbreviations Used}

405: CRN, critical random network; SS, secondary structure; CN, contact number;

406: RWCO, residue-wise contact order; 1D, one-dimensional; 3D, three-dimensional.

407:

408: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section*{Authors' contributions}
A. R. K. designed and implemented the method, carried out benchmarks, and wrote

411: the first draft of the manuscript. A. R. K. and K. N. analyzed the results and

412: improved the manuscript.

413:

414:

415: %%%%%%%%%%%%%%%%%%%%%%%%%%%

416: \section*{Acknowledgements}

417: We thank Yasumasa Shigemoto for helping construct the CRNPRED web interface.

418: This work was supported in part by the MEXT, Japan.

419:

420:

421:

422: %\bibliographystyle{bmc_article}  % Style BST file

423: %  \bibliography{refs,mypaper}

424: %% BioMed_Central_Bib_Style_v1.01

425:

426: \begin{thebibliography}{10}

427: \providecommand{\url}[1]{[#1]}

428: \providecommand{\urlprefix}{}

429:

430: \bibitem{PortoETAL2004}

431: Porto M, Bastolla U, Roman HE, Vendruscolo M: \textbf{Reconstruction of protein

432:   structures from a vectorial representation}. \emph{Phys. Rev. Lett.} 2004,

433:   \textbf{92}:218101.

434:

435: \bibitem{KinjoANDNishikawa2005}

436: Kinjo AR, Nishikawa K: \textbf{Recoverable one-dimensional encoding of protein

437:   three-dimensional structures}. \emph{Bioinformatics} 2005,

438:   \textbf{21}:2167--2170. [Doi:10.1093/bioinformatics/bti330].

439:

440: \bibitem{Rost2003}

441: Rost B: \textbf{Prediction in {1D}: secondary structure, membrane helices, and

442:   accessibility}. In \emph{Structural Bioinformatics}. Edited by Bourne PE,

443:   Weissig H, Hoboken, U.S.A.: Wiley-Liss, Inc. 2003:559--587.

444:

445: \bibitem{Jones1999}

446: Jones DT: \textbf{Protein secondary structure prediction based on

447:   position-specific scoring matrices}. \emph{J. Mol. Biol.} 1999,

448:   \textbf{292}:195--202.

449:

450: \bibitem{PollastriANDMcLysaght2005}

451: Pollastri G, {McLysaght} A: \textbf{Porter: a new, accurate server for protein

452:   secondary structure prediction}. \emph{Bioinformatics} 2005,

453:   \textbf{21}:1719--1720.

454:

455: \bibitem{NishikawaANDOoi1980}

456: Nishikawa K, Ooi T: \textbf{Prediction of the surface-interior diagram of

457:   globular proteins by an empirical method}. \emph{Int. J. Peptide Protein

458:   Res.} 1980, \textbf{16}:19--32.

459:

460: \bibitem{NishikawaANDOoi1986}

461: Nishikawa K, Ooi T: \textbf{Radial locations of amino acid residues in a

462:   globular protein: Correlation with the sequence}. \emph{J. Biochem.} 1986,

463:   \textbf{100}:1043--1047.

464:

465: \bibitem{KinjoETAL2005}

466: Kinjo AR, Horimoto K, Nishikawa K: \textbf{Predicting absolute contact numbers

467:   of native protein structure from amino acid sequence}. \emph{Proteins} 2005,

468:   \textbf{58}:158--165. [Doi:10.1002/prot.20300].

469:

470: \bibitem{Yuan2005}

471: Yuan Z: \textbf{Better prediction of protein contact number using a support

472:   vector regression analysis of amino acid sequence}. \emph{BMC Bioinformatics}

473:   2005, \textbf{6}:248.

474:

475: \bibitem{KinjoANDNishikawa2005c}

476: Kinjo AR, Nishikawa K: \textbf{Predicting secondary structures, contact

477:   numbers, and residue-wise contact orders of native protein structure from

478:   amino acid sequence using critical random networks}. \emph{BIOPHYSICS} 2005,

479:   \textbf{1}:67--74. [Doi:10.2142/biophysics.1.67].

480:

481: \bibitem{DSSP}

482: Kabsch W, Sander C: \textbf{Dictionary of Protein Secondary Structure: Pattern

483:   recognition of hydrogen bonded and geometrical features}. \emph{Biopolymers}

484:   1983, \textbf{22}:2577--2637.

485:

486: \bibitem{CrooksANDBrenner2004}

487: Crooks GE, Brenner SE: \textbf{Protein secondary structure: entropy,

488:   correlations and prediction}. \emph{Bioinformatics} 2004,

489:   \textbf{20}:1603--1611.

490:

491: \bibitem{BaldiETAL1999}

492: Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G: \textbf{Exploiting the past

493:   and the future in protein secondary structure prediction}.

494:   \emph{Bioinformatics} 1999, \textbf{15}:937--946.

495:

496: \bibitem{ChenANDChaudhari2006}

497: Chen J, Chaudhari NS: \textbf{Bidirectional segmented-memory recurrent neural

498:   network for protein secondary structure prediction}. \emph{Soft Computing}

499:   2006, \textbf{10}:315--324.

500:

501: \bibitem{AltschulETAL1997}

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ:

503:   \textbf{Gapped Blast and {PSI}-Blast: A new generation of protein database

504:   search programs}. \emph{Nucleic Acids Res.} 1997, \textbf{25}:3389--3402.

505:

506: \bibitem{PollastriETAL2002}

507: Pollastri G, Baldi P, Fariselli P, Casadio R: \textbf{Prediction of

508:   coordination number and relative solvent accessibility in proteins}.

509:   \emph{Proteins} 2002, \textbf{47}:142--153.

510:

511: \bibitem{SCOP}

512: Murzin AG, Brenner SE, Hubbard T, Chothia C: \textbf{{SCOP}: A structural

513:   classification of proteins database for the investigation of sequences and

514:   structures}. \emph{J. Mol. Biol.} 1995, \textbf{247}:536--540.

515:

516: \bibitem{UniProt}

517: Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E,

518:   Huang H, Lopez R, Magrane M, Martin MJ, Natale D, {O'Donovan} C, Redaschi N,

519:   Yeh LS: \textbf{The universal protein resource ({UniProt})}. \emph{Nucleic

520:   Acids Res.} 2005, \textbf{33}:D154--D159.

521:

522: \bibitem{PrzybylskiANDRost2002}

523: Przybylski D, Rost B: \textbf{Alignments grow, secondary structure prediction

524:   improves}. \emph{Proteins} 2002, \textbf{46}:197--205.

525:

526: \bibitem{PSIPRED}

527: McGuffin LJ, Bryson K, Jones DT: \textbf{The PSIPRED protein structure

528:   prediction server}. \emph{Bioinformatics} 2000, \textbf{16}:404--405.

529:

530: \bibitem{HSSP}

531: Sander C, Schneider R: \textbf{Database of homology-derived protein

532:   structures}. \emph{Proteins} 1991, \textbf{9}:56--68.

533:

534: \bibitem{KinjoANDNishikawa2004}

535: Kinjo AR, Nishikawa K: \textbf{Eigenvalue analysis of amino acid substitution

536:   matrices reveals a sharp transition of the mode of sequence conservation in

537:   proteins}. \emph{Bioinformatics} 2004, \textbf{20}:2504--2508.

538:

539: \bibitem{BastollaETAL2005}

540: Bastolla U, Porto M, Roman HE, Vendruscolo M: \textbf{Principal eigenvector of

541:   contact matrices and hydrophobicity profiles in proteins}. \emph{Proteins}

542:   2005, \textbf{58}:22--30.

543:

544: \bibitem{MinezakiETAL2006}

545: Minezaki Y, Homma K, Kinjo AR, Nishikawa K: \textbf{Human transcription factors

546:   contain a high fraction of intrinsically disordered regions essential for

547:   transcriptional regulation}. \emph{J. Mol. Biol.} 2006.  in press.

548:

549: \bibitem{TomiiANDAkiyama2004}

550: Tomii K, Akiyama Y: \textbf{{FORTE}: a profile-profile comparison tool for

551:   protein fold recognition}. \emph{Bioinformatics} 2004, \textbf{20}:594--595.

552:

553: \bibitem{SOV99}

554: Zemla A, Venclovas C, Fidelis K, Rost B: \textbf{A modified definition of Sov,

555:   a segment-based measure for protein secondary structure prediction

556:   assessment}. \emph{Proteins} 1999, \textbf{34}:220--223.

557:

558: \end{thebibliography}

559:

560: \newcommand{\BMCxmlcomment}[1]{}

561:

562: \BMCxmlcomment{

563:

564: <refgrp>

565:

566: <bibl id="B1">

567:   <title><p>Reconstruction of protein structures from a vectorial

568:   representation</p></title>

569:   <aug>

570:     <au><snm>Porto</snm><fnm>M.</fnm></au>

571:     <au><snm>Bastolla</snm><fnm>U.</fnm></au>

572:     <au><snm>Roman</snm><fnm>H. E.</fnm></au>

573:     <au><snm>Vendruscolo</snm><fnm>M.</fnm></au>

574:   </aug>

575:   <source>Phys. Rev. Lett.</source>

576:   <pubdate>2004</pubdate>

577:   <volume>92</volume>

578:   <fpage>218101</fpage>

579: </bibl>

580:

581: <bibl id="B2">

582:   <title><p>Recoverable one-dimensional encoding of protein three-dimensional

583:   structures</p></title>

584:   <aug>

585:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>

586:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

587:   </aug>

588:   <source>Bioinformatics</source>

589:   <pubdate>2005</pubdate>

590:   <volume>21</volume>

591:   <fpage>2167</fpage>

592:   <lpage>2170</lpage>

593:   <note>doi:10.1093/bioinformatics/bti330</note>

594: </bibl>

595:

596: <bibl id="B3">

597:   <title><p>Prediction in {1D}: secondary structure, membrane helices, and

598:   accessibility</p></title>

599:   <aug>

600:     <au><snm>Rost</snm><fnm>B.</fnm></au>

601:   </aug>

602:   <source>Structural Bioinformatics</source>

603:   <publisher>Hoboken, U.S.A.: Wiley-Liss, Inc.</publisher>

604:   <editor>Bourne, P. E. and Weissig, H.</editor>

605:   <section><title><p>28</p></title></section>

606:   <pubdate>2003</pubdate>

607:   <fpage>559</fpage>

608:   <lpage>587</lpage>

609: </bibl>

610:

611: <bibl id="B4">

612:   <title><p>Protein secondary structure prediction based on position-specific

613:   scoring matrices</p></title>

614:   <aug>

615:     <au><snm>Jones</snm><fnm>D. T.</fnm></au>

616:   </aug>

617:   <source>J. Mol. Biol.</source>

618:   <pubdate>1999</pubdate>

619:   <volume>292</volume>

620:   <fpage>195</fpage>

621:   <lpage>202</lpage>

622: </bibl>

623:

624: <bibl id="B5">

625:   <title><p>Porter: a new, accurate server for protein secondary structure

626:   prediction</p></title>

627:   <aug>

628:     <au><snm>Pollastri</snm><fnm>G.</fnm></au>

629:     <au><snm>{McLysaght}</snm><fnm>A.</fnm></au>

630:   </aug>

631:   <source>Bioinformatics</source>

632:   <pubdate>2005</pubdate>

633:   <volume>21</volume>

634:   <fpage>1719</fpage>

635:   <lpage>1720</lpage>

636: </bibl>

637:

638: <bibl id="B6">

639:   <title><p>Prediction of the surface-interior diagram of globular proteins by

640:   an empirical method</p></title>

641:   <aug>

642:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

643:     <au><snm>Ooi</snm><fnm>T.</fnm></au>

644:   </aug>

645:   <source>Int. J. Peptide Protein Res.</source>

646:   <pubdate>1980</pubdate>

647:   <volume>16</volume>

648:   <fpage>19</fpage>

649:   <lpage>32</lpage>

650: </bibl>

651:

652: <bibl id="B7">

653:   <title><p>Radial locations of amino acid residues in a globular protein:

654:   Correlation with the sequence</p></title>

655:   <aug>

656:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

657:     <au><snm>Ooi</snm><fnm>T.</fnm></au>

658:   </aug>

659:   <source>J. Biochem.</source>

660:   <pubdate>1986</pubdate>

661:   <volume>100</volume>

662:   <fpage>1043</fpage>

663:   <lpage>1047</lpage>

664: </bibl>

665:

666: <bibl id="B8">

667:   <title><p>Predicting absolute contact numbers of native protein structure

668:   from amino acid sequence</p></title>

669:   <aug>

670:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>

671:     <au><snm>Horimoto</snm><fnm>K.</fnm></au>

672:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

673:   </aug>

674:   <source>Proteins</source>

675:   <pubdate>2005</pubdate>

676:   <volume>58</volume>

677:   <fpage>158</fpage>

678:   <lpage>165</lpage>

679:   <note>doi:10.1002/prot.20300</note>

680: </bibl>

681:

682: <bibl id="B9">

683:   <title><p>Better prediction of protein contact number using a support vector

684:   regression analysis of amino acid sequence</p></title>

685:   <aug>

686:     <au><snm>Yuan</snm><fnm>Z.</fnm></au>

687:   </aug>

688:   <source>BMC Bioinformatics</source>

689:   <pubdate>2005</pubdate>

690:   <volume>6</volume>

691:   <fpage>248</fpage>

692: </bibl>

693:

694: <bibl id="B10">

695:   <title><p>Predicting secondary structures, contact numbers, and residue-wise

696:   contact orders of native protein structure from amino acid sequence using

697:   critical random networks</p></title>

698:   <aug>

699:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>

700:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

701:   </aug>

702:   <source>BIOPHYSICS</source>

703:   <pubdate>2005</pubdate>

704:   <volume>1</volume>

705:   <fpage>67</fpage>

706:   <lpage>74</lpage>

707:   <note>doi:10.2142/biophysics.1.67</note>

708: </bibl>

709:

710: <bibl id="B11">

711:   <title><p>Dictionary of Protein Secondary Structure: Pattern recognition of

712:   hydrogen bonded and geometrical features</p></title>

713:   <aug>

714:     <au><snm>Kabsch</snm><fnm>W.</fnm></au>

715:     <au><snm>Sander</snm><fnm>C.</fnm></au>

716:   </aug>

717:   <source>Biopolymers</source>

718:   <pubdate>1983</pubdate>

719:   <volume>22</volume>

720:   <fpage>2577</fpage>

721:   <lpage>2637</lpage>

722: </bibl>

723:

724: <bibl id="B12">

725:   <title><p>Protein secondary structure: entropy, correlations and

726:   prediction</p></title>

727:   <aug>

728:     <au><snm>Crooks</snm><fnm>G. E.</fnm></au>

729:     <au><snm>Brenner</snm><fnm>S. E.</fnm></au>

730:   </aug>

731:   <source>Bioinformatics</source>

732:   <pubdate>2004</pubdate>

733:   <volume>20</volume>

734:   <fpage>1603</fpage>

735:   <lpage>1611</lpage>

736: </bibl>

737:

738: <bibl id="B13">

739:   <title><p>Exploiting the past and the future in protein secondary structure

740:   prediction</p></title>

741:   <aug>

742:     <au><snm>Baldi</snm><fnm>P.</fnm></au>

743:     <au><snm>Brunak</snm><fnm>S.</fnm></au>

744:     <au><snm>Frasconi</snm><fnm>P.</fnm></au>

745:     <au><snm>Soda</snm><fnm>G.</fnm></au>

746:     <au><snm>Pollastri</snm><fnm>G.</fnm></au>

747:   </aug>

748:   <source>Bioinformatics</source>

749:   <pubdate>1999</pubdate>

750:   <volume>15</volume>

751:   <fpage>937</fpage>

752:   <lpage>946</lpage>

753: </bibl>

754:

755: <bibl id="B14">

756:   <title><p>Bidirectional segmented-memory recurrent neural network for protein

757:   secondary structure prediction</p></title>

758:   <aug>

759:     <au><snm>Chen</snm><fnm>J.</fnm></au>

760:     <au><snm>Chaudhari</snm><fnm>N. S.</fnm></au>

761:   </aug>

762:   <source>Soft Computing</source>

763:   <pubdate>2006</pubdate>

764:   <volume>10</volume>

765:   <fpage>315</fpage>

766:   <lpage>324</lpage>

767: </bibl>

768:

769: <bibl id="B15">

770:   <title><p>Gapped Blast and {PSI}-Blast: A new generation of protein database

771:   search programs</p></title>

772:   <aug>

773:     <au><snm>Altschul</snm><fnm>S. F.</fnm></au>

774:     <au><snm>Madden</snm><fnm>T. L.</fnm></au>

775:     <au><snm>Schaffer</snm><fnm>A. A.</fnm></au>

776:     <au><snm>Zhang</snm><fnm>J.</fnm></au>

777:     <au><snm>Zhang</snm><fnm>Z.</fnm></au>

778:     <au><snm>Miller</snm><fnm>W.</fnm></au>

779:     <au><snm>Lipman</snm><fnm>D. J.</fnm></au>

780:   </aug>

781:   <source>Nucleic Acids Res.</source>

782:   <pubdate>1997</pubdate>

783:   <volume>25</volume>

784:   <fpage>3389</fpage>

785:   <lpage>3402</lpage>

786: </bibl>

787:

788: <bibl id="B16">

789:   <title><p>Prediction of coordination number and relative solvent

790:   accessibility in proteins</p></title>

791:   <aug>

792:     <au><snm>Pollastri</snm><fnm>G.</fnm></au>

793:     <au><snm>Baldi</snm><fnm>P.</fnm></au>

794:     <au><snm>Fariselli</snm><fnm>P.</fnm></au>

795:     <au><snm>Casadio</snm><fnm>R.</fnm></au>

796:   </aug>

797:   <source>Proteins</source>

798:   <pubdate>2002</pubdate>

799:   <volume>47</volume>

800:   <fpage>142</fpage>

801:   <lpage>153</lpage>

802: </bibl>

803:

804: <bibl id="B17">

805:   <title><p>{SCOP}: A structural classification of proteins database for the

806:   investigation of sequences and structures</p></title>

807:   <aug>

808:     <au><snm>Murzin</snm><fnm>A. G.</fnm></au>

809:     <au><snm>Brenner</snm><fnm>S. E.</fnm></au>

810:     <au><snm>Hubbard</snm><fnm>T.</fnm></au>

811:     <au><snm>Chothia</snm><fnm>C.</fnm></au>

812:   </aug>

813:   <source>J. Mol. Biol.</source>

814:   <pubdate>1995</pubdate>

815:   <volume>247</volume>

816:   <fpage>536</fpage>

817:   <lpage>540</lpage>

818: </bibl>

819:

820: <bibl id="B18">

821:   <title><p>The universal protein resource ({UniProt})</p></title>

822:   <aug>

823:     <au><snm>Bairoch</snm><fnm>A.</fnm></au>

824:     <au><snm>Apweiler</snm><fnm>R.</fnm></au>

825:     <au><snm>Wu</snm><fnm>C. H.</fnm></au>

826:     <au><snm>Barker</snm><fnm>W. C.</fnm></au>

827:     <au><snm>Boeckmann</snm><fnm>B.</fnm></au>

828:     <au><snm>Ferro</snm><fnm>S.</fnm></au>

829:     <au><snm>Gasteiger</snm><fnm>E.</fnm></au>

830:     <au><snm>Huang</snm><fnm>H.</fnm></au>

831:     <au><snm>Lopez</snm><fnm>R.</fnm></au>

832:     <au><snm>Magrane</snm><fnm>M.</fnm></au>

833:     <au><snm>Martin</snm><fnm>M. J.</fnm></au>

834:     <au><snm>Natale</snm><fnm>D. A.</fnm></au>

835:     <au><snm>{O'Donovan}</snm><fnm>C.</fnm></au>

836:     <au><snm>Redaschi</snm><fnm>N.</fnm></au>

837:     <au><snm>Yeh</snm><fnm>L. S.</fnm></au>

838:   </aug>

839:   <source>Nucleic Acids Res.</source>

840:   <pubdate>2005</pubdate>

841:   <volume>33</volume>

842:   <fpage>D154</fpage>

843:   <lpage>D159</lpage>

844: </bibl>

845:

846: <bibl id="B19">

847:   <title><p>Alignments grow, secondary structure prediction

848:   improves</p></title>

849:   <aug>

850:     <au><snm>Przybylski</snm><fnm>D.</fnm></au>

851:     <au><snm>Rost</snm><fnm>B.</fnm></au>

852:   </aug>

853:   <source>Proteins</source>

854:   <pubdate>2002</pubdate>

855:   <volume>46</volume>

856:   <fpage>197</fpage>

857:   <lpage>205</lpage>

858: </bibl>

859:

860: <bibl id="B20">

861:   <title><p>The PSIPRED protein structure prediction server</p></title>

862:   <aug>

863:     <au><snm>McGuffin</snm><fnm>L. J.</fnm></au>

864:     <au><snm>Bryson</snm><fnm>K.</fnm></au>

865:     <au><snm>Jones</snm><fnm>D. T.</fnm></au>

866:   </aug>

867:   <source>Bioinformatics</source>

868:   <pubdate>2000</pubdate>

869:   <volume>16</volume>

870:   <fpage>404</fpage>

871:   <lpage>405</lpage>

872: </bibl>

873:

874: <bibl id="B21">

875:   <title><p>Database of homology-derived protein structures</p></title>

876:   <aug>

877:     <au><snm>Sander</snm><fnm>C.</fnm></au>

878:     <au><snm>Schneider</snm><fnm>R.</fnm></au>

879:   </aug>

880:   <source>Proteins</source>

881:   <pubdate>1991</pubdate>

882:   <volume>9</volume>

883:   <fpage>56</fpage>

884:   <lpage>68</lpage>

885: </bibl>

886:

887: <bibl id="B22">

888:   <title><p>Eigenvalue analysis of amino acid substitution matrices reveals a

889:   sharp transition of the mode of sequence conservation in proteins</p></title>

890:   <aug>

891:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>

892:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

893:   </aug>

894:   <source>Bioinformatics</source>

895:   <pubdate>2004</pubdate>

896:   <volume>20</volume>

897:   <fpage>2504</fpage>

898:   <lpage>2508</lpage>

899: </bibl>

900:

901: <bibl id="B23">

902:   <title><p>Principal eigenvector of contact matrices and hydrophobicity

903:   profiles in proteins</p></title>

904:   <aug>

905:     <au><snm>Bastolla</snm><fnm>U.</fnm></au>

906:     <au><snm>Porto</snm><fnm>M.</fnm></au>

907:     <au><snm>Roman</snm><fnm>H. E.</fnm></au>

908:     <au><snm>Vendruscolo</snm><fnm>M.</fnm></au>

909:   </aug>

910:   <source>Proteins</source>

911:   <pubdate>2005</pubdate>

912:   <volume>58</volume>

913:   <fpage>22</fpage>

914:   <lpage>30</lpage>

915: </bibl>

916:

917: <bibl id="B24">

918:   <title><p>Human transcription factors contain a high fraction of

919:   intrinsically disordered regions essential for transcriptional

920:   regulation</p></title>

921:   <aug>

922:     <au><snm>Minezaki</snm><fnm>Y.</fnm></au>

923:     <au><snm>Homma</snm><fnm>K.</fnm></au>

924:     <au><snm>Kinjo</snm><fnm>A. R.</fnm></au>

925:     <au><snm>Nishikawa</snm><fnm>K.</fnm></au>

926:   </aug>

927:   <source>J. Mol. Biol.</source>

928:   <pubdate>2006</pubdate>

929:   <inpress />

930: </bibl>

931:

932: <bibl id="B25">

933:   <title><p>{FORTE}: a profile-profile comparison tool for protein fold

934:   recognition</p></title>

935:   <aug>

936:     <au><snm>Tomii</snm><fnm>K.</fnm></au>

937:     <au><snm>Akiyama</snm><fnm>Y.</fnm></au>

938:   </aug>

939:   <source>Bioinformatics</source>

940:   <pubdate>2004</pubdate>

941:   <volume>20</volume>

942:   <fpage>594</fpage>

943:   <lpage>595</lpage>

944: </bibl>

945:

946: <bibl id="B26">

947:   <title><p>A modified definition of Sov, a segment-based measure for protein

948:   secondary structure prediction assessment</p></title>

949:   <aug>

950:     <au><snm>Zemla</snm><fnm>A.</fnm></au>

951:     <au><snm>Venclovas</snm><fnm>C.</fnm></au>

952:     <au><snm>Fidelis</snm><fnm>K.</fnm></au>

953:     <au><snm>Rost</snm><fnm>B.</fnm></au>

954:   </aug>

955:   <source>Proteins</source>

956:   <pubdate>1999</pubdate>

957:   <volume>34</volume>

958:   <fpage>220</fpage>

959:   <lpage>223</lpage>

960: </bibl>

961:

962: </refgrp>

963: } % end of \BMCxmlcomment

964:

965: \newpage

966: \section*{Figures}

967:   \subsection*{Figure 1}

968: Average accuracy measures as a function of the minimum number of homologs found by PSI-BLAST. From top to bottom: $Q_3$ of secondary structure predictions, $Cor$ of contact number predictions, and $Cor$ of residue-wise contact order predictions.

969:

970: \includegraphics[width=6cm]{fig1.eps}

971:

972: \newpage

973: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

974: %%                               %%

975: %% Tables                        %%

976: %%                               %%

977: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

978:

979: \section*{Tables}

980:   \subsection*{Table 1 - Summary of average prediction accuracies per chain (median in parentheses).}

981:     \par

982:     \mbox{

983: \begin{tabular}{lll}\hline

984: SS & $Q_3 = 80.5\%$ (81.6) & $SOV = 80.0\%$ (81.1)\\

985: CN & $Cor = 0.746$ (0.768) & $DevA = 0.686$ (0.670) \\

986: RWCO & $Cor = 0.613$ (0.646) & $DevA = 0.877$ (0.812)\\\hline

987:   \end{tabular}

988:       }\\

989:

990: SS, Secondary structure prediction: $Q_3$ is the percentage of correctly predicted residues; $SOV$ is the segment overlap measure~\cite{SOV99}.\\

991: CN, Contact number prediction: $Cor$ is Pearson's correlation coefficient between the predicted and native CNs; $DevA$ is the RMS error normalized by the standard deviation of the native CN~\cite{KinjoETAL2005}.\\

992: RWCO, Residue-wise contact order prediction: $Cor$ and $DevA$ are defined as for

993: CN but calculated with predicted and native RWCOs.
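\par
For concreteness, the per-chain measures above can be written out as follows. This is a sketch under assumed notation that is not defined elsewhere in this section: $N$ is the chain length, $N_{\mathrm{correct}}$ the number of residues whose predicted SS class matches the native one, $y_i$ and $\hat{y}_i$ the native and predicted values (CN or RWCO) of residue $i$, and $\langle y \rangle$ and $\langle \hat{y} \rangle$ their averages over the chain. The same $1/N$ normalization is assumed in the RMS error and in the standard deviation, so that it cancels in $DevA$.
\[
Q_3 = 100 \times \frac{N_{\mathrm{correct}}}{N}, \qquad
Cor = \frac{\sum_{i=1}^{N} (\hat{y}_i - \langle \hat{y} \rangle)(y_i - \langle y \rangle)}
           {\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \langle \hat{y} \rangle)^2}\,
            \sqrt{\sum_{i=1}^{N} (y_i - \langle y \rangle)^2}}, \qquad
DevA = \frac{\sqrt{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}}
            {\sqrt{\sum_{i=1}^{N} (y_i - \langle y \rangle)^2}}.
\]
Under these definitions a perfect prediction gives $Cor = 1$ and $DevA = 0$, whereas predicting the chain average $\langle y \rangle$ for every residue gives $DevA = 1$.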

994:

995:

996: \subsection*{Table 2 - Summary of per-residue accuracies for SS predictions.}

997: \par

998: \mbox{

999:   \begin{tabular}{lrrr}\hline

1000: measure    & $H$ & $E$ & $C$ \\\hline

1001: $Q_s$      & 82.7 & 69.3 & 84.0 \\

1002: $Q_s^{pre}$ & 84.4 & 78.9 & 78.3\\

1003: $MC$       &  0.754 & 0.674 & 0.645 \\\hline

1004:   \end{tabular}

1005: }\\

1006:

1007: $Q_s$: The number of correctly predicted residues of the SS class $s = H, E, C$

1008:  divided by the number of residues in the class in native structures.\\

1009: $Q_s^{pre}$: The number of correctly predicted residues of the SS class $s = H, E, C$

1010:  divided by the number of residues predicted as the corresponding class.\\

1011: $MC$: Matthews' correlation coefficient.
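\par
Written out per class, the three measures in Table 2 take the following form. This is a sketch in assumed notation: for each class $s$, residues are taken to be pooled over all chains and counted one-versus-rest as true positives $TP_s$ (native $s$, predicted $s$), false negatives $FN_s$ (native $s$, predicted otherwise), false positives $FP_s$ (predicted $s$, native otherwise), and true negatives $TN_s$ (neither native nor predicted $s$).
\[
Q_s = 100 \times \frac{TP_s}{TP_s + FN_s}, \qquad
Q_s^{pre} = 100 \times \frac{TP_s}{TP_s + FP_s}, \qquad
MC_s = \frac{TP_s\,TN_s - FP_s\,FN_s}
            {\sqrt{(TP_s + FP_s)(TP_s + FN_s)(TN_s + FP_s)(TN_s + FN_s)}}.
\]
$Q_s$ thus corresponds to the recall (sensitivity) of class $s$ and $Q_s^{pre}$ to its precision.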

1012: \end{document}

1013: