1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: \documentclass{elsart}
3: 
4: %%%%%%%%%%%%%%%%%%%%%%PACKAGES%%%%%%%%%%%%%%%%%%%%%%
5: \usepackage{graphicx}
6: \usepackage{amssymb}
7: 
8: %%%%%%%%%%%%%%%%%%%%%%DOCUMENT%%%%%%%%%%%%%%%%%%%%%%
9: \begin{document}
10: %%%%%%%%%%%%%%%%%%%%%%TITLE%%%%%%%%%%%%%%%%%%%%%%%%%
11: \begin{frontmatter}
12: 
13: \title{MONTE CARLO SIMULATION AND STATISTICAL ANALYSIS OF GENETIC INFORMATION CODING}
14: \author{E. Gultepe\corauthref{Northeastern}}
15: \author{M.~L. Kurnaz\corauthref{BU}}
16: \corauth[Northeastern]{Present Address: Northeastern University}
17: \address{Department of Physics, Bogazici University, 34342
18: Bebek Istanbul}
19: \ead{kurnaz@boun.edu.tr}
20: \corauth[BU]{Corresponding Author}
21: 
22: %%%%%%%%%%%%%%%%%%%%%ABSTRACT%%%%%%%%%%%%%%%%%%%%%%%
23: \begin{abstract}
The set of rules that specifies how the information contained in DNA
codes for amino acids is called ``the genetic code''. Using a simplified
version of the Penna model, we use computer simulations to
investigate the importance of the genetic code and of the number of
amino acids in Nature for population dynamics. We find that the
genetic code is not a random pairing of codons to amino acids and
that the number of amino acids in Nature is an optimum under mutations.
31: \end{abstract}
32: 
33: \end{frontmatter}
34: 
35: \section{INTRODUCTION\protect\\ }
36: \label{sec:level1}
In general, population dynamics is a matter of interest for biologists;
however, it has also attracted the attention of physicists because it is
closely related to statistical mechanics. Investigating population
dynamics in Nature is not a simple task because, to get any idea about
the dynamics of population growth, one has to consider many generations
of a population. Even if one can find fast-reproducing species like the
fruit fly, checking all individuals in such a population is not an easy
task either. Therefore, modelling with computers has clear advantages,
such as considerably shorter time requirements and simpler population
monitoring.
47: 
The most successful computational model for age-structured
populations is the Penna model \cite{Penna}. In this model,
individuals are represented by bit-strings which are 32 bits long
and are initially set to zero. Each bit represents a given age:
as the individual gets older, we move along the bit-string.
A bit set to zero indicates that no bad mutation is stored at that
age. If a bit is set to one, however, the individual suffers a
disease at that age and its probability of staying alive is decreased.
57: 
The Penna model has been successfully used to investigate the
advantages of sexual reproduction over asexual reproduction
\cite{Stauffer13580,Stauffer13600,Stauffer13570,Sousa680,Tuzel2770,Tuzel21970,Orcal3410},
certain features of ecology \cite{Penna2510} and population dynamics
\cite{Penna13590,Penna13610,Huang3140}.
63: 
To investigate the importance of \textit{the genetic code} and of the
number of amino acids in population dynamics, we have constructed
a model based on the Penna model.
67: 
The genetic information about an individual is stored in its DNA.
DNA is made up of different monomers. Each monomer, a nucleotide,
in the chain carries a heterocyclic base. In DNA, these bases
are adenine (A), guanine (G), cytosine (C) and thymine (T).
Proteins are synthesized from amino acids using the information
stored in the DNA. As there are four bases in the DNA and 20 amino
acids used in proteins, there cannot be a one-to-one correspondence
between the nucleotides in the DNA and the amino acids in a protein.
Rather, the linear sequence of bases which constitutes the
protein-coding information is ``read'' by the cell in blocks of three
nucleotide residues, or codons, each of which specifies an amino acid.
If we consider a nucleotide on the DNA to be a letter in a four-letter
alphabet, codons can be thought of as words with three letters. Hence,
there are sixty-four words to code for the twenty different amino acids.
The set of rules that specifies which nucleic acid codons correspond
to which amino acid is known as the genetic code.
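
The need for three-letter words can be illustrated by a simple counting
argument (a standard observation, added here only as an illustration):
with a four-letter alphabet, words of length $k$ can distinguish at most
$4^k$ amino acids, and
\[
4^1 = 4 < 20, \qquad 4^2 = 16 < 20, \qquad 4^3 = 64 \geq 20 ,
\]
so three is the smallest word length that can encode twenty amino acids.
The surplus of words over amino acids is what makes the code redundant.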
85: 
86: 
87: 
88: \section{COMPUTATIONAL METHOD\protect\\ }
89: \label{sec:level2}
90: 
If a mutation in a gene changes the amino acid chain, we assume that
the organism may no longer be able to build a protein which may be
crucial for it. In that case the organism will not function properly,
or it may simply die; hence it is simplistic yet reasonable to
represent the organism by a single gene.
97: 
In our model, to represent a whole individual, we took a real gene
from Nature: the ``human cytokine'' gene (LD78, a \textit{Homo sapiens}
blood lymphocyte gene on chromosome 17) \cite{gene}. This gene is
necessary for activating lymphocytes; therefore, if it is missing,
the human body cannot perform immune responses.
103: 
If all other effects (aging, food restriction, illness etc.) are
neglected, mutation is the only possible cause of death.
Moreover, reproduction is not included in our simplistic model;
therefore we have a population which can only decrease as a
result of mutations. We use this model to investigate the
effect of mutations on the population decrease.
110: 
A mutation in our model is a process acting at each site
independently. We disregard more complicated processes such as
deletions or insertions, and we only consider the replacement of a
single nucleotide in the gene by another nucleotide. Normally the
rates for these replacements depend on the two nucleotides being
interchanged. The simplest approach is to take all mutation rates
to be equal, which is known as the Jukes--Cantor mutation scheme
\cite{Jukes21920}.
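
As a minimal sketch of such an equal-rate point mutation (this is not the
code used for the paper; the function name \texttt{point\_mutation} and the
toy sequence are illustrative assumptions), one site is drawn uniformly and
its base is replaced by one of the other three bases with equal probability:

```python
import random

BASES = "ACGT"

def point_mutation(gene, rng):
    """Return (mutant, site): `gene` with one uniformly chosen site replaced.

    Assumption: every site is equally likely to be hit, and each of the
    three other bases is an equally likely replacement (equal rates).
    """
    site = rng.randrange(len(gene))
    new_base = rng.choice([b for b in BASES if b != gene[site]])
    return gene[:site] + new_base + gene[site + 1:], site

rng = random.Random(1)            # fixed seed so the run is reproducible
original = "AAAGGT"               # hypothetical toy sequence
mutant, site = point_mutation(original, rng)
```

The mutant always differs from the original at exactly one site, which is
the only kind of change considered in the model.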
119: 
A mutation is taken to be deleterious if it causes a change in
the amino acid chain; hence not all mutations kill the individual.
A real gene is composed of two different parts: a coding portion and
a noncoding portion. The coding part, the exon, codes for proteins,
whereas the rest, the intron, does not code for a protein and
its purpose is not yet clearly understood. If a mutation
takes place in the intron part, it is considered harmless.
If it takes place in the exon part, it is usually harmful, but there
is still a chance of survival: the mutated codon may still code for the
same amino acid, since more than one codon can code for one amino acid
in Nature.
130: 
To be more explicit, the codons AAA and AAG code for the same amino
acid, ``lysine''; hence if AAA turns into AAG as a result of a
mutation, the amino acid will not change and the protein can be
constructed safely. However, if AAA turns into AGA, which codes for
the amino acid ``arginine'', the amino acid chain will change and
we assume that the protein cannot be built, which means that the
represented organism will die.
138: 
In principle, a mutation could convert AAA to AAX, where
$\mathrm{X} \notin \{\mathrm{A, G, C, T}\}$; the individual would then
die automatically. In our model we consider only the simpler case in
which a mutation changes A to one of G, C or T, never to X.
143: 
Since reproduction is not included in the model, the population
can only diminish. The decrease in population can be found by
calculating the probability of a deleterious mutation. The
probability that a mutation changes the amino acid depends on
the codon, so one needs the probability of hitting each
codon type. First, the probability of hitting a codon of
type $\alpha$ ($P_{\alpha}$) is calculated as the ratio of the number
of codons of that type in the gene ($N_{\alpha}$) to the total
number of codons. Then, to exclude the mutations that
do not change the amino acid, we calculate the
probability that a change in one nucleotide of the codon changes
the amino acid ($P(d/\alpha)$).
156: 
As an example, only two codons code for the amino acid ``lysine'':
AAA and AAG. In the exon part of the human cytokine gene there are
only three ``AAA'' codons, and the total number of codons in the exon
part is 207; hence the probability of the mutation hitting an ``AAA''
codon is simply $P_{\alpha} = 3/207 = 0.0145$. A point
mutation of ``AAA'' can produce 9 different codons (AAC, AAG, AAU,
ACA, AGA, AUA, CAA, GAA, UAA). Only one of these codons (AAG) still
codes for the same amino acid. Therefore the probability of a
deleterious mutation ($P(d/\alpha)$) is 8/9 for ``AAA'' in the human
cytokine gene.
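
The value $8/9$ can be checked by enumerating the nine single-base
neighbours of a codon against the standard genetic code. The sketch below
is illustrative only: the names \texttt{CODE} and \texttt{p\_deleterious}
are assumptions, and a change to a stop codon (such as UAA) is counted as a
change of amino acid, as in the enumeration above.

```python
from fractions import Fraction
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual UCAG ordering; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def p_deleterious(codon):
    """Fraction of the 9 single-base neighbours that change the amino acid."""
    changed = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a mutation: the base is unchanged
            neighbour = codon[:pos] + base + codon[pos + 1:]
            if CODE[neighbour] != CODE[codon]:
                changed += 1
    return Fraction(changed, 9)

print(p_deleterious("AAA"))  # 8/9
```

Exact fractions are used so that the result can be compared directly with
the hand count.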
167: 
Next, we need the probability of hitting the exon part
of the gene, which is the ratio of the length of the exon part to the
length of the whole gene. In the human cytokine gene, there are 621
nucleotides in the exon part and 1447 in the intron part:
\begin{equation}
P(\mathrm{hitting\ exon}) = \frac{621}{2068} = 0.3003
\end{equation}
175: 
\noindent Hence, the probability of a deleterious mutation
anywhere in the gene is simply
\begin{equation}
P(\mathrm{deleterious}) = P(\mathrm{hitting\ exon}) \sum_{\alpha = 1}^{64}
P_{\alpha}\, P(d/\alpha) = 0.2344
\end{equation}
182: 
\noindent The survival probability is then

\begin{equation}
P(\mathrm{surviving}) = 1 - P(\mathrm{deleterious}) = 0.7656
\end{equation}
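
The product of the exon fraction and the sum over codon types can be
sketched as follows. This is only an illustration on a hypothetical toy
exon, not the LD78 sequence; the function names are assumptions, and the
exon is assumed to be a whole number of codons.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def p_change(codon):
    """P(d/alpha): chance that a single-base change alters the amino acid."""
    neighbours = [codon[:i] + b + codon[i + 1:]
                  for i in range(3) for b in BASES if b != codon[i]]
    return Fraction(sum(CODE[n] != CODE[codon] for n in neighbours), 9)

def p_deleterious(exon, intron_length):
    """P(hitting exon) times the sum over codon types of P_alpha * P(d/alpha)."""
    codons = [exon[i:i + 3] for i in range(0, len(exon), 3)]
    counts = Counter(codons)                     # N_alpha for each codon type
    p_exon = Fraction(len(exon), len(exon) + intron_length)
    total = sum(Fraction(n, len(codons)) * p_change(c)
                for c, n in counts.items())
    return p_exon * total

# Toy exon (hypothetical): two AAA codons and one GGG, plus 9 intron bases.
p_del = p_deleterious("AAAAAAGGG", 9)
print(p_del, 1 - p_del)  # 11/27 16/27
```

For the toy exon the sum is $(2/3)(8/9) + (1/3)(2/3) = 22/27$, and half of
the gene is exon, which gives $11/27$.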
188: 
\noindent If we take a population of $N_0$ genes (individuals),
then after $n$ mutations the number of surviving individuals is,
to a first approximation,

\begin{equation}
N_n \approx N_0 \, P(\mathrm{surviving})^n
\end{equation}
196: 
\noindent Hence, we obtain the ``probability of survival'' from
the slope of the logarithm of the number of surviving individuals
plotted against time:

\begin{equation}
\mathrm{slope} \approx \ln[P(\mathrm{surviving})] = -0.2670 \label{eq:slope}
\end{equation}
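
To make this step explicit, taking the natural logarithm of the decay law
gives
\[
\ln N_n \approx \ln N_0 + n \, \ln P(\mathrm{surviving}) ,
\]
which is a straight line in $n$ with intercept $\ln N_0$ and slope
$\ln P(\mathrm{surviving})$. With $P(\mathrm{surviving}) = 0.7656$ this
gives $\ln 0.7656 \approx -0.267$.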
204: 
In this calculation we made the simple assumption that after each
time step the genome remains the same as the wildtype. A mutation may
result in a different nucleotide sequence, but if this sequence codes
for the same amino acid, we assume that the mutation never happened
and go to the next stage. Hence, all living individuals can be
represented by the same array, the wildtype, as in the calculation.
However, in the less likely event of a harmless mutation, the number of
codons of each type changes, which in turn slightly changes the
probabilities. We have therefore designed a control simulation in which,
after each harmless mutation, the sequence is set back to the wildtype.
As this simulation gives the same results (within the error bars) as the
original case, in which the mutated sequence is kept, we have used the
mutated sequence in the later stages.
217: 
218: \section{SIMULATION\protect\\ }
219: \label{sec:level3}
220: 
In the simulation, an individual (a gene) is represented by an array
containing the integers 0, 1, 2 and 3 in place of the nucleotides
adenine (A), guanine (G), cytosine (C) and thymine (T; uracil (U) in
RNA), respectively, together with a sign bit which shows whether the
gene has a deleterious mutation (1) or not (0).
226: 
In each ``cycle'', each individual goes through one mutation event,
after which it is determined whether or not the individual should die.
In the mutation event, the site of the mutation and the mutant
nucleotide are chosen randomly. If the site is not in the exon part,
the sign bit remains 0. Otherwise, the changed codon is checked for the
amino acid it codes for. If it still codes for the same amino acid, the
protein can still be built; therefore the sign bit is not changed and
the individual survives. However, if the amino acid is changed, the
sign bit becomes 1, which means that this individual will be deleted.
The deletion time is recorded for each individual
[Fig.~\ref{flowchart}(a)]. In the control simulation, if the mutation
is harmless, the modification of the gene is reverted
[Fig.~\ref{flowchart}(b)].
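
The cycle described above can be sketched in a few lines. This is a
minimal illustration and not the program used for the paper. It assumes
that the exon is one contiguous block at the start of the gene, that the
intron is represented only by its length, that sequences are written with
DNA letters, and that harmless mutations are kept (the original case, not
the control). The function name \texttt{simulate} and the toy exon are
hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG ordering; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def simulate(exon, intron_len, n_individuals, n_cycles, rng):
    """Return the number of survivors after each cycle."""
    total_len = len(exon) + intron_len
    # Each living individual carries its own (possibly silently mutated) exon.
    alive = [list(exon) for _ in range(n_individuals)]
    history = []
    for _ in range(n_cycles):
        survivors = []
        for gene in alive:
            site = rng.randrange(total_len)
            if site >= len(gene):          # mutation fell in the intron
                survivors.append(gene)
                continue
            start = site - site % 3        # first base of the hit codon
            old_aa = CODE["".join(gene[start:start + 3])]
            gene[site] = rng.choice([b for b in BASES if b != gene[site]])
            new_aa = CODE["".join(gene[start:start + 3])]
            if new_aa == old_aa:           # same amino acid: survives
                survivors.append(gene)
        alive = survivors
        history.append(len(alive))
    return history

rng = random.Random(0)
exon = "ATGAAAGGGCTGTCC"   # hypothetical 5-codon exon, not the LD78 sequence
print(simulate(exon, intron_len=35, n_individuals=1000, n_cycles=5, rng=rng))
```

Individuals whose mutation changes the amino acid are simply dropped from
the list, which plays the role of setting the sign bit to 1.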
239: 
240: \begin{figure}[!]
241: \begin{center}
242: \includegraphics[width=14cm]{figure1.eps}
\caption{(a) Flowchart of the simulation.
(b) Flowchart of the control simulation.}
245: \label{flowchart}
246: \end{center}
247: \end{figure}
248: 
After all $N$ individuals have been processed, the number of surviving
individuals at each time step is calculated. Since the probability of
mutation is independent of the number of individuals, this number also
gives us the population size. Hence, we have an exponential population
decay whose exponent depends on the probability of surviving,
$P(\mathrm{surviving})$. The logarithm of the population is fitted to a
straight line and the slope of the line is calculated.
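
The straight-line fit can be sketched with an ordinary least-squares
slope. The fitting routine actually used is not specified in the text, so
the function \texttt{log\_slope} below is an assumption, and the data are a
synthetic, noise-free decay rather than simulation output.

```python
import math

def log_slope(population):
    """Least-squares slope of ln(population) against the time-step index."""
    points = [(t, math.log(n)) for t, n in enumerate(population) if n > 0]
    mean_t = sum(t for t, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((t - mean_t) * (y - mean_y) for t, y in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den

# Synthetic decay with survival probability 0.7656 per step.
population = [10000 * 0.7656 ** t for t in range(20)]
print(round(log_slope(population), 3))  # -0.267
```

Time steps with an empty population are skipped, since the logarithm of
zero is undefined.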
256: 
We ran all simulations ten times. The average of the slopes of the
control simulations is $-0.266 \pm 0.001$, which is very close to
the slope derived from the calculation. After the control, we ran the
simulation using the genetic code of Nature. One example of such a run
is shown in Figure~\ref{slope}. The average of the slopes for Nature's
simulation is $-0.266 \pm 0.001$.
263: 
264: \begin{figure}[!]
265: \begin{center}
266: \includegraphics[width=14cm]{figure2.eps}
\caption{Population decrease in one of the simulations using the
amino acid table of Nature.}\label{slope}
269: \end{center}
270: \end{figure}
271: 
272: \section{ARTIFICIAL TABLES\protect\\ }
273: \label{sec:level4}
274: 
With a few exceptions, twenty different kinds of amino acids
are used to build proteins. Even though in some rare
cases certain organisms use selenocysteine and pyrrolysine,
the majority of organisms in Nature use the same table. Recently, a
team of investigators at the Scripps Research Institute modified a
form of the bacterium \textit{Escherichia coli} to use a 22-amino-acid
genetic code instead of 20 \cite{Anderson21930}. They
engineered the modified form of \textit{E. coli} to make myoglobin
proteins with 22 amino acids, using the unnatural amino acids
O-methyl-L-tyrosine and L-homoglutamine in addition to the
naturally occurring 20. This work opens up the possibility
that the same procedure can be used to expand the amino acid
family even further. So the question is: why did life stop
at twenty amino acids? To investigate the importance
of the number of amino acids, we create amino acid tables
based on Nature's table but with different numbers of amino acids,
and we use them in the simulation.
292: 
If we change the number of amino acids in the genetic code, we
change the amount of information in the genome.
Hence, to conserve the information, the genome length needs to be
adjusted as well. Moreover, if we want our simulation to represent
Nature, the process of extending or shortening the amino acid
table needs to obey some rules of biochemistry.
299: 
The twenty amino acids have twenty different side chains with
different chemical properties. This allows proteins to have
a great variety of structures and properties. There are
several classes of side chains, grouped by their dominant chemical
features. While developing the tables, we try to make them fit the
natural structure of amino acids by obeying the classification in
\cite{Matthews}.
307: 
We have used amino acid tables with both more and fewer amino acids
than Nature's table. To shorten the amino acid table, we first randomly
choose which amino acids will be removed from the table. The random
choice is made such that no two amino acids are removed from the
same group (as long as the number of removed amino acids is less than
the number of amino acid groups).
314: 
For example, in the table which has 18 amino acids, Glutamine and
Isoleucine are removed. Glutamine is in the acidic group and its
frequency of occurrence is $3.9\%$. The codons which formerly coded for
Glutamine (CAA, CAG) now code for Glutamic Acid, which is also in the
acidic group and has a frequency of $6.2\%$. Isoleucine is in the
aliphatic group and its frequency is $4.6\%$. The codons which formerly
coded for Isoleucine (AUU, AUC, AUA) now code for Glycine, which is also
in the aliphatic group and has a frequency of $7.5\%$.
323: 
To conserve the information content of the gene, the gene should be
lengthened. As an example, by reassigning the codons CAA and CAG from
Glutamine to Glutamic Acid we lose the information carried by
Glutamine. We therefore take Asparagine and Aspartic Acid, which are
also in the acidic group, and insert them where Glutamine originally
was. The same procedure was repeated to decrease the number of amino
acids to 16 and then to 14.
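
The codon reassignment for the 18-amino-acid table can be sketched as a
simple relabelling of Nature's codon table. The sketch covers only the
reassignment, not the lengthening of the gene; the function
\texttt{merge\_amino\_acids} is a hypothetical helper, and amino acids are
written in the usual one-letter symbols.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG ordering; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NATURE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def merge_amino_acids(code, merges):
    """Return a new codon table in which the codons of each removed amino
    acid are reassigned to the amino acid that absorbs it."""
    return {codon: merges.get(aa, aa) for codon, aa in code.items()}

# Q = Glutamine -> E = Glutamic Acid, I = Isoleucine -> G = Glycine.
table18 = merge_amino_acids(NATURE, {"Q": "E", "I": "G"})
n_amino = len(set(table18.values()) - {"*"})
print(n_amino)  # 18
```

All 64 codons remain assigned, so only the number of distinct amino acids
changes.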
331: 
To extend the amino acid table, we first determine which consecutive
amino acid pair in the gene has the highest frequency. To do this, we
count the occurrences of each pair and construct the pair matrix.
Then, each frequent pair is replaced by a new amino acid. For example,
the Leucine--Leucine pair has the highest frequency, and each such pair
is replaced by a new amino acid called New1. Similarly,
Threonine--Serine pairs are replaced by New2. Leucine is from the
aliphatic group, and New1 is constructed by dividing the codons of
Alanine, which is also from the aliphatic group and is represented by
the highest number of codons. Now the codons GCU and GCC still code for
Alanine, but the remaining ones (GCA, GCG) code for New1.
Similarly, New2 is formed by dividing Serine: the codons UCU, UCC, UCA
and UCG still code for Serine, but the others (AGU and AGC) code for
New2. The tables with 24, 26 and 28 amino acids are constructed in the
same way.
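
Counting consecutive pairs can be sketched as follows. Whether the pairs
are counted in overlapping windows is not stated in the text, so
overlapping windows are an assumption here, as are the function name
\texttt{pair\_counts} and the toy sequence.

```python
from collections import Counter

def pair_counts(protein):
    """Count consecutive (overlapping) amino acid pairs in a sequence."""
    return Counter(zip(protein, protein[1:]))

# Hypothetical toy sequence in one-letter symbols (L = Leucine,
# T = Threonine, S = Serine, A = Alanine); it is not the LD78 protein.
counts = pair_counts("ALLTSLLLA")
print(counts.most_common(1))  # [(('L', 'L'), 3)]
```

The counter plays the role of the pair matrix: its most frequent entry is
the pair that would be replaced by a new amino acid.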
346: 
As control cases, we have also run simulations with different amino
acid tables, with both increased and decreased numbers of amino acids,
in which we neglected the conservation of information and kept the
genome length constant. As expected, the results of these simulations
were very close (within error bars) to the results of the original table.
352: 
Biologists have also been trying to find simplified amino acid alphabets.
One such method is that of Wang and Wang \cite{Wang21950}, which is
based on the Miyazawa--Jernigan (MJ) matrix of residue--residue
potentials \cite{Miyazawa21960}. Their reduction algorithm, which
connects different representations of a protein, is based on
the idea that the amino acids can be distributed into different groups
with different interactions. The interactions between amino acids of two
different groups should have similar characteristics for a successful
reduction.
362: 
Another method is that of Murphy, Wallqvist and Levy
\cite{Murphy21940}, which uses the BLOSUM50 matrix derived by Henikoff
and Henikoff \cite{Henikoff7460}. Their reduction scheme is based on the
analysis of correlations among the elements of the similarity matrix
used for sequence alignment. We have constructed reduced amino acid
tables using both the MJ matrix and the BLOSUM50 matrix methods.
369: 
370: \section{CONCLUSION\protect\\ }
371: \label{sec:level6}
372: 
373: In this paper, we developed a computer simulation which represents
374: a living organism under mutations. Furthermore, we changed the
375: genetic code used in the simulations to analyze its effect on
376: population stability.
377: 
378: All the results of different simulations are summarized in
379: Table \ref{results} and plotted in Figure \ref{resultfig}.
380: 
\begin{table}[!]
\begin{center}
\caption{Results of the simulations using different genetic code
tables. The ``probability of survival'' is the slope of the logarithm
of the population versus time.}
\label{results}
\begin{tabular}{|l|c|}
\hline Genetic code table & ``Probability of survival'' (slope) \\
\hline Nature (20 amino acids) & $-0.266 \pm 0.001$\\
\hline 18 amino acids & $-0.281 \pm 0.001$\\
\hline 16 amino acids & $-0.291 \pm 0.004$\\
\hline 14 amino acids & $-0.313 \pm 0.004$\\
\hline 18 amino acids (MJ matrix) & $-0.282 \pm 0.002$\\
\hline 16 amino acids (MJ matrix) & $-0.287 \pm 0.003$\\
\hline 14 amino acids (MJ matrix) & $-0.320 \pm 0.003$\\
\hline 18 amino acids (BLOSUM50 matrix) & $-0.288 \pm 0.003$\\
\hline 16 amino acids (BLOSUM50 matrix) & $-0.294 \pm 0.003$\\
\hline 14 amino acids (BLOSUM50 matrix) & $-0.302 \pm 0.004$\\
\hline 22 amino acids & $-0.265 \pm 0.001$\\
\hline 24 amino acids & $-0.265 \pm 0.001$\\
\hline 26 amino acids & $-0.273 \pm 0.001$\\
\hline 28 amino acids & $-0.291 \pm 0.002$\\
\hline
\end{tabular}
\end{center}
\end{table}
406: 
407: \begin{figure}[!]
408: \begin{center}
409: \includegraphics[width=14cm]{figure3.eps}
\caption{``Probabilities of survival'' for the different genetic code
tables. The fit to a parabola is only a guide to the eye.}
412: \label{resultfig}
413: \end{center}
414: \end{figure}
415: 
The results for the shorter amino acid tables show that, if we
conserve the information, a population represented by
fewer amino acids is affected more severely by mutations. However,
if we disregard the information carried by the gene (the
simulations with constant genome length), the population is not
affected much.
422: 
Even if we use different reduction algorithms (MJ or BLOSUM50) for
the genetic code, the population is affected by mutations more than
the population represented by the genetic code of Nature.
426: 
When we extend the amino acid table, we shorten the gene, which
means that we try to conserve the information. The
resistance of the population against mutations does not change
when the number of amino acids is 22 or 24. Beyond 24, however, the
resistance starts to decrease.
432: 
The slopes of the simulations with 20, 22 and 24 amino acids are
very close. These results are in line with the results
of the investigators from the Scripps Research Institute
\cite{Anderson21930} and provide computational justification for their
belief that genetic codes with even more amino acids are feasible.
However, the number of amino acids will be restricted to 20, 22 or 24 if
we want the resulting life form to be resilient against mutations.
440: 
441: \section{ACKNOWLEDGEMENTS}
We are grateful to Dr. Isil Aksan Kurnaz and Dr. Muhittin Mungan for
their contributions to the model and the calculations.
444: 
445: 
446: 
447: \begin{thebibliography}{10}
448: \expandafter\ifx\csname bibnamefont\endcsname\relax
449:   \def\bibnamefont#1{#1}\fi
450: \expandafter\ifx\csname bibfnamefont\endcsname\relax
451:   \def\bibfnamefont#1{#1}\fi
452: \expandafter\ifx\csname url\endcsname\relax
453:   \def\url#1{\texttt{#1}}\fi
454: \expandafter\ifx\csname urlprefix\endcsname\relax\def\urlprefix{URL }\fi
455: \providecommand{\bibinfo}[2]{#2}
456: \providecommand{\eprint}[2][]{\url{#2}}
457: 
458: \bibitem{Penna}
459: \bibinfo{author}{\bibfnamefont{T. J. P.} \bibnamefont{Penna}},
460:   \bibinfo{journal}{J. Stat. Phys.} \textbf{\bibinfo{volume}{78}},
461:   \bibinfo{pages}{1629} (\bibinfo{year}{1995}).
462: 
463: \bibitem{Stauffer13580}
464: \bibinfo{author}{\bibfnamefont{D.} \bibnamefont{Stauffer}},
465:   \bibinfo{journal}{Physica A} \textbf{\bibinfo{volume}{273}},
466:   \bibinfo{pages}{132} (\bibinfo{year}{1999}).
467: 
468: \bibitem{Stauffer13600}
469: \bibinfo{author}{\bibfnamefont{D.} \bibnamefont{Stauffer}},
470:   \bibinfo{author}{\bibfnamefont{P. M. C.} \bibnamefont{de Oliveira}},
471:   \bibinfo{author}{\bibfnamefont{S. M.} \bibnamefont{de Oliveira}} \bibnamefont{and}
472:   \bibinfo{author}{\bibfnamefont{R. M. Z.} \bibnamefont{dos Santos}},
473:   \bibinfo{journal}{Physica A} \textbf{\bibinfo{volume}{231}},
474:   \bibinfo{pages}{504} (\bibinfo{year}{1996}).
475: 
476: \bibitem{Stauffer13570}
477: \bibinfo{author}{\bibfnamefont{D.} \bibnamefont{Stauffer}},
478:   \bibinfo{author}{\bibfnamefont{P. M. C.} \bibnamefont{de Oliveira}},
479:   \bibinfo{author}{\bibfnamefont{S. M.} \bibnamefont{de Oliveira}}, 
480:   \bibinfo{author}{\bibfnamefont{T. J. P.} \bibnamefont{Penna}} \bibnamefont{and}
481:   \bibinfo{author}{\bibfnamefont{J. S.} \bibnamefont{Sa Martins}},
482:   \bibinfo{journal}{Anais Da Academia Brasileira De Ciencias} \textbf{\bibinfo{volume}{73}},
483:   \bibinfo{pages}{15} (\bibinfo{year}{2001}).
484: 
485: \bibitem{Sousa680}
486: \bibinfo{author}{\bibfnamefont{A. O.} \bibnamefont{Sousa}},
487:   \bibinfo{author}{\bibfnamefont{S. M.} \bibnamefont{de Oliveira}} \bibnamefont{and}
488:   \bibinfo{author}{\bibfnamefont{J. S.} \bibnamefont{Sa Martins}},
489:   \bibinfo{journal}{Phys. Rev. E} \textbf{\bibinfo{volume}{67}},
490:   \bibinfo{pages}{32903} (\bibinfo{year}{2003}).
491: 
492: \bibitem{Tuzel2770}
493: \bibinfo{author}{\bibfnamefont{E.} \bibnamefont{Tuzel}},
494:   \bibinfo{author}{\bibfnamefont{V.} \bibnamefont{Sevim}} \bibnamefont{and}
495:   \bibinfo{author}{\bibfnamefont{A.} \bibnamefont{Erzan}},
496:   \bibinfo{journal}{Proc. Natl. Acad. Sci.} \textbf{\bibinfo{volume}{98}},
497:   \bibinfo{pages}{13774} (\bibinfo{year}{2001}).
498: 
499: \bibitem{Tuzel21970}
500: \bibinfo{author}{\bibfnamefont{E.} \bibnamefont{Tuzel}},
501:   \bibinfo{author}{\bibfnamefont{V.} \bibnamefont{Sevim}} \bibnamefont{and}
502:   \bibinfo{author}{\bibfnamefont{A.} \bibnamefont{Erzan}},
503:   \bibinfo{journal}{Phys. Rev. E} \textbf{\bibinfo{volume}{64}},
504:   \bibinfo{pages}{061908} (\bibinfo{year}{2001}).
505: 
506: \bibitem{Orcal3410}
507: \bibinfo{author}{\bibfnamefont{B.} \bibnamefont{Orcal}},
508:   \bibinfo{author}{\bibfnamefont{E.} \bibnamefont{Tuzel}},
509:   \bibinfo{author}{\bibfnamefont{V.} \bibnamefont{Sevim}},
510:   \bibinfo{author}{\bibfnamefont{N.} \bibnamefont{Jan}} \bibnamefont{and}
511:   \bibinfo{author}{\bibfnamefont{A.} \bibnamefont{Erzan}},
512:   \bibinfo{journal}{Int. J. Mod. Phys. C} \textbf{\bibinfo{volume}{11}},
513:   \bibinfo{pages}{973} (\bibinfo{year}{2000}).
514: 
515: \bibitem{Penna2510}
516: \bibinfo{author}{\bibfnamefont{T. J. P.} \bibnamefont{Penna}},
  \bibinfo{author}{\bibfnamefont{A.} \bibnamefont{Racco}} \bibnamefont{and}
  \bibinfo{author}{\bibfnamefont{A. O.} \bibnamefont{Sousa}},
519:   \bibinfo{journal}{Physica A} \textbf{\bibinfo{volume}{295}},
520:   \bibinfo{pages}{31} (\bibinfo{year}{2001}).
521: 
522: \bibitem{Penna13590}
523: \bibinfo{author}{\bibfnamefont{T. J. P.} \bibnamefont{Penna}} \bibnamefont{and}
524:   \bibinfo{author}{\bibfnamefont{D.} \bibnamefont{Stauffer}},
525:   \bibinfo{journal}{Zeitschrift Fur Physik B-Condensed Matter} \textbf{\bibinfo{volume}{101}},
526:   \bibinfo{pages}{469} (\bibinfo{year}{1996}).
527: 
528: \bibitem{Penna13610}
529: \bibinfo{author}{\bibfnamefont{T. J. P.} \bibnamefont{Penna}},
530:   \bibinfo{author}{\bibfnamefont{S. M.} \bibnamefont{de Oliveira}} \bibnamefont{and}
531:   \bibinfo{author}{\bibfnamefont{D.} \bibnamefont{Stauffer}},
532:   \bibinfo{journal}{Phys. Rev. E} \textbf{\bibinfo{volume}{52}},
533:   \bibinfo{pages}{R3309} (\bibinfo{year}{1995}).
534: 
535: \bibitem{Huang3140}
536: \bibinfo{author}{\bibfnamefont{Z. F.} \bibnamefont{Huang}} \bibnamefont{and}
537:   \bibinfo{author}{\bibfnamefont{D.} \bibnamefont{Stauffer}},
538:   \bibinfo{journal}{Theory in Biosciences} \textbf{\bibinfo{volume}{120}},
539:   \bibinfo{pages}{21} (\bibinfo{year}{2001}).
540: 
541: \bibitem{gene}
\url{http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide\&val=285912}.
543: 
544: \bibitem{Jukes21920}
545: \bibinfo{author}{\bibfnamefont{T. H.} \bibnamefont{Jukes}} \bibnamefont{and}
  \bibinfo{author}{\bibfnamefont{C. R.} \bibnamefont{Cantor}},
  in \emph{\bibinfo{title}{Mammalian Protein Metabolism}}, edited by H. N. Munro
  (\bibinfo{publisher}{Academic Press, New York},
  \bibinfo{year}{1969}), p.~\bibinfo{pages}{21}.
550: 
551: \bibitem{Anderson21930}
552: \bibinfo{author}{\bibfnamefont{J. C.} \bibnamefont{Anderson}},
553:   \bibinfo{author}{\bibfnamefont{N.} \bibnamefont{Wu}},
554:   \bibinfo{author}{\bibfnamefont{S. W.} \bibnamefont{Santoro}},
555:   \bibinfo{author}{\bibfnamefont{V.} \bibnamefont{Lakshman}},
556:   \bibinfo{author}{\bibfnamefont{D. S.} \bibnamefont{King}} \bibnamefont{and}
557:   \bibinfo{author}{\bibfnamefont{P. G.} \bibnamefont{Schultz}},
558:   \bibinfo{journal}{Proc. Natl. Acad. Sci.} \textbf{\bibinfo{volume}{101}},
559:   \bibinfo{pages}{7566} (\bibinfo{year}{2004}).
560: 
561: \bibitem{Matthews}
562: \bibinfo{author}{\bibfnamefont{T. J.}~\bibnamefont{Matthews}},
563:   \emph{\bibinfo{title}{Biochemistry}} (\bibinfo{publisher}{Addison
  Wesley Longman, San Francisco}, \bibinfo{year}{2000}).
565: 
566: \bibitem{Wang21950}
567: \bibinfo{author}{\bibfnamefont{J.} \bibnamefont{Wang}} \bibnamefont{and}
  \bibinfo{author}{\bibfnamefont{W.} \bibnamefont{Wang}},
  \bibinfo{journal}{Nature Structural Biology} \textbf{\bibinfo{volume}{6}},
570:   \bibinfo{pages}{1033} (\bibinfo{year}{1999}).
571: 
572: \bibitem{Miyazawa21960}
573: \bibinfo{author}{\bibfnamefont{S.} \bibnamefont{Miyazawa}} \bibnamefont{and}
  \bibinfo{author}{\bibfnamefont{R. L.} \bibnamefont{Jernigan}},
575:   \bibinfo{journal}{J. Mol. Biol.} \textbf{\bibinfo{volume}{256}},
576:   \bibinfo{pages}{623} (\bibinfo{year}{1996}).
577: 
578: \bibitem{Murphy21940}
579: \bibinfo{author}{\bibfnamefont{L. R.} \bibnamefont{Murphy}},
580:   \bibinfo{author}{\bibfnamefont{A.} \bibnamefont{Wallqvist}} \bibnamefont{and}
581:   \bibinfo{author}{\bibfnamefont{R. M.} \bibnamefont{Levy}},
582:   \bibinfo{journal}{Prot. Eng.} \textbf{\bibinfo{volume}{13}},
583:   \bibinfo{pages}{149} (\bibinfo{year}{2000}).
584: 
585: \bibitem{Henikoff7460}
586: \bibinfo{author}{\bibfnamefont{S.} \bibnamefont{Henikoff}} \bibnamefont{and}
  \bibinfo{author}{\bibfnamefont{J. G.}~\bibnamefont{Henikoff}},
588:   \bibinfo{journal}{Proc. Natl. Acad. Sci.} \textbf{\bibinfo{volume}{89}},
589:   \bibinfo{pages}{10915} (\bibinfo{year}{1992}).
590: 
591: \end{thebibliography}
592: 
593: 
594: 
595: \end{document}
596: