1: \documentclass[11pt]{article}
2: \newif\ifPDF
3: \ifx\pdfoutput\undefined\PDFfalse
4: \else\ifnum\pdfoutput >0\PDFtrue
5: \else\PDFfalse
6: \fi
7: \fi
8:
9: \ifPDF
10: \usepackage{amssymb}
11: \usepackage{amsfonts}
12: \usepackage[pdftex]{graphicx,color}
13: \else
14: \usepackage{amssymb}
15: \usepackage{amsfonts}
16: \usepackage[dvips]{graphicx}
17: \fi
18: %\usepackage{amssymb}
19: %\usepackage{amsfonts}
20: %\usepackage{graphics}
21: \setlength{\textwidth}{16cm}
22: \setlength{\textheight}{23cm}
23: \renewcommand{\baselinestretch}{1.0}
24: \addtolength{\oddsidemargin}{-15mm}
25:
26: \setlength{\topmargin}{-60pt}
27: \begin{document}
28: \begin{titlepage}
29:
30: August 2005
31: \vskip 1.6in
32: \begin{center}
33: {\Large {\bf Salerno's model of DNA reanalysed: could solitons
34: have biological significance?}}
35: \\[5pt]
36: \end{center}
37:
38: \normalsize
39: \vskip .4in
40:
41: \begin{center}
42: J. D. Bashford \\
43: {\it School of Mathematics and Physics, University of Tasmania} \\
44: {\it Private Bag 37, Hobart 7001, Tasmania Australia} \\
45: %\par \vskip .1in \noindent
46: %
47:
48: \end{center}
49: \par \vskip .3in
50:
51: \begin{center}
52: {\Large {\bf Abstract}}\\
53: \end{center}
54: We investigate the sequence-dependent behaviour of localised excitations
55: in a toy, nonlinear model of DNA base-pair opening originally proposed by
56: Salerno. Specifically we ask whether ``breather'' solitons could play a role
57: in the facilitated location of promoters by RNA polymerase.
58: In an effective potential formalism, we find excellent correlation between
59: potential minima and {\em Escherichia coli} promoter recognition sites in the
60: T7 bacteriophage genome. Evidence for a similar relationship between phage
61: promoters and downstream coding regions is found and alternative reasons
62: for links between AT richness and transcriptionally-significant sites are
63: discussed.
64: Consideration of the soliton energy of translocation provides a novel
65: dynamical picture of sliding: steep potential gradients correspond to
66: deterministic motion, while ``flat'' regions, corresponding to homogeneous AT
67: or GC content, are governed by random, thermal motion.
68: Finally we demonstrate an interesting equivalence between planar, breather
69: solitons and the helical motion of a sliding protein ``particle'' about a
70: bent DNA axis.
71:
72: \vspace{3cm}
73: {\bf Keywords}: DNA soliton, RNA polymerase, sliding, bacteriophage T7\\
74: \end{titlepage}
75:
76: \section{Introduction}
77: Protein-DNA interactions play many of the fundamental roles in gene
78: regulation. An understanding of the mechanisms involved in these
79: processes is one of the major current goals for numerous biological sciences.
80: With large repositories of genetic information available - and costs
81: associated with difficult, highly specific experiments - the question of
82: how well such molecular interactions can be simulated is clearly important
83: to investigate. Enzymes and many transcriptional factors are proteins, often
84: composed of tens to hundreds of amino acids, while the DNA domains to which
85: they bind can contain, in the case of prokaryotes, up to $10^{5}$ nucleotide bases.
86: All-atom modelling of such a molecular complex, even neglecting the key roles of hydration and
87: ions, is beyond current computational ability.
88:
89: An alternative, logical first step is to consider one specific kind of
90: interaction, focussing exclusively on its salient features and develop
91: an accordingly simplified model. In this spirit, simple dynamical models of
92: DNA have been studied for almost two decades, most successfully in regard
93: to describing denaturation experiments (see Ref. \cite{pey} for a review). The process by which the motions of small DNA molecules containing $\sim 10^{6}$
atoms are ``coarsened'' to the nucleotide base-pair level has
previously been argued qualitatively \cite{Gaeta1}. Typically the DNA
96: molecule is modelled by one degree of freedom per base-pair: a radial
97: ``stretching''\cite{PB1} or a pendulum-like base ``flipping'' (Ref. \cite{Yak} being a recent review).
98:
99: We note, in this context, that {\em ab initio} calculations of small DNA
100: oligomers \cite{dinuc} suggest that base-pair motions can be accurately
101: approximated at the dinucleotide level in terms of two or three quasi-rigid,
102: internal degrees of freedom, lending some weight to the coarse-graining
103: assumptions previously made.
104: The motivation for these simple models is that, if experimental results
105: can be described with a small number of degrees of freedom, then these
106: degrees of freedom must be the dominant ones for the process in question.
107: The applications of such a model are necessarily restricted to extremely
108: specific instances of DNA behaviour \cite{Gaeta1}. Given that many regulatory
109: processes are governed by highly-specific, localised denaturation of the
110: DNA helix, it is logical to investigate sequence-dependent dynamical
111: behaviours in such a setting.
112:
113: In 1991 Salerno \cite{S1} proposed a base-flipping model of the bacterial
114: promoter DNA sequence $A_{1}$ in the T7 genome, suggesting the sequence had
115: special, ``dynamically active'' qualities with regard to propagating kink
solitons. Subsequent investigations of other host-specific T7 promoter sequences \cite{S2}-\cite{S3b} made similar findings. Moreover, in Ref. \cite{S3b} it was suggested that solitons could be created as conformational changes to the DNA helix due to DNA-RNAP interactions.
Recently the propagation of kinks through the entire T7 genome sequence has been studied \cite{kl}, although the (significant) differences between host- and phage-specific promoter sequences were neglected. Another paper \cite{sanchez} investigated whether kinks might propagate differently in coding and non-coding sequences.
118:
Solitons had previously been suggested, in the 1980s, to have a role in DNA
transcription \cite{englander}. However, at least one picture which developed
\cite{riv}, of an RNA polymerase (RNAP) molecule ``surfing'' a
thermally-driven region of open base pairs, is inconsistent with the known
conformational changes of RNAP and DNA which occur during open complex
formation. Secondly, no absorption resonances have been observed in
microwave spectroscopy of DNA \cite{edwards}, \cite{bigio}. Crucially, the
original motivation for invoking solitons, namely the anomalously long lifetimes of
DNA base-pair openings, was shown to result from misinterpretation of data
\cite{gueron}. DNA solitons were vigorously disputed by some researchers
\cite{Kam}, with the result that they remain little more than a curiosity
outside of the nonlinear physics community.
131:
132: More generally, a variety of studies \cite{ben3}, \cite{PB2}, \cite{bmc}
133: have suggested connections between the base-sequence dependence of helical
134: thermal stability and transcriptional regulation sites.
135: The common feature in all these studies, soliton models included, is that
136: AT-rich regions are distinguished by way of reduced thermal stability
137: relative to GC-rich regions.
138: Of course regions rich in AT might stand out from a regulatory
139: perspective for geometric, mechanical or chemical reasons. For example, tracts of A or T nucleotides carry intrinsic
140: curvature \cite{trif} and confer rigidity to the DNA superstructure
141: \cite{anselmi} - \cite{lankas}, leading to significant departures
142: from the average B-DNA form.
143: Finally the role of counterions in determining DNA structure such as bending
144: and groove width \cite{naplus} and affinity for such tracts is not yet fully
145: understood. Sequence-dependent variations in the electrostatic surface of
146: DNA may also present a unique ``signature'' in promoter regions \cite{proz}.
147:
148: It should be emphasised that the solitons we consider have nothing to do with
149: thermally-driven, transient base-pair openings, the source of the original
150: controversy. Protein-DNA interactions involve conformational changes of the
151: DNA helix and in our opinion it is logical to investigate, if a small change
152: might be modelled by a structural perturbation to a regular B-DNA helix,
153: whether its translocation might be approximated by a soliton propagating
154: through a nonlinear medium.
155:
156: The structure of the paper is as follows: We briefly discuss aspects of
157: protein-sliding and review the mechanism of lytic infection of {\em Escherichia coli} by T7 bacteriophage.
158: Assuming DNA molecules can actually support nonlinear, quasi-solitonic
excitations like those of the simple ``base-flipping'' model \cite{S1}, we discuss the kind of biological roles they might serve.
160: To this end we introduce the inhomogeneous Frenkel-Kontorova (IFK) model
161: \cite{kbk}, the basis of Salerno's approach \cite{S1}-\cite{S3}, and its
162: breather soliton solutions.
163: The propagation of breather solutions is analysed via an effective potential
164: formalism and we compute the energy landscape, comparing extrema with
165: bacterial and phage promoters of the T7 genome.
166: Supposing then, that solitonic excitations of DNA do not exist, we discuss
167: alternative reasons why the correlations obtained between regulatory features
168: and potential extrema might have been obtained. In particular a novel
169: equivalence between base-flipping stability in the planar IFK model and
170: bending in a 3-dimensional, helical model is outlined.
171:
172: \subsection{Protein sliding}
173: The mechanisms by which regulatory proteins, such as RNAP, can
174: recognise their specific binding sites among tens, or even hundreds of
175: thousands, of structurally identical nonspecific sites on a DNA strand are
176: generally not well understood. A widely accepted
177: hypothesis is that many proteins have several modes \cite{vonhip} with
178: which to bind to DNA:
179:
180: Nonspecific binding occurs when a polar domain of the protein displaces
181: cations in the major/minor grooves of the DNA helix. The protein effectively
182: slides, one-dimensionally, along the groove through a series of nonspecific
183: binding events. The translocation mechanism for the nonspecific complex
184: is not known, but is ATP-independent \cite{proz} and widely assumed to be
185: driven by thermal motion.
186: Other possible diffusion modes include ``hopping'', where a nonspecifically
187: bound protein dissociates and re-associates within the same DNA domain.
188: For some proteins, such as {\em lac} repressor \cite{lac1}, a possibility for
189: transfer between sequentially-distant regions of DNA brought
190: close together in 3-dimensional space also exists. In general, the
191: facilitated location of an operator by a protein is likely to consist of a
192: sequence of sliding, hopping or intersegmental transfer events \cite{vonhip}.
A growing body of evidence exists that many
regulatory proteins, such as repressors \cite{lac1}, \cite{p53}, RNAPs \cite{wu}-\cite{bustamente}, nucleases \cite{ecorv} and methylases \cite{meth}, locate
their operator sites in this way.
196:
197: While the sliding component of facilitated target location is commonly assumed
198: to be thermally driven we point out there is no experimental evidence to
199: preclude dynamical effects resulting from local, sequence-dependent mechanical
200: properties of DNA.
201: In the seminal paper of Berg and co-workers \cite{berg} there are two
202: assumptions, in particular, which may be unsatisfactory. Firstly the
203: facilitated transport model is derived under the assumption of
204: a homogeneous, free protein distribution. While such conditions can be
205: arranged {\em in vitro} this is not the case for a biological system.
206: Secondly there is no real account of degrees of molecular recognition:
207: operator sites are treated as ``sinks'', defining a boundary condition for the
208: one-dimensional diffusion equation. It is highly probable that some kind of
209: ``reading'' process also occurs, mediated by the electrostatic interactions
210: between protein residues and nucleotide functional groups lying in one, or
211: both, of the grooves.
212:
213: The limited empirical studies of sliding RNAP performed to date, for example
214: Refs \cite{wu}-\cite{bustamente}, invariably average behaviour over many
215: individual sliding events, masking any sequence-dependent variation which
216: might exist. On the other hand a recent model of the hypothetical reading
217: process \cite{S4} was built, based upon the assertion that
218: \begin{quote}...the protein
219: should follow a noise-influenced, sequence-dependent motion that includes the
220: possibility of slowing down, pauses and stops...
221: \end{quote}
222:
223: Let us qualitatively envisage then, how soliton-like deformations might arise
in RNAP-DNA interactions. The presence of enzymes as ``mass defects'' in a
1-dimensional DNA model has previously been considered \cite{satar}, \cite{ting0}, \cite{ting1} with regard to thermal breathers and transcribing RNAP.
226: In distinction, initial binding to a nonspecific DNA site entails
227: insertion of polymerase domains into the major groove of the helix, where the
228: displacement of counterions occurs. Suppose that the initial contact
229: and recoil of RNAP during association induces a localised deformation in the
230: B-DNA helix, which we approximate by a breather soliton.
231: Breather excitations in the IFK model \cite{kbk} are, due to the discreteness
232: of the model, inherently unstable. Therefore the initially stationary breather would
233: propagate along the strand, preferentially in a direction determined by local
234: inhomogeneities in the base sequence. Further, the deformation is not a true
235: soliton, owing to the discreteness of the DNA lattice, and radiates
energy, eventually dissipating. In this regard we also note that the mean sliding
distances of RNAPs are known to be highly sensitive to variations in cation
concentration \cite{wu2}, \cite{smeekins}.
239:
240: There are two pictures which are plausibly consistent with noisy,
241: deterministic dynamics: either the RNAP can effectively ``surf'' the breather
242: or that randomly moving RNAP and deterministically travelling breathers can
243: interact somehow on collision. Regarding the latter, little is known about the
244: structure of nonspecific protein-DNA complexes and for the remainder of
245: the paper we consider the former, less speculative, scenario.
246:
247: \subsection{T7 bacteriophage}
Bacteriophage T7 is a member of the {\em Podoviridae} family of viruses, which cause lytic infection of bacteria. Its simple regulatory apparatus is one of the most widely studied, serving as a model for genomes of more complex organisms.
249: As mentioned above, previous nonlinear DNA studies \cite{S1}-\cite{S3}, \cite{kl} have
250: involved the T7 phage genome sequence.
251: However these studies focussed exclusively on the base sequence, with no
252: consideration for the possible changing biological context of the information
253: it contains. In fact the essentially linear processes of T7 DNA
254: translocation and gene expression make this phage an excellent case study.
255:
256: T7 is known to inject its double-stranded, linear DNA into a host {\em E. coli} cell in a stepwise, transcription-dependent manner \cite{inf1}.
The T7 genome contains 39,937 base pairs but initially only the first 850 of
these bases are translocated from the phage particle \cite{inf2}.
259: This initial fragment contains three strong promoters specific to {\em E. coli} RNAP $A_{1}-A_{3}$ (in addition to the minor $A_{0}$, or $D$, promoter with
260: no known {\em in vivo} function) which initiate transcription of the phage
sequence. The remainder of the genome is divided into three sections:
the ``early'' region, containing the class I genes responsible for modifying
host metabolism to favour phage production; the ``middle'' region, where
class II genes govern phage replication; and the ``late'' region of class III
genes, which drive maturation and packaging of newly assembled phage DNA strands.
266:
267: Transcription of the initial fragment serves a dual function: ``pulling''
268: downstream, early DNA from the phage particle into the host cell, in
269: addition to transcribing the class I genes.
270: The product of the first of these, gene 0.3, inactivates host defence
271: (specifically type I restriction/modification) systems,
272: therefore rapid recognition of a major promoter is vital for successful
273: infection by wild-type T7.
274: Another product of this early region is a T7 RNA polymerase, recognising
275: its own specific promoters, which is responsible for transcribing the
remainder of the genome, a process which proceeds in two steps.
Entry of the middle region into the host cell is dependent upon the
278: successful translation of class I genes. In turn, translocation of the
279: late-transcribed region requires the products of early and mid-regions.
280:
In contrast to gene expression in more complex organisms, there are very
few ``feedback'' loops; indeed, a virtual simulation of the T7 life cycle has
been developed \cite{endy}. There are two known loops which may
have relevance to our analysis below: mid-late inhibition of class I (host-specific) promoters \cite{hes} and late inhibition of the mid (phage-specific) promoters \cite{lys}.
285: \section{The model}
At physiological temperatures the physically dominant mode of base-pair opening is base flipping: pendular oscillations of the bases about their N-glycosidic
bond in the mean base-pair plane. Models of this kind previously considered
for biological roles \cite{S1}-\cite{sanchez} are based upon the IFK \cite{kbk} Hamiltonian:
289: \begin{eqnarray}
290: {\cal H} & = & \frac{1}{2} \sum_{i=1}^{n}
291: I_{i}(\dot{\theta}_{i}^{2} + \dot{\psi}_{i}^{2})
292: + \frac{1}{2} \sum_{i=1}^{n-1}( \kappa_{i}
293: (\theta_{i+1}-\theta_{i})^{2} + \bar{\kappa}_{i}
294: (\psi_{i+1}-\psi_{i})^{2} ) \nonumber \\
295: & & + \sum_{i=1}^{n} \sigma_{i}(1-\cos(\theta_{i}-\psi_{i})). \label{ifk}
296: \end{eqnarray}
297: Here $\theta_{i}$, $\psi_{i}$ are the angles of deflection of the
298: $i^{th}$ base ``pendulum'' and that of its complement from equilibrium, while
$I_{i}$ is the inertial moment. Nearest-neighbour bases are coupled by a
harmonic torsion potential with ``stiffnesses'' $\kappa_{i}$,
301: $\bar{\kappa}_{i}$. Finally $\sigma_{i}$ is the characteristic strength of the
302: nonlinear H-bonding potential between complementary bases.
303:
304: In earlier studies \cite{S1}-\cite{S3} homogeneous inertial moments and
305: stiffness constants were assumed, with the only sequence-dependence residing
306: in the H-bonding coupling constants $\sigma_{i}$.
Specifically, for $i = 1, \ldots, n$,
308: \begin{eqnarray*}
309: I_{i}=I, & &\kappa_{i} \equiv K =\bar{\kappa}_{i}.
310: \end{eqnarray*}
311: In addition it was assumed that $\sigma_{i} = \lambda_{i} k$ where $k$ is a
312: generic coupling and $\lambda_{i}$ takes the values 2 and 3 for A.T and
G.C pairs respectively, accounting for the differing numbers of hydrogen bonds.
With these approximations, one passes to angle sum- and difference-coordinates \begin{eqnarray*}
\theta_{i}=\frac{1}{2}(v_{i}+u_{i}), & & \psi_{i}=\frac{1}{2}(v_{i}-u_{i}).
316: \end{eqnarray*}
317: The Hamiltonian thus obtained is
318: \begin{eqnarray}
319: {\cal H}' & = & \frac{1}{2} \sum_{i=1}^{n}
320: I(\dot{u}_{i}^{2} + \dot{v}_{i}^{2})
+ \frac{K}{2} \sum_{i=1}^{n-1} \left( (u_{i+1}-u_{i})^{2} + (v_{i+1}-v_{i})^{2} \right) \nonumber \\
322: & & + \sum_{i=1}^{n} \lambda_{i}k(1-\cos(u_{i})). \label{ifk2}
323: \end{eqnarray}
324: The equations of motion for the $u_{i}$ reduce to the set of dimensionless,
325: coupled equations:
326: \begin{equation}
327: \ddot{u}_{i}-(u_{i+1}-2 u_{i}+u_{i-1})+\beta_{i} \sin u_{i}=0; \hspace{0.1cm}
328: u_{i}=\theta_{i}-\psi_{i}, \label{dsc}
329: \end{equation}
where the time variable has been rescaled, $t \to \sqrt{I/K}\,t$, and the
parameter $\beta_{i}=\lambda_{i}k/K\equiv \lambda_{i}\eta$.
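A sketch of the intermediate step, assuming only the form of Eq.(\ref{ifk2}): varying ${\cal H}'$ with respect to $u_{i}$ at an interior site gives
\[
I\ddot{u}_{i} = K(u_{i+1}-2u_{i}+u_{i-1}) - \lambda_{i}k \sin u_{i},
\]
and dividing by $K$, with time measured in units of $\sqrt{I/K}$, leaves $\beta_{i}=\lambda_{i}k/K$ as the only sequence-dependent parameter in Eq.(\ref{dsc}). The $v_{i}$ do not enter the nonlinear term and therefore decouple as a purely harmonic chain.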
To model the sequence variation as small perturbations to a homogeneous solution, we first require the average value of the parameters $\beta_{i}$:
333: \begin{equation}
334: \beta = \left(2 \frac{n_{AT}}{n} + 3 \left( 1 -\frac{n_{AT}}{n} \right)\right)\eta,
335: \end{equation}
336: where there are $n_{AT}$ occurrences of A.T pairs in the molecule.
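As an illustration (the composition here is an assumed round number, not a property of any particular genome), take $n_{AT}/n = 1/2$ together with the value $\eta = 2\times 10^{-3}$ adopted in the sequence analysis of the Results section. Then
\[
\beta = (2 \times 0.5 + 3 \times 0.5)\,\eta = 2.5\,\eta = 5\times 10^{-3},
\qquad \beta^{-1/2} \approx 14.1,
\]
so the natural length scale of the homogeneous problem is roughly 14 lattice sites, that is, base pairs.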
337: In a purely homogeneous approximation, $\beta_{i}\to \beta$, in the continuum
338: limit the system of equations (\ref{dsc}) reduces to the sine-Gordon equation,
339: \begin{equation}
340: \ddot{u}-u'' +\beta \sin u =0, \label{sg}
341: \end{equation}
342: which has a rich variety of solitonic solutions. A family of
343: ``breather'' solutions of Eq.(\ref{sg}) with lengths $L_{\mu}$ and internal
344: frequencies $\omega_{\mu}$ is defined by
345: \begin{equation}
346: u_{br}(x,t)=4 \tan^{-1} \left(\frac{\sin \omega_{\mu} t}{\omega_{\mu} L_{\mu}}
347: \textrm{sech}(\frac{x}{L_{\mu}})\right),
348: \end{equation}
349: where in terms of the classifying parameter $\mu$
350: \begin{eqnarray*}
351: L_{\mu}=\beta^{-1/2} \textrm{cosec} \mu, & & \omega_{\mu}=\beta^{1/2} \cos\mu.
352: \end{eqnarray*}
Note that the above relation imposes a minimum breather width, $\beta^{-1/2}$, and a maximum internal frequency, $\beta^{1/2}$, which
the model can support for a given set of environmental conditions.
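The bounds on $L_{\mu}$ and $\omega_{\mu}$ follow directly from the parametrisation. A short check, using only the definitions above and taking $0<\mu\leq\pi/2$ (an assumed range for the classifying parameter):
\[
\frac{1}{L_{\mu}^{2}} + \omega_{\mu}^{2} = \beta \sin^{2}\mu + \beta\cos^{2}\mu = \beta ,
\]
so that $L_{\mu}\geq \beta^{-1/2}$ and $\omega_{\mu} < \beta^{1/2}$. The narrowest breather, $L_{\mu}\to\beta^{-1/2}$, is reached as $\mu \to \pi/2$, where $\omega_{\mu}\to 0$; broad breathers ($\mu\to 0$) oscillate at frequencies approaching $\beta^{1/2}$ from below.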
355: The smallness of $\beta$ ensures that an approximate solution of the
356: discrete, inhomogeneous model is of a similar form with slowly-varying
parameters; thus our {\em ansatz} for Eq.(\ref{dsc}) is
\begin{equation}
u_{i}=4 \tan^{-1} \left( \frac{\sin \omega t}{\omega L}
\textrm{sech}\, z_{i} \right); \hspace{0.3cm} z_{i}=(i-X)/L. \label{sech}
\end{equation}
Here $X$ is understood as a collective coordinate for the breather and, for
convenience, we have omitted the $\mu$ subscript.
364:
365: If the total energy is approximately conserved, upon substituting
366: (\ref{sech}) into the Hamiltonian (\ref{ifk2}) one arrives at an expression
367: for the effective potential in the collective coordinate $X$ \cite{S3},
associated with the propagation of the initial excitation on an inhomogeneous
background. We find, using the identity
\begin{eqnarray*}
1-\cos u = 8/(\tan (u/4) + \cot (u/4))^{2},
\end{eqnarray*}
that the expression for the total energy takes the form
374: \begin{equation}
375: E(X;t)\equiv K(X;t)+V(X;t)+O(\beta^{2})=0.
376: \end{equation}
377: Here
378: \begin{eqnarray*}
379: K(X;t)&=&
380: \frac{8}{L^{2}} \sum_{i=1}^n D_{i}(X;t) \left( \alpha(t)\sinh^{2} z_{i}(\dot{X})^{2} \nonumber \right. \\
381: & & \mbox{} \left. +\omega L\sqrt{\alpha(t)(\frac{1}{(\omega L)^{2}}-\alpha(t))} \sinh 2z_{i} \dot X\right), \\
382: V(X;t)& =& 8 \sum_{i=1}^n D_{i}(X;t) \left((\frac{1}{L^{2}}-\omega^{2}\alpha(t))\cosh^{2}z_{i} \right. \nonumber \\
383: & & \mbox{}\left.+ \alpha(t)(\frac{1}{L^{2}}\sinh^{2} z_{i} + \beta_{i}\cosh^{2}z_{i})\right), \\
D_{i}(X;t)& =&\frac{1}{(\alpha(t)+\cosh^{2}z_{i})^{2}},
385: \end{eqnarray*}
386: where the function $\alpha(t)\equiv (\sin (\omega t)/\omega L)^{2}$ governs
387: the time-dependence of $V$. If the breather oscillation timescale is typically
388: orders of magnitude smaller than that of its propagation along the DNA
389: \cite{ting1} we can replace the time-dependent potential by its average value
390: \cite{zhang}:
391: \begin{eqnarray}
V_{av}(X)& \equiv &\frac{1}{T}\int_{0}^{T} V(X; t)\, dt \nonumber \\
& = & \frac{4}{L^{2}} \sum^{n}_{i=1}
\frac{(\textrm{sec}^{2}\mu+\beta_{i}L^{2}\tan^{2}\mu)\cosh z_{i}}{(\tan^{2}\mu+\cosh^{2}z_{i})^{3/2}}. \label{vpot}
395: \end{eqnarray}
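The sequence-dependent part of Eq.(\ref{vpot}) can be checked in a few lines; what follows is a sketch, assuming that the average is taken over one oscillation period, $T=2\pi/\omega$. From the {\em ansatz} (\ref{sech}), $\tan^{2}(u_{i}/4)=\alpha(t)\,\textrm{sech}^{2}z_{i}$, so the identity for $1-\cos u$ gives
\[
\beta_{i}(1-\cos u_{i}) = \frac{8\,\alpha(t)\,\beta_{i}\cosh^{2}z_{i}}{(\alpha(t)+\cosh^{2}z_{i})^{2}},
\]
which is the last term of $V(X;t)$. Writing $\alpha(t) = \tan^{2}\mu\,\sin^{2}\omega t$ (because $\omega L = \cot\mu$) and using the standard integral
\[
\frac{1}{2\pi}\int_{0}^{2\pi}\frac{\sin^{2}\vartheta \, d\vartheta}{(a+b\sin^{2}\vartheta)^{2}} = \frac{1}{2\,a^{1/2}(a+b)^{3/2}}, \qquad a,b>0,
\]
with $a=\cosh^{2}z_{i}$ and $b=\tan^{2}\mu$, the time average of this term is
\[
\frac{4\,\beta_{i}\tan^{2}\mu\,\cosh z_{i}}{(\tan^{2}\mu+\cosh^{2}z_{i})^{3/2}},
\]
in agreement with the $\beta_{i}$-dependent term of Eq.(\ref{vpot}).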
Owing to the nonlinear nature of the model the energy to translocate the
initial deformation is several orders of magnitude less than that required to
create the breather initially. We can derive a simple estimate of the
399: ``noisiness'' of the sliding dynamics from the energy required to shift the
400: breather by one base-pair:
401: \begin{equation}
402: \varepsilon(X)=\frac{K}{k_{B}T} (V_{av}(X_{i+1})-V_{av}(X_{i})) \label{theta}
403: \end{equation}
404: where $V_{av}$ is the time-averaged potential (\ref{vpot}) and $k_{B}$ is
405: Boltzmann's constant. For steep gradients the picture of sliding RNAP is
406: thus analogous to a particle moving through an energy landscape while in flat
407: regions it is more akin to a random walk.
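
As a rough guide to reading $\varepsilon$ (an illustrative assumption, not a result of the model): if single-site steps were weighted by Boltzmann factors, the ratio of uphill to downhill step probabilities across one base pair would be $e^{-|\varepsilon|}$. For example,
\[
|\varepsilon| = 2: \quad e^{-2}\approx 0.14, \qquad\qquad |\varepsilon| = 0.1: \quad e^{-0.1}\approx 0.90,
\]
so the first case is strongly biased downhill, while the second is close to an unbiased random walk.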
408:
409: \section{Results}
410: Having derived the breather effective potential (\ref{vpot}) we compute the
411: ``landscape'' corresponding to the T7 genome.
It is natural, initially, to assume that the breather width is the size of a nonspecific RNAP complex. This size is not directly known, either for {\em E. coli} or
413: T7 RNAP. For certain other proteins the size of a nonspecific complex is
414: estimated to be several times smaller \cite{IHF} than that of a specific one.
415: Therefore we assume upper bounds on $L$ are provided by the DNA
416: ``footprint'' size protected by RNAP in nuclease digestion experiments.
417: For translocating {\em E. coli} and T7 elongation complexes these values
418: are $L_{B}=30$ bp \cite{scale1} and $L_{\phi}=24$ bp \cite{scale2}
419: respectively.
420:
421: \subsection{Sequence Analysis}
422: For our initial sequence analysis we adopt model parameter values coinciding
423: with those of Salerno's \cite{S1} original study of T7 promoters. Setting the
ratio $\eta = 2\times 10^{-3}$ implies the lower bound for breather
425: width is $L_{min} \equiv \beta^{-1/2} \sim 15$ bp.
426: Figure 1a shows the region of $V_{av}$ corresponding to the initial 850bp
427: fragment of the T7 phage for a breather of width 30bp. For comparison Figure
428: 1c shows the time evolution of the system (\ref{dsc}) with breathers initially
429: placed at intervals within the fragment. Comparison of the three trajectories
430: with the effective potential landscape in Fig. 1a serves to verify that the
direction and range of propagation agree for the two methods.\footnote{Figure 1 to go here}
432:
433: Now the $\sigma^{70}$ subunit of {\em E. coli} holoenzyme RNAP recognises
434: hexamers located 35 and 10 bases upstream of transcription initiation
435: \cite{sig70}. In addition many strong promoters are enhanced \cite{UP} by a
436: UP element: contacts between the $\alpha$ subunit of RNAP and AT rich
437: sequences centred approximately 40-60 sites upstream.
438: Inspection of Figure 1a shows that the UP region of $A_{1}$ and the -35
439: sites for $A_{2}$, $A_{3}$ (shown as dots) lie close to the bottom of
potential wells: the respective initiation sites are 62, 29 and 21 bp
downstream of these minima. Comparison with the noise parameter, $\varepsilon$
442: plotted in Fig. 1c
443: shows that when the motion is strongly deterministic ($|\varepsilon|>2$)
444: it is invariably towards regions where promoter recognition can occur.
445: The $\varepsilon$ values in the initiation region of the strongest ($A_{1}$)
bacterial promoter are 1.5 times greater than anywhere else in the T7 genome.
447:
In fact there are seven {\em E. coli} RNAP specific promoters in the T7
genome; the first recognition sites for the six earliest are shown in Figure 2
450: as dots. The four minor ($A_{0}$, $B$, $C$
451: and $E$) promoters, while having no recognised {\em in vivo} function, were
452: found to have initiation sites 61 ($A_{0}$ transcribes leftwards), 27, 28
453: and 17 bp downstream of deep minima.
454: Figure 2 also shows the full class I region of the T7 genome, transcribed by
455: the bacterial RNAP, extending from the 5' DNA end to the bacterial
456: transcription terminator, TE. In the Genbank \cite{GB} reference sequence
457: (accession number NC\_001604) this corresponds to sites $\sim$500-7588.
Note that other aspects of facilitated transport, namely dissociation
followed by ``hopping'' or interdomain transfer, are likeliest to occur in
locally flat regions, where the breather spends most time. In this way,
461: the effect of multiple, broad-bottomed wells as kinetic ``traps'' might be
462: minimised. \footnote{Figure 2 to go here}
463:
464: On the other hand, the deep minimum at approximately 6 Kb could be a
465: desirable kinetic trap for the host RNAP as it lies 100 bp downstream of the
gene coding for T7 RNA polymerase. The T7 RNAP initiates transcription at
467: one of the two specific promoters (unfilled dots at the right side of Figure
468: 2) and is thus responsible for the subsequent internalisation and expression
469: of the remainder of the T7 genome. This deep minimum thus represents the end
470: of the region where the host RNAP is ``useful''. One finds a similar, deep
471: minimum at the class II/class III interface for a wide range of breather
472: widths which could play a similar role, inhibiting late transcription from
473: weaker class II promoters in favour of class III promoters.
474:
475: \subsection{Parameter variation}
476: Given the coarseness of the current model, it is important to understand
477: how the results obtained may vary with respect to the parameter values.
Up to an overall scaling, all parameter variation in the sequence-dependent
part of (\ref{vpot}) enters via the breather width, $L$, which governs
480: sensitivity to sequence-dependent inhomogeneities: an increase of width
481: leads to landscapes with fewer extrema which are also broader and larger in
482: amplitude. The fundamental relationship governing effects of parameter
483: variations is therefore $\textrm{cosec} \mu= L \sqrt{\beta}$. It is
484: natural to associate the breather family parameter $\mu$ with the dimension
of the protein-DNA interface and $L$ with the ``response'' of the system for
486: a given set of environmental conditions, encapsulated in $\eta$.
487:
488: Understanding of model robustness is complicated by the
489: way in which the sliding of RNAP changes. For example, if breathers do
490: play a role in the location of T7 promoters $A_{1}-A_{3}$ by bacterial RNAP
491: then one might expect environmental changes which alter sliding behaviour to
also influence promoter activity. It is known \cite{pTemp} that the
activities of $A_{1}-A_{3}$ are temperature dependent, with initiation at $A_{1}$
increasing over the range $20-37^{\circ}$ C while initiation at $A_{2,3}$ decreases under the same circumstances.
495:
Due to decreased thermal stability, one expects a greater ``response'' of the
helix to a deformation at increased temperature. For fixed $ \mu$ this
corresponds to an increase in $L$ and decrease in $\eta$. The two graphs in
Fig. 3 are calculated for such circumstances with $L=30$, $\eta=0.002$ and
500: $L=67$, $\eta=0.0004$ respectively. For the higher $\eta$ value (on the left)
501: a sliding RNAP is extremely likely to fall into one of the three wells
502: associated with a major promoter.\footnote{Figure 3 to go here}
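
The second parameter pair follows from the first by a simple scaling. At fixed $\mu$ and fixed base composition, $L=\beta^{-1/2}\textrm{cosec}\,\mu$ with $\beta\propto\eta$, so
\[
\frac{L_{2}}{L_{1}} = \sqrt{\frac{\eta_{1}}{\eta_{2}}} = \sqrt{\frac{0.002}{0.0004}} = \sqrt{5}\approx 2.24, \qquad L_{2}\approx 2.24\times 30 \approx 67 .
\]
The subscripts 1 and 2, labelling the two parameter sets, are introduced only for this check.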
503:
504: Conversely for the lower $\eta$ value the well containing $A_{1}$ has greatly
505: widened at the expense of the other two. Indeed the -35 sites for $A_{2}$, $A_{3}$ are now situated close to local maxima and the probability of an
506: encounter with sliding RNAP would be significantly reduced.
507: We note that minima close to one or more major promoter sites exist for a
508: broad range of parameter values. One could argue that the overall
sequence composition of the T7 initial fragment appears to confer some
510: robustness of host promoter recognition against environmental variations.
511:
512: Having outlined the qualitative variation of the system behaviour with
513: parameter changes, we recompute the potential for $L_{\phi}=24$ bp.
From Figure 4a it is immediately seen that for none of the T7 promoters
does the locally deepest minimum coincide with upstream recognition sites.
516: With replication origins $\phi_{L}$ and $\phi_{R}$ and the earliest
517: phage promoters, $\phi 1.1A$, $\phi 1.1B$ omitted, minima appear to be
518: correlated to the start of the first downstream coding sequence, as evidenced
519: in Fig. 4b.\footnote{Figure 4 to go here}
520:
521: \section{Discussion}
522: \subsection{Model Assumptions}
523: The planar model of DNA presented is a highly simplified one, containing
524: numerous assumptions which are unrealistic for modelling many DNA processes:
525: There is no explicit allowance for the helical structure and its
writhing/twisting behaviour. Many interactions with proteins involve major,
localised conformational changes of DNA; however, the specific case of
sliding RNAP may be an exception, for two reasons. Firstly, such
conformational changes are unlikely to be present immediately prior to
closed complex formation \cite{y1}. Secondly, there is some evidence that
rates of RNAP sliding, under some conditions at least, are
independent of supercoiling \cite{smeekins}.
533:
534: Another important assumption was the homogeneous, harmonic nature of
535: the restoring torques. In fact it is known that simple, ``base content''
536: models of helix-coil transition thermodynamics reproduce empirical data
for short ($\leq 15$ bp) DNA oligonucleotides quite well \cite{kam2}, \cite{unif}.
Specifically, encapsulating sequence dependence as AT and GC contents
enables reproduction of such data at 310 K in 1 M NaCl solution (corrections
540: due to change in salt concentration are discussed in \cite{unif}) with a
541: mean (median) error of 9\% (5\%) (Bashford, J; unpublished).
542: From our previous study of the thermodynamics of B-DNA helix-coil
543: transition \cite{bas2} we further estimate that the enthalpies of A.T and
544: G.C pairs are in the ratio 1.56/3, which serves to enhance the distinction
between the two types of base pair in Eq.(\ref{vpot}). This ratio accounts, in
addition to the differing numbers of H bonds, for the averaged effects of solution,
neighbouring base-pairs and other interactions between the complementary pair.
548:
549: The assumption of harmonicity for the stacking potential at large opening
550: angles, however, is more questionable and should be further refined.
551: Also molecular calculations of the ``base-flipping'' in Watson-Crick pairs
552: suggest \cite{bflip} opening into the major groove is more energetically
553: favoured for purine bases.
554:
555: \subsection{Breather dynamics}
The shape of the breather potential, used in the qualitative arguments above,
depends only upon the ratios $\eta=k/K$ and $\lambda_{A/T}/\lambda_{G/C}$.
However, the physical properties of any breather depend on the actual parameter values.
558: For example, the breather energy $E$ and oscillation frequency
559: $\omega$ may be derived as
560: \begin{eqnarray}
561: E&=& \frac{16K}{L}\sim \sqrt{kK}, \\
\omega^{2} &=& \frac{K}{I}\left(\beta -\frac{1}{L^{2}}\right).
563: \end{eqnarray}
564: Using the parameter values in Ref. \cite{kl}: $K=5\times 10^{-18}$ J,
565: $I=2 \times 10^{-43}$ kg m$^{2}$, in combination with our estimate
566: based on data from Ref.\cite{bas2}: $k=1\times 10^{-20}$ J, yields $\eta=4.5\times 10^{-3}$.
567: Thus for a breather of width $L=30$ bp we get
568: \begin{eqnarray*}
E \simeq 2.7 \times 10^{-18}\ \mathrm{J}, & & \omega \simeq 1.0\times 10^{12}\ \mathrm{s}^{-1}.
570: \end{eqnarray*}
571: The energetic cost of creating this breather may be of the magnitude
572: of the electrostatic attractions responsible for the nonspecific contact.
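
As an arithmetic check on the quoted energy (an illustration using only the
parameter values given above, not an additional result), one has
\begin{eqnarray*}
E &=& \frac{16K}{L} = \frac{16\times 5\times 10^{-18}\ \mathrm{J}}{30}
\simeq 2.7\times 10^{-18}\ \mathrm{J}, \\
\sqrt{kK} &=& \sqrt{(1\times 10^{-20})(5\times 10^{-18})}\ \mathrm{J}
\simeq 2.2\times 10^{-19}\ \mathrm{J}.
\end{eqnarray*}
The ratio $E/\sqrt{kK}$ is therefore about 12 for this width, so the relation
$E\sim\sqrt{kK}$ is best read as a scaling with a prefactor of order ten rather
than as a numerical equality.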
573:
Concerning the size of the DNA helix deformation, we note that the parameter
$\mu$ provides an estimate of the amplitude of the base-pair opening:
$u_{max} = 4 \mu$ when $\mu <\pi/4$. For $\beta=0.0045$, as above,
577: the amplitude for a 30 bp breather is $2\pi/3$, corresponding to individual
578: pendulum deformations of $60^{\circ}$. This parameter set does not support
579: breathers of width less than $\beta^{-1/2}\simeq 15$ bp.
580: A variation of 20\% in the value of $K$ leads to maximum deformations of
581: $52^{\circ}-65^{\circ}$: base pairs are bent but not fully opened.
582: These moderate conformational changes need not be incompatible with an anticipated
583: absence of large deformations \cite{y1} accompanying nonspecific RNAP-DNA complexes.
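
The quoted amplitude can be reproduced under one assumption that is not
restated in this section, namely that the breather width and the family
parameter are related by $L=1/(\sqrt{\beta}\sin\mu)$. This form is suggested by
the minimum width $\beta^{-1/2}$ quoted above and is used here only as an
illustrative sketch. For $L=30$ bp and $\beta=0.0045$ it gives
\begin{eqnarray*}
\sin\mu &=& \frac{1}{L\sqrt{\beta}} = \frac{1}{30\times\sqrt{0.0045}} \simeq 0.497, \\
\mu &\simeq& 0.52 \simeq \pi/6, \\
u_{max} &=& 4\mu \simeq 2.08 \simeq 2\pi/3.
\end{eqnarray*}
Since $\sin\mu\leq 1$, the same assumed relation requires
$L\geq\beta^{-1/2}\simeq 14.9$ bp, in line with the lower bound of 15 bp on the
breather width.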
584:
The values for model parameters appearing in the literature are estimated
from old experiments on DNA homopolymers (for example Refs. \cite{nonk},
\cite{yak2}), which is a difficult process. However the main results of our
588: paper stem from i) the {\em shape} of the potential (\ref{vpot}) and ii) the
589: noise parameter, $\varepsilon$, defined by (\ref{theta}). For these two
590: expressions changes in the parameter $\eta$ can be offset by ``tuning'' the
591: value of $\mu$ which is a relatively free parameter. The only potentially
592: serious sensitivity is that of $\varepsilon$ to large changes in $k$, the
593: measure of dissociation energy for H-bonded base pairs. Fortunately, of the
594: three parameters in (\ref{ifk}), this is the most reliable quantity to
595: estimate.
596:
597: \subsection{Helical model}
598: If the picture of sliding RNAP as a soliton-like deformation is subsequently
599: shown to be incorrect, the correlations observed between potential minima and
600: promoter sites still have to be explained. The soliton solutions of
601: (\ref{ifk}) preferentially move to AT-rich regions. Inspection of (\ref{vpot})
602: shows the variation due to sequence is not linear in AT content, but
603: a first ``moment'', where the contribution from each base is weighted by its
604: position relative to the central site $X$:
605: \begin{eqnarray}
606: V_{var} (X) & \sim & \sum_{i} \beta_{i} w(z_{i}), \label{vx} \\
607: w(z)&=&\frac{\cosh z}{(\tan^{2}\mu+\cosh^{2} z)^{3/2}}. \nonumber
608: \end{eqnarray}
Curiously, this weighting function coincides with the inverse radius of
curvature for a hyperbolic curve $f(z)=\cosh z$. Such a term arises naturally
in the Lorentz force experienced by a charged particle following a curved
magnetic field line. Initially consider a particle of mass $m$, charge
$q$, travelling along a uniform, straight magnetic field line. Its motion is
determined by the Lorentz equation
614: \begin{eqnarray*}
615: \frac{d}{dt}\vec{v} = \frac{q}{m} \vec{v}\times \vec{B}.
616: \end{eqnarray*}
617: Assuming the field line lies along the $z$ axis, $\vec{B}=B \vec{e}_{z}$, the
618: velocity equation is split into parallel and perpendicular components
619: \begin{eqnarray*}
620: \frac{d}{dt}v_{||} & = & 0, \\
\frac{d}{dt}\vec{v}_{\perp} & = & \frac{qB}{m}\vec{v}_{\perp} \times \vec{e}_{z}.
622: \end{eqnarray*}
623: The general solution to these equations is a helical trajectory, with
624: time-dependent coordinates
625: \begin{eqnarray*}
626: x(t) & = & x_{0}+ \frac{|v_{\perp}|}{\omega} \sin (\omega t + \phi), \\
627: y(t) & = & y_{0}+ \frac{|v_{\perp}|}{\omega} \cos (\omega t + \phi), \\
628: z(t) & = & z_{0}+ v_{||} t,
629: \end{eqnarray*}
where $(x_{0},y_{0},z_{0})$ denotes the initial location of the guiding centre
and $\omega=qB/m$ is the gyrofrequency of the helical motion.
This problem resembles the electrostatic sliding of a protein
``particle'' along the grooves of the DNA helix. Here the role of the
gyrofrequency is played by the twist of the helix, while the guiding centre of
particle motion $(x_{0},y_{0},z(t))$ corresponds to the central helical axis
of the DNA.
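
As a brief check, added for illustration, the helical trajectory can be
substituted back into the perpendicular equation with the frequency taken as
$\omega=qB/m$. Differentiating the coordinates gives
\begin{eqnarray*}
v_{x} & = & |v_{\perp}| \cos (\omega t + \phi), \qquad
v_{y} \; = \; -|v_{\perp}| \sin (\omega t + \phi), \\
\frac{d}{dt}v_{x} & = & -\omega |v_{\perp}| \sin (\omega t + \phi) \; = \; \omega v_{y}, \\
\frac{d}{dt}v_{y} & = & -\omega |v_{\perp}| \cos (\omega t + \phi) \; = \; -\omega v_{x}.
\end{eqnarray*}
Since $\vec{v}_{\perp}\times\vec{e}_{z}=(v_{y},-v_{x},0)$, these are precisely
the components of the perpendicular equation, and the radius of the helix is
$|v_{\perp}|/\omega$.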
637:
638: Consider now the effect of introducing a curve into the helical axis: a
639: particle travelling along a curved field line experiences a centrifugal
640: force upon its guiding centre. In a local coordinate system this is
641: \begin{eqnarray*}
\frac{mv^{2}_{||}}{|r_{c}(s)|}\frac{\vec{r}_{c}(s)}{|r_{c}(s)|},
643: \end{eqnarray*}
644: where $|r_{c}|$ and $s$ denote the radius of curvature and line element
645: along the field line.
646: Similarly let us here write an analogous expression
647: \begin{equation}
\vec{F}_{c}=\frac{{\cal E}}{|r_{c}|}\frac{\vec{r}_{c}}{|r_{c}|},
649: \end{equation}
where the quantity ${\cal E}$ has the dimensions of energy. In particular,
we assume that locally the bend can be approximated by $z(\xi)=\cosh \xi$.
Then, c.f. (\ref{vx}),
653: \begin{equation}
654: |\vec{F}_{c}(\xi)|={\cal E}\frac{\cosh{\xi}}{(1+\sinh^{2} \xi)^{3/2}}.
655: \end{equation}
656: It follows that in the continuum limit the time-averaged breather potential
657: could also be thought of as the work done by a ``centrifugal force'' on a
658: sliding RNAP as it navigates a bend in the helix.
659: Therefore the ``potential'' (\ref{vpot}) can conceivably be arrived at
660: via simple considerations of thermal stability (in a planar model) or bending
661: deformations (in a helical model), two of the most commonly suggested
662: mechanisms for enhancing promoter recognition.
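
The curvature underlying this analogy can be written out explicitly. For a
plane curve $f(\xi)$ the inverse radius of curvature is
$|f''|/(1+f'^{2})^{3/2}$. As an illustrative sketch, take
$f(\xi)=a\cosh\xi$, where the amplitude $a$ is a symbol introduced here for the
purpose of the comparison and is not a parameter of the model:
\begin{eqnarray*}
\frac{1}{|r_{c}|} & = & \frac{a\cosh\xi}{(1+a^{2}\sinh^{2}\xi)^{3/2}}
\; = \; \frac{1}{a^{2}}\,
\frac{\cosh\xi}{\left(\frac{1-a^{2}}{a^{2}}+\cosh^{2}\xi\right)^{3/2}}.
\end{eqnarray*}
For $a=1$ this is the expression $\cosh\xi/(1+\sinh^{2}\xi)^{3/2}$. For
$a=\cos\mu$ one has $(1-a^{2})/a^{2}=\tan^{2}\mu$, so the right-hand side
becomes $\sec^{2}\mu\, w(\xi)$ with $w$ as in (\ref{vx}). Under this reading
the family parameter $\mu$ would set the amplitude of the bend, and the
weighting function is recovered up to the constant factor $\sec^{2}\mu$.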
663:
664: \subsection{Superhelicity}
665: A mechanism of localised DNA deformation with demonstrated biological
666: significance \cite{ben3}, \cite{ben1},\cite{ben2} is that of superhelical
667: stress-induced DNA denaturation (SSID). Roles for SSID in gene
668: regulation have been proposed \cite{ben3} in regard to both open complex
669: formation and transcription. In the former instance, promoter sites are
670: easily destabilized by superhelical stress. In the latter,
671: the action of local helix unwinding by transcribing RNAP results in waves of
672: positive (negative) superhelicity propagating downstream (upstream) of the
transcription complex. Computation of SSID profiles indicates \cite{ben3},
\cite{ben2} that AT-rich regions upstream of the 5' ends, and downstream of the
3' ends, of transcription units are prone to localised over/under-winding, acting as a
possible ``sink'' for propagating superhelicity and ensuring smooth transcription.
677:
The breather potential (\ref{vpot}), which also picks out AT-rich regions,
shows that transcription units of at least $10^{3}$ bp in length are often
demarcated by minima, in agreement with the above observations.
681: This is especially the case for the 3' ends of T7 genes 1 and 6, the last
682: genes in class I and II regions respectively. In these instances the AT
richness may also confer extra rigidity, making these suitable pause sites
in the stepwise internalisation of the phage genome, or, as mentioned above,
act as a kinetic trap used in inhibiting class I or II transcription.
686:
687: \subsection{Correlations}
688: In reporting promoter-extrema correlations two points should be kept in mind.
689: Firstly, the assumed breather widths coincide with the sizes of the elongation
690: RNAP-DNA complexes. Therefore potential minima could be indicative of
691: deformation associated with transcription, as appears to be the case for T7
692: phage promoters, shown in Figure 4. Regarding nonspecific complexes, the
693: values $L_{B}=30$ and $L_{\phi}=24$ bp should be considered as upper bounds
694: for an experimentally undetermined quantity.
695: The correlations reported in this study persist for the ranges
$20\leq L_{B}\leq 30$ and $18\leq L_{\phi}\leq 24$. For sizes less than 18 bp,
697: the increasing roughness of Eq.(\ref{vpot}) causes difficulty in identifying
698: correlations.
699:
700: The second caveat is that only correlations between promoter initiation
701: and the deepest local minimum have been considered. For some T7 promoters
702: shallow upstream wells also exist. Moreover the effect of thermal noise has
703: not been considered. Only with full dynamical simulations can connections
704: between the local topography of Eq.(\ref{vpot}) and facilitated target
705: location be properly studied.
706:
It is difficult to see how kink solutions of the planar model
(\ref{ifk}), previously considered \cite{S1}-\cite{sanchez}, might mimic
physical profiles of base-pair opening. Kinks will also move preferentially to
AT-rich regions, presumably the reason why promoter sequences $A_{1}$
\cite{S1}, $A_{3}$ and $A_{0}$ \cite{S2} were concluded to be
``dynamically active''.
The unit-mass potential for kinks, initially at rest, moving in a
slowly-varying background was derived by Salerno and Kivshar \cite{S3}. The
sequence variation is contained in a term analogous to (\ref{vx}); however, the
weighting function is
713: \begin{eqnarray*}
714: W_{k}(z)=\textrm{sech}^2 z.
715: \end{eqnarray*}
716: This coincides with the breather function for small $\tan^{2} \mu$,
717: illustrating why similar results for the major T7 promoter sequences
718: are obtained for both kink \cite{S1}-\cite{S3} and breather solitons.
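
The limit can be made explicit by factoring $\cosh^{2}z$ out of the denominator
of the breather weighting function $w(z)$ in (\ref{vx}):
\begin{eqnarray*}
w(z) & = & \frac{\cosh z}{(\tan^{2}\mu+\cosh^{2} z)^{3/2}}
\; = \; \textrm{sech}^{2} z \left(1+\tan^{2}\mu\; \textrm{sech}^{2} z\right)^{-3/2} \\
& = & \textrm{sech}^{2} z \left[1-\frac{3}{2}\tan^{2}\mu\; \textrm{sech}^{2} z
+O(\tan^{4}\mu)\right].
\end{eqnarray*}
The correction to $W_{k}(z)$ is largest at the central site $z=0$ and decays in
the tails. As a purely illustrative number, for $\mu=\pi/6$
($\tan^{2}\mu=1/3$) the central value is $w(0)=(4/3)^{-3/2}\simeq 0.65$,
compared with $W_{k}(0)=1$, so the two weightings differ mainly in their
normalisation near the centre rather than in their overall shape.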
719:
720: \section{Conclusion}
721: In this paper we have re-examined Salerno's nonlinear DNA model, postulating
722: a role for localised soliton excitations in approximating the sliding
723: component of facilitated target location of RNA polymerase.
724: We found that such deformations would involve moderate bending of individual
725: base pairs and that their energy of translocation is consistent with a picture
726: of noisy, deterministic dynamics. Both of these observations are also
727: consistent with current, limited knowledge of RNAP sliding and nonspecific
728: complexes. A qualitative correspondence of these solitons and localised
729: bending in a helical model was also demonstrated.
730:
731: The dynamical picture of sliding which emerged also suggests that the
732: random/deterministic nature of the motion is sequence-dependent, with
733: translocation in relatively homogeneous regions being effectively random.
734: The corollary, that interplay between adjacent random and deterministic
735: regions could constitute a search ``algorithm'', is speculative and, we
736: believe, merits further investigation.
737:
738: Our analysis of the T7 genome showed good correlations between AT-rich
739: regions and the recognition sites of host-specific promoters used for
740: early phage transcription. For phage-specific promoters, regions of
741: maximal AT-richness correlated with the start of the coding sequence
742: immediately downstream. As discussed above this may be connected with
743: transcription and while there is no obvious correlation with recognition
744: sites, a full description of facilitated target location needs to account
745: for the thermal background. This is a subject of current investigation.
746:
We note that it has been suggested \cite{mol4} that virion proteins
748: injected into the host cell with the initial T7 fragment may i) inhibit the
749: nonspecific binding of restriction enzymes and other proteins to DNA; ii)
750: have an affinity for {\em E. coli} RNAP, negating the requirement for direct
751: promoter recognition {\em in vivo}. Similarly, inhibition of class I and II
752: transcription is known to be performed by T7 gene products: kinase (gene 0.7)
753: and lysozyme (gene 3.5) respectively.
754:
755: However we see similar correlations for the UP and $\sigma^{70}$ sites of
756: bacterial promoters in other members of the T7 viral supergroup, in
757: addition to genomes of the unrelated phages T4 and T5 (see Figure 5).
758: This may be suggestive of a mechanism at work to enhance promoter
759: recognition/inhibition in lytic phage genomes, although in the presence of
760: functional proteins this mechanism can be relegated to an auxiliary role,
761: such as in T7. \footnote{Figure 5 to go here}
762:
763: It is important to investigate whether planar base-flipping/helical bending
764: deformation patterns can be used to simulate protein-DNA interactions in
765: DNA sequence analysis. The correlations reported here, to our knowledge for
766: the first time, could have been made via other ``nonlinear'' analyses of AT
767: content, had a motivation been apparent.
Propagation of breathers in a nonlinear toy model of DNA provides a source
of such motivation. It may be that herein lies the true value of a model
770: with such a controversial history.
771:
772: \vspace{0.5cm}
773: \noindent
{\large {\bf Acknowledgements}}
775:
776: \noindent
777: This research was funded by Australian Research Council grant DP0344996 and a
778: visiting fellowship to the Centre for Nonlinear Physics, Australian National
779: University, where part of this work took place.
780: The author thanks G. Yang for helpful remarks and is grateful to Yu. Kivshar
781: and I. Molineux for discussions and comments on earlier versions of the manuscript.
782:
783: \begin{thebibliography}{99}
784: \bibitem{pey} Peyrard, M. ``Nonlinear dynamics and statistical physics of DNA'', {\em Nonlinearity} {\bf 17} (2004), R1-R40.
785: \bibitem{Gaeta1} Gaeta, G.
786: ``Results and limitations of the soliton theory of DNA transcription'', {\em J. Biol. Phys.} {\bf 24} (1999), 81-96.
787: \bibitem{PB1} Peyrard, M. and Bishop, A.R.
788: ``Statistical mechanics of a nonlinear model for DNA denaturation'', {\em Phys. Rev. Lett.} {\bf 62} (1989), 2755-2758.
789: \bibitem{Yak} Yakushevich, L.V. ``Is DNA a nonlinear dynamical system where solitary conformational waves are possible?'', {\em J. Biosci.} {\bf 26} (2001), 305-313.
790: \bibitem{dinuc} Bruant, N., Flatters, D., Lavery, R. and Genest, D.
791: ``From atomic to mesoscopic descriptions of the internal dynamics of DNA'', {\em Biophys. J.} {\bf 77} (1999), 2366-2376.
792: \bibitem{S1} Salerno, M. ``Discrete model for DNA-promoter dynamics'', {\em Phys. Rev.} {\bf A44} (1991), 5292-5297.
793: \bibitem{S2} Salerno, M. ``Dynamical properties of DNA promoters'', {\em Phys. Lett.} {\bf A167} (1992), 49-53.
794: \bibitem{S3} Salerno, M. and Kivshar, Yu.S. ``DNA promoters and nonlinear dynamics'', {\em Phys. Lett.} {\bf A193} (1994), 263-266.
795: \bibitem{S3b} Salerno, M. ``Nonlinear dynamics of plasmid PBR322 promoter'',
796: chapter 10 in M. Peyrard (ed.), {\em Nonlinear excitations in biomolecules}, Edition de Physique, Springer, New York (1995).
\bibitem{kl} Lennholm, E. and H\"{o}rnquist, M. ``Revisiting Salerno's sine-Gordon model of DNA: active regions and robustness'', {\em Physica} {\bf D177} (2003), 233-241.
798: \bibitem{sanchez} Cuenda, S., S\'{a}nchez, A.
799: ``Disorder and fluctuations in nonlinear excitations in DNA'', {\em Fluct. Noise Lett.} {\bf 4} (2004), L491-L504.
800: \bibitem{englander}
Englander, S.W. {\em et al.} ``Nature of the open state in long polynucleotide double helices: possibility of soliton excitations'', {\em Proc. Natl. Acad. Sci.} {\bf 77} (1980), 7222-7226.
\bibitem{riv} Gaeta, G., Reiss, C., Peyrard, M. and Dauxois, T. ``Simple models of nonlinear DNA dynamics'', {\em Riv. del. Nuov. Cim.} {\bf 17} (1994), 1-48.
\bibitem{edwards} Gabriel, C. {\em et al.} ``Microwave absorption in aqueous solutions of DNA'', {\em Nature} {\bf 328} (1987), 145-146.
804: \bibitem{bigio} Bigio, I.J., Gosnell, T.R., Mukherjee, P. and Safer, J.D.
805: ``Microwave absorption spectroscopy of DNA'', {\em Biopolymers} {\bf 33} (1993), 147-150.
\bibitem{gueron} Gu\'{e}ron, M., Kochoyan, M. and Leroy, J.L. ``A single mode of DNA base-pair opening drives imino proton exchange'', {\em Nature} {\bf 328} (1987), 89-92.
\bibitem{Kam} Frank-Kamenetskii, M. ``Physicists retreat again'', {\em Nature} {\bf 328} (1987), 108.
\bibitem{ben3} Benham, C.J. ``Duplex destabilization in superhelical DNA is
predicted to occur at specific transcriptional regulatory regions'', {\em J. Mol. Biol.} {\bf 255} (1996), 425-434.
810: \bibitem{PB2} Choi, C.H. {\em et al.} ``DNA dynamically directs its own transcription initiation'', {\em Nucl. Acids. Res.} {\bf 32} (2004), 1584-1590.
811: \bibitem{bmc} Kanhere, A. and Bansal K. ``A novel method for prokaryotic promoter prediction based on DNA stability'', {\em BMC Bioinformatics} {\bf 6} (2005), 1-10.
\bibitem{trif} Bolshoy, A., McNamara, P., Harrington, R.E. and Trifonov, E. ``Curved DNA without A-A: experimental estimation of all 16 DNA wedge angles'', {\em Proc. Natl. Acad. Sci.} {\bf 88} (1991), 2312-2316.
813: \bibitem{anselmi} Scipioni, A. {\em et al.}
814: ``Sequence-dependent DNA curvature and flexibility from scanning force microscopy images'', {\em Biophys. J.} {\bf 83} (2002), 2408-2418.
815: \bibitem{lankas} Lankas, F. ``DNA sequence-dependent deformability - insights from computer simulations'', {\em Biopolymers} {\bf 73} (2004), 327-339.
816: \bibitem{naplus} Ponomarev, S.Y., Thayer, K.M., Beveridge, D.L. ``Ion motions in molecular dynamics simulations on DNA'', {\em Proc. Natl. Acad. Sci.} {\bf 101} (2005), 14771-14775.
817: \bibitem{proz} Polozov, R.V. {\em et al.} ``Electrostatic potentials of DNA. Comparative analysis of promoter and nonpromoter sequences.'', {\em J. Biomol. Struct. Dyn.} {\bf 16} (1999), 1135-1143.
818: \bibitem{kbk} Braun, O.M. and Kivshar, Yu.S.: {\em The Frenkel-Kontorova Model: Concepts, Methods and Applications}, Springer, Berlin, 2004.
819: \bibitem{vonhip} von Hippel, P.H. and Berg, O.G. ``Facilitated target location in biological systems'', {\em J. Biol. Chem.} {\bf 264} (1989), 675-678.
\bibitem{lac1} Fickert, R. and M\"{u}ller-Hill, B. ``How lac repressor finds {\em lac} operator {\em in vivo}'', {\em J. Mol. Biol.} {\bf 226} (1992), 59-68.
\bibitem{p53} Jiao, Y., Cherny, D.I., Heim, G., Jovin, T.M. and Sch\"{a}ffer, T.E. ``Dynamic interactions of p53 with DNA in solution by time-lapse atomic force microscopy'', {\em J. Mol. Biol.} {\bf 314} (2001), 233-243.
822: \bibitem{wu} Park, C.S., Wu, F.Y.H. and Wu, C.S. ``Molecular mechanism of
823: promoter selection in gene transcription'', {\em J. Biol. Chem.} {\bf 257} (1982), 6950-6956.
\bibitem{wu2} Singer, P.T. and Wu, C.S. ``Kinetics of promoter search by {\em Escherichia coli} RNA polymerase'', {\em J. Biol. Chem.} {\bf 263} (1988), 4208-4214.
825: \bibitem{smeekins} Smeekins, S.P. and Romano, L.J. ``Promoter and nonspecific DNA binding by the T7 RNA polymerase'', {\em Nucl. Acids. Res.} {\bf 14} (1986), 2811-2827.
826: \bibitem{kabata} Kabata, H. {\em et al.} ``Visualisation of single molecules of RNA polymerase sliding along DNA'', {\em Science} {\bf 262} (1993), 1561-1563.
827: \bibitem{bustamente} Guthold, M. {\em et al.}
828: ``Direct observation of one-dimensional diffusion and transcription by {\em Escherichia coli} RNA polymerase'', {\em Biophys. J.} {\bf 77} (1999), 2284-2294.
829: \bibitem{ecorv} Jeltsch, A. and Pingoud, A.
830: ``Kinetic characterisation of linear diffusion of the restriction endonuclease
831: {\em Eco}RV on DNA'', {\em Biochemistry} {\bf 97} (1998), 2160-2169.
\bibitem{meth} Nardone, G., George, J. and Chirikjian, J.G. ``Differences in the kinetic properties of BamH1 endonuclease and methylase with linear DNA substrates'', {\em J. Biol. Chem.} {\bf 261} (1986), 2128-2133.
833: \bibitem{berg} Berg, O.G., Winter, R.B. and von Hippel, P.H.
834: ``Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and Theory'', {\em Biochemistry} {\bf 20} (1981), 6929-6948.
835: \bibitem{S4} Barbi, M., Place, C., Popkov, V. and Salerno, M.
836: ``A model of sequence-dependent protein diffusion along DNA'', {\em J. Biol. Phys.} {\bf 30} (2004), 203-226.
837: \bibitem{satar} Satari\`{c}, M.V. and Tuszy\`{n}ski, J.A. ``Impact of regulatory proteins on the nonlinear dynamics of DNA'', {\em Phys. Rev.} {\bf E65} (2002), 1901-1911.
\bibitem{ting0} Ting, J.J-L. and Peyrard, M. ``Effective breather-trapping mechanism for DNA transcription'', {\em Phys. Rev.} {\bf E53} (1996), 1011-1018.
839: \bibitem{ting1} Ting, J.J-L. ``DNA transcription mechanism with a moving enzyme'', {\em Intl. J. Mod. Phys.} {\bf A7} (1997), 1125-1132.
840: \bibitem{endy} Endy, D., You, L., Yin J. and Molineux, I.J. ``Computation, predictions and experimental tests of fitness for bacteriophage T7 mutants with permuted genomes'', {\em Proc. Natl. Acad. Sci.} {\bf 97} (2000), 5375-5380.
\bibitem{hes} Hesselbach, B.A. and Nakada, D. ```Host shut off' function of bacteriophage T7: involvement of T7 gene 2 and gene 0.7 in the inactivation of {\em Escherichia coli} RNA polymerase'', {\em J. Virol.} {\bf 24} (1977), 736-745.
842: \bibitem{lys} Moffat, B.A. and Studier, F.W. ``T7 lysozyme inhibits transcription by T7 RNA polymerase'', {\em Cell} {\bf 49} (1987), 221-227.
\bibitem{inf1} Zavriev, S.K. and Shemyakin, M.F.
844: ``RNA polymerase-dependent mechanism for the stepwise T7 phage DNA transport from the virion into {\em E. coli}'', {\em Nucl. Acids. Res.} {\bf 10} (1982), 1635-1652.
845: \bibitem{inf2} Garcia, L.R., and Molineux, I.J. ``Rate of translocation of bacteriophage T7 DNA across the membranes of {\em Escherichia coli}'', {\em J. Bacteriol.} {\bf 177} (1995), 4066-4076.
846: \bibitem{zhang} Zhang, F. ``Breather scattering by impurities in the sine-Gordon model'', {\em Phys. Rev.} {\bf E58} (1998), 2558-2563.
847: \bibitem{IHF} Tsodikov, O.V., Holbrook, J.A., Shkel, I.A., and Record, M.T., Jnr. ``Analytic binding isotherms describing competitive interactions of a protein ligand with specific and nonspecific sites on the same DNA oligomer'',
848: {\em Biophys. J.} {\bf 81} (2001), 1960-1969.
849: \bibitem{scale1} von Hippel, P.H. ``An integrated model of the transcription complex in elongation, termination and editing'', {\em Science} {\bf 281} (1998), 660-665.
850: \bibitem{scale2} Imburgio, D., Rong, K. Ma. and McAllister, W.T. ``Studies of promoter recognition and start site selection by T7 RNA polymerase using a comprehensive collection of promoter variants'', {\em Biochemistry} {\bf 39} (2000), 10419-10430.
851: \bibitem{sig70} Mulligan, M.E., Hawley, D.K., Entriken, R. and McClure, W.R.
852: ``{\em Escherichia coli} promoter sequences predict {\em in vitro} RNA polymerase selectivity'', {\em Nucleic. Acids. Res.} {\bf 12} (1984), 789-800.
853: \bibitem{UP} Estrem, S.T. {\em et al.} ``Bacterial promoter architecture: subsite structure of UP elements and interactions with the carboxy-terminal domain of the RNA polymerase $\alpha$ subunit'', {\em Genes Dev.} {\bf 13} (1999), 2134-2147.
\bibitem{A1} Sclavi, B. {\em et al.} ``Real-time characterisation of intermediates in the pathway to open complex formation by {\em Escherichia coli} RNA polymerase at the T7A1 promoter'', {\em Proc. Natl. Acad. Sci.} {\bf 102} (2005), 4706-4711.
\bibitem{GB} National Center for Biotechnology Information website. http://www.ncbi.nlm.nih.gov/Entrez
856: \bibitem{pTemp} Dausse, J.P., Sentenac, A. and Fromageot, P.
857: ``Interaction of RNA polymerase from {\em Escherichia coli} with DNA. Effect of temperature and ionic strength on selection of T7 DNA early promoters.''
858: {\em Eur. J. Biochem} {\bf 65} (1976), 387-393.
859: \bibitem{bflip} Giudice, E., V\'{a}rnai, P. and Lavery, R.
``Base-pair opening within B-DNA: free energy pathways for GC and AT pairs from umbrella sampling simulations'', {\em Nucl. Acids. Res.} {\bf 31} (2003), 1434-1443.
\bibitem{y1} Murakami, K.S., Masuda, S. and Darst, S.A. ``Structural basis of transcription initiation: RNA polymerase holoenzyme at 4 \AA\ resolution'', {\em Science} {\bf 296} (2002), 1280-1284.
\bibitem{kam2} Frank-Kamenetskii, M. ``Simplification of the empirical relationship between melting DNA, its GC content and concentration of sodium ions in solution'', {\em Biopolymers} {\bf 10} (1971), 2623-2624.
863: \bibitem{unif} SantaLucia, J. Jnr.
864: ``A unified view of polymer, dumbbell and oligonucleotide DNA nearest-neighbour thermodynamics'', {\em Proc. Natl. Acad. Sci.} {\bf 95} (1998), 1460-1465.
865: \bibitem{bas2} Bashford, J.D. and Jarvis, P.D. ``A base-pairing model of duplex formation I: Watson-Crick pairing geometries'', {\em Biopolymers} {\bf 78} (2005), 287-297.
866: \bibitem{santa} SantaLucia, J. Jnr., Allawi, H.T. and Seneviratne, P.A. ``Improved nearest-neighbour parameters for predicting DNA duplex stability'', {\em Biochemistry} {\bf 35} (1996), 3555-3562.
867: \bibitem{nonk} Yakushevich, L.V. ``Scattering of neutrons and light by DNA solitons'', {\em Stud. Biophys.} {\bf 103} (1984), 171-178.
\bibitem{yak2} Yakushevich, L.V. ``The effects of damping, external fields and inhomogeneity on the nonlinear dynamics of biopolymers'', {\em Stud. Biophys.} {\bf 121} (1987), 201-207.
869: \bibitem{mol4} Molineux, I.J. ``No syringes please, ejection of phage T7 DNA from the virion is enzyme driven'', {\em Mol. Microbiol.} {\bf 40} (2001), 1-8.
870: \bibitem{ben1} Benham, C.J. ``Sites of predicted stress-induced DNA duplex destabilization occur preferentially at regulatory regions'', {\em Proc. Natl. Acad. Sci.} {\bf 90} (1993), 2999-3003.
871: \bibitem{ben2} Wang, H., Noordewier, M. and Benham, C.J. ``Stress-Induced DNA Duplex Destabilization (SIDD) in the {\em E. coli} genome: SIDD sites are closely associated with promoters'', {\em Genome Research} {\bf 14} (2004), 1575-1584.
872:
873: \end{thebibliography}
874:
875: \section{Figure Captions}
876: \noindent
877: {\bf Figure 1:}
878: a) Effective potential (\ref{vpot}) for breathers in the initial T7 virion
879: fragment. Initial binding sites for bacterial promoters are denoted by
880: dots; b) Noise parameter $\varepsilon(X)$ for the same sequence.
881: c) Evolution over 1000 time-steps of the system (\ref{dsc}) with
882: breathers initially placed at sites 460, 570 and 680.
883: \\
884:
885: \noindent
886: {\bf Figure 2:}
Effective potential (\ref{vpot}) for 30 bp wide breathers in the class I region of the T7 genome. Filled and unfilled dots denote respectively UP or -35 {\em E. coli} and +1 T7 promoter sites. \\
888:
889: \noindent
890: {\bf Figure 3:}
891: Potential (\ref{vpot}) computed for the T7 initial fragment
892: for $\mu=\pi/6.05$.
a) $\eta=0.002$, $L=30$ bp; b) $\eta=0.0004$, $L=67$ bp.
894: Dots denote, from left to right, UP and -35 sites for $A_{1}-A_{3}$
895: bacterial promoters.\\
896:
897: \noindent
898: {\bf Figure 4:}
899: a) Location of minima of (\ref{vpot}) nearest initiation sites of T7 phage
900: promoters; b) Scatter plot of initiation-downstream transcription unit
901: distance (TU) versus initiation minima distance (Min).\\
902:
903: \noindent
904: {\bf Figure 5:}
905: Representative region of T5 genome potential, showing correlations between
906: potential minima and -35 sites for {\em E. coli} promoters ($L=30$ bp).
907:
908: \begin{figure}[tbp]
909: \centering{
910: \resizebox{14cm}{8cm}{\includegraphics{brIr.eps}}
911: }
912: \caption{}
913: \protect \label{j1}
914: \end{figure}
915:
916: \begin{figure}[htb]
917: \centering{
918: \resizebox{11cm}{6cm}{\includegraphics{br3II.eps}}
919: }
920: \caption{}
921: \protect \label{j2}
922: \end{figure}
923:
924: \begin{figure}[htb]
925: \centering{
926: \resizebox{14cm}{4cm}{\includegraphics{br3III.eps}}
927: }
928: \caption{}
929: \protect \label{j3}
930: \end{figure}
931:
932: \begin{figure}[htb]
933: \centering{
934: \resizebox{14cm}{4cm}{\includegraphics{brIVr.eps}}
935: }
936: \caption{}
937: \protect \label{j4}
938: \end{figure}
939:
940: \begin{figure}[htb]
941: \centering{
942: \resizebox{12cm}{7cm}{\includegraphics{br3V.eps}}
943: }
944: \caption{}
945: \protect \label{j5}
946: \end{figure}
947:
948: \end{document}
949:
950:
951: