q-bio0602019/br2.tex
1: \documentclass[11pt]{article}
2: \newif\ifPDF
3: \ifx\pdfoutput\undefined\PDFfalse
4: \else\ifnum\pdfoutput >0\PDFtrue
5:      \else\PDFfalse
6:      \fi
7: \fi
8: 
9: \ifPDF
10:    \usepackage{amssymb}
11:    \usepackage{amsfonts}
12:    \usepackage[pdftex]{graphicx,color}
13: \else
14:    \usepackage{amssymb}
15:    \usepackage{amsfonts}
16:    \usepackage[dvips]{graphicx}
17: \fi
18: %\usepackage{amssymb}
19: %\usepackage{amsfonts}
20: %\usepackage{graphics}
21: \setlength{\textwidth}{16cm}
22: \setlength{\textheight}{23cm}
23: \renewcommand{\baselinestretch}{1.0}
24: \addtolength{\oddsidemargin}{-15mm}
25: 
26: \setlength{\topmargin}{-60pt}
27: \begin{document}
28: \begin{titlepage}
29: 
30: August 2005
31: \vskip 1.6in
32: \begin{center}
33: {\Large {\bf Salerno's model of DNA reanalysed: could solitons
34: have biological significance?}}
35: \\[5pt]
36: \end{center}
37: 
38: \normalsize
39: \vskip .4in
40: 
41: \begin{center}
42: J. D. Bashford \\
43: {\it School of Mathematics and Physics, University of Tasmania} \\
44: {\it Private Bag 37, Hobart 7001, Tasmania Australia} \\
45: %\par \vskip .1in \noindent
46: %
47: 
48: \end{center}
49: \par \vskip .3in
50: 
51: \begin{center}
52: {\Large {\bf Abstract}}\\
53: \end{center}
54: We investigate the sequence-dependent behaviour of localised excitations 
55: in a toy, nonlinear model of DNA base-pair opening originally proposed by 
56: Salerno. Specifically we ask whether ``breather'' solitons could play a role 
57: in the facilitated location of promoters by RNA polymerase. 
58: In an effective potential formalism, we find excellent correlation between 
59: potential minima and {\em Escherichia coli} promoter recognition sites in the 
60: T7 bacteriophage genome. Evidence for a similar relationship between phage 
61: promoters and downstream coding regions is found and alternative reasons
62: for links between AT richness and transcriptionally-significant sites are 
63: discussed. 
64: Consideration of the soliton energy of translocation provides a novel 
65: dynamical picture of sliding: steep potential gradients correspond to 
66: deterministic motion, while ``flat'' regions, corresponding to homogeneous AT 
67: or GC content, are governed by random, thermal motion.
68: Finally we demonstrate an interesting equivalence between planar, breather
69: solitons and the helical motion of a sliding protein ``particle'' about a 
70: bent DNA axis.
71: 
72: \vspace{3cm}
73: {\bf Keywords}: DNA soliton, RNA polymerase, sliding, bacteriophage T7\\
74: \end{titlepage}
75: 
76: \section{Introduction}
77: Protein-DNA interactions play many of the fundamental roles in gene 
78: regulation. An understanding of the mechanisms involved in these 
79: processes is one of the major current goals for numerous biological sciences.
80: With large repositories of genetic information available - and costs
81: associated with difficult, highly specific experiments - the question of
82: how well such molecular interactions can be simulated is clearly important 
83: to investigate. Enzymes and many transcriptional factors are proteins, often 
84: composed of tens to hundreds of amino acids, while the DNA domains to which 
85: they bind can contain, in the case of prokaryotes, up to $10^{5}$ nucleotide bases. 
86: All-atom modelling of such a molecular complex, even neglecting the key roles of hydration and 
87: ions, is beyond current computational ability. 
88: 
89: An alternative, logical first step is to consider one specific kind of 
90: interaction, focussing exclusively on its salient features and develop
91: an accordingly simplified model. In this spirit, simple dynamical models of 
92: DNA have been studied for almost two decades, most successfully in regard
to describing denaturation experiments (see Ref. \cite{pey} for a review). The process by which the motions of small DNA molecules containing $\sim 10^{6}$ 
atoms are ``coarsened'' to the nucleotide base-pair level has 
95: previously been qualitatively argued \cite{Gaeta1}. Typically the DNA 
96: molecule is modelled by one degree of freedom per base-pair: a radial 
97: ``stretching''\cite{PB1} or a pendulum-like base ``flipping'' (Ref. \cite{Yak} being a recent review).
98: 
99: We note, in this context, that {\em ab initio} calculations of small DNA 
100: oligomers \cite{dinuc} suggest that base-pair motions can be accurately
101: approximated at the dinucleotide level in terms of two or three quasi-rigid, 
102: internal degrees of freedom, lending some weight to the coarse-graining 
103: assumptions previously made.
104: The motivation for these simple models is that, if experimental results 
105: can be described with a small number of degrees of freedom, then these 
106: degrees of freedom must be the dominant ones for the process in question. 
107: The applications of such a model are necessarily restricted to extremely 
108: specific instances of DNA behaviour \cite{Gaeta1}. Given that many regulatory 
109: processes are governed by highly-specific, localised denaturation of the 
110: DNA helix, it is logical to investigate sequence-dependent dynamical 
111: behaviours in such a setting.
112: 
113: In 1991 Salerno \cite{S1} proposed a base-flipping model of the bacterial
114: promoter DNA sequence $A_{1}$ in the T7 genome, suggesting the sequence had
115: special, ``dynamically active'' qualities with regard to propagating kink 
solitons. Subsequent investigations of other host-specific T7 promoter sequences \cite{S2}-\cite{S3b} made similar findings. Moreover, in Ref. \cite{S3b} it was suggested that solitons could be created as conformational changes to the DNA helix due to DNA-RNAP interactions. 
Recently the propagation of kinks through the entire T7 genome sequence has been studied \cite{kl}, although the (significant) differences between host- and phage-specific promoter sequences were neglected. Another paper \cite{sanchez} investigated whether kinks might propagate differently in coding and non-coding sequences.  
118: 
Solitons had previously been suggested to have a role \cite{englander} in DNA
transcription in the 1980s. However, at least one picture which developed 
\cite{riv}, of an RNA polymerase (RNAP) molecule ``surfing'' a 
thermally-driven region of open base pairs, is inconsistent with the known 
conformational changes of RNAP and DNA which occur during open complex 
formation. Secondly, no absorption resonances have been observed in 
microwave spectroscopy of DNA \cite{edwards}, \cite{bigio}. Crucially, the 
original motivation for invoking solitons, anomalously long lifetimes of 
DNA base-pair openings, was shown to result from misinterpretation of data 
\cite{gueron}. DNA solitons were vigorously disputed by some researchers 
\cite{Kam}, with the result that they remain little more than a curiosity 
130: outside of the nonlinear physics community.
131: 
132: More generally, a variety of studies \cite{ben3}, \cite{PB2}, \cite{bmc}
133: have suggested connections between the base-sequence dependence of helical 
134: thermal stability and transcriptional regulation sites.
135: The common feature in all these studies, soliton models included, is that 
136: AT-rich regions are distinguished by way of reduced thermal stability
137: relative to GC-rich regions. 
138: Of course regions rich in AT might stand out from a regulatory 
139: perspective for geometric, mechanical or chemical reasons. For example, tracts of A or T nucleotides carry intrinsic
140: curvature \cite{trif} and confer rigidity to the DNA superstructure 
141: \cite{anselmi} - \cite{lankas}, leading to significant departures 
142: from the average B-DNA form.
143: Finally the role of counterions in determining DNA structure such as bending
144: and groove width \cite{naplus} and affinity for such tracts is not yet fully 
145: understood. Sequence-dependent variations in the electrostatic surface of
146: DNA may also present a unique ``signature'' in promoter regions \cite{proz}.
147: 
148: It should be emphasised that the solitons we consider have nothing to do with 
149: thermally-driven, transient base-pair openings, the source of the original 
150: controversy. Protein-DNA interactions involve conformational changes of the 
151: DNA helix and in our opinion it is logical to investigate, if a small change
152: might be modelled by a structural perturbation to a regular B-DNA helix,
153: whether its translocation might be approximated by a soliton propagating 
154: through a nonlinear medium.
155: 
156: The structure of the paper is as follows: We briefly discuss aspects of 
157: protein-sliding and review the mechanism of lytic infection of {\em Escherichia coli} by T7 bacteriophage.
158: Assuming DNA molecules can actually support nonlinear, quasi-solitonic 
excitations like those of the simple ``base-flipping'' model \cite{S1}, we discuss the kind of biological roles they might serve.
160: To this end we introduce the inhomogeneous Frenkel-Kontorova (IFK) model 
161: \cite{kbk}, the basis of Salerno's approach \cite{S1}-\cite{S3},  and its 
162: breather soliton solutions. 
163: The propagation of breather solutions is analysed via an effective potential 
164: formalism and  we compute the energy landscape, comparing extrema with 
165: bacterial and phage promoters of the T7 genome. 
166: Supposing then, that solitonic excitations of DNA do not exist, we discuss 
167: alternative reasons why the correlations obtained between regulatory features 
168: and potential extrema might have been obtained. In particular a novel 
169: equivalence between base-flipping stability in the planar IFK model and 
170: bending in a 3-dimensional, helical model is outlined.
171: 
172: \subsection{Protein sliding}
173: The mechanisms by which regulatory proteins, such as RNAP, can 
174: recognise their specific binding sites among tens, or even hundreds of 
175: thousands, of structurally identical nonspecific sites on a DNA strand are 
176: generally not well understood.  A widely accepted 
177: hypothesis is that many proteins have several modes \cite{vonhip} with 
178: which to bind to DNA:
179:  
180: Nonspecific binding occurs when a polar domain of the protein displaces 
181: cations in the major/minor grooves of the DNA helix. The protein effectively
182: slides, one-dimensionally, along the groove through a series of nonspecific 
183: binding events. The translocation mechanism for the nonspecific complex
184: is not known, but is ATP-independent \cite{proz} and widely assumed to be 
185: driven by thermal motion.  
186: Other possible diffusion modes include ``hopping'', where a nonspecifically 
187: bound protein dissociates and re-associates within the same DNA domain.
188: For some proteins, such as {\em lac} repressor \cite{lac1}, a possibility for 
189: transfer between sequentially-distant regions of DNA brought 
190: close together in 3-dimensional space also exists. In general, the 
191: facilitated location of an operator by a protein is likely to consist of a 
192: sequence of sliding, hopping or intersegmental transfer events \cite{vonhip}. 
193: A growing body of evidence exists that many 
regulatory proteins, such as repressors \cite{lac1}, \cite{p53}, RNAPs \cite{wu}-\cite{bustamente}, nucleases \cite{ecorv} and methylases \cite{meth} locate 
195: their operator sites in this way.
196: 
197: While the sliding component of facilitated target location is commonly assumed 
198: to be thermally driven we point out there is no experimental evidence to 
199: preclude dynamical effects resulting from local, sequence-dependent mechanical
200: properties of DNA.
201: In the seminal paper of Berg and co-workers \cite{berg} there are two 
202: assumptions, in particular, which may be unsatisfactory. Firstly the 
203: facilitated transport model is derived under the assumption of 
204: a homogeneous, free protein distribution. While such conditions can be 
205: arranged {\em in vitro} this is not the case for a biological system.
206: Secondly there is no real account of degrees of molecular recognition: 
207: operator sites are treated as ``sinks'', defining a boundary condition for the 
208: one-dimensional diffusion equation. It is highly probable that some kind of 
209: ``reading'' process also occurs, mediated by the electrostatic interactions 
210: between protein residues and nucleotide functional groups lying in one, or 
211: both, of the grooves.  
212: 
213: The limited empirical studies of sliding RNAP performed to date, for example 
214: Refs \cite{wu}-\cite{bustamente}, invariably average behaviour over many 
215: individual sliding events, masking any sequence-dependent variation which 
216: might exist. On the other hand a recent model of the hypothetical reading 
217: process \cite{S4} was built, based upon the assertion that 
218: \begin{quote}...the protein 
219: should follow a noise-influenced, sequence-dependent motion that includes the 
220: possibility of slowing down, pauses and stops...
221: \end{quote}
222: 
223: Let us qualitatively envisage then, how soliton-like deformations might arise 
224: in RNAP-DNA interactions. The presence of enzymes as ``mass defects'' in a 
1-dimensional DNA model has previously been considered \cite{satar}, \cite{ting0}, \cite{ting1} with regard to thermal breathers and transcribing RNAP.
226: In distinction, initial binding to a nonspecific DNA site entails 
227: insertion of polymerase domains into the major groove of the helix, where the 
228: displacement of counterions occurs. Suppose that the initial contact 
229: and recoil of RNAP during association induces a localised deformation in the 
230: B-DNA helix, which we approximate by a breather soliton.
231: Breather excitations in the IFK model \cite{kbk} are, due to the discreteness
232: of the model, inherently unstable. Therefore the initially stationary breather would 
233: propagate along the strand, preferentially in a direction determined by local 
234: inhomogeneities in the base sequence. Further, the deformation is not a true 
235: soliton, owing to the discreteness of the DNA lattice, and radiates 
energy, eventually dissipating. In this regard we also note that the mean sliding 
distance of RNAP is known to be highly sensitive to variations in cation
concentration \cite{wu2}, \cite{smeekins}.
239: 
240: There are two pictures which are plausibly consistent with noisy, 
deterministic dynamics: either the RNAP can effectively ``surf'' the breather,
or randomly moving RNAP and deterministically travelling breathers can 
243: interact somehow on collision. Regarding the latter, little is known about the
244: structure of nonspecific protein-DNA complexes and for the remainder of
245: the paper we consider the former, less speculative, scenario.
246:  
247: \subsection{T7 bacteriophage}
Bacteriophage T7 is a member of the {\em Podoviridae} family of viruses, which cause lytic infection of bacteria. Its simple regulatory apparatus is one of the most widely studied, serving as a model for genomes of more complex organisms. 
249: As mentioned above, previous nonlinear DNA studies \cite{S1}-\cite{S3}, \cite{kl} have 
250: involved the T7 phage genome sequence.
251: However these studies focussed exclusively on the base sequence, with no 
252: consideration for the possible changing biological context of the information
253: it contains. In fact the essentially linear processes of T7 DNA
254: translocation and gene expression make this phage an excellent case study. 
255: 
256: T7 is known to inject its double-stranded, linear DNA into a host {\em E. coli} cell in a stepwise, transcription-dependent manner \cite{inf1}. 
The T7 genome contains 39,937 base pairs but initially only the first 850 of 
these are translocated from the phage particle \cite{inf2}. 
259: This initial fragment contains three strong promoters specific to {\em E. coli} RNAP $A_{1}-A_{3}$ (in addition to the minor $A_{0}$, or $D$, promoter with 
260: no known {\em in vivo} function) which initiate transcription of the phage 
sequence. The remainder of the genome is divided into three sections:
the ``early'' region, containing class I genes responsible for modifying 
host metabolism to favour phage production; the ``middle'' region, where
class II genes govern phage replication; and the ``late'' region of class III 
genes driving maturation and packaging of newly assembled phage DNA strands.
266: 
267: Transcription of the initial fragment serves a dual function: ``pulling'' 
268: downstream, early DNA from the phage particle into the host cell, in 
269: addition to transcribing the class I genes. 
270: The product of the first of these, gene 0.3, inactivates host defence 
271: (specifically type I restriction/modification) systems,  
272: therefore rapid recognition of a major promoter is vital for successful 
273: infection by wild-type T7.
274: Another product of this early region is a T7 RNA polymerase, recognising
275: its own specific promoters,  which is responsible for transcribing the 
276: remainder of the genome, a process which proceeds in two steps:
277: Entry of the middle region into the host cell is dependent upon the 
278: successful translation of class I genes. In turn, translocation of the 
279: late-transcribed region requires the products of early and mid-regions.
280: 
281: In contrast to gene expression in more complex organisms, there are very 
few ``feedback'' loops; indeed, a virtual simulation of the T7 life cycle has 
been developed \cite{endy}. There are two known loops which may 
have relevance to our analysis below: mid-late inhibition of class I (host-specific) promoters \cite{hes} and late inhibition of the mid (phage-specific) promoters \cite{lys}.
285: \section{The model}
At physiological temperatures the physically dominant mode of base-pair opening is base flipping, the pendular oscillation of bases about their N-glycosidic 
287: bond in the mean base-pair plane. Such models previously considered 
288: for biological roles \cite{S1}-\cite{sanchez} are based upon the IFK \cite{kbk} Hamiltonian:
289: \begin{eqnarray}
290: {\cal H} & = & \frac{1}{2} \sum_{i=1}^{n}
291: I_{i}(\dot{\theta}_{i}^{2} + \dot{\psi}_{i}^{2})
292: + \frac{1}{2} \sum_{i=1}^{n-1}( \kappa_{i}
293: (\theta_{i+1}-\theta_{i})^{2} + \bar{\kappa}_{i}
294: (\psi_{i+1}-\psi_{i})^{2} ) \nonumber \\
295: & & + \sum_{i=1}^{n} \sigma_{i}(1-\cos(\theta_{i}-\psi_{i})). \label{ifk} 
296: \end{eqnarray}
297: Here $\theta_{i}$, $\psi_{i}$ are the angles of deflection of the
298: $i^{th}$ base ``pendulum'' and that of its complement from equilibrium, while
$I_{i}$ is the inertial moment. Nearest-neighbour bases are coupled by a 
300: harmonic torsion potential with ``stiffnesses'' $\kappa_{i}$, 
301: $\bar{\kappa}_{i}$. Finally $\sigma_{i}$ is the characteristic strength of the 
302: nonlinear H-bonding potential between complementary bases.
303: 
304: In earlier studies \cite{S1}-\cite{S3} homogeneous inertial moments and 
305: stiffness constants were assumed, with the only sequence-dependence residing 
306: in the H-bonding coupling constants $\sigma_{i}$. 
Specifically, for $i = 1, \ldots, n$
308: \begin{eqnarray*}
309: I_{i}=I, & &\kappa_{i} \equiv K =\bar{\kappa}_{i}.
310: \end{eqnarray*}
In addition it was assumed that $\sigma_{i} = \lambda_{i} k$ where $k$ is a 
generic coupling and $\lambda_{i}$ takes the values 2 and 3 for A.T and 
G.C pairs respectively, accounting for the differing numbers of hydrogen bonds.
With these approximations, one passes to angle difference- and sum-coordinates, $u_{i}$ and $v_{i}$: \begin{eqnarray*}
\theta_{i}=\frac{1}{2}(v_{i}+u_{i}), & & \psi_{i}=\frac{1}{2}(v_{i}-u_{i}).
316: \end{eqnarray*}
317: The Hamiltonian thus obtained is
318: \begin{eqnarray}
319: {\cal H}' & = & \frac{1}{2} \sum_{i=1}^{n}
320: I(\dot{u}_{i}^{2} + \dot{v}_{i}^{2})
+ \frac{K}{2} \sum_{i=1}^{n-1} \left( (u_{i+1}-u_{i})^{2} + (v_{i+1}-v_{i})^{2} \right) \nonumber \\
322: & & + \sum_{i=1}^{n} \lambda_{i}k(1-\cos(u_{i})). \label{ifk2} 
323: \end{eqnarray}
324: The equations of motion for the  $u_{i}$ reduce to the set of dimensionless,
325: coupled equations:
326: \begin{equation}
327: \ddot{u}_{i}-(u_{i+1}-2 u_{i}+u_{i-1})+\beta_{i} \sin u_{i}=0; \hspace{0.1cm}
328: u_{i}=\theta_{i}-\psi_{i}, \label{dsc}
329: \end{equation}
where the time variable has been rescaled, $t \to \sqrt{I/K}\,t$, and the 
331: parameter $\beta_{i}=\lambda_{i}k/K\equiv \lambda_{i}\eta$.
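
As a short check of this reduction (a sketch, assuming harmonic nearest-neighbour couplings $(u_{i+1}-u_{i})^{2}$ and $(v_{i+1}-v_{i})^{2}$ in the Hamiltonian (\ref{ifk2})), the equations of motion at an interior site read
\begin{eqnarray*}
I\ddot{u}_{i} & = & K(u_{i+1}-2u_{i}+u_{i-1}) - \lambda_{i}k \sin u_{i}, \\
I\ddot{v}_{i} & = & K(v_{i+1}-2v_{i}+v_{i-1}).
\end{eqnarray*}
Dividing the first equation by $K$ and measuring time in units of $\sqrt{I/K}$ gives Eq.(\ref{dsc}) with $\beta_{i}=\lambda_{i}k/K$. The $v_{i}$ obey a linear lattice wave equation and decouple from the sequence-dependent H-bonding term, which is why only the $u_{i}$ are followed below.
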
To model the sequence variation as small perturbations to a homogeneous solution, we first require the average value of the parameters $\beta_{i}$:
333: \begin{equation}
334: \beta = \left(2 \frac{n_{AT}}{n} + 3  \left( 1 -\frac{n_{AT}}{n} \right)\right)\eta,
335: \end{equation}
336: where there are $n_{AT}$ occurrences of A.T pairs in the molecule. 
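
To illustrate the size of the inhomogeneity, write $f=n_{AT}/n$ (a shorthand introduced here only for this example). Then
\begin{eqnarray*}
\beta=(3-f)\eta, & & \beta_{AT}-\beta=-(1-f)\eta, \qquad \beta_{GC}-\beta=f\eta .
\end{eqnarray*}
For a hypothetical sequence of balanced composition, $f=1/2$, one has $\beta=2.5\eta$ and $|\beta_{i}-\beta|/\beta=0.2$ at every site, so the site-to-site deviations are of order twenty percent of the mean value.
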
337: In a purely homogeneous approximation, $\beta_{i}\to \beta$, in the continuum 
338: limit the system of equations (\ref{dsc}) reduces to the sine-Gordon equation,
339: \begin{equation}
340: \ddot{u}-u'' +\beta \sin u =0, \label{sg}
341: \end{equation}
342: which has a rich variety of solitonic solutions. A family of 
343: ``breather'' solutions of Eq.(\ref{sg}) with lengths $L_{\mu}$ and internal 
344: frequencies $\omega_{\mu}$ is defined by 
345: \begin{equation}
346: u_{br}(x,t)=4 \tan^{-1} \left(\frac{\sin \omega_{\mu} t}{\omega_{\mu} L_{\mu}}
347: \textrm{sech}(\frac{x}{L_{\mu}})\right),
348: \end{equation}
349: where in terms of the classifying parameter $\mu$
350: \begin{eqnarray*}
351: L_{\mu}=\beta^{-1/2} \textrm{cosec} \mu, & & \omega_{\mu}=\beta^{1/2} \cos\mu.
352: \end{eqnarray*}
Note that the above relations impose a minimum breather width, $L_{\mu}\geq\beta^{-1/2}$, and a maximum internal frequency, $\omega_{\mu}\leq\beta^{1/2}$, which 
the model can support for a given set of environmental conditions. 
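
As an illustration, eliminating $\mu$ between the expressions for $L_{\mu}$ and $\omega_{\mu}$ gives
\begin{eqnarray*}
\omega_{\mu}^{2}+L_{\mu}^{-2}=\beta(\cos^{2}\mu+\sin^{2}\mu)=\beta ,
\end{eqnarray*}
so that narrow breathers oscillate slowly while broad ones approach the frequency $\beta^{1/2}$. As a rough numerical guide, assuming a balanced composition ($\beta=2.5\eta$, an assumption made only for this estimate) and $\eta=2\times 10^{-3}$, one finds $\beta^{-1/2}\approx 14$ lattice sites, and a breather of width $L=30$ corresponds to $\sin\mu=1/(L\sqrt{\beta})\approx 0.47$.
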
355: The smallness of $\beta$ ensures that an approximate solution of the 
356: discrete, inhomogeneous model is of a similar form with slowly-varying
357: parameters, thus our {\em ansatz} for Eq.(\ref{dsc}) is
358: \begin{equation}
u_{n}=4 \tan^{-1} \left( \frac{\sin \omega t}{\omega L}
\textrm{sech}\,z_{n} \right); \hspace{0.3cm} z_{n}=(n-X)/L.  \label{sech}
361: \end{equation}
362: Here $X$ is understood as a collective coordinate for the breather and, for 
363: convenience we have omitted the $\mu$ subscript.
364: 
365: If the total energy is approximately conserved, upon substituting 
366: (\ref{sech}) into the Hamiltonian (\ref{ifk2}) one arrives at an expression 
367: for the effective potential in the collective coordinate $X$ \cite{S3}, 
associated with the propagation of the initial excitation on an inhomogeneous 
369: background. We find, using the identity
370: \begin{eqnarray*}
1-\cos u = 8/(\tan (u/4) + \cot (u/4))^{2},
372: \end{eqnarray*}
373: the expression for the total energy takes the form 
374: \begin{equation}
375: E(X;t)\equiv K(X;t)+V(X;t)+O(\beta^{2})=0.
376: \end{equation}
377: Here  
378: \begin{eqnarray*}
379: K(X;t)&=&
380: \frac{8}{L^{2}} \sum_{i=1}^n D_{i}(X;t) \left( \alpha(t)\sinh^{2} z_{i}(\dot{X})^{2} \nonumber \right. \\
381: & & \mbox{} \left. +\omega L\sqrt{\alpha(t)(\frac{1}{(\omega L)^{2}}-\alpha(t))} \sinh 2z_{i} \dot X\right), \\
382: V(X;t)& =& 8 \sum_{i=1}^n D_{i}(X;t) \left((\frac{1}{L^{2}}-\omega^{2}\alpha(t))\cosh^{2}z_{i} \right. \nonumber \\
383: & & \mbox{}\left.+ \alpha(t)(\frac{1}{L^{2}}\sinh^{2} z_{i} + \beta_{i}\cosh^{2}z_{i})\right), \\
384: D_{i}(X;t)& =&\frac{1}{(\alpha(t)+\cosh^{2}z_{i})^{2}}.
385: \end{eqnarray*}
386: where the function  $\alpha(t)\equiv (\sin (\omega t)/\omega L)^{2}$ governs 
387: the time-dependence of $V$. If the breather oscillation timescale is typically
388: orders of magnitude smaller than that of its propagation along the DNA 
389: \cite{ting1} we can replace the time-dependent potential by its average value
390: \cite{zhang}:
391: \begin{eqnarray}
V_{av}(X)& \equiv &\frac{1}{T}\int_{0}^{T} V(X; t)\, dt \nonumber \\
& = & \frac{4}{L^{2}} \sum^{n}_{i=1} 
\frac{(\textrm{sec}^{2}\mu+\beta_{i}L^{2}\tan^{2}\mu)\cosh z_{i}}{(\tan^{2}\mu+\cosh^{2}z_{i})^{3/2}}. \label{vpot}
395: \end{eqnarray}
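
The time average can be checked term by term. The following is a sketch, with the shorthand $a\equiv\tan^{2}\mu$, $c_{i}\equiv\cosh^{2}z_{i}$ and $\vartheta\equiv\omega t$ introduced here only for this purpose. Since $\omega L=\cot\mu$, one has $\alpha=a\sin^{2}\vartheta$ and $\omega^{2}\alpha=\sin^{2}\vartheta/L^{2}$, so that
\begin{eqnarray*}
V(X;t) & = & \frac{8}{L^{2}}\sum_{i=1}^{n} \frac{c_{i}+\left(a c_{i}-a-c_{i}+\beta_{i}L^{2}a c_{i}\right)\sin^{2}\vartheta}{(c_{i}+a\sin^{2}\vartheta)^{2}} .
\end{eqnarray*}
The standard integral $\frac{1}{2\pi}\int_{0}^{2\pi}d\vartheta\,(c+a\sin^{2}\vartheta)^{-1}=[c(c+a)]^{-1/2}$, differentiated with respect to $c$ and to $a$, yields the two averages required:
\begin{eqnarray*}
\left\langle \frac{1}{(c+a\sin^{2}\vartheta)^{2}}\right\rangle = \frac{2c+a}{2[c(c+a)]^{3/2}}, & &
\left\langle \frac{\sin^{2}\vartheta}{(c+a\sin^{2}\vartheta)^{2}}\right\rangle = \frac{c}{2[c(c+a)]^{3/2}} .
\end{eqnarray*}
Combining them, each site contributes
\begin{eqnarray*}
\frac{8}{L^{2}}\,\frac{c_{i}(2c_{i}+a)+c_{i}(a c_{i}-a-c_{i}+\beta_{i}L^{2}a c_{i})}{2[c_{i}(c_{i}+a)]^{3/2}}
=\frac{4}{L^{2}}\,\frac{(1+a+\beta_{i}L^{2}a)\,c_{i}^{1/2}}{(c_{i}+a)^{3/2}} ,
\end{eqnarray*}
which, with $1+a=\textrm{sec}^{2}\mu$ and $c_{i}^{1/2}=\cosh z_{i}$, is the summand of the time-averaged potential. Only the term proportional to $\beta_{i}$ carries the sequence dependence.
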
396: Owing to the nonlinear nature of the model the energy to translocate the
initial deformation is several orders of magnitude less than that required to 
398: create the breather initially. We can derive a simple estimate of the 
399: ``noisiness'' of the sliding dynamics from the energy required to shift the 
400: breather by one base-pair:
401: \begin{equation}
402: \varepsilon(X)=\frac{K}{k_{B}T} (V_{av}(X_{i+1})-V_{av}(X_{i})) \label{theta}
403: \end{equation}
404: where $V_{av}$ is the time-averaged potential (\ref{vpot}) and $k_{B}$ is 
405: Boltzmann's constant. For steep gradients the picture of sliding RNAP is 
406: thus analogous to a particle moving through an energy landscape while in flat 
407: regions it is more akin to a random walk. 
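
One way to read $\varepsilon$ is through detailed balance. This interpretation is offered here as an assumption and is not part of the derivation above. If single-site steps of the breather were thermally activated, the forward and backward hopping rates $r_{\pm}$ between neighbouring sites would satisfy
\begin{eqnarray*}
\frac{r_{+}(X_{i}\to X_{i+1})}{r_{-}(X_{i+1}\to X_{i})}=e^{-\varepsilon(X_{i})} .
\end{eqnarray*}
For $|\varepsilon|=2$ this ratio is $e^{-2}\approx 0.14$ (or its inverse, $\approx 7.4$), corresponding to a strongly biased walk, whereas $|\varepsilon|\ll 1$ gives a ratio close to unity, i.e. essentially unbiased diffusion.
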
408: 
409: \section{Results}
410: Having derived the breather effective potential (\ref{vpot}) we compute the 
411: ``landscape'' corresponding to the T7 genome. 
412: It is natural, initially, to assume breather width is the size of a nonspecific RNAP complex. This size is not directly known, either for {\em E. coli} or
413: T7 RNAP. For certain other proteins the size of a nonspecific complex is 
414: estimated to be several times smaller \cite{IHF} than that of a specific one.
415: Therefore we assume upper bounds on $L$ are provided by the DNA 
416: ``footprint'' size protected by RNAP in nuclease digestion experiments.
417: For translocating {\em E. coli} and T7 elongation complexes these values
418: are  $L_{B}=30$ bp \cite{scale1} and  $L_{\phi}=24$ bp \cite{scale2} 
419: respectively. 
420: 
421: \subsection{Sequence Analysis}
422: For our initial sequence analysis we adopt model parameter values coinciding 
423: with those of Salerno's \cite{S1} original study of T7 promoters. Setting the 
ratio $\eta = 2\times 10^{-3}$ implies the lower bound for breather 
425: width is $L_{min} \equiv \beta^{-1/2} \sim 15$ bp. 
426: Figure 1a shows the region of $V_{av}$  corresponding to the initial 850bp 
427: fragment of the T7 phage for a breather of width 30bp. For comparison Figure 
428: 1c shows the time evolution of the system (\ref{dsc}) with breathers initially
429: placed at intervals within the fragment. Comparison of the three trajectories
430: with the effective potential landscape in Fig. 1a serves to verify that the 
direction and range of propagation agree for the two methods.\footnote{Figure 1 to go here} 
432: 
433: Now the $\sigma^{70}$ subunit of {\em E. coli} holoenzyme RNAP recognises 
434: hexamers located 35 and 10 bases upstream of transcription initiation 
435: \cite{sig70}. In addition many strong promoters are enhanced \cite{UP} by a 
436: UP element: contacts between the $\alpha$ subunit of RNAP and AT rich 
437: sequences centred approximately 40-60 sites upstream. 
438: Inspection of Figure 1a shows that the UP region of $A_{1}$ and the -35
439: sites for $A_{2}$, $A_{3}$ (shown as dots) lie close to the bottom of 
440: potential wells: the respective initiation sites are 62, 29 and 21 bp 
441: upstream of these minima.  Comparison with the noise parameter, $\varepsilon$ 
442: plotted in Fig. 1c 
443: shows that when the motion is strongly deterministic ($|\varepsilon|>2$)
444: it is invariably towards regions where promoter recognition can occur.
445: The $\varepsilon$ values in the initiation region of the strongest ($A_{1}$)
bacterial promoter are 1.5 times greater than anywhere else in the T7 genome.
447: 
In fact there are seven {\em E. coli} RNAP specific promoters in the T7 
genome; the first recognition sites for the six earliest are shown in Figure 2
450: as dots. The four minor ($A_{0}$, $B$, $C$ 
451: and $E$) promoters, while having no recognised {\em in vivo} function, were 
452: found to have initiation sites 61 ($A_{0}$ transcribes leftwards), 27, 28 
453: and 17 bp downstream of deep minima.
454: Figure 2 also shows the full class I region of the T7 genome, transcribed by 
455: the bacterial RNAP, extending from the 5' DNA end to the bacterial 
456: transcription terminator, TE. In the Genbank \cite{GB} reference sequence 
457: (accession number NC\_001604) this corresponds to sites $\sim$500-7588. 
Note that other aspects of facilitated transport, namely dissociation 
followed by ``hopping'' or interdomain transfer, are likeliest to occur in 
460: locally flat regions, where the breather spends most time. In this way, 
461: the effect of multiple, broad-bottomed wells as kinetic ``traps'' might be 
462: minimised. \footnote{Figure 2 to go here}
463: 
464: On the other hand, the deep minimum at approximately 6 Kb could be a 
465: desirable kinetic trap for the host RNAP as it lies 100 bp downstream of the 
gene coding for T7 RNA polymerase. The T7 RNAP initiates transcription at
467: one of the two specific promoters (unfilled dots at the right side of Figure 
468: 2) and is thus responsible for the subsequent internalisation and expression 
469: of the remainder of the T7 genome. This deep minimum thus represents the end 
470: of the region where the host RNAP is ``useful''. One finds a similar, deep 
471: minimum at the class II/class III interface for a wide range of breather 
472: widths which could play a similar role, inhibiting late transcription from 
473: weaker class II promoters in favour of class III promoters.
474: 
475: \subsection{Parameter variation} 
476: Given the coarseness of the current model, it is important to understand
477: how the results obtained may vary with respect to the parameter values. 
478: Up to an overall scaling, all parameter variation in the sequence-dependent 
part of (\ref{vpot}) enters via the breather width, $L$, which governs 
480: sensitivity to sequence-dependent inhomogeneities: an increase of width 
481: leads to landscapes with fewer extrema which are also broader and larger in 
482: amplitude. The fundamental relationship governing effects of parameter 
483: variations is therefore $\textrm{cosec} \mu= L \sqrt{\beta}$. It is 
484: natural to associate the breather family parameter $\mu$ with the dimension 
485: of the protein DNA-interface and $L$ with the ``response'' of the system for 
486: a given set of environmental conditions, encapsulated in $\eta$.
487: 
488: Understanding of model robustness is complicated by the 
489: way in which the sliding of RNAP changes. For example, if breathers do 
490: play a role in the location of T7 promoters  $A_{1}-A_{3}$ by bacterial RNAP 
491: then one might expect environmental changes which alter sliding behaviour to 
492: also influence promoter activity. It is known \cite{pTemp} that the 
activities of $A_{1}-A_{3}$ are temperature dependent, with initiation at $A_{1}$ 
increasing as the temperature is raised from $20^{\circ}$C to $37^{\circ}$C while initiation at $A_{2,3}$ decreases under the same circumstances.
495: 
Due to decreased thermal stability, one expects a greater ``response'' of the
497: helix to a deformation at increased temperature. For fixed $ \mu$ this 
498: corresponds to an increase in $L$ and decrease in $\eta$. The two graphs in 
499: Fig 3 are calculated for such circumstances with $L=30$, $\eta=0.002$ and 
500: $L=67$, $\eta=0.0004$ respectively. For the higher $\eta$ value (on the left) 
501: a sliding RNAP is extremely likely to fall into one of the three wells 
502: associated with a major promoter.\footnote{Figure 3 to go here}
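
As a consistency check on these values: since $\beta\propto\eta$ for a given sequence, holding $\mu$ fixed in $\textrm{cosec}\,\mu=L\sqrt{\beta}$ requires $L\sqrt{\eta}$ to remain constant, i.e.
\begin{eqnarray*}
\frac{L_{2}}{L_{1}}=\sqrt{\frac{\eta_{1}}{\eta_{2}}}=\sqrt{\frac{0.002}{0.0004}}=\sqrt{5}\approx 2.24 ,
\end{eqnarray*}
so that $L_{1}=30$ corresponds to $L_{2}\approx 67$, the pair of values used for the two graphs. The subscripts 1 and 2 are labels introduced here for the two parameter sets.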
503: 
504: Conversely for the lower $\eta$ value the well containing $A_{1}$ has greatly
505: widened at the expense of the other two. Indeed the -35 sites for $A_{2}$, $A_{3}$ are now situated close to local maxima and the probability of an 
506: encounter with sliding RNAP would be significantly reduced.
507: We note that minima close to one or more major promoter sites exist for a 
508: broad range of parameter values. One could argue that the overall 
sequence composition of the T7 initial fragment appears to confer some 
510: robustness of host promoter recognition against environmental variations.
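
As a rough consistency check on the two parameter sets used in Fig. 3, and 
assuming that $\beta$ scales linearly with $\eta$ for a fixed sequence, the 
relation $\textrm{cosec} \mu= L \sqrt{\beta}$ implies that at fixed $\mu$ the 
breather width scales as $\eta^{-1/2}$. The labels $L_{1}$, $L_{2}$, 
$\eta_{1}$ and $\eta_{2}$ are introduced here purely for illustration:
\begin{eqnarray*}
\frac{L_{2}}{L_{1}} = \sqrt{\frac{\eta_{1}}{\eta_{2}}} 
= \sqrt{\frac{0.002}{0.0004}} = \sqrt{5} \simeq 2.24, 
& & L_{2} \simeq 2.24 \times 30 \; \textrm{bp} \simeq 67 \; \textrm{bp}.
\end{eqnarray*}
A five-fold decrease in $\eta$ therefore roughly doubles the width over 
which the sequence is averaged, in line with the broader wells seen for 
the lower $\eta$ value.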
511: 
512: Having outlined the qualitative variation of the system behaviour with 
513: parameter changes, we recompute the potential for $L_{\phi}=24$ bp. 
From Figure 4a it is immediately seen that for none of the T7 promoters
does the locally deepest minimum coincide with the upstream recognition sites.
516: With replication origins $\phi_{L}$ and $\phi_{R}$ and the earliest
517: phage promoters, $\phi 1.1A$, $\phi 1.1B$ omitted, minima appear to be
518: correlated to the start of the first downstream coding sequence, as evidenced
519: in Fig. 4b.\footnote{Figure 4 to go here}
520: 
521: \section{Discussion}
522: \subsection{Model Assumptions}
523: The planar model of DNA presented is a highly simplified one, containing
524: numerous assumptions which are unrealistic for modelling many DNA processes:
525: There is no explicit allowance for the helical structure and its
writhing/twisting behaviour. Many interactions with proteins involve major,
localised conformational changes of DNA; however, the specific case of 
sliding RNAP may be an exception: firstly, because such
conformational changes are unlikely to be present immediately prior to
closed complex formation \cite{y1} and secondly, because there is some evidence that 
rates of RNAP sliding, under some conditions at least, are 
532: independent of supercoiling \cite{smeekins}.
533: 
534: Another important assumption was the homogeneous, harmonic nature of
535: the restoring torques. In fact it is known that simple, ``base content''
536: models of helix-coil transition thermodynamics reproduce empirical data
for short ($\leq 15$ bp) DNA oligonucleotides quite well \cite{kam2}, \cite{unif}.
538: Specifically, encapsulating sequence dependence as AT and GC contents 
539: enables reproduction of such data at 310K in 1M NaCl solution (corrections
540: due to change in salt concentration are discussed in \cite{unif}) with a 
mean (median) error of 9\% (5\%) (J. D. Bashford, unpublished). 
542: From our previous study of the thermodynamics of B-DNA helix-coil 
543: transition \cite{bas2} we further estimate that the enthalpies of A.T and
544: G.C pairs are in the ratio 1.56/3,  which serves to enhance the distinction 
between the two types of base pair in Eq.(\ref{vpot}). This ratio accounts, in 
addition to the differing numbers of H bonds, for the averaged effects of solution, 
547: neighbouring base-pairs and other interactions between the complementary pair.
548: 
549: The assumption of harmonicity for the stacking potential at large opening 
550: angles, however, is more questionable and should be further refined. 
551: Also molecular calculations of the ``base-flipping'' in Watson-Crick pairs 
552: suggest \cite{bflip} opening into the major groove is more energetically
553: favoured for purine bases.
554: 
555: \subsection{Breather dynamics}
The shape of the breather potential, used in the qualitative arguments above,
depends only upon the ratios $\eta=k/K$ and $\lambda_{A/T}/\lambda_{G/C}$. The physical properties of any breather, however, depend on the actual parameter values. 
558: For example, the breather energy $E$ and oscillation frequency 
559: $\omega$ may be derived as
560: \begin{eqnarray}
561: E&=& \frac{16K}{L}\sim \sqrt{kK}, \\
562: \omega^{2} &=& \frac{K}{I}(\beta -\frac{1}{L^{2}}).
563: \end{eqnarray}
Using the parameter values in Ref. \cite{kl}, $K=5\times 10^{-18}$ J and 
$I=2 \times 10^{-43}$ kg m$^{2}$, in combination with our estimate
$k=1\times 10^{-20}$ J based on data from Ref. \cite{bas2}, yields $\eta=k/K=2\times 10^{-3}$, corresponding to $\beta=4.5\times 10^{-3}$.
567: Thus for a breather of width $L=30$ bp we get  
568: \begin{eqnarray*}
E \simeq 2.7 \times 10^{-18} \; \textrm{J}, & & \omega \simeq 1.0\times 10^{12} \; \textrm{s}^{-1}. 
570: \end{eqnarray*}
571: The energetic cost of creating this breather may be of the magnitude
572: of the electrostatic attractions responsible for the nonspecific contact.
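
For illustration, the quoted energy follows directly from the expression 
$E=16K/L$ with the stated values of $K$ and $L$. This is a back-of-envelope 
check rather than an independent estimate:
\begin{eqnarray*}
E & = & \frac{16 \times 5\times 10^{-18} \; \textrm{J}}{30} 
\simeq 2.7 \times 10^{-18} \; \textrm{J}, \\
\sqrt{kK} & = & \sqrt{(1\times 10^{-20})(5\times 10^{-18})} \; \textrm{J} 
\simeq 2.2 \times 10^{-19} \; \textrm{J}.
\end{eqnarray*}
The two quantities differ by a numerical factor of about twelve, so the 
scaling $E \sim \sqrt{kK}$ should be read as an order-of-magnitude 
statement only.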
573:  
Concerning the size of the DNA helix deformation, we note that the parameter
$\mu$ provides an estimate of the amplitude of the base-pair opening:
$u_{max} = 4 \mu$ when $\mu <\pi/4$. For $\beta=0.0045$, as above,
577: the amplitude for a 30 bp breather is $2\pi/3$, corresponding to individual 
578: pendulum deformations of $60^{\circ}$. This parameter set does not support 
579: breathers of width less than $\beta^{-1/2}\simeq 15$ bp.
580: A variation of 20\% in the value of $K$ leads to maximum deformations of 
581: $52^{\circ}-65^{\circ}$: base pairs are bent but not fully opened. 
582: These moderate conformational changes need not be incompatible with an anticipated
583: absence of large deformations \cite{y1} accompanying nonspecific RNAP-DNA complexes.
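
The amplitude quoted above can be reproduced, approximately, from the 
relation $\textrm{cosec} \mu= L \sqrt{\beta}$. The following sketch assumes 
$\beta=0.0045$ and $L=30$ bp, and assumes that the opening $u_{max}$ is 
shared equally between the two bases of a pair:
\begin{eqnarray*}
\sin \mu & = & \frac{1}{L\sqrt{\beta}} = \frac{1}{30 \times 0.0671} 
\simeq 0.497, \qquad \mu \simeq 0.52 \; \textrm{rad} < \pi/4, \\
u_{max} & = & 4\mu \simeq 2.08 \; \textrm{rad} \simeq 119^{\circ} 
\approx 2\pi/3, \qquad u_{max}/2 \simeq 60^{\circ}, \\
L_{min} & = & \beta^{-1/2} = (0.0045)^{-1/2} \simeq 14.9 \; \textrm{bp}.
\end{eqnarray*}
The symbol $L_{min}$ is introduced here only to label the narrowest 
breather for which $\sin \mu \leq 1$ can be satisfied.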
584: 
The values for model parameters appearing in the literature are estimated 
from old experiments on DNA homopolymers (for example Refs. \cite{nonk}, 
\cite{yak2}), which is a difficult process. However, the main results of our 
588: paper stem from i) the {\em shape} of the potential (\ref{vpot}) and ii) the 
589: noise parameter, $\varepsilon$, defined by (\ref{theta}). For these two 
590: expressions changes in the parameter $\eta$ can be offset by ``tuning'' the 
591: value of $\mu$ which is a relatively free parameter. The only potentially 
592: serious sensitivity is that of $\varepsilon$ to large changes in $k$, the 
593: measure of dissociation energy for H-bonded base pairs. Fortunately, of the 
594: three parameters in (\ref{ifk}), this is the most reliable quantity to 
595: estimate.
596: 
597: \subsection{Helical model}
598: If the picture of sliding RNAP as a soliton-like deformation is subsequently 
599: shown to be incorrect, the correlations observed between potential minima and 
600: promoter sites still have to be explained. The soliton solutions of 
(\ref{ifk}) preferentially move to AT-rich regions. Inspection of (\ref{vpot})
shows that the variation due to sequence is not linear in AT content but is
a first ``moment'', where the contribution from each base is weighted by its 
604: position relative to the central site $X$:
605: \begin{eqnarray}
606: V_{var} (X) & \sim & \sum_{i} \beta_{i} w(z_{i}),  \label{vx} \\ 
607: w(z)&=&\frac{\cosh z}{(\tan^{2}\mu+\cosh^{2} z)^{3/2}}. \nonumber
608: \end{eqnarray}
Curiously, this weighting function coincides (exactly so in the limit 
$\tan^{2}\mu \rightarrow 0$) with the inverse radius of
curvature for a hyperbolic curve $f(z)=\cosh z$. Such a term arises naturally
in the Lorentz force experienced by a charged particle following a curved magnetic field line. Initially consider a particle of mass $m$, charge 
$q$, travelling along a uniform, straight magnetic field line. Its motion is 
determined by the Lorentz equation
614: \begin{eqnarray*}
615: \frac{d}{dt}\vec{v} = \frac{q}{m} \vec{v}\times \vec{B}.
616: \end{eqnarray*}
617: Assuming the field line lies along the $z$ axis, $\vec{B}=B \vec{e}_{z}$, the
618: velocity equation is split into parallel and perpendicular components
619: \begin{eqnarray*}
620: \frac{d}{dt}v_{||} & = & 0, \\
621: \frac{d}{dt}\vec{v}_{\perp} & = & \frac{qB}{m}\vec{v}_{\perp} \times \vec{e}_{z}
622: \end{eqnarray*}
623: The general solution to these equations is a helical trajectory, with 
624: time-dependent coordinates
625: \begin{eqnarray*}
626: x(t) & = & x_{0}+ \frac{|v_{\perp}|}{\omega} \sin (\omega t + \phi), \\
627: y(t) & = & y_{0}+ \frac{|v_{\perp}|}{\omega} \cos (\omega t + \phi), \\
628: z(t) & = & z_{0}+ v_{||} t,
629: \end{eqnarray*}
where $(x_{0},y_{0},z_{0})$ denotes the initial location of the guiding centre
of the particle and the gyro frequency $\omega=qB/m$ determines the helical frequency. 
632: This problem naturally resembles the electrostatic sliding of a protein 
633: ``particle'' along the grooves of the DNA helix. Here the role of gyro 
634: frequency is played by the twist of the helix, while the guiding centre of 
635: particle motion $(x_{0},y_{0},z(t))$ corresponds to the central helical axis
636: of the DNA.
637: 
638: Consider now the effect of introducing a curve into the helical axis: a
639: particle travelling along a curved field line experiences a centrifugal 
640: force upon its guiding centre. In a local coordinate system this is
641: \begin{eqnarray*}
642: \frac{mv^{2}_{||}}{|r_{c}(s)|}\frac{\vec{r_{c}(s)}}{|r_{c}(s)|}
643: \end{eqnarray*}
644: where $|r_{c}|$ and $s$ denote the radius of curvature and line element 
645: along the field line.
646: Similarly let us here write an analogous expression
647: \begin{equation}
\vec{F}_{c}=\frac{{\cal E}}{|r_{c}|^{2}}\vec{r}_{c} 
649: \end{equation}
650: where the quantity ${\cal E}$ has the dimensions of energy. In particular,
we assume that locally the bend can be approximated by $z(\xi)=\cosh \xi$.
Then, cf. (\ref{vx}),
653: \begin{equation}
654: |\vec{F}_{c}(\xi)|={\cal E}\frac{\cosh{\xi}}{(1+\sinh^{2} \xi)^{3/2}}.
655: \end{equation}
656: It follows that in the continuum limit the time-averaged breather potential 
657: could also be thought of as the work done by a ``centrifugal force'' on a
658: sliding RNAP as it navigates a bend in the helix.
659: Therefore the ``potential'' (\ref{vpot}) can conceivably be arrived at
660: via simple considerations of thermal stability (in a planar model) or bending 
661: deformations (in a helical model), two of the most commonly suggested
662:  mechanisms for enhancing promoter recognition.
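
The form of the ``centrifugal force'' can be checked with the standard 
expression for the curvature of a plane curve $z(\xi)$. In the sketch below 
the symbol $\kappa$ (the inverse radius of curvature) is introduced for 
illustration and primes denote derivatives with respect to $\xi$:
\begin{eqnarray*}
\kappa(\xi) & = & \frac{|z''(\xi)|}{(1+z'(\xi)^{2})^{3/2}} 
= \frac{\cosh \xi}{(1+\sinh^{2} \xi)^{3/2}} 
= \frac{\cosh \xi}{\cosh^{3} \xi} = \textrm{sech}^{2} \xi .
\end{eqnarray*}
On this reading the force magnitude is ${\cal E}\,\textrm{sech}^{2} \xi$, 
which is largest at the apex of the bend, $\xi=0$, and decays as 
$4e^{-2|\xi|}$ far from it.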
663: 
664: \subsection{Superhelicity}
665: A mechanism of localised DNA deformation with demonstrated biological 
significance \cite{ben3}, \cite{ben1}, \cite{ben2} is that of superhelical 
667: stress-induced DNA denaturation (SSID). Roles for SSID in gene
668: regulation have been proposed \cite{ben3} in regard to both open complex 
669: formation and transcription. In the former instance, promoter sites are 
670: easily destabilized by superhelical stress. In the latter, 
671: the action of local helix unwinding by transcribing RNAP results in waves of 
672: positive (negative) superhelicity propagating downstream (upstream) of the 
673: transcription complex. Computation of SSID profiles indicates \cite{ben3},
674: \cite{ben2} AT rich regions (down-) up-stream of the (3') 5' ends of 
675: transcription units are prone to localised over/under-winding  acting as a 
676: possible ``sink'' for propagating superhelicity and ensuring smooth transcription.
677: 
The breather potential (\ref{vpot}), which also picks out AT-rich regions,
shows that transcription units of at least $10^{3}$ bp in length are often 
680: demarcated by minima, in agreement with the above observations.
681: This is especially the case for the 3' ends of T7 genes 1 and 6, the last
682: genes in class I and II regions respectively. In these instances the AT 
richness may also confer extra rigidity, making these suitable pause sites 
in the stepwise internalisation of the phage genome, or, as mentioned above,
act as a kinetic trap used in inhibiting class I or II transcription.
686: 
687: \subsection{Correlations}
688: In reporting promoter-extrema correlations two points should be kept in mind.
689: Firstly, the assumed breather widths coincide with the sizes of the elongation
690: RNAP-DNA complexes. Therefore potential minima could be indicative of 
691: deformation associated with transcription, as appears to be the case for T7
692: phage promoters, shown in Figure 4. Regarding nonspecific complexes, the 
693: values $L_{B}=30$ and $L_{\phi}=24$ bp should be considered as upper bounds 
694: for an experimentally undetermined quantity.
695: The correlations reported in this study persist for the ranges 
696: $20\leq L_{B}\leq 30$ and $18\leq L_{\phi}\leq 24$. For sizes less than 18bp, 
697: the increasing roughness of Eq.(\ref{vpot}) causes difficulty in identifying 
698: correlations.
699: 
700: The second caveat is that only correlations between promoter initiation 
701: and the deepest local minimum have been considered. For some T7 promoters 
702: shallow upstream wells also exist. Moreover the effect of thermal noise has 
703: not been considered. Only with full dynamical simulations can connections
704: between the local topography of Eq.(\ref{vpot}) and facilitated target 
705: location be properly studied.
706: 
707: It is difficult to see how kink solutions of the planar model 
708: (\ref{ifk}), previously considered \cite{S1}-\cite{sanchez} might mimic 
709: physical profiles of base-pair opening. Kinks will also move preferentially to AT rich regions, presumably the reason why promoter sequences $A_{1}$ \cite{S1}, $A_{3}$ and $A_{0}$ \cite{S2} were concluded to be ``dynamically active''.
710: The unit-mass potential for kinks, initially at rest, moving in a 
slowly-varying background was derived by Salerno and Kivshar \cite{S3}. The sequence variation is contained in a term analogous to (\ref{vx}); however, the 
weighting function is 
713: \begin{eqnarray*}
714: W_{k}(z)=\textrm{sech}^2 z.
715: \end{eqnarray*}
716: This coincides with the breather function for small $\tan^{2} \mu$, 
717: illustrating why similar results for the major T7 promoter sequences
718: are obtained for both kink \cite{S1}-\cite{S3} and breather solitons.
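
The small-$\tan^{2}\mu$ statement can be made explicit by factoring 
$\cosh^{3} z$ out of the denominator of the breather weighting function 
$w(z)$ in (\ref{vx}). This is a short algebraic sketch, not a new result:
\begin{eqnarray*}
w(z) & = & \frac{\cosh z}{(\tan^{2}\mu+\cosh^{2} z)^{3/2}} 
= \textrm{sech}^{2} z \, (1+\tan^{2}\mu \; \textrm{sech}^{2} z)^{-3/2} \\
& = & W_{k}(z)\left[ 1-\frac{3}{2}\tan^{2}\mu \; \textrm{sech}^{2} z 
+ O(\tan^{4}\mu)\right].
\end{eqnarray*}
The correction factor tends to unity in the tails, where 
$\textrm{sech}^{2} z \rightarrow 0$, so the two weightings differ mainly 
near the centre of the excitation. As a rough guide, for $\mu=\pi/6.05$ one 
has $\tan^{2}\mu \simeq 0.33$ and $w(0) \simeq 0.65 \, W_{k}(0)$.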
719: 
720: \section{Conclusion}
721: In this paper we have re-examined Salerno's nonlinear DNA model, postulating 
722: a role for localised soliton excitations in approximating the sliding 
723: component of facilitated target location of RNA polymerase.
724: We found that such deformations would involve moderate bending of individual 
725: base pairs and that their energy of translocation is consistent with a picture 
726: of noisy, deterministic dynamics. Both of these observations are also 
727: consistent with current, limited knowledge of RNAP sliding and nonspecific 
728: complexes. A qualitative correspondence of these solitons and localised 
729: bending in a helical model was also demonstrated.
730: 
731: The dynamical picture of sliding which emerged also suggests that the 
732: random/deterministic nature of the motion is sequence-dependent, with 
733: translocation in relatively homogeneous regions being effectively random.
734: The corollary, that interplay between adjacent random and deterministic 
735: regions could constitute a search ``algorithm'', is speculative and, we 
736: believe, merits further investigation.
737: 
738: Our analysis of the T7 genome showed good correlations between AT-rich 
739: regions and the recognition sites of host-specific promoters used for 
740: early phage transcription. For phage-specific promoters, regions of 
741: maximal AT-richness correlated with the start of the coding sequence 
742: immediately downstream. As discussed above this may be connected with 
743: transcription and while there is no obvious correlation with recognition 
744: sites, a full description of facilitated target location needs to account
745: for the thermal background. This is a subject of current investigation.
746: 
747: We note that there has been suggestion \cite{mol4} that virion proteins 
748: injected into the host cell with the initial T7 fragment may i) inhibit the 
749: nonspecific binding of restriction enzymes and other proteins to DNA;  ii) 
750: have an affinity for {\em E. coli} RNAP, negating the requirement for direct 
751: promoter recognition {\em in vivo}. Similarly, inhibition of class I and II 
752: transcription is known to be performed by T7 gene products: kinase (gene 0.7) 
753: and lysozyme (gene 3.5) respectively.
754: 
755: However we see similar correlations for the UP and $\sigma^{70}$ sites of 
756: bacterial promoters in other members of the T7 viral supergroup, in 
757: addition to genomes of the unrelated phages T4 and T5 (see Figure 5).
758: This may be suggestive of a mechanism at work to enhance promoter 
759: recognition/inhibition in lytic phage genomes, although in the presence of 
760: functional proteins this mechanism can be relegated to an auxiliary role, 
761: such as in T7. \footnote{Figure 5 to go here}
762: 
763: It is important to investigate whether planar base-flipping/helical bending 
764: deformation patterns can be used to simulate protein-DNA interactions in 
765: DNA sequence analysis. The correlations reported here, to our knowledge for 
766: the first time, could have been made via other ``nonlinear'' analyses of AT 
767: content, had a motivation been apparent. 
Propagation of breathers in a nonlinear toy model of DNA provides a source 
of such motivation. It may be that herein lies the true value of a model 
770: with such a controversial history.
771: 
772: \vspace{0.5cm}
773: \noindent
774: \large{ {\bf Acknowledgements}}
775: 
776: \noindent
777: This research was funded by Australian Research Council grant DP0344996 and a 
778: visiting fellowship to the Centre for Nonlinear Physics, Australian National 
779: University, where part of this work took place. 
780:  The author thanks G. Yang for helpful remarks and is grateful to Yu. Kivshar
781: and I. Molineux for discussions and comments on earlier versions of the manuscript.
782: 
783: \begin{thebibliography}{99}
784: \bibitem{pey}  Peyrard, M. ``Nonlinear dynamics and statistical physics of DNA'', {\em Nonlinearity} {\bf 17} (2004), R1-R40.
785: \bibitem{Gaeta1} Gaeta, G. 
786: ``Results and limitations of the soliton theory of DNA transcription'', {\em J. Biol. Phys.} {\bf 24} (1999), 81-96.
787: \bibitem{PB1} Peyrard, M. and Bishop, A.R. 
788: ``Statistical mechanics of a nonlinear model for DNA denaturation'', {\em Phys. Rev. Lett.} {\bf 62} (1989), 2755-2758.
789: \bibitem{Yak} Yakushevich, L.V. ``Is DNA a nonlinear dynamical system where solitary conformational waves are possible?'', {\em J. Biosci.} {\bf 26} (2001), 305-313.
790: \bibitem{dinuc} Bruant, N., Flatters, D., Lavery, R. and Genest, D. 
791: ``From atomic to mesoscopic descriptions of the internal dynamics of DNA'', {\em Biophys. J.} {\bf 77} (1999), 2366-2376. 
792: \bibitem{S1} Salerno, M. ``Discrete model for DNA-promoter dynamics'', {\em Phys. Rev.} {\bf A44} (1991), 5292-5297.
793: \bibitem{S2} Salerno, M. ``Dynamical properties of DNA promoters'', {\em Phys. Lett.} {\bf A167} (1992), 49-53.
794: \bibitem{S3} Salerno, M. and Kivshar, Yu.S. ``DNA promoters and nonlinear dynamics'', {\em Phys. Lett.} {\bf A193} (1994), 263-266.
795: \bibitem{S3b} Salerno, M. ``Nonlinear dynamics of plasmid PBR322 promoter'',
796: chapter 10 in  M. Peyrard (ed.), {\em Nonlinear excitations in biomolecules}, Edition de Physique, Springer, New York (1995).
\bibitem{kl} Lennholm, E. and H\"{o}rnquist, M. ``Revisiting Salerno's sine-Gordon model of DNA: active regions and robustness'', {\em Physica} {\bf D177} (2003), 233-241.
798: \bibitem{sanchez} Cuenda, S., S\'{a}nchez, A. 
799: ``Disorder and fluctuations in nonlinear excitations in DNA'', {\em Fluct. Noise Lett.} {\bf 4} (2004), L491-L504. 
800: \bibitem{englander}
Englander, S.W. {\em et al.} ``Nature of the open state in long polynucleotide double helices: possibility of soliton excitations'', {\em Proc. Natl. Acad. Sci.} {\bf 77} (1980), 7222-7226.
\bibitem{riv} Gaeta, G., Reiss, C., Peyrard, M. and Dauxois, T. ``Simple models of nonlinear DNA dynamics'', {\em Riv. del. Nuov. Cim.} {\bf 17} (1994), 1-48.
803: \bibitem{edwards} Gabriel, C. {\em et al.} ``Microwave absorption in aqueous solutions of DNA'', {\em Nature} {\bf 328} (1987) 145-146. 
804: \bibitem{bigio} Bigio, I.J., Gosnell, T.R., Mukherjee, P. and Safer, J.D. 
805: ``Microwave absorption spectroscopy of DNA'', {\em Biopolymers} {\bf 33} (1993), 147-150. 
\bibitem{gueron} Gu\'{e}ron, M., Kochoyan, M. and Leroy, J.L. ``A single mode of DNA base-pair opening drives imino proton exchange'', {\em Nature} {\bf 328} (1987), 89-92.
\bibitem{Kam} Frank-Kamenetskii, M. ``Physicists retreat again'', {\em Nature} {\bf 328} (1987), 108.
808: \bibitem{ben3} Benham, C.J. ``Duplex destabilization in superhelical DNA is 
809: predicted to occur at specific transcriptional regulatory regions'', {\em J. Mol. Biol.} {\bf255} (1996), 425-434.
810: \bibitem{PB2} Choi, C.H. {\em et al.} ``DNA dynamically directs its own transcription initiation'', {\em Nucl. Acids. Res.} {\bf 32} (2004), 1584-1590.
811: \bibitem{bmc} Kanhere, A. and Bansal K. ``A novel method for prokaryotic promoter prediction based on DNA stability'', {\em BMC Bioinformatics} {\bf 6} (2005), 1-10.
812: \bibitem{trif} Bolshoy, A., McNamara, P., Harrington, R.E. and Trifonov, E. ``Curved DNA without A-A: experimental estimation of all 16 DNA wedge angles'', {\em Proc. Natl. Acad. Sci.} {\bf 88}, (1991) 2312-2316.
813: \bibitem{anselmi} Scipioni, A. {\em et al.}
814: ``Sequence-dependent DNA curvature and flexibility from scanning force microscopy images'', {\em Biophys. J.} {\bf 83} (2002), 2408-2418.
815: \bibitem{lankas} Lankas, F. ``DNA sequence-dependent deformability - insights from computer simulations'', {\em Biopolymers} {\bf 73} (2004), 327-339.
816: \bibitem{naplus} Ponomarev, S.Y., Thayer, K.M., Beveridge, D.L. ``Ion motions in molecular dynamics simulations on DNA'', {\em Proc. Natl. Acad. Sci.} {\bf 101} (2005), 14771-14775.
817: \bibitem{proz} Polozov, R.V. {\em et al.} ``Electrostatic potentials of DNA. Comparative analysis of promoter and nonpromoter sequences.'', {\em J. Biomol. Struct. Dyn.} {\bf 16} (1999), 1135-1143.
818: \bibitem{kbk} Braun, O.M. and Kivshar, Yu.S.: {\em The Frenkel-Kontorova Model: Concepts, Methods and Applications}, Springer, Berlin, 2004.
819: \bibitem{vonhip} von Hippel, P.H. and Berg, O.G. ``Facilitated target location in biological systems'', {\em J. Biol. Chem.} {\bf 264} (1989), 675-678.
\bibitem{lac1} Fickert, R. and M\"{u}ller-Hill, B. ``How lac repressor finds {\em lac} operator {\em in vivo}'', {\em J. Mol. Biol.} {\bf 226} (1992), 59-68.
\bibitem{p53} Jiao, Y., Cherny, D.I., Heim, G., Jovin, T.M. and Sch\"{a}ffer, T.E. ``Dynamic interactions of p53 with DNA in solution by time-lapse atomic force microscopy'', {\em J. Mol. Biol.} {\bf 314} (2001), 233-243.
822: \bibitem{wu} Park, C.S., Wu, F.Y.H. and Wu, C.S. ``Molecular mechanism of 
823: promoter selection in gene transcription'', {\em J. Biol. Chem.} {\bf 257} (1982), 6950-6956.
824: \bibitem{wu2} Singer, P.T. and Wu, C.S. ``Kinetics of promoter search by {\em Escherichia coli} RNA polymerase'' {\em J. Biol. Chem.} {\bf 263} (1988), 4208-4214.
825: \bibitem{smeekins} Smeekins, S.P. and Romano, L.J. ``Promoter and nonspecific DNA binding by the T7 RNA polymerase'', {\em Nucl. Acids. Res.} {\bf 14} (1986), 2811-2827.
826: \bibitem{kabata} Kabata, H. {\em et al.} ``Visualisation of single molecules of RNA polymerase sliding along DNA'', {\em Science} {\bf 262} (1993), 1561-1563.
827: \bibitem{bustamente} Guthold, M. {\em et al.} 
828: ``Direct observation of one-dimensional diffusion and transcription by {\em Escherichia coli} RNA polymerase'', {\em Biophys. J.} {\bf 77} (1999), 2284-2294.
829: \bibitem{ecorv} Jeltsch, A. and Pingoud, A. 
830: ``Kinetic characterisation of linear diffusion of the restriction endonuclease
831: {\em Eco}RV on DNA'', {\em Biochemistry} {\bf 97} (1998), 2160-2169.
832: \bibitem{meth} Nardone, G., George, J. and Chirikjian, J.G. ``Differences in the kinetic properties of BamH1 endonuclease and methylase with linear DNA substrates'', {\em J. Biol. Chem.} {\bf 261} (1986) 2128-2133.
833: \bibitem{berg} Berg, O.G., Winter, R.B. and von Hippel, P.H. 
834: ``Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and Theory'', {\em Biochemistry} {\bf 20} (1981), 6929-6948.
835: \bibitem{S4} Barbi, M., Place, C., Popkov, V. and Salerno, M. 
836: ``A model of sequence-dependent protein diffusion along DNA'', {\em J. Biol. Phys.} {\bf 30} (2004), 203-226.
837: \bibitem{satar} Satari\`{c}, M.V. and Tuszy\`{n}ski, J.A. ``Impact of regulatory proteins on the nonlinear dynamics of DNA'', {\em Phys. Rev.} {\bf E65} (2002), 1901-1911.
838: \bibitem{ting0} Ting, J.J-L. and Peyrard, M. ``Effective breather-trapping mechanism for DNA transcription'' {\em Phys. Rev.} {\bf E53} (1996), 1011-1018.
839: \bibitem{ting1} Ting, J.J-L. ``DNA transcription mechanism with a moving enzyme'', {\em Intl. J. Mod. Phys.} {\bf A7} (1997), 1125-1132.
840: \bibitem{endy} Endy, D., You, L., Yin J. and Molineux, I.J. ``Computation, predictions and experimental tests of fitness for bacteriophage T7 mutants with permuted genomes'', {\em Proc. Natl. Acad. Sci.} {\bf 97} (2000), 5375-5380. 
841: \bibitem{hes} Hesselbach, B.A. and Nakada, D. ```Host shut off' function of bacteriophage T7: involvement of T7 gene 2 and gene 0.7 in the inactivation of {\em Escherichia coli} RNA polymerase'',{\em J. Virol} {\bf 24} (1977), 736-745.
842: \bibitem{lys} Moffat, B.A. and Studier, F.W. ``T7 lysozyme inhibits transcription by T7 RNA polymerase'', {\em Cell} {\bf 49} (1987), 221-227.
843: \bibitem{inf1} Zavriev, S.K. and Shemyakin. M.F.
844: ``RNA polymerase-dependent mechanism for the stepwise T7 phage DNA transport from the virion into {\em E. coli}'', {\em Nucl. Acids. Res.} {\bf 10} (1982), 1635-1652.
845: \bibitem{inf2} Garcia, L.R., and Molineux, I.J. ``Rate of translocation of bacteriophage T7 DNA across the membranes of {\em Escherichia coli}'', {\em J. Bacteriol.} {\bf 177} (1995), 4066-4076.
846: \bibitem{zhang} Zhang, F. ``Breather scattering by impurities in the sine-Gordon model'', {\em Phys. Rev.} {\bf E58} (1998), 2558-2563.
847: \bibitem{IHF} Tsodikov, O.V., Holbrook, J.A., Shkel, I.A., and Record, M.T., Jnr. ``Analytic binding isotherms describing competitive interactions of a protein ligand with specific and nonspecific sites on the same DNA oligomer'',
848:  {\em Biophys. J.} {\bf 81} (2001), 1960-1969.
849: \bibitem{scale1} von Hippel, P.H. ``An integrated model of the transcription complex in elongation, termination and editing'', {\em Science} {\bf 281} (1998), 660-665.
850: \bibitem{scale2} Imburgio, D., Rong, K. Ma. and McAllister, W.T. ``Studies of promoter recognition and start site selection by T7 RNA polymerase using a comprehensive collection of promoter variants'', {\em Biochemistry} {\bf 39} (2000), 10419-10430.
851: \bibitem{sig70}  Mulligan, M.E., Hawley, D.K., Entriken, R. and McClure, W.R. 
852: ``{\em Escherichia coli} promoter sequences predict {\em in vitro} RNA polymerase selectivity'', {\em Nucleic. Acids. Res.} {\bf 12} (1984), 789-800.
853: \bibitem{UP} Estrem, S.T. {\em et al.} ``Bacterial promoter architecture: subsite structure of UP elements and interactions with the carboxy-terminal domain of the RNA polymerase $\alpha$ subunit'', {\em Genes Dev.} {\bf 13} (1999), 2134-2147.
\bibitem{A1} Sclavi, B. {\em et al.} ``Real-time characterisation of intermediates in the pathway to open complex formation by {\em Escherichia coli} RNA polymerase at the T7A1 promoter'', {\em Proc. Natl. Acad. Sci.} {\bf 102} (2005), 4706-4711.
\bibitem{GB} National Center for Biotechnology Information website. http://www.ncbi.nlm.nih.gov/Entrez
856: \bibitem{pTemp} Dausse, J.P., Sentenac, A. and Fromageot, P. 
857: ``Interaction of RNA polymerase from {\em Escherichia coli} with DNA. Effect of temperature and ionic strength on selection of T7 DNA early promoters.''
858: {\em Eur. J. Biochem} {\bf 65} (1976), 387-393.
859: \bibitem{bflip} Giudice, E., V\'{a}rnai, P. and Lavery, R. 
``Base-pair opening within B-DNA: free energy pathways for GC and AT pairs from umbrella sampling simulations'', {\em Nucl. Acids. Res.} {\bf 31} (2003), 1434-1443.
\bibitem{y1} Murakami, K.S., Masuda, S. and Darst, S.A. ``Structural basis of transcription initiation: RNA polymerase holoenzyme at 4 \AA\ resolution'', {\em Science} {\bf 296} (2002), 1280-1284.
\bibitem{kam2} Frank-Kamenetskii, M. ``Simplification of the empirical relationship between melting DNA, its GC content and concentration of sodium ions in solution'', {\em Biopolymers} {\bf 10} (1971), 2623-2624.
863: \bibitem{unif} SantaLucia, J. Jnr. 
864: ``A unified view of polymer, dumbbell and oligonucleotide DNA nearest-neighbour thermodynamics'', {\em Proc. Natl. Acad. Sci.} {\bf 95} (1998), 1460-1465.
865: \bibitem{bas2} Bashford, J.D. and Jarvis, P.D. ``A base-pairing model of duplex formation I: Watson-Crick pairing geometries'', {\em Biopolymers} {\bf 78} (2005), 287-297.
866: \bibitem{santa} SantaLucia, J. Jnr., Allawi, H.T. and Seneviratne, P.A. ``Improved nearest-neighbour parameters for predicting DNA duplex stability'', {\em Biochemistry} {\bf 35} (1996), 3555-3562.
867: \bibitem{nonk} Yakushevich, L.V. ``Scattering of neutrons and light by DNA solitons'', {\em Stud. Biophys.} {\bf 103} (1984), 171-178.
\bibitem{yak2} Yakushevich, L.V. ``The effects of damping, external fields and inhomogeneity on the nonlinear dynamics of biopolymers'', {\em Stud. Biophys.} {\bf 121} (1987), 201-207.
869: \bibitem{mol4} Molineux, I.J. ``No syringes please, ejection of phage T7 DNA from the virion is enzyme driven'', {\em Mol. Microbiol.} {\bf 40} (2001), 1-8.
870: \bibitem{ben1} Benham, C.J. ``Sites of predicted stress-induced DNA duplex destabilization occur preferentially at regulatory regions'', {\em Proc. Natl. Acad. Sci.} {\bf 90} (1993), 2999-3003.
871: \bibitem{ben2} Wang, H., Noordewier, M. and Benham, C.J. ``Stress-Induced DNA Duplex Destabilization (SIDD) in the {\em E. coli} genome: SIDD sites are closely associated with promoters'', {\em Genome Research} {\bf 14} (2004), 1575-1584.
872: 
873: \end{thebibliography}
874: 
875: \section{Figure Captions}
876: \noindent
877: {\bf Figure 1:}
878: a) Effective potential (\ref{vpot}) for breathers in the initial T7 virion 
879: fragment. Initial binding sites for bacterial promoters are denoted by
880: dots; b) Noise parameter $\varepsilon(X)$ for the same sequence.
881: c) Evolution over 1000 time-steps of the system (\ref{dsc}) with 
882: breathers initially placed at sites 460, 570 and 680.
883: \\
884: 
885: \noindent
886: {\bf Figure 2:}
887: Effective potential (\ref{vpot}) for 30bp wide breathers in the class I region of the T7 genome. Filled and unfilled dots denote respectively UP or -35 {\em E. coli} and +1 T7 promoter sites. \\
888: 
889: \noindent
890: {\bf Figure 3:}
891: Potential (\ref{vpot}) computed for the T7 initial fragment
892: for $\mu=\pi/6.05$.
893: a) $\eta=0.002$, $L=30$ bp; b) $\eta=0.0004$ ($L=67$ bp); 
894: Dots denote, from left to right, UP and -35 sites for $A_{1}-A_{3}$ 
895: bacterial promoters.\\
896: 
897: \noindent
898: {\bf Figure 4:}
899: a) Location of minima of (\ref{vpot}) nearest initiation sites of T7 phage 
900: promoters; b) Scatter plot of initiation-downstream transcription unit 
901: distance (TU) versus initiation minima distance (Min).\\
902: 
903: \noindent
904: {\bf Figure 5:}
905: Representative region of T5 genome potential, showing correlations between 
906: potential minima and -35 sites for {\em E. coli} promoters ($L=30$ bp).
907: 
908: \begin{figure}[tbp]
909:  \centering{
910: \resizebox{14cm}{8cm}{\includegraphics{brIr.eps}}
911: }
912: \caption{}
913: \protect \label{j1}
914: \end{figure}
915: 
916: \begin{figure}[htb]
917:  \centering{
918: \resizebox{11cm}{6cm}{\includegraphics{br3II.eps}}
919: }
920: \caption{}
921: \protect \label{j2}
922: \end{figure}
923: 
924: \begin{figure}[htb]
925:  \centering{
926: \resizebox{14cm}{4cm}{\includegraphics{br3III.eps}}
927: }
928: \caption{}
929: \protect \label{j3}
930: \end{figure}
931: 
932: \begin{figure}[htb]
933:  \centering{
934: \resizebox{14cm}{4cm}{\includegraphics{brIVr.eps}}
935: }
936: \caption{}
937: \protect \label{j4}
938: \end{figure}
939: 
940: \begin{figure}[htb]
941:  \centering{
942: \resizebox{12cm}{7cm}{\includegraphics{br3V.eps}}
943: }
944:  \caption{}
945: \protect \label{j5}
946: \end{figure}
947: 
948: \end{document}
949: 
950: 
951: