1:
2:
3: %%%%%%%%%%%%%%%%%%%%%%%% Ams-Style %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4: %%%
5: %%% Style and Inputs
6: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7:
8: \documentclass[10pt]{amsart}
9: \usepackage{amsfonts}
10: \usepackage{epsfig}
11: \usepackage{latexsym}
12: \usepackage{amsmath,amssymb,amsfonts,amsthm,graphics}
\usepackage{eucal}% calligraphic-euler fonts: \mathcal{ }
14: \usepackage{eufrak}% frak-euler fonts: \mathfrak{ }
15: \usepackage[all]{xypic}
16: \usepackage{xspace}
17: %\usepackage{layout}% displays settings; use \layout in text
18:
19: %%%
20: %%%
21: %%%%%%%%%%%%%%%%%%%%%%%%% Pagestyle %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
22: %%%
23: %%%
24:
25: \renewcommand{\baselinestretch}{1.2}% spacing between lines
26:
27: %\hoffset=0truecm
28: %\voffset=0truecm
29: \textwidth=15truecm
30: \textheight=18truecm
31: \baselineskip=0.8truecm
32: \overfullrule=0pt
33: \parskip=0.8\baselineskip
34: \parindent=0truecm
35: \topmargin=0.5truecm
36: \headsep=1.2truecm
37: %\oddsidemargin=0.5in % options for double-side printouts
38: %\evensidemargin=0in
39:
40: %%%
41: %%%
42: %%%%%%%%%%%%%%%%%%%% New Settings %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
43: %%%
44: %%%
45:
46: \theoremstyle{plain}
47: \newtheorem{theorem}{Theorem}
48: \newtheorem{corollary}{Corollary}
49: \newtheorem*{main}{Main~Theorem}
50: \newtheorem{lemma}{Lemma}
51: \newtheorem{proposition}{Proposition}
52:
53: \theoremstyle{definition}
54: \newtheorem{definition}{Definition}
55:
56: \theoremstyle{remark}
57: \newtheorem{remark}{Remark}
58: \newtheorem*{notation}{Notation}
59: \newtheorem{example}{Example}
60: \numberwithin{equation}{section}
61:
62:
63: \begin{document}
64:
65: %%%
66: %%%
67: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
68: %%
69: %%%
70: \title[Neutral Networks of Sequence to Shape Maps]
71: {Neutral Networks of Sequence to Shape Maps}
72: \author{Emma Y. Jin, Jing Qin and Christian M. Reidys$^{\,\star}$}
73: \address{Center for Combinatorics, LPMC-TJKLC \\
74: Nankai University \\
75: Tianjin 300071\\
76: P.R.~China\\
77: Phone: *86-22-2350-5133-8013\\
78: Fax: *86-22-2350-9272}
79: \email{reidys@nankai.edu.cn}
80: \thanks{}
81: \keywords{combinatory map, component, diameter, neutral network, shape,
82: bipartite}
83: \date{June, 2007}
84: \begin{abstract}
85: In this paper we present a combinatorial model of sequence to shape maps.
86: Our particular construction arises in the context of representing nucleotide
interactions beyond Watson-Crick base pairs and its key feature is to
replace steric by combinatorial constraints. We show that these combinatory
maps produce exponentially many shapes and induce sets of sequences which
contain extended connected subgraphs of diameter $n$, i.e.~we show that
exponentially many shapes have neutral networks.
92: \end{abstract}
93: \maketitle
94: {{\small
95: %\tableofcontents
96: }}
97: %%%
98: %%%
99: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
100: %%%
101: %%%
102:
103: \section{Introduction}
104:
105: %%%
106: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
107: %%%
108:
109: \subsection{Background}
Arguably one of the greatest challenges in present-day biophysics is the
understanding of sequence-structure relations of biopolymers. For one
particular class of biopolymers, the ribonucleic acids (RNA), the folding maps
into secondary structures (Fig.~\ref{F:1}) have been systematically
analyzed by Schuster~{\it et al.} \cite{Fontana:98,Schuster:94,Schuster:02}.
115: %%%
116: %%%%%%%%%%%%%%%%%%%%%%%% Figures ex1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
117: %%%
118: \begin{figure}[ht]
119: \centerline{%
120: \epsfig{file=F11.eps,width=0.8\textwidth}\hskip15pt }
121: \caption{\small RNA secondary structures. Diagram representation
122: (top): the primary sequence, {\bf GAGAGCCUUUGGACCUCA}, is drawn
123: horizontally and its backbone bonds are ignored. All bonds are drawn
124: in the upper half plane and secondary structures have the property
125: that no two arcs intersect and all arcs have minimum length $2$.
126: Outer planar graph representation (bottom). } \label{F:1}
127: \end{figure}
128: %%%
129: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
130: %%%
Folding maps play a central role in understanding the evolution of molecular
sequences. Specific properties such as {\it shape space covering}
\cite{Schuster:95B} and {\it neutral networks} (Fig.~\ref{F:2})
\cite{Reidys:97a} are critical
for what may be paraphrased as ``molecular computation by white noise''.
For instance, neutral networks played a central role in the {\it Science}
publication by E.~Schultes and D.~Bartel, {\it One sequence, two
ribozymes: implications for the emergence of new ribozyme folds} (v289, n5478,
448-452), where the authors experimentally designed a single RNA sequence (whose
existence is implied by the intersection theorem in \cite{Reidys:97a}) that
folds into two different, unrelated RNA secondary structures
\cite{Clote:05}.
143: Exhaustive enumeration of sequence spaces and subsequent detailed analysis
144: of the mappings for {\bf G},{\bf C}-sequences of length $30$ were undertaken
in \cite{Gruener:95a,Gruener:95b}. In addition, a detailed analysis of
neutral networks as well as an exhaustive enumeration of
{\bf G},{\bf C},{\bf A},{\bf U}-sequences can be found in \cite{Goebel:04}.
The findings were intriguing. Folding maps into RNA secondary structures
exhibit a collection of distinct properties which make them ideally suited
for evolutionary optimization.\\
{\sf (a)} Many structures have preimages of sequences (neutral networks)
which have large components and large diameter.\\
{\sf (b)} Many structures have the property that any two of them have neutral
networks that come close in sequence space.\\
Obviously, {\sf (a)} is of central importance in the context of
neutral evolution. Since replication is erroneous and only a few, if
not single, nucleotides can be exchanged at a time, the preimages of structures
must contain large connected components.
159: %%%%%%%%%%%%%
160: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
161: %%%%%%%%%%%%%%
162: \begin{figure}[ht]
163: \centerline{%
164: \epsfig{file=f1.eps,width=0.95\textwidth}\hskip15pt } \caption{\small
165: The neutral network of a structure. Sequence space (right) and shape
166: space (left) represented as lattices.
We draw the edges between two sequences bold if both map into the
particular structure on the left. The two key properties of neutral nets
are their connectivity and percolation. They allow sequences to move
through sequence space while maintaining their shape.} \label{F:2}
171: \end{figure}
172: %%%%%%%%%%%
173: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
174: %%%%%%%%%%%
Property {\sf (b)} shows that (many) new structures can easily be found
during a random walk on a neutral network using only steps in which
a single nucleotide is altered (point mutations).
178:
Folding maps, however, are not obtained analytically. They are the result of a
computer algorithm based on the combinatorial analysis of RNA
secondary structures pioneered by Waterman~{\it et al.}
\cite{Waterman:94,Waterman:78a,Waterman:79}. It has to be remarked
in this context that comparative sequence analysis \cite{Woese:93,Puglisi:99}
provides a more reliable means of determining the secondary structure
of biological RNA \cite{Doudna:99}, i.e.~folding
maps already represent an abstraction.
In order to step beyond the secondary structure paradigm there are two main
approaches with distinct goals: (1) to study more advanced nucleotide
interactions in RNA, for instance pseudoknots or base triples, or (2) to
consider genuine abstractions of molecular structures that do not aim to model
a biophysical folding map. In \cite{Reidys:07rna1} we
pursue the first by developing the combinatorics of RNA structures
with pseudoknots, and in this contribution the second by studying
combinatory maps. While (1) eventually produces the mathematical
framework enabling us to derive more advanced representations (which
eventually result in folding algorithms capable of producing
structures like phenylalanine tRNA), (2) provides insights into the
core question of which principles produce sequence to structure maps
suitable for evolution. A type (2) abstraction inevitably evokes
skepticism: what can possibly be gained if no attempt is made
to mimic the biological reality? However, we argue that sometimes this
is exactly the right strategy for fundamentally understanding the object under
investigation.
204:
205: \subsection{Structures and correlations}
A well studied class of maps over sequence spaces are the
NK-landscapes introduced by Kauffman \cite{Kauffman:93}, where each
index (locus) of a binary $n$-tuple, viewed as a genotype composed
of $n$ loci, is randomly linked to ${\rm K}$ other indices. The idea is that
a locus $i$ makes a contribution to the total fitness of the
genotype which depends on the value of the allele ($0$ or $1$) at
$i$ and the values at each of the epistatically linked loci. To each
of those $2^{{\rm K}+1}$ combinations a value (fitness) is
assigned uniformly at random. The apparent lack of neutrality led
Barnett \cite{Barnett:98} to refine NK-landscapes to NKp-landscapes,
introducing a probability $p$ with which an arbitrarily chosen
allelic combination makes no contribution to the fitness. Our
approach is connected to Kauffman's intuition in that we consider a
molecular structure as a combinatorial representation of
nucleotide correlations. From this correlation point of view,
observations {\sf (a)} and {\sf (b)} are not bound to the particular
concept of RNA secondary structures. For instance, Stadler {\it et al.}
\cite{SchusterStadler:99}
as well as Bastolla {\it et al.} \cite{Porto:03} have shown that neutral
networks exist for proteins, where the interactions are much more
involved \cite{Reidys:00p1}.
Therefore it is certainly not the uniqueness of Watson-Crick base pairings
that implies the existence of neutral networks. Our particular approach
comes from this correlation perspective and from observations of molecular
interactions in RNA molecules.
231: %%%
232: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
233: %%%
234: \begin{figure}[ht]
235: \centerline{%
236: \epsfig{file=f2.eps,width=0.8\textwidth}\hskip15pt } \caption{\small
Beyond secondary structures. Suppose we are given an abstract alphabet
$\{{\bf A},{\bf B},{\bf C},{\bf D}\}$ with base pairs
$\{\{{\bf A},{\bf B}\},\{{\bf D},{\bf C}\},\{{\bf D},{\bf B}\}\}$.
We present diagram representations of a secondary structure (top), a
$3$-noncrossing structure (middle) and a $2$-diagram structure (bottom).
The difference between the first two structures is the crossing of bonds
and the difference between the last two is the number of interactions
per nucleotide.}
245: \label{F:3}
246: \end{figure}
247: %%%
248: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
249: %%%
250: First there are secondary and tertiary interactions \cite{Doudna:99},
251: the latter typically involving secondary structural elements.
Furthermore, interactions within RNA molecules can be categorized into
three classes: helix-helix, loop/bulge-helix and
loop-loop interactions \cite{Westhof:92a,Doudna:99}. The structures of
phenylalanine tRNA and the hammerhead ribozyme \cite{Wedekind:98}
have served as paradigms in this context. Base
257: triples and tetra-loops, as well as pseudoknots,
258: \cite{Westhof:92a,Konings:95a,Chamorro:91a,Science:05a}
259: representing loop-loop interactions have led to generalizations of
260: the secondary structure concept. These interactions are subject to
261: steric constraints arising from the biochemistry of the interactions
262: involved.
These observations give rise to two different combinatorial abstractions:
the consideration of $k$-noncrossing chemical bonds and of
$2$-diagrams, i.e.~graphs whose vertices are drawn on a horizontal
line and have degree at most two (and the combination of the two,
$k$-noncrossing $2$-diagrams). The notion of $k$-noncrossing arises
naturally in the context of pseudoknots, leading to the concepts of
$k$-noncrossing RNA structures \cite{Reidys:07rna1} and to Stadler's
bi-secondary structures \cite{Stadler:99} (which are exactly the
planar $3$-noncrossing RNA structures). The notion of $2$-diagrams
comes up when restricting the number of interactions per nucleotide to at
most two, thereby allowing the expression of interactions between secondary
structure elements.
275:
276: %%%
277: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
278: %%%
279: \section{The Basic Construction}
280: %%%
281: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
282: %%%
The notion of $2$-diagrams discussed in the introduction is exactly the
motivation for our
particular approach. In the following we detail how to derive molecular
shapes in which each nucleotide has at most two interactions but which, in
contrast to biophysical structures, have combinatorial constraints
on their nucleotide interactions.
To the best of our knowledge this idea is new.
290: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
For a given alphabet, base pairing rules specify which nucleotides
can pair. However, not any two such nucleotides are able to establish a
bond. For instance, they may be restricted by conditions such as that no two
edges cross each other when a shape is represented as a diagram
\cite{Stadler:99}. The non-crossing condition and uniqueness of base
296: pairs are two key properties of RNA secondary structures and allow
297: for Motzkin-path enumeration and tree bijections
298: \cite{Waterman:94,Waterman:79,Zuker:79,Waterman:78a,Schuster:98}. We
299: replace these restrictions on nucleotide interactions by stipulating
300: that {\sf (a)} there exists some base graph $H$ whose sole purpose
301: is to restrict all possible correlations and {\sf (b)} we are given
302: a symmetric relation $\mathcal{R}$, tantamount to a base pairing
303: rule. In order to avoid any confusion we work over the abstract
304: alphabet $\{{\bf A},{\bf B},{\bf C},{\bf D}\}$.
305:
306: %%%
307: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
308: %%%
309: \begin{figure}[ht]
310: \centerline{%
311: \epsfig{file=f3.eps,width=1.0\textwidth}\hskip20pt
312: }
\caption{\small Combinatory maps: the base graph $\mathcal{H}$ is displayed
on the l.h.s. The r.h.s.~shows two shapes $\mathcal{S}_1$ and
$\mathcal{S}_2$ with two particular sequences that are contained in
their respective preimages. For both sequences the shapes are
maximal, i.e.~not a single further $\mathcal{H}$-edge can be added without
violating the base pairing rules, here $\{\{{\bf A},{\bf B}\}$, $\{{\bf D},
{\bf C}\}$, $\{{\bf D},{\bf B}\}\}$.}
320: \label{fig:base}
321: \end{figure}
322: %%%
323: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
324: %%%
325: In this framework a shape $\mathcal{S}$ of a sequence is then the unique
326: maximal $H$-subgraph subject to the property that for any
327: $\mathcal{S}$-edge the incident nucleotides satisfy $\mathcal{R}$.
It is remarkable that this simple definition already produces a well
defined sequence to structure map. Moreover, this definition is in line
with the biological point of view: sequences are mapped into shapes,
rather than fixing some shape and then considering its sequences.
One can now ask what the right choice of $H$ should be and how robust
the respective conclusions are. As for the dependency on $H$, the answer is that
the results a.s.~(almost surely in the sense of random graph theory, i.e.~in the
limit of long sequences) depend only on the number of edges.
Therefore, the choice of $H=\mathcal{H}$ is not critical for the validity of
the main results. To understand why, we consider a generalization of the
concept of combinatory maps, i.e.~combinatory maps induced by the random graph
$G_{n,p}$ (the random graph in which each edge is selected independently with
probability $p$). In the sub-critical phase these random combinatory maps
a.s.~produce, modulo constants, all properties of the maps induced by
$\mathcal{H}$, as stated in the following theorem.
343:
344: %%%
345: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
346: %%%
347: {\bf Theorem}\cite{Reidys:07priv} {\bf (Neutral networks)} {\it
348: Let $p_n=\frac{1-\epsilon}{n}$, $\beta<\sqrt{2}$ and
349: suppose $\omega_n$ tends to $\infty$
350: arbitrarily slowly and $\vartheta_{G_{n,p}}$ is a random combinatory map.
351: Then there exist with high probability at least $\beta^n$ shapes $\mathcal{S}$
352: with the following two properties:\\
{\sf (I)} the set of all sequences mapping into $\mathcal{S}$ has a connected
component of size at least $\left(\sqrt{2}\right)^{n}$;\\
{\sf (II)} the set of all sequences mapping into $\mathcal{S}$ percolates, i.e.~it
has diameter $n-\omega_n$.\\
357: }
358: %%%
359: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
360: %%%
361:
362: The great advantage of choosing $H=\mathcal{H}$ is the simplicity and
363: algorithmic nature of all proofs. We can explicitly construct all paths
364: involved by diagram chasing. In contrast, the proof of the above
result is based on a nontrivial analysis of tree components in the
random graph $G_{n,p}$.
We have
the following situation: let $H$ be a graph over $\{1,2,\ldots,n\}$,
$\mathcal{A}= \{{\bf A},{\bf B},{\bf D}, {\bf C}\}$ and let $Q_4^n$ be
the generalized $n$-cube, i.e.~the graph over the sequences
$(x_1,\dots,x_n)$, where $x_i\in\mathcal{A}$, in which two
sequences are adjacent if they differ in exactly one nucleotide. Let
$d(v,v')$ be the number of positions at which $v$ and $v'$ differ.
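As a small illustration (the length $n=2$ and the sequences are chosen here
purely for concreteness): $Q_4^2$ has $4^2=16$ vertices and every vertex has
$3\cdot 2=6$ neighbors, since each of the two positions can be changed into any
of the three other letters. In general $Q_4^n$ has $4^n$ vertices, each of
degree $3n$. For instance,
\begin{equation*}
d\bigl(({\bf A},{\bf B}),({\bf A},{\bf D})\bigr)=1, \qquad
d\bigl(({\bf A},{\bf B}),({\bf D},{\bf C})\bigr)=2,
\end{equation*}
so the first pair of sequences is adjacent in $Q_4^2$ while the second is not.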
374: A component of a graph $H$ is a maximal connected subgraph. We
375: consider relations $\mathcal{R}$ over the abstract alphabet
376: $\mathcal{A}=\{{\bf A},{\bf B},{\bf D}, {\bf C}\}$,
377: i.e.~$\mathcal{R}\subset \mathcal{A}\times \mathcal{A}$ satisfying
the following three conditions:
379: \begin{eqnarray}
380: \label{E:1} (x,y)\in \mathcal{R} & \Leftrightarrow & (y,x)\in \mathcal{R} \\
381: \label{E:2} (x,y)\in \mathcal{R} & \Rightarrow &\ x\neq y \\
382: \label{E:3} \forall\, x\neq z\quad (x,y)\in
383: \mathcal{R}\,\wedge\,(y,z)\in \mathcal{R} & \Rightarrow &
384: (x,z)\not\in \mathcal{R} \ .
385: \end{eqnarray}
These conditions are motivated by abstracting from the $2$-D and $3$-D
interactions of the phenylalanine tRNA and the hammerhead ribozyme
\cite{Doudna:99}. In both molecules mutual interactions of three nucleotides
are absent but multiple pair interactions are responsible for the tertiary
structure. In view of eq.~(\ref{E:1}) and eq.~(\ref{E:2}) each relation can be
viewed as a graph over $\{{\bf A},{\bf B},{\bf D},{\bf C}\}$. Eq.~(\ref{E:3})
states that this graph contains no triangle; since a graph over four vertices
has no odd cycle other than a triangle, eq.~(\ref{E:3}) is equivalent to this
graph being
bipartite\footnote{For instance, it is easy to check that the relation
implied by all Watson-Crick base pairs (i.e.~\{({\bf A},{\bf U}),({\bf
U},{\bf A}),({\bf G},{\bf C}),({\bf C},{\bf G})\}) together with \{({\bf G},{\bf
U}),({\bf U},{\bf G})\} satisfies
conditions eq.~(\ref{E:1}), eq.~(\ref{E:2}) and eq.~(\ref{E:3}).}.
398: We will be particularly interested in the base pairing rule
399: $\mathcal{R}^\dagger$ represented as the graph $\diagram {\bf A}\rline
400: &{\bf B} \rline & {\bf D}\rline & {\bf C}
401: \enddiagram$ i.e.~we allow for the following interactions:
402: $\{\{{\bf A},{\bf B}\},\{{\bf D},{\bf C}\},\{{\bf D},{\bf B}\}\}$.
403: In this sense our nucleotide interactions are more general than those
404: of RNA secondary structures since, for instance, we can express coaxial
405: stacking of helical regions and the formation of isosteric
406: ${\bf C}\cdot {\bf G}-{\bf G}$ triples \cite{Doudna:99}.
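The three conditions can be checked directly for $\mathcal{R}^\dagger$; the
following verification is added here for illustration only. Read as a set of
ordered pairs,
\begin{equation*}
\mathcal{R}^\dagger=\{({\bf A},{\bf B}),({\bf B},{\bf A}),({\bf B},{\bf D}),
({\bf D},{\bf B}),({\bf D},{\bf C}),({\bf C},{\bf D})\},
\end{equation*}
which is symmetric (eq.~(\ref{E:1})) and contains no pair $(x,x)$
(eq.~(\ref{E:2})). Moreover every pair joins the two classes of the partition
$\{{\bf A},{\bf D}\}\,\dot\cup\,\{{\bf B},{\bf C}\}$, so the corresponding
graph is bipartite and eq.~(\ref{E:3}) holds; for example
$({\bf A},{\bf B}),({\bf B},{\bf D})\in\mathcal{R}^\dagger$ but
$({\bf A},{\bf D})\not\in\mathcal{R}^\dagger$.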
We introduce, for a sequence $v=(x_1,\dots,x_n)$, the $H$-subgraph
$H_{\mathcal{R}}(v)$ having vertex and edge set
given by
409: \begin{equation}
410: V_{H_{\mathcal{R}}(v)}=\{1,\dots,n\}, \quad \text{\rm and}\quad
411: E_{H_{\mathcal{R}}(v)}=\{\{i,k\} \mid \{i,k\}\,\text{\rm is an
412: $H$-edge and}\,
413: (x_i,x_k)\in \mathcal{R}\}
414: \end{equation}
415: and call $H_{\mathcal{R}}(v)$ a shape $\mathcal{S}$ and the mapping
416: $
417: \vartheta_{H}:Q_4^n \longrightarrow \{\mathcal{S}\mid
418: \mathcal{S}=H_{\mathcal{R}}(v)\}
419: $
a combinatory map. Note that the above construction entails an implicit
notion of maximality, i.e.~the shape of a sequence $(x_1,\dots,x_n)$ is the
maximal $H$-subgraph in which $\mathcal{R}$ is satisfied by all
pairs of coordinates $(x_i,x_j)$ with $\{i,j\}$ being an edge of the subgraph.
In this sense a shape represents a saturated structure.
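As an illustration of this definition (the base graph and the two sequences
below are chosen by us for concreteness), let $n=4$, let $H$ be the cycle with
edges $\{1,2\},\{2,3\},\{3,4\},\{4,1\}$ and take
$\mathcal{R}=\mathcal{R}^\dagger$. For $v=({\bf A},{\bf B},{\bf D},{\bf C})$
the pairs $({\bf A},{\bf B})$, $({\bf B},{\bf D})$ and $({\bf D},{\bf C})$ are
in $\mathcal{R}^\dagger$ while $({\bf C},{\bf A})$ is not, whence
\begin{equation*}
E_{H_{\mathcal{R}^\dagger}(v)}=\{\{1,2\},\{2,3\},\{3,4\}\} ,
\end{equation*}
i.e.~the shape of $v$ is a path. For
$v'=({\bf A},{\bf A},{\bf B},{\bf D})$ only $(x_2,x_3)=({\bf A},{\bf B})$ and
$(x_3,x_4)=({\bf B},{\bf D})$ are in $\mathcal{R}^\dagger$, so that
\begin{equation*}
E_{H_{\mathcal{R}^\dagger}(v')}=\{\{2,3\},\{3,4\}\}
\end{equation*}
and the vertex $1$ is isolated. In both cases no further $H$-edge can be added
without violating $\mathcal{R}^\dagger$.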
As for $\mathcal{H}$, suppose first that $n$ is even. We set
$C_{n}(1)$ to be the graph over $\{1,\dots,n\}$ with edges
$\{i,i+1\}$, where the vertices are labeled modulo $n$.
Let $\sigma_n$ be some permutation of $n$ letters; we then define
$C_{n}(\sigma_{n})$ to be the graph with edges $\{\sigma_n(i),\sigma_n(i+1)\}$
and set $\mathcal{H}=C_{n}(\sigma_n)$. Next assume $n$ is odd. Then we
select an arbitrary element of $\{1,\dots,n\}$, say $u$, and define
$\mathcal{H}=C_{n-1}(\sigma_{n-1})\cup \{u\}$, i.e.~the graph in which $u$
is an isolated vertex and the remaining vertices carry the cycle
with edges $\{\sigma_{n-1}(i),\sigma_{n-1}(i+1)\}$,
where $\sigma_{n-1}$ is an arbitrary permutation of
$\{1,\dots,n\}\setminus \{u\}$. To summarize we have
436: \begin{equation}\label{E:H}
437: \mathcal{H}=
438: \begin{cases}
439: C_n(\sigma_n) & \ \text{\rm for $n$ even}\\
440: C_{n-1}(\sigma_{n-1})\cup \{u\} &\ \text{\rm for $n$ odd} \ .
441: \end{cases}
442: \end{equation}
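For concreteness we give two small instances of eq.~(\ref{E:H}); the
permutations and the choice of $u$ are arbitrary and made here for illustration
only. For $n=6$ and $\sigma_6$ given by the list of values
$(\sigma_6(1),\dots,\sigma_6(6))=(1,3,5,2,4,6)$ we obtain
\begin{equation*}
\mathcal{H}=C_6(\sigma_6)\quad\text{\rm with edges}\quad
\{1,3\},\{3,5\},\{5,2\},\{2,4\},\{4,6\},\{6,1\} ,
\end{equation*}
and for $n=5$, $u=5$ and $\sigma_4$ the identity on $\{1,2,3,4\}$
\begin{equation*}
\mathcal{H}=C_4(\sigma_4)\cup\{5\}\quad\text{\rm with edges}\quad
\{1,2\},\{2,3\},\{3,4\},\{4,1\} ,
\end{equation*}
where the vertex $5$ is isolated. In both cases every vertex has degree at most
two.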
443: %%%
444: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
445: %%%
446:
447:
448: \section{Shapes}
449:
450: %%%
451: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
452: %%%
453: In this section we answer the following basic questions: \\
454: {\sf (1)} What is the relation between base pairing rules and the
455: resulting molecular shapes? \\
456: {\sf (2)} How many shapes does a combinatory map have?\\
457: {\sf (3)} Are there ``many'' shapes with large sets of sequences folding
458: into them?\\
All of the above questions are central for RNA secondary structures and
none of them can be answered analytically, despite the fact that we have
generating functions for RNA secondary structures.
462: For instance, it is impossible to
463: assess {\it a priori} how many secondary structures have an actual sequence
464: folding into them. The number of RNA structures that actually occur as minimum
465: free energy structures can be much smaller than the total number. For $n=16$,
466: due to finite size effects for the RNA folding, only $63\%$ of the possible
467: RNA structures are realized as minimum free energy structures
468: \cite{Goebel:04}.
469:
Let us begin by providing some more background: a graph $H'$ is
called an induced subgraph of $H$ iff there exists some set
$M\subset \{1,\dots,n\}$ such that $E_{H'}=\{\{i,j\}\mid \{i,j\}\in
E_H\,\wedge i,j\in M\}$. Intuitively, induced subgraphs come from
vertex sets and are far more restricted than arbitrary subgraphs. We
now give a simple example of the fact that not every bipartite
subgraph of a shape is a shape. For instance, consider
$\vartheta_H:Q_4^6\longrightarrow \{H'<H\}$ where
478: \begin{equation}
479: H=\diagram
480: {\bf 1} \ar@{-}[r] \ar@{-}[d]
481: & \ar@{-}[r] {\bf 4} & {\bf 5} \\
482: {\bf 2} \ar@{-}[r]
483: & \ar@{-}[u] {\bf 3} \ar@{-}[r] & {\bf 6}\ar@{-}[u] \\
484: \enddiagram
485: \quad\text{\rm and}\quad
486: H_0=\diagram
487: {\bf 1} \ar@{.}[r] \ar@{-}[d]
488: & \ar@{-}[r] {\bf 4} & {\bf 5} \\
489: {\bf 2} \ar@{-}[r]
490: & \ar@{-}[u] {\bf 3} \ar@{.}[r] & {\bf 6}\ar@{-}[u] \\
491: \enddiagram
492: \end{equation}
where the dotted lines represent missing edges. Clearly, $H$ is bipartite
and it is easy to check that indeed
$H=H_{\mathcal{R}^\dagger}({\bf D},{\bf C},{\bf D},{\bf C},
{\bf D},{\bf C})$ holds. Therefore $H$ is a shape, but $H_0$ is not:
every sequence realizing $H_0$ necessarily has either
{\bf A} at {\bf 1} and {\bf C} at {\bf 4}, or vice versa. In the first case
{\bf D} is necessarily at {\bf 3} and {\bf 5}, which leaves no valid choice
for {\bf 6}. The second case follows analogously.
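For completeness, the second case can be spelled out as follows (this is our
own elaboration of the argument, using the base pairing rule
$\mathcal{R}^\dagger$). The edges of $H_0$ form the path
${\bf 1}-{\bf 2}-{\bf 3}-{\bf 4}-{\bf 5}-{\bf 6}$. If {\bf C} is at {\bf 1} and
{\bf A} at {\bf 4}, then the edges $\{1,2\},\{2,3\},\{3,4\}$ force {\bf D} at
{\bf 2} and {\bf B} at {\bf 3}, and the edge $\{4,5\}$ forces {\bf B} at
{\bf 5}. The edge $\{5,6\}$ then requires {\bf A} or {\bf D} at {\bf 6}, i.e.
\begin{equation*}
(x_1,\dots,x_5)=({\bf C},{\bf D},{\bf B},{\bf A},{\bf B}),\qquad
x_6\in\{{\bf A},{\bf D}\},\qquad
(x_3,x_6)\in\{({\bf B},{\bf A}),({\bf B},{\bf D})\}\subset\mathcal{R}^\dagger ,
\end{equation*}
so that the $H$-edge $\{3,6\}$, which is missing in $H_0$, would be present
in the shape.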
500:
This is remarkable insofar as making the universal graph $H$
(which is responsible for all interactions) more complex can
imply that not all of its subgraphs can be folded by sequences. This
is due, as the example indicates, to the nature of the base pairing
rule and shows clearly that both $H$ and $\mathcal{R}$ determine what
is a shape and what is not.
For simple base graphs, like for instance $\mathcal{H}$
(eq.~(\ref{E:H})), the lemma below
shows that {\it any} subgraph is a shape.
What we can deduce from this is that (a) there exist many shapes and (b)
$\mathcal{H}$ is so simple that it is indeed only $\mathcal{R}^\dagger$
that is relevant for the shapes.
The result is the following.
513:
514: %%%
515: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
516: %%%
517: \begin{lemma}\label{L:bip}
518: Suppose $H$ is an arbitrary combinatorial graph over
519: $\{1,\dots,n\}$. \\
520: {\sf (a)} For any relation $\mathcal{R}$ any shape $\mathcal{S}$ is
521: bipartite.\\
522: {\sf (b)} For the relation $\mathcal{R}^\dagger$ and arbitrary base
523: graph $H$, any induced,
524: bipartite subgraph of $H$ is a shape.\\
525: {\sf (c)} For the relation $\mathcal{R}^\dagger$ and the base graph
526: $\mathcal{H}$
527: any $\mathcal{H}$-subgraph $H'$ is a shape.
528: \end{lemma}
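Part {\sf (c)} can be verified by hand in the smallest case. The following
check for $n=4$ and $\mathcal{H}=C_4(1)$ is our own illustration and not part
of the proof. Up to rotation of the cycle the $2^4=16$ subgraphs of
$\mathcal{H}$ fall into six classes, each of which is realized by a sequence:
\begin{equation*}
\begin{array}{ll}
\text{\rm edges of $H'$} & \text{\rm sequence $v$ with
$H_{\mathcal{R}^\dagger}(v)=H'$} \\ \hline
\varnothing & ({\bf A},{\bf A},{\bf A},{\bf A}) \\
\{1,2\} & ({\bf A},{\bf B},{\bf C},{\bf C}) \\
\{1,2\},\{2,3\} & ({\bf A},{\bf B},{\bf A},{\bf A}) \\
\{1,2\},\{3,4\} & ({\bf A},{\bf B},{\bf C},{\bf D}) \\
\{1,2\},\{2,3\},\{3,4\} & ({\bf C},{\bf D},{\bf B},{\bf A}) \\
\{1,2\},\{2,3\},\{3,4\},\{4,1\} & ({\bf D},{\bf C},{\bf D},{\bf C})
\end{array}
\end{equation*}
Since a cyclic shift of a sequence rotates its shape accordingly, all $16$
subgraphs are shapes.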
529: %%%
530: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
531: %%%
532:
Since any $\mathcal{H}$-subgraph is a shape we have, for instance, for sequences
of length $16$ exactly $2^{16}=65536$ different shapes, in contrast to only
$274$ RNA secondary structures realized by the minimum free energy folding
analyzed in \cite{Goebel:04}. This seems to indicate a vast difference between
combinatory maps and RNA secondary structure folding. However, closer
inspection reveals that in fact most of these shapes are very ``rare'',
i.e.~only a few have large preimage sizes.
540: To understand what is happening we present in Figure~\ref{fig:pre} the data
541: on the complete mapping from sequences of length $16$ into subgraphs of the
542: cycle $\mathcal{H}_{16}$.
543: We plot the logarithm of the preimage sizes of a combinatory map over the
544: logarithm of the rank. We can deduce from Figure~\ref{fig:pre} that there
545: are $393$ shapes with a preimage of size greater than $0.5\times 10^6$. The
546: data on RNA secondary structures in \cite{Goebel:04} show that there are
547: $132$ RNA minimum free energy structures with this property.
548: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
549: \begin{figure}[ht]
550: \centerline{%
551: \epsfig{file=fig4_cm.eps,width=0.75\textwidth}\hskip20pt
552: }
553: \caption{\small A double logarithmic plot (base $10$) of the preimage
554: sizes of a combinatory map for $n=16$ as a function of the rank. The
555: underlying graph $\mathcal{H}_{16}$ is displayed in the lower right. The plot
556: shows that there are a few shapes with large and many shapes with very small
557: preimages. This observation is in complete analogy with RNA secondary structure
558: folding maps.
559: }
560: \label{fig:pre}
561: \end{figure}
562: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
But what happens for larger
sequence lengths? The asymptotics of RNA secondary
568: structures \cite{Schuster:98,Reidys:07rna2} shows that the number of RNA
569: secondary structures, ${\sf S}_2(n)$, satisfies ${\sf S}_2(n)\sim \kappa\,
570: n^{-\frac{3}{2}}\alpha^n$ where $1.8488\le \alpha \le 2.64$, depending on
571: what one considers a ``realistic'' secondary structure. In comparison a
572: combinatory map produces (Lemma~\ref{L:bip}) $2^n$ shapes.
573: Therefore combinatory maps produce a total number of structures
574: which is, for large $n$, in a comparable size-range.
575:
576: The above observations motivate the question about the number of
577: shapes with large preimages \cite{Stroh:01}. For notational convenience let
578: \begin{equation}\label{number:1}
579: \mu_+=\left(\frac{1+\sqrt{5}}{2} \right)\qquad \text{\rm and}\qquad
580: \mu_-= \left(\frac{1-\sqrt{5}}{2} \right) \ .
581: \end{equation}
We next prove that there are many shapes with large preimages.
583: %%%
584: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
585: %%%
586: \begin{lemma}\label{L:many}
587: Suppose the relation $\mathcal{R}^\dagger$ and the base graph
588: $\mathcal{H}$ are given, then there exist at least
589: $\left(\sqrt{2}\right)^{n-1}$
590: shapes with the property that there are at least $2(\mu_+^n+\mu_-^n)$ sequences
591: folding into them.
592: \end{lemma}
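To give a feeling for the magnitudes in Lemma~\ref{L:many} we add a short
numerical evaluation, which is ours and not part of the proof. Since
$\mu_++\mu_-=1$ and $\mu_+\mu_-=-1$, the numbers $L_n=\mu_+^n+\mu_-^n$ satisfy
\begin{equation*}
L_n=L_{n-1}+L_{n-2},\qquad L_0=2,\ L_1=1 ,
\end{equation*}
i.e.~they are the Lucas numbers $2,1,3,4,7,11,18,\dots$ For $n=16$ we have
$L_{16}=2207$, so the lemma guarantees at least
$\left(\sqrt{2}\right)^{15}\approx 181$ shapes, each with at least
$2\cdot 2207=4414$ sequences folding into it. For even $n$ the value $2L_n$ is
attained by the full cycle $\mathcal{H}$ itself: its preimage consists of the
closed walks of length $n$ in the path
${\bf A}-{\bf B}-{\bf D}-{\bf C}$, whose adjacency matrix has the eigenvalues
$\pm\mu_+,\pm\mu_-$, so that their number is $2(\mu_+^n+\mu_-^n)$.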
593: %%%
594: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
595: %%%
Lemma~\ref{L:many} sets the stage for the further investigation of how these sets
of sequences are organized. Knowing that there are exponentially large
sets of sequences realizing particular shapes, what can be said about their
organization? Are they randomly distributed or clustered in sequence
space? What are their graph structures as induced subgraphs of sequence space?
601:
602:
603: %%%
604: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
605: %%%
606:
\section{Neutral Networks of Combinatory Maps}
608:
609: %%%
610: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
611: %%%
612:
One difficulty in the context of neutral networks is that it is practically
impossible to prove that they exist. Exhaustive enumeration of sequence spaces is
limited to small sequence lengths $n\le 20$ for four-letter alphabets
\cite{Gruener:95a} and the results are of limited value since finite size
effects distort the picture. In the case of
${\bf A},{\bf U},{\bf G},{\bf C}$-sequences about $60$\% of all sequences fold
into the open structure \cite{Goebel:04}. Several attempts have been made to
derive somewhat local criteria for the existence of neutral networks
\cite{Forst:00},
where the key idea is the probing for paths adopted from the actual random
graph proof in \cite{Reidys:97a,Reidys:02p2}.
623: In this context local parameters are the only quantities that give some clue
624: about the existence and properties of neutral networks.
625: In case of neutral networks modeled as random graphs, it is the number of
626: neutral neighbors that controls global properties like connectivity
627: and density of the corresponding neutral network. A neutral neighbor is a
628: neighboring sequence which folds into the same structure and the fraction
629: \cite{Reidys:97p}
630: \begin{equation}
631: \lambda^* = 1-\sqrt[\alpha-1]{\alpha^{-1}}
632: \end{equation}
633: is actually the threshold value for connectivity and density. In the following
634: we can derive for combinatory maps the entire distribution of neutral neighbors
635: of particular shapes. The result is actually not ``local'' at all and entails
636: detailed information about the {\it entire} preimage of these shapes. To be
637: precise we can actually derive the underlying rational generating function
638: using the transfer matrix method of enumerative combinatorics.
We study the quantity $\lambda_{\mathcal{S}_M}(m)$, the number of
sequences folding into the particular shape $\mathcal{S}_M$ that have exactly $m$
neutral neighbors. Our result reads
642: %%%
643: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
644: %%%
645: \begin{theorem}\label{T:NN}
646: For arbitrary shape $\mathcal{S}_M$, where $M\subset \{1,\dots,k\}$ denotes its
647: set of isolated nucleotides, we have
648: \begin{equation}\label{E:ist}
649: \forall\, m\in \mathbb{N}\colon \lambda_{\mathcal{S}_M}(m) \ge
650: \lambda_{C_{2k}}(m)
651: \end{equation}
652: and the generating function of $\lambda_{C_{2k}}(m)$, $F(x,y)=\sum_{k\ge 2}
653: \sum_{m}\lambda_{C_{2k}}(m)x^{m}y^{2k}$ is given by
654: \begin{equation}\label{E:generate}
655: F(x,y)= \frac{2(-4x^{3}y^{6}+2x^{2}y^{6}+3x^{2}y^{4}-5+4x^{2}y^{2}+8xy^{2}-
656: 6x^{3}y^{4}+2x^{4}y^{6})}{-2x^{3}y^{6}+x^{2}y^{6}+x^{2}y^{4}-1+2xy^{2}+
657: x^{2}y^{2}-2x^{3}y^{4}+x^{4}y^{6}}.
658: \end{equation}
659: \end{theorem}
660: %%%
661: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
662: %%%
The bi-variate function $F(x,y)$ provides detailed information about
the neutral neighbors of the entire preimages of the shapes
$\mathcal{S}_M$. For instance, Taylor expansion of
eq.~(\ref{E:generate}) yields
667: $$
668: F(x,y)=10+(2x^2+4x)y^2+(12x^2+2x^4)y^4+(6x^2+16
669: x^3+12x^4+2x^6)y^6+O(y^{8})
670: $$
671: and the term $(12x^2+2x^4)y^4$ shows that for $n=4$ there are at least $12$
672: vertices with $2$ and $2$ vertices with $4$ neutral neighbors. Likewise, for
673: $n=6$, there are at least $6$ with $2$, $16$ with $3$, $12$ with $4$ and $2$
674: vertices with $6$ neutral neighbors. In addition eq.~(\ref{E:ist}) guarantees
675: that $\mathcal{H}$ itself provides a lower bound on the numbers of neutral
676: neighbors. I.e.~we can pinpoint a specific reference shape providing key
677: information about the neutrality of the entire combinatory map.
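
As a sanity check on the coefficient of $y^4$, the case $n=4$ can be worked out
by hand. The following sketch assumes that the admissible pairs are exactly
$\{{\bf A},{\bf B}\}$, $\{{\bf B},{\bf D}\}$ and $\{{\bf D},{\bf C}\}$, as read
off from the digraph $D_{\mathcal{R}^\dagger}$ used in the proof of
Lemma~\ref{L:many}. The sequences folding into $C_4$ whose first coordinate
lies in $\{{\bf B},{\bf C}\}$ are listed below together with their numbers of
neutral neighbors:
$$
\begin{array}{c|c}
\text{\rm sequence} & \text{\rm neutral neighbors} \\ \hline
({\bf C},{\bf D},{\bf C},{\bf D}) & 2 \\
({\bf B},{\bf D},{\bf C},{\bf D}) & 2 \\
({\bf C},{\bf D},{\bf B},{\bf D}) & 2 \\
({\bf B},{\bf D},{\bf B},{\bf D}) & 4 \\
({\bf B},{\bf A},{\bf B},{\bf D}) & 2 \\
({\bf B},{\bf D},{\bf B},{\bf A}) & 2 \\
({\bf B},{\bf A},{\bf B},{\bf A}) & 2
\end{array}
$$
Cyclically shifting the coordinates by one position yields the remaining seven
sequences, with the same neighbor counts. Altogether $12$ sequences have $2$
and $2$ sequences have $4$ neutral neighbors, in agreement with the term
$(12x^2+2x^4)y^4$. The total of $14$ sequences equals the number of closed
walks of length $4$ in $D_{\mathcal{R}^\dagger}$, consistent with
eq.~(\ref{E:precise}).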
678: %%
679: %%%
680: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
681: %%%
682: \begin{figure}[ht]
683: \centerline{
684: \epsfig{file=fig3_cm.eps,width=0.75\textwidth}\hskip20pt
685: }
686: \caption{\small The distribution of neutral neighbors for the entire
687: preimage of the ``reference'' shape $\mathcal{S}=\mathcal{H}_{40}$,
688: where $n=40$ denotes the sequence length.
We plot the frequency (y-axis) of numbers
of neutral neighbors (x-axis) obtained from Theorem~\ref{T:NN}.
Note that the degree of a vertex in $Q_4^{40}$ is $120$, showing that
the lower bounds on the fractions of neutral neighbors range between
$13$\% and $24$\%.}
694: \label{fig:$C_{40}$}
695: \end{figure}
696: %%%
697: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
698: %%%
699:
700: In the previous section we have shown that there are many shapes
701: with large preimages. However, it is not obvious what the graph
702: structure of these preimages is. In this section we will study this
structure in detail and prove two remarkable properties. First, there
are many shapes whose sets of sequences have diameter $n$,
i.e.~there exist two sequences that differ in {\it all} nucleotides
and both map into the particular shape. This finding
is tantamount to percolation and indicates that the preimages are indeed
extended and not confined to some ``local'' region of sequence
space. Second, we prove that the preimages of exponentially many shapes
710: contain large connected components. In other words we can
711: actually prove the existence of neutral networks for sequence to
712: shape maps, i.e.~many shapes have sets of sequences in which there exists
713: a component of size $\ge \left(\sqrt{2}\right)^{n}$ and of diameter $n$.
714:
715: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
716: %%%%%%%%%%%%%%
717: \begin{figure}[ht]
718: \centerline{%
719: \epsfig{file=f4.eps,width=0.40\textwidth}\hskip15pt } \caption{\small
720: Neutral network. Sequence space is represented as lattice and the neutral net
721: is an induced subgraph (bold edges). We label the pairs of sequences
722: representing antipodal pairs by $({\sf A},{\sf B})$ and $({\sf C},{\sf D})$.
723: The two key properties of neutral nets are their connectivity and
724: percolation.} \label{F:2}
725: \end{figure}
726: %%%%%%%%%%%
727: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
728: %%%
729: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
730: %%%
731: \begin{theorem}\label{T:nn}{\bf (Neutral networks)}
732: Suppose the relation $\mathcal{R}^\dagger$ and the base graph $\mathcal{H}$ are
733: given. Then there exist at least $\left(\sqrt{2}\right)^{n-1}$ many shapes
734: $\mathcal{S}$ with the properties\\
{\sf (I)} the set of all sequences mapping into $\mathcal{S}$ has a connected
component of size at least $\mu_+^n+\mu_-^n$,\\
{\sf (II)} the set of all sequences mapping into $\mathcal{S}$ percolates,
i.e.~has diameter $n$.
739: \end{theorem}
740: %%%
741: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
742: %%%
In comparison with the corresponding result for random graphs we
observe that the neutral networks are in fact slightly bigger and the
diameter indeed {\it equals} $n$. This results from the fact that the
simpler graph $\mathcal{H}$ allows for a different proof, which is
algorithmic in nature. In fact the proof indicates how to
explicitly obtain these paths of length $n$, while the random
graph analogue can only establish their existence. In this sense both
constructions complement each other. To illustrate the idea of
751: Theorem~\ref{T:nn} we consider the cycle $\mathcal{H}_4$ and the
752: shape $\mathcal{S}=\mathcal{H}_4$. Then we have the following situation (using
753: the notation of the proof of Theorem~\ref{T:nn})
754: $$
755: a^\varnothing=({\bf C},{\bf D},{\bf C},{\bf D}) \quad \text{\rm and }
756: \quad C_4(({\bf C},{\bf D},{\bf C},{\bf D}))=C_4 \ .
757: $$
758: Theorem~\ref{T:nn} guarantees the existence of the antipodal sequence
759: $\tilde{a}^\varnothing=({\bf B},{\bf A},{\bf B},{\bf A})$ and a path
760: connecting $a^\varnothing$ and $\tilde{a}^\varnothing$ obtained via
761: the steps {\sf (a)}, {\sf (b)} and {\sf (c)}. Explicitly this path
762: for $\mathcal{S}_\varnothing$ from $a^\varnothing$ to
763: $\tilde{a}^\varnothing$ is given by {\small
764: $$
765: \underbrace{
766: \diagram
767: \fbox{{\bf C}} \ar@{-}[r] & {\bf D} \\
768: {\bf D} \uline\rline & {\bf C}\uline \\
769: \enddiagram
770: \ \mapsto
771: \diagram
772: {\bf B} \ar@{-}[r] & {\bf D} \\
773: {\bf D} \uline\rline & \fbox{{\bf C}}\uline \\
774: \enddiagram}_{\text{\rm step}
775: {\sf (a)}: \,\text{\rm replace {\bf C} by {\bf B}}}
776: \ \mapsto
777: \underbrace{\diagram
778: {\bf B} \ar@{-}[r] & {\bf D} \\
779: \fbox{{\bf D}} \uline\rline & {\bf B}\uline\\
780: \enddiagram
781: \ \mapsto
782: \diagram
783: {\bf B} \ar@{-}[r] & \fbox{{\bf D}} \\
784: {\bf A} \uline\rline & {\bf B}\uline\\
785: \enddiagram}_{{\rm step} {\sf (b)}: \,\text{\rm replace {\bf D} by {\bf A}}}
786: \ \mapsto
787: \diagram
788: {\bf B} \ar@{-}[r] & {\bf A} \\
789: {\bf A} \uline\rline & {\bf B}\uline\\
790: \enddiagram
791: $$
792: }
793:
Theorem~\ref{T:nn} applies to many shapes. For instance, the neutral path for
795: $\mathcal{S}_{\{1\}}$, which has length ${\sf diam}(Q_4^4)=4$ and
796: which connects the sequences $a^{\{1\}},\tilde{a}^{\{1\}}$ is given
797: by {\small
798: $$
799: \underbrace{
800: \diagram
801: {\bf A} \ar@{.}[r] & {\bf D} \\
802: {\bf D} \ar@{.}[u]\rline &\fbox{{\bf C}}\uline \\
803: \enddiagram
804: }_{{\rm step} {\sf (a)}: \text{\rm replace {\bf C} by {\bf B}}}
805: \ \mapsto
806: \underbrace{\diagram
807: {\bf A} \ar@{.}[r] & \fbox{{\bf D}} \\
808: {\bf D} \ar@{.}[u]\rline & {\bf B}\uline \\
809: \enddiagram
810: \ \mapsto
811: \diagram
812: {\bf A} \ar@{.}[r] & {\bf A} \\
813: \fbox{{\bf D}} \ar@{.}[u]\rline & {\bf B}\uline\\
814: \enddiagram}_{{\rm step} {\sf (b)}: \,\text{\rm replace {\bf D} by {\bf A}}}
815: \ \mapsto
816: \underbrace{
817: \diagram
818: \fbox{ {\bf A}} \ar@{.}[r] & {\bf A} \\
819: {\bf A} \ar@{.}[u]\rline & {\bf B}\uline\\
820: \enddiagram}_{{\rm step} {\sf (c)}: \,\text{\rm replace {\bf A} by {\bf C}}}
821: \ \mapsto
822: \diagram
823: {\bf C} \ar@{.}[r] & {\bf A} \\
824: {\bf A} \ar@{.}[u]\rline & {\bf B}\uline\\
825: \enddiagram
826: $$
827: }
828:
829:
830: %%%
831: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
832: %%%
833:
834:
835: %%%
836: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
837: %%%
838: \section{Appendix}
839: %%%
840: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
841: %%%
842:
843: {\bf Proof of Lemma}~\ref{L:bip}
844: To show {\sf (a)} we first prove that for any relation satisfying
845: eq.~(\ref{E:1}), eq.~(\ref{E:2})
846: and eq.~(\ref{E:3}) a shape $\mathcal{S}$ is bipartite. \\
847: {\it Claim.} Any closed walk in $\mathcal{S}$ has even length.\\
848: Since $\mathcal{S}$ is a shape we have $\mathcal{S}=H(v)$, whence
for any closed walk $w=(w_1,w_2,\dots,w_r,w_1)$ in $\mathcal{S}$
there exists at least one sequence
$x=(x_{w_1},x_{w_2},\dots,x_{w_r},x_{w_1})$, where $x_h\in
\{{\bf A},{\bf U},{\bf G},{\bf C}\}$. Therefore there exists an
853: injection
854: \begin{eqnarray*}
855: \{(x_{w_1},x_{w_2},\dots,x_{w_r},x_{w_1})\mid w \
856: \text{\rm is a closed walk in $\mathcal{S}$}\}
857: \longrightarrow \{\gamma \mid \text{\rm $\gamma$ is a closed walk in
858: $G(\mathcal{R})$}\}
859: \end{eqnarray*}
860: The idea is to show that
861: $$
862: \{\gamma \mid \text{\rm $\gamma$ is a closed walk in $G(\mathcal{R})$ of odd
863: length}\}=\varnothing \ .
864: $$
865: Suppose $\gamma$ is a closed walk of minimal, odd length in
866: $G(\mathcal{R})$. Obviously, there are only $4$ vertices in
867: $G(\mathcal{R})$. We can conclude from this that $\gamma$
868: contains a cycle of length $3$ which is in view of eq.~(\ref{E:3})
869: impossible, whence the claim.\\
We next select an arbitrary vertex $i\in \{1,\dots,n\}$ and color
all vertices at even distance from $i$ blue and all vertices at odd distance
red. Suppose this procedure leads to two monochromatic adjacent
873: vertices $j,r$. Then we obtain a closed walk containing $i,j$ and
874: $r$ of odd length. By induction we can conclude that this walk
875: contains a cycle of odd length, which is impossible, whence
876: $\mathcal{S}$ is bipartite and
877: assertion {\sf (a)} follows.\\
878: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
879: Next we show {\sf (b)} by constructing a vertex
880: $v=(x_1,\dots,x_n)\in Q_4^n$ with the property
$H_{\mathcal{R}^\dagger}(v)=H'$, where $H'$ is an arbitrary induced,
882: bipartite subgraph of $H$. Since $H'$ is induced in $H$ there exists
883: some set $M\subset \{1,\dots,n\}$ such that $E_{H'}=\{\{i,j\}\mid
884: \{i,j\}\in E_H\,\wedge\, i,j\in M\}$. First, for all coordinates
885: $x_j$ where $j\not \in M$ we set $x_j={\bf A}$. Then by definition
886: of $\mathcal{R}^\dagger$ for $i,i'\not \in M$, $\{x_i,x_{i'}\}\not\in
887: \mathcal{R}^\dagger$ holds. Since $H'$ is bipartite there exists for
888: the vertices $j\in M$ a bi-coloring (red/blue) such that no two
889: $H'$-adjacent vertices are monochromatic. Suppose $x_j,x_k$ are
890: coordinates where $j,k\in M$. We choose a bi-coloring (red/blue) and
891: set $x_j={\bf D}$ for $j$ being colored red and $x_k={\bf C}$ for
892: $k$ being colored blue, respectively. In view of $({\bf D},{\bf
893: C}),({\bf C},{\bf D})\in \mathcal{R}^\dagger$, we can conclude that for
894: $j,k\in M$ and $\{j,k\}\in H$ we have $\{x_j,x_{k}\}\in
895: \mathcal{R}^\dagger$. Since $({\bf A},{\bf C}),({\bf A},{\bf D})\not
896: \in \mathcal{R}^\dagger$ we derive that for $i\not\in M$ and $j\in M$,
897: $\{x_i,x_j\}\not \in\mathcal{R}^\dagger$ holds. Therefore
898: $H_{\mathcal{R}^\dagger}((x_1,\dots,x_n))=H'$ i.e.~any
899: induced bipartite subgraph of $H$ is a shape.\\
900: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Next we show {\sf (c)}, i.e.~for $\mathcal{H}$ (eq.~(\ref{E:H})) any
902: $H'<\mathcal{H}$ is a shape. We proceed by explicitly constructing a
903: vertex $v=(x_1,\dots,x_n)\in Q_4^n$ with the property
904: $\mathcal{H}_{\mathcal{R}^\dagger}(v)=H'$. W.l.o.g.~we can assume
905: that $n$ is even since the isolated point $u$ does not contribute to
906: the $\mathcal{H}$-shapes. Then we have $\mathcal{H}=C_{2k}$ and
907: $V_{C_{2k}}=\{1,\dots,2k\}$. We label the $H'$-vertices $\{1,\dots,
908: 2k\}$ clock-wise such that the (clockwise) first vertex in one
909: largest $H'$-component is $1$. Then $H'$ corresponds to a unique
sequence of components. We assume now $x_i\in\{{\bf A},{\bf B}\}$
and label all $H'$-vertices except those contained in the
component preceding vertex $1$. We set inductively
913: \begin{equation}
914: x_i=
915: \begin{cases}
916: {\bf A} & \text{\rm iff $i=1$}\\
917: x_{i-1} & \text{\rm iff $\{i-1,i\}$ is not an edge in $H'$}\\
918: \overline{x_{i-1}} & \text{\rm iff $\{i-1,i\}$ is an edge in $H'$} \ ,
919: \end{cases}
920: \end{equation}
921: where $\overline{{\bf B}}={\bf A}$ and $\overline{{\bf A}}={\bf B}$.
922: As for the labeling of the component preceding the component containing
923: vertex $1$, we start with $x_j={\bf C}$ and continue inductively
924: $x_{j+1}={\bf D},x_{j+2}={\bf C},\dots $. This procedure
925: results in a
926: labeling compatible with $H'$ since for $\{i-1,i\}\in H'$ we have either
927: $\{{\bf C},{\bf D}\}$ or $\{{\bf A},{\bf B}\}$ and for $\{i-1,i\}\not\in H'$
928: we have $\{{\bf A},{\bf A}\}$, $\{{\bf B},{\bf B}\}$ and $\{{\bf A},{\bf C}\}$
929: or $\{{\bf B},{\bf C}\}$ (at the beginning of the last component) and
930: $\{{\bf D},{\bf A}\}$ or $\{{\bf C},{\bf A}\}$ (at the end of the
931: last component). Accordingly we obtain a sequence $\tilde{v}_{H'}$
932: with the property $\mathcal{H}(\tilde{v}_{H'})=H'$.$\ \square$
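
To illustrate the construction in {\sf (c)}, consider the following example,
which is not part of the original argument and assumes $2k=6$. Let $H'$ have
the edges $\{1,2\}$, $\{2,3\}$ and $\{4,5\}$, so that its components are
$\{1,2,3\}$, $\{4,5\}$ and $\{6\}$, and $\{6\}$ is the component preceding
vertex $1$. The inductive rule gives
$$
x_1={\bf A},\quad x_2=\overline{x_1}={\bf B},\quad
x_3=\overline{x_2}={\bf A},\quad x_4=x_3={\bf A},\quad
x_5=\overline{x_4}={\bf B},
$$
and the last component is labeled by $x_6={\bf C}$, i.e.~we obtain
$\tilde{v}_{H'}=({\bf A},{\bf B},{\bf A},{\bf A},{\bf B},{\bf C})$.
The pairs at the non-edges $\{3,4\}$, $\{5,6\}$ and $\{6,1\}$ are
$\{{\bf A},{\bf A}\}$, $\{{\bf B},{\bf C}\}$ and $\{{\bf C},{\bf A}\}$,
all of which appear in the list of non-admissible pairs above. Had vertex $6$
been labeled by the inductive rule instead, we would have $x_6=x_5={\bf B}$
and the pair $\{x_6,x_1\}=\{{\bf B},{\bf A}\}$ would create the unwanted edge
$\{6,1\}$. This is the reason for labeling the last component with
${\bf C}$ and ${\bf D}$.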
933:
934:
935: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
936:
937: {\bf Proof of Lemma}~\ref{L:many}
938: By definition, there exists a unique component of $\mathcal{H}$
939: which is a cycle of even length, $C_{2k}$. $C_{2k}$ contains for $n$
940: even all and for $n$ odd all but one $\mathcal{H}$-vertices. Suppose
941: $C_{2k}$ contains the vertices
$\{i_1,j_1,\dots,i_{k},j_k\}$, where $i_1<j_1<i_2<\dots<i_{k}<j_k$. \\
{\it Claim.} The number of $2k$-tuples $(x_{i_1},x_{j_1},\dots,x_{i_{k}},
x_{j_k})$ such that $C_{2k}((x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k}))=
C_{2k}$, i.e.
946: $
947: (x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\in\vartheta_{C_{2k}}^{-1}(C_{2k})
948: $
949: is given by
950: \begin{equation}\label{E:precise}
951: 2\, \left(\mu_+^{2k}+\mu_-^{2k}\right) \ .
952: \end{equation}
953: To prove the claim we observe that $\mathcal{R}^\dagger$ induces the
954: digraph $D_{\mathcal{R}^\dagger}$ defined as follows: {\small
955: $$
956: D_{\mathcal{R}^\dagger}= \diagram
957: {\bf A} \ar@/^1pc/@{<-}[rr]|{} \ar@/_1pc/@{->}[rr]|{}& & {\bf B} \\
958: {\bf C} \ar@/^1pc/@{<-}[rr]|{} \ar@/_1pc/@{->}[rr]|{}& & {\bf D}
959: \ar@/^1pc/@{<-}[u]|{} \ar@/_1pc/@{->}[u]|{}
960: \enddiagram \qquad\text{\rm and} \quad A_{D_{\mathcal{R}^\dagger}}=
961: \bordermatrix{%
962: & {\bf A} & {\bf B} & {\bf D} & {\bf C} \cr%
963: & 0 & 1 & 0 & 0 \cr%
964: & 1 & 0 & 1 & 0 \cr%
965: & 0 & 1 & 0 & 1 \cr%
966: & 0 & 0 & 1 & 0 \cr%
967: }%
968: $$}
969: The number of $2k$-tuples $(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})$
970: with the property
971: $C_{2k}((x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k}))=C_{2k}$ is equal
972: to the number of closed walks of length $2k$ in
$D_{\mathcal{R}^\dagger}$. Indeed, in order to obtain such a
$2k$-tuple we fix an index, $i_1$, say. Then we start successively
with ${\bf A}$, ${\bf B}$, ${\bf D}$ and ${\bf C}$ at $i_1$ and form
the closed walks of length $2k$ in $D_{\mathcal{R}^\dagger}$ starting
and ending at ${\bf A}$, ${\bf B}$, ${\bf D}$ and ${\bf C}$, respectively.
All these walks are counted separately, since we have labeled graphs.
The number of closed walks of length $\ell$ in
$D_{\mathcal{R}^\dagger}$ starting and ending at $i$ is given by
$(A_{D_{\mathcal{R}^\dagger}}^\ell)_{i,i}$, whence the number of all
closed walks of length $\ell$ is simply ${\sf
Tr}(A_{D_{{\mathcal{R}^\dagger}}}^\ell) =\sum_i
(A_{D_{{\mathcal{R}^\dagger}}}^\ell)_{i,i}$. Since ${\sf
Tr}(A_{D_{{\mathcal{R}^\dagger}}}^\ell)=\omega_1^\ell+\dots
+\omega_r^\ell$, where $\omega_1,\dots,\omega_r$ are the
eigenvalues of $A_{D_{{\mathcal{R}^\dagger}}}$ (note $r=4$), we obtain
\begin{eqnarray*}
\sum_{\ell\ge 0}{\sf Tr}(A_{D_{\mathcal{R}^\dagger}}^\ell) z^\ell & =
&
\sum_{\ell\ge 0} \left[\omega_1^\ell+\dots +\omega_r^\ell \right]z^\ell \\
& = & \sum_{\ell\ge 0}
\left[(1+(-1)^\ell)\left(\mu_+^\ell+
\mu_-^\ell\right)\right] z^\ell
\end{eqnarray*}
997: and the claim follows.\\
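For orientation, the first values can be computed explicitly. The following is
a sketch under the assumption that $\mu_+$ and $\mu_-$ denote the absolute
values of the eigenvalues of $A_{D_{\mathcal{R}^\dagger}}$, which is
consistent with the trace formula above. The characteristic polynomial of the
matrix displayed above is
$$
\det(\omega\, I-A_{D_{\mathcal{R}^\dagger}})=\omega^4-3\omega^2+1 ,
$$
whence $\omega^2=(3\pm\sqrt{5})/2$, i.e.~$\omega\in
\{\pm(\sqrt{5}+1)/2,\,\pm(\sqrt{5}-1)/2\}$. In particular
$\mu_+^2+\mu_-^2=3$ and $\mu_+^2\mu_-^2=1$, so that
$t_k=\mu_+^{2k}+\mu_-^{2k}$ satisfies $t_{k+1}=3\,t_k-t_{k-1}$ with $t_0=2$
and $t_1=3$. This gives
$$
2\,t_1=6,\qquad 2\,t_2=14,\qquad 2\,t_3=36
$$
closed walks of length $2$, $4$ and $6$, respectively. The first value can be
checked directly, since ${\sf Tr}(A_{D_{\mathcal{R}^\dagger}}^2)$ is the sum
of the vertex degrees, $1+2+2+1=6$.\\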
998: Suppose $(x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\in \vartheta_{C_{2k}}^{-1}
999: (C_{2k})$ and $M\subset \{1,\dots,k\}$. We consider the involution
1000: $\tau\colon \mathcal{A}\rightarrow \mathcal{A}$, where $\tau({\bf A})=
1001: {\bf B}$ and $\tau({\bf D})={\bf C}$ and set
1002: \begin{eqnarray}\label{E:I_M}
1003: \quad I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}) & = &
1004: (y_{i_1},x_{j_1}\dots,y_{i_{k}},x_{j_k}), \ \text{\rm where} \
1005: y_{i_\ell}=
1006: \begin{cases}
1007: \tau(x_{i_\ell}) & \text{\rm for } i_\ell \in M \\
1008: x_{i_\ell} & \text{\rm for } i_\ell \not\in M \ .
1009: \end{cases}
1010: \end{eqnarray}
1011: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1012: {\it Claim.} There exists a bijection
1013: $$
1014: \beta\colon \{M\subset \{1,2,\dots,k\}\}\rightarrow \{\mathcal{S}_M\}, \quad
1015: M\mapsto\mathcal{S}_{M}
1016: $$
where $\mathcal{S}_M$ is obtained by deleting the two $C_{2k}$-edges
incident to each vertex $i_{h}\in M$ and
1019: \begin{equation}
1020: \forall \,(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})
1021: \in \vartheta_{C_{2k}}^{-1}(C_{2k});\,\quad
1022: \mathcal{S}_M=C_{2k}(I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})) \ .
1023: \end{equation}
1024: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1025: Suppose $M\neq M'$ then w.l.o.g.~we can assume that there exists
1026: some index $i_{h}\in M\setminus M'$, i.e.~$i_{h}$ is isolated in
1027: $\mathcal{S}_{M}$ but not in $\mathcal{S}_{M'}$. Since $j_{h-1}$ and
1028: $j_{h}$ are both in $\mathcal{S}_M$ and $\mathcal{S}_{M'}$ we have
$\{j_{h-1},i_h\},\{j_{h},i_h\}\in \mathcal{S}_{M'}$ but not in
1030: $\mathcal{S}_{M}$, whence $\mathcal{S}_{M}$ and $\mathcal{S}_{M'}$
1031: are different shapes. Since $\mathcal{S}_M$ is an induced bipartite
1032: subgraph, Lemma~\ref{L:bip} implies that any $\mathcal{S}_M$ is a
1033: shape. When $i_{h}\in M$ the following diagram {\small
1034: $$
1035: \diagram
1036: & & x_{j_h} \\
1037: x_{j_{h-1}} \ar@{-}[r] & \text{\rm \fbox{$x_{i_{h}}$}}
1038: \ar@{-}[ur] \ar@{-}[r] & x_{j_{h}} \\
1039: \enddiagram
1040: \quad \mapsto\quad
1041: \diagram
1042: & & {x_{j_{h}}} \\
1043: {x_{j_{h-1}}} \ar@{.}[r] & \text{\rm \fbox{$\tau(x_{i_{h}})$}}
1044: \ar@{.}[ur] \ar@{.}[r] & {x_{j_{h}}}
1045: \enddiagram
1046: $$}
1047: shows that $I_M$ has the property: for arbitrary
1048: $$
1049: (x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})\in\vartheta_{C_{2k}}^{-1}(C_{2k})
1050: $$
1051: the shape $C_{2k}(I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}))$
1052: differs from $C_{2k}$ exactly by deleting the two $C_{2k}$-edges
1053: incident to all $i_\ell\in M$; explicitly {\tiny
1054: $$
1055: \diagram
1056: & & {\bf D} \\
1057: {\bf A} \ar@{-}[r] & \fbox{{\bf B}} \ar@{-}[ur] \ar@{-}[r] & {\bf A} \\
1058: \enddiagram
1059: \ \mapsto \
1060: \diagram
1061: & & {{\bf D}} \\
1062: {\bf A} \ar@{.}[r] & \fbox{{\bf A}} \ar@{.}[ur] \ar@{.}[r] & {\bf A} \\
1063: \enddiagram
1064: \quad
1065: \diagram
1066: & & {\bf B} \\
1067: {\bf C} \ar@{-}[r] & \fbox{{\bf D}} \ar@{-}[ur] \ar@{-}[r] & {\bf C}\\
1068: \enddiagram
1069: \ \mapsto \
1070: \diagram
1071: & & {\bf B} \\
1072: {\bf C} \ar@{.}[r] & \fbox{{\bf C}} \ar@{.}[ur] \ar@{.}[r] & {\bf C}\\
1073: \enddiagram
1074: \quad
1075: $$
1076: }
1077: {\tiny
1078: $$
1079: \diagram
1080: & \fbox{{\bf A}}\ar@{-}[r] & {\bf B} \\
1081: {\bf B}\ar@{-}[ur] \ar@{-}[r] & \fbox{{\bf D}} \ar@{-}[ur]
1082: \ar@{-}[r] &
1083: {\bf C}\\
1084: \enddiagram
1085: \ \mapsto \
1086: \diagram
1087: & \fbox{{\bf B}}\ar@{.}[r] & {\bf B} \\
1088: {\bf B}\ar@{.}[ur] \ar@{.}[r]&\fbox{{\bf C}} \ar@{.}[ur] \ar@{.}[r] & {\bf C}\\
1089: \enddiagram
1090: \quad
1091: \diagram
1092: & \fbox{{\bf C}}\ar@{-}[r] & {\bf D} \\
1093: {\bf D}\ar@{-}[ur] \ar@{-}[r]&\fbox{{\bf B}} \ar@{-}[ur] \ar@{-}[r] & {\bf A}\\
1094: \enddiagram
1095: \ \mapsto \
1096: \diagram
1097: & \fbox{{\bf D}}\ar@{.}[r] & {\bf D} \\
1098: {\bf D}\ar@{.}[ur] \ar@{.}[r] &\fbox{{\bf A}} \ar@{.}[ur]
1099: \ar@{.}[r] & {\bf A}\\
1100: \enddiagram
1101: $$
1102: }
1103: and the claim is proved. The claim implies that $I_M$ induces the injection
1104: \begin{eqnarray}
1105: \quad I_M \colon \vartheta_{C_{2k}}^{-1}(C_{2k}) & \longrightarrow &
1106: \vartheta_{C_{2k}}^{-1}(\mathcal{S}_M), \ \quad
1107: (x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})\mapsto
1108: I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}) \ .
1109: \end{eqnarray}
1110: This injection allows us to relate the sets $\vartheta_{C_{2k}}^{-1}(C_{2k})$
1111: and $\vartheta_{C_{2k}}^{-1}(\mathcal{S}_M)$ and in particular
1112: \begin{equation}
1113: \vert\vartheta_{C_{2k}}^{-1}(C_{2k})\vert
1114: \le \vert\vartheta_{C_{2k}}^{-1}(\mathcal{S}_M)\vert \ .
1115: \end{equation}
1116: Since $M\subset \{1,\dots,k\}$ was arbitrary we can conclude that there
1117: are $2^k$ subsets and hence $2^k$ distinct shapes $\mathcal{S}_M$.
1118: Hence there exist at least
1119: $$
1120: 2^{k} \ge \left(\sqrt{2}\right)^{n-1}
1121: $$
1122: shapes $\mathcal{S}$ with the property
1123: \begin{eqnarray*}
1124: \vert \vartheta_\mathcal{H}^{-1}(\mathcal{S})\vert \ge
1125: \vert \vartheta_\mathcal{H}^{-1}(\mathcal{H})\vert
1126: \ge 2\, \left(\mu_+^{2k}+
1127: \mu_-^{2k}\right) \ .
1128: \end{eqnarray*}
1129: In case of $n\not\equiv 0\mod 2$ we have exactly one more isolated point,
1130: i.e.
1131: \begin{equation}\label{E:isolated}
1132: \vert \vartheta_\mathcal{H}^{-1}(\mathcal{S})\vert\ge 8\,
1133: \left(\mu_+^{n-1}+
1134: \mu_-^{n-1}\right)
1135: \end{equation}
1136: and since $4\ge \left( \mu_+ + \mu_- \right)$
1137: the lemma follows. $\ \square$
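
The last step can be made explicit as follows; this is a sketch that assumes
$\mu_+,\mu_->0$ and $n=2k+1$. In this case
$2^k=\left(\sqrt{2}\right)^{n-1}$, and from $4\ge \mu_++\mu_-$ we obtain
\begin{eqnarray*}
8\left(\mu_+^{n-1}+\mu_-^{n-1}\right) & \ge &
2\left(\mu_++\mu_-\right)\left(\mu_+^{n-1}+\mu_-^{n-1}\right) \\
& = & 2\left(\mu_+^{n}+\mu_-^{n}\right)+
2\,\mu_+\mu_-\left(\mu_+^{n-2}+\mu_-^{n-2}\right) \\
& \ge & 2\left(\mu_+^{n}+\mu_-^{n}\right) ,
\end{eqnarray*}
which is the bound stated in Lemma~\ref{L:many}.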
1138:
1139:
1140:
1141: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1142:
1143: {\bf Proof of Theorem}~\ref{T:nn}
1144: We first prove that at least $\left(\sqrt{2}\right)^{n-1}$ shapes $\mathcal{S}$
1145: have a preimage $\vartheta_\mathcal{H}^{-1}(\mathcal{S})$ with diameter $n$.
1146: We will work with the particular set of shapes
1147: $\{\mathcal{S}_M \mid M\subset \{1,\dots,k\}\}$, introduced in
1148: Lemma~\ref{L:many} and prove that all of them have a
1149: component of size $\ge \mu_+^{n}+ \mu_-^{n} >\left(\sqrt{2}\right)^{n}$ and
1150: ${\sf diam}(\vartheta_\mathcal{H}^{-1}(\mathcal{S}))=n$.
1151: Let $C_{2k}$ be the $\mathcal{H}$-cycle, which contains all
1152: $\mathcal{H}$-vertices for $n$ even and all but one $\mathcal{H}$-vertices,
1153: for $n$ odd.
1154: Let $V_{C_{2k}}=\{i_1,j_1,\dots,i_{k},j_k\}$, where
$i_1<j_1<i_2<\dots<i_{k}<j_k$. \\
1156: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\it Claim $1$.} Let $M\subset \{1,\dots,k\}$, then there exist at
least $2^{k}$ shapes $\mathcal{S}_M$ over $Q_4^{2k}$ such that
1159: \begin{equation}
1160: \text{\sf diam}(\vartheta_\mathcal{H}^{-1}(\mathcal{S}_M))=
1161: \begin{cases}
1162: n & \text{\rm for}\ n\equiv 0\mod 2 \\
1163: n-1 & \text{\rm for}\ n\not\equiv 0\mod 2 \ .
1164: \end{cases}
1165: \end{equation}
1166: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We first show that for each $M$ there exists a pair of antipodal sequences,
i.e.~$(a^M,\tilde{a}^M)$ with $d(a^M,\tilde{a}^M)=2k$, and a path
$(a^M,w_1^M,\dots,w_{2k-1}^M,\tilde{a}^M)$ such that
$\vartheta_{C_{2k}}(w_i^M)=\mathcal{S}_M$. To this end we set
1171: \begin{equation}\label{E:a^M}
1172: a^M=(a^M_{i_1},a_{j_1}\dots,a^M_{i_{k}},a_{j_k}), \quad \text{\rm where}\quad
1173: a_{j_h}={\bf D}, \ \text{\rm and}\ a^M_{i_h}=
1174: \begin{cases}
1175: {\bf A} & \text{\rm for} \ i_h\in M \\
1176: {\bf C} & \text{\rm otherwise.}
1177: \end{cases}
1178: \end{equation}
1179: In particular we have $a^\varnothing=({\bf C},{\bf D},\dots, {\bf
1180: C},{\bf D})$. Then $\mathcal{S}_M=C_{2k}(a^M)$, i.e.~$\mathcal{S}_M$
1181: is the shape obtained by removing for each $i_h\in M$ the two
1182: incident $C_{2k}$-edges. Next we define an antipode $\tilde{a}^M$,
1183: i.e.~an element of $Q_4^{2k}$ with the property
1184: $d(a^M,\tilde{a}^M)=2k$ as follows
1185: \begin{equation}
1186: \tilde{a}^M=(\tilde{a}^M_{i_1},\tilde{a}_{j_1}\dots,\tilde{a}^M_{i_{k}},
1187: \tilde{a}_{j_k}), \quad \text{\rm where}\quad
1188: \tilde{a}_{j_h}={\bf A}, \ \text{\rm and}\ \tilde{a}^M_{i_h}=
1189: \begin{cases}
1190: {\bf C} & \text{\rm for} \ i_h\in M \\
1191: {\bf B} & \text{\rm otherwise.}
1192: \end{cases}
1193: \end{equation}
1194: We can transform $a^M$ into $\tilde{a}^M$ by successively changing
1195: exactly one coordinate in three steps: {\sf (a)} replace (in any
1196: order) for $i_h\not\in M$ successively all $a_{i_h}={\bf C}$ by
1197: ${\bf B}$, {\sf (b)} replace (in any order) successively all
1198: $a_{j_h}={\bf D}$ by ${\bf A}$ and finally {\sf (c)} substitute (in
1199: any order) for all $i_h\in M$ $a_{i_h}={\bf A}$ by
1200: ${\bf C}$. \\
1201: This proves that there exists a $Q_4^{2k}$-path
1202: \begin{equation}
1203: (a^M,w_1^M,\dots,w_{2k-1}^M,\tilde{a}^M)
1204: \end{equation}
1205: connecting $a^M$ and $\tilde{a}^M$, such that
1206: \begin{equation}
1207: \forall \, 1\le i\le 2k-1, \quad C_{2k}(w_i^M)=\mathcal{S}_M \ .
1208: \end{equation}
1209: I.e.~all intermediate steps of the path are mapped by
1210: $\vartheta_{\mathcal{H}}$ into
1211: the shape $\mathcal{S}_M$.
1212: As shown in Lemma~\ref{L:many} there are $2^k$ different shapes
1213: $\mathcal{S}_M$
1214: induced by the subsets $M\subset \{1,\dots,k\}$, whence Claim $1$.\\
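As an illustration, which is not part of the original proof and assumes the
admissible pairs $\{{\bf A},{\bf B}\}$, $\{{\bf B},{\bf D}\}$ and
$\{{\bf D},{\bf C}\}$ read off from $D_{\mathcal{R}^\dagger}$, take $k=3$ and
let $i_2$ be the only isolated vertex. The steps {\sf (a)}, {\sf (b)} and
{\sf (c)} then produce the path
$$
\begin{array}{ll}
({\bf C},{\bf D},{\bf A},{\bf D},{\bf C},{\bf D}) & a^M \\
({\bf B},{\bf D},{\bf A},{\bf D},{\bf C},{\bf D}) & \text{\rm step {\sf (a)}} \\
({\bf B},{\bf D},{\bf A},{\bf D},{\bf B},{\bf D}) & \text{\rm step {\sf (a)}} \\
({\bf B},{\bf A},{\bf A},{\bf D},{\bf B},{\bf D}) & \text{\rm step {\sf (b)}} \\
({\bf B},{\bf A},{\bf A},{\bf A},{\bf B},{\bf D}) & \text{\rm step {\sf (b)}} \\
({\bf B},{\bf A},{\bf A},{\bf A},{\bf B},{\bf A}) & \text{\rm step {\sf (b)}} \\
({\bf B},{\bf A},{\bf C},{\bf A},{\bf B},{\bf A}) & \text{\rm step {\sf (c)}},
\ \tilde{a}^M
\end{array}
$$
In each row the pairs at $\{j_1,i_2\}$ and $\{i_2,j_2\}$ are among
$\{{\bf A},{\bf D}\}$, $\{{\bf A},{\bf A}\}$ and $\{{\bf A},{\bf C}\}$ and
hence not admissible, while all other cyclically adjacent pairs are
admissible. Every sequence in the list therefore folds into $\mathcal{S}_M$,
and the path has length $6=2k$.\\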
1215: In case of $n\equiv 0\mod 2$ we derive
1216: $2^k=\left(\sqrt{2}\right)^n$. In case of $n\not \equiv 0\mod 2$
1217: there exists exactly one vertex $u$ which is isolated in
1218: $\mathcal{H}$. Then we simply add the isolated point $u$ to each
1219: shape $\mathcal{S}_M$ and shall in the following identify these new
1220: shapes with $\mathcal{S}_M$. Then
1221: $\vert\vartheta_\mathcal{H}^{-1}(\mathcal{S}_M)\vert =
4\vert\vartheta_{C_{2k}}^{-1}(\mathcal{S}_M)\vert$. We choose
$a_u={\bf A}$ and $\tilde{a}_u={\bf B}$; then the sequences
\begin{eqnarray*}
a_u^M & = & (a^M_{i_1},a_{j_1},\dots,a_u,\dots, a^M_{i_{k}},a_{j_k})\\
\tilde{a}_u^M & = & (\tilde{a}^M_{i_1},\tilde{a}_{j_1},\dots,\tilde{a}_u,\dots,
\tilde{a}^M_{i_{k}},\tilde{a}_{j_{k}})
\end{eqnarray*}
satisfy $d(a_u^M,\tilde{a}_u^M)=n$ and there exists a $Q_4^{n}$-path
1230: $(a_u^M,w_1^M,\dots,w_{2k}^M,\tilde{a}_u^M)$ connecting $a_u^M$ and
1231: $\tilde{a}_u^M$, with the property
1232: \begin{equation}
1233: \forall \, 1\le i\le 2k, \quad C_{2k}(w_i^M)=\mathcal{S}_M \ .
1234: \end{equation}
1235: Therefore we have proved that at least $\left(\sqrt{2}\right)^{n-1}$
1236: shapes $\mathcal{S}_M$ have a preimage $\vartheta_\mathcal{H}^{-1}(\mathcal{S}_M)$
1237: with diameter $n$.\\
1238: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1239: {\it Claim $2$.}
1240: \begin{equation}\label{E:claim2}
\vert \left\{\mathcal{S}_M\mid \vert \mathcal{C}(\vartheta_\mathcal{H}^{-1}
(\mathcal{S}_M))\vert \ \ge
1243: \mu_+^{2k}+\mu_-^{2k}
1244: \right\}\vert \ge 2^k \ .
1245: \end{equation}
1246: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To prove Claim $2$ we first observe that
1248: $\vartheta_\mathcal{H}^{-1}(\mathcal{H})$ has
1249: exactly two components of equal size
1250: \begin{equation}\label{E:cool2}
1251: \mu_+^{2k}+\mu_-^{2k} \ .
1252: \end{equation}
1253: Indeed, any vertex $v\in \vartheta_\mathcal{H}^{-1}(\mathcal{H})$ can be
1254: transformed into either
1255: $$
a^\varnothing=({\bf C},{\bf D},\dots,{\bf C},{\bf D}),
\quad \text{\rm or}
\quad
b^\varnothing=({\bf D},{\bf C},\dots,{\bf D},{\bf C})
1260: $$
1261: successively using
1262: the two steps {\sf (I)} replace (in any order) all ${\bf A}$ by ${\bf D}$
1263: and {\sf (II)} replace all (in any order) ${\bf B}$ by ${\bf C}$. Hence
1264: there exist exactly two components and the map
1265: $$
1266: \sigma(x_{i_1},x_{j_1},\dots, x_{i_k},x_{j_k})=
1267: (x_{j_k},x_{i_1},\dots, x_{j_{k-1}},x_{i_k})
1268: $$
1269: is a bijection between them, whence they have equal size.
1270: Eq.~(\ref{E:cool2}) then follows from eq.~(\ref{E:precise}) in
1271: Lemma~\ref{L:many}. We next claim that the mapping
1272: $I_M$ of eq.~(\ref{E:I_M}) is in fact an injective graph morphism
1273: \begin{eqnarray}
1274: \ I_M\colon \vartheta_{C_{2k}}^{-1}(C_{2k}) & \longrightarrow &
1275: \vartheta_{C_{2k}}^{-1}(\mathcal{S}_M), \ \quad
1276: (x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})\mapsto
1277: I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}).
1278: \end{eqnarray}
I.e.~for two adjacent vertices $v,v'\in\vartheta_{C_{2k}}^{-1}(C_{2k})$, the
1280: vertices $I_{M}(v)$ and $I_{M}(v')$ are adjacent. To prove
1281: this we consider the diagrams {\small
1282: $$
1283: x_{j_{h-1}}=x_{j_h}={\bf B}:\quad
1284: \underbrace{\diagram
1285: & \fbox{{\bf A}}\ar@{-}[r] & {\bf B} \\
1286: {\bf B}\ar@{-}[ur] \ar@{-}[r] & \fbox{{\bf D}}
1287: \ar@/^2pc/@{->}[u]|{}
1288: \ar@/_2pc/@{<-}[u]|{}
1289: \ar@{-}[ur] & \\
1290: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}
1291: \qquad \mapsto
1292: \qquad
1293: \underbrace{\diagram
1294: & \fbox{{\bf B}}\ar@{.}[r] & {\bf B} \\
1295: {\bf B}\ar@{.}[ur] \ar@{.}[r] & \fbox{{\bf C}} \ar@{.}[ur]
1296: \ar@/^2pc/@{->}[u]|{}
1297: \ar@/_2pc/@{<-}[u]|{}
1298: & \\
1299: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}
1300: $$}
1301: {\small
1302: $$
1303: x_{j_{h-1}}=x_{j_h}={\bf D}:\quad
1304: \underbrace{\diagram
1305: & \fbox{{\bf C}}\ar@{-}[r] & {\bf D} \\
1306: {\bf D}\ar@{-}[ur] \ar@{-}[r]&\fbox{{\bf B}} \ar@{-}[ur]
1307: \ar@/^2pc/@{->}[u]|{}
1308: \ar@/_2pc/@{<-}[u]|{}
1309: &\\
1310: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}
1311: \qquad \mapsto
1312: \qquad
1313: \underbrace{\diagram
1314: & \fbox{{\bf D}}\ar@{.}[r] & {\bf D} \\
1315: {\bf D}\ar@{.}[ur] \ar@{.}[r]&\fbox{{\bf A}} \ar@{.}[ur]
1316: \ar@/^2pc/@{->}[u]|{}
1317: \ar@/_2pc/@{<-}[u]|{}
1318: &\\
1319: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}
1320: $$}
The above diagrams represent the two scenarios for two adjacent
vertices $v,v'\in\vartheta^{-1}_{C_{2k}}(C_{2k})$. That is, if $v$ and
$v'$ are both contained in $\vartheta_{C_{2k}}^{-1}(C_{2k})$ and
differ in $x_{i_h}$ and $x_{i_h}'$, then we have either
$x_{j_{h-1}}=x_{j_h}={\bf B}$, $x_{i_h}={\bf D}$ and
$x_{i_h}'={\bf A}$, or $x_{j_{h-1}}=x_{j_h}={\bf D}$,
$x_{i_h}={\bf B}$ and $x_{i_h}'={\bf C}$. If we apply $I_{M}$
with ${i_h}\in M$, then the resulting vertices $I_{M}(v)$ and
$I_{M}(v')$ are again adjacent, whence $I_M$ is an injective graph
morphism. Accordingly, $I_M$ maps components into components, from
which we conclude that for each $M\subset \{1,\dots,k\}$ the
shape $\mathcal{S}_M$ has a component of size
$\mu_+^{2k}+\mu_-^{2k}$, and Claim $2$ is proved.\\
In the case $2k=n$ the
assertion follows directly. For $n$ odd we have to repeat the
argument in Lemma~\ref{L:many}, where we considered the isolated
point $u$ in eq.~(\ref{E:isolated}). Since we used the same set of
shapes $\{\mathcal{S}_M\mid M\subset \{1,\dots,k\}\}$ for both
claims, the theorem follows. $\ \square$
1340:
{\bf Proof of Theorem~\ref{T:NN}.}
It is clear that we can restrict our analysis to the case
$n\equiv 0 \pmod 2$, i.e.~$\mathcal{H}=C_{2k}$, since the isolated point
always contributes $4$ neutral neighbors for any shape.
1345: Eq.~(\ref{E:ist}) is a direct consequence of
1346: \begin{eqnarray*}
1347: \ I_M\colon \vartheta_{C_{2k}}^{-1}(C_{2k}) & \longrightarrow &
1348: \vartheta_{C_{2k}}^{-1}(\mathcal{S}_M), \ \quad
(x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\mapsto
I_M(x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})
1351: \end{eqnarray*}
1352: being an injective graph morphism. Thus it suffices to prove
1353: eq.~(\ref{E:generate}). We observe that for $v\in
1354: \vartheta_{C_{2k}}^{-1}(C_{2k})$
1355: $$
1356: v=(x_{i_1},x_{j_1},\dots, x_{i_k},x_{j_k})
1357: \mapsto
1358: (t_{i_1},t_{j_1},\dots, t_{i_k},t_{j_k}), \ \text{\rm where} \
1359: t_s=
1360: \begin{cases}
1361: (x_{j_{h-1}},x_{i_h},x_{j_{h}}) & \text{\rm for } s=i_h \\
1362: (x_{i_{h}},x_{j_h},x_{i_{h+1}}) & \text{\rm for } s=j_h
1363: \end{cases}
1364: $$
1365: is a bijection, where $h$ is considered modulo $k$.
Hence every $v\in \vartheta_{C_{2k}}^{-1}(C_{2k})$
can be uniquely decomposed into a sequence of triples. Since $v\in
\vartheta_{C_{2k}}^{-1}(C_{2k})$, exactly the following ten triples can occur
1369: $$
1370: V_{D}=\{\textbf{ABA,ABD,BAB,BDB,BDC,DBD,DBA,DCD,CDC,CDB}\}
1371: $$
1372: and setting
1373: $$
1374: E_{D}=
1375: \{\left((x_{j_{h-1}},x_{i_h},x_{j_{h}}),(x_{i_h},x_{j_{h}},x_{i_{h+1}})\right)
1376: \mid (x_{j_{h-1}},x_{i_h},x_{j_{h}})\in V_{D}\}
1377: $$
we obtain the digraph ${D}$. Suppose we are given $v,v'\in
\vartheta_{C_{2k}}^{-1}(C_{2k})$ with $d(v,v')=1$; then exactly one of the
following two alternatives holds {\small
1381: $$
1382: x_{j_{h-1}}=x_{j_h}={\bf B}:
1383: \underbrace{\diagram
1384: & \fbox{{\bf D}}\ar@{-}[r] & {\bf B} \\
1385: {\bf B}\ar@{-}[ur] \ar@{-}[r] & \fbox{{\bf A}}
1386: \ar@/^2pc/@{->}[u]|{}
1387: \ar@/_2pc/@{<-}[u]|{}
1388: \ar@{-}[ur] & \\
1389: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}
1390: \qquad x_{j_{h-1}}=x_{j_h}={\bf D}:
1391: \qquad
1392: \underbrace{\diagram
1393: & \fbox{{\bf C}}\ar@{-}[r] & {\bf D} \\
1394: {\bf D}\ar@{-}[ur] \ar@{-}[r] & \fbox{{\bf B}} \ar@{-}[ur]
1395: \ar@/^2pc/@{->}[u]|{}
1396: \ar@/_2pc/@{<-}[u]|{}
1397: & \\
1398: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}
1399: $$}
1400:
1401:
The idea is now to count all triples $(x_{j_{h-1}},x_{i_h},
x_{j_{h}})$ and $(x_{i_{h-1}},x_{j_{h-1}},x_{i_{h}})$ of a given
$v\in\vartheta^{-1}_{C_{2k}}(C_{2k})$ that are contained in
$\Theta=\{ {\bf BAB}, {\bf BDB}, {\bf DBD}, {\bf DCD}\}$. Let $R[x]$ be a polynomial
ring and $w\colon E_{D}\longrightarrow R[x]$ the function given by
$w(e)=x$ if the arc $e$ has terminus $\tau\in\Theta$ and
$w(e)=1$ otherwise. If $\Gamma=e_{1}e_{2}\dots e_{\ell}$ is a walk of length
$\ell$ in ${D}$, then the weight of $\Gamma$ is defined by
$w(\Gamma)=w(e_1)w(e_2)\dots w(e_\ell)$. Introducing the formal
variable $x$ in $w$ allows us to count the triples in $\Theta$
within some $v\in\vartheta_{C_{2k}}^{-1}(C_{2k})$. The sum of the
weights of all closed walks of length $\ell$ in ${D}$ is $\sum_{v\in
{V_{D}}}{\left[{A_{D}}^\ell\right]}_{v,v} =\text{\sf
Tr}(A_{D}^\ell)$, where
$A_{D}$ is the adjacency matrix of ${D}$ with entries given by $w$; for $x=1$ this is the number of closed walks.\\
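To illustrate the decomposition into triples and the weights, consider for
$k=2$ the sequence $v=({\bf A},{\bf B},{\bf D},{\bf B})$, an example of our
choosing which we assume to lie in $\vartheta_{C_{4}}^{-1}(C_{4})$, as its
triples suggest. Reading the index $h$ modulo $k$, we obtain
$$
t_{i_1}={\bf BAB},\quad t_{j_1}={\bf ABD},\quad
t_{i_2}={\bf BDB},\quad t_{j_2}={\bf DBA},
$$
all of which are contained in $V_{D}$, so that $v$ corresponds to the closed walk
$\Gamma\colon {\bf BAB}\to{\bf ABD}\to{\bf BDB}\to{\bf DBA}\to{\bf BAB}$
of length $4$ in ${D}$. Exactly two of its arcs, those with terminus
${\bf BDB}$ and ${\bf BAB}$, end in $\Theta$, whence $w(\Gamma)=x^{2}$,
in accordance with the two triples of $v$ contained in $\Theta$.\\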
Suppose $B$ is a $p\times p$ matrix and $\{\eta_{i}\}_{i=1}^{p}$ are
its eigenvalues, counted with multiplicity; then we have ${\sf
det}B=\prod_{i}{\eta_{i}}.$ Let $\{\xi_{i}\}_{i=1}^{p}$ and
$\{\omega_{i}\}_{i=1}^{p}$ be the eigenvalues of $I-yA$ and $A$,
respectively; then we have $\xi_{i}=1-y\omega_{i}$ for $1\leq
i\leq p$. Denoting the nonzero eigenvalues of $A$ by
$\{\omega_{i}\}_{i=1}^{r}$, we derive ${\sf det}(I-yA)=
\prod_{i=1}^{r}(1-y\omega_{i})$. We set $Q(y)={\sf det}(I-yA)$ and
have $p=10=\vert V_{D}\vert$, $A=A_{{D}}$ and $r=6$ for $x\neq 1$,
whence
1427: \begin{equation}\label{E:reihe}
1428: \sum_{\ell \ge 1} \text{\sf Tr}(A_{D}^\ell) y^\ell =\sum_{\ell \ge
1429: 1}(\omega_{1}^{\ell}+\dots+\omega_{r}^{\ell})y^{\ell}= \sum_{i=1}^r
1430: \frac{\omega_iy}{1-\omega_iy}=\frac{-y\, Q'(y)}{Q(y)}.
1431: \end{equation}
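The last equality in eq.~(\ref{E:reihe}) can be verified by logarithmic
differentiation; as a short sketch, using only
$Q(y)=\prod_{i=1}^{r}(1-\omega_{i}y)$ from above, we have
$$
\frac{-y\,Q'(y)}{Q(y)}=-y\,\frac{d}{dy}\log Q(y)
=-y\sum_{i=1}^{r}\frac{-\omega_{i}}{1-\omega_{i}y}
=\sum_{i=1}^{r}\frac{\omega_{i}y}{1-\omega_{i}y},
$$
and each summand expands as the geometric series
$\sum_{\ell\ge 1}\omega_{i}^{\ell}y^{\ell}$.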
After some computation we derive
$Q(y)=1-2xy^2-x^2y^2+2x^{3}y^{4}-x^{4}y^{6}+2x^{3}y^{6}-x^{2}y^{6}-x^{2}y^{4}$
and the theorem follows from eq.~(\ref{E:reihe}). $\ \square$
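
As a consistency check on $Q(y)$, one may set $x=1$, in which case every arc
of ${D}$ has weight $1$ and $\text{\sf Tr}(A_{D}^\ell)$ is the number of
closed walks of length $\ell$. The above expression then reduces to
$Q(y)=1-3y^{2}+y^{4}$, and eq.~(\ref{E:reihe}) gives
$$
\frac{-y\,Q'(y)}{Q(y)}=\frac{6y^{2}-4y^{4}}{1-3y^{2}+y^{4}}
=6y^{2}+14y^{4}+36y^{6}+\cdots .
$$
The coefficient $6$ agrees with a direct count: the only closed walks of
length $2$ in ${D}$ run along the three $2$-cycles
${\bf ABA}\leftrightarrow{\bf BAB}$, ${\bf BDB}\leftrightarrow{\bf DBD}$ and
${\bf DCD}\leftrightarrow{\bf CDC}$, each of which is counted once from
either of its two vertices.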
1435:
1436:
1437:
1438: %%%
1439: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1440: %%%
1441: {\bf Acknowledgments.}
1442: %%%
1443: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1444: %%%
1445: We thank F.W.D.~Huang and L.C.~Zuo for helpful suggestions.
1446: This work was supported by the 973 Project, the PCSIRT Project of the
1447: Ministry of Education, the Ministry of Science and Technology, and
1448: the National Science Foundation of China.
1449:
1450:
1451:
1452: \bibliography{cm}
1453: \bibliographystyle{plain}
1454:
1455: %%%
1456: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1457: %%%
1458: %%%
1459: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1460: %%%
1461:
1462: \end{document}
1463:
1464:
1465: