
1:

2:

3: %%%%%%%%%%%%%%%%%%%%%%%% Ams-Style %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

4: %%%

5: %%%                   Style and Inputs

6: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

7:

8: \documentclass[10pt]{amsart}

9: \usepackage{amsfonts}

10: \usepackage{epsfig}

11: \usepackage{latexsym}

12: \usepackage{amsmath,amssymb,amsfonts,amsthm,graphics}

13: \usepackage{eucal}%    caligraphic-euler fonts: \mathcal{ }

14: \usepackage{eufrak}%   frak-euler        fonts: \mathfrak{ }

15: \usepackage[all]{xypic}

16: \usepackage{xspace}

17: %\usepackage{layout}%  displays settings; use \layout in text

18:

19: %%%

20: %%%

21: %%%%%%%%%%%%%%%%%%%%%%%%% Pagestyle %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

22: %%%

23: %%%

24:

25: \renewcommand{\baselinestretch}{1.2}% spacing between lines

26:

27: %\hoffset=0truecm

28: %\voffset=0truecm

29: \textwidth=15truecm

30: \textheight=18truecm

31: \baselineskip=0.8truecm

32: \overfullrule=0pt

33: \parskip=0.8\baselineskip

34: \parindent=0truecm

35: \topmargin=0.5truecm

36: \headsep=1.2truecm

37: %\oddsidemargin=0.5in % options for double-side printouts

38: %\evensidemargin=0in

39:

40: %%%

41: %%%

42: %%%%%%%%%%%%%%%%%%%% New Settings %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

43: %%%

44: %%%

45:

46: \theoremstyle{plain}

47: \newtheorem{theorem}{Theorem}

48: \newtheorem{corollary}{Corollary}

49: \newtheorem*{main}{Main~Theorem}

50: \newtheorem{lemma}{Lemma}

51: \newtheorem{proposition}{Proposition}

52:

53: \theoremstyle{definition}

54: \newtheorem{definition}{Definition}

55:

56: \theoremstyle{remark}

57: \newtheorem{remark}{Remark}

58: \newtheorem*{notation}{Notation}

59: \newtheorem{example}{Example}

60: \numberwithin{equation}{section}

61:

62:

63: \begin{document}

64:

65: %%%

66: %%%

67: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

68: %%

69: %%%

70: \title[Neutral Networks of Sequence to Shape Maps]

71:       {Neutral Networks of Sequence to Shape Maps}

72: \author{Emma Y. Jin, Jing Qin and Christian M. Reidys$^{\,\star}$}

73: \address{Center for Combinatorics, LPMC-TJKLC \\

74:          Nankai University  \\

75:          Tianjin 300071\\

76:          P.R.~China\\

77:          Phone: *86-22-2350-5133-8013\\

78:          Fax:   *86-22-2350-9272}

79: \email{reidys@nankai.edu.cn}

80: \thanks{}

81: \keywords{combinatory map, component, diameter, neutral network, shape,

82:           bipartite}

83: \date{June, 2007}

84: \begin{abstract}

85: In this paper we present a combinatorial model of sequence to shape maps.

86: Our particular construction arises in the context of representing nucleotide

87: interactions beyond Watson-Crick base pairs and its key feature is to

88: replace sterical by combinatorial constraints. We show that these combinatory

89: maps produce exponentially many shapes and induce sets of sequences which

90: contain extended connected sub graphs of diameter $n$, i.e.~we show that

91: exponentially many shapes have neutral networks.

92: \end{abstract}

93: \maketitle

94: {{\small

95: %\tableofcontents

96: }}

97: %%%

98: %%%

99: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

100: %%%

101: %%%

102:

103: \section{Introduction}

104:

105: %%%

106: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

107: %%%

108:

109: \subsection{Background}

110: Arguably one of the greatest challenges in present-day biophysics is the

111: understanding of sequence-structure relations of biopolymers. For one

112: particular class of biopolymers, the ribonucleic acid (RNA) secondary

113: structures (Fig.~\ref{F:1}), molecular folding maps have been systematically

114: analyzed by Schuster~{\it et al.} \cite{Fontana:98,Schuster:94,Schuster:02}.

115: %%%

116: %%%%%%%%%%%%%%%%%%%%%%%% Figures ex1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

117: %%%

118: \begin{figure}[ht]

119: \centerline{%

120: \epsfig{file=F11.eps,width=0.8\textwidth}\hskip15pt }

121: \caption{\small RNA secondary structures. Diagram representation

122: (top): the primary sequence, {\bf GAGAGCCUUUGGACCUCA}, is drawn

123: horizontally and its backbone bonds are ignored. All bonds are drawn

124: in the upper half plane and secondary structures have the property

125: that no two arcs intersect and all arcs have minimum length $2$.

126: Outer planar graph representation (bottom). } \label{F:1}

127: \end{figure}

128: %%%

129: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

130: %%%

131: Folding maps play a central role in understanding the evolution of molecular

132: sequences. Specific properties like, for instance {\it shape space covering}

133: \cite{Schuster:95B} and {\it neutral networks} (Fig.~\ref{F:2})

134: \cite{Reidys:97a} are critical

135: for what may be paraphrased as ``molecular computation by white noise''.

136: For instance, neutral networks played a central role in the {\it Science}

137: publication by E.~A.~Schultes and D.~P.~Bartel, {\it One sequence, two

138: ribozymes: implications for the emergence of new ribozyme folds} (v289, n5478,

139: 448-452) where the authors designed experimentally a single RNA sequence (whose

140: existence is implied by the intersection theorem in \cite{Reidys:97a}) that

141: folds into two different, non-related, RNA secondary structures

142: \cite{Clote:05}.

143: Exhaustive enumeration of sequence spaces and subsequent detailed analysis

144: of the mappings for {\bf G},{\bf C}-sequences of length $30$ were undertaken

145: in \cite{Gruener:95a,Gruener:95b}. In addition, a detailed analysis of

146: neutral networks as well as exhaustive enumeration of

147: {\bf G},{\bf C},{\bf A},{\bf U}-sequences can be found in \cite{Goebel:04}.

148: The findings were intriguing. Folding maps into RNA secondary structures

149: exhibit a collection of distinct properties which makes them ideally suited

150: for evolutionary optimization.\\

151: {\sf (a)} Many structures have preimages of sequences (neutral networks)

152:           which have large components and large diameter.\\

153: {\sf (b)} Many structures have the property that any two of them have neutral

154:           networks that come close in sequence space.\\

155: Obviously, {\sf (a)} is of central importance in the context of

156: neutral evolution. Since replication is error-prone and only a few, if

157: not single, nucleotides are exchanged at a time, the preimages of structures

158: must contain large connected components.

159: %%%%%%%%%%%%%

160: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

161: %%%%%%%%%%%%%%

162: \begin{figure}[ht]

163: \centerline{%

164: \epsfig{file=f1.eps,width=0.95\textwidth}\hskip15pt } \caption{\small

165: The neutral network of a structure. Sequence space (right) and shape

166: space (left) represented as lattices.

167: We draw the edges between two sequences bold if they map into the one

168: particular structure on the left. The two key properties of neutral nets

169: are their connectivity and percolation. They allow sequences to move

170: while maintaining a shape through sequence space.} \label{F:2}

171: \end{figure}

172: %%%%%%%%%%%

173: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

174: %%%%%%%%%%%

175: {\sf (b)} showed that (many) new structures can easily be found

176: during a random walk on a neutral network using only steps in which

177: a single nucleotide is altered (point mutations).

178:

179: Folding maps, however, are not obtained analytically. They are a result of a

180: computer algorithm, based on the combinatorial analysis of RNA

181: secondary structures pioneered by Waterman~{\it et al.}

182: \cite{Waterman:94,Waterman:78a,Waterman:79}. It has to be remarked

183: in this context that comparative sequence analysis \cite{Woese:93,Puglisi:99}

184: provides more reliable means for determining the secondary structure

185: of biological RNA \cite{Doudna:99}, i.e.~folding

186: maps already represent an abstraction.

187: There are two main approaches, with distinct goals, for stepping beyond the

188: secondary structure paradigm: (1) to study more advanced nucleotide interactions

189: in RNA, for instance pseudoknots or base triples, or (2) to consider genuine

190: abstractions of molecular structures that do not aim to model a biophysical

191: folding map. In \cite{Reidys:07rna1} we

192: pursue the first by developing the combinatorics of RNA structures

193: with pseudoknots and in this contribution the second by studying

194: combinatory maps. While (1) eventually produces the mathematical

195: framework enabling us to derive more advanced representations (which

196: eventually result in folding algorithms capable of producing

197: structures like phenylalanine tRNA), (2) provides insights into the

198: core question of which principles produce sequence to structure maps

199: suitable for evolution. A type (2) abstraction inevitably evokes

200: skepticism: what can possibly be gained if no attempt is made

201: to mimic the biological reality? However, we argue that sometimes this

202: is exactly the right strategy to fundamentally understand the object under

203: investigation.

204:

205: \subsection{Structures and correlations}

206: A well studied class of maps over sequence spaces are the

207: NK-landscapes introduced by Kauffman \cite{Kauffman:93}, where each

208: index (locus) of a binary $n$-tuple, viewed as a genotype composed

209: of $n$ loci, is randomly linked to K other indices. The idea is that

210: a locus $i$ makes a contribution to the total fitness of the

211: genotype which depends on the value of the allele ($0$ or $1$) at

212: $i$ and the values at each of the epistatically linked loci. To each

213: of those $2^{{\rm K}+1}$ combinations there is a value (fitness)

214: assigned uniformly at random. The apparent lack of neutrality led

215: Barnett \cite{Barnett:98} to refine NK-landscapes by NKp-landscapes,

216: introducing a probability $p$ with which an arbitrarily chosen

217: allelic combination makes no contribution to the fitness. Our

218: approach is connected to Kauffman's intuition in that we consider a

219: molecular structure as a combinatorial representation of

220: nucleotide-correlations. As for nucleotide-correlations,

221: observations {\sf (a)} and {\sf (b)} are not bound to the particular

222: concept of RNA secondary structures. For instance Stadler {\it et al.}

223: \cite{SchusterStadler:99}

224: as well as Bastolla {\it et al.} have shown \cite{Porto:03} that neutral

225: networks exist for proteins, where the interactions are much more

226: involved \cite{Reidys:00p1}.

227: Therefore it is certainly not the uniqueness of Watson-Crick base pairings

228: that implies the existence of neutral networks. Our particular approach

229: comes from this correlation perspective and observations from molecular

230: interaction in RNA molecules.

231: %%%

232: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

233: %%%

234: \begin{figure}[ht]

235: \centerline{%

236: \epsfig{file=f2.eps,width=0.8\textwidth}\hskip15pt } \caption{\small

237: Beyond secondary structures. Suppose we are given an abstract alphabet

238: $\{{\bf A},{\bf B},{\bf C},{\bf D}\}$ with base pairs

239: $\{\{{\bf A},{\bf B}\},\{{\bf D},{\bf C}\},\{{\bf D},{\bf B}\}\}$.

240: We present diagram representations of a secondary structure (top),

241: $3$-noncrossing structure (middle) and a $2$-diagram structure (bottom).

242: The difference between the first two structures is the crossing of bonds

243: and the difference between the second two is the number of interactions

244: for a nucleotide.}

245: \label{F:3}

246: \end{figure}

247: %%%

248: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

249: %%%

250: First there are secondary and tertiary interactions \cite{Doudna:99},

251: the latter typically involving secondary structural elements.

252: Furthermore interaction within RNA molecules can be categorized into

253: three classes, helix-helix interaction, loop/bulge-helix and

254: loop-loop interaction \cite{Westhof:92a,Doudna:99}. The structure of

255: phenylalanine tRNA, and the hammerhead ribozyme \cite{Wedekind:98}

256: have served as paradigms in this context. Base

257: triples and tetra-loops, as well as pseudoknots,

258: \cite{Westhof:92a,Konings:95a,Chamorro:91a,Science:05a}

259: representing loop-loop interactions have led to generalizations of

260: the secondary structure concept. These interactions are subject to

261: steric constraints arising from the biochemistry of the interactions

262: involved.

263: These observations give rise to two different combinatorial abstractions:

264: the consideration of $k$-noncrossing chemical bonds and of

265: $2$-diagrams, i.e.~graphs whose vertices are drawn on a horizontal

266: line and have degree at most two (and the combination of them,

267: $k$-noncrossing $2$-diagrams). The notion of $k$-noncrossing arises

268: naturally in the context of pseudoknots leading to the concepts of

269: $k$-noncrossing RNA structures \cite{Reidys:07rna1} and to Stadler's

270: bi-secondary structures \cite{Stadler:99} (which are exactly the

271: planar $3$-noncrossing RNA structures). The notion of $2$-diagrams

272: comes up when restricting nucleotide interactions to at most two and

273: therefore allowing the expression of interactions of secondary

274: structure elements.

275:

276: %%%

277: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

278: %%%

279: \section{The Basic Construction}

280: %%%

281: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

282: %%%

283: The notion of $2$-diagrams discussed in the introduction is exactly the

284: motivation of our

285: particular approach. In the following we detail how to derive molecular

286: shapes in which each nucleotide has at most two interactions but which, in

287: contrast to biophysical structures, have combinatorial constraints

288: on their nucleotide interactions.

289: This idea is, to the best of our knowledge, new.

290: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

291: For a given alphabet, base pairing rules specify which nucleotides

292: can pair. However, not any two nucleotides are able to establish a

293: bond. For instance, they may be restricted by conditions like no two

294: edges can cross each other when representing a shape as a diagram

295: \cite{Stadler:99}. The non-crossing condition and uniqueness of base

296: pairs are two key properties of RNA secondary structures and allow

297: for Motzkin-path enumeration and tree bijections

298: \cite{Waterman:94,Waterman:79,Zuker:79,Waterman:78a,Schuster:98}. We

299: replace these restrictions on nucleotide interactions by stipulating

300: that {\sf (a)} there exists some base graph $H$ whose sole purpose

301: is to restrict all possible correlations and {\sf (b)} we are given

302: a symmetric relation $\mathcal{R}$, tantamount to a base pairing

303: rule. In order to avoid any confusion we work over the abstract

304: alphabet $\{{\bf A},{\bf B},{\bf C},{\bf D}\}$.

305:

306: %%%

307: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

308: %%%

309: \begin{figure}[ht]

310: \centerline{%

311: \epsfig{file=f3.eps,width=1.0\textwidth}\hskip20pt

312: }

313: \caption{\small Combinatory maps: the base graph $\mathcal{H}$ is displayed

314: on the l.h.s. The r.h.s.~shows two shapes $\mathcal{S}_1$ and

315: $\mathcal{S}_2$ with two particular sequences that are contained in

316: their respective preimages. For both sequences the shapes are

317: maximal, i.e.~not a single $\mathcal{H}$-edge can be drawn without

318: violating base pairing rules, here $\{\{{\bf A},{\bf B}\}$, $\{{\bf D},

319: {\bf C}\}$, $\{{\bf D},{\bf B}\}\}$.}

320: \label{fig:base}

321: \end{figure}

322: %%%

323: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

324: %%%

325: In this framework a shape $\mathcal{S}$ of a sequence is then the unique

326: maximal $H$-subgraph subject to the property that for any

327: $\mathcal{S}$-edge the incident nucleotides satisfy $\mathcal{R}$.

328: It is remarkable that this simple definition already produces a well

329: defined sequence to structure map! Moreover this definition is in line

330: with the biological point of view: mapping sequences into shapes

331: rather than fixing some shape and then considering its sequences.

332: One may now ask what the right choice of $H$ should be and how robust

333: the respective conclusions are. As for the dependency on $H$, the answer is that

334: the results a.s.~(almost surely in the sense of random graph theory, i.e.~in the

335: limit of long sequences) depend only on the number of edges.

336: Therefore, the choice of $H=\mathcal{H}$ is not critical for the validity of

337: the main results. To understand why, we consider a generalization of the

338: concept of combinatory maps, i.e.~combinatory maps induced by the random graph

339: $G_{n,p}$ (the random graph in which each edge is selected with independent

340: probability $p$). In the sub-critical phase these random combinatory maps

341: a.s.~produce, modulo constants, all properties of the maps induced by

342: $\mathcal{H}$ (Theorem~\ref{T:nn}).

343:

344: %%%

345: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

346: %%%

347: {\bf Theorem}\cite{Reidys:07priv} {\bf (Neutral networks)} {\it

348: Let $p_n=\frac{1-\epsilon}{n}$, $\beta<\sqrt{2}$ and

349: suppose $\omega_n$ tends to $\infty$

350: arbitrarily slowly and $\vartheta_{G_{n,p}}$ is a random combinatory map.

351: Then there exist with high probability at least $\beta^n$ shapes $\mathcal{S}$

352: with the following two properties:\\

353: {\sf (I)} the set of all sequences mapping into $\mathcal{S}$ has a connected

354:           component of size at least $\left(\sqrt{2}\right)^{n}$\\

355: {\sf (II)} the set of all sequences mapping into $\mathcal{S}$ percolates, i.e.~it

356:            has diameter $n-\omega_n$.\\

357: }

358: %%%

359: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

360: %%%

361:

362: The great advantage of choosing $H=\mathcal{H}$ is the simplicity and

363: algorithmic nature of all proofs. We can explicitly construct all paths

364: involved by diagram chasing. In contrast, the proof of the above

365: result is based on a non trivial analysis of tree components in the

366: random graph $G_{n,p}$.

367: We have

368: the following situation: let $H$ be a graph over $\{1,2,\ldots,n\}$,

369: $\mathcal{A}= \{{\bf A},{\bf B},{\bf D}, {\bf C}\}$ and $Q_4^n$ be

370: the generalized $n$-cube, i.e.~the graph over the sequences

371: $v=(x_1,\dots,x_n)$, where $x_i\in\mathcal{A}$, in which two

372: sequences are adjacent if they differ in exactly one nucleotide. Let

373: $d(v,v')$ be the number of nucleotides in which $v$ and $v'$ differ.

374: A component of a graph $H$ is a maximal connected subgraph. We

375: consider relations $\mathcal{R}$ over the abstract alphabet

376: $\mathcal{A}=\{{\bf A},{\bf B},{\bf D}, {\bf C}\}$,

377: i.e.~$\mathcal{R}\subset \mathcal{A}\times \mathcal{A}$ satisfying

378: the following three conditions

379: \begin{eqnarray}

380: \label{E:1} (x,y)\in \mathcal{R}  & \Leftrightarrow &  (y,x)\in \mathcal{R} \\

381: \label{E:2} (x,y)\in \mathcal{R}  & \Rightarrow &\ x\neq y        \\

382: \label{E:3} \forall\, x\neq z\quad (x,y)\in

383: \mathcal{R}\,\wedge\,(y,z)\in \mathcal{R} & \Rightarrow &

384: (x,z)\not\in \mathcal{R} \ .

385: \end{eqnarray}

386: These conditions are motivated by abstracting from $2$-D and $3$-D

387: interactions of the phenylalanine tRNA and the hammerhead ribozyme

388: \cite{Doudna:99}. In both molecules mutual interactions of three nucleotides

389: are absent but multiple pair interactions are responsible for the tertiary

390: structure. In view of eq.~(\ref{E:1}) and eq.~(\ref{E:2}) each relation can be

391: viewed as a graph over $\{{\bf A},{\bf B},{\bf D},{\bf C}\}$. Since a graph over

392: four vertices has no odd cycles other than triangles, eq.~(\ref{E:3}) is equivalent to this graph being

393: bipartite\footnote{For instance, it is easy to check that the relation

394: implied by all Watson-Crick base pairs (i.e.~\{({\bf A},{\bf U}),({\bf

395: U},{\bf A}),({\bf G},{\bf C}),({\bf C},{\bf G})\}) and \{({\bf G},{\bf

396: U}),({\bf U},{\bf G})\}, satisfies

397: conditions eq.~(\ref{E:1}), eq.~(\ref{E:2}) and eq.~(\ref{E:3}).}.

398: We will be particularly interested in the base pairing rule

399: $\mathcal{R}^\dagger$ represented as the graph $\diagram {\bf A}\rline

400: &{\bf B} \rline & {\bf D}\rline & {\bf C}

401: \enddiagram$ i.e.~we allow for the following interactions:

402: $\{\{{\bf A},{\bf B}\},\{{\bf D},{\bf C}\},\{{\bf D},{\bf B}\}\}$.

403: In this sense our nucleotide interactions are more general than those

404: of RNA secondary structures since, for instance, we can express coaxial

405: stacking of helical regions and the formation of isosteric

406: ${\bf C}\cdot {\bf G}-{\bf G}$ triples \cite{Doudna:99}.
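As a quick check, one can verify directly that $\mathcal{R}^\dagger$ meets conditions eq.~(\ref{E:1}), eq.~(\ref{E:2}) and eq.~(\ref{E:3}). The following display is only a worked illustration of the definitions above. Written as a set of ordered pairs,
\begin{equation*}
\mathcal{R}^\dagger=\{({\bf A},{\bf B}),({\bf B},{\bf A}),({\bf B},{\bf D}),
({\bf D},{\bf B}),({\bf D},{\bf C}),({\bf C},{\bf D})\} ,
\end{equation*}
so symmetry and irreflexivity are immediate. For eq.~(\ref{E:3}), the only pairs $x\neq z$ with a common partner $y$ are $\{{\bf A},{\bf D}\}$ (via $y={\bf B}$) and $\{{\bf B},{\bf C}\}$ (via $y={\bf D}$), and neither $({\bf A},{\bf D})$ nor $({\bf B},{\bf C})$ belongs to $\mathcal{R}^\dagger$. Equivalently, the graph of $\mathcal{R}^\dagger$ is bipartite with classes $\{{\bf A},{\bf D}\}$ and $\{{\bf B},{\bf C}\}$.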

407: We introduce the $H$-subgraph $H_{\mathcal{R}}(v)$ having vertex and edge set

408: given by

409: \begin{equation}

410: V_{H_{\mathcal{R}}(v)}=\{1,\dots,n\}, \quad \text{\rm and}\quad

411: E_{H_{\mathcal{R}}(v)}=\{\{i,k\} \mid \{i,k\}\,\text{\rm is an

412: $H$-edge and}\,

413:           (x_i,x_k)\in \mathcal{R}\}

414: \end{equation}

415: and call $H_{\mathcal{R}}(v)$ a shape $\mathcal{S}$ and the mapping

416: $

417: \vartheta_{H}:Q_4^n \longrightarrow \{\mathcal{S}\mid

418: \mathcal{S}=H_{\mathcal{R}}(v)\}

419: $

420: a combinatory map. Note that the above construction entails an implicit

421: notion of maximality, i.e.~a shape of a sequence $(x_1,\dots,x_n)$ is the

422: maximal $H$-subgraph which satisfies $\mathcal{R}^\dagger$ for all

423: $2$-sets of coordinates $\{x_i,x_j\}$, $\{i,j\}$ being an $H$-edge.

424: In this sense a shape represents a saturated structure.
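To illustrate the construction, consider the following small example, which is chosen here only for illustration and is not one of the base graphs used later. Let $n=4$, let $H$ be the cycle with edges $\{1,2\},\{2,3\},\{3,4\},\{4,1\}$ and take $\mathcal{R}=\mathcal{R}^\dagger$. For $v=({\bf A},{\bf B},{\bf D},{\bf A})$ we find
\begin{equation*}
(x_1,x_2)=({\bf A},{\bf B})\in\mathcal{R}^\dagger,\quad
(x_2,x_3)=({\bf B},{\bf D})\in\mathcal{R}^\dagger,\quad
(x_3,x_4)=({\bf D},{\bf A})\not\in\mathcal{R}^\dagger,\quad
(x_4,x_1)=({\bf A},{\bf A})\not\in\mathcal{R}^\dagger ,
\end{equation*}
so that $E_{H_{\mathcal{R}^\dagger}(v)}=\{\{1,2\},\{2,3\}\}$. The sequence $v'=({\bf A},{\bf B},{\bf D},{\bf D})$, which differs from $v$ in exactly one nucleotide, yields the same edge set, since neither $({\bf D},{\bf D})$ nor $({\bf D},{\bf A})$ lies in $\mathcal{R}^\dagger$. Hence $v$ and $v'$ are adjacent in $Q_4^4$ and are mapped to the same shape.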

425: As for $\mathcal{H}$, suppose first $n$ is even. We set

426: $C_{n}(1)$ to be the graph over $\{1,\dots,n\}$ with edges

427: $\{i,i+1\}$, where the vertices are labeled modulo $n$.

428: Let $\sigma_n$ be some permutation of $n$ letters; we then let

429: $C_{n}(\sigma_{n})$ be the graph with edges $\{\sigma_n(i),\sigma_n(i+1)\}$

430: and $\mathcal{H}=C_{n}(\sigma_n)$. Next assume $n$ is odd. Then we

431: select an arbitrary element of $\{1,\dots,n\}$, say $u$ and define

432: $\mathcal{H}=C_{n-1}(\sigma_{n-1})\cup \{u\}$ i.e.~the graph

433: with edges $\{\sigma_{n-1}(i),\sigma_{n-1}(i+1)\}$ for $i\neq u$ and

434: $i+1\neq u$, where $\sigma_{n-1}$ is an arbitrary permutation of

435: $\{1,\dots,n\}\setminus \{u\}$. To summarize we have

436: \begin{equation}\label{E:H}

437: \mathcal{H}=

438: \begin{cases}

439: C_n(\sigma_n)                   & \ \text{\rm for $n$ even}\\

440: C_{n-1}(\sigma_{n-1})\cup \{u\} &\  \text{\rm for $n$ odd} \ .

441: \end{cases}

442: \end{equation}
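For concreteness, here are two small instances of eq.~(\ref{E:H}), with the permutation taken to be the identity purely for illustration:
\begin{align*}
n=6:&\quad \mathcal{H}=C_6({\rm id}),\quad
E_{\mathcal{H}}=\{\{1,2\},\{2,3\},\{3,4\},\{4,5\},\{5,6\},\{6,1\}\},\\
n=5,\ u=5:&\quad \mathcal{H}=C_4({\rm id})\cup\{5\},\quad
E_{\mathcal{H}}=\{\{1,2\},\{2,3\},\{3,4\},\{4,1\}\} .
\end{align*}
In the second case the vertex $5$ is isolated. In both cases the cycle has even length, so $\mathcal{H}$ is bipartite.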

443: %%%

444: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

445: %%%

446:

447:

448: \section{Shapes}

449:

450: %%%

451: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

452: %%%

453: In this section we answer the following basic questions: \\

454: {\sf (1)} What is the relation between base pairing rules and the

455: resulting molecular shapes? \\

456: {\sf (2)} How many shapes does a combinatory map have?\\

457: {\sf (3)} Are there ``many'' shapes with large sets of sequences folding

458:            into them?\\

459: All of the above questions are central for RNA secondary structures, yet

460: none of them can be answered analytically, despite the fact that we have

461: generating functions for RNA secondary structures.

462: For instance, it is impossible to

463: assess {\it a priori} how many secondary structures have an actual sequence

464: folding into them. The number of RNA structures that actually occur as minimum

465: free energy structures can be much smaller than the total number. For $n=16$,

466: due to finite size effects for the RNA folding, only $63\%$ of the possible

467: RNA structures are realized as minimum free energy structures

468: \cite{Goebel:04}.

469:

470: Let us begin by providing some more background: a graph $H'$ is

471: called an induced subgraph of $H$ iff there exists some set

472: $M\subset \{1,\dots,n\}$ such that $E_{H'}=\{\{i,j\}\mid \{i,j\}\in

473: E_H\,\wedge i,j\in M\}$. Intuitively, induced subgraphs come from

474: vertex sets and are far more restricted than arbitrary subgraphs. We

475: now give a simple example of the fact that not every bipartite

476: subgraph of a shape is a shape. For instance, consider

477: $\vartheta_H:Q_4^6\longrightarrow \{H'<H\}$ where

478: \begin{equation}

479: H=\diagram

480:         {\bf 1}  \ar@{-}[r] \ar@{-}[d]

481: &   \ar@{-}[r]  {\bf 4}    &  {\bf 5}     \\

482:         {\bf 2}  \ar@{-}[r]

483: & \ar@{-}[u] {\bf 3} \ar@{-}[r]       &  {\bf 6}\ar@{-}[u]   \\

484: \enddiagram

485: \quad\text{\rm and}\quad

486: H_0=\diagram

487:         {\bf 1}  \ar@{.}[r] \ar@{-}[d]

488: &   \ar@{-}[r]  {\bf 4}    &  {\bf 5}     \\

489:         {\bf 2}  \ar@{-}[r]

490: & \ar@{-}[u] {\bf 3} \ar@{.}[r]       &  {\bf 6}\ar@{-}[u]   \\

491: \enddiagram

492: \end{equation}

493: where the dotted lines represent missing edges. Clearly, $H$ is bipartite

494: and it is easy to check that indeed $H=H_{\mathcal{R}^\dagger}({\bf D},{\bf C},{\bf D},{\bf C},

495: {\bf D},{\bf C})$ holds. Therefore $H$ is a shape, but $H_0$ is not:

496: every sequence realizing $H_0$ has necessarily either

497: {\bf A} at {\bf 1} and {\bf C} at {\bf 4}, or vice versa. In the first case

498: {\bf D} is necessarily at {\bf 3} and {\bf 5}, which leaves no valid choice

499: for {\bf 6}. The second case follows analogously.
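For completeness, the second case can be written out as follows, with $x_i$ denoting the letter at position ${\bf i}$ and using only the edges of $H_0$ and the pairs in $\mathcal{R}^\dagger$. If $x_1={\bf C}$ and $x_4={\bf A}$, then
\begin{align*}
\{1,2\}\in E_{H_0} &\ \Rightarrow\ x_2={\bf D},\\
\{2,3\},\{3,4\}\in E_{H_0} &\ \Rightarrow\ x_3={\bf B},\\
\{4,5\}\in E_{H_0} &\ \Rightarrow\ x_5={\bf B},\\
\{5,6\}\in E_{H_0} &\ \Rightarrow\ x_6\in\{{\bf A},{\bf D}\} .
\end{align*}
Both ${\bf A}$ and ${\bf D}$ pair with $x_3={\bf B}$, which would force the edge $\{3,6\}$. Since this edge is missing in $H_0$, no valid choice for {\bf 6} remains.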

500:

501: This is remarkable insofar as making the universal graph $H$

502: (being responsible for all interactions) more complex can simply

503: imply that not all of its subgraphs can be folded by sequences. This

504: is due, as the example indicates, to the nature of the base pairing

505: rule and shows clearly that both $H$ and $\mathcal{R}$ determine what

506: is a shape and what is not.

507: For simple base graphs, like for instance $\mathcal{H}$, the lemma below

508: shows that {\it any} subgraph (eq.~(\ref{E:H})) is a shape.

509: What we can deduce from this is (a) there exist many shapes and (b)

510: $\mathcal{H}$ is so simple that it is indeed only $\mathcal{R}^\dagger$

511: that is relevant for the shapes.

512: The result is

513:

514: %%%

515: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

516: %%%

517: \begin{lemma}\label{L:bip}

518: Suppose $H$ is an arbitrary combinatorial graph over

519: $\{1,\dots,n\}$. \\

520: {\sf (a)} For any relation $\mathcal{R}$ any shape $\mathcal{S}$ is

521: bipartite.\\

522: {\sf (b)} For the relation $\mathcal{R}^\dagger$ and arbitrary base

523: graph $H$, any induced,

524:           bipartite subgraph of $H$ is a shape.\\

525: {\sf (c)} For the relation $\mathcal{R}^\dagger$ and the base graph

526: $\mathcal{H}$

527:           any $\mathcal{H}$-subgraph $H'$ is a shape.

528: \end{lemma}

529: %%%

530: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

531: %%%

532:

533: Since any $\mathcal{H}$-subgraph is a shape, we have, for instance, for sequences

534: of length $16$ exactly $2^{16}=65536$ different shapes, in contrast to only

535: $274$ RNA secondary structures realized by the minimum free energy folding

536: analyzed in \cite{Goebel:04}. This seems to indicate a vast difference between

537: combinatory maps and RNA secondary structure folding. However, closer

538: inspection reveals that in fact most of these structures are very ``rare'',

539: i.e.~only a few have large preimage sizes.

540: To understand what is happening we present in Figure~\ref{fig:pre} the data

541: on the complete mapping from sequences of length $16$ into subgraphs of the

542: cycle $\mathcal{H}_{16}$.

543: We plot the logarithm of the preimage sizes of a combinatory map over the

544: logarithm of the rank. We can deduce from Figure~\ref{fig:pre} that there

545: are $393$ shapes with a preimage of size greater than $0.5\times 10^6$. The

546: data on RNA secondary structures in \cite{Goebel:04} show that there are

547: $132$ RNA minimum free energy structures with this property.

548: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

549: \begin{figure}[ht]

550: \centerline{%

551: \epsfig{file=fig4_cm.eps,width=0.75\textwidth}\hskip20pt

552: }

553: \caption{\small A double logarithmic plot (base $10$) of the preimage

554: sizes of a combinatory map for $n=16$ as a function of the rank. The

555: underlying graph $\mathcal{H}_{16}$ is displayed in the lower right. The plot

556: shows that there are a few shapes with large and many shapes with very small

557: preimages. This observation is in complete analogy with RNA secondary structure

558: folding maps.

559: }

560: \label{fig:pre}

561: \end{figure}

562: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

566: But what happens for larger

567: sequence length? The asymptotics of RNA secondary

568: structures \cite{Schuster:98,Reidys:07rna2} shows that the number of RNA

569: secondary structures, ${\sf S}_2(n)$, satisfies ${\sf S}_2(n)\sim \kappa\,

570: n^{-\frac{3}{2}}\alpha^n$ where $1.8488\le \alpha \le 2.64$, depending on

571: what one considers a ``realistic'' secondary structure. In comparison a

572: combinatory map produces (Lemma~\ref{L:bip}) $2^n$ shapes.

573: Therefore combinatory maps produce a total number of structures

574: which is, for large $n$, in a comparable size-range.
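The comparison can be made explicit. The following ratio is a direct consequence of the two counts quoted above and is included only as an illustration:
\begin{equation*}
\frac{{\sf S}_2(n)}{2^n}\sim \kappa\, n^{-\frac{3}{2}}
\left(\frac{\alpha}{2}\right)^n,
\qquad 0.9244\le \frac{\alpha}{2}\le 1.32 .
\end{equation*}
The two exponential growth rates thus differ by a factor between roughly $0.92$ and $1.32$ per nucleotide, and which of the two counts is eventually larger depends on whether $\alpha$ exceeds $2$.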

575:

576: The above observations motivate the question about the number of

577: shapes with large preimages \cite{Stroh:01}. For notational convenience let

578: \begin{equation}\label{number:1}

579: \mu_+=\left(\frac{1+\sqrt{5}}{2} \right)\qquad \text{\rm and}\qquad

580: \mu_-= \left(\frac{1-\sqrt{5}}{2} \right) \ .

581: \end{equation}

582: We next prove that there are many shapes with large preimages.

583: %%%

584: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

585: %%%

586: \begin{lemma}\label{L:many}

587: Suppose the relation $\mathcal{R}^\dagger$ and the base graph

588: $\mathcal{H}$ are given, then there exist at least

589: $\left(\sqrt{2}\right)^{n-1}$

590: shapes with the property that there are at least $2(\mu_+^n+\mu_-^n)$ sequences

591: folding into them.

592: \end{lemma}
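To get a feeling for the magnitudes in Lemma~\ref{L:many}, note that $\mu_+$ and $\mu_-$ are the two roots of $x^2=x+1$. Hence $\ell_n=\mu_+^n+\mu_-^n$ satisfies $\ell_n=\ell_{n-1}+\ell_{n-2}$ with $\ell_0=2$ and $\ell_1=1$, i.e.~$\ell_n$ is the $n$-th Lucas number (the symbol $\ell_n$ is introduced here only for this illustration). For $n=16$ this gives
\begin{equation*}
\ell_{16}=2207,\qquad 2\left(\mu_+^{16}+\mu_-^{16}\right)=4414,\qquad
\left(\sqrt{2}\right)^{15}=128\sqrt{2}\approx 181.02 ,
\end{equation*}
so the lemma guarantees at least $182$ shapes, each with at least $4414$ sequences folding into it, out of $4^{16}\approx 4.29\times 10^{9}$ sequences in total.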

593: %%%

594: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

595: %%%

596: Lemma~\ref{L:many} sets the stage for the further investigation of how this set

597: of sequences is organized. Now, knowing that there are exponentially large

598: sets of sequences realizing particular shapes, what can be said about their

599: organization? Are they randomly distributed or clustered in sequence

600: space? What are their graph structures as induced subgraphs of sequence space?

601:

602:

603: %%%

604: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

605: %%%

606:

607: \section{Neutral networks of Combinatory Maps}

608:

609: %%%

610: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

611: %%%

612:

613: One difficulty in the context of neutral networks is that it is practically

614: impossible to prove they exist. Exhaustive enumeration of sequence spaces is

615: limited to small sequence length $n\le 20$ for four letter alphabets

616: \cite{Gruener:95a} and the results are of limited value since finite size

617: effects distort the picture. In case of

618: ${\bf A},{\bf U},{\bf G},{\bf C}$-sequences about $60$\% of all sequences fold

619: into the open structure \cite{Goebel:04}. Several attempts have been made to

derive somewhat local criteria for whether neutral networks exist \cite{Forst:00},

621: where the key idea is the probing for paths adopted from the actual random

622: graph proof in \cite{Reidys:97a,Reidys:02p2}.

623: In this context local parameters are the only quantities that give some clue

624: about the existence and properties of neutral networks.

625: In case of neutral networks modeled as random graphs, it is the number of

626: neutral neighbors that controls global properties like connectivity

627: and density of the corresponding neutral network. A neutral neighbor is a

628: neighboring sequence which folds into the same structure and the fraction

629: \cite{Reidys:97p}

630: \begin{equation}

631: \lambda^* = 1-\sqrt[\alpha-1]{\alpha^{-1}}

632: \end{equation}

633: is actually the threshold value for connectivity and density. In the following

634: we can derive for combinatory maps the entire distribution of neutral neighbors

635: of particular shapes. The result is actually not ``local'' at all and entails

636: detailed information about the {\it entire} preimage of these shapes. To be

637: precise we can actually derive the underlying rational generating function

638: using the transfer matrix method of enumerative combinatorics.

We study the quantity $\lambda_{\mathcal{S}_M}(m)$, the number of
sequences folding into the particular shape $\mathcal{S}_M$ that have exactly $m$
neutral neighbors. Our result reads

642: %%%

643: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

644: %%%

645: \begin{theorem}\label{T:NN}

For an arbitrary shape $\mathcal{S}_M$, where $M\subset \{1,\dots,k\}$ denotes its

647: set of isolated nucleotides, we have

648: \begin{equation}\label{E:ist}

649: \forall\, m\in \mathbb{N}\colon \lambda_{\mathcal{S}_M}(m) \ge

650: \lambda_{C_{2k}}(m)

651: \end{equation}

652: and the generating function of $\lambda_{C_{2k}}(m)$, $F(x,y)=\sum_{k\ge 2}

653: \sum_{m}\lambda_{C_{2k}}(m)x^{m}y^{2k}$ is given by

654: \begin{equation}\label{E:generate}

655: F(x,y)= \frac{2(-4x^{3}y^{6}+2x^{2}y^{6}+3x^{2}y^{4}-5+4x^{2}y^{2}+8xy^{2}-

656: 6x^{3}y^{4}+2x^{4}y^{6})}{-2x^{3}y^{6}+x^{2}y^{6}+x^{2}y^{4}-1+2xy^{2}+

657: x^{2}y^{2}-2x^{3}y^{4}+x^{4}y^{6}}.

658: \end{equation}

659: \end{theorem}

660: %%%

661: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

662: %%%

The bivariate function $F(x,y)$ provides detailed information about
the neutral neighbors of the entire preimages of shapes

665: $\mathcal{S}_M$. For instance, Taylor expansion of

666: eq.~(\ref{E:generate}) yields

667: $$

668: F(x,y)=10+(2x^2+4x)y^2+(12x^2+2x^4)y^4+(6x^2+16

669: x^3+12x^4+2x^6)y^6+O(y^{8})

670: $$

671: and the term $(12x^2+2x^4)y^4$ shows that for $n=4$ there are at least $12$

672: vertices with $2$ and $2$ vertices with $4$ neutral neighbors. Likewise, for

673: $n=6$, there are at least $6$ with $2$, $16$ with $3$, $12$ with $4$ and $2$

674: vertices with $6$ neutral neighbors. In addition eq.~(\ref{E:ist}) guarantees

675: that $\mathcal{H}$ itself provides a lower bound on the numbers of neutral

676: neighbors. I.e.~we can pinpoint a specific reference shape providing key

677: information about the neutrality of the entire combinatory map.
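
The expansion can be cross-checked against eq.~(\ref{E:precise}) (a
consistency check added here for illustration). Setting $x=1$ in the
coefficient of $y^{2k}$, written $[y^{2k}]F(1,y)$, sums over all numbers of
neutral neighbors and must therefore give the total number
$2(\mu_+^{2k}+\mu_-^{2k})$ of sequences folding into $C_{2k}$. Indeed,
\begin{align*}
[y^{2}]F(1,y) &= 2+4 = 6 = 2\left(\mu_+^{2}+\mu_-^{2}\right), \\
[y^{4}]F(1,y) &= 12+2 = 14 = 2\left(\mu_+^{4}+\mu_-^{4}\right), \\
[y^{6}]F(1,y) &= 6+16+12+2 = 36 = 2\left(\mu_+^{6}+\mu_-^{6}\right),
\end{align*}
since $\mu_+^{2}+\mu_-^{2}=3$, $\mu_+^{4}+\mu_-^{4}=7$ and
$\mu_+^{6}+\mu_-^{6}=18$.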

678: %%

679: %%%

680: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

681: %%%

682: \begin{figure}[ht]

683: \centerline{

684: \epsfig{file=fig3_cm.eps,width=0.75\textwidth}\hskip20pt

685: }

686: \caption{\small The distribution of neutral neighbors for the entire

687: preimage of the ``reference'' shape $\mathcal{S}=\mathcal{H}_{40}$,

688: where $n=40$ denotes the sequence length.

We plot the frequency (y-axis) of numbers

690: of neutral neighbors (x-axis) obtained from Theorem~\ref{T:NN}.

691: Note that the degree of a vertex in $Q_4^{40}$ is $120$, showing that

692: the lower bounds on the fractions of neutral neighbors range between

$13$\% and $24$\%.}

694: \label{fig:$C_{40}$}

695: \end{figure}

696: %%%

697: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

698: %%%

699:

700: In the previous section we have shown that there are many shapes

701: with large preimages. However, it is not obvious what the graph

702: structure of these preimages is. In this section we will study this

703: structure in detail and prove two remarkable properties. First there

are many shapes with sets of sequences having diameter $n$,
i.e.~there exist two sequences which differ in {\it all} nucleotides,
both of which map into the particular shape. This finding
is tantamount to percolation and indicates that the preimages are indeed
extended and not confined to some ``local'' region of sequence

709: space. Secondly we prove that the preimages of exponentially many shapes

710: contain large connected components. In other words we can

711: actually prove the existence of neutral networks for sequence to

712: shape maps, i.e.~many shapes have sets of sequences in which there exists

713: a component of size $\ge \left(\sqrt{2}\right)^{n}$ and of diameter $n$.

714:

715: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

716: %%%%%%%%%%%%%%

717: \begin{figure}[ht]

718: \centerline{%

719: \epsfig{file=f4.eps,width=0.40\textwidth}\hskip15pt } \caption{\small

Neutral network. Sequence space is represented as a lattice and the neutral net

721: is an induced subgraph (bold edges). We label the pairs of sequences

722: representing antipodal pairs by $({\sf A},{\sf B})$ and $({\sf C},{\sf D})$.

723: The two key properties of neutral nets are their connectivity and

724: percolation.} \label{F:2}

725: \end{figure}

726: %%%%%%%%%%%

727: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

728: %%%

729: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

730: %%%

731: \begin{theorem}\label{T:nn}{\bf (Neutral networks)}

732: Suppose the relation $\mathcal{R}^\dagger$ and the base graph $\mathcal{H}$ are

733: given. Then there exist at least $\left(\sqrt{2}\right)^{n-1}$ many shapes

734: $\mathcal{S}$ with the properties\\

735: {\sf (I)} the set of all sequences mapping into $\mathcal{S}$ has a connected

736:           component of size at least $\mu_+^n+\mu_-^n$.\\

{\sf (II)} the set of all sequences mapping into $\mathcal{S}$ percolates, i.e.~it
           has diameter $n$.

739: \end{theorem}

740: %%%

741: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

742: %%%

743: In comparison with the corresponding result for random graphs we

744: observe that the neutral networks are in fact slightly bigger and the

diameter indeed {\it equals} $n$. This results from the fact that the
simpler graph $\mathcal{H}$ allows for a different proof, which is very
algorithmic. In fact the proof indicates how to
explicitly obtain these paths of length $n$, while the random
graph analogue can only establish their existence. In this sense both

750: constructions complement each other. To illustrate the idea of

751: Theorem~\ref{T:nn} we consider the cycle $\mathcal{H}_4$ and the

752: shape $\mathcal{S}=\mathcal{H}_4$. Then we have the following situation (using

753: the notation of the proof of Theorem~\ref{T:nn})

754: $$

755: a^\varnothing=({\bf C},{\bf D},{\bf C},{\bf D}) \quad \text{\rm and }

756: \quad C_4(({\bf C},{\bf D},{\bf C},{\bf D}))=C_4 \ .

757: $$

758: Theorem~\ref{T:nn} guarantees the existence of the antipodal sequence

759: $\tilde{a}^\varnothing=({\bf B},{\bf A},{\bf B},{\bf A})$ and a path

760: connecting $a^\varnothing$ and $\tilde{a}^\varnothing$ obtained via

761: the steps {\sf (a)}, {\sf (b)} and {\sf (c)}. Explicitly this path

762: for $\mathcal{S}_\varnothing$ from $a^\varnothing$ to

763: $\tilde{a}^\varnothing$ is given by {\small

764: $$

765: \underbrace{

766: \diagram

767: \fbox{{\bf C}}  \ar@{-}[r]      & {\bf D} \\

768:       {\bf D}  \uline\rline   & {\bf C}\uline \\

769: \enddiagram

770: \ \mapsto

771: \diagram

772:       {\bf B}  \ar@{-}[r]      & {\bf D} \\

773:       {\bf D}  \uline\rline    & \fbox{{\bf C}}\uline \\

774: \enddiagram}_{\text{\rm step}

775:  {\sf (a)}: \,\text{\rm replace {\bf C} by {\bf B}}}

776: \ \mapsto

777: \underbrace{\diagram

778:       {\bf B}  \ar@{-}[r]      & {\bf D} \\

779:       \fbox{{\bf D}}  \uline\rline    & {\bf B}\uline\\

780: \enddiagram

781: \ \mapsto

782: \diagram

783:       {\bf B}  \ar@{-}[r]      & \fbox{{\bf D}} \\

784:       {\bf A}  \uline\rline    & {\bf B}\uline\\

785: \enddiagram}_{{\rm step} {\sf (b)}: \,\text{\rm replace {\bf D} by {\bf A}}}

786: \ \mapsto

787: \diagram

788:       {\bf B}  \ar@{-}[r]      & {\bf A} \\

789:       {\bf A}  \uline\rline    & {\bf B}\uline\\

790: \enddiagram

791: $$

792: }

793:

Theorem~\ref{T:nn} holds for many shapes. For instance the neutral path for

795: $\mathcal{S}_{\{1\}}$, which has length ${\sf diam}(Q_4^4)=4$ and

796: which connects the sequences $a^{\{1\}},\tilde{a}^{\{1\}}$ is given

797: by {\small

798: $$

799: \underbrace{

800: \diagram

801:       {\bf A}  \ar@{.}[r]         & {\bf D} \\

802:       {\bf D}  \ar@{.}[u]\rline   &\fbox{{\bf C}}\uline \\

803: \enddiagram

804: }_{{\rm step} {\sf (a)}: \text{\rm replace {\bf C} by {\bf B}}}

805: \ \mapsto

806: \underbrace{\diagram

807:       {\bf A}  \ar@{.}[r]      &     \fbox{{\bf D}} \\

808:       {\bf D}  \ar@{.}[u]\rline    & {\bf B}\uline \\

809: \enddiagram

810: \ \mapsto

811: \diagram

812:       {\bf A}  \ar@{.}[r]      &       {\bf A} \\

813: \fbox{{\bf D}}  \ar@{.}[u]\rline    &   {\bf B}\uline\\

814: \enddiagram}_{{\rm step} {\sf (b)}: \,\text{\rm replace {\bf D} by {\bf A}}}

815: \ \mapsto

816: \underbrace{

817: \diagram

818:      \fbox{ {\bf A}}  \ar@{.}[r]      & {\bf A} \\

819:       {\bf A}  \ar@{.}[u]\rline    & {\bf B}\uline\\

820: \enddiagram}_{{\rm step} {\sf (c)}: \,\text{\rm replace {\bf A} by {\bf C}}}

821: \ \mapsto

822: \diagram

823:       {\bf C}  \ar@{.}[r]      & {\bf A} \\

824:       {\bf A}  \ar@{.}[u]\rline    & {\bf B}\uline\\

825: \enddiagram

826: $$

827: }

828:

829:

830: %%%

831: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

832: %%%

833:

834:

835: %%%

836: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

837: %%%

838: \section{Appendix}

839: %%%

840: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

841: %%%

842:

843: {\bf Proof of Lemma}~\ref{L:bip}

844: To show {\sf (a)} we first prove that for any relation satisfying

845: eq.~(\ref{E:1}), eq.~(\ref{E:2})

846: and eq.~(\ref{E:3}) a shape $\mathcal{S}$ is bipartite. \\

847: {\it Claim.} Any closed walk in $\mathcal{S}$ has even length.\\

848: Since $\mathcal{S}$ is a shape we have $\mathcal{S}=H(v)$, whence

for any closed walk $w=(w_1,w_2,\dots,w_r,w_1)$ in $\mathcal{S}$
there exists at least one sequence
$x=(x_{w_1},x_{w_2},\dots,x_{w_r},x_{w_1})$, where $x_h\in
\{{\bf A},{\bf U},{\bf G},{\bf C}\}$. Therefore there exists an

853: injection

854: \begin{eqnarray*}

855: \{(x_{w_1},x_{w_2},\dots,x_{w_r},x_{w_1})\mid w \

856: \text{\rm is a closed walk in $\mathcal{S}$}\}

857: \longrightarrow \{\gamma \mid \text{\rm $\gamma$ is a closed walk in

858: $G(\mathcal{R})$}\}

859: \end{eqnarray*}

860: The idea is to show that

861: $$

862: \{\gamma \mid \text{\rm $\gamma$ is a closed walk in $G(\mathcal{R})$ of odd

863: length}\}=\varnothing   \  .

864: $$

865: Suppose $\gamma$ is a closed walk of minimal, odd length in

866: $G(\mathcal{R})$. Obviously, there are only $4$ vertices in

867: $G(\mathcal{R})$. We can conclude from this that $\gamma$

868: contains a cycle of length $3$ which is in view of eq.~(\ref{E:3})

869: impossible, whence the claim.\\

We next select an arbitrary vertex $i\in \{1,\dots,n\}$ and color
all vertices at even distance from $i$ blue and all vertices at odd distance
red. Suppose this procedure leads to two monochromatic adjacent

873: vertices $j,r$. Then we obtain a closed walk containing $i,j$ and

874: $r$ of odd length. By induction we can conclude that this walk

875: contains a cycle of odd length, which is impossible, whence

876: $\mathcal{S}$ is bipartite and

877: assertion {\sf (a)} follows.\\

878: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

879: Next we show {\sf (b)} by constructing a vertex

880: $v=(x_1,\dots,x_n)\in Q_4^n$ with the property

$H_{\mathcal{R}^\dagger}(v)=H'$, where $H'$ is an arbitrary induced,

882: bipartite subgraph of $H$. Since $H'$ is induced in $H$ there exists

883: some set $M\subset \{1,\dots,n\}$ such that $E_{H'}=\{\{i,j\}\mid

884: \{i,j\}\in E_H\,\wedge\, i,j\in M\}$. First, for all coordinates

885: $x_j$ where $j\not \in M$ we set $x_j={\bf A}$. Then by definition

886: of $\mathcal{R}^\dagger$ for $i,i'\not \in M$, $\{x_i,x_{i'}\}\not\in

887: \mathcal{R}^\dagger$ holds. Since $H'$ is bipartite there exists for

888: the vertices $j\in M$ a bi-coloring (red/blue) such that no two

889: $H'$-adjacent vertices are monochromatic. Suppose $x_j,x_k$ are

890: coordinates where $j,k\in M$. We choose a bi-coloring (red/blue) and

891: set $x_j={\bf D}$ for $j$ being colored red and $x_k={\bf C}$ for

892: $k$ being colored blue, respectively. In view of $({\bf D},{\bf

893: C}),({\bf C},{\bf D})\in \mathcal{R}^\dagger$, we can conclude that for

894: $j,k\in M$ and $\{j,k\}\in H$ we have $\{x_j,x_{k}\}\in

895: \mathcal{R}^\dagger$. Since $({\bf A},{\bf C}),({\bf A},{\bf D})\not

896: \in \mathcal{R}^\dagger$ we derive that for $i\not\in M$ and $j\in M$,

897: $\{x_i,x_j\}\not \in\mathcal{R}^\dagger$ holds. Therefore

898: $H_{\mathcal{R}^\dagger}((x_1,\dots,x_n))=H'$ i.e.~any

899: induced bipartite subgraph of $H$ is a shape.\\

900: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Next we show {\sf (c)}, i.e.~for $\mathcal{H}$ (eq.~(\ref{E:H})) any

902: $H'<\mathcal{H}$ is a shape. We proceed by explicitly constructing a

903: vertex $v=(x_1,\dots,x_n)\in Q_4^n$ with the property

904: $\mathcal{H}_{\mathcal{R}^\dagger}(v)=H'$. W.l.o.g.~we can assume

905: that $n$ is even since the isolated point $u$ does not contribute to

906: the $\mathcal{H}$-shapes. Then we have $\mathcal{H}=C_{2k}$ and

907: $V_{C_{2k}}=\{1,\dots,2k\}$. We label the $H'$-vertices $\{1,\dots,

908: 2k\}$ clock-wise such that the (clockwise) first vertex in one

909: largest $H'$-component is $1$. Then $H'$ corresponds to a unique

910: sequence of components. We assume now $x_i\in\{{\bf A},{\bf B}\}$

and label all $H'$-vertices except those contained in the
component preceding vertex $1$. We set inductively

913: \begin{equation}

914: x_i=

915: \begin{cases}

916: {\bf A}            & \text{\rm iff $i=1$}\\

917: x_{i-1}            & \text{\rm iff $\{i-1,i\}$ is not an edge in $H'$}\\

918: \overline{x_{i-1}} & \text{\rm iff $\{i-1,i\}$ is an edge in $H'$} \ ,

919: \end{cases}

920: \end{equation}

921: where $\overline{{\bf B}}={\bf A}$ and $\overline{{\bf A}}={\bf B}$.

922: As for the labeling of the component preceding the component containing

923: vertex $1$, we start with $x_j={\bf C}$ and continue inductively

924: $x_{j+1}={\bf D},x_{j+2}={\bf C},\dots $. This procedure

925: results in a

926: labeling compatible with $H'$ since for $\{i-1,i\}\in H'$ we have either

927: $\{{\bf C},{\bf D}\}$ or $\{{\bf A},{\bf B}\}$ and for $\{i-1,i\}\not\in H'$

928: we have $\{{\bf A},{\bf A}\}$, $\{{\bf B},{\bf B}\}$ and $\{{\bf A},{\bf C}\}$

929: or $\{{\bf B},{\bf C}\}$ (at the beginning of the last component) and

930: $\{{\bf D},{\bf A}\}$ or $\{{\bf C},{\bf A}\}$ (at the end of the

931: last component). Accordingly we obtain a sequence $\tilde{v}_{H'}$

932: with the property $\mathcal{H}(\tilde{v}_{H'})=H'$.$\ \square$

933:

934:

935: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

936:

937: {\bf Proof of Lemma}~\ref{L:many}

938: By definition, there exists a unique component of $\mathcal{H}$

939: which is a cycle of even length, $C_{2k}$. $C_{2k}$ contains for $n$

940: even all and for $n$ odd all but one $\mathcal{H}$-vertices. Suppose

941: $C_{2k}$ contains the vertices

$\{i_1,j_1,\dots,i_{k},j_k\}$, where $i_1<j_1<i_2<\dots<i_{k}<j_k$. \\

943: {\it Claim.} The number of $2k$-tuples $(x_{i_1},x_{j_1},\dots,x_{i_{k}},

944: x_{j_k})$ such that $C_{2k}((x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}))=

945: C_{2k}$ i.e.

946: $

947: (x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\in\vartheta_{C_{2k}}^{-1}(C_{2k})

948: $

949: is given by

950: \begin{equation}\label{E:precise}

951: 2\, \left(\mu_+^{2k}+\mu_-^{2k}\right) \ .

952: \end{equation}

953: To prove the claim we observe that $\mathcal{R}^\dagger$ induces the

954: digraph $D_{\mathcal{R}^\dagger}$ defined as follows: {\small

955: $$

956: D_{\mathcal{R}^\dagger}= \diagram

957: {\bf A}  \ar@/^1pc/@{<-}[rr]|{} \ar@/_1pc/@{->}[rr]|{}& & {\bf B} \\

958: {\bf C} \ar@/^1pc/@{<-}[rr]|{} \ar@/_1pc/@{->}[rr]|{}& & {\bf D}

959: \ar@/^1pc/@{<-}[u]|{} \ar@/_1pc/@{->}[u]|{}

960: \enddiagram  \qquad\text{\rm and} \quad  A_{D_{\mathcal{R}^\dagger}}=

961: \bordermatrix{%

962:   & {\bf A} & {\bf B} & {\bf D} & {\bf C} \cr%

963:   & 0 & 1 & 0 & 0  \cr%

964:   & 1 & 0 & 1 & 0  \cr%

965:   & 0 & 1 & 0 & 1  \cr%

966:   & 0 & 0 & 1 & 0 \cr%

967: }%

968: $$}

969: The number of $2k$-tuples $(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})$

970: with the property

971: $C_{2k}((x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k}))=C_{2k}$ is equal

972: to the number of closed walks of length $2k$ in

$D_{\mathcal{R}^\dagger}$. Indeed, in order to obtain such a
$2k$-tuple we fix an index, $i_1$, say. Then we let $x_{i_1}$ be
successively ${\bf A}$, ${\bf B}$, ${\bf D}$ and ${\bf C}$ and form
all closed walks of length $2k$ in $D_{\mathcal{R}^\dagger}$ starting
and ending at ${\bf A}$, ${\bf B}$, ${\bf D}$ and ${\bf C}$, respectively. All
these walks are counted separately, since we have labeled graphs.

The number of closed walks of length $\ell$ in
$D_{\mathcal{R}^\dagger}$ starting and ending at $i$ is given by
$(A_{D_{\mathcal{R}^\dagger}}^\ell)_{i,i}$, whence the number of all
closed walks of length $\ell$ is simply ${\sf
Tr}(A_{D_{\mathcal{R}^\dagger}}^\ell) =\sum_i
(A_{D_{\mathcal{R}^\dagger}}^\ell)_{i,i}$. Since the trace of
$A_{D_{\mathcal{R}^\dagger}}^\ell$ equals the sum of the $\ell$-th powers
of the eigenvalues of $A_{D_{\mathcal{R}^\dagger}}$, we have ${\sf
Tr}(A_{D_{\mathcal{R}^\dagger}}^\ell)=\omega_1^\ell+\dots
+\omega_r^\ell$, where $\omega_1,\dots,\omega_r$ are the
eigenvalues of $A_{D_{\mathcal{R}^\dagger}}$ (note $r=4$). We obtain

989: \begin{eqnarray*}

\sum_{\ell\ge 0}{\sf Tr}(A_{D_{\mathcal{R}^\dagger}}^\ell) z^\ell & =

991: &

992: \sum_{\ell\ge 0} \left[\omega_1^\ell+\dots +\omega_r^\ell \right]z^\ell \\

993:  & = & \sum_{\ell\ge 0}

994: \left[(1+(-1)^\ell)\left(\mu_+^\ell+

995:                      \mu_-^\ell\right)\right] z^\ell

996: \end{eqnarray*}

997: and the claim follows.\\
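
The eigenvalues entering the last step can be verified directly (a short
check spelled out here for convenience; it only uses the matrix
$A_{D_{\mathcal{R}^\dagger}}$ displayed above). Expanding the determinant of
this tridiagonal matrix gives the characteristic polynomial
\begin{equation*}
\det\left(\omega\cdot{\sf Id}-A_{D_{\mathcal{R}^\dagger}}\right)
=\omega^4-3\,\omega^2+1 \ ,
\end{equation*}
so that $\omega^2=\frac{3\pm\sqrt{5}}{2}=\mu_\pm^2$ and the four eigenvalues
are $\pm\mu_+$ and $\pm\mu_-$. Consequently
\begin{equation*}
{\sf Tr}(A_{D_{\mathcal{R}^\dagger}}^\ell)
=\mu_+^\ell+(-\mu_+)^\ell+\mu_-^\ell+(-\mu_-)^\ell
=(1+(-1)^\ell)\left(\mu_+^\ell+\mu_-^\ell\right) \ ,
\end{equation*}
which vanishes for odd $\ell$ and equals $2(\mu_+^{2k}+\mu_-^{2k})$ for
$\ell=2k$. For instance, $\ell=2$ gives $2\cdot 3=6$, in agreement with the
diagonal entries $1,2,2,1$ of $A_{D_{\mathcal{R}^\dagger}}^2$.
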

998: Suppose $(x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\in \vartheta_{C_{2k}}^{-1}

999: (C_{2k})$ and $M\subset \{1,\dots,k\}$. We consider the involution

1000: $\tau\colon \mathcal{A}\rightarrow  \mathcal{A}$, where $\tau({\bf A})=

1001: {\bf B}$ and $\tau({\bf D})={\bf C}$ and set

1002: \begin{eqnarray}\label{E:I_M}

1003: \quad I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}) & = &

1004: (y_{i_1},x_{j_1}\dots,y_{i_{k}},x_{j_k}), \ \text{\rm where} \

1005: y_{i_\ell}=

1006: \begin{cases}

1007: \tau(x_{i_\ell}) & \text{\rm for } i_\ell \in M         \\

1008: x_{i_\ell}       & \text{\rm for } i_\ell \not\in M \ .

1009: \end{cases}

1010: \end{eqnarray}

1011: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1012: {\it Claim.} There exists a bijection

1013: $$

1014: \beta\colon \{M\subset \{1,2,\dots,k\}\}\rightarrow \{\mathcal{S}_M\}, \quad

1015: M\mapsto\mathcal{S}_{M}

1016: $$

where $\mathcal{S}_M$ is obtained by deleting the two $C_{2k}$-edges
incident to each vertex $i_{h}\in M$ and

1019: \begin{equation}

1020: \forall \,(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})

1021: \in \vartheta_{C_{2k}}^{-1}(C_{2k});\,\quad

1022: \mathcal{S}_M=C_{2k}(I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})) \ .

1023: \end{equation}

1024: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1025: Suppose $M\neq M'$ then w.l.o.g.~we can assume that there exists

1026: some index $i_{h}\in M\setminus M'$, i.e.~$i_{h}$ is isolated in

1027: $\mathcal{S}_{M}$ but not in $\mathcal{S}_{M'}$. Since $j_{h-1}$ and

1028: $j_{h}$ are both in $\mathcal{S}_M$ and $\mathcal{S}_{M'}$ we have

$\{j_{h-1},i_h\},\{j_{h},i_h\}\in \mathcal{S}_{M'}$ but not in

1030: $\mathcal{S}_{M}$, whence $\mathcal{S}_{M}$ and $\mathcal{S}_{M'}$

1031: are different shapes. Since $\mathcal{S}_M$ is an induced bipartite

1032: subgraph, Lemma~\ref{L:bip} implies that any $\mathcal{S}_M$ is a

1033: shape. When $i_{h}\in M$ the following diagram {\small

1034: $$

1035: \diagram

1036:                &         &  x_{j_h} \\

1037: x_{j_{h-1}} \ar@{-}[r] & \text{\rm \fbox{$x_{i_{h}}$}}

1038: \ar@{-}[ur] \ar@{-}[r] & x_{j_{h}} \\

1039: \enddiagram

1040: \quad \mapsto\quad

1041: \diagram

1042:                          &                                           & {x_{j_{h}}} \\

1043: {x_{j_{h-1}}} \ar@{.}[r] & \text{\rm \fbox{$\tau(x_{i_{h}})$}}

1044: \ar@{.}[ur] \ar@{.}[r] & {x_{j_{h}}}

1045: \enddiagram

1046: $$}

1047: shows that $I_M$ has the property: for arbitrary

1048: $$

1049: (x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})\in\vartheta_{C_{2k}}^{-1}(C_{2k})

1050: $$

1051: the shape $C_{2k}(I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}))$

1052: differs from $C_{2k}$ exactly by deleting the two $C_{2k}$-edges

1053: incident to all $i_\ell\in M$; explicitly {\tiny

1054: $$

1055: \diagram

1056:                &         &  {\bf D} \\

1057: {\bf A} \ar@{-}[r] & \fbox{{\bf B}} \ar@{-}[ur] \ar@{-}[r] & {\bf A} \\

1058: \enddiagram

1059: \ \mapsto \

1060: \diagram

1061:                &         &  {{\bf D}} \\

1062: {\bf A} \ar@{.}[r] & \fbox{{\bf A}} \ar@{.}[ur] \ar@{.}[r] & {\bf A} \\

1063: \enddiagram

1064: \quad

1065: \diagram

1066:                &         &  {\bf B} \\

1067: {\bf C} \ar@{-}[r] & \fbox{{\bf D}} \ar@{-}[ur] \ar@{-}[r] & {\bf C}\\

1068: \enddiagram

1069: \ \mapsto \

1070: \diagram

1071:                &         &  {\bf B} \\

1072: {\bf C} \ar@{.}[r] & \fbox{{\bf C}} \ar@{.}[ur] \ar@{.}[r] & {\bf C}\\

1073: \enddiagram

1074: \quad

1075: $$

1076: }

1077: {\tiny

1078: $$

1079: \diagram

1080:                                  &  \fbox{{\bf A}}\ar@{-}[r]  &  {\bf B}  \\

1081: {\bf B}\ar@{-}[ur] \ar@{-}[r]  & \fbox{{\bf D}} \ar@{-}[ur]

1082: \ar@{-}[r] &

1083: {\bf C}\\

1084: \enddiagram

1085: \ \mapsto \

1086:  \diagram

1087:                                  & \fbox{{\bf B}}\ar@{.}[r]  &  {\bf B}  \\

1088: {\bf B}\ar@{.}[ur] \ar@{.}[r]&\fbox{{\bf C}} \ar@{.}[ur] \ar@{.}[r] & {\bf C}\\

1089: \enddiagram

1090: \quad

1091: \diagram

1092:                                  &  \fbox{{\bf C}}\ar@{-}[r]  &  {\bf D}  \\

1093: {\bf D}\ar@{-}[ur] \ar@{-}[r]&\fbox{{\bf B}} \ar@{-}[ur] \ar@{-}[r] & {\bf A}\\

1094: \enddiagram

1095: \ \mapsto \

1096:  \diagram

1097:                                &  \fbox{{\bf D}}\ar@{.}[r]  &  {\bf D}  \\

1098: {\bf D}\ar@{.}[ur] \ar@{.}[r]  &\fbox{{\bf A}} \ar@{.}[ur]

1099: \ar@{.}[r] & {\bf A}\\

1100: \enddiagram

1101: $$

1102: }

1103: and the claim is proved. The claim implies that $I_M$ induces the injection

1104: \begin{eqnarray}

1105: \quad I_M \colon \vartheta_{C_{2k}}^{-1}(C_{2k}) & \longrightarrow &

1106: \vartheta_{C_{2k}}^{-1}(\mathcal{S}_M), \ \quad

1107: (x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k})\mapsto

1108: I_M(x_{i_1},x_{j_1}\dots,x_{i_{k}},x_{j_k}) \ .

1109: \end{eqnarray}

1110: This injection allows us to relate the sets $\vartheta_{C_{2k}}^{-1}(C_{2k})$

1111: and $\vartheta_{C_{2k}}^{-1}(\mathcal{S}_M)$ and in particular

1112: \begin{equation}

1113: \vert\vartheta_{C_{2k}}^{-1}(C_{2k})\vert

1114: \le \vert\vartheta_{C_{2k}}^{-1}(\mathcal{S}_M)\vert \ .

1115: \end{equation}

1116: Since $M\subset \{1,\dots,k\}$ was arbitrary we can conclude that there

1117: are $2^k$ subsets and hence $2^k$ distinct shapes $\mathcal{S}_M$.

1118: Hence there exist at least

1119: $$

1120: 2^{k} \ge  \left(\sqrt{2}\right)^{n-1}

1121: $$

1122: shapes $\mathcal{S}$ with the property

1123: \begin{eqnarray*}

1124: \vert \vartheta_\mathcal{H}^{-1}(\mathcal{S})\vert  \ge

1125:                                 \vert \vartheta_\mathcal{H}^{-1}(\mathcal{H})\vert

1126:  \ge  2\, \left(\mu_+^{2k}+

1127:                    \mu_-^{2k}\right) \ .

1128: \end{eqnarray*}

1129: In case of $n\not\equiv 0\mod 2$ we have exactly one more isolated point,

1130: i.e.

1131: \begin{equation}\label{E:isolated}

1132: \vert \vartheta_\mathcal{H}^{-1}(\mathcal{S})\vert\ge 8\,

1133: \left(\mu_+^{n-1}+

1134:                    \mu_-^{n-1}\right)

1135: \end{equation}

1136: and since $4\ge \left( \mu_+ + \mu_- \right)$

1137: the lemma follows. $\ \square$
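
The final inequality can be spelled out as follows (one possible way to fill
in the step; the recursion used is a standard property of $\mu_\pm$ and is
not taken from the text above). Writing $L_m=\mu_+^m+\mu_-^m$, we have
$L_m=L_{m-1}+L_{m-2}$ because $\mu_\pm^2=\mu_\pm+1$, and
$L_{m-2}\le L_{m-1}$ for $m\ge 3$. Hence, for odd $n\ge 3$,
\begin{equation*}
2\,L_n = 2\,(L_{n-1}+L_{n-2}) \le 4\,L_{n-1} \le 8\,L_{n-1} \ ,
\end{equation*}
so that the bound of eq.~(\ref{E:isolated}) indeed implies
$\vert \vartheta_\mathcal{H}^{-1}(\mathcal{S})\vert\ge
2\left(\mu_+^{n}+\mu_-^{n}\right)$.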

1138:

1139:

1140:

1141: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1142:

1143: {\bf Proof of Theorem}~\ref{T:nn}

1144: We first prove that at least $\left(\sqrt{2}\right)^{n-1}$ shapes $\mathcal{S}$

1145: have a preimage $\vartheta_\mathcal{H}^{-1}(\mathcal{S})$ with diameter $n$.

1146: We will work with the particular set of shapes

1147: $\{\mathcal{S}_M \mid M\subset \{1,\dots,k\}\}$, introduced in

1148: Lemma~\ref{L:many} and prove that all of them have a

1149: component of size $\ge \mu_+^{n}+ \mu_-^{n} >\left(\sqrt{2}\right)^{n}$ and

1150: ${\sf diam}(\vartheta_\mathcal{H}^{-1}(\mathcal{S}))=n$.

1151: Let $C_{2k}$ be the $\mathcal{H}$-cycle, which contains all

1152: $\mathcal{H}$-vertices for $n$ even and all but one $\mathcal{H}$-vertices,

1153: for $n$ odd.

1154: Let $V_{C_{2k}}=\{i_1,j_1,\dots,i_{k},j_k\}$, where

$i_1<j_1<i_2<\dots<i_{k}<j_k$. \\

1156: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

{\it Claim $1$.} There exist at
least $2^{k}$ shapes $\mathcal{S}_M$, $M\subset \{1,\dots,k\}$, over $Q_4^{2k}$ such that

1159: \begin{equation}

1160: \text{\sf diam}(\vartheta_\mathcal{H}^{-1}(\mathcal{S}_M))=

1161: \begin{cases}

1162: n   & \text{\rm for}\  n\equiv 0\mod 2    \\

1163: n-1 & \text{\rm for}\  n\not\equiv 0\mod 2 \ .

1164: \end{cases}

1165: \end{equation}

1166: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1167: We first show that for each $M$ there exists a pair of antipodal sequences,

1168: i.e.~$(a^M,\tilde{a}^M)$ with $d(a^M,\tilde{a}^M)=2k$ and a path

1169: $(a^M,w_1^M,\dots,w_{2k-1}^M,\tilde{a}^M)$ such that

$\vartheta_{C_{2k}}(w_i^M)=\mathcal{S}_M$. To this end we set

1171: \begin{equation}\label{E:a^M}

1172: a^M=(a^M_{i_1},a_{j_1}\dots,a^M_{i_{k}},a_{j_k}), \quad \text{\rm where}\quad

1173: a_{j_h}={\bf D}, \ \text{\rm and}\  a^M_{i_h}=

1174: \begin{cases}

1175: {\bf A} & \text{\rm for} \ i_h\in M \\

1176: {\bf C} & \text{\rm otherwise.}

1177: \end{cases}

1178: \end{equation}

1179: In particular we have $a^\varnothing=({\bf C},{\bf D},\dots, {\bf

1180: C},{\bf D})$. Then $\mathcal{S}_M=C_{2k}(a^M)$, i.e.~$\mathcal{S}_M$

1181: is the shape obtained by removing for each $i_h\in M$ the two

1182: incident $C_{2k}$-edges. Next we define an antipode $\tilde{a}^M$,

1183: i.e.~an element of $Q_4^{2k}$ with the property

1184: $d(a^M,\tilde{a}^M)=2k$ as follows

1185: \begin{equation}

1186: \tilde{a}^M=(\tilde{a}^M_{i_1},\tilde{a}_{j_1}\dots,\tilde{a}^M_{i_{k}},

1187: \tilde{a}_{j_k}), \quad \text{\rm where}\quad

1188: \tilde{a}_{j_h}={\bf A}, \ \text{\rm and}\ \tilde{a}^M_{i_h}=

1189: \begin{cases}

1190: {\bf C} & \text{\rm for} \ i_h\in M \\

1191: {\bf B} & \text{\rm otherwise.}

1192: \end{cases}

1193: \end{equation}

1194: We can transform $a^M$ into $\tilde{a}^M$ by successively changing

1195: exactly one coordinate in three steps: {\sf (a)} replace (in any

1196: order) for $i_h\not\in M$ successively all $a_{i_h}={\bf C}$ by

1197: ${\bf B}$, {\sf (b)} replace (in any order) successively all

1198: $a_{j_h}={\bf D}$ by ${\bf A}$ and finally {\sf (c)} substitute (in

1199: any order) for all $i_h\in M$ $a_{i_h}={\bf A}$ by

1200: ${\bf C}$. \\
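
That every intermediate sequence is mapped to $\mathcal{S}_M$ can be checked
pair by pair (a verification sketch added here; it assumes only that,
as encoded in $A_{D_{\mathcal{R}^\dagger}}$, the pairs in
$\mathcal{R}^\dagger$ are exactly $\{{\bf A},{\bf B}\}$,
$\{{\bf B},{\bf D}\}$ and $\{{\bf D},{\bf C}\}$). The pairs occurring along
the $C_{2k}$-edges during the three steps are
\begin{equation*}
\begin{array}{lll}
 & i_h\not\in M \ (\text{\rm edge kept}) & i_h\in M \ (\text{\rm edge deleted}) \\
\text{\rm step {\sf (a)}:} &
\{{\bf C},{\bf D}\},\{{\bf B},{\bf D}\}\in\mathcal{R}^\dagger &
\{{\bf A},{\bf D}\}\not\in\mathcal{R}^\dagger \\
\text{\rm step {\sf (b)}:} &
\{{\bf B},{\bf D}\},\{{\bf B},{\bf A}\}\in\mathcal{R}^\dagger &
\{{\bf A},{\bf D}\},\{{\bf A},{\bf A}\}\not\in\mathcal{R}^\dagger \\
\text{\rm step {\sf (c)}:} &
\{{\bf B},{\bf A}\}\in\mathcal{R}^\dagger &
\{{\bf A},{\bf A}\},\{{\bf C},{\bf A}\}\not\in\mathcal{R}^\dagger
\end{array}
\end{equation*}
In particular step {\sf (a)} has to precede step {\sf (b)}, since
$\{{\bf C},{\bf A}\}\not\in\mathcal{R}^\dagger$.
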

1201: This proves that there exists a $Q_4^{2k}$-path

1202: \begin{equation}

1203: (a^M,w_1^M,\dots,w_{2k-1}^M,\tilde{a}^M)

1204: \end{equation}

1205: connecting $a^M$ and $\tilde{a}^M$, such that

1206: \begin{equation}

1207: \forall \, 1\le i\le 2k-1, \quad C_{2k}(w_i^M)=\mathcal{S}_M \ .

1208: \end{equation}

1209: I.e.~all intermediate steps of the path are mapped by

1210: $\vartheta_{\mathcal{H}}$ into

1211: the shape $\mathcal{S}_M$.

1212: As shown in Lemma~\ref{L:many} there are $2^k$ different shapes

1213: $\mathcal{S}_M$

1214: induced by the subsets $M\subset \{1,\dots,k\}$, whence Claim $1$.\\

1215: In case of $n\equiv 0\mod 2$ we derive

1216: $2^k=\left(\sqrt{2}\right)^n$. In case of $n\not \equiv 0\mod 2$

1217: there exists exactly one vertex $u$ which is isolated in

1218: $\mathcal{H}$. Then we simply add the isolated point $u$ to each

1219: shape $\mathcal{S}_M$ and shall in the following identify these new

1220: shapes with $\mathcal{S}_M$. Then

1221: $\vert\vartheta_\mathcal{H}^{-1}(\mathcal{S}_M)\vert =

1222: 4\vert\vartheta_{C_{2k}}^{-1}(\mathcal{S}_M)\vert$. We can choose

$a_u={\bf A}$ and $\tilde{a}_u={\bf B}$; then

1224: \begin{eqnarray*}

1225: a_u^M & = & (a^M_{i_1},a_{j_1}\dots,a_u,\dots, a^M_{i_{k}},a_{j_k})\\

1226: \tilde{a}_u^M & = & (\tilde{a}^M_{i_1},\tilde{a}_{j_1}\dots,\tilde{a}_u,\dots,

1227: \tilde{a}^M_{i_{k}},\tilde{a}_{j_{k}})

1228: \end{eqnarray*}

1229: satisfy $d(a_u^M,\tilde{a}_u^M)=n$ and there exists a $Q_4^{n}$-path

1230: $(a_u^M,w_1^M,\dots,w_{2k}^M,\tilde{a}_u^M)$ connecting $a_u^M$ and

1231: $\tilde{a}_u^M$, with the property

1232: \begin{equation}

1233: \forall \, 1\le i\le 2k, \quad C_{2k}(w_i^M)=\mathcal{S}_M \ .

1234: \end{equation}

1235: Therefore we have proved that at least $\left(\sqrt{2}\right)^{n-1}$

1236: shapes $\mathcal{S}_M$ have a preimage $\vartheta_\mathcal{H}^{-1}(\mathcal{S}_M)$

1237: with diameter $n$.\\

1238: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1239: {\it Claim $2$.}

1240: \begin{equation}\label{E:claim2}

\vert \left\{\mathcal{S}_M\mid \vert \mathcal{C}(\vartheta_\mathcal{H}^{-1}
(\mathcal{S}_M))\vert \ \ge
\mu_+^{2k}+\mu_-^{2k}
\right\}\vert \ge 2^k \ .

1245: \end{equation}

1246: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

To prove Claim $2$ we first observe that

1248: $\vartheta_\mathcal{H}^{-1}(\mathcal{H})$ has

1249: exactly two components of equal size

1250: \begin{equation}\label{E:cool2}

1251: \mu_+^{2k}+\mu_-^{2k} \ .

1252: \end{equation}

1253: Indeed, any vertex $v\in \vartheta_\mathcal{H}^{-1}(\mathcal{H})$ can be

1254: transformed into either

1255: $$

1256: a^\varnothing=({\bf C},{\bf D},{\bf C},\dots,{\bf D},{\bf C}),

1257: \quad \text{\rm or}

1258: \quad

1259: b^\varnothing=({\bf D},{\bf C},\dots,{\bf D},{\bf C},{\bf D})

1260: $$

1261: successively using

1262: the two steps {\sf (I)} replace (in any order) all ${\bf A}$ by ${\bf D}$

1263: and {\sf (II)} replace (in any order) all ${\bf B}$ by ${\bf C}$. Hence

1264: there exist exactly two components and the map

1265: $$

1266: \sigma(x_{i_1},x_{j_1},\dots, x_{i_k},x_{j_k})=

1267: (x_{j_k},x_{i_1},\dots, x_{j_{k-1}},x_{i_k})

1268: $$

1269: is a bijection between them, whence they have equal size.

1270: Eq.~(\ref{E:cool2}) then follows from eq.~(\ref{E:precise}) in

1271: Lemma~\ref{L:many}. We next claim that the mapping

1272: $I_M$ of eq.~(\ref{E:I_M}) is in fact an injective graph morphism

1273: \begin{eqnarray}

1274: \ I_M\colon \vartheta_{C_{2k}}^{-1}(C_{2k}) & \longrightarrow &

1275: \vartheta_{C_{2k}}^{-1}(\mathcal{S}_M), \ \quad

1276: (x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\mapsto

1277: I_M(x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k}).

1278: \end{eqnarray}

1279: I.e.~for two adjacent vertices $v,v'\in\vartheta_{C_{2k}}^{-1}(C_{2k})$, the

1280: vertices $I_{M}(v)$ and $I_{M}(v')$ are adjacent. To prove

1281: this we consider the diagrams {\small

1282: $$

1283: x_{j_{h-1}}=x_{j_h}={\bf B}:\quad

1284: \underbrace{\diagram

1285:                                  &  \fbox{{\bf A}}\ar@{-}[r]  &  {\bf B}  \\

1286: {\bf B}\ar@{-}[ur] \ar@{-}[r]  & \fbox{{\bf D}}

1287: \ar@/^2pc/@{->}[u]|{}

1288: \ar@/_2pc/@{<-}[u]|{}

1289: \ar@{-}[ur]  & \\

1290: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}

1291: \qquad \mapsto

1292: \qquad

1293: \underbrace{\diagram

1294:                                  &  \fbox{{\bf B}}\ar@{.}[r]  &  {\bf B}  \\

1295: {\bf B}\ar@{.}[ur] \ar@{.}[r]  & \fbox{{\bf C}} \ar@{.}[ur]

1296: \ar@/^2pc/@{->}[u]|{}

1297: \ar@/_2pc/@{<-}[u]|{}

1298:   & \\

1299: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}

1300: $$}

1301: {\small

1302: $$

1303: x_{j_{h-1}}=x_{j_h}={\bf D}:\quad

1304: \underbrace{\diagram

1305:                                  &  \fbox{{\bf C}}\ar@{-}[r]  &  {\bf D}  \\

1306: {\bf D}\ar@{-}[ur] \ar@{-}[r]&\fbox{{\bf B}} \ar@{-}[ur]

1307: \ar@/^2pc/@{->}[u]|{}

1308: \ar@/_2pc/@{<-}[u]|{}

1309: &\\

1310: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}

1311: \qquad \mapsto

1312: \qquad

1313: \underbrace{\diagram

1314:                             &  \fbox{{\bf D}}\ar@{.}[r]  &  {\bf D}  \\

1315: {\bf D}\ar@{.}[ur] \ar@{.}[r]&\fbox{{\bf A}} \ar@{.}[ur]

1316: \ar@/^2pc/@{->}[u]|{}

1317: \ar@/_2pc/@{<-}[u]|{}

1318: &\\

1319: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}

1320: $$}

1321: The above diagrams represent the two scenarios for two adjacent

1322: vertices $v,v'\in\vartheta^{-1}_{C_{2k}}(C_{2k})$. I.e.~if $v$ and

1323: $v'$ are both contained in $\vartheta_{C_{2k}}^{-1}(C_{2k})$ and

1324: differ in $x_{i_h}$ and $x_{i_h}'$ then we have either

1325: $x_{j_{h-1}}=x_{j_h}={\bf B}$ and $x_{i_h}={\bf D}$ and

1326: $x_{i_h}'={\bf A}$ or $x_{j_{h-1}}=x_{j_h}={\bf D}$ and

1327: $x_{i_h}={\bf B}$ and $x_{i_h}'={\bf C}$. If we apply $I_{M}$

1328: and ${i_h}\in M$, then the resulting vertices $I_{M}(v)$ and

1329: $I_{M}(v')$ are again adjacent, whence $I_M$ is an injective graph

1330: morphism. Accordingly, $I_M$ maps components into components, from

1331: which we can conclude that for each $M\subset \{1,\dots,k\}$ the

1332: shape $\mathcal{S}_M$ has a component of size

1333: $\mu_+^{2k}+\mu_-^{2k}$ and Claim $2$ is proved.\\

1334: In case of $2k=n$ the

1335: assertion follows directly. For $n$ odd we have to repeat the

1336: argument in Lemma~\ref{L:many}, where we considered the isolated

1337: point $u$ in eq.~(\ref{E:isolated}). Since we used the same set of

1338: shapes $\{\mathcal{S}_M\mid M\subset \{1,\dots,k\}\}$ for both

1339: claims the theorem follows. $\ \square$
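
To illustrate the component size in eq.~(\ref{E:cool2}), consider the smallest nontrivial case $k=2$. This is only a sketch under two assumptions that are not restated in this proof: that $\mu_\pm=(1\pm\sqrt{5})/2$, and that the compatible pairs are ${\bf AB}$, ${\bf BD}$ and ${\bf DC}$, as suggested by the triples that occur in $\vartheta_{C_{2k}}^{-1}(C_{2k})$. Using $\mu_++\mu_-=1$ and $\mu_+\mu_-=-1$ we obtain
$$
\mu_+^{4}+\mu_-^{4}=\left((\mu_++\mu_-)^2-2\mu_+\mu_-\right)^2-2(\mu_+\mu_-)^2
=(1+2)^2-2=7 ,
$$
and the sequences $(x_{i_1},x_{j_1},x_{i_2},x_{j_2})\in\vartheta_{C_{4}}^{-1}(C_{4})$ with $x_{i_1}\in\{{\bf A},{\bf D}\}$ are exactly the seven sequences
$$
\textbf{ABAB},\ \textbf{ABDB},\ \textbf{DBAB},\ \textbf{DBDB},\
\textbf{DBDC},\ \textbf{DCDB},\ \textbf{DCDC} .
$$
Any two of them are connected by successive single replacements within this list, for instance $\textbf{ABAB}\to\textbf{ABDB}\to\textbf{DBDB}\to\textbf{DBDC}$, and $\sigma$ maps them bijectively onto the seven sequences with $x_{i_1}\in\{{\bf B},{\bf C}\}$.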

1340:

1341: {\bf Proof of Theorem~\ref{T:NN}.}

1342: It is clear that we can restrict our analysis to the case $n\equiv

1343: 0\mod 2$, i.e.~$\mathcal{H}=C_{2k}$, since the isolated point

1344: always contributes $4$ neutral neighbors for any shape.

1345: Eq.~(\ref{E:ist}) is a direct consequence of

1346: \begin{eqnarray*}

1347: \ I_M\colon \vartheta_{C_{2k}}^{-1}(C_{2k}) & \longrightarrow &

1348: \vartheta_{C_{2k}}^{-1}(\mathcal{S}_M), \ \quad

1349: (x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})\mapsto

1350: I_M(x_{i_1},x_{j_1},\dots,x_{i_{k}},x_{j_k})

1351: \end{eqnarray*}

1352: being an injective graph morphism. Thus it suffices to prove

1353: eq.~(\ref{E:generate}). We observe that for $v\in

1354: \vartheta_{C_{2k}}^{-1}(C_{2k})$

1355: $$

1356: v=(x_{i_1},x_{j_1},\dots, x_{i_k},x_{j_k})

1357: \mapsto

1358: (t_{i_1},t_{j_1},\dots, t_{i_k},t_{j_k}), \ \text{\rm where} \

1359: t_s=

1360: \begin{cases}

1361: (x_{j_{h-1}},x_{i_h},x_{j_{h}}) & \text{\rm for } s=i_h \\

1362: (x_{i_{h}},x_{j_h},x_{i_{h+1}}) & \text{\rm for } s=j_h

1363: \end{cases}

1364: $$

1365: is a bijection, where $h$ is considered modulo $k$.

1366: Hence every $v\in \vartheta_{C_{2k}}^{-1}(C_{2k})$

1367: can be uniquely decomposed into a sequence of triples. Since $v\in

1368: \vartheta_{C_{2k}}^{-1}(C_{2k})$ there are exactly the following ten triples

1369: $$

1370: V_{D}=\{\textbf{ABA,ABD,BAB,BDB,BDC,DBD,DBA,DCD,CDC,CDB}\}

1371: $$

1372: and setting

1373: $$

1374: E_{D}=

1375: \{\left((x_{j_{h-1}},x_{i_h},x_{j_{h}}),(x_{i_h},x_{j_{h}},x_{i_{h+1}})\right)

1376:   \mid (x_{j_{h-1}},x_{i_h},x_{j_{h}})\in V_{D}\}

1377: $$

1378: we obtain the digraph ${D}$. Suppose we are given $v,v'\in

1379: \vartheta_{C_{2k}}^{-1}(C_{2k})$ with $d(v,v')=1$; then we have the

1380: following alternatives: {\small

1381: $$

1382: x_{j_{h-1}}=x_{j_h}={\bf B}:

1383: \underbrace{\diagram

1384:                        &  \fbox{{\bf D}}\ar@{-}[r]  &  {\bf B}  \\

1385: {\bf B}\ar@{-}[ur] \ar@{-}[r]  & \fbox{{\bf A}}

1386: \ar@/^2pc/@{->}[u]|{}

1387: \ar@/_2pc/@{<-}[u]|{}

1388: \ar@{-}[ur]  & \\

1389: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}

1390: \qquad x_{j_{h-1}}=x_{j_h}={\bf D}:

1391: \qquad

1392: \underbrace{\diagram

1393:                                  &  \fbox{{\bf C}}\ar@{-}[r]  &  {\bf D}  \\

1394: {\bf D}\ar@{-}[ur] \ar@{-}[r]  & \fbox{{\bf B}} \ar@{-}[ur]

1395: \ar@/^2pc/@{->}[u]|{}

1396: \ar@/_2pc/@{<-}[u]|{}

1397:   & \\

1398: \enddiagram}_{(x_{j_{h-1}},x_{i_h},x_{j_{h}})}

1399: $$}

1400:

1401:

1402: The idea is now to count all triples, i.e.~$(x_{j_{h-1}},x_{i_h},

1403: x_{j_{h}})$ and $(x_{i_{h-1}},x_{j_{h-1}},x_{i_{h}})$, that are contained in

1404: $\Theta=\{ {\bf BAB}, {\bf BDB}, {\bf DBD}, {\bf DCD}\}$ for the sequences in

1405: $\vartheta^{-1}_{C_{2k}}(C_{2k})$. Next, let $R[x]$ be a polynomial

1406: ring and $w\colon E_{D}\longrightarrow R[x]$ a function given by

1407: $w(e)=x$ iff the arc $e$ has terminus $\tau\in\Theta$, otherwise

1408: $w(e)=1$. If $\Gamma=e_{1}e_{2}\dots e_{\ell}$ is a walk of length

1409: $\ell$ in $E_{D}$, then the weight of $\Gamma$ is defined by

1410: $w(\Gamma)=w(e_1)w(e_2)\dots w(e_\ell)$. Introducing the formal

1411: variable $x$ in $w$ allows us to count the triples in $\Theta$

1412: within some $v\in\vartheta_{C_{2k}}^{-1}(C_{2k})$. The number of

1413: closed walks of length $\ell$ in ${D}$ is $\sum_{v\in

1414: {V_{D}}}{\left[{A_{D}}^\ell\right]}_{v,v} =\text{\sf

1415: Tr}(A_{D}^\ell)$, where

1416: $A_{D}$ is the adjacency matrix of ${D}$.\\

1417: Suppose $B$ is a $p\times p$ matrix and $\{\eta_{i}\}_{i=1}^{p}$ are

1418: all the eigenvalues of $B$, then we have ${\sf

1419: det}B=\prod_{i}{\eta_{i}}.$ Let $\{\xi_{i}\}_{i=1}^{p}$ and

1420: $\{\omega_{i}\}_{i=1}^{p}$ be all the eigenvalues of $I-yA$ and $A$

1421: respectively, then we have $\xi_{i}=1-y\omega_{i}$, where $1\leq

1422: i\leq p$. For the set of  all the nonzero eigenvalues of $A$,

1423: $\{\omega_{i}\}_{i=1}^{r}$ we derive ${\sf det}(I-yA)=

1424: \prod_{i=1}^{r}(1-y\omega_{i})$. We set $Q(y)={\sf det}(I-yA)$ and

1425: have $p=10=\vert V_{D}\vert$, $A=A_{{D}}$ and $r=6$ for $x\neq 1$,

1426: whence

1427: \begin{equation}\label{E:reihe}

1428: \sum_{\ell \ge 1} \text{\sf Tr}(A_{D}^\ell) y^\ell =\sum_{\ell \ge

1429: 1}(\omega_{1}^{\ell}+\dots+\omega_{r}^{\ell})y^{\ell}= \sum_{i=1}^r

1430: \frac{\omega_iy}{1-\omega_iy}=\frac{-y\, Q'(y)}{Q(y)}.

1431: \end{equation}
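For the reader's convenience, the last equality in eq.~(\ref{E:reihe}) can be checked by logarithmic differentiation; the following is only a sketch of this standard step, read as an identity of formal power series in $y$. Since $Q(y)=\prod_{i=1}^{r}(1-\omega_{i}y)$, we have
$$
\frac{Q'(y)}{Q(y)}=\sum_{i=1}^{r}\frac{-\omega_i}{1-\omega_i y},
\quad \text{\rm whence} \quad
\frac{-y\, Q'(y)}{Q(y)}=\sum_{i=1}^{r}\frac{\omega_i y}{1-\omega_i y}
=\sum_{\ell\ge 1}(\omega_{1}^{\ell}+\dots+\omega_{r}^{\ell})\,y^{\ell} ,
$$
where the last step expands each summand as a geometric series.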

1432: After some computation we derive

1433: $Q(y)=1-2xy^2-x^2y^2+2x^{3}y^{4}-x^{4}y^{6}+2x^{3}y^{6}-x^{2}y^{6}-x^{2}y^{4}$

1434: and the theorem follows from eq.~(\ref{E:reihe}). $\ \square$
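
As a sanity check of the expression for $Q(y)$ (not part of the original argument), one can collect powers of $y$ and specialize to $x=1$, where every arc has weight $1$ and $\text{\sf Tr}(A_{D}^\ell)$ simply counts the closed walks of length $\ell$ in ${D}$:
$$
Q(y)=1-(2x+x^2)\,y^2+(2x^3-x^2)\,y^4-x^2(x-1)^2\,y^6,
\qquad
Q(y)\big\vert_{x=1}=1-3y^2+y^4 .
$$
The coefficient of $y^6$ vanishes at $x=1$, which is consistent with having $r=6$ nonzero eigenvalues only for $x\neq 1$. Moreover
$$
\frac{-y\, Q'(y)}{Q(y)}\Big\vert_{x=1}
=\frac{6y^2-4y^4}{1-3y^2+y^4}
=6y^2+14y^4+O(y^6) ,
$$
i.e.~there are $6$ closed walks of length $2$ and $14$ closed walks of length $4$. Assuming $\mu_\pm=(1\pm\sqrt{5})/2$, these values agree with $2(\mu_+^{2}+\mu_-^{2})=6$ and $2(\mu_+^{4}+\mu_-^{4})=14$, the total sizes of $\vartheta_{C_{2k}}^{-1}(C_{2k})$ for $k=1,2$ given by eq.~(\ref{E:cool2}).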

1435:

1436:

1437:

1438: %%%

1439: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1440: %%%

1441: {\bf Acknowledgments.}

1442: %%%

1443: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1444: %%%

1445: We thank F.W.D.~Huang and L.C.~Zuo for helpful suggestions.

1446: This work was supported by the 973 Project, the PCSIRT Project of the

1447: Ministry of Education, the Ministry of Science and Technology, and

1448: the National Science Foundation of China.

1449:

1450:

1451:

1452: \bibliography{cm}

1453: \bibliographystyle{plain}

1454:

1455: %%%

1456: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1457: %%%

1458: %%%

1459: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1460: %%%

1461:

1462: \end{document}

1463:

1464:

1465: