1: \documentclass{article} % \submitted{10/7/05} \whohasit{}
2: \usepackage{fullpage}
\usepackage{latexsym,amsmath,amssymb,color,rotating,xspace,epic,eepic}
4: \usepackage{pstricks,pst-coil}
5: \usepackage{graphics,graphicx}
6: \input{qmac.tex}
7: \definecolor{brown}{rgb}{0.6,0.4,0.2}
8: \definecolor{purple}{rgb}{0.8,0.0,1.0}
9: \definecolor{gray}{rgb}{0.5,0.5,0.5}
10:
11: \title{Shor's Algorithm on a Nearest-Neighbor Machine}
12: \author{Samuel A. Kutin\thanks{Center for Communications Research, 805 Bunn Drive,
13: Princeton, NJ 08540. {\tt kutin@idaccr.org}}}
14: \date{} % Month Year
15:
16: \newcommand{\caps}[1]{{\sc #1}}
17: \def\SAWUNEH{\caps{sawuneh}}
18: \newcommand{\floor}[1]{\left\lfloor #1 \right\rfloor}
19: \newcommand{\ceil}[1]{\left\lceil #1 \right\rceil}
20: \newcommand{\xor}{\mathbin{\oplus}}
21: \newcommand{\xoreq}{\mathbin{\oplus\!=}}
22: \newcommand{\maj}{\mathop{\rm MAJ}\nolimits}
23: \newcommand{\raiseintable}[1]{\raisebox{1.8ex}[0cm][0cm]{#1}}
24: \newcommand{\logup}[1]{\ceil{\log_2 {#1}}}
25: \newcommand{\flog}[1]{\floor{\lg {#1}}}
26: \newcommand{\igate}[2]{{#1} \xoreq {1 \over 2} {#2}}
27: \newcommand{\jgate}[2]{{#1} \xoreq - {1 \over 2} {#2}}
28: \newcommand{\iorjgate}[2]{{#1} \xoreq \pm {1 \over 2} {#2}}
29: \newcommand{\igatealign}[2]{{#1} & \xoreq {1 \over 2} {#2}}
30: \newcommand{\jgatealign}[2]{{#1} & \xoreq - {1 \over 2} {#2}}
31: \newcommand{\qu}[1]{{\left| {#1} \right\rangle}}
32: \newcommand{\phihat}{\smash[t]{\hat{\phi}}}
33:
34: \newcommand{\DKRS}{cla}
35: \newcommand{\CDKM}{ripple}
36:
37: \newcommand{\QFT}{\caps{QFT}\xspace}
38: \begin{document}
39:
40: \maketitle
41: \begin{abstract}
42: We give a new ``nested adds'' circuit for implementing Shor's
43: algorithm in linear width and quadratic depth on a nearest-neighbor
44: machine. Our circuit combines Draper's transform adder with
45: approximation ideas of Zalka. The transform adder requires small
46: controlled rotations. We also give another version, with slightly
47: larger depth, using only reversible classical gates. We do not know
48: which version will ultimately be cheaper to implement.
49: \end{abstract}
50:
51: \section{Introduction}
52: \label{intro-sec}
53:
54: We describe a new quantum exponentiation circuit that obeys a
55: ``nearest-neighbor'' constraint: we imagine that qubits are arranged
56: in a line, and we are only allowed to perform interactions between
57: adjacent qubits. Previous $n$-bit nearest-neighbor exponentiation
58: circuits~\cite{FDH,V}
59: required either depth $O(n^3)$ or superlinear width, but our construction
60: has width $O(n)$ and depth $O(n^2)$. This new exponentiation circuit,
61: together with a nearest-neighbor quantum Fourier transform (QFT)~\cite{FDH},
62: gives a new circuit
63: for Shor's factorization algorithm~\cite{Shor}.
64:
65: A number of people have constructed exponentiation circuits for general
66: architectures (i.e., without the nearest-neighbor restriction).
67: See, for example,~\cite{VMI,VMIL,V} for recent summaries.
68: Many of the techniques used to reduce circuit depth
69: do not appear to apply to a nearest-neighbor architecture.
70:
71: Beauregard~\cite{Beau} has given a simple exponentiation
72: circuit using Draper's transform adder~\cite{Drap}. The adder requires
73: two QFTs together with some controlled rotations. Beauregard's circuit
74: uses only $2n + O(1)$ qubits, but has cubic depth---the dominant cost is
75: $\Theta(n^2)$ applications of the transform adder.
76: Fowler, Devitt, and Hollenberg~\cite{FDH} modify Beauregard's circuit for use on a
77: nearest-neighbor machine, and they show that these modifications do not
78: affect the dominant terms in the expression for size or depth.
79:
80: Our contribution is a new approximate controlled modular multiplier with
81: linear width and linear depth. We use
82: an idea of Zalka~\cite{Zalka} for building approximate multipliers.
83: While we still multiply by performing $O(n)$ additions, we only
84: perform a constant number of large QFTs for each multiply.
85: When we insert our multiplier into the framework of Fowler et al.,
86: we obtain a nearest-neighbor exponentiation circuit with linear
87: width and quadratic depth.\footnote{Zalka~\cite{Z2} has recently pointed
out this same idea of performing multiple additions framed by a single
89: QFT, but he does not work out any details or discuss the application
90: to nearest-neighbor circuits.}
91:
92: We first set some notation and review prior work in Section~\ref{prelim-sec}.
93: We describe our multiplier and the resulting exponentiator
94: in Section~\ref{main-sec}, and we discuss a version for general
95: architectures in Section~\ref{general-sec}.
96:
97: Following Fowler et al., we assume that any interaction between two
98: adjacent qubits has unit cost. In practice, some gates may be easier
99: to implement than others. Our circuit requires small controlled rotations
100: that may prove expensive. Van Meter~\cite{V} discusses the error correction
101: requirements for various adders and suggests that the transform adder may not
102: be useful in practice.
103: In Section~\ref{classical-sec}
104: we describe a version of the circuit that is essentially classical
105: and that does not require these small rotations. However, the
106: depth increases to $O(n^2 \log n)$. This is the same asymptotic
107: depth achieved by Van Meter~\cite{V}, but we require only linear width.
108:
109: \section{Preliminaries}
110: \label{prelim-sec}
111:
112: Our goal is to compute $w = g^e \bmod m$. Here $g$ and $m$
113: are $n$-bit constants, known to the classical compiler that builds
114: our circuit. The $2n$-bit exponent $e$ is in quantum
115: memory.\footnote{More generally, $e$ has length $\alpha n$, and the
116: error rate of the algorithm depends on $\alpha$. For simplicity
117: we take $\alpha = 2$.} Using
118: a standard trick (see, for example,~\cite{Beau}),
119: we can assume that only one bit of
120: $e$ at a time is stored in our quantum computer.
121:
122: Writing $e = \sum 2^i e_i$, we have
123: $$
124: w = \left(\prod_i (g^{2^i} \bmod m)^{e_i}\right) \bmod m.
125: $$
126: That is, we can decompose our exponentiation into $2n$ controlled
127: multiplications. In each case we multiply by $1$ if the controlling
128: bit $e_i$ is $0$, and by a constant if $e_i$ is 1.
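
As a small sanity check (the values $g = 2$, $m = 21$, and $e = 5$ are
chosen here purely for illustration), we have $e = 101_2$, so
$e_0 = e_2 = 1$ and $e_1 = 0$, and
$$
w = \left( (2^{1} \bmod 21)^{1} \, (2^{2} \bmod 21)^{0} \, (2^{4} \bmod 21)^{1} \right) \bmod 21
  = (2 \cdot 1 \cdot 16) \bmod 21 = 11,
$$
which agrees with $2^5 = 32 \equiv 11 \pmod{21}$.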
129:
130: In Section~\ref{prelim-mod-mult-sec}, we describe how we reduce
131: controlled modular multiplication to (roughly) $n$ controlled
132: additions. In Section~\ref{prelim-transform-adder-sec}, we describe
133: the addition routine we will use.
134:
135: We refer the reader to Fowler et al.~\cite{FDH} for
136: useful building blocks for nearest-neighbor circuits. We will use
137: their ``mesh'' circuit for interleaving two registers. We will
138: not use their controlled swap; instead, in Section~\ref{prelim-pseudo-sec}
139: we describe a simpler controlled swap for the case when one register is
140: known to be $0$.
141:
142: \subsection{Approximate Modular Multiplication}
143: \label{prelim-mod-mult-sec}
144:
145: We now present a scheme of Zalka~\cite{Zalka} for performing
146: controlled modular multiplication. We wish to compute
147: $$
148: r = abc \bmod m,
149: $$
150: where $a$ and $m$ are $n$-bit constants, $b = \sum_i 2^i b_i$ is
151: in $n$ bits of quantum memory, and $c$ is a control bit.
152: We can write
153: $$
154: r \equiv abc \equiv \sum_i 2^i a b_i c \equiv \sum_i (b_i c) \left(2^i a \bmod m\right) \pmod m.
155: $$
156: We can view this as repeated controlled modular addition; the
157: numbers $x_i = 2^i a \bmod m$ are known at compile-time, and
158: we have $n$ control bits $y_i = b_i c$.
159:
160: We define the partial sum
161: $$
s = \sum_i y_i x_i = r + qm.
163: $$
164: The sum $s$ is congruent to the answer $r \pmod m$. Also, since
165: $s < nm$, the quotient $q$ is at most $n$. In particular, we can
166: write down $q$ using only $\log_2 n$ bits.
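
To illustrate with small numbers (chosen here only as an example),
take $n = 4$, $m = 13$, $a = 5$, $b = 11 = 1011_2$, and $c = 1$. Then
$$
x_0 = 5, \quad x_1 = 10, \quad x_2 = 20 \bmod 13 = 7, \quad x_3 = 40 \bmod 13 = 1,
$$
and $y_0 = y_1 = y_3 = 1$, $y_2 = 0$, so $s = 5 + 10 + 1 = 16$.
Since $ab = 55 \equiv 3 \pmod{13}$, we have $r = 3$ and
$s - r = 13 = 1 \cdot m$, so $q = 1$.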
167:
168: Zalka's key idea is to approximate the desired answer $r$ in two
169: parallel steps. First, we compute $s$ by repeated controlled addition into
170: an $n$-bit accumulator. Second,
171: we approximate $q$: We choose some $\ell_0 = O(\log n)$, and we
172: compute $\hat{q}$ using only the $\ell_0$ high
173: bits of each $x_i$. More precisely, let $\hat{x}_i = 2^{n-\ell_0}
174: \floor{x_i/2^{n-\ell_0}}$. Then $\hat{q} = \floor{(\sum y_i \hat{x}_i) / m}$.
175: We can easily compute $\hat{q}$ in depth $O(\log^2 n)$. With
176: high probability, $\hat{q} = q$.
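
A toy example (with values chosen by us for illustration) shows how
the truncation can fail or succeed. Take $n = 4$, $m = 13$,
$(x_0, x_1, x_2, x_3) = (5, 10, 7, 1)$, and
$(y_0, y_1, y_2, y_3) = (1, 1, 0, 1)$, for which $s = 16$ and $q = 1$.
With $\ell_0 = 2$ we get
$(\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3) = (4, 8, 4, 0)$, so
$$
\sum_i y_i \hat{x}_i = 12, \qquad \hat{q} = \floor{12/13} = 0 \ne q.
$$
With $\ell_0 = 3$ we get
$(\hat{x}_0, \hat{x}_1, \hat{x}_2, \hat{x}_3) = (4, 10, 6, 0)$, so
$$
\sum_i y_i \hat{x}_i = 14, \qquad \hat{q} = \floor{14/13} = 1 = q.
$$
At this tiny size the example says nothing about the asymptotic error
rate; it only shows that discarding too many low-order bits can push
the truncated sum below a multiple of $m$.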
177:
178: Once we have $\hat{q} = \sum_i 2^i \hat{q}_i$, subtracting $\hat{q}m$
179: from $s$ can be done with $\log_2 n$ additional controlled adds into
180: our accumulator (we subtract $2^i m$ controlled by $\hat{q}_i$).
Next, we must erase $\hat{q}$; again, this takes only $O(\log^2 n)$
182: depth. So, aside from a lower-order term, the cost of controlled
183: modular multiplication is about $n$ controlled additions,
184: or, equivalently, one controlled integer multiplication.
185:
186: There are other schemes that give modular multiplication circuits
187: at a cost of three times the cost of integer multiplication (see,
188: for example,~\cite{Dhem}). So it might seem that Zalka's idea would
189: save only a constant factor. However, Zalka's idea is conceptually
190: simpler; without it, we might not have found the linear-depth
191: multiplier of Section~\ref{main-sec}.
192:
193: \subsection{The Transform Adder}
194: \label{prelim-transform-adder-sec}
195:
196: Most quantum arithmetic circuits are essentially classical in nature.
197: Draper~\cite{Drap} has given an addition circuit that is inherently
198: quantum. We briefly describe this circuit, and then discuss how to
199: adapt it to the nearest-neighbor setting.
200:
Suppose we have an $n$-bit register containing
202: $u = \sum_{j=0}^{n-1} u_j 2^j$. Then the {\QFT} maps $\qu{u}$ to
203: $$
204: \qu{\phi(u)} =
205: \frac{1}{2^{n/2}}\sum_{k=0}^{2^n - 1} e^{2 \pi i u k / 2^n} \qu{k}
206: = \bigotimes_{j=0}^{n-1} \qu{\phi_j(u)},
207: $$
208: where
209: $$
\qu{\phi_j(u)} = {1 \over \sqrt{2}} \left(\qu{0} + e^{2 \pi i u / 2^{j+1}}\qu{1}
211: \right).
212: $$
213: Note that $\qu{\phi(u)}$ is an unentangled state.
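
For instance, take $n = 2$ and $u = 1$ (an illustrative choice). The
two single-qubit factors are
$$
\qu{\phi_0(1)} = {1 \over \sqrt{2}} \left(\qu{0} - \qu{1}\right),
\qquad
\qu{\phi_1(1)} = {1 \over \sqrt{2}} \left(\qu{0} + i \qu{1}\right).
$$
If we attach $\phi_0$ to the most significant bit of $k$ (a bit-ordering
convention we assume only for this example), the product is
$$
{1 \over 2} \left(\qu{0} + i\qu{1} - \qu{2} - i\qu{3}\right),
$$
which matches the sum above, since $e^{2 \pi i k / 4}$ takes the values
$1, i, -1, -i$ for $k = 0, 1, 2, 3$.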
214:
215: Suppose we want to add $v$ to $u$. We can
replace each bit $\qu{\phi_j(u)}$ by $\qu{\phi_j(u + v)}$; this is simply a
217: $Z$-rotation by an angle of $2 \pi v / 2^{j+1}$, so we can rotate each
218: bit independently. To perform controlled addition, each of these
219: rotations is controlled by a bit $c$. We can then perform an inverse
220: {\QFT} to change $\qu{\phi(u+v)}$ to $\qu{u+v}$.
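
To see why the rotation has this effect, treat the $Z$-rotation by
angle $\theta$ as the gate that multiplies $\qu{1}$ by $e^{i\theta}$
and fixes $\qu{0}$ (our assumption about the convention). With
$\theta = 2 \pi v / 2^{j+1}$, the relative phase of bit $j$ becomes
$$
e^{2 \pi i u / 2^{j+1}} \cdot e^{2 \pi i v / 2^{j+1}}
= e^{2 \pi i (u+v) / 2^{j+1}}.
$$
Since $2^{j+1}$ divides $2^n$ for every $j < n$, this phase depends only
on $(u + v) \bmod 2^n$, so the addition is automatically modulo $2^n$.
For example, with $n = 2$, $u = 1$, and $v = 2$, bit $0$ is rotated by
$2\pi$ (no change) and bit $1$ is rotated by $\pi$, turning the phase
$i$ into $-i = e^{2 \pi i \cdot 3 / 4}$; the register then holds
$\qu{\phi(3)}$.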
221:
222: One way to view the {\QFT} is that we have moved the information about
223: $u$ into the phase of the qubits. To do a modular reduction and test
224: the high bit of $u$, we first need to perform an inverse {\QFT}.
225: So, for a naively designed modular exponentiation circuit, we perform
226: $\Theta(n^2)$ {\QFT}s and inverse {\QFT}s.
227: Our main result is
228: a circuit design with only $O(n)$ {\QFT}s.
229:
230: \begin{figure}[h]
231: \begin{center}
232: \input qft.tex
233: \end{center}
234: \caption{Quantum Fourier transform of a 4-bit register on a
235: nearest-neighbor machine. \textcircled{\scriptsize$j$}
236: %{\Large$\bigcirc$}\hspace{-11pt}$j$\hspace{5.5pt}
237: denotes a $Z$-rotation by
238: $2 \pi / 2^j$.}
239: \label{qft-fig}
240: \end{figure}
241:
242: Fowler et al.~\cite{FDH} give a nearest-neighbor
243: form of the {\QFT}. A 4-bit version is depicted
244: in Figure~\ref{qft-fig}. After each controlled rotation, we swap the
245: two bits involved, so every pair of bits can interact. (If we leave out
246: the swaps, we obtain the linear-depth {\QFT} of Moore and Nilsson~\cite{MN}.)
247: Note that we
248: assign unit cost to the controlled rotation together with the
249: accompanying swap.
250:
251: The size of this {\QFT} circuit is $n^2/2 + O(n)$. We may be able to
252: approximate the {\QFT} and skip some of the small rotations. On a
253: general machine, this reduces the size to $O(n \log n)$, but on a
254: nearest-neighbor machine we still have to perform $n \choose 2$
255: swaps.
256:
257: \subsection{Pseudo-Toffolis and Controlled Swaps}
258: \label{prelim-pseudo-sec}
259:
260: \begin{figure}[h]
261: \begin{center}
262: \input pseudo.tex
263: \end{center}
264: \caption{Pseudo-Toffoli gate $v \xoreq uw$. We also change
265: the phase when $\qu{uvw} = \qu{011}$.}
266: \label{pseudo-fig}
267: \end{figure}
268:
A frequently useful building block for our circuit is a {\em Toffoli\/}
270: gate, or doubly-controlled not: $v \xoreq uw$. A cascade of
271: Toffoli gates through a $k$-bit register has depth $2k$. However,
272: if we use the ``pseudo-Toffoli'' gate of Figure~\ref{pseudo-fig},
273: the depth of the cascade can be reduced to $k$.
274: See~\cite{BBCDMSSSW} for an equivalent pseudo-Toffoli gate.
275:
276: The idea of Figure~\ref{pseudo-fig} is that we correctly set $v$ to
277: $v \xor uw$, but we change the phase when $\qu{uvw} = \qu{011}$.
Normally this would be an unacceptable side effect, but there are
two cases in which it is harmless. First, we may plan to undo this
computation later, fixing the phase in the process. Second, we may
know that the problematic input $\qu{uvw} = \qu{011}$ cannot occur.
282:
283: \begin{figure}
284: \begin{center}
285: \input pseudo-cascade.tex
286: \end{center}
287: \caption{Swap of 4-bit registers $X$ and $Y$ controlled by $c$
288: in depth $10$. We assume that $Y$ is initialized to $0$.}
289: \label{pseudo-cascade-fig}
290: \end{figure}
291:
292: For example, suppose we want to swap two $n$-bit registers
293: $X$ and $Y$ controlled by a bit $c$. Suppose further that $Y$ is
294: initialized to $0$. Then we can build a pseudo-Toffoli cascade
295: as in Figure~\ref{pseudo-cascade-fig}. Since each Toffoli target is
296: known to be $0$, there will be no phase shift. The depth is $2n + 2$.
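
One gate sequence consistent with this description acts on each pair
of bits $(X_j, Y_j)$ as follows. This is a sketch under our own
assumptions; the exact layout of Figure~\ref{pseudo-cascade-fig} is not
reproduced here.
\begin{align*}
Y_j & \xoreq c X_j, \\
X_j & \xoreq Y_j.
\end{align*}
If $c = 0$, then $Y_j$ remains $0$ and $X_j$ is unchanged. If $c = 1$,
then $Y_j$ becomes $X_j$ and $X_j$ becomes $X_j \xor X_j = 0$, so the
contents of the two registers have been exchanged. The target $Y_j$ of
the doubly-controlled gate is $0$ when the gate is applied, so the
phase-changing input $\qu{011}$ never arises. Two gates per pair of bits
is also consistent with the $2n$ term in the stated depth.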
297:
298: \section{Nested Adds}
299: \label{main-sec}
300:
301: We now describe our main result, the ``nested adds'' multiplier.
302: We begin by describing a controlled multiplier with linear width
303: and depth; we then explain how to modify it to be a modular multiplier.
304: We conclude with an exponentiation circuit with linear width and
305: quadratic depth.
306:
307: \subsection{Nested Controlled Addition}
308: \label{main-add-sec}
309:
310: As noted in Section~\ref{prelim-mod-mult-sec}, we can view
311: controlled multiplication as repeated controlled addition.
312: In this section, we build a repeated controlled adder.
313: We have an $n$-bit
314: register $Z$, initialized to some value $z$, and an $n$-bit
315: register $Y$ of control bits $y_i$. When the circuit concludes,
316: we want $Z$ to contain $$\left(z + \sum_i x_i y_i\right) \bmod 2^n,$$ where
317: the values $x_i$ are $n$-bit constants. In the next section, we
318: will convert this circuit to a modular multiplier.
319:
320: It is clear that $n$-bit addition controlled by a single bit $y_i$
321: requires linear depth on a nearest-neighbor machine; the control
322: bit can affect all $n$ bits of $Z$, so we need linear time to
323: move (or pseudocopy) it from one end to the other. One might at
324: first think that performing $n$ controlled additions would require
325: quadratic depth. However, if we use the transform adder, we can
326: nest the additions.
327:
328: \begin{figure}[h!]
329: \begin{center}
330: \input nested.pst
331: \end{center}
332: \caption{Schematic for the ``nested adds'' repeated controlled adder.}
333: \label{nested-fig}
334: \end{figure}
335:
336: The basic structure of the circuit is depicted in Figure~\ref{nested-fig}.
337: We begin by performing the {\QFT} on $Z$, in depth $2n-3$.
338: Next, we take each bit of $Y$ successively and swap it with each
339: bit of $Z$. As we swap $Y_i$ with $Z_j$, we also rotate $Z_j$
340: controlled by $Y_i$; the rotation amount depends on $x_i$. The idea
341: is that we are adding in $x_i$ by rotating each bit of $Z$ by the
342: proper amount; all of these rotations commute, so the order is
343: unimportant. This portion has depth $2n - 1$; when it concludes,
344: we have effectively swapped the $Z$ and $Y$ registers.
345:
346: Next, we perform the inverse \QFT on $Z$. This again has depth $2n-3$.
347: Finally, we move $Y$ back to where it started in depth $2n - 1$.
348:
349: As described, the total depth would be $8n - 8$. However, as shown
350: in Figure~\ref{nested-fig}, the inverse \QFT nests nicely with the
351: swaps with $Y$. We can start the inverse \QFT at time $3n - 5$, and
352: we can start the final swaps at time $4n-2$. The total depth is only
353: $6n - 4$.
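
These counts can be checked directly, numbering time-slices from $1$
(a convention we assume here). Run sequentially, the four stages take
$$
(2n - 3) + (2n - 1) + (2n - 3) + (2n - 1) = 8n - 8
$$
time-slices. With nesting, the final swaps begin in slice $4n - 2$ and
occupy $2n - 1$ slices, so they finish in slice
$$
(4n - 2) + (2n - 1) - 1 = 6n - 4.
$$
For $n = 4$, the sequential depth is $24$ and the nested depth is $20$.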
354:
355: If we can assume $z$ is a constant, then we can replace the initial {\QFT}
356: with a single time-slice of $n$ unitary transformations\footnote{For
357: example, when $z = 0$, we apply a Hadamard to each qubit of $Z$.} on $Z$.
358: The depth is reduced to $4n - 1$. See Section~\ref{main-error-sec} for
359: the reasons
360: why we might want to allow nonzero $z$. For the remainder of this paper,
361: we will assume that $z$ is a constant, and that we can skip the initial {\QFT}.
362:
363: \subsection{Nested Controlled Modular Addition}
364: \label{main-mod-mult-sec}
365:
366: To turn the above circuit into a modular multiplier, we follow the
367: procedure described in Section~\ref{prelim-mod-mult-sec}. We
368: compute the sum $s = \sum_i y_i x_i$ congruent to the desired
369: answer $r$ modulo $m$. (Since we know our final answer has $n$ bits, we
370: need only compute the low $n$ bits of $s$.)
371: Simultaneously, we compute the approximate
372: quotient $\hat{q}$. We then subtract $\hat{q}m$ from our main register.
373: Finally, we erase $\hat{q}$.
374:
375: We compute $\hat{q}$ in an $\ell$-bit register $Q$, which we
376: locate between $Y$ and $Z$. We take $\ell = \ell_0 + \log_2 n$, so
377: we have room to write the $(n + \log_2 n)$-bit sum $\sum_i y_i \hat{x}_i$
378: (which has $0$ in the low-order $n - \ell_0$ bits).
379:
380: We need to initialize the low $\ell_0$ bits of $Q$. If we have
381: nonconstant data in $Z$, we could pseudocopy
382: $\ell_0$ bits of it to $Q$; this is
383: not expensive, but it might be costly to erase $Z$ when we are done.
384: In our case, we will initialize $Z$ to a constant $z$, and $Q$
385: to the high-order $\ell_0$ bits of $z$.
386:
387: We pass the bits of $Y$ past $Q$ and
388: then $Z$. We compute the high bits of $z + \sum_i y_i \hat{x}_i$ in $Q$,
389: and we compute $z + \sum y_i x_i \bmod 2^n$ in $Z$.
390:
391: As soon as the last $y_i$ bit has passed through $Q$, we compute $\hat{q}$.
392: For $k = \log_2 n$ down to $1$, we first subtract $2^{k-1} m$ from
393: $Q$ by doing a unary rotation on each bit. Next, we do an inverse \QFT in
394: depth at most $2\ell-1$;
395: the top bit of $Q$ is now a control bit indicating whether
396: we should have subtracted $2^{k-1} m$ or not. We label that bit $\hat{q}_k$
397: and think of it as no longer part of $Q$. We now do a \QFT on the
398: remaining bits of $Q$, and then move $\hat{q}_k$ through $Q$; this adds
$2^{k-1}m$ back if necessary, and also positions $\hat{q}_k$ to go through
$Z$.
401:
402: At step $k$, we perform an inverse \QFT on $\ell_0 + k$ bits and
403: a \QFT on $\ell_0 + k - 1$ bits, and then we move $\hat{q}_k$ through $Q$.
404: The depth is $4(\ell_0 + k) - 3$. The total depth, summing from
405: $k = 1$ to $\log_2 n$, is
406: \begin{equation}
407: \label{q-time}
2\ell^2 - 2\ell_0^2 + O(\log n) = 2 (2\ell - \log_2 n) \log_2 n + O(\log n).
409: \end{equation}
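
The sum can be worked out explicitly. Writing $L = \log_2 n$ as
shorthand (notation introduced only for this calculation), so that
$\ell = \ell_0 + L$, we have
\begin{align*}
\sum_{k=1}^{L} \bigl(4(\ell_0 + k) - 3\bigr)
  &= 4 \ell_0 L + 2L(L+1) - 3L = 2L(2\ell_0 + L) - L, \\
2\ell^2 - 2\ell_0^2
  &= 2(\ell - \ell_0)(\ell + \ell_0) = 2L(2\ell_0 + L) = 2(2\ell - L)L.
\end{align*}
So the sum agrees with the leading expression in~\eqref{q-time} up to
the lower-order term $-L$.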
410:
We use the $\hat{q}_k$ bits as control bits, subtracting $2^{k-1} m$ as
412: needed from $s$. When we are done, the answer $r$ is in $Z$. When we
413: pass the $\hat{q}_k$ bits back up, we again take time given
414: by~\eqref{q-time} to uncompute $\hat{q}$. (Alternatively, we could
415: move all of $Q$ past $Z$ and then uncompute $\hat{q}$.)
416:
417: We subtract $z$ from $Z$ after computing $r$. See
418: Section~\ref{main-error-sec} for details.
419:
The total circuit depth for repeated controlled modular addition is
421: $$
422: 4n + 4 (2\ell - \log_2 n) \log_2 n + O(\log n).
423: $$
424: The width is $2n + \ell + O(1)$.
425:
426: \subsection{Controlled Modular Multiplication}
427: \label{main-control-sec}
428:
429: So far, we have assumed that the $n$ control bits are present at the
430: start of the computation. To complete our modular multiplier, we need
431: to explain how to start from the multiplicand $b$ and
432: the overall control bit $c$ and produce the control bits $y_i = b_i c$.
433: Also, since we want an in-place multiplier, we need to explain how to
434: erase $b$ when we are done (if $c=1$).
435:
436: \begin{sidewaysfigure}
437: \begin{center}
438: \input mult.pst
439: \end{center}
440: \caption{Schematic for the ``nested adds'' controlled in-place
441: modular multiplier.}
442: \label{nested-mod-mult-fig}
443: \end{sidewaysfigure}
444:
445: It is easy to perform the desired steps in linear depth, given
446: the linear-depth out-of-place modular multiplication
447: circuit described above. The challenging part is to keep the
448: depth as low as possible. Our solution has depth
449: $$
450: 11n + 6 (2\ell - \log_2 n) \log_2 n + O(\log n),
451: $$
452: width
453: $$
454: 3n + 2\ell + 1,
455: $$
456: and size
457: $$
458: 5n^2 + O(n \log n),
459: $$
460: and is depicted in Figure~\ref{nested-mod-mult-fig}. We briefly
461: describe the basic features of the circuit.
462:
463: We have three $n$-bit registers (labeled $B$, $Y$, and $Z$),
464: two $\ell$-bit registers (labeled $Q_Y$ and $Q_Z$), and one
465: control bit $c$. Initially $B$ contains $b$ and the other
466: four registers contain $0$. When the circuit concludes,
467: $B$ contains $b$ (when $c=0$) or $ab$ (when $c=1$) and the
468: other four registers contain $0$.
469:
470: To start, we have $Q_Y$, then $B$ and $Y$ interleaved (i.e.,
471: we have $B_0$, $Y_0$, $B_1$, $Y_1$, \dots, $B_{n-1}$, $Y_{n-1}$),
472: and then $c$, $Q_Z$, and $Z$. When the circuit completes,
473: we have $Y$, then $Q_Y$, then $B$ interleaved with $Z$, then
474: $c$, and finally $Q_Z$. So, except for the location of $c$,
475: the bits have been flipped upside-down. (See
476: Section~\ref{main-exp-sec} for the reason we end with $c$ in a
477: different place.)
478:
479: We first move $c$ through the interleaved $B$ and $Y$,
480: performing controlled swaps. If the contents of $B$ and $Y$ were
481: wholly general, this process would have depth $4n$, but because
482: we know $Y$ contains $0$ we can use pseudo-Toffolis (see
483: Section~\ref{prelim-pseudo-sec}), and the depth is only $2n+2$.
484: After the controlled swaps, we unmesh $B$ and $Y$.
485:
486: Next, we multiply $Y$ by $a$ and write the result to $Z$.
487: These gates are depicted in blue in Figure~\ref{nested-mod-mult-fig}.
488: We use $Q_Z$ as a scratch register for computing $\hat{q}$. We
489: load a constant $z$ into $Z$ (and its high bits into $Q_Z$), then
490: we perform the circuit described in the previous section, and
491: finally we erase $Q_Z$ and unload the constant $z$. When this
492: portion concludes, if $c = 0$, then $B$ contains $b$ and $Y$ and
493: $Z$ contain $0$. If $c = 1$, then $B$ contains $0$, $Y$ contains
494: $b$, and $Z$ contains $ab$.
495:
496: We now perform the gates depicted in red in
497: Figure~\ref{nested-mod-mult-fig}. We undo a multiplication
498: of $Z$ by $a^{-1}$, writing the result into $Y$. The red circuit
499: is a backwards, upside-down version of the blue circuit. When
500: we are done, $Y$ contains $0$. If $c = 0$, then $B$
501: contains $b$ and $Z$ contains $0$; if $c = 1$, then $B$
502: contains $0$ and $Z$ contains $ab$.
503:
504: Finally, we mesh $B$ and $Z$ and perform the controlled swap
505: in reverse. (Again, we can use pseudo-Toffolis to reduce the
506: depth to $2n+2$.) We write $b$ or $ab$ to $B$, and we write $0$
507: to $Z$, as desired.
508:
509: Note that part of the red circuit overlaps part of the blue
510: circuit. In particular, we uncompute the first $\hat{q}$
511: while computing the second. This is why the second-order
512: term in the depth is $6 (2\ell -\log_2 n)\log_2 n$ rather than
513: $8 (2\ell -\log_2 n)\log_2 n$.
514:
515: We must swap $B$ and $Y$ before we can interleave
516: $B$ and $Z$. If our bits were arranged in a ring, we could
517: bring $B$ around from the other side; this would reduce
518: the depth by about $n$ and the size by about $n^2$. One
519: could construct a more symmetric version of
520: Figure~\ref{nested-mod-mult-fig} by moving $B$ down to the bottom
521: between the blue and red portions, but this increases the
522: size by about $n^2$ without changing the depth.
523:
524: \subsection{Exponentiation}
525: \label{main-exp-sec}
526:
527: We recall from Section~\ref{prelim-sec} that our goal is to
528: perform $2n$ controlled in-place modular multiplications. We
529: will repeatedly apply the circuit of Section~\ref{main-control-sec}.
530: Since that circuit leaves the machine ``upside-down,'' we alternate
531: between applying the circuit right-side-up and upside-down.
532:
533: Let $e_i$ denote the control bit in the $i$th round. We add one
534: additional bit to the circuit of Section~\ref{main-control-sec}.
535: Just before we start the swap of $B$ and $Z$ controlled by $e_i$, we
536: create our next control bit $e_{i+1}$. Then, as soon as we have
537: swapped two bits of the interleaved $B$ and $Z$ controlled by
538: $e_i$, we swap them again controlled by $e_{i+1}$ (viewing them
539: as $B$ and $Y$ for the next round). We can thus overlap these
540: two controlled swaps; we reduce the depth of each round to only
541: $9n + O(\log^2 n)$.
542:
543: There may be a technicality here because of the order in which we
544: perform measurements. After we are done using $e_i$, we measure
545: it, and we may need to rotate $e_{i+1}$ based on the observed
546: value of $e_i$. We will assume that this is not a problem in
547: practice. If necessary, we could generate $\Theta(\sqrt{n})$
548: control bits at a time and use them; we would still have a
549: depth of roughly $9n$ and a width of roughly $3n$.
550:
551: Our circuit has depth
552: \begin{gather*}
553: 18 n^2 + 12 n (2\ell -\log_2 n) \log_2 n + O(n \log n),
554: \intertext{width}
555: 3n + 2\ell + 2,
556: \intertext{and size}
557: 10 n^3 + O(n^2 \log n).
558: \end{gather*}
559: Here $\ell = O(\log n)$ is chosen to control
560: the error rate of our computation of $\hat{q}$. See the
561: next section for details.
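
The leading terms come from multiplying the per-round costs by the
$2n$ rounds. The per-round depth is $9n$ plus the second-order term
$6(2\ell - \log_2 n)\log_2 n$ of the multiplier, and the per-round size
is $5n^2 + O(n \log n)$, so
\begin{align*}
2n \cdot 9n &= 18 n^2, \\
2n \cdot 6 (2\ell - \log_2 n) \log_2 n &= 12 n (2\ell - \log_2 n) \log_2 n, \\
2n \cdot 5n^2 &= 10 n^3.
\end{align*}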
562:
563: \subsection{Error Analysis}
564: \label{main-error-sec}
565:
566: In this section we address two questions. First, how should
567: we choose $\ell$? Second, how does filling $Z$
568: with a random value $z$ improve our error analysis?
569:
570: We perform $4n$ modular multiplications. For each of these, we
571: add $n$ quantities to compute $\hat{q}$. There are thus $4n^2$
572: additions where we might make a mistake. Given random addends,
573: the probability of an error propagating across a window of length
574: $\ell_0$ is $2^{-\ell_0}$. Our probability of making an error
575: is therefore at most
576: $$
577: 4n^2 2^{-\ell_0} = 2^{2 \log_2 n + 2 - \ell_0}.
578: $$
579: To reduce our error probability to a constant, we should take
580: $\ell_0 = 2 \log_2 n + O(1)$, or
581: $$\ell = \ell_0 + \log_2 n = 3 \log_2 n + O(1).$$
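
For a sense of scale (the numbers here are illustrative only), take
$n = 1024$, so that $\log_2 n = 10$ and $4n^2 = 2^{22}$. Choosing
$\ell_0 = 29$ gives
$$
4n^2 2^{-\ell_0} = 2^{22 - 29} = 2^{-7} < 0.01,
$$
and the corresponding register length is $\ell = 29 + 10 = 39$.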
582:
583: What does an error rate of $\epsilon$ mean in the quantum setting?
584: Instead of attaining the desired state $\qu{\phi}$, we attain a
585: state $\qu{\phihat} = \alpha \qu{\phi} + \eta \qu{\psi}$, where
586: the error state $\qu{\psi}$ is orthogonal to
587: $\qu{\phi}$ and $|\eta|^2 \le \epsilon$.
588: A standard calculation yields that the distance between the probability
589: distributions on measurements for
590: $\qu{\phi}$ and $\qu{\phihat}$ is at most $\epsilon$.
591: Note that an error may mean that we fail to erase scratch space
592: correctly, invalidating future rounds, but this is irrelevant to
593: the analysis.
594:
595: The assumption above of ``random addends'' may not be reasonable.
596: Zalka~\cite{Zalka} discusses this problem: citing a ``private
597: objection'' by Manny Knill, Zalka writes that ``mathematically (and
598: therefore very cautiously) inclined people have questioned the
599: validity of this assumption.'' Our solution is to fill our
600: register with a random constant $z$. (We can use the same $z$ each
601: time, or we can choose a different one for each multiplication.)
602: The expected probability of an error in computing $\hat{q}$ over
603: all our choices of $z$ is the desired $\epsilon$.
604:
605: However, the constant $z$ introduces another place where errors can
606: occur. When we subtract $z$ at the end, we do not perform a modular
607: subtraction. If we ensure $z < m/2^t$, the probability of an error
608: at some point is $4n 2^{-t}$. We therefore take $t = \log_2 n +
609: O(1)$. Note that this increases $\ell_0$ to $3 \log_2 n + O(1)$
610: and $\ell$ to $4 \log_2 n + O(1)$.
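
With the same illustrative size $n = 1024$, we have $4n = 2^{12}$, so
choosing $t = 19$ gives
$$
4n 2^{-t} = 2^{12 - 19} = 2^{-7} < 0.01.
$$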
611:
612: \section{A Classical Version}
613: \label{classical-sec}
614:
615: The circuit of this paper requires numerous small controlled rotations.
616: We now show that a variant of these ideas gives a reversible classical
617: approximate exponentiation circuit with depth $O(n^2 \log n)$ and
618: size $O(n^3)$.
619:
620: We still organize exponentiation as repeated multiplication and
621: multiplication as repeated addition. On a general architecture, we
622: can attain depth $O(n^2 \log n)$ using a logarithmic-depth
623: adder~\cite{\DKRS}. On a nearest-neighbor machine, we cannot
624: perform controlled addition in sublinear depth. As in our main
625: construction, we nest different controlled additions to obtain an
626: amortized depth of $O(\log n)$ per addition.
627:
628: We return to the setting of Section~\ref{main-add-sec}. We
629: have an $n$-bit register $Z$ (initialized to some value $z$) and
630: an $n$-bit register $Y$. We wish to write to $Z$ the quantity
631: $z + \sum_i x_i y_i \bmod 2^n$; here the $y_i$s are bits of $y$ and
632: the $x_i$s are $n$-bit constants.
633:
634: We follow the general structure of Figure~\ref{nested-fig}. Since
635: we wish to build a classical circuit, we no longer perform any
636: {\QFT}s. Instead, we choose some $t = O(\log n)$, and we write
637: $k = \ceil{n/t}$. We divide $Z$ into $k$ blocks of size $t$; each
638: ``wire'' of $Z$ in Figure~\ref{nested-fig} represents a single block
$Z_j$. (Each wire of $Y$ is still a single bit $y_i$.) We also
640: divide each $x_i$ into blocks $X_i^j$ of length $t$.
641:
642: We divide this portion of the circuit into $n+k-1$ rounds. In
643: round $r$, $y_{r-j}$ crosses $Z_j$ for all $j$ (as long as $0 \le j < k$
644: and $0 \le r-j < n$). At this time, we add the number
645: $$
A_r = \sum_j y_{r-j} X_{r-j}^j 2^{tj}
647: $$
648: into $Z$. Note that
$$\sum_{r=0}^{n+k-2} A_r = \sum_{i=0}^{n-1} x_i y_i$$
650: as desired. Also note that, in round $r$, the control bit
651: $y_{r-j}$ controlling the $j$th block of $A_r$ is next to $Z_j$ in
652: memory.
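
To check this identity, substitute $i = r - j$: each pair $(i, j)$ with
$0 \le i < n$ and $0 \le j < k$ occurs in exactly one round, namely
$r = i + j$. Assuming that block $j$ of $x_i$ carries weight $2^{tj}$,
that is, $x_i = \sum_{j=0}^{k-1} X_i^j 2^{tj}$ (a $0$-indexed convention
we adopt for this check), we get
$$
\sum_r A_r = \sum_{i=0}^{n-1} \sum_{j=0}^{k-1} y_i X_i^j 2^{tj}
= \sum_{i=0}^{n-1} y_i x_i.
$$
For example, with $n = 4$, $t = 2$, and $k = 2$, the five rounds add
\begin{align*}
A_0 &= y_0 X_0^0, \\
A_1 &= y_1 X_1^0 + y_0 X_0^1 2^2, \\
A_2 &= y_2 X_2^0 + y_1 X_1^1 2^2, \\
A_3 &= y_3 X_3^0 + y_2 X_2^1 2^2, \\
A_4 &= y_3 X_3^1 2^2.
\end{align*}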
653:
654: To add $A_r$ into $Z$, we first do $k$ parallel controlled adds, one
655: for each block. We erase our work, but we write down the high bit
656: $h_j$ for each block. We hope that we correctly compute each $h_j$;
657: this requires that no carry propagate through an entire block.
658:
659: Next, we again do $k$ parallel controlled adds, but this time, for
660: the $j$th block, we use $h_{j-1}$ as an incoming carry bit. If
661: the $h_j$ bits are all correct, we correctly add $A_r$ into $Z$.
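
The failure mode can be illustrated with bit patterns of our own
choosing, taking $t = 4$ and writing each block most significant bit
first. Suppose block $Z_0 = 1011$ receives the addend $0110$; the sum is
$10001$, so $h_0 = 1$. Suppose block $Z_1 = 0111$ receives the addend
$1000$. Computed without an incoming carry, the sum is $1111$, and we
record $h_1 = 0$. With the incoming carry $h_0 = 1$, however, the block
sum becomes $10000$, so the true carry out of block $1$ is $1$. The
carry has propagated through the entire block, and the recorded $h_1$
is wrong. This requires the carry-free block sum to be all ones, which
for uniformly random data happens with probability $2^{-t}$.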
662:
663: Finally, we erase the $h_j$ bits. We compare $Z_j$ with
664: $y_{r-j} X_{r-j}^j$ to determine if an overflow occurred; if so,
665: $h_j$ must have been $1$. We then exchange each $y_{r-j}$ bit with
666: $Z_j$ to move the control bits into position for the next round.
667:
668: Each of these steps can be performed with a ripple-carry
669: adder~\cite{\CDKM}; the depth is $Ct$ for a small constant $C$. We need $2k$
670: extra bits:\ the high bits $h_j$ and one scratch bit for each
671: ripple.\footnote{We cannot use the ripple-carry adder of Takahashi
672: and Kunihiro~\cite{TK}. Their adder eliminates the scratch bit,
673: but it does not work on a nearest-neighbor machine.}
674:
675: To do modular multiplication, we use the same scheme as in our
676: main construction: we estimate $\hat{q}$ on the side. The error
677: analysis is the same. Note that we also perform $O(n^3)$
678: controlled additions of size $t$;
679: the probability that some $h_j$ bit is wrong at
680: some point is thus $O(n^3 2^{-t})$. We choose $t = O(\log n)$ to
681: reduce this probability to a small constant.
682:
683: We can use the pseudo-Toffoli
684: gates described in Section~\ref{prelim-pseudo-sec} to reduce the
685: depth. It is interesting to note that, for the ripple-carry adder,
686: we do not perform exactly the same gates when we undo the computation,
687: but the ``bad'' case for the pseudo-Toffoli happens on the forward
688: ripple if and only if it happens on the reverse ripple, so we fix
689: our phase errors correctly.
690:
691: The circuit depth is $O(n^2 \log n)$. The exact constant depends
692: on the choice of $\ell$ and $t$ and on precisely how we
693: perform the ripple-carry additions.
694:
695: \section{General Architectures}
696: \label{general-sec}
697:
698: The ``nested adds'' multiplier of Section~\ref{main-sec} can be
699: simplified in several ways if implemented on a machine without
700: a nearest-neighbor restriction:
701: \begin{itemize}
702: \item The controlled swaps at the start and end of the multiplier can
703: be performed in logarithmic depth. We fan the control bit $c$ out into
704: an empty $n$-bit register, perform $n$ parallel swaps, and fan $c$
705: back in. Note that we always have an empty $n$-bit register available.
706: \item The mesh and unmesh operations and any register swaps (all in
707: black in Figure~\ref{nested-mod-mult-fig}) are unnecessary. This
708: reduces the depth by about $n$ and the size by about $2n^2$.
709: \item The {\QFT} and inverse {\QFT} can be approximated. This does
710: not improve the depth, but the size of each decreases from
711: about $n^2 / 2$ to $O(n \log n)$.
712: \end{itemize}
713:
714: With these changes, the modular multiplier has depth $6n +
715: 6 (2\ell -\log_2 n)\log_2 n
716: + O(\log n)$, width $3n + 2\ell + 1$, and size $2n^2 + O(n \log n)$.
717: Taking $\ell = 3 \log_2 n + O(1)$ as in Section~\ref{main-error-sec},
718: we get an exponentiation circuit with depth
719: \begin{gather*}
720: 12n^2 + 60 n \log_2^2 n + O(n \log n),
721: \intertext{width}
722: 3n + 6 \log_2 n + O(1),
723: \intertext{and size}
724: 4n^3 + O(n^2 \log n).
725: \end{gather*}
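
These expressions follow by direct substitution. With
$\ell = 3 \log_2 n + O(1)$ we have $2\ell - \log_2 n = 5 \log_2 n + O(1)$,
so each multiply has depth $6n + 30 \log_2^2 n + O(\log n)$. Over the
$2n$ multiplies,
\begin{align*}
2n \left(6n + 30 \log_2^2 n\right) &= 12 n^2 + 60 n \log_2^2 n, \\
2n \cdot 2n^2 &= 4 n^3,
\end{align*}
and the width is $3n + 2\ell + O(1) = 3n + 6 \log_2 n + O(1)$.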
726:
727: We could further reduce the depth by using a parallel version of
728: the {\QFT}~\cite{CW}, but each multiply would still have depth at least
729: $5n + O(\log^2 n)$.
730: We could also consolidate the registers $Q_Y$ and $Q_Z$; we would
731: get a slight increase in depth and a slight decrease in width.
732:
733: \section*{Acknowledgements}
734: The author thanks Bob Beals, Tom Draper, and David Moulton for
735: numerous discussions.
736:
737: \bibliography{nn}
738: \bibliographystyle{alpha}
739:
740: \end{document}
741:
742:
743:
744: