1: %
2: % $Id: en3.tex,v 1.4 2004/02/04 04:50:07 shouno Exp shouno $
3: %
4: %\documentclass[11pt]{jarticle}
5: %\documentclass[11pt,twocolumn,dvipdfm]{article}
6: \documentclass[amsmath,amssymb]{revtex4}
7:
8: %\documentclass[12pt]{article}
9: %\usepackage{multicol}
10: %\usepackage{amsmath}
11: %\usepackage{amssymb}
12: %\usepackage{times}
13: %\usepackage{txfonts}
14: %
15: %\usepackage[dvips]{graphicx}
16: %\usepackage[dvipdfm]{hyperref}
17:
18: %\pagestyle{empty} %%%% No page Numbering
19:
20: %\usepackage{shouno}
21: %\usepackage{times}
22: \usepackage{graphicx}
23: \usepackage{bm}
24:
25:
26:
27: %\author{
28: %H. Shouno \\
29: %\and
30: %S. Kido \\
31: %\and
32: %M. Okada\\
33: %}
34:
35:
36: \setlength{\textheight}{9.0in}
37: \setlength{\columnsep}{0.375in}
38: \setlength{\textwidth}{6.5in} %%% Preset settings
39: %\setlength{\footheight}{0.0in}
40: \setlength{\topmargin}{-0.0625in}
41: \setlength{\headheight}{0.0in}
42: \setlength{\headsep}{0.0in}
43: \setlength{\oddsidemargin}{0.0in}
44: \setlength{\parindent}{1pc}
45: \renewcommand{\textfraction}{0.01}
46: %\renewcommand{\baselinestretch}{0.98}
47: %\renewcommand{\baselinestretch}{2.0}
48:
49: \def \tx{\tilde{x}}
50: \def \tc{\tilde{c}}
51: \def \tm{\tilde{m}}
52: \def \tM{\tilde{M}}
53: \def \tU{\tilde{U}}
54: \def \th{\tilde{h}}
55: \def \tz{\tilde{z}}
56: \def \tZ{\tilde{Z}}
57: \def \tr{\tilde{r}}
58: \def \tq{\tilde{q}}
59: \def \txi{\tilde{\xi}}
60: \def \tY{\tilde{Y}}
61: \def \tSigma{\tilde{\Sigma}}
62: \def \Prob{{\mathrm {Prob}}}
63: \def\Vec#1{\boldsymbol {{#1}}}
64:
65: \begin{document}
66: \sloppy
67:
68: \title{Analysis of Bidirectional Associative Memory using SCSNA and Statistical Neurodynamics}
69: \author{Hayaru Shouno}
70: \affiliation{Dept. of Computer Science and Systems Engineering, Faculty of Engineering, Yamaguchi University}
71: \email{shouno@ai.csse.yamaguchi-u.ac.jp}
72:
73: \author{Shoji Kido}
74: \affiliation{Dept. of Computer Science and Systems Engineering, Faculty of Engineering, Yamaguchi University}
75:
76: \author{Masato Okada}
77: \affiliation{Brain Research Institute, RIKEN}
78:
79:
80: \date{\today}
81:
82: %\maketitle
83: %\begin{verbatim}
84: %$Id: en3.tex,v 1.4 2004/02/04 04:50:07 shouno Exp shouno $
85: %\end{verbatim}
86:
87: %{{\bf Abstract}
88: %{
89: \begin{abstract}
Bidirectional associative memory (BAM) is a kind of
artificial neural network used to memorize and retrieve heterogeneous pattern pairs.
92: %
93: %Unfortunately,
Many efforts have been made to improve BAM from
the viewpoint of computer application,
but
few theoretical studies have been done.
99: %
100: We investigated the theoretical characteristics of BAM using
101: a framework of statistical-mechanical analysis.
102: To investigate the equilibrium state of BAM,
we applied self-consistent signal-to-noise analysis (SCSNA) and
obtained macroscopic parameter equations and the relative capacity.
105: % of $0.199 N$.
106: %which means the relative number of pattern pairs to be memorized and
107: %retrieved to the number of neurons $N$,
108: %
109: %Moreover,
110: %
111: Moreover, to investigate not only the equilibrium state but also
112: the retrieval process of reaching the equilibrium state,
113: we applied statistical neurodynamics to the update rule of BAM
114: and obtained evolution equations for the macroscopic parameters.
115: These evolution equations are consistent with the results of
116: SCSNA in the equilibrium state.
117: \end{abstract}
118: %}\\
119: %{\it Keywords: }{{\small BAM, SCSNA, Statistical neurodynamics}}
120: %}
121: %\vspace{0.5cm}
122:
123: %\noindent
124: \maketitle
125: \section{Introduction}
Bidirectional associative memory (BAM) \cite{Kosko88} is a kind of
associative memory model implemented as an artificial neural network.
The principal function of associative memory is
to memorize multiple patterns and to retrieve the correct one
when a pattern key is given.
131:
132: Autocorrelation associative memory (AAM),
133: sometimes called the Hopfield model \cite{Hopfield86},
134: is also a kind of associative memory.
135: AAM tries to retrieve a stored pattern when
136: a degraded pattern is given as an association key;
this type of retrieval is called homogeneous association.
138: %
139: In contrast, BAM stores multiple pattern pairs
140: and
141: tries to retrieve a complete stored pattern pair
142: when
143: a degraded piece of the pair is given as an association key.
144: % In association process, each component of a stored pattern pair
145: % drives BAM to retrieve themselves in a coordinate manner.
146: Thus, BAM is called a heterogeneous pattern association model.
147:
148:
149:
150:
151: %
152: %
In the field of neural networks,
many efforts have been made to improve BAM from the viewpoint of
computer application
\cite{Hassoun89,Simpson90,Wang90,Zhuang93,Oh94,Wang95,Wang96,Hongchi98},
but few theoretical analyses have been reported \cite{Haines88,Yanai91,Tanaka00}.
The theoretical analysis of BAM has focused on storage capacity,
that is, how many patterns can be stored in a network
consisting of $N$ neural units.
161: %
162: Yanai {\it et al.} suggested that
163: BAM can be regarded as a variation of AAM
164: in which connections are systematically reduced \cite{Yanai91}.
165: %
They also showed that the relative storage capacity of BAM,
in which a finite amount of retrieval error is allowed,
is around $0.22 N$.
Haines \& Hecht-Nielsen analyzed BAM in a similar way
and reported its absolute capacity,
in which no retrieval error is allowed,
to be $O(N/\log N)$ \cite{Haines88}.
173: %
174: Tanaka {\it et al.} analyzed BAM using a replica method
175: (see \cite{FisherHertz91}),
176: which is a statistical-mechanical analysis method,
177: and showed its relative capacity to be
178: $0.1998 N$ \cite{Tanaka00}.
179: %
180: %
These analyses mainly focused on the equilibrium state of BAM,
and
the transient process of retrieval,
that is, how the network reaches the equilibrium state,
has hardly been analyzed.
186: However, analysis of the retrieval process is as important as
187: that of the equilibrium state.
188:
189: In this paper, we have analyzed the equilibrium state of BAM using
190: the self-consistent signal-to-noise analysis (SCSNA)\cite{Shiino92}.
191: %which is known as the cavity method
192: We found that the relative capacity was $0.1998N$,
193: which agrees with the result of Tanaka {\it et al}.
194: %
We also investigated the retrieval process of BAM.
Applying statistical neurodynamics \cite{Amari88a,Okada95},
which is derived in the same manner as the SCSNA,
we obtained evolution equations for the macroscopic parameters.
In the limit where the macroscopic parameters of BAM reach the equilibrium state,
the values given by these evolution equations are consistent with the results of the SCSNA.
We also compared the results of statistical
neurodynamics with those of computer simulation
and obtained quantitative support for our analysis.
211:
212:
213:
214: %
215: %As a result,
216: %we confirmed that
217: %each macroscopic property in the limit of the transition of our dynamics
218: %reaches
219: %agreement with the result of the equilibrium state,
220: %and
221: %we also showed that the dynamical behavior of all macroscopic
222: %properties derived from the dynamics are agree with
223: %the simulation quantitatively.
224:
225: We describe the BAM formulation in Section
226: \ref{sec:formulation},
227: and we apply the SCSNA and show the results of
228: equilibrium state analysis in Section \ref{sec:SCSNA}.
In Section \ref{sec:dynamics},
we derive the evolution equations of the macroscopic parameters using
statistical neurodynamics,
and we compare the results with those of computer
simulation in Section \ref{sec:compare}.
234:
235:
236: \section{Formulation}
237: \label{sec:formulation}
238: %
239: As shown in Fig. \ref{fig:bam},
240: BAM is a two-layered neural network model \cite{Kosko88}.
241: %
242: The first layer consists of $c N$ neural units ($c\sim O(1)$),
243: and the state of the layer is denoted as $\Vec{x}$
244: with the components denoted as $x_i\:\:(1\leq i \leq cN)$.
245: %
246: The state of the second layer, which has $\tc N$ units, is denoted as
247: $\Vec{\tx}$,
248: and the $j$th unit state is described as $\tx_j \:\:(1\leq j \leq\tc N)$.
249: Each layer is connected by interlayer connection $\Vec{J}$
250: with the components described as
251: $J_{ij} \:\:(1\leq i \leq cN, \: 1 \leq j \leq \tc N)$.
252: $J_{ij}$ represents the connection weight between
253: the first layer unit $x_i$ and
254: the second layer unit ${\tx}_j$.
255:
256:
257:
258: We prepare $p$ binary pattern pairs denoted as
259: $\{\Vec{\xi}^{\mu}, \Vec{\txi}^{\mu}\}$ $(\mu = 1, \cdots, p)$,
260: where the superscript $\mu$ denotes the pattern pair index.
261: %
262: Pattern vector $\Vec{\xi}^{\mu}$ corresponds to the first layer, and
263: $\Vec{\txi}^\mu$ corresponds to the second layer.
Thus $\Vec{\xi}^{\mu}$ and $\Vec{\txi}^{\mu}$ have $c N$ and $\tc N$
components, respectively. The components
$\xi^{\mu}_i$ and ${\txi}^{\mu}_j$
$(1 \leq i \leq c N, \:\: 1 \leq j \leq \tc N)$
are independent and identically distributed,
taking the values $\pm 1$ with equal probability:
\begin{align}
\Prob[\xi_i^{\mu}=\pm 1] &= \frac{1}{2}, \label{eq:pat1}\\
\Prob[\txi_j^{\mu}=\pm 1] &= \frac{1}{2} \label{eq:pat2}.
\end{align}
275: %
Assuming the number of stored pattern pairs to be $p\sim O(N)$,
we define the loading rate as $\alpha = p/N$.
279: %
280: %The parameter $\alpha (0 \sim 1)$ controls the amount of pattern pairs
281: %to be stored in, so that $\alpha$ describes a capacity index.
282: %
283:
284:
285: To determine the interlayer connection weight $J_{ij}$, which connects
286: $x_i$ and $\tx_j$, we use a correlation-based learning rule:
287: %
288: \begin{equation}
289: J_{ij} = \frac{1}{N} \sum_{\mu=1}^{\alpha N} \xi^{\mu}_{i} \txi^{\mu}_{j}.
290: \label{eq3}
291: \end{equation}
292: %
293: %$\Vec{\xi}^{\mu}, \Vec{\txi}^{\mu} (\mu = 1, \cdots, \alpha N)$ are
294: %pattern pairs for association
295: %and the superscripts $\mu$ denotes the pattern pair index.
296: %
297: All the pattern pair correlations between $\Vec{\xi}^{\mu}$ and
298: $\Vec{\txi}^{\mu}$ are embedded in connection weight $\Vec{J}$.
299: In this notation,
300: the connections are not symmetrical, that is $J_{ij} \neq J_{ji}$.
301:
302:
303:
304:
In the retrieval phase,
306: we use a synchronous update rule for each layer;
307: that is, all units in each layer are updated synchronously,
308: and these layers are updated alternately.
309: %
310: The rules for updating the $i$th unit in the first layer
311: and the $j$th unit in the second layer are
312: %
313: \begin{align}
314: x_{i}^{2t} &= F( \sum_{j=1}^{\tc N} J_{ij} \tx_{j}^{2t-1} ), %\qquad{\mathrm{and}}
315: \label{eq:dynamics1} \\
316: \tx_{j}^{2t+1} &= F( \sum_{i=1}^{cN} J_{ij} x_{i}^{2t} )
317: \label{eq:dynamics2},
318: \end{align}
319: %
where $t$ is the discrete time step and $F(\cdot)$ is the output function.
321: %, sometimes represented by
322: %a sigmoid function such as $\tanh(\cdot)$ which is used in our simulation.
323: %We assumed $F(\cdot)$ as the differentiable function in our analysis.
324: %
In these formulations, the retrieval process is carried out as follows.
In the initial state, $t=0$,
the association key $\Vec{x}^{0} (= \{ x_i^{0} \})$ is given to the first layer.
Then,
all of the second layer units, $\tx_{j}^{1} \: (1\leq j \leq \tc N)$,
are updated using eq.(\ref{eq:dynamics2}), and
the state of the second layer is described as $\Vec{\tx}^{1}$.
Next, at $t=1$, all the units in the first layer, $x_{i}^{2} \: (1\leq i \leq c N)$,
are updated using eq.(\ref{eq:dynamics1}),
and the state is described as $\Vec{x}^{2}$.
After that, the second layer is updated by eq.(\ref{eq:dynamics2}).
For each $t = 2, 3, \cdots$, the alternate updating of the layers
is carried out in the same way,
and the layer states are denoted as $\Vec{x}^{2t}$ and $\Vec{\tx}^{2t+1}$.
339: %for all $1\leq i \leq cN$.
340: %In the next step $2t-1=1$, the whole units in the second layer
341: %$\tx_{j}^{1}$ are updated by equation (\ref{eq:dynamics2}).
342: %Then, the whole units in the first layer are updated by equation
343: %This alternate update is a characteristic of BAM.
344:
345:
To apply S/N analysis, we introduce overlaps, which measure the similarity between a layer state and a stored pattern.
The overlap between the first-layer state $\Vec{x}^{2t}$
and
the $\mu$th pattern $\Vec{\xi}^{\mu}$,
and that
between the second-layer state $\Vec{\tx}^{2t+1}$ and
$\Vec{\txi}^{\mu}$,
are defined as follows, respectively:
354: \begin{align}
355: m^{2t}_{\mu} &= \frac{1}{cN} \sum_{i=1}^{cN} x^{2t}_{i} \xi^{\mu}_{i}, \\
356: \tm^{2t+1}_{\mu} &= \frac{1}{\tc N} \sum_{j=1}^{\tc N} \tx^{2t+1}_{j} \txi^{\mu}_{j}.
357: \end{align}
358: %
Following the S/N analysis,
we decompose the argument of $F(\cdot)$ in eqs.(\ref{eq:dynamics1}) and (\ref{eq:dynamics2})
into signal and noise components.
362: Assuming the first pattern pair, $\{ \Vec{\xi}^1, \Vec{\txi}^1 \}$,
363: is retrieved,
364: the terms including overlaps $m_1$ and $\tm_1$,
365: %which mean how well the first pattern pair is retrieved
366: are signal components, {\it i.e.} $m_1$, $\tm_1$ $\sim O(1)$.
367: %
368: Using these overlaps,
369: eqs.(\ref{eq:dynamics1}) and (\ref{eq:dynamics2}) can be described as
370: \begin{align}
371: x^{2t}_{i} &= F( \tc \tm^{2t-1}_1 \xi^{1}_{i} + z^{2t-1}_i ),
372: \label{eq:ov_update1}\\
373: \tx^{2t+1}_{j} &= F( c m^{2t}_1 \txi^{1}_j + \tz^{2t}_j ),
374: \label{eq:ov_update2}
375: \end{align}
where $z^{2t-1}_i$ and $\tz^{2t}_j$ are called crosstalk noises,
which
prevent the target pair $\{ \Vec{\xi}^1, \Vec{\txi}^1\}$ from being retrieved.
These crosstalk noises are given by
382: \begin{align}
383: z^{2t-1}_i &= \frac{1}{N} \sum_{\mu=2}^{\alpha N} \sum_{j=1}^{\tc N}
384: \xi^{\mu}_i \txi^{\mu}_j \tx^{2t-1}_j,
385: \label{eq:noise1}\\
386: \tz^{2t}_j &= \frac{1}{N} \sum_{\mu=2}^{\alpha N} \sum_{i=1}^{cN}
387: \txi^{\mu}_j \xi^{\mu}_i x^{2t}_i.
388: \label{eq:noise2}
389: \end{align}
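
The decomposition above follows directly from the learning rule.
As a sketch of the intermediate step for the first layer,
substituting eq.(\ref{eq3}) into the argument of $F(\cdot)$ in
eq.(\ref{eq:dynamics1}) and using the definition of the overlap
$\tm^{2t-1}_{\mu}$ gives
\begin{align*}
\sum_{j=1}^{\tc N} J_{ij} \tx^{2t-1}_j
 &= \frac{1}{N} \sum_{\mu=1}^{\alpha N} \xi^{\mu}_i
    \sum_{j=1}^{\tc N} \txi^{\mu}_j \tx^{2t-1}_j
  = \tc \sum_{\mu=1}^{\alpha N} \tm^{2t-1}_{\mu} \xi^{\mu}_i \\
 &= \tc \tm^{2t-1}_1 \xi^{1}_i
  + \tc \sum_{\mu=2}^{\alpha N} \tm^{2t-1}_{\mu} \xi^{\mu}_i ,
\end{align*}
where the first term is the signal and the second term is the crosstalk
noise $z^{2t-1}_i$ of eq.(\ref{eq:noise1}) written in terms of the overlaps.
The second-layer expression is obtained in the same way by exchanging the
roles of the two layers.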
390:
391:
392:
393:
394:
395: \section{Equilibrium state analysis by SCSNA}
396: \label{sec:SCSNA}
397: To derive equilibrium state macroscopic parameters,
398: we use SCSNA\cite{Shiino92}, which is an extension of a naive
399: signal-to-noise (S/N) analysis.
400: Since the SCSNA treats the equilibrium states of an associative memory model,
401: we omit the index $t$ in the update rules.
402: %the time index corresponding to $t$ is negligible
403: %in the updating rules.% , eqs. (\ref{eq:ov_update1}) and (\ref{eq:ov_update2}).
404: %Therefore, each layer units satisfies
405: %\begin{align}
406: % x_{i} &= F( \sum_{j=1}^{\tc N} J_{ij} \tx_{j} ),
407: % \notag\\
408: % \tx_{j} &= F( \sum_{i=1}^{cN} J_{ij} x_{i} ),
409: % \label{eq:equilibrium1}
410: %\end{align}
411: %and we denote the vectors pair $\{
412: %\Vec{x} (= \{x_i\}),
413: %\Vec{\tx} (=\{\tx_j\})
414: % \}$
415: %as each layer's state in the equilibrium state.
416: %
417: Hence we can rewrite eqs. (\ref{eq:ov_update1}) and (\ref{eq:ov_update2}) as
418: \begin{align}
419: x_{i} &= F( \tc \tm_1 \xi^{1}_{i} + z_i ),
420: \label{eq:eq_update1}
421: \\
422: \tx_{j} &= F( c m_1 \txi^{1}_j + \tz_j ),
423: \label{eq:eq_update2}
424: \end{align}
425: respectively.
426: %
427:
428: %In a naive S/N analysis, each noise term, $z_i$ and $\tz_j$, is
429: %assumed to obey an independent identical Gaussian distribution
430: %w.r.t the site $i$ and $j$, respectively.
431: %regarded as an i.i.d random number.
432: %In contrast, the SCSNA treats $z_i$ and $\tz_j$ more precisely
433: %\cite{Shiino92}.
434: %
435: %
436: In the SCSNA, the crosstalk noise term is decomposed into
437: a systematic bias term and
438: a Gaussian noise term with $0$ mean\cite{Shiino92}.
439: %
The detailed formulas of the SCSNA are described in the Appendix.
441: %The SCSNA evaluates infinitesimal effects came from a $\mu$th pattern pair
442: %$\{ \Vec{\xi}^{\mu}, \Vec{\txi}^{\mu} \}$,
443: %and the effects are appeared as the self-dependent components of $x_i$ and $\tx_j$.
444: %%The SCSNA \cite{Shiino92} evaluates the effective self-depend components
445: %%came from the $\nu$th pattern pair,
446: %%in the crosstalk noises (\ref{eq:noise}).
447: %Taking these effects into consideration, we could derive
448: We derive
449: self-consistent equations called order parameter equations.
450: The following are the order parameter equations of BAM.
451: %
452: \small
453: \begin{align}
454: Y &= F( \tc \tm \xi +
455: \frac{\alpha \tc \tU}{1-c\tc U\tU} Y +
456: \sqrt{\alpha r} z ), \label{eq:param_from}\\
457: \tY &= F( cm \txi +
458: \frac{\alpha c U}{1-c\tc U\tU} \tY +
459: \sqrt{\alpha \tr} z ), \\
460: m &= \int Dz \: \langle \xi Y \rangle_{\xi}, \\
461: \tm &= \int Dz \langle \txi \tY \rangle_{\txi}, \\
462: q &= \int Dz \langle Y^2 \rangle_{\xi}, \\
463: \tq &= \int Dz \langle \tY^2 \rangle_{\txi}, \\
464: U &= \frac{1}{\sqrt{\alpha r}}
465: \int Dz z \langle Y \rangle_{\xi}, \\
466: \tU &= \frac{1}{\sqrt{\alpha \tr}}
467: \int Dz z \langle \tY \rangle_{\txi}, \\
468: r &= \frac{\tc}{(1-c\tc U\tU)^2} (\tq + c\tc \tU^2 q),\\
469: \tr &= \frac{c}{(1-c\tc U\tU)^2} (q + c\tc U^2 \tq).
470: \label{eq:param_to}
471: \end{align}
472: \normalsize
473: These equations are described in the manner of Shiino and Fukai \cite{Shiino92}.
474: %
$Y$ and $\tY$ represent
the effective outputs for $x_i$ and $\tx_j$, respectively.
%
The stochastic variables $\xi$ and $\txi$,
obeying eqs.(\ref{eq:pat1}) and (\ref{eq:pat2}),
correspond to the components $\xi^1_i$ and $\txi^1_j$ of the retrieved pattern pair,
and the order parameters $m$ and $\tm$ correspond to the overlaps $m_1$ and
$\tm_1$.
%
The operators $\langle\cdot\rangle_{\xi}$ and
$\langle\cdot\rangle_{\txi}$ denote
expectations over the stochastic variables $\xi$ and $\txi$, respectively,
and $\int Dz$ denotes the Gaussian integral
$\frac{1}{\sqrt{2\pi}}\int dz \exp( -\frac{z^2}{2})$.
488: %
489: %
490: %These expectations come from the substitution of the averaging
491: %operation described as
492: %\small
493: %$
494: % \frac{1}{cN} \sum_{i=1}^{cN} \rightarrow \langle\cdot\rangle_{\xi},
495: %$
496: %\normalsize
497: %and
498: %\small
499: %$
500: % \frac{1}{\tc N} \sum_{j=1}^{\tc N} \rightarrow \langle\cdot\rangle_{\txi}.
501: %$
502: %\normalsize
503: %%The stored pattern $\Vec{\xi}^{\mu}$ and $\Vec{\txi}^{\mu}$
504: %%can be considered as to be the set of stochastic
505: %%variables which come from independent and identical distribution
506: %(i.i.d.).
507: %Thus, this substitution is reasonable and proper.
508: %
509: %In eqs. (\ref{eq:param1}),
Each argument of the function $F(\cdot)$ consists of three parts.
The first terms, $\tc\tm\xi$ and $cm\txi$, come from the signal components;
the second terms, $\frac{\alpha \tc \tU}{1-c\tc U\tU} Y$ and
$\frac{\alpha c U}{1-c\tc U\tU} \tY$, represent the systematic bias of
the crosstalk noises ($z_i$, $\tz_j$) in eqs.(\ref{eq:noise1}) and
(\ref{eq:noise2});
and
each third term is a Gaussian random variable with
mean $0$ and variance $\alpha r$ or $\alpha \tr$, respectively.
520: %
521: %
522: %
523: %%
524: %Assuming each crosstalk noise except the self-dependent components
525: %, which we denote $z'_i$ and $\tz'_j$, respectively,
526: %follows an identical independent normal distribution,
527: %we can evaluate the effect of these noises with a Gaussian integral:
528: %\small
529: %$
530: % \int Dz = \frac{1}{\sqrt{2\pi}}\int dz \exp( -\frac{z^2}{2}).
531: %$
532: %\normalsize
533: %%
534: %Normal distribution is described with a mean and a variance,
535: %and each noise ($z'_i$, $\tz'_j$) follows $N(0, \alpha r)$ and
536: %%$N(0,\alpha\tr )$, respectively.
537: %%Thus the third terms can be substituted by $\sqrt {\alpha r} z$ and
538: %%$\sqrt{\alpha \tr} z$, respectively,
539: %%where the stochastic variable $z$ follows $N(0,1)$.
540: %%
541: %%
542: %%mean and the variance of each noise distribution can be derived from eqs. (\ref{eq:noise}).
543: %%Both noise distribution means are equal to $0$, and
544: %%the variances are equal to $\alpha r$ and $\alpha \tr$, respectively.
545: %%
546: %%
We solved the order parameter equations (\ref{eq:param_from})--(\ref{eq:param_to})
numerically and compared the results with those of
simulations.
Fig. \ref{fig:fig1} shows the equilibrium overlap $m$ against
the loading rate $\alpha$.
An overlap of $1$ means that
the BAM retrieves a stored pattern pair successfully.
We obtained a relative capacity of $\alpha_c = 0.1998$,
above which the nontrivial solution with $m \neq 0$ and $\tm \neq 0$ disappears.
This agrees with the result of Tanaka {\it et al.} ($\alpha_c =
0.1998$),
obtained with the replica method \cite{Tanaka00}.
In Fig. \ref{fig:fig1}, we show the simulation results as error bars,
which indicate medians and quartile deviations over ten trials.
The SCSNA results explain the simulation results well quantitatively.
562:
563:
564:
565: \section{Retrieval process of BAM}
566: \label{sec:dynamics}
567: As we have seen, the SCSNA described the equilibrium state of BAM
568: quantitatively.
In this section, we consider the retrieval process of BAM, that is,
the transient process of reaching the equilibrium state.
%
Statistical neurodynamics,
which is a theory of the retrieval process of associative memory models,
is based on S/N analysis.
Amari and Maginu proposed a statistical neurodynamical theory based on
the S/N analysis\cite{Amari88a}.
578: %which assumes that each crosstalk noise obey to
579: %an identical independent Gaussian distribution, and
580: %this is called one-step analysis.
581: %%
582: %Using the SCSNA concept,
583: %Okada improved the one-step analysis in order to evaluate crosstalk noise
584: %correlation precisely.
585: %The Okada's analysis method succeeded to explain the transient process
586: %of several neural networks quantitatively\cite{Okada95}\cite{Kawamura02}.
587: %%
588: %
589: %
It is known that
the storage capacity obtained by the Amari \& Maginu theory does not
coincide with the result of the replica theory\cite{Amit85b},
and that
the size of the basin of attraction derived from the Amari \& Maginu theory
is larger than that found in computer simulation.
Okada extended the Amari \& Maginu theory to resolve these
difficulties\cite{Okada95}
and obtained a macroscopic equation with a hierarchical structure.
In the macroscopic equation,
the first-order approximation corresponds to the Amari \& Maginu theory,
and
the higher-order approximation coincides with the replica theory.
603: %
604: %
605: %We derive the evolution equations for the macroscopic parameters of BAM
606: %in the manner of Okada \cite{Okada95}.
607: % using
608: %the statistical neurodynamics \cite{Amari88a},
609: %the concept of which corresponds to that of the SCSNA , \cite{Okada95} \cite{Kawamura02}.
610: %
611:
To apply statistical neurodynamics to BAM,
613: we evaluate the crosstalk noises (eqs.(\ref{eq:noise1}) and
614: (\ref{eq:noise2})) in eqs. (\ref{eq:ov_update1}) and
615: (\ref{eq:ov_update2}).
616: Assuming the first pattern pair, $\{\Vec{\xi}^1, \Vec{\txi}^1 \}$,
617: is retrieved,
618: we can regard the overlaps of other pattern pairs,
619: $\{m^{2t}_{\mu}, \tm^{2t+1}_{\mu}\}$ where $\mu \geq 2$,
620: as small.
621: Thus, we expand the state $x_i^{2t}$ and $\tx_j^{2t+1}$:
622: \begin{align}
623: x_i^{2t} &=
624: % F(\tc \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\mu} \xi_i^{\nu} )x_i^{2t,(\mu)}
625: x_i^{2t,(\mu)}
626: + \tc \tm^{2t-1}_\mu \xi_i^{\mu}
F'(\tc \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\nu} \xi_i^{\nu} ),
628: \label{eq:expand1}
629: \\
630: \tx_j^{2t+1} &=
631: % F(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} )
632: % + c m^{2t}_\mu \txi_j^{\mu}
633: % F'(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} )
634: % =
635: \tx_j^{2t+1, (\mu)}
636: + c m^{2t}_\mu \txi_j^{\mu}
637: F'(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} ),
638: \label{eq:expand2}
639: \end{align}
640: for $\mu \geq 2$, where
641: %$x_i^{2t,(\mu)}$ and $\tx_j^{2t+1, (\mu)}$ mean the value drawn the
642: %effect of $\mu$th pattern pair from $\tx_j^{2t+1}$ and
643: %$x_i^{2t}$, respectively,
644: %{\it i.e.}
645: $\tx_j^{2t+1, (\mu)} =
646: F(c \displaystyle \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} )$
647: and
$x_i^{2t,(\mu)} =
F(\tc \displaystyle \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\nu} \xi_i^{\nu} )$.
650: %
651: %
652: Substituting eqs. (\ref{eq:expand1}) and (\ref{eq:expand2}) into
653: eqs.(\ref{eq:noise1}) and (\ref{eq:noise2}), we obtain,
654: \begin{align}
655: z^{2t+1}_i &= \alpha \tc \tU_{2t+1} x_i^{2t,(\mu)} + Z_i^{2t+1},
656: \\
657: %
658: \tz^{2t}_j &= \alpha c U_{2t} \tx_j^{2t-1,(\mu)} + \tZ_j^{2t},
659: %
660: \end{align}
661: where
662: \begin{align}
663: Z_i^{2t+1} &= \frac{1}{N}
664: \sum_{\mu=2}^{\alpha N} \sum_{j=1}^{\tc N}
665: \xi^{\mu}_i \txi^{\mu}_j \tx^{2t+1,(\mu)}_j +
666: \frac{\tc \tU_{2t+1}}{N} \sum_{\mu=2}^{\alpha N} \sum_{k\neq i}^{c N}
667: \xi^{\mu}_i \xi^{\mu}_k x^{2t,(\mu)}_k +
668: \tc c \tU_{2t+1} U_{2t} Z^{2t-1}_i,
669: \label{eq:noise_expand1}
670: \\
671: \tZ_j^{2t} &= \frac{1}{N}
672: \sum_{\mu=2}^{\alpha N} \sum_{i=1}^{c N}
673: \txi^{\mu}_j \xi^{\mu}_i x^{2t,(\mu)}_i +
674: \frac{c U_{2t}}{N} \sum_{\mu=2}^{\alpha N} \sum_{l\neq j}^{\tc N}
675: \txi^{\mu}_j \txi^{\mu}_l \tx^{2t-1,(\mu)}_l +
676: c \tc U_{2t} \tU_{2t-1} \tZ^{2t-2}_j,
677: \label{eq:noise_expand2}
678: \end{align}
679: where
680: \begin{align}
681: \tU_{2t+1} &= \frac{1}{\tc N}
682: \sum_{j=1}^{\tc N}
683: F'(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} ) ,\\
684: U_{2t} &= \frac{1}{c N}
685: \sum_{i=1}^{c N}
F'(\tc \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\nu} \xi_i^{\nu} ).
687: \end{align}
Since $x_i^{2t,(\mu)}$ and $\tx_j^{2t-1,(\mu)}$ are almost independent
of $\xi_i^{\mu}$ and $\txi_j^{\mu}$, respectively,
$Z_i^{2t+1}$ and $\tZ_j^{2t}$ can be regarded as
independent and identically distributed Gaussian random variables,
that is, $Z_i^{2t+1} \sim N(0, \alpha r_{2t+1})$ and
$\tZ_j^{2t} \sim N(0, \alpha \tr_{2t})$.
The noise variances,
$E[(Z_i^{2t+1})^2] = \alpha r_{2t+1}$ and
$E[(\tZ_j^{2t})^2] = \alpha \tr_{2t}$,
can be written as
698: \begin{align}
699: \alpha r_{2t+1} &=
700: \alpha \tc \tq_{2t+1} +
701: \alpha c \tc^2 \tU_{2t+1}^2 q_{2t} +
702: \alpha (\tc c \tU_{2t+1} U_{2t})^2 r_{2t-1} \notag\\
703: & \:\:\:\:
704: + 2 c\tc \tU_{2t+1} U_{2t}
705: E\left[
706: Z^{2t-1}_i
707: \frac{1}{N}
708: \sum_{\mu=2}^{\alpha N} \sum_{j=1}^{\tc N}
709: \xi^{\mu}_i \txi^{\mu}_j \tx^{2t+1,(\mu)}_j
710: \right]
711: \label{eq:r_expand1}
712: %
713: ,
714: \\
715: %
716: \alpha \tr_{2t} &=
717: \alpha c q_{2t} +
718: \alpha \tc c^2 U_{2t}^2 \tq_{2t-1} +
719: \alpha (c \tc U_{2t} \tU_{2t-1})^2 \tr_{2t-2} \notag\\
720: & \:\:\:\:
721: + 2 c\tc U_{2t} \tU_{2t-1}
722: E\left[
723: \tZ^{2t-2}_j
724: \frac{1}{N}
725: \sum_{\mu=2}^{\alpha N} \sum_{i=1}^{c N}
726: \txi^{\mu}_j \xi^{\mu}_i x^{2t,(\mu)}_i
727: \right],
728: \label{eq:r_expand2}
729: \end{align}
730: where
731: \begin{align}
732: \tq_{2t+1} &= \frac{1}{\tc N}
733: \sum_{j=1}^{\tc N} \left( \tx_j^{2t+1,(\mu)} \right)^2,
734: \\
735: q_{2t} &= \frac{1}{c N}
736: \sum_{i=1}^{c N} \left( x_i^{2t,(\mu)} \right)^2.
737: \end{align}
738: The last terms in eqs.(\ref{eq:r_expand1}) and (\ref{eq:r_expand2}) are
739: determined by correlations between
740: the current state $\tx_j^{2t+1}$ and the previous state noise variable
741: $Z_i^{2t-1}$,
742: and between $x_i^{2t}$ and $\tZ_j^{2t-2}$, respectively.
743: %
Assuming that the noise variables $(n+1)$ steps earlier,
$Z_i^{2(t-n)-1}$ and $\tZ_j^{2(t-n)-2}$, have no correlation with the
current states $\tx_j^{2t+1}$ and $x_i^{2t}$, respectively,
we can expand $r_{2t+1}$ and $\tr_{2t}$ as recurrence formulas:
748: %of $Z^{2t+1}_i$ and $\tZ^{2t}$
749: %in eqs. (\ref{eq:noise_expand1}) and (\ref{eq:noise_expand2}),
750: %and we obtain
751: \begin{align}
752: r_{2t+1} &= \tc \tq_{2t+1} + c (\tc\tU_{2t+1})^2 q_{2t}
753: + (c\tc \tU_{2t+1} U_{2t})^2 r_{2t-1} \notag\\
754: &
755: +
756: 2\tc \sum_{\eta=1}^{n}
757: (c\tc)^{\eta}
758: \tq_{2t+1,2(t-\eta)+1}
759: \!\!\!\!
760: \prod_{\tau=t-\eta+1}^{t}
761: \!\!\!\!
762: \tU_{2\tau+1} U_{2\tau}
763: \notag \\
764: &
765: +
766: 2c (\tc\tU_{2t+1})^2 \sum_{\eta=1}^{n-1}
767: (c\tc)^{\eta}
768: q_{2t,2(t-\eta)}
769: \!\!\!\!\!
770: \prod_{\tau=t-\eta+1}^{t}
771: \!\!\!\!\!
772: U_{2\tau} \tU_{2\tau-1}
773: \label{eq:r_expand1d},
774: \end{align}
775: \begin{align}
\tr_{2t} &= c q_{2t} + \tc (cU_{2t})^2 \tq_{2t-1}
+ (\tc c U_{2t} \tU_{2t-1})^2 \tr_{2t-2}
778: \notag\\
779: &
780: +
781: 2 c \sum_{\eta=1}^{n}
782: (\tc c)^{\eta}
783: q_{2t,2(t-\eta)}
784: \!\!\!\!
785: \prod_{\tau=t-\eta+1}^{t}
786: \!\!\!\!
787: U_{2\tau} \tU_{2\tau-1}
788: \notag\\
789: &
790: +
791: 2 \tc (cU_{2t})^2
792: \sum_{\eta=1}^{n-1}
793: (\tc c)^{\eta}
794: \tq_{2t-1,2(t-\eta)-1}
795: \!\!\!\!\!\!\!\!
796: \prod_{\tau=t-\eta+1}^{t}
797: \!\!\!\!\!\!
798: \tU_{2\tau-1} U_{2\tau-2},
799: \label{eq:r_expand2d}
800: \end{align}
where $\tq_{2t+1,2(t-n)+1}$ denotes the cross-correlation between
the current state $\tx_j^{2t+1}$ and the state $n$ steps earlier,
$\tx_j^{2(t-n)+1}$,
and $q_{2t, 2(t-n)}$ denotes the cross-correlation between $x_i^{2t}$ and
$x_i^{2(t-n)}$.
These quantities can also be described with the macroscopic parameters of
the preceding $n$ steps.
The complete formula is described in the Appendix.
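
As a consistency check, the recurrence formula (\ref{eq:r_expand1d}) can be
evaluated at a fixed point.
The following sketch rests on assumptions that we state explicitly:
all macroscopic parameters have become time independent
($U_{2t} \to U$, $\tU_{2t+1} \to \tU$, $q_{2t} \to q$,
$\tq_{2t+1} \to \tq$, $r_{2t+1} \to r$),
the layer states no longer change, so that the cross-correlations reduce to
$\tq_{2t+1,2(t-\eta)+1} \to \tq$ and $q_{2t,2(t-\eta)} \to q$,
and $n \to \infty$ with $|x| < 1$,
where the shorthand $x = c\tc U\tU$ is introduced here only for brevity.
Under these assumptions each product over $\tau$ becomes $(U\tU)^{\eta}$,
and eq.(\ref{eq:r_expand1d}) reads
\begin{align*}
(1-x^2)\, r
 &= \left( \tc \tq + c \tc^2 \tU^2 q \right)
    \left( 1 + 2 \sum_{\eta=1}^{\infty} x^{\eta} \right)
  = \tc \left( \tq + c\tc \tU^2 q \right) \frac{1+x}{1-x}, \\
r &= \frac{\tc}{(1-c\tc U\tU)^2} \left( \tq + c\tc \tU^2 q \right),
\end{align*}
which has the same form as the SCSNA expression for $r$ in
Section \ref{sec:SCSNA}.
The analogous calculation for $\tr_{2t}$ proceeds in the same way.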
809: %
810:
811: %Assuming the self-averaging property,
812: We obtain the evolution equations
813: for macroscopic parameters as follows:
814: \begin{align}
Y^{2t} &= F( \tc \tm^{2t-1} \xi + \sqrt{\alpha r_{2t-1}} z ),
\label{eq:dyn_start}
\\
\tY^{2t+1} &= F( c m^{2t} \txi + \sqrt{\alpha \tr_{2t}} z ), \\
819: m^{2t} &= \int Dz \langle \xi Y^{2t} \rangle_{\xi}, \\
820: \tm^{2t+1} &= \int Dz \langle \txi \tY^{2t+1} \rangle_{\txi},\\
821: q_{2t} &= \int Dz \langle (Y^{2t})^2 \rangle_{\xi}, \\
822: \tq_{2t+1} &= \int Dz \langle (\tY^{2t+1})^2 \rangle_{\txi},
823: %\\
824: \end{align}
825: \begin{align}
826: U_{2t} &= \frac{1}{\sqrt{\alpha r_{2t-1}}}
827: \int Dz z \langle Y^{2t} \rangle_{\xi}, \\
828: \tU_{2t+1} &= \frac{1}{\sqrt{\alpha \tr_{2t}}}
\int Dz z \langle \tY^{2t+1} \rangle_{\txi}.
830: \label{eq:dyn_end}
831: \end{align}
832: % r_{2t+1} &= \tc (\tq_{2t+1} + c\tc \tU_{2t+1}^2 q_{2t}) \\
833: % \tr_{2t} &= c (q_{2t} + c\tc U_{2t}^2 \tq_{2t-1}).
834: % \label{eq:1step}
835: %\end{align}
In these order parameter equations,
$Y^{2t}$ and $\tY^{2t+1}$ play the roles of the unit outputs $x_i^{2t}$
and $\tx_j^{2t+1}$, respectively.
Likewise, $m^{2t}$ and $\tm^{2t+1}$ correspond to
the overlaps with the first pattern pair, $m^{2t}_1$ and $\tm^{2t+1}_1$,
which measure the degree of retrieval.
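
As an illustration, the integrals above can be carried out in closed form
in one special case.
Assume, for this sketch only, that the output function is
$F(u) = \mathrm{sgn}(u)$, that the pattern components take the values
$\pm 1$ with equal probability, and that
$Dz = \frac{dz}{\sqrt{2\pi}} e^{-z^2/2}$;
none of these choices is required by the equations above.
Then
\begin{align}
m^{2t} &= \mathrm{erf}\left(
 \frac{\tc \tm^{2t-1}}{\sqrt{2 \alpha r_{2t-1}}} \right),
\notag\\
q_{2t} &= 1,
\notag\\
U_{2t} &= \sqrt{\frac{2}{\pi \alpha r_{2t-1}}}
 \exp\left( - \frac{(\tc \tm^{2t-1})^2}{2 \alpha r_{2t-1}} \right),
\notag
\end{align}
where the last line follows from integration by parts,
$\int Dz\, z f(z) = \int Dz\, f'(z)$.
The expressions for the second layer follow by exchanging the tilded and
untilded quantities.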
842:
843:
844:
845:
846: %We first apply the one-step analysis method \cite{Amari88a}, which
847: %corresponds to the naive S/N analysis.
848: %Then, we extend the one-step analysis to the statistical neurodynamics.
849:
850: %\subsection{Analysis with one-step theory}
851: %Amari and Maginu proposed an analysis method called ``one-step theory'' \cite{Amari88a},
852: %which is based on the naive S/N analysis, and they applied it to AAM.
853: %In the one-step theory, the noise components,
854: %which corresponds to eqs.(\ref{eq:noise1}) and (\ref{eq:noise2})in each step,
855: %are assumed to be independent Gaussian noise.
856: %We applied the one-step analysis to BAM and obtained
857: %
858:
859: %The important point here is that
860: %the evolution of macroscopic parameters
861: %can be described as the recurrence formulae
862: %using one-step previous state.
863: %%these recurrence formulae can be described with the macroscopic
864: %%parameter of the one-step before state.
865: %Formally, these equations are identical to the analysis of
866: %the sequence association model, which is a variety of AAM \cite{Amari88b}.
867: %In the sequence association model,
868: %since cross correlations between sequential patterns are
869: %embedded in the connections,
870: %the stored patterns appear sequentially in each update
871: %in the successful retrieving phase.
872: %
873:
874: %We derived the critical capacity of the limit of these dynamics
875: %(\ref{eq:1step}) and obtained $\alpha_c = 0.27$, which was also
876: %suggested by Amari \cite{Amari88b}.
877: %However, the critical capacity derived from the one-step theory seems to
878: %be overestimated.
879: %
880: %This overestimation comes from the assumption that
881: %the noise components in each time step are independent Gaussian noise.
882: %%In other words, we should prcisely evaluate the parameters $r_{2t+1}$,
883: %%and $\tr_{2t}$ have no correlation to the previous state in each update.
884: %
885: %To evaluate the noise correlation in time series more accurately,
886: %we must introduce the statistical neurodynamics.
887: %% Therefore, we need to evaluate these noise correlations exactly
888: %% in accordance with the concept of SCSNA.
889: %
890: %
891: %
892: %\subsection{Analysis by statistical neurodynamics}
893: %%In the previous subsection,
894: %We pointed out above that
895: %the one-step analysis could not quantitatively explain the transient process of BAM;
896: %therefore, we introduced the statistical neurodynamics\cite{Okada95}.
897: %Just like one-step analysis is based on the naive S/N analysis,
898: %the statistical neurodynamics analysis is based on the SCSNA.
899: %
900: %This analysis evaluates the noise correlation more accurately
901: %than does the one-step analysis.
902: %
903: %Using the statistical neurodynamics, Okada described
904: %the transient process of an AAM as
905: %recurrence formulae of macroscopic parameters \cite{Okada95}.
906:
907: %To apply the statistical neurodynamics to BAM,
908: %only two parameters, $r_{2t+1}$ and $\tr_{2t}$, need to be evaluated,
909: %and
910: %the other parameters are the same as those of the one-step analysis.
911: %%There was no need to re-evaluate the other parameters
912: %%$Y_i^{2t}, \tY_j^{2t+1}, m^1_{2t}, \tm^1_{2t+1},
913: %%q_{2t}, \tq_{2t+1}, U_{2t},\tU_{2t+1}$.
914:
915:
916:
Yanai {\it et al.} \cite{Yanai91} applied the one-step analysis,
which corresponds to the theory of Amari and Maginu.
In their analysis,
the macroscopic order parameter equations for
$Y_i^{2t}$, $\tY_j^{2t+1}$, $m^{2t}_1$, $\tm^{2t+1}_1$,
$q_{2t}$, $\tq_{2t+1}$, $U_{2t}$, $\tU_{2t+1}$,
which are given in eqs.(\ref{eq:dyn_start})
to (\ref{eq:dyn_end}),
are identical to those of our analysis.
The difference lies in the evaluation of the noise variances
$r_{2t+1}$ and $\tr_{2t}$.
They ignored the noise correlation and derived these values as
\begin{align}
 r_{2t+1} &= \tc \tq_{2t+1} + c (\tc\tU_{2t+1})^2 q_{2t},
 \label{eq:yanai_r1}\\
 \tr_{2t} &= c q_{2t} + \tc (cU_{2t})^2 \tq_{2t-1}.
 \label{eq:yanai_r2}
\end{align}
In their result, the critical capacity is $\alpha_c = 0.27$,
which is larger than the value $\alpha_c = 0.1998$ obtained by both
our SCSNA analysis and the replica analysis.
This overestimation comes from neglecting the noise correlation.
940:
941:
In our analysis,
we consider the effect of the crosstalk noise correlation across the
$n$-step previous states and obtain
eqs.(\ref{eq:r_expand1}) and (\ref{eq:r_expand2}),
which include the analysis of Yanai {\it et al.} (eqs.(\ref{eq:yanai_r1}) and
(\ref{eq:yanai_r2})).
In the next section, we show that
the accuracy of the analysis improves as $n$ is increased ($n = 2, 3, \cdots$).
Hereafter, we refer to the statistical neurodynamics that takes the
previous $n$ steps into account as the ``$n$-step'' analysis.
The ``full-step'' analysis uses all the macroscopic parameters from
the initial state ($t = 0$) to the current state.
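
This inclusion can be checked directly on eq.(\ref{eq:r_expand2d}).
As a sketch, reading the one-step truncation as follows:
discard the term proportional to $\tr_{2t-2}$ and
set all cross-correlations between different time steps to zero.
Only the first two terms then survive,
\begin{align}
\tr_{2t} \simeq c q_{2t} + \tc (cU_{2t})^2 \tq_{2t-1},
\notag
\end{align}
which is the one-step form of eq.(\ref{eq:yanai_r2}).
Each discarded term carries at least one extra factor of $c\tc$ together
with a product of $U$ and $\tU$ factors,
so these terms act as successive corrections for the noise correlation.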
954:
955:
956: %
957: %In eqs.(\ref{eq:noise_expand1}) and (\ref{eq:noise_expand2}),
958: %the first two terms are also in the one-step analysis
959: %eqs. (\ref{eq:1step}).
960: %When we truncate the expansion with $n=1$,
961: %these parameters $r_{2t+1}$ and $\tr_{2t}$
962: %%are identical to those in the one-step analysis,
963: %meaning that the statistical neurodynamics includes the one-step analysis,
964: %and the residual terms describes higher order correlations.
965: %
966: %Since these cross-correlations can be denoted as the recurrence formulae
967: %using across $n$ time steps before states,
968: %the computational cost for analysis is not so much high.
969:
970:
971: %The one-step analysis is identical to the statistical neurodynamics analysis of $n=1$,
972: %and
973:
974:
\section{Results}
976: \label{sec:compare}
977: In this section,
978: we compare the results of the statistical neurodynamics
979: with those of computer simulation.
980: %
981: %First, we compare the time evolution of the macroscopic parameter $m$,
982: %which means the retrieving degree.
983: %compared the simulation results with
984: %the time evolution of a macroscopic parameter using the statistical neurodynamics.
Fig. \ref{fig:overlap} shows the time evolution of the overlap $m^{2t}_{1}$,
which measures how well the pattern $\Vec{\xi}^{1}$ is retrieved in the
first layer at time $2t$.
In each panel, the abscissa is the time step $t$ and
the ordinate is the overlap $m_1^{2t}$.
Convergence of $m_1^{2t}$ to $1.0$ means that
$\Vec{\xi}^{1}$ is retrieved successfully.
Each panel shows several evolution curves that start from
different initial overlaps $m_1^{0}$.
995:
996:
997:
Fig. \ref{fig:overlap}(a) shows the simulation results.
We set the number of neurons to $N = 10{,}000$, $c = \tc = 1$,
and the loading rate to $\alpha=0.15$.
%
The retrieval succeeded when the initial overlap was $0.4$ or larger,
and it failed when the initial overlap was $0.3$ or smaller.
%
Fig. \ref{fig:overlap}(b) shows the results of the one-step analysis,
and figs.\ref{fig:overlap}(c) to (e) show the results of
the $2$-step, $3$-step, and full-step analyses, respectively.
In the analytical results, the overlap converges to $0$ when retrieval fails
because the analysis assumes an infinite number of units ($N\rightarrow\infty$).
In the simulation results, fig.\ref{fig:overlap}(a),
the system settles into a spurious memory state when retrieval fails
because the number of units is finite ($N=10{,}000$).
Therefore, the curves starting from $0.1$ to $0.3$ can be regarded as
retrieval failures.
1016:
1017:
As shown in fig. \ref{fig:overlap}(b),
the one-step analysis predicts that retrieval succeeds when the initial
overlap is $0.3$, which disagrees with the simulation results.
%
Fig. \ref{fig:overlap}(c) shows the results for $n=2$, {\it i.e.}, the 2-step analysis.
The 2-step analysis predicts that retrieval fails when the
initial overlap is $0.3$,
which agrees with the simulation results.
%
Figs. \ref{fig:overlap}(d) and (e) show the results of the 3-step
and full-step analyses, respectively.
The two panels show similar characteristics,
and both agree with the simulation results shown in fig. \ref{fig:overlap}(a).
Since the 3-step results are very similar to the full-step results,
the 3-step analysis is a sufficient approximation of the full-step analysis.
%
In other words, for BAM retrieval it suffices to take the noise
correlations across the previous three steps into account.
1038: %
1039:
1040:
1041:
1042:
In the statistical neurodynamics analysis,
the equilibrium state is described as the limit of the transient process,
and the order parameters in this limit should be consistent with the
result of the SCSNA.
%
Fig. \ref{fig:basin} shows the memory capacity and the basin of attraction,
that is, the smallest initial overlap $m_1^0$ from which the stored
pattern can still be retrieved.
1052: %
1053: %
1054: %An advantage of the statistical neurodynamics analysis is to guess
1055: %how much the initial pattern $\Vec{x}^0$ can be degraded.
1056: %
1057: %The retrieval limit of degrading which is described by
1058: %the initial overlap $m_1^{0}$ is called basin.
1059: %
For example, in fig. \ref{fig:overlap}(d),
the retrieval succeeds when starting from $m_1^{0} = 0.4$,
while it fails when starting from $m_1^{0} = 0.3$.
The boundary of the basin of attraction therefore lies between
$m_1^{0} = 0.3$ and $0.4$ for $\alpha = 0.15$.
1065: %So that the basin exists between $m_1^{0} = 0.3$ to $0.4$
1066: %under the condition $\alpha = 0.15$.
1067: %
1068: %
1069: %In fig.\ref{fig:basin},
1070: %The abscissa axis is the capacity index $\alpha$
1071: %and the ordinate axis means the overlap $m_1$.
1072: %The solid line shows the SCSNA result.
1073: %The dashed lines are derived from the statistical neurodynamics.
1074: %
1075: %Fig.\ref{fig:basin} shows the capacity and the basin of attraction.
The dashed curves in fig.\ref{fig:basin} are derived from the
statistical neurodynamics.
For each curve,
the upper branch shows the equilibrium overlap $m_{1}^{\infty}$ after successful
retrieval and the lower branch shows the boundary of the basin of
attraction in terms of $m_{1}^{0}$.
When the initial overlap $m_{1}^{0}$ lies in the area enclosed
by a curve, retrieval succeeds;
this area therefore represents the region of successful retrieval.
%
It is clear that the one-step analysis overestimates
both the relative capacity and the basin of attraction\cite{Yanai91}.
As more previous steps are taken into account
(2-step, 3-step, $\cdots$), the theoretical estimate approaches that of the
SCSNA.
We also show the basin obtained from the simulations as
error bars in fig.\ref{fig:basin}.
The simulation results agree quantitatively with those of the statistical
neurodynamics.
1096: %In the simulation, we experimented as the neuron units $N=10,000$.
1097: %The two-step and above analyses agree with these simulation results.
1098:
1099:
1100: \section{Conclusion}
1101: \label{sec:conclusion}
1102: We derived the macroscopic parameters of BAM in the equilibrium state by
1103: using the SCSNA and obtained the critical capacity $\alpha_c$ as $0.1998$.
1104: %
1105: The results agreed with the previous results and the simulation results.
1106: %\cite{Tanaka00}.
1107: %Moreover, we confirmed that the equilibrium analysis is also agree with
1108: %the computer simulation.
1109:
1110: We also analyzed the transient process of BAM using the statistical
1111: neurodynamics and
1112: obtained the evolution equations for the macroscopic parameters.
By comparing the numerical solutions with the simulation results,
we showed that the analysis explains the transient process
with sufficient accuracy.
Therefore,
to explain the transient process of BAM quantitatively,
it is sufficient to consider the 3-step statistical neurodynamics,
which means that the crosstalk noise is effectively correlated across
the previous three steps.
1124:
1125:
1126: \small
1127: \bibliographystyle{unsrt}
1128: \bibliography{shouno}
1129:
1130: \appendix
\section{Detailed Description of the SCSNA}
1132: In Sec.\ref{sec:SCSNA}, we introduced the overlaps,
1133: \begin{align}
1134: m_{\mu} &= \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i x_i \\
1135: \tm_{\mu} &= \frac{1}{\tc N} \sum_{j=1}^{\tc N} \txi^{\mu}_j \tx_j
1136: \end{align}
1137: for each layer state.
1138: In the equilibrium state, we assumed that the first pattern pair,
1139: ($\Vec{\xi}^1$, $\Vec{\txi}^1$), is retrieved,
1140: so that overlaps for other pattern pairs are small, {\it i.e.}
1141: $m_{\mu}, \tm_{\mu} \sim O(\frac{1}{\sqrt{N}})$ where $\mu \geq 2$.
1142: %
Thus, $m_{\mu}$ can be written as
\begin{align}
 m_{\mu} &= \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i
 F( \tc \sum_{\nu=1}^{\alpha N} \xi_i^{\nu} \tm_{\nu}) \\
1147: &\sim
1148: \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i
1149: F( \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu} \tm_{\nu})
1150: +
1151: \frac{1}{cN} \sum_{i=1}^{cN} \tc\tm_{\mu}
1152: F'( \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu}\tm_{\nu} )
1153: \\
1154: &=
1155: M_{\mu} + \tc \tm_{\mu} U,
1156: \label{eq:M1}
1157: \end{align}
1158: where
1159: \begin{align}
1160: U &= \frac{1}{cN} \sum_{i=1}^{cN} F'(\tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu}\tm_{\nu})\\
1161: M_{\mu} &=
1162: \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i
1163: F( \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu} \tm_{\nu})
1164: =
1165: \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i x_i^{(\mu)}.
1166: \end{align}
Here $x_i^{(\mu)}$ denotes the value of $x_i$ from which
the effect of the $\mu$th pattern pair has been removed, {\it i.e.}, $x_i - x_i^{(\mu)} \sim O(\frac{1}{\sqrt{N}})$.
%
Similarly, $\tm_{\mu}$ can be written as
1171: \begin{align}
1172: \tm_{\mu} \sim \tM_{\mu} + c m_{\mu} \tU,
1173: \label{eq:M2}
1174: \end{align}
1175: where
1176: \begin{align}
 \tU &= \frac{1}{\tc N} \sum_{j=1}^{\tc N} F'(c \sum_{\nu \neq \mu}^{\alpha N} \txi_j^{\nu} m_{\nu})\\
 \tM_{\mu} &=
% \frac{1}{\tc N} \sum_{j=1}^{\tc N} \xi^{\mu}_j
% F( c \sum_{\nu \neq \mu}^{\alpha N} \txi_i^{\nu} m^{\nu})
% =
 \frac{1}{\tc N} \sum_{j=1}^{\tc N} \txi^{\mu}_j \tx_j^{(\mu)}.
1183: \end{align}
1184: %
Solving eqs.(\ref{eq:M1}) and (\ref{eq:M2}) for $m_{\mu}$ and $\tm_{\mu}$,
we obtain
1187: %
1188: \begin{align}
1189: m_{\mu} &= \frac{1}{1-c\tc U\tU} (M_{\mu} + \tc U \tM_{\mu}),
1190: \label{eq:app_ovlp1}\\
1191: \tm_{\mu} &= \frac{1}{1-c\tc U\tU} (\tM_{\mu} + c \tU M_{\mu}).
1192: \label{eq:app_ovlp2}
1193: \end{align}
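The elimination behind these two expressions is a single substitution.
Treating the relations in eqs.(\ref{eq:M1}) and (\ref{eq:M2}) as equalities
to leading order in $N$, which is the working assumption of this sketch,
we insert eq.(\ref{eq:M2}) into eq.(\ref{eq:M1}):
\begin{align}
 m_{\mu} &= M_{\mu} + \tc U (\tM_{\mu} + c \tU m_{\mu})
 \notag\\
 \Rightarrow\quad
 (1 - c\tc U\tU)\, m_{\mu} &= M_{\mu} + \tc U \tM_{\mu}.
 \notag
\end{align}
Dividing by $1 - c\tc U\tU$, which requires $c\tc U\tU \neq 1$,
gives eq.(\ref{eq:app_ovlp1});
inserting eq.(\ref{eq:M1}) into eq.(\ref{eq:M2}) in the same way
gives eq.(\ref{eq:app_ovlp2}).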
1194: %
Since the noise terms in eqs.(\ref{eq:eq_update1}) and (\ref{eq:eq_update2})
can be written as
$
z_i = \tc \displaystyle\sum_{\nu \geq 2}^{\alpha N} \xi_i^{\nu} \tm_{\nu}$
and
$ \tz_j = c \displaystyle \sum_{\nu \geq 2}^{\alpha N} \txi_j^{\nu} m_{\nu}$,
we substitute eqs.(\ref{eq:app_ovlp1}) and (\ref{eq:app_ovlp2}) into these noise terms
and obtain
1203: \begin{align}
1204: z_i &= \frac{\alpha \tc \tU}{1 - c\tc U\tU} x_i^{(\mu)} + Z_i, \\
1205: \tz_j &= \frac{\alpha c U}{1 - c\tc U\tU} \tx_j^{(\mu)} + \tZ_j,
1206: \end{align}
1207: where
1208: \begin{align}
1209: Z_i &= \frac{1}{N(1- c\tc U\tU)}
1210: \sum_{\nu \geq 2}^{\alpha N}
1211: \left(
1212: \tc \tU \sum_{k\neq i} \xi_i^{\nu} \xi_k^{\nu} x_k^{(\mu)} +
1213: \sum_{j=1}^{\tc N} \xi_i^{\nu} \txi_j^{\nu} \tx_j^{(\mu)}
1214: \right)
1215: \\
1216: %
1217: \tZ_j &= \frac{1}{N(1- c\tc U\tU)}
1218: \sum_{\nu \geq 2}^{\alpha N}
1219: \left(
1220: c U \sum_{l\neq j} \txi_j^{\nu} \txi_l^{\nu} \tx_l^{(\mu)} +
 \sum_{i=1}^{c N} \txi_j^{\nu} \xi_i^{\nu} x_i^{(\mu)}
1222: \right)
1223: \end{align}
1224:
We assume that $Z_i$ and $\tZ_j$ are independent and identically distributed Gaussian noise,
$Z_i \sim N(0, \alpha r)$ and $\tZ_j \sim N(0, \alpha \tr)$, and evaluate the expectations
$E[(Z_i)^2]$ and $E[(\tZ_j)^2]$.
We then obtain
1229: \begin{align}
1230: E[(Z_i)^2] &= \alpha r = \frac{\alpha \tc}{(1- c \tc U\tU)^2} (c \tc \tU^2 q + \tq), \\
1231: E[(\tZ_j)^2] &= \alpha \tr = \frac{\alpha c}{(1- c \tc U\tU)^2} (c \tc U^2 \tq + q),
1232: \end{align}
1233: where
1234: \begin{align}
1235: q &= \frac{1}{cN}\sum_{i=1}^{cN} (x_i^{(\mu)})^2 \\
1236: \tq &= \frac{1}{\tc N}\sum_{j=1}^{\tc N} (\tx_j^{(\mu)})^2
1237: \end{align}
From the self-averaging property,
we obtain the SCSNA order parameter equations given in Sec.\ref{sec:SCSNA}.
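
As a quick check of the variance expressions above, consider the setting
$c = \tc = 1$ used in the simulations and assume, for illustration only,
that the equilibrium solution is itself symmetric between the two layers,
$U = \tU$ and $q = \tq$.
The two variances then coincide,
\begin{align}
 r = \tr = \frac{(1 + U^2)\, q}{(1 - U^2)^2},
 \notag
\end{align}
so that, under this assumption, the noise variance grows without bound
as $U^2 \to 1$.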
1240: % using the solution of these equations:
1241: %\begin{align}
1242: % Y &= F(\tc \xi \tm + \frac{\alpha \tc \tU}{1- c\tc U\tU} Y + \sqrt{\alpha r} z), \\
1243: % \tY &= F(c \txi m + \frac{\alpha c U}{1- c\tc U\tU} \tY + \sqrt{\alpha \tr} z).
1244: %\end{align}
1245:
1246:
\section{Correlation with the $n$-step previous state}
To evaluate the effect of the previous crosstalk noise,
%correlation of the $n$-step previous noise ,
we must derive the correlation of a unit between its current state and
its $n$-step previous state.
These correlations are given by $\tq_{2t+1,2(t-n)+1}$ and $q_{2t,2(t-n)}$:
1253: \small
1254: \begin{align}
1255: &
1256: \tq_{2t+1,2(t-n)+1} =
1257: \notag\\
1258: &
1259: \frac{
1260: \int D\Vec{z}\exp( -\Vec{z}^{\mathrm T} \tSigma^{-1} \Vec{z})
1261: \langle \tY_{2t+1}(z_1) \tY_{2(t-n)+1}(z_2) \rangle }
1262: {2\pi\left| \tSigma \right| }
1263: %
1264: \notag\\
1265: %
1266: &
1267: q_{2t,2(t-n)} =
1268: \notag\\
1269: &
1270: \frac{
1271: \int D\Vec{\tz}\exp( -\Vec{\tz}^{\mathrm T} \Sigma^{-1} \Vec{\tz})
1272: \langle Y_{2t}(\tz_1) Y_{2(t-n)}(\tz_2) \rangle
1273: }
1274: {2\pi\left| \Sigma \right| }
1275: \end{align}
1276: \normalsize
1277: The matrices $\Sigma$, $\tSigma$ and vectors $\Vec{z},
1278: \Vec{\tz}$ are described as follows:
1279: \small
1280: \begin{align}
1281: %
1282: \tSigma &=
1283: \begin{pmatrix}
1284: \tr_{2t} & \tr_{2t,2(t-n)} \\
1285: \tr_{2t,2(t-n)} & \tr_{2(t-n)} \\
1286: \end{pmatrix}
1287: %
1288: \notag\\
1289: %
1290: \Vec{z} &=
1291: \begin{pmatrix}
1292: z_1 \\
1293: z_2
1294: \end{pmatrix}
1295: \notag\\
1296: %\end{align}
1297: %
1298: %
1299: %\begin{align}
1300: \Sigma &=
1301: \begin{pmatrix}
1302: r_{2t-1} & r_{2t-1,2(t-n)-1} \\
 r_{2t-1,2(t-n)-1} & r_{2(t-n)-1} \\
1304: \end{pmatrix}
1305: \notag\\
1306: %
1307: \Vec{\tz} &=
1308: \begin{pmatrix}
1309: \tz_1 \\
1310: \tz_2
1311: \end{pmatrix}
1312: \end{align}
1313: %\normalsize
1314: %
The diagonal components of each matrix are the noise variances
of the two states being compared.
The off-diagonal components express the noise correlation between
the current state and the $\eta$-step previous state
(with $\eta = n$ in the matrices above).
%These can be derived as:
For $\eta \geq 2$, the correlations are described as:
1321: \small
1322: \begin{align}
1323: r_{2t+1,2(t-\eta)+1} = & \tc \tq_{2t+1,2(t-\eta+1)+1}
1324: \notag\\
1325: &
1326: + c\tc \tU_{2t+1} U_{2t} r_{2t-1,2(t-\eta+1)+1},
1327: \notag\\
1328: \tr_{2t,2(t-\eta)} = & cq_{2t,2(t-\eta)}
1329: \notag\\
1330: &
1331: + c\tc U_{2t} \tU_{2t-1} \tr_{2(t-1),2(t-\eta)},
1332: \end{align}
1333: \normalsize
1334: and for $\eta \geq 3$, they are
1335: \small
1336: \begin{align}
1337: &
 r_{2t+1,2(t-\eta)+1} = \tc \tq_{2t+1,2(t-\eta)+1}
 \notag\\
 &\quad
 + c\tc^2 \tU_{2t+1} \tU_{2(t-\eta)+1}
 q_{2t,2(t-\eta)}
1343: \notag \\
1344: &\quad
1345: + \tc \sum_{\lambda=1}^{n}
1346: (c\tc)^\lambda \tq_{2(t-\lambda)+1,2(t-\eta)+1}
1347: \!\!\!\! \prod_{\tau=t-\lambda+1}^{t}
1348: \!\!\!\! \tU_{2\tau+1} U_{2\tau}
1349: \notag \\
1350: &\quad
1351: + \tc
1352: \sum_{\lambda=1}^{n-\eta}
1353: (c\tc)^\lambda \tq_{2t+1,2(t-\eta-\lambda)+1} \!\!\!\!\!\!
1354: \prod_{\tau=t-\lambda+1}^{t} \!\!\!\!\!\!
1355: \tU_{2(\tau-\eta)+1} U_{2(\tau-\eta)}
1356: %
1357: \notag \\
1358: &\quad
1359: + c\tc^2 \tU_{2t+1} \tU_{2(t-\eta)+1}
1360: \notag \\
1361: &\qquad\qquad
1362: \sum_{\lambda=1}^{n-1}(c\tc)^{\lambda}
1363: q_{2(t-\lambda),2(t-\eta)}
1364: \prod_{\tau=t-\lambda+1}^{t} U_{2\tau} \tU_{2\tau-1}
1365: %
1366: \notag \\
1367: &\quad
1368: %
1369: + c\tc^2 \tU_{2t+1} \tU_{2(t-\eta)+1}
1370: \sum_{\lambda=1}^{n-1-\eta}(c\tc)^{\lambda}
1371: q_{2t,2(t-\eta-\lambda)}
1372: \notag \\
1373: &\quad\qquad
1374: \prod_{\tau=t-\lambda+1}^{t} U_{2(\tau-\eta)} \tU_{2(\tau-\eta)-1}
1375: %
1376: \notag \\
1377: &\quad
1378: %
1379: + (c\tc)^2 \tU_{2t+1} U_{2t} \tU_{2(t-\eta)+1} U_{2(t-\eta)}
 r_{2t-1,2(t-\eta)-1}
1381: \notag
1382: \end{align}
1383: %
1384: \begin{align}
1385: %
1386: &
1387: \tr_{2t,2(t-\eta)} = cq_{2t,2(t-\eta)}
1388: \notag\\
1389: &\quad
1390: + \tc c^2 U_{2t} U_{2(t-\eta)}
1391: \tq_{2t-1,2(t-\eta)-1}
1392: \notag \\
1393: &\quad
1394: %
1395: +
1396: c \sum_{\lambda=1}^{n} (c\tc)^\lambda
1397: q_{2(t-\lambda),2(t-\eta)}
1398: \!\!\!\! \prod_{\tau=t-\lambda+1}^{t}
1399: \!\!\!\! U_{2\tau} \tU_{2\tau-1}
1400: %
1401: \notag \\
1402: &\quad
1403: %
1404: +
1405: c \sum_{\lambda=1}^{n-\eta} (c\tc)^\lambda
1406: q_{2t,2(t-\eta-\lambda)} \!\!\!\!
1407: \prod_{\tau=t-\lambda+1}^{t} \!\!\!\!
1408: U_{2(\tau-\eta)} \tU_{2(\tau-\eta)-1}
1409: %
1410: \notag \\
1411: &\quad
1412: %
1413: + {\tc}c^2 U_{2t}U_{2(t-\eta)}
1414: \notag\\
1415: & \qquad\qquad
1416: \sum_{\lambda=1}^{n-1}(c\tc)^{\lambda}
1417: \tq_{2(t-\lambda)-1,2(t-\eta)-1}
1418: \!\!\!\!\!\!
1419: \prod_{\tau=t-\lambda+1}^{t}
1420: \!\!\!\!\!\!
1421: \tU_{2\tau-1} U_{2\tau-2}
1422: \notag \\
1423: &\quad
1424: %
1425: +
1426: {\tc}c^2 U_{2t}U_{2(t-\eta)}
1427: \sum_{\lambda=1}^{n-1-\eta}(c\tc)^{\lambda}
1428: \tq_{2t-1,2(t-\eta-\lambda)-1}
1429: \notag \\
1430: &\quad\qquad
1431: \prod_{\tau=t-\lambda+1}^{t} \tU_{2(\tau-\eta)-1} U_{2(\tau-\eta)-2}
1432: %
1433: \notag \\
1434: &\quad
1435: %
1436: +
1437: (c\tc)^2 U_{2t} \tU_{2t-1} U_{2(t-\eta)} \tU_{2(t-\eta)-1}
 \tr_{2(t-1),2(t-\eta-1)}
 \notag
1440: \end{align}
1441: \normalsize
1442:
1443: \newpage
1444: \begin{figure}
1445: \begin{center}
1446: \resizebox{7.8cm}{!}{\includegraphics{bam.eps}}
1447: \caption{Network structure of BAM}
1448: \label{fig:bam}
1449: \end{center}
1450: \end{figure}
1451:
1452: \begin{figure}[t]
1453: \begin{center}
1454: \resizebox{0.8\textwidth}{!}{\includegraphics{fig1.eps}}
1455: \end{center}
 \caption{Comparison of the SCSNA results with those of the
 simulation. The horizontal axis is the loading rate $\alpha$, and the
 vertical axis is the overlap. The results of the computer simulation are
 shown as error bars, which indicate the median with the minimum and maximum values.}
1460: \label{fig:fig1}
1461: \end{figure}
1462:
1463:
1464: \begin{figure*}[t]
1465: \begin{center}
1466: \begin{tabular}{ccc}
1467: \resizebox{0.32\textwidth}{!}{\includegraphics{simseq.ps}}
1468: &
1469: \resizebox{0.32\textwidth}{!}{\includegraphics{seq1.ps}}
1470: &
1471: \resizebox{0.32\textwidth}{!}{\includegraphics{seq2.ps}}
1472: \\
1473: (a) Simulation
1474: &
1475: (b) 1-step analysis
1476: &
1477: (c) 2-step analysis
1478: \\
1479: &
1480: &
1481: \\
1482: &
1483: \resizebox{0.32\textwidth}{!}{\includegraphics{seq3.ps}}
1484: &
1485: \resizebox{0.32\textwidth}{!}{\includegraphics{fullseq.ps}}
1486: \\
1487: &
1488: (d) 3-step analysis
1489: &
1490: (e) full-step analysis
1491: \end{tabular}
1492: \end{center}
 \caption{Retrieval process in the computer simulation and in the statistical
 neurodynamics.
 The horizontal axis is the time index $t$, and the vertical axis is
 the overlap $m$.
 (a) shows the result of the computer simulation; (b) to (e) show the
 results of the statistical neurodynamics.
1499: }
1500: \label{fig:overlap}
1501: \end{figure*}
1502: %
1503:
1504: \begin{figure}
1505: \begin{center}
1506: \resizebox{0.8\textwidth}{!}{\includegraphics{dynamics.ps}}
1507: \end{center}
 \caption{Comparison of the capacity obtained by the statistical
 neurodynamics with that obtained by the
 SCSNA. The horizontal axis is the loading rate $\alpha$, and the
 vertical axis is the overlap $m$.
 The dashed curves show the results of the statistical neurodynamics.
 The results of the computer simulations are shown as error bars, which
 indicate the mean and the standard deviation.
1514: }
1515: \label{fig:basin}
1516: \end{figure}
1517:
1518:
1519:
1520: \end{document}
1521: