
1: %

2: % $Id: en3.tex,v 1.4 2004/02/04 04:50:07 shouno Exp shouno $

3: %

4: %\documentclass[11pt]{jarticle}

5: %\documentclass[11pt,twocolumn,dvipdfm]{article}

6: \documentclass[amsmath,amssymb]{revtex4}

7:

8: %\documentclass[12pt]{article}

9: %\usepackage{multicol}

10: %\usepackage{amsmath}

11: %\usepackage{amssymb}

12: %\usepackage{times}

13: %\usepackage{txfonts}

14: %

15: %\usepackage[dvips]{graphicx}

16: %\usepackage[dvipdfm]{hyperref}

17:

18: %\pagestyle{empty}                         %%%% No page Numbering

19:

20: %\usepackage{shouno}

21: %\usepackage{times}

22: \usepackage{graphicx}

23: \usepackage{bm}

24:

25:

26:

27: %\author{

28: %H. Shouno \\

29: %\and

30: %S. Kido \\

31: %\and

32: %M. Okada\\

33: %}

34:

35:

36: \setlength{\textheight}{9.0in}

37: \setlength{\columnsep}{0.375in}

38: \setlength{\textwidth}{6.5in}              %%% Preset settings

39: %\setlength{\footheight}{0.0in}

40: \setlength{\topmargin}{-0.0625in}

41: \setlength{\headheight}{0.0in}

42: \setlength{\headsep}{0.0in}

43: \setlength{\oddsidemargin}{0.0in}

44: \setlength{\parindent}{1pc}

45: \renewcommand{\textfraction}{0.01}

46: %\renewcommand{\baselinestretch}{0.98}

47: %\renewcommand{\baselinestretch}{2.0}

48:

49: \def \tx{\tilde{x}}

50: \def \tc{\tilde{c}}

51: \def \tm{\tilde{m}}

52: \def \tM{\tilde{M}}

53: \def \tU{\tilde{U}}

54: \def \th{\tilde{h}}

55: \def \tz{\tilde{z}}

56: \def \tZ{\tilde{Z}}

57: \def \tr{\tilde{r}}

58: \def \tq{\tilde{q}}

59: \def \txi{\tilde{\xi}}

60: \def \tY{\tilde{Y}}

61: \def \tSigma{\tilde{\Sigma}}

62: \def \Prob{{\mathrm {Prob}}}

63: \def\Vec#1{\boldsymbol {{#1}}}

64:

65: \begin{document}

66: \sloppy

67:

68: \title{Analysis of Bidirectional Associative Memory using SCSNA and Statistical Neurodynamics}

69: \author{Hayaru Shouno}

70: \affiliation{Dept. of Computer Science and Systems Engineering, Faculty of Engineering, Yamaguchi University}

71: \email{shouno@ai.csse.yamaguchi-u.ac.jp}

72:

73: \author{Shoji Kido}

74: \affiliation{Dept. of Computer Science and Systems Engineering, Faculty of Engineering, Yamaguchi University}

75:

76: \author{Masato Okada}

77: \affiliation{Brain Research Institute, RIKEN}

78:

79:

80: \date{\today}

81:

82: %\maketitle

83: %\begin{verbatim}

84: %$Id: en3.tex,v 1.4 2004/02/04 04:50:07 shouno Exp shouno $

85: %\end{verbatim}

86:

87: %{{\bf Abstract}

88: %{

89: \begin{abstract}

Bidirectional associative memory (BAM) is a kind of
artificial neural network used to memorize and retrieve heterogeneous pattern pairs.

92: %

93: %Unfortunately,

%BAM has been mainly studied from
Many efforts have been made to improve BAM from the
viewpoint of computer applications,
but few theoretical studies have been done.

99: %

100: We investigated the theoretical characteristics of BAM using

101: a framework of statistical-mechanical analysis.

102: To investigate the equilibrium state of BAM,

we applied self-consistent signal-to-noise analysis (SCSNA) and
obtained macroscopic parameter equations and the relative capacity.

105: % of $0.199 N$.

106: %which means the relative number of pattern pairs to be memorized and

107: %retrieved to the number of neurons $N$,

108: %

109: %Moreover,

110: %

111: Moreover, to investigate not only the equilibrium state but also

112: the retrieval process of reaching the equilibrium state,

113: we applied statistical neurodynamics to the update rule of BAM

114: and obtained evolution equations for the macroscopic parameters.

115: These evolution equations are consistent with the results of

116: SCSNA in the equilibrium state.

117: \end{abstract}

118: %}\\

119: %{\it Keywords: }{{\small BAM, SCSNA, Statistical neurodynamics}}

120: %}

121: %\vspace{0.5cm}

122:

123: %\noindent

124: \maketitle

125: \section{Introduction}

Bidirectional associative memory (BAM) \cite{Kosko88} is a kind of
associative memory model, which is a class of artificial neural networks.
The principal function of an associative memory is
to memorize multiple patterns and to retrieve the correct one
when a pattern key is given.

131:

132: Autocorrelation associative memory (AAM),

133: sometimes called the Hopfield model \cite{Hopfield86},

134: is also a kind of associative memory.

135: AAM tries to retrieve a stored pattern when

136: a degraded pattern is given as an association key;

this type of retrieval is called homogeneous association.

138: %

139: In contrast, BAM stores multiple pattern pairs

140: and

141: tries to retrieve a complete stored pattern pair

142: when

143: a degraded piece of the pair is given as an association key.

144: % In association process, each component of a stored pattern pair

145: % drives BAM to retrieve themselves in a coordinate manner.

146: Thus, BAM is called a heterogeneous pattern association model.

147:

148:

149:

150:

151: %

152: %

153: In the field of neural networks,

154: many efforts have been made to improve BAM from the viewpoint of

155: computer application \cite{Hassoun89} \cite{Simpson90} \cite{Wang90}

156: \cite{Zhuang93} \cite{Oh94} \cite{Wang95} \cite{Wang96} \cite{Hongchi98},

but few theoretical analyses have been reported \cite{Haines88} \cite{Yanai91} \cite{Tanaka00}.
The theoretical analysis of BAM has focused on storage capacity,
that is, how many patterns can be stored in a network
consisting of $N$ neural units.

161: %

162: Yanai {\it et al.} suggested that

163: BAM can be regarded as a variation of AAM

164: in which connections are systematically reduced \cite{Yanai91}.

165: %

They also showed that the relative storage capacity of BAM,
in which a finite amount of retrieval error is allowed,
is around $0.22 N$.
Haines \& Hecht-Nielsen analyzed BAM in the same way
and reported its absolute capacity,
in which no retrieval error is allowed,
to be $O(N/\log N)$ \cite{Haines88}.

173: %

174: Tanaka {\it et al.} analyzed BAM using a replica method

175: (see \cite{FisherHertz91}),

176: which is a statistical-mechanical analysis method,

177: and showed its relative capacity to be

178: $0.1998 N$ \cite{Tanaka00}.

179: %

180: %

These analyses mainly focused on the equilibrium state of BAM,
and
the transient process of retrieval,
that is, how the network reaches the equilibrium state,
has received little attention.

186: However, analysis of the retrieval process is as important as

187: that of the equilibrium state.

188:

189: In this paper, we have analyzed the equilibrium state of BAM using

190: the self-consistent signal-to-noise analysis (SCSNA)\cite{Shiino92}.

191: %which is known as the cavity method

192: We found that the relative capacity was $0.1998N$,

193: which agrees with the result of Tanaka {\it et al}.

194: %

195: We also investigated the retrieval process of BAM;

196: we derived macroscopic dynamical equations using

197: the statistical neurodynamics,

198: which was theoretically derived in the same manner as the SCSNA

199: \cite{Amari88a} \cite{Okada95},

and compared the results of the statistical neurodynamics with
those of computer simulation.

202: %

203: Applying the statistical neurodynamics to BAM,

204: we obtained the evolution equations for the macroscopic parameters.

In the limit where the macroscopic parameters of BAM
reach the equilibrium state,
we found that the solutions of these evolution equations were consistent with the results of SCSNA.

208: We also compared the results of applying the statistical

209: neurodynamics with those of the computer simulation

210: and obtained quantitative support for our analysis.

211:

212:

213:

214: %

215: %As a result,

216: %we confirmed that

217: %each macroscopic property in the limit of the transition of our dynamics

218: %reaches

219: %agreement with the result of the equilibrium state,

220: %and

221: %we also showed that the dynamical behavior of all macroscopic

222: %properties derived from the dynamics are agree with

223: %the simulation quantitatively.

224:

225: We describe the BAM formulation in Section

226: \ref{sec:formulation},

227: and we apply the SCSNA and show the results of

228: equilibrium state analysis in Section \ref{sec:SCSNA}.

In Section \ref{sec:dynamics},
we derive the evolution equations of macroscopic parameters using
the statistical neurodynamics,
and we compare the results with those of computer
simulation in Section \ref{sec:compare}.

234:

235:

236: \section{Formulation}

237: \label{sec:formulation}

238: %

239: As shown in Fig. \ref{fig:bam},

240: BAM is a two-layered neural network model \cite{Kosko88}.

241: %

242: The first layer consists of $c N$  neural units ($c\sim O(1)$),

243: and the state of the layer is denoted as $\Vec{x}$

244: with the components denoted as $x_i\:\:(1\leq i \leq cN)$.

245: %

246: The state of the second layer, which has $\tc N$ units, is denoted as

247: $\Vec{\tx}$,

248: and the $j$th unit state is described as $\tx_j \:\:(1\leq j \leq\tc N)$.

249: Each layer is connected by interlayer connection $\Vec{J}$

250: with the components described as

251: $J_{ij} \:\:(1\leq i \leq cN, \: 1 \leq j \leq \tc N)$.

252: $J_{ij}$ represents the connection weight between

253: the first layer unit $x_i$ and

254: the second layer unit ${\tx}_j$.

255:

256:

257:

258: We prepare $p$ binary pattern pairs denoted as

259: $\{\Vec{\xi}^{\mu}, \Vec{\txi}^{\mu}\}$ $(\mu = 1, \cdots, p)$,

260: where the superscript $\mu$ denotes the pattern pair index.

261: %

262: Pattern vector $\Vec{\xi}^{\mu}$ corresponds to the first layer, and

263: $\Vec{\txi}^\mu$ corresponds to the second layer.

264: Thus $\Vec{\xi}^{\mu}$ and $\Vec{\txi}^{\mu}$ have $c N$ and $\tc N$

265: components, respectively, and each component,

266: which is described as

267: $\xi^{\mu}_i$ and ${\txi}^{\mu}_j$

268: $(1 \leq i \leq c N, \:\: 1 \leq j \leq \tc N)$, respectively,

is generated
independently from the uniform binary distribution:

271: \begin{align}

272:  \Prob[\xi_i^{\mu}=\pm 1] &= \frac{1}{2}, \label{eq:pat1}\\

 \Prob[\txi_j^{\mu} = \pm 1] &=  \frac{1}{2} \label{eq:pat2}.

274: \end{align}

275: %

276: Assuming the number of stored pattern pairs to be $p\sim O(N)$,

277: we define a quantity $\alpha \left( = \frac{p}{N} \right)$,

278: and use it for the loading rate.

279: %

280: %The parameter $\alpha (0 \sim 1)$ controls the amount of pattern pairs

281: %to be stored in, so that $\alpha$ describes a capacity index.

282: %

283:

284:

285: To determine the interlayer connection weight $J_{ij}$, which connects

286: $x_i$ and $\tx_j$, we use a correlation-based learning rule:

287: %

288: \begin{equation}

289:  J_{ij} = \frac{1}{N} \sum_{\mu=1}^{\alpha N} \xi^{\mu}_{i} \txi^{\mu}_{j}.

290: \label{eq3}

291: \end{equation}

292: %

293: %$\Vec{\xi}^{\mu}, \Vec{\txi}^{\mu} (\mu = 1, \cdots, \alpha N)$ are

294: %pattern pairs for association

295: %and the superscripts $\mu$ denotes the pattern pair index.

296: %

297: All the pattern pair correlations between $\Vec{\xi}^{\mu}$ and

298: $\Vec{\txi}^{\mu}$ are embedded in connection weight $\Vec{J}$.

299: In this notation,

300: the connections are not symmetrical, that is $J_{ij} \neq J_{ji}$.

301:

302:

303:

304:

In the retrieval phase,

306: we use a synchronous update rule for each layer;

307: that is, all units in each layer are updated synchronously,

308: and these layers are updated alternately.

309: %

310: The rules for updating the $i$th unit in the first layer

311: and the $j$th unit in the second layer are

312: %

313: \begin{align}

314:   x_{i}^{2t} &= F( \sum_{j=1}^{\tc N} J_{ij} \tx_{j}^{2t-1}  ), %\qquad{\mathrm{and}}

315:   \label{eq:dynamics1} \\

316:  \tx_{j}^{2t+1} &= F( \sum_{i=1}^{cN} J_{ij} x_{i}^{2t} )

317:   \label{eq:dynamics2},

318: \end{align}

319: %

where $t$ denotes the time step and $F(\cdot)$ denotes the output function.

321: %, sometimes represented by

322: %a sigmoid function such as $\tanh(\cdot)$ which is used in our simulation.

323: %We assumed $F(\cdot)$ as the differentiable function in our analysis.

324: %

325: In these formulations, the retrieval process is carried out as follows.

In the initial state, $t=0$,
an association key $\Vec{x}^{0} (= \{ x_i^{0} \})$ is given to the first layer.
Then,
all of the second layer units, $\tx_{j}^{1} \: (1\leq j \leq \tc N)$,
are updated using eq. (\ref{eq:dynamics2}), and
the state of the second layer is described as $\Vec{\tx}^{1}$.
Next, at $t=1$, all the units in the first layer, $x_{i}^{2} \: (1\leq i \leq c N)$,
are updated using eq. (\ref{eq:dynamics1}),
and the state is described as $\Vec{x}^{2}$.
After that, the second layer is updated by eq. (\ref{eq:dynamics2}).
For each $t = 2, 3, \cdots$, the alternate updating of the layers
is carried out in the same way,
and the layer states are denoted as $\Vec{x}^{2t}$ and $\Vec{\tx}^{2t+1}$.
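The retrieval procedure described above can be sketched in a few lines of Python.
This is only a minimal illustration and not the simulation code used in this paper:
it assumes the sign function for $F(\cdot)$, small sizes ($N=200$, $c=\tc=1$, $p=5$),
and hypothetical helper names (\texttt{make\_patterns}, \texttt{make\_weights},
\texttt{update\_first}, \texttt{update\_second}, \texttt{overlap}, \texttt{retrieve}).
The overlap is taken as the normalized inner product between a layer state and a stored pattern.

```python
import random

def sign(v):
    # Assumed output function F (the text leaves F general).
    return 1 if v >= 0 else -1

def make_patterns(p, n, rng):
    # p random binary patterns of length n, each component +1 or -1 with probability 1/2.
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

def make_weights(xi, txi, N):
    # Correlation learning rule: J_ij = (1/N) * sum_mu xi^mu_i * txi^mu_j.
    n1, n2 = len(xi[0]), len(txi[0])
    return [[sum(a[i] * b[j] for a, b in zip(xi, txi)) / N
             for j in range(n2)] for i in range(n1)]

def update_second(J, x):
    # Second-layer update: tx_j = F(sum_i J_ij x_i).
    n2 = len(J[0])
    return [sign(sum(J[i][j] * x[i] for i in range(len(x)))) for j in range(n2)]

def update_first(J, tx):
    # First-layer update: x_i = F(sum_j J_ij tx_j).
    return [sign(sum(w * s for w, s in zip(row, tx))) for row in J]

def overlap(state, pattern):
    # Normalized inner product between a layer state and a stored pattern.
    return sum(s * q for s, q in zip(state, pattern)) / len(state)

def retrieve(J, x0, sweeps=5):
    # Alternate synchronous updates: second layer first, then first layer.
    x = list(x0)
    for _ in range(sweeps):
        tx = update_second(J, x)
        x = update_first(J, tx)
    return x, tx

rng = random.Random(0)
N, c, ct, p = 200, 1.0, 1.0, 5
xi = make_patterns(p, int(c * N), rng)
txi = make_patterns(p, int(ct * N), rng)
J = make_weights(xi, txi, N)
# Association key: first stored pattern with roughly 10% of its components flipped.
key = [-s if rng.random() < 0.1 else s for s in xi[0]]
x, tx = retrieve(J, key)
m, mt = overlap(x, xi[0]), overlap(tx, txi[0])
print("overlaps:", m, mt)
```

At this low loading rate ($\alpha = p/N = 0.025$) the degraded key is expected to relax to the stored pair, so both overlaps should be close to $1$.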

339: %for all $1\leq i \leq cN$.

340: %In the next step $2t-1=1$, the whole units in the second layer

341: %$\tx_{j}^{1}$ are updated by equation (\ref{eq:dynamics2}).

342: %Then, the whole units in the first layer are updated by equation

343: %This alternate update is a characteristic of BAM.

344:

345:

To apply S/N analysis, we introduce overlaps, which measure the similarity between a layer state and a stored pattern.
The overlap between the first-layer state $\Vec{x}^{2t}$
and
the $\mu$th pattern, $\Vec{\xi}^{\mu}$,
and
the overlap between the second-layer state $\Vec{\tx}^{2t+1}$ and
$\Vec{\txi}^{\mu}$
are defined as follows, respectively:

354: \begin{align}

355:  m^{2t}_{\mu}   &= \frac{1}{cN} \sum_{i=1}^{cN} x^{2t}_{i} \xi^{\mu}_{i}, \\

356:  \tm^{2t+1}_{\mu} &= \frac{1}{\tc N} \sum_{j=1}^{\tc N} \tx^{2t+1}_{j} \txi^{\mu}_{j}.

357: \end{align}

358: %

Following the S/N analysis,
we decompose the argument of $F(\cdot)$ in eqs. (\ref{eq:dynamics1}) and (\ref{eq:dynamics2})
into signal and noise components.

362: Assuming the first pattern pair, $\{ \Vec{\xi}^1, \Vec{\txi}^1 \}$,

363: is retrieved,

364: the terms including overlaps $m_1$ and $\tm_1$,

365: %which mean how well the first pattern pair is retrieved

366: are signal components, {\it i.e.}  $m_1$, $\tm_1$ $\sim O(1)$.

367: %

368: Using these overlaps,

369: eqs.(\ref{eq:dynamics1}) and (\ref{eq:dynamics2}) can be described as

370: \begin{align}

371:  x^{2t}_{i} &= F(  \tc \tm^{2t-1}_1 \xi^{1}_{i} + z^{2t-1}_i ),

372:  \label{eq:ov_update1}\\

373:  \tx^{2t+1}_{j} &= F( c m^{2t}_1 \txi^{1}_j + \tz^{2t}_j ),

374:  \label{eq:ov_update2}

375: \end{align}

where $z^{2t-1}_i$ and $\tz^{2t}_j$ are called crosstalk noises,
%which are the effects from other pattern pairs,
%$\{ \Vec{\xi}^{\mu}, \Vec{\txi}^{\mu}\}$ $(\mu=2, \cdots, \alpha N)$.
which
prevent the target pair $\{ \Vec{\xi}^1, \Vec{\txi}^1\}$ from being retrieved.
These crosstalk noises are given by

382: \begin{align}

383:  z^{2t-1}_i &= \frac{1}{N} \sum_{\mu=2}^{\alpha N} \sum_{j=1}^{\tc N}

384:  \xi^{\mu}_i \txi^{\mu}_j \tx^{2t-1}_j,

385:  \label{eq:noise1}\\

386:  \tz^{2t}_j &= \frac{1}{N} \sum_{\mu=2}^{\alpha N} \sum_{i=1}^{cN}

387:  \txi^{\mu}_j \xi^{\mu}_i x^{2t}_i.

388:  \label{eq:noise2}

389: \end{align}
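As a consistency check, this decomposition follows directly from the learning rule (\ref{eq3}) and the definition of the overlaps.
A short derivation for the first-layer input, using only the definitions given above (the second-layer case is analogous), reads
\begin{align*}
 \sum_{j=1}^{\tc N} J_{ij} \tx^{2t-1}_j
 &= \frac{1}{N} \sum_{\mu=1}^{\alpha N} \xi^{\mu}_i
    \sum_{j=1}^{\tc N} \txi^{\mu}_j \tx^{2t-1}_j
  = \tc \sum_{\mu=1}^{\alpha N} \xi^{\mu}_i \tm^{2t-1}_{\mu} \\
 &= \tc \tm^{2t-1}_1 \xi^{1}_i
  + \tc \sum_{\mu=2}^{\alpha N} \xi^{\mu}_i \tm^{2t-1}_{\mu},
\end{align*}
where the last sum equals $z^{2t-1}_i$ in eq. (\ref{eq:noise1}).
The crosstalk noise is thus a weighted sum of the overlaps with the non-retrieved pattern pairs.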

390:

391:

392:

393:

394:

395: \section{Equilibrium state analysis by SCSNA}

396: \label{sec:SCSNA}

397: To derive equilibrium state macroscopic parameters,

398: we use SCSNA\cite{Shiino92}, which is an extension of a naive

399: signal-to-noise (S/N) analysis.

400: Since the SCSNA treats the equilibrium states of an associative memory model,

401: we omit the index $t$ in the update rules.

402: %the time index corresponding to $t$ is negligible

403: %in the updating rules.% , eqs. (\ref{eq:ov_update1}) and (\ref{eq:ov_update2}).

404: %Therefore, each layer units satisfies

405: %\begin{align}

406: %  x_{i} &= F( \sum_{j=1}^{\tc N} J_{ij} \tx_{j}  ),

407: %  \notag\\

408: %  \tx_{j} &= F( \sum_{i=1}^{cN} J_{ij} x_{i} ),

409: %  \label{eq:equilibrium1}

410: %\end{align}

411: %and we denote the vectors pair $\{

412: %\Vec{x} (= \{x_i\}),

413: %\Vec{\tx} (=\{\tx_j\})

414: % \}$

415: %as each layer's state in the equilibrium state.

416: %

417: Hence we can rewrite eqs. (\ref{eq:ov_update1})  and (\ref{eq:ov_update2}) as

418: \begin{align}

419:  x_{i} &= F(  \tc \tm_1 \xi^{1}_{i} + z_i ),

420:  \label{eq:eq_update1}

421:  \\

422:  \tx_{j} &= F( c m_1 \txi^{1}_j + \tz_j ),

423:  \label{eq:eq_update2}

424: \end{align}

425: respectively.

426: %

427:

428: %In a naive S/N analysis, each noise term, $z_i$ and $\tz_j$, is

429: %assumed to obey an independent identical Gaussian distribution

430: %w.r.t the site $i$ and $j$, respectively.

431: %regarded as an i.i.d random number.

432: %In contrast, the SCSNA treats $z_i$ and $\tz_j$ more precisely

433: %\cite{Shiino92}.

434: %

435: %

436: In the SCSNA, the crosstalk noise term is decomposed into

437: a systematic bias term and

438: a Gaussian noise term with $0$ mean\cite{Shiino92}.

439: %

The detailed formulas of SCSNA are described in the appendix.

441: %The SCSNA  evaluates infinitesimal effects came from a $\mu$th pattern pair

442: %$\{ \Vec{\xi}^{\mu}, \Vec{\txi}^{\mu} \}$,

443: %and the effects are appeared as the self-dependent components of $x_i$ and $\tx_j$.

444: %%The SCSNA \cite{Shiino92} evaluates the effective self-depend components

445: %%came from the $\nu$th pattern pair,

446: %%in the crosstalk noises (\ref{eq:noise}).

447: %Taking these effects into consideration, we could derive

448: We derive

449: self-consistent equations called order parameter equations.

450: The following are the order parameter equations of BAM.

451: %

452: \small

453: \begin{align}

454:  Y &= F( \tc \tm \xi  +

455:   \frac{\alpha \tc \tU}{1-c\tc U\tU} Y +

456:   \sqrt{\alpha r} z ), \label{eq:param_from}\\

457:  \tY &= F( cm \txi +

458:   \frac{\alpha c U}{1-c\tc U\tU} \tY +

459:   \sqrt{\alpha \tr} z ), \\

460:  m &= \int Dz \: \langle \xi Y \rangle_{\xi}, \\

461:  \tm &= \int Dz \langle \txi \tY \rangle_{\txi}, \\

462:  q &= \int Dz \langle Y^2 \rangle_{\xi}, \\

463:  \tq &= \int Dz \langle \tY^2 \rangle_{\txi}, \\

464:  U &= \frac{1}{\sqrt{\alpha r}}

465:   \int Dz z \langle Y \rangle_{\xi}, \\

466:  \tU &= \frac{1}{\sqrt{\alpha \tr}}

467:   \int Dz z \langle \tY \rangle_{\txi}, \\

468:  r &= \frac{\tc}{(1-c\tc U\tU)^2} (\tq + c\tc \tU^2 q),\\

469:  \tr &= \frac{c}{(1-c\tc U\tU)^2} (q + c\tc U^2 \tq).

470:   \label{eq:param_to}

471: \end{align}

472: \normalsize

473: These equations are described in the manner of Shiino and Fukai \cite{Shiino92}.

474: %

$Y$ and $\tY$ represent
%equilibrium states $x_i$ and $\tx_j$, respectively.
the effective outputs for $x_i$ and $\tx_j$, respectively.
%
The stochastic variables $\xi$ and $\txi$,
obeying eqs. (\ref{eq:pat1}) and (\ref{eq:pat2}),
correspond to the components $\xi^1_i$ and $\txi^1_j$ of the retrieved pattern pair,
and the order parameters $m$ and $\tm$ correspond to the overlaps $m_1$ and
$\tm_1$.

484: %

Note that the operators $\langle\cdot\rangle_{\xi}$ and
$\langle\cdot\rangle_{\txi}$ denote the
expectations over the stochastic variables $\xi$ and $\txi$, respectively.
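To make the integrals concrete, the following worked example may help.
Here $\int Dz$ denotes the Gaussian measure,
$\int Dz = \frac{1}{\sqrt{2\pi}} \int dz \exp( -\frac{z^2}{2} )$.
As an illustration only, assume the sign function $F(u) = \mathrm{sgn}(u)$ (the analysis itself leaves $F$ general)
and neglect the term proportional to $Y$ inside $F(\cdot)$, which is a simplification made for this example.
Under these assumptions the first-layer integrals take closed forms:
\begin{align*}
 m &= \int Dz \: \langle \xi \, \mathrm{sgn}( \tc \tm \xi + \sqrt{\alpha r} z ) \rangle_{\xi}
    = \mathrm{erf}\left( \frac{\tc \tm}{\sqrt{2 \alpha r}} \right), \\
 q &= 1, \\
 U &= \frac{1}{\sqrt{\alpha r}} \int Dz \: z \,
      \langle \mathrm{sgn}( \tc \tm \xi + \sqrt{\alpha r} z ) \rangle_{\xi}
    = \sqrt{\frac{2}{\pi \alpha r}}
      \exp\left( - \frac{(\tc \tm)^2}{2 \alpha r} \right).
\end{align*}
The second-layer quantities $\tm$, $\tq$ and $\tU$ follow by exchanging $c \leftrightarrow \tc$, $m \leftrightarrow \tm$ and $r \leftrightarrow \tr$.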

488: %

489: %

490: %These expectations come from the substitution of the averaging

491: %operation described as

492: %\small

493: %$

494: % \frac{1}{cN} \sum_{i=1}^{cN}  \rightarrow \langle\cdot\rangle_{\xi},

495: %$

496: %\normalsize

497: %and

498: %\small

499: %$

500: % \frac{1}{\tc N} \sum_{j=1}^{\tc N}  \rightarrow \langle\cdot\rangle_{\txi}.

501: %$

502: %\normalsize

503: %%The stored pattern $\Vec{\xi}^{\mu}$ and $\Vec{\txi}^{\mu}$

504: %%can be considered as to be the set of stochastic

505: %%variables which come from independent and identical distribution

506: %(i.i.d.).

507: %Thus, this substitution is reasonable and proper.

508: %

509: %In eqs. (\ref{eq:param1}),

Each argument of the function $F(\cdot)$ consists of three parts.
The first terms, $\tc\tm\xi$ and $cm\txi$, come from the signal components;
the second terms, $\frac{\alpha \tc \tU}{1-c\tc U\tU} Y$ and
$\frac{\alpha c U}{1-c\tc U\tU} \tY$, represent the systematic bias of
the crosstalk noises ($z_i$,  $\tz_j$) in eqs. (\ref{eq:noise1}) and
(\ref{eq:noise2});
and
%each third term comes from the other crosstalk noise components.
each third term is a Gaussian random variable with
$0$ mean and variance $\alpha r$ or $\alpha \tr$.
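The structure of these self-consistent equations can be illustrated with a small numerical sketch in Python.
It is not the solver used for the results reported here and rests on several assumptions:
$F(\cdot)$ is taken to be the sign function, so that $q=\tq=1$ and the Gaussian integrals reduce to an error function and a Gaussian density;
the term proportional to $Y$ (or $\tY$) inside $F(\cdot)$ is neglected for simplicity;
and a damped fixed-point iteration started from perfect retrieval is used.
The function name \texttt{solve\_scsna\_sign} and its parameters are hypothetical.

```python
import math

def solve_scsna_sign(alpha, c=1.0, ct=1.0, iters=2000, damp=0.5):
    """Damped fixed-point iteration of the order parameter equations.

    Assumptions (not from the text): F = sign, self-coupling term neglected.
    Returns the overlaps (m, mt) of the first and second layer.
    """
    m, mt = 1.0, 1.0   # start from perfect retrieval
    r, rt = ct, c      # noise variance factors for U = Ut = 0, q = qt = 1
    for _ in range(iters):
        s, st = math.sqrt(alpha * r), math.sqrt(alpha * rt)  # noise std devs
        h, ht = ct * mt, c * m                               # signal amplitudes
        # Overlaps: Gaussian average of xi * sign(h * xi + s * z).
        m_new = math.erf(h / (math.sqrt(2.0) * s))
        mt_new = math.erf(ht / (math.sqrt(2.0) * st))
        # Susceptibilities: (1/s) * Gaussian average of z * sign(h * xi + s * z).
        U = math.sqrt(2.0 / math.pi) / s * math.exp(-h * h / (2.0 * s * s))
        Ut = math.sqrt(2.0 / math.pi) / st * math.exp(-ht * ht / (2.0 * st * st))
        # Noise variance factors with q = qt = 1.
        denom = (1.0 - c * ct * U * Ut) ** 2
        r_new = ct * (1.0 + c * ct * Ut * Ut) / denom
        rt_new = c * (1.0 + c * ct * U * U) / denom
        # Damped update to suppress oscillations of the iteration.
        m = (1.0 - damp) * m + damp * m_new
        mt = (1.0 - damp) * mt + damp * mt_new
        r = (1.0 - damp) * r + damp * r_new
        rt = (1.0 - damp) * rt + damp * rt_new
    return m, mt

for alpha in (0.05, 0.5):
    print(alpha, solve_scsna_sign(alpha))
```

Scanning $\alpha$ with such a routine and recording where the solution with $m \neq 0$ is lost gives an estimate of the capacity under the stated assumptions; the value obtained this way depends on those assumptions and on the numerical tolerance.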

520: %

521: %

522: %

523: %%

524: %Assuming each crosstalk noise except the self-dependent components

525: %, which we denote $z'_i$ and $\tz'_j$, respectively,

526: %follows an identical independent normal distribution,

527: %we can evaluate the effect of these noises with a Gaussian integral:

528: %\small

529: %$

530: % \int Dz  = \frac{1}{\sqrt{2\pi}}\int dz \exp( -\frac{z^2}{2}).

531: %$

532: %\normalsize

533: %%

534: %Normal distribution is described with a mean and a variance,

535: %and each noise ($z'_i$, $\tz'_j$) follows $N(0, \alpha r)$ and

536: %%$N(0,\alpha\tr )$,  respectively.

537: %%Thus the third terms can be substituted by $\sqrt {\alpha r} z$ and

538: %%$\sqrt{\alpha \tr} z$, respectively,

539: %%where the stochastic variable $z$ follows $N(0,1)$.

540: %%

541: %%

542: %%mean and the variance of each noise distribution can be derived from eqs. (\ref{eq:noise}).

543: %%Both noise distribution means are equal to $0$, and

544: %%the variances are equal to $\alpha r$ and $\alpha \tr$, respectively.

545: %%

546: %%

547: We solved the order parameter equations from (\ref{eq:param_from}) to

548: (\ref{eq:param_to}) numerically and compared the results with those of

549: simulations.

Fig. \ref{fig:fig1} shows the equilibrium overlap $m$ against
the loading rate $\alpha$.
An overlap of $1$ means that
the BAM retrieves a stored pattern pair successfully.
We obtained a relative capacity $\alpha_c$ of $0.1998$,
above which the nontrivial solution $m \neq 0$ and $\tm \neq 0$ disappears.

556: This agrees with the results of Tanaka {\it et al.} ($\alpha_c =

557: 0.1998$),

558: obtained with the replica method \cite{Tanaka00}.

In Fig. \ref{fig:fig1}, we show the simulation results as error bars,
which indicate the medians and quartile deviations over ten trials.
The SCSNA results quantitatively explain the simulation results very well.

562:

563:

564:

565: \section{Retrieval process of BAM}

566: \label{sec:dynamics}

567: As we have seen, the SCSNA described the equilibrium state of BAM

568: quantitatively.

In this section, we consider the retrieval process of BAM, that is,
the transient process of reaching the equilibrium state.
%
Statistical neurodynamics,
which is a theory of the retrieval process of associative memory models,
%an analysis method for transient process of neural networks,
is based on S/N analysis.

Amari and Maginu proposed a statistical neurodynamical theory based on
the S/N analysis\cite{Amari88a}.

578: %which assumes that each crosstalk noise obey to

579: %an identical independent Gaussian distribution, and

580: %this is called one-step analysis.

581: %%

582: %Using the SCSNA concept,

583: %Okada improved the one-step analysis in order to evaluate crosstalk noise

584: %correlation precisely.

585: %The Okada's analysis method succeeded to explain the transient process

586: %of several neural networks quantitatively\cite{Okada95}\cite{Kawamura02}.

587: %%

588: %

589: %

It is known that
the storage capacity obtained by the Amari \& Maginu theory does not
coincide with the result of the replica theory\cite{Amit85b},
and that
the size of the basin of attraction derived from the Amari \& Maginu theory
is larger than that found in computer simulation.
Okada extended the Amari \& Maginu theory to resolve these
difficulties\cite{Okada95}
and obtained a macroscopic equation that has a hierarchical structure.
In the macroscopic equation,
the first-order approximation corresponds to the Amari \& Maginu theory,
and
the higher-order approximation coincides with the replica theory.

603: %

604: %

605: %We derive the evolution equations for the macroscopic parameters of BAM

606: %in the manner of Okada \cite{Okada95}.

607: % using

608: %the statistical neurodynamics \cite{Amari88a},

609: %the concept of which corresponds to that of the SCSNA , \cite{Okada95} \cite{Kawamura02}.

610: %

611:

To apply the statistical neurodynamics to BAM,

613: we evaluate the crosstalk noises (eqs.(\ref{eq:noise1}) and

614: (\ref{eq:noise2})) in eqs. (\ref{eq:ov_update1}) and

615: (\ref{eq:ov_update2}).

616: Assuming the first pattern pair, $\{\Vec{\xi}^1,  \Vec{\txi}^1 \}$,

617: is retrieved,

618: we can regard the overlaps of other pattern pairs,

619: $\{m^{2t}_{\mu}, \tm^{2t+1}_{\mu}\}$ where $\mu \geq 2$,

620: as small.

621: Thus, we expand the state $x_i^{2t}$ and $\tx_j^{2t+1}$:

622: \begin{align}

623:  x_i^{2t} &=

624: % F(\tc \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\mu} \xi_i^{\nu} )x_i^{2t,(\mu)}

625:  x_i^{2t,(\mu)}

626:  + \tc \tm^{2t-1}_\mu \xi_i^{\mu}

 F'(\tc \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\nu} \xi_i^{\nu} ),

628:  \label{eq:expand1}

629:  \\

630:  \tx_j^{2t+1} &=

631: % F(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} )

632: % + c m^{2t}_\mu \txi_j^{\mu}

633: % F'(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} )

634: % =

635:  \tx_j^{2t+1, (\mu)}

636:  + c m^{2t}_\mu \txi_j^{\mu}

637:  F'(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} ),

638:  \label{eq:expand2}

639: \end{align}

640: for $\mu \geq 2$, where

641: %$x_i^{2t,(\mu)}$ and $\tx_j^{2t+1, (\mu)}$ mean the value drawn the

642: %effect of $\mu$th pattern pair from $\tx_j^{2t+1}$ and

643: %$x_i^{2t}$,  respectively,

644: %{\it i.e.}

645: $\tx_j^{2t+1, (\mu)} =

646: F(c \displaystyle \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} )$

647: and

648: $x_i^{2t,(\mu)} =

F(\tc \displaystyle \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\nu} \xi_i^{\nu} )$.

650: %

651: %

652: Substituting eqs. (\ref{eq:expand1}) and (\ref{eq:expand2}) into

653: eqs.(\ref{eq:noise1}) and (\ref{eq:noise2}), we obtain,

654: \begin{align}

655:  z^{2t+1}_i &= \alpha \tc \tU_{2t+1} x_i^{2t,(\mu)} + Z_i^{2t+1},

656: \\

657: %

658:  \tz^{2t}_j &= \alpha c U_{2t} \tx_j^{2t-1,(\mu)} + \tZ_j^{2t},

659: %

660: \end{align}

661: where

662: \begin{align}

663:  Z_i^{2t+1} &= \frac{1}{N}

664:  \sum_{\mu=2}^{\alpha N} \sum_{j=1}^{\tc N}

665:  \xi^{\mu}_i \txi^{\mu}_j \tx^{2t+1,(\mu)}_j +

666:  \frac{\tc \tU_{2t+1}}{N} \sum_{\mu=2}^{\alpha N} \sum_{k\neq i}^{c N}

667:  \xi^{\mu}_i \xi^{\mu}_k x^{2t,(\mu)}_k +

668:  \tc c \tU_{2t+1} U_{2t} Z^{2t-1}_i,

669:  \label{eq:noise_expand1}

670:  \\

671:  \tZ_j^{2t} &= \frac{1}{N}

672:  \sum_{\mu=2}^{\alpha N} \sum_{i=1}^{c N}

673:  \txi^{\mu}_j \xi^{\mu}_i x^{2t,(\mu)}_i +

674:  \frac{c U_{2t}}{N} \sum_{\mu=2}^{\alpha N} \sum_{l\neq j}^{\tc N}

675:  \txi^{\mu}_j \txi^{\mu}_l \tx^{2t-1,(\mu)}_l +

676:  c \tc U_{2t} \tU_{2t-1} \tZ^{2t-2}_j,

677:  \label{eq:noise_expand2}

678: \end{align}

679: where

680: \begin{align}

681:  \tU_{2t+1} &= \frac{1}{\tc N}

682:  \sum_{j=1}^{\tc N}

683:  F'(c \sum_{\nu \neq \mu}^{\alpha N} m^{2t}_{\nu} \txi_j^{\nu} ) ,\\

684:  U_{2t} &= \frac{1}{c N}

685:  \sum_{i=1}^{c N}

 F'(\tc \sum_{\nu \neq \mu}^{\alpha N} \tm^{2t-1}_{\nu} \xi_i^{\nu} ).

687: \end{align}

Since $x_i^{2t,(\mu)}$ and $\tx_j^{2t-1,(\mu)}$ are almost independent
of $\xi_i^{\mu}$ and $\txi_j^{\mu}$, respectively,
$Z_i^{2t+1}$  and $\tZ_j^{2t}$ can each be regarded as
independent and identically distributed Gaussian variables,
that is, $Z_i^{2t+1} \sim N(0, \alpha r_{2t+1})$ and
$\tZ_j^{2t} \sim N(0, \alpha \tr_{2t})$.

694: Each noise variance,

695: $E[(Z_i^{2t+1})^2] = \alpha r_{2t+1}$ and

696: $E[(\tZ_j^{2t})^2] = \alpha \tr_{2t}$,

697: can be described as

698: \begin{align}

699:  \alpha r_{2t+1} &=

700:  \alpha \tc \tq_{2t+1} +

701:  \alpha c \tc^2 \tU_{2t+1}^2 q_{2t} +

702:  \alpha (\tc c \tU_{2t+1} U_{2t})^2 r_{2t-1}  \notag\\

703:  & \:\:\:\:

704:  + 2 c\tc \tU_{2t+1} U_{2t}

705:  E\left[

706:  Z^{2t-1}_i

707:  \frac{1}{N}

708:  \sum_{\mu=2}^{\alpha N} \sum_{j=1}^{\tc N}

709:  \xi^{\mu}_i \txi^{\mu}_j \tx^{2t+1,(\mu)}_j

710:  \right]

711:  \label{eq:r_expand1}

712: %

713:  ,

714:  \\

715: %

716:  \alpha \tr_{2t} &=

717:  \alpha c q_{2t} +

718:  \alpha \tc c^2 U_{2t}^2 \tq_{2t-1} +

719:  \alpha (c \tc U_{2t} \tU_{2t-1})^2 \tr_{2t-2} \notag\\

720:  & \:\:\:\:

721:   + 2 c\tc U_{2t} \tU_{2t-1}

722:   E\left[

723:     \tZ^{2t-2}_j

724:     \frac{1}{N}

725:     \sum_{\mu=2}^{\alpha N} \sum_{i=1}^{c N}

726:     \txi^{\mu}_j \xi^{\mu}_i x^{2t,(\mu)}_i

727:   \right],

728:  \label{eq:r_expand2}

729: \end{align}

730: where

731: \begin{align}

732:  \tq_{2t+1} &= \frac{1}{\tc N}

733:  \sum_{j=1}^{\tc N}  \left( \tx_j^{2t+1,(\mu)} \right)^2,

734:  \\

735:  q_{2t} &= \frac{1}{c N}

736:  \sum_{i=1}^{c N} \left( x_i^{2t,(\mu)} \right)^2.

737: \end{align}

738: The last terms in eqs.(\ref{eq:r_expand1}) and (\ref{eq:r_expand2}) are

739: determined  by correlations between

740: the current state $\tx_j^{2t+1}$ and the previous state noise variable

741: $Z_i^{2t-1}$,

742: and between $x_i^{2t}$ and $\tZ_j^{2t-2}$, respectively.

743: %

744: Assuming that the $(n+1)$ previous state noise variables

745: $Z_i^{2(t-n)-1}$ and $\tZ_j^{2(t-n)-2}$ have no correlation with the

746: current state $\tx_j^{2t+1}$ and $x_i^{2t}$, respectively,

747: we can expand $r_{2t+1}$ and $\tr_{2t}$ as recurrence formulas:

748: %of $Z^{2t+1}_i$ and $\tZ^{2t}$

749: %in eqs. (\ref{eq:noise_expand1}) and (\ref{eq:noise_expand2}),

750: %and we obtain

751: \begin{align}

752:  r_{2t+1} &= \tc \tq_{2t+1} + c (\tc\tU_{2t+1})^2 q_{2t}

753:  + (c\tc \tU_{2t+1} U_{2t})^2 r_{2t-1} \notag\\

754:   &

755:    +

756:    2\tc \sum_{\eta=1}^{n}

757:   (c\tc)^{\eta}

758:   \tq_{2t+1,2(t-\eta)+1}

759:   \!\!\!\!

760:   \prod_{\tau=t-\eta+1}^{t}

761:   \!\!\!\!

762:   \tU_{2\tau+1} U_{2\tau}

763:  \notag \\

764:   &

765:   +

766:   2c (\tc\tU_{2t+1})^2 \sum_{\eta=1}^{n-1}

767:   (c\tc)^{\eta}

768:   q_{2t,2(t-\eta)}

769:   \!\!\!\!\!

770:   \prod_{\tau=t-\eta+1}^{t}

771:   \!\!\!\!\!

772:   U_{2\tau} \tU_{2\tau-1}

773:  \label{eq:r_expand1d},

774: \end{align}

775: \begin{align}

 \tr_{2t} &= c q_{2t} + \tc (cU_{2t})^2 \tq_{2t-1}
 + (\tc c U_{2t} \tU_{2t-1})^2 \tr_{2t-2}

778:  \notag\\

779:  &

780:   +

781:   2 c \sum_{\eta=1}^{n}

782:   (\tc c)^{\eta}

783:   q_{2t,2(t-\eta)}

784:   \!\!\!\!

785:   \prod_{\tau=t-\eta+1}^{t}

786:   \!\!\!\!

787:   U_{2\tau} \tU_{2\tau-1}

788:  \notag\\

789:  &

790:   +

791:   2 \tc (cU_{2t})^2

792:   \sum_{\eta=1}^{n-1}

793:   (\tc c)^{\eta}

794:   \tq_{2t-1,2(t-\eta)-1}

795:   \!\!\!\!\!\!\!\!

796:   \prod_{\tau=t-\eta+1}^{t}

797:   \!\!\!\!\!\!

798:   \tU_{2\tau-1} U_{2\tau-2},

799:  \label{eq:r_expand2d}

800: \end{align}

where $\tq_{2t+1,2(t-n)+1}$ denotes the cross-correlation between
the current state $\tx_j^{2t+1}$ and the $n$-step previous state
$\tx_j^{2(t-n)+1}$,
and $q_{2t, 2(t-n)}$ denotes the cross-correlation between $x_i^{2t}$ and
$x_i^{2(t-n)}$.
These cross-correlations can also be expressed in terms of the macroscopic
parameters up to the $n$-step previous state.
The complete formulas are given in the appendix.
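As a purely mechanical illustration of the index ranges, consider
eq.(\ref{eq:r_expand1d}) with $n=1$, taken exactly as written.
The first sum then contains only the $\eta=1$ term, and the second sum,
which runs up to $n-1=0$, is empty:
\begin{align*}
 r_{2t+1} &= \tc \tq_{2t+1} + c (\tc\tU_{2t+1})^2 q_{2t}
 + (c\tc \tU_{2t+1} U_{2t})^2 r_{2t-1} \\
 &\quad
 + 2 c \tc^2 \, \tq_{2t+1,2t-1} \, \tU_{2t+1} U_{2t}.
\end{align*}
In this case only the cross-correlation with the immediately preceding state of
the same layer, $\tq_{2t+1,2t-1}$, survives; a larger $n$ adds terms that
reach further back in time.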

809: %

810:

811: %Assuming the self-averaging property,

812: We obtain the evolution equations

813: for macroscopic parameters as follows:

814: \begin{align}

815:  Y^{2t} &= F( \tc \tm_{2t-1} \xi + \sqrt{\alpha r_{2t-1}} z ),

816:  \label{eq:dyn_start}

817:  \\

818:  \tY^{2t+1} &= F( c m_{2t} \txi  + \sqrt{\alpha \tr_{2t}} z ), \\

819:  m^{2t} &= \int Dz \langle \xi Y^{2t} \rangle_{\xi}, \\

820:  \tm^{2t+1} &= \int Dz \langle \txi \tY^{2t+1} \rangle_{\txi},\\

821:  q_{2t} &= \int Dz \langle (Y^{2t})^2 \rangle_{\xi}, \\

822:  \tq_{2t+1} &= \int Dz \langle (\tY^{2t+1})^2 \rangle_{\txi},

823: %\\

824: \end{align}

825: \begin{align}

826:  U_{2t} &= \frac{1}{\sqrt{\alpha r_{2t-1}}}

827:   \int Dz z \langle Y^{2t} \rangle_{\xi}, \\

828:  \tU_{2t+1} &= \frac{1}{\sqrt{\alpha \tr_{2t}}}

829:   \int Dz z \langle \tY^{2t+1} \rangle_{\txi}

830:  \label{eq:dyn_end}

831: \end{align}

832: % r_{2t+1} &= \tc (\tq_{2t+1} + c\tc \tU_{2t+1}^2 q_{2t}) \\

833: % \tr_{2t} &= c (q_{2t} + c\tc U_{2t}^2 \tq_{2t-1}).

834: %  \label{eq:1step}

835: %\end{align}

In these order parameter equations,
$Y^{2t}$ and $\tY^{2t+1}$ correspond to $x_i^{2t}$
and $\tx_j^{2t+1}$, respectively.
Likewise, $m^{2t}$ and $\tm^{2t+1}$ correspond to the overlaps for the
first pattern pair, $m^{2t}_1$ and $\tm^{2t+1}_1$,
which measure the degree of retrieval.
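To make eqs.(\ref{eq:dyn_start}) to (\ref{eq:dyn_end}) concrete, the
following sketch assumes binary units with $F(u) = \mathrm{sgn}(u)$ and
patterns $\xi = \pm 1$ taken with equal probability, and takes $Dz$ to be
the standard Gaussian measure $Dz = \frac{dz}{\sqrt{2\pi}} e^{-z^2/2}$.
These are assumptions made here for illustration only, since the equations
are written for a general output function $F$.
Under these assumptions, the Gaussian integrals for the first layer can be
carried out in closed form:
\begin{align*}
 m^{2t} &= \mathrm{erf}\left(
  \frac{\tc \tm_{2t-1}}{\sqrt{2 \alpha r_{2t-1}}} \right), \\
 q_{2t} &= 1, \\
 U_{2t} &= \sqrt{\frac{2}{\pi \alpha r_{2t-1}}}
  \exp\left( - \frac{(\tc \tm_{2t-1})^2}{2 \alpha r_{2t-1}} \right).
\end{align*}
The expression for $U_{2t}$ follows from integration by parts,
$\int Dz\, z f(z) = \int Dz\, f'(z)$, applied to the step function.
The second-layer quantities $\tm^{2t+1}$, $\tq_{2t+1}$, and $\tU_{2t+1}$
are obtained in the same way by replacing $\tc \tm_{2t-1}$ with $c m_{2t}$
and $r_{2t-1}$ with $\tr_{2t}$.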

842:

843:

844:

845:

846: %We first apply the one-step analysis method \cite{Amari88a}, which

847: %corresponds to the naive S/N analysis.

848: %Then, we extend the one-step analysis to the statistical neurodynamics.

849:

850: %\subsection{Analysis with one-step theory}

851: %Amari and Maginu proposed an analysis method called ``one-step theory'' \cite{Amari88a},

852: %which is based on the naive S/N analysis, and they applied it to AAM.

853: %In the one-step theory, the noise components,

854: %which corresponds to eqs.(\ref{eq:noise1}) and (\ref{eq:noise2})in each step,

855: %are assumed to be independent Gaussian noise.

856: %We applied the one-step analysis to BAM and obtained

857: %

858:

859: %The important point here is that

860: %the evolution of macroscopic parameters

861: %can be described as the recurrence formulae

862: %using one-step previous state.

863: %%these recurrence formulae can be described with the macroscopic

864: %%parameter of the one-step before state.

865: %Formally, these equations are identical to the analysis of

866: %the sequence association model, which is a variety of AAM \cite{Amari88b}.

867: %In the sequence association model,

868: %since cross correlations between sequential patterns are

869: %embedded in the connections,

870: %the stored patterns appear sequentially in each update

871: %in the successful retrieving phase.

872: %

873:

874: %We derived the critical capacity of the limit of these dynamics

875: %(\ref{eq:1step}) and obtained $\alpha_c = 0.27$, which was also

876: %suggested by Amari \cite{Amari88b}.

877: %However, the critical capacity derived from the one-step theory seems to

878: %be overestimated.

879: %

880: %This overestimation comes from the assumption that

881: %the noise components in each time step are independent Gaussian noise.

882: %%In other words, we should prcisely evaluate the parameters $r_{2t+1}$,

883: %%and $\tr_{2t}$ have no correlation to the previous state in each update.

884: %

885: %To evaluate the noise correlation in time series more accurately,

886: %we must introduce the statistical neurodynamics.

887: %% Therefore, we need to evaluate these noise correlations exactly

888: %% in accordance with the concept of SCSNA.

889: %

890: %

891: %

892: %\subsection{Analysis by statistical neurodynamics}

893: %%In the previous subsection,

894: %We pointed out above that

895: %the one-step analysis could not quantitatively explain the transient process of BAM;

896: %therefore, we introduced the statistical neurodynamics\cite{Okada95}.

897: %Just like one-step analysis is based on the naive S/N analysis,

898: %the statistical neurodynamics analysis is based on the SCSNA.

899: %

900: %This analysis evaluates the noise correlation more accurately

901: %than does the one-step analysis.

902: %

903: %Using the statistical neurodynamics, Okada described

904: %the transient process of an AAM as

905: %recurrence formulae of macroscopic parameters \cite{Okada95}.

906:

907: %To apply the statistical neurodynamics to BAM,

908: %only two parameters, $r_{2t+1}$ and $\tr_{2t}$, need to be evaluated,

909: %and

910: %the other parameters are the same as those of the one-step analysis.

911: %%There was no need to re-evaluate the other parameters

912: %%$Y_i^{2t}, \tY_j^{2t+1}, m^1_{2t}, \tm^1_{2t+1},

913: %%q_{2t}, \tq_{2t+1}, U_{2t},\tU_{2t+1}$.

914:

915:

916:

Yanai {\it et al.} \cite{Yanai91} applied the one-step analysis,
which corresponds to the Amari--Maginu theory.
In their analysis,
the macroscopic order parameter equations for
$Y_i^{2t}$, $\tY_j^{2t+1}$, $m^{2t}_1$, $\tm^{2t+1}_1$,
$q_{2t}$, $\tq_{2t+1}$, $U_{2t}$, and $\tU_{2t+1}$,
which are given in eqs.(\ref{eq:dyn_start})
 to (\ref{eq:dyn_end}),
are identical to those of our analysis.
The difference lies in the evaluation of the noise variances
$r_{2t+1}$ and $\tr_{2t}$.
They ignored the noise correlation and derived these values as

929: \begin{align}

 r_{2t+1} &= \tc \tq_{2t+1} + c (\tc\tU_{2t+1})^2 q_{2t},
 \label{eq:yanai_r1}\\
 \tr_{2t} &= c q_{2t} + \tc (cU_{2t})^2 \tq_{2t-1}.

933:  \label{eq:yanai_r2}

934: \end{align}

Their analysis gives a critical capacity of $\alpha_c = 0.27$,
which is larger than the value $\alpha_c = 0.1998$ obtained by both
our SCSNA analysis and the replica analysis.
This overestimation arises because the noise correlation is not evaluated.

940:

941:

In our analysis,
we considered the effect of the crosstalk noise correlation up to the
$n$-step previous state and obtained
eqs.(\ref{eq:r_expand1}) and (\ref{eq:r_expand2}),
which include the analysis of Yanai {\it et al.}
(eqs.(\ref{eq:yanai_r1}) and (\ref{eq:yanai_r2})).
In the next section, we show that
the accuracy of the analysis improves as $n$ is increased ($n = 2, 3, \cdots$).
Hereafter, we refer to the statistical neurodynamics that takes into
account the states up to the $n$-step previous state as the
``$n$-step'' analysis.
The ``full-step'' analysis uses all the macroscopic parameters from
the initial state ($t = 0$) to the current state.

954:

955:

956: %

957: %In eqs.(\ref{eq:noise_expand1}) and (\ref{eq:noise_expand2}),

958: %the first two terms are also in the one-step analysis

959: %eqs. (\ref{eq:1step}).

960: %When we truncate the expansion with $n=1$,

961: %these parameters $r_{2t+1}$ and $\tr_{2t}$

962: %%are identical to those in the one-step analysis,

963: %meaning that the statistical neurodynamics includes the one-step analysis,

964: %and the residual terms describes  higher order correlations.

965: %

966: %Since these cross-correlations can be denoted as the recurrence formulae

967: %using across $n$ time steps before states,

968: %the computational cost for analysis is not so much high.

969:

970:

971: %The one-step analysis is identical to the statistical neurodynamics analysis of $n=1$,

972: %and

973:

974:

975: \section{Result}

976: \label{sec:compare}

977: In this section,

978: we compare the  results of the statistical neurodynamics

979: with those of computer simulation.

980: %

981: %First, we compare the time evolution of the macroscopic parameter $m$,

982: %which means the retrieving degree.

983: %compared the simulation results with

984: %the time evolution of a macroscopic parameter using the statistical neurodynamics.

Fig. \ref{fig:overlap} shows the time evolution of the overlap $m^{2t}_{1}$,
which measures how well the pattern $\Vec{\xi}^{1}$ is retrieved in the
first layer at time $2t$.
In each graph, the abscissa represents the time step $t$ and
the ordinate represents the overlap $m_1^{2t}$.
Convergence of the overlap $m_1^{2t}$ to $1.0$ means that
$\Vec{\xi}^{1}$ is successfully retrieved.
Each graph shows several evolution curves that start from different
initial overlaps $m_1^{0}$.

995:

996:

997:

Fig. \ref{fig:overlap}(a) shows the simulation results.
We set the number of neurons to $N = 10{,}000$, $c = \tc = 1$,
and the loading rate, which indexes the number of stored pattern pairs,
to $\alpha=0.15$.
The retrieval succeeded when the initial overlap was $0.4$ or larger,
and it failed when the initial overlap was $0.3$ or less.
Fig. \ref{fig:overlap}(b) shows the results of the one-step analysis,
and figs. \ref{fig:overlap}(c) to (e) show the results of
the $2$-step, $3$-step, and full-step analyses, respectively.
In each analysis, the overlap converges to $0$ when the retrieval fails,
because an infinite number of units ($N\rightarrow\infty$) is assumed.
In the simulation results, fig. \ref{fig:overlap}(a),
the system settles into a spurious memory state when the retrieval fails,
because the number of units is finite ($N=10{,}000$).
Therefore, the curves starting at $0.1$ to $0.3$ can be regarded as
retrieval failures.

1016:

1017:

As shown in fig. \ref{fig:overlap}(b),
the one-step analysis predicts that the retrieval succeeds when the initial
overlap is $0.3$, which does not agree with the simulation results.
Fig. \ref{fig:overlap}(c) shows the results for $n=2$, {\it i.e.}, the 2-step analysis.
The 2-step analysis predicts that the retrieval fails when the
initial overlap is $0.3$,
which agrees with the simulation results.
Figs. \ref{fig:overlap}(d) and (e) show the results of the 3-step
and full-step analyses, respectively.
Both show similar characteristics
and agree with the simulation results shown in fig. \ref{fig:overlap}(a).
Since the 3-step results are very similar to the full-step results,
the 3-step analysis is sufficient for approximating the full-step analysis.
In other words, the correlations that matter for BAM retrieval extend
over the previous three steps.

1038: %

1039:

1040:

1041:

1042:

In the statistical neurodynamics analysis,
the equilibrium state is described as the limit of the transient process,
and the order parameters should be consistent with the results of the SCSNA.
Fig. \ref{fig:basin} shows the memory capacity and the basin of attraction,
that is, the limit to which the initial pattern, measured by the overlap
$m_1^0$, can be degraded while still being retrievable.

1052: %

1053: %

1054: %An advantage of the statistical neurodynamics analysis is to guess

1055: %how much the initial pattern $\Vec{x}^0$ can be degraded.

1056: %

1057: %The retrieval limit of degrading which is described by

1058: %the initial overlap $m_1^{0}$ is called basin.

1059: %

For example, in fig. \ref{fig:overlap}(d),
the retrieval succeeds when starting at $m_1^{0} = 0.4$,
while it fails when starting at $m_1^{0} = 0.3$.
The boundary of the basin for $\alpha = 0.15$ thus lies between
$m_1^{0} = 0.3$ and $0.4$.

1065: %So that the basin exists between $m_1^{0} = 0.3$ to $0.4$

1066: %under the condition $\alpha = 0.15$.

1067: %

1068: %

1069: %In fig.\ref{fig:basin},

1070: %The abscissa axis is the capacity index $\alpha$

1071: %and the ordinate axis means the overlap $m_1$.

1072: %The solid line shows the SCSNA result.

1073: %The dashed lines are derived from the statistical neurodynamics.

1074: %

1075: %Fig.\ref{fig:basin} shows the capacity and the basin of attraction.

The dashed curves in fig. \ref{fig:basin} are derived from the
statistical neurodynamics.
In each of these curves,
the upper part shows the equilibrium overlap $m_{1}^{\infty}$ in successful
retrieval and the lower part shows the boundary of the basin of attraction
in terms of $m_{1}^{0}$.
When the initial overlap $m_{1}^{0}$ lies in the area enclosed
by a curve, the retrieval succeeds.
The enclosed area therefore represents
the region of successful retrieval.
It is clear that the one-step analysis overestimates
both the relative capacity and the basin of attraction \cite{Yanai91}.
The theoretical estimates approach those of the
SCSNA analysis asymptotically as the number of steps taken into account
increases (2-step, 3-step, $\cdots$).
We also show the basin obtained from the simulation results using
error bars in fig. \ref{fig:basin}.
The simulation results agree quantitatively with those of the statistical
neurodynamics.

1096: %In the simulation, we experimented as the neuron units $N=10,000$.

1097: %The two-step and above analyses agree with these simulation results.

1098:

1099:

1100: \section{Conclusion}

1101: \label{sec:conclusion}

1102: We derived the macroscopic parameters of BAM in the equilibrium state by

1103: using the SCSNA and obtained the critical capacity $\alpha_c$ as $0.1998$.

1104: %

1105: The results agreed with the previous results and the simulation results.

1106: %\cite{Tanaka00}.

1107: %Moreover, we confirmed that the equilibrium analysis is  also agree with

1108: %the computer simulation.

1109:

We also analyzed the transient process of BAM using the statistical
neurodynamics and
obtained the evolution equations for the macroscopic parameters.
By comparing the numerical solutions with the simulation results,
we showed that the analysis explains the simulation results
with sufficient accuracy for the transient process.
Therefore,
to explain the transient process of BAM quantitatively,
it is sufficient to consider the 3-step statistical neurodynamics,
which means that the crosstalk noise is effectively correlated across
the previous three steps.

1124:

1125:

1126: \small

1127: \bibliographystyle{unsrt}

1128: \bibliography{shouno}

1129:

1130: \appendix

\section{Detailed Description of the SCSNA}

1132: In Sec.\ref{sec:SCSNA}, we introduced the overlaps,

1133: \begin{align}

1134:  m_{\mu} &= \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i  x_i \\

1135:  \tm_{\mu} &= \frac{1}{\tc N} \sum_{j=1}^{\tc N} \txi^{\mu}_j  \tx_j

1136: \end{align}

1137: for each layer state.

1138: In the equilibrium state, we assumed that the first pattern pair,

1139: ($\Vec{\xi}^1$, $\Vec{\txi}^1$), is retrieved,

1140: so that overlaps for other pattern pairs are small, {\it i.e.}

1141: $m_{\mu}, \tm_{\mu} \sim O(\frac{1}{\sqrt{N}})$ where $\mu \geq 2$.

1142: %

Thus we can write $m_{\mu}$ as
\begin{align}
 m_{\mu} &= \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i
 F( \tc \sum_{\nu=1}^{\alpha N} \xi_i^{\nu} \tm_{\nu}) \\

1147:  &\sim

1148:  \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i

1149:  F( \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu} \tm_{\nu})

1150:  +

1151:  \frac{1}{cN} \sum_{i=1}^{cN} \tc\tm_{\mu}

1152:  F'(  \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu}\tm_{\nu} )

1153:  \\

1154:  &=

1155:  M_{\mu} + \tc \tm_{\mu} U,

1156:  \label{eq:M1}

1157: \end{align}

1158: where

1159: \begin{align}

1160:  U &= \frac{1}{cN} \sum_{i=1}^{cN} F'(\tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu}\tm_{\nu})\\

1161:  M_{\mu} &=

1162:  \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i

1163:  F( \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu} \tm_{\nu})

1164:  =

1165:   \frac{1}{cN} \sum_{i=1}^{cN} \xi^{\mu}_i x_i^{(\mu)}.

1166: \end{align}

Here $x_i^{(\mu)}$ denotes the value of $x_i$ with
the effect of the $\mu$th pattern pair removed, {\it i.e.}, $x_i - x_i^{(\mu)} \sim O(\frac{1}{\sqrt{N}})$.
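The step from the first to the second line in the expansion of $m_{\mu}$
is a first-order Taylor expansion.
A short sketch of it follows, assuming binary patterns with
$(\xi_i^{\mu})^2 = 1$ and writing $h_i^{(\mu)}$ (a symbol introduced only
for this illustration) for the input without the $\mu$th pattern:
\begin{align*}
 h_i^{(\mu)} &= \tc \sum_{\nu \neq \mu}^{\alpha N} \xi_i^{\nu} \tm_{\nu}, \\
 F( h_i^{(\mu)} + \tc \xi_i^{\mu} \tm_{\mu} )
 &\simeq F( h_i^{(\mu)} ) + \tc \xi_i^{\mu} \tm_{\mu} F'( h_i^{(\mu)} ), \\
 \xi_i^{\mu} F( h_i^{(\mu)} + \tc \xi_i^{\mu} \tm_{\mu} )
 &\simeq \xi_i^{\mu} F( h_i^{(\mu)} ) + \tc \tm_{\mu} F'( h_i^{(\mu)} ).
\end{align*}
The expansion is justified because $\tm_{\mu} \sim O(\frac{1}{\sqrt{N}})$
for $\mu \geq 2$.
Averaging the last line over the $cN$ units gives
$M_{\mu} + \tc \tm_{\mu} U$.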

1169: %

Similarly, $\tm_{\mu}$ can be written as

1171: \begin{align}

1172:  \tm_{\mu} \sim \tM_{\mu} + c m_{\mu} \tU,

1173:  \label{eq:M2}

1174: \end{align}

1175: where

1176: \begin{align}

 \tU &= \frac{1}{\tc N} \sum_{j=1}^{\tc N} F'(c \sum_{\nu \neq \mu}^{\alpha N} \txi_j^{\nu} m_{\nu}),\\
 \tM_{\mu} &=
  \frac{1}{\tc N} \sum_{j=1}^{\tc N} \txi^{\mu}_j \tx_j^{(\mu)}.

1183: \end{align}

1184: %

Solving eqs.(\ref{eq:M1}) and (\ref{eq:M2}) for $m_{\mu}$ and $\tm_{\mu}$,

1186: we obtain

1187: %

1188: \begin{align}

1189:  m_{\mu} &= \frac{1}{1-c\tc U\tU} (M_{\mu} + \tc U \tM_{\mu}),

1190:  \label{eq:app_ovlp1}\\

1191:  \tm_{\mu} &= \frac{1}{1-c\tc U\tU} (\tM_{\mu} + c \tU M_{\mu}).

1192:  \label{eq:app_ovlp2}

1193: \end{align}
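This solution can be checked by writing eqs.(\ref{eq:M1}) and
(\ref{eq:M2}) as a $2\times 2$ linear system.
The following is a sketch of that step, treating the approximate relations
as equalities:
\begin{align*}
 \begin{pmatrix} 1 & -\tc U \\ -c \tU & 1 \end{pmatrix}
 \begin{pmatrix} m_{\mu} \\ \tm_{\mu} \end{pmatrix}
 &=
 \begin{pmatrix} M_{\mu} \\ \tM_{\mu} \end{pmatrix}, \\
 \begin{pmatrix} m_{\mu} \\ \tm_{\mu} \end{pmatrix}
 &=
 \frac{1}{1 - c\tc U \tU}
 \begin{pmatrix} 1 & \tc U \\ c \tU & 1 \end{pmatrix}
 \begin{pmatrix} M_{\mu} \\ \tM_{\mu} \end{pmatrix}.
\end{align*}
The determinant of the coefficient matrix is $1 - c\tc U\tU$,
so the solution exists as long as $c\tc U\tU \neq 1$.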

1194: %

Since the noise terms in eqs.(\ref{eq:eq_update1}) and (\ref{eq:eq_update2})
can be described as
$
 z_i = \tc \displaystyle\sum_{\nu \geq 2}^{\alpha N} \xi_i^{\nu} \tm_{\nu}$
 and
$ \tz_j = c \displaystyle \sum_{\nu \geq 2}^{\alpha N} \txi_j^{\nu} m_{\nu}$,
we substituted eqs.(\ref{eq:app_ovlp1}) and (\ref{eq:app_ovlp2}) into these noise terms
and obtained

1203: \begin{align}

1204:  z_i &= \frac{\alpha \tc \tU}{1 - c\tc U\tU} x_i^{(\mu)} + Z_i, \\

1205:  \tz_j &= \frac{\alpha c U}{1 - c\tc U\tU} \tx_j^{(\mu)} + \tZ_j,

1206: \end{align}

1207: where

1208: \begin{align}

1209:  Z_i &= \frac{1}{N(1- c\tc U\tU)}

1210:  \sum_{\nu \geq 2}^{\alpha N}

1211:  \left(

1212:  \tc \tU \sum_{k\neq i} \xi_i^{\nu} \xi_k^{\nu} x_k^{(\mu)} +

1213:  \sum_{j=1}^{\tc N} \xi_i^{\nu} \txi_j^{\nu} \tx_j^{(\mu)}

1214:  \right)

1215:  \\

1216: %

1217:  \tZ_j &= \frac{1}{N(1- c\tc U\tU)}

1218:  \sum_{\nu \geq 2}^{\alpha N}

1219:  \left(

1220:  c U \sum_{l\neq j} \txi_j^{\nu} \txi_l^{\nu} \tx_l^{(\mu)} +

 \sum_{i=1}^{c N} \txi_j^{\nu} \xi_i^{\nu} x_i^{(\mu)}

1222:  \right)

1223: \end{align}

1224:

We assumed that the $Z_i$ and the $\tZ_j$ are independent Gaussian noise terms with
$Z_i \sim \mathcal{N}(0, \alpha r)$ and $\tZ_j \sim \mathcal{N}(0, \alpha \tr)$, and evaluated the expectations
$E[(Z_i)^2]$ and $E[(\tZ_j)^2]$.
We then obtained

1229: \begin{align}

1230:  E[(Z_i)^2] &= \alpha r = \frac{\alpha \tc}{(1- c \tc U\tU)^2} (c \tc \tU^2 q + \tq), \\

1231:  E[(\tZ_j)^2] &= \alpha \tr = \frac{\alpha c}{(1- c \tc U\tU)^2} (c \tc  U^2 \tq + q),

1232: \end{align}

1233: where

1234: \begin{align}

1235:  q &= \frac{1}{cN}\sum_{i=1}^{cN} (x_i^{(\mu)})^2 \\

1236:  \tq &= \frac{1}{\tc N}\sum_{j=1}^{\tc N} (\tx_j^{(\mu)})^2

1237: \end{align}
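The variance of $Z_i$ can be obtained by a term-counting argument.
The following is a sketch, assuming that the patterns are independent
random variables with zero mean and unit variance and that
$x_k^{(\mu)}$ and $\tx_j^{(\mu)}$ are uncorrelated with the patterns they
multiply, so that all cross terms vanish on average:
\begin{align*}
 E[(Z_i)^2]
 &= \frac{1}{N^2 (1 - c\tc U\tU)^2} \sum_{\nu \geq 2}^{\alpha N}
 \left( \tc^2 \tU^2 \sum_{k \neq i} (x_k^{(\mu)})^2
 + \sum_{j=1}^{\tc N} (\tx_j^{(\mu)})^2 \right) \\
 &\simeq \frac{\alpha N}{N^2 (1 - c\tc U\tU)^2}
 \left( \tc^2 \tU^2 \, cN q + \tc N \tq \right) \\
 &= \frac{\alpha \tc}{(1 - c\tc U\tU)^2} ( c\tc \tU^2 q + \tq ).
\end{align*}
The expression for $E[(\tZ_j)^2]$ follows by exchanging the roles of the
two layers.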

1238: From the self-averaging property,

1239: we obtained the SCSNA order parameter equation in sec.\ref{sec:SCSNA}.

1240: % using the solution of these equations:

1241: %\begin{align}

1242: % Y &= F(\tc \xi \tm + \frac{\alpha \tc \tU}{1- c\tc U\tU} Y + \sqrt{\alpha r} z), \\

1243: % \tY &= F(c \txi m + \frac{\alpha c U}{1- c\tc U\tU} \tY + \sqrt{\alpha \tr} z).

1244: %\end{align}

1245:

1246:

\section{Correlation with the $n$-step previous state}
To evaluate the effect of the previous crosstalk noise,
we must derive the correlation of a unit between the current state and
the $n$-step previous state.
These correlations are described by $\tq_{2t+1,2(t-n)+1}$ and $q_{2t,2(t-n)}$, respectively:

1253: \small

1254: \begin{align}

1255:  &

1256:  \tq_{2t+1,2(t-n)+1} =

1257:  \notag\\

1258:  &

1259:  \frac{

1260:  \int D\Vec{z}\exp( -\Vec{z}^{\mathrm T} \tSigma^{-1} \Vec{z})

1261:  \langle \tY_{2t+1}(z_1) \tY_{2(t-n)+1}(z_2) \rangle }

1262:  {2\pi\left| \tSigma \right| }

1263: %

1264: \notag\\

1265: %

1266:  &

1267:  q_{2t,2(t-n)} =

1268:  \notag\\

1269:  &

1270:   \frac{

1271:  \int D\Vec{\tz}\exp( -\Vec{\tz}^{\mathrm T} \Sigma^{-1} \Vec{\tz})

1272:  \langle Y_{2t}(\tz_1) Y_{2(t-n)}(\tz_2) \rangle

1273:  }

1274:  {2\pi\left| \Sigma \right| }

1275: \end{align}

1276: \normalsize

1277: The matrices  $\Sigma$, $\tSigma$ and vectors $\Vec{z},

1278: \Vec{\tz}$ are described as follows:

1279: \small

1280: \begin{align}

1281: %

1282:  \tSigma &=

1283:  \begin{pmatrix}

1284:          \tr_{2t} & \tr_{2t,2(t-n)} \\

1285:   \tr_{2t,2(t-n)} & \tr_{2(t-n)}    \\

1286:  \end{pmatrix}

1287: %

1288:  \notag\\

1289: %

1290:  \Vec{z} &=

1291:  \begin{pmatrix}

1292:   z_1 \\

1293:   z_2

1294:  \end{pmatrix}

1295:  \notag\\

1296: %\end{align}

1297: %

1298: %

1299: %\begin{align}

1300:  \Sigma &=

1301:  \begin{pmatrix}

1302:   r_{2t-1}      & r_{2t-1,2(t-n)-1} \\

  r_{2t-1,2(t-n)-1} & r_{2(t-n)-1} \\

1304:  \end{pmatrix}

1305:  \notag\\

1306: %

1307:  \Vec{\tz} &=

1308:  \begin{pmatrix}

1309:   \tz_1 \\

1310:   \tz_2

1311:  \end{pmatrix}

1312:  \end{align}

1313: %\normalsize

1314: %

The diagonal components of each matrix are the
noise variances of the current state and of the $n$-step previous state.
The off-diagonal components express the noise correlation between
the current state and the previous state.

1319: %These can be derived as:

1320: For $\eta \geq 2$, the correlations are described as:

1321: \small

1322: \begin{align}

1323:  r_{2t+1,2(t-\eta)+1} = & \tc \tq_{2t+1,2(t-\eta+1)+1}

1324:  \notag\\

1325:  &

1326:  +  c\tc \tU_{2t+1} U_{2t} r_{2t-1,2(t-\eta+1)+1},

1327:  \notag\\

1328:  \tr_{2t,2(t-\eta)} = & cq_{2t,2(t-\eta)}

1329:  \notag\\

1330:  &

1331:  + c\tc U_{2t} \tU_{2t-1} \tr_{2(t-1),2(t-\eta)},

1332: \end{align}

1333: \normalsize

1334: and for $\eta \geq 3$, they are

1335: \small

1336: \begin{align}

1337:  &

1338:  r_{2t+1,2(t-\eta)+1} = \tc \tq_{2t+1,2(t-\lambda)+1}

1339:  \notag\\

1340:  &\quad

1341:  + c\tc^2 \tU_{2t+1} \tU_{2(t-\lambda)+1}

1342:  q_{2t,2(t-\lambda)}

1343:  \notag \\

1344:  &\quad

1345:  + \tc \sum_{\lambda=1}^{n}

1346:  (c\tc)^\lambda \tq_{2(t-\lambda)+1,2(t-\eta)+1}

1347:  \!\!\!\! \prod_{\tau=t-\lambda+1}^{t}

1348:  \!\!\!\! \tU_{2\tau+1} U_{2\tau}

1349:  \notag \\

1350:  &\quad

1351:  + \tc

1352:  \sum_{\lambda=1}^{n-\eta}

1353:  (c\tc)^\lambda \tq_{2t+1,2(t-\eta-\lambda)+1} \!\!\!\!\!\!

1354:  \prod_{\tau=t-\lambda+1}^{t} \!\!\!\!\!\!

1355:  \tU_{2(\tau-\eta)+1} U_{2(\tau-\eta)}

1356:  %

1357:  \notag \\

1358:  &\quad

1359:  + c\tc^2 \tU_{2t+1} \tU_{2(t-\eta)+1}

1360:  \notag \\

1361:  &\qquad\qquad

1362:  \sum_{\lambda=1}^{n-1}(c\tc)^{\lambda}

1363:  q_{2(t-\lambda),2(t-\eta)}

1364:  \prod_{\tau=t-\lambda+1}^{t} U_{2\tau} \tU_{2\tau-1}

1365:  %

1366:  \notag \\

1367:  &\quad

1368:  %

1369:  + c\tc^2 \tU_{2t+1} \tU_{2(t-\eta)+1}

1370:  \sum_{\lambda=1}^{n-1-\eta}(c\tc)^{\lambda}

1371:  q_{2t,2(t-\eta-\lambda)}

1372:  \notag \\

1373:  &\quad\qquad

1374:  \prod_{\tau=t-\lambda+1}^{t} U_{2(\tau-\eta)} \tU_{2(\tau-\eta)-1}

1375:  %

1376:  \notag \\

1377:  &\quad

1378:  %

1379:  + (c\tc)^2 \tU_{2t+1} U_{2t} \tU_{2(t-\eta)+1} U_{2(t-\eta)}

1380:  r_{2t-1,2(t-\lambda)-1}

1381:  \notag

1382: \end{align}

1383: %

1384: \begin{align}

1385: %

1386:  &

1387:  \tr_{2t,2(t-\eta)} =   cq_{2t,2(t-\eta)}

1388:  \notag\\

1389:  &\quad

1390:  + \tc c^2 U_{2t} U_{2(t-\eta)}

1391:  \tq_{2t-1,2(t-\eta)-1}

1392:  \notag \\

1393:  &\quad

1394: %

1395:  +

1396:  c \sum_{\lambda=1}^{n} (c\tc)^\lambda

1397:  q_{2(t-\lambda),2(t-\eta)}

1398:  \!\!\!\! \prod_{\tau=t-\lambda+1}^{t}

1399:  \!\!\!\! U_{2\tau} \tU_{2\tau-1}

1400: %

1401:  \notag \\

1402:  &\quad

1403: %

1404:  +

1405:  c \sum_{\lambda=1}^{n-\eta} (c\tc)^\lambda

1406:  q_{2t,2(t-\eta-\lambda)} \!\!\!\!

1407:  \prod_{\tau=t-\lambda+1}^{t} \!\!\!\!

1408:  U_{2(\tau-\eta)} \tU_{2(\tau-\eta)-1}

1409: %

1410:  \notag \\

1411:  &\quad

1412: %

1413:  + {\tc}c^2 U_{2t}U_{2(t-\eta)}

1414:  \notag\\

1415:  & \qquad\qquad

1416:  \sum_{\lambda=1}^{n-1}(c\tc)^{\lambda}

1417:  \tq_{2(t-\lambda)-1,2(t-\eta)-1}

1418:  \!\!\!\!\!\!

1419:  \prod_{\tau=t-\lambda+1}^{t}

1420:  \!\!\!\!\!\!

1421:  \tU_{2\tau-1} U_{2\tau-2}

1422:  \notag \\

1423:  &\quad

1424: %

1425:  +

1426:  {\tc}c^2 U_{2t}U_{2(t-\eta)}

1427:  \sum_{\lambda=1}^{n-1-\eta}(c\tc)^{\lambda}

1428:  \tq_{2t-1,2(t-\eta-\lambda)-1}

1429:  \notag \\

1430:  &\quad\qquad

1431:  \prod_{\tau=t-\lambda+1}^{t} \tU_{2(\tau-\eta)-1} U_{2(\tau-\eta)-2}

1432: %

1433:  \notag \\

1434:  &\quad

1435: %

1436:  +

1437:  (c\tc)^2 U_{2t} \tU_{2t-1} U_{2(t-\eta)} \tU_{2(t-\eta)-1}

1438:  \tr_{2(t-1),2(t-\lambda-1)}

 \notag

1440: \end{align}

1441: \normalsize

1442:

1443: \newpage

1444: \begin{figure}

1445:  \begin{center}

1446:   \resizebox{7.8cm}{!}{\includegraphics{bam.eps}}

1447:   \caption{Network structure of BAM}

1448:   \label{fig:bam}

1449:  \end{center}

1450: \end{figure}

1451:

1452: \begin{figure}[t]

1453:  \begin{center}

1454:   \resizebox{0.8\textwidth}{!}{\includegraphics{fig1.eps}}

1455:  \end{center}

 \caption{Comparison of the SCSNA results with those of the
 simulation. The horizontal axis represents the loading rate $\alpha$, and the
 vertical axis represents the overlap. The results of the computer simulation are
 shown as error bars, which indicate the median with the minimum and maximum values.}

1460:  \label{fig:fig1}

1461: \end{figure}

1462:

1463:

1464: \begin{figure*}[t]

1465:  \begin{center}

1466:   \begin{tabular}{ccc}

1467:   \resizebox{0.32\textwidth}{!}{\includegraphics{simseq.ps}}

1468:    &

1469:   \resizebox{0.32\textwidth}{!}{\includegraphics{seq1.ps}}

1470:    &

1471:   \resizebox{0.32\textwidth}{!}{\includegraphics{seq2.ps}}

1472:    \\

1473:    (a) Simulation

1474:    &

1475:    (b) 1-step analysis

1476:    &

1477:    (c) 2-step analysis

1478:    \\

1479:    &

1480:    &

1481:    \\

1482:    &

1483:    \resizebox{0.32\textwidth}{!}{\includegraphics{seq3.ps}}

1484:    &

1485:    \resizebox{0.32\textwidth}{!}{\includegraphics{fullseq.ps}}

1486:    \\

1487:    &

1488:    (d) 3-step analysis

1489:    &

1490:    (e) full-step analysis

1491:   \end{tabular}

1492:  \end{center}

 \caption{Retrieval process in the computer simulation and in the statistical
 neurodynamics.
 The horizontal axis represents the time index $t$, and the vertical axis represents
 the overlap $m$.
 (a) shows the result of the computer simulation. (b) to (e) show the
 results of the statistical neurodynamics.
 }

1500:  \label{fig:overlap}

1501: \end{figure*}

1502: %

1503:

1504: \begin{figure}

1505:  \begin{center}

1506:   \resizebox{0.8\textwidth}{!}{\includegraphics{dynamics.ps}}

1507:  \end{center}

 \caption{Comparison of the capacity obtained from the statistical neurodynamics with
 that from the SCSNA. The horizontal axis represents the loading rate $\alpha$, and the
 vertical axis represents the overlap $m$.
 The dashed curves show the results of the statistical neurodynamics.
 The results of the computer simulations are shown with error bars, which
 indicate the mean and standard deviation.
 }

1515:  \label{fig:basin}

1516: \end{figure}

1517:

1518:

1519:

1520: \end{document}

1521: