1: \documentstyle[aps,prl,twocolumn,graphicx,amsmath]{revtex}
2:
3: \begin{document}
4: \newcommand{\J}{{\mathbf J}}
5: \newcommand{\du}{{\mathbf w}}
6: \newcommand{\T}{{\mathbf T}}
7: \newcommand{\B}{{\mathbf B}}
8: \newcommand{\s}{{\mathbf S}}
9: \newcommand{\ksi}{{\boldsymbol \xi}^{\mu}}
10: \newcommand{\bsxi}{\boldsymbol \xi}
11: \newcommand{\qh}{\hat{q}}
12: \newcommand{\Qh}{\hat{Q}}
13: \newcommand{\sgn}{\text{sgn}}
14: \newcommand{\eps}{\varepsilon}
15: \newcommand{\al}{\alpha}
16: \newcommand{\lan}{\langle\langle}
17: \newcommand{\ran}{\rangle\rangle}
18: \newcommand{\btau}{{\boldsymbol \tau}}
19:
20: \def\lim{\mathop{\rm lim}}
21: \def\extr{\mathop{\rm extr}}
22: \def\Tr{\mathop{\rm Tr}}
23:
24: \twocolumn[\hsize\textwidth\columnwidth\hsize\csname@twocolumnfalse\endcsname
25: \title
26: {Multilayer neural networks with extensively many hidden units}
27: \author
28: {Michal Rosen--Zvi$^1$, Andreas Engel$^2$, and Ido Kanter$^1$ }
29: \noindent
30: \address{$^1$ Minerva Center and Department of Physics, Bar-Ilan University,
31: Ramat-Gan, 52900 Israel\\
32: $^2$ Institut f\"ur Theoretische Physik, Otto-von-Guericke Universit\"at, \\
33: PSF 4120, 39016 Magdeburg, Germany}
34:
35: \maketitle
36:
37: \begin{abstract}
38: The information processing abilities of a multilayer neural network with a
39: number of hidden units scaling as the input dimension are studied using
40: statistical mechanics methods. The mapping from the input
41: layer to the hidden units is performed by general symmetric Boolean functions
42: whereas the hidden layer is connected to the output by either discrete or
continuous couplings. Introducing an overlap in the space of Boolean
functions as order parameter, the storage capacity is found to scale with the
logarithm of the number of implementable Boolean functions. The generalization
46: behaviour is smooth for continuous couplings and shows a discontinuous
47: transition to perfect generalization for discrete ones.
48: \end{abstract}
49: \pacs{} ]
50:
51:
52: %introduction
53:
54: Statistical mechanics investigations of artificial neural networks continue to
55: play a stimulating and integrating role in the scientific dialogue between
disciplines as diverse as neurophysiology, mathematical statistics, computer
science and information theory. In particular the study of feed-forward neural
networks pioneered by Gardner \cite{Gardner} has revealed a
variety of interesting results on how these systems may learn different
information processing tasks from examples (for a review see \cite{EnvB}). Of
61: particular importance in this respect are multilayer networks (MLN) because of
62: their ability to implement any function between input and output \cite{Cyb}
63: which makes them attractive candidates for many practical
64: applications. It is well known that very many hidden
65: units are needed in order to realize this vast computational
66: complexity. However, statistical mechanics studies of MLN have so far been
67: mostly restricted to systems with very few hidden units as compared to the
68: number of inputs \cite{smMLN}. In the present letter we overcome this
69: limitation and study the storage and generalization abilities of a
70: tree MLN in which the size of the hidden layer scales in the same way as the
71: input dimension.
72:
73: % the model
74:
We consider a MLN with $N$ binary hidden units $\tau_i=\pm 1$, $i=1,\ldots,N$,
feeding a binary output $\sigma=\sgn(\sum_i J_i \tau_i)$ through a coupling
vector $\J=(J_1,\ldots,J_N)$. The hidden units are determined via Boolean functions
$\tau_i=B_i(\s_i)$ by disjoint sets of inputs $\s_i=(S_{i1},\ldots,S_{iL})$
79: containing $L$ elements each. We are interested in the limit $N\to\infty$ with
80: $L$ remaining constant.
81:
82: In order to keep the connection with neural network architectures we restrict
83: ourselves to {\it symmetric} Boolean functions characterized by
$B_i(-\s_i)=-B_i(\s_i)$. There are $2^{2^{L-1}}$
such functions with $L$ inputs, only some of which can be realized by a coupling
vector $\du_i$ according to $B_i(\s_i)=\sgn(\sum_j w_{ij} S_{ij})$. For $L=3$,
for example, there are 16 symmetric Boolean functions but only 14 of them are
linearly separable.
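
The count can be made explicit with a short worked example; the grouping of the functions below is ours and serves only as an illustration. The $2^L$ input configurations fall into $2^{L-1}$ pairs $(\s_i,-\s_i)$, and a symmetric function is fixed by choosing its value freely on one member of each pair:
\[
\#\{B_i\}=2^{2^{L-1}},\qquad L=3:\quad 2^{4}=16 .
\]
For $L=3$ the $16$ functions can be listed, with all signs chosen independently, as
\[
\underbrace{\pm S_{ij}}_{6},\qquad
\underbrace{\sgn(\pm S_{i1}\pm S_{i2}\pm S_{i3})}_{8},\qquad
\underbrace{\pm S_{i1}S_{i2}S_{i3}}_{2},
\]
where the first $14$ are linearly separable and the two parity-type functions $\pm S_{i1}S_{i2}S_{i3}$ are not.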
89:
90: % statistical mechanics analysis and general expressions
91:
In order to investigate the storage and generalization properties of the
network we consider a set of $\al L N$ inputs $\ksi_i$, $\mu=1,\ldots,\al LN$,
whose components $\xi_{i1}^{\mu},\ldots,\xi_{iL}^{\mu}$ are independent,
identically distributed
random variables with zero mean and unit variance. We then ask for the ability
of the network to map these inputs onto outputs $\sigma^{\mu}=1$ for all $\mu$ by
adapting the Boolean functions $B_i$ and the couplings $J_i$ appropriately.
99:
100: The central quantity in the statistical mechanics analysis is the
101: {\it quenched entropy}
\begin{equation}\label{defs}
s=\lim_{N\to\infty}\frac{1}{N} \lan \ln \int d\mu(\J) \Tr_{\{B_i\}}
\prod_{\mu=1}^{\al L N}\theta\Big(\sum_i J_i B_i(\ksi_i)\Big)
\ran_{\{\ksi_i\}}
\end{equation}
where $d\mu(\J)$ is the proper measure in the space of couplings $\J$, the
trace denotes the sum over all Boolean functions, the product
is non-zero only if the arguments of all $\theta$-functions are positive,
and the double angle brackets stand for the average over the inputs. The
111: determination of $s$ can be performed using the replica trick and introducing
112: the overlap between two solutions in the combined space of couplings $\J$ and
113: Boolean functions $B_i$ of the form
114: \begin{equation}\label{defq}
115: q^{ab}=\frac{1}{N}\sum_i J_i^a J_i^b \;\lan B_i^a({\bsxi})
116: B_i^b({\bsxi})\ran_{\bsxi}
117: \end{equation}
118: with the average being now over a single, $L$-component vector
119: $\bsxi$.
120: Exploiting the fact that this average involves a finite number of terms only
121: and assuming replica symmetry $q^{ab}=q$ for $a\neq b$ we can write $s$ using
122: standard techniques \cite{EnvB} in the form
123: \begin{equation}\label{ress}
124: s=\extr_{q, \qh}\left[ G_C(q, \qh)+G_S(\qh)+\al L G_E(q)\right],
125: \end{equation}
126: with the explicit expressions for the functions $G_C, G_S$ and $G_E$ depending
127: on $L$, the constraints on $\J$ and on whether the storage or the
128: generalization problem is addressed.
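
As an illustration of (\ref{defq}), consider a single hidden unit with $L=3$ and two replicas with $J_i^a=J_i^b=1$; the specific choice of functions is ours, and we assume binary inputs with all $2^3$ configurations equally likely. If $B^a(\bsxi)=\xi_1$ and $B^b(\bsxi)=\sgn(\xi_1+\xi_2+\xi_3)$, the two functions disagree only for $\xi_2=\xi_3=-\xi_1$, i.e. on $2$ of the $8$ configurations, so that
\[
\lan B^a(\bsxi)B^b(\bsxi)\ran_{\bsxi}=\frac{6-2}{8}=\frac12 .
\]
More generally, two symmetric Boolean functions that differ on $d$ of the $2^{L-1}$ configurations with $\xi_1=1$ contribute $1-2d/2^{L-1}$ to the sum in (\ref{defq}).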
129:
130: % the storage problem
131:
132: Let us begin with the storage problem by asking for the storage capacity
133: $\al_c$ defined as the maximal value of $\al$ for which the system can still
realize all desired input-output mappings with probability 1. Performing the
replica limit with the number of replicas tending to zero, as is characteristic
for this problem, we find
137: \begin{equation}\label{GEstor}
138: G_E(q)=\int Dt \ln H(Q\;t)
139: \end{equation}
140: with the abbreviations $Dt=dt\, e^{-t^2/2}/\sqrt{2\pi}$,
141: $H(x)=\int_{x}^{\infty} Dt$, and $Q=\sqrt{q/(1-q)}$. The expressions for $G_C$
142: and $G_S$ depend on the constraints on the coupling vector $\J$.
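
A quick consistency check of (\ref{GEstor}), added here as an illustration: for $q=0$ one has $Q=0$ and $H(0)=1/2$, hence
\[
G_E(0)=\int Dt\,\ln\frac12=-\ln 2 ,
\]
i.e. for uncorrelated solutions every additional input-output pair halves the number of admissible networks. In the opposite limit $q\to1$ one has $Q\to\infty$ and $H(Qt)\to\theta(-t)$, so that $G_E$ diverges to $-\infty$.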
143:
144: % the Ising case
145:
A particularly simple case is given by Ising couplings $J_i=\pm 1$. From the
symmetry of the Boolean functions considered it is clear that it
is sufficient to consider $J_i=1$ for all $i$. Consequently, in this case all
flexibility of the network rests in the
choice of the Boolean functions between input and hidden layer, and
$q$ is purely an overlap in the space of these Boolean functions. We find
$G_C=\qh(1-q)/2$ where $\qh$ denotes the conjugate order parameter to
153: $q$. Moreover, in the case where all $2^{2^{L-1}}$ symmetric
154: Boolean functions are admissible we use the identity
\begin{equation}
\Tr_{\{B_i\}}\exp\Big(\sqrt{\frac{\qh}{2^{L-1}}}
\sum_{\bsxi} z_{\bsxi} B_i({\bsxi})\Big)
=\prod_{\bsxi}\Big(2\cosh\Big(\sqrt{\frac{\qh}{2^{L-1}}}\; z_{\bsxi}\Big)\Big) \nonumber
\end{equation}
160: with the sums and products over ${\bsxi}$ running over all $2^{L-1}$
161: configurations of $\bsxi$ with $\xi_1=1$ to find
162: \begin{equation}
163: G_S=2^{L-1}\int Dz \ln \left[ 2\cosh(\sqrt{\frac{\qh}{2^{L-1}}}\;z)\right].
164: \end{equation}
Under the transformations $\qh\mapsto 2^{L-1}\qh$ and $\al\mapsto
2^{L-1}\al/L$ the resulting expression for the entropy maps {\it exactly} onto
the result for the Ising perceptron corresponding to $L=1$, and we may
therefore
use the well-known results for this case \cite{KrMe}. Accordingly, the storage
capacity is overestimated by the replica symmetric expression and the correct
result
172: \begin{equation}\label{h1}
173: \al_c(L)= \al_c(1)\;2^{L-1}/L \cong 0.83\; 2^{L-1}/L
174: \end{equation}
175: is given by the value of $\al$ at which the entropy $s(\al)$ turns negative.
176: The storage capacity is hence proportional to the {\it logarithm} of the
177: number of implementable Boolean functions. This result is in
178: accordance also with the rigorous upper bound $\al_c\leq2^{L-1}/L$ resulting
179: from the annealed entropy $s^{\text{ann}}=(2^{L-1}-\al L)\ln 2$.
180: As in the case of
181: the Ising perceptron this bound is related to information theory.
182: The full specification of the network with all $J_i=1$ requires
183: $N\,2^{L-1}$ bits of information necessary to pin down the $N$ Boolean
184: functions $B_i$. Therefore the machine cannot store more than $N\,2^{L-1}$
185: bits and $\al_c$ cannot exceed $2^{L-1}/L$.
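
The annealed expression quoted above can be obtained by a simple counting argument, sketched here under the assumption $J_i=1$ and an odd number of hidden units so that ties do not occur. There are $2^{N2^{L-1}}$ choices for the Boolean functions, and for any fixed choice a random input yields $\sigma^{\mu}=1$ with probability $1/2$, independently for different $\mu$. The average number of solutions is therefore
\[
\lan \Tr_{\{B_i\}}\prod_{\mu=1}^{\al LN}\theta\Big(\sum_i B_i(\ksi_i)\Big)\ran
=2^{N2^{L-1}}\,2^{-\al LN},
\]
so that $s^{\text{ann}}=(2^{L-1}-\al L)\ln 2$, which vanishes at $\al=2^{L-1}/L$.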
186:
Fig.~\ref{Isingcap} compares the analytical result $\al_c(3)\cong 1.11$ for
$L=3$ with numerical simulations using exact enumerations. Even for the small
sizes accessible to this numerical technique we find a steepening of the
transition with increasing $N$ and a crossing point of the curves near the
theoretical prediction.
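
The value quoted for $L=3$ follows directly from the rescaling described above. Writing $\qh=2^{L-1}\qh'$ and $\al=2^{L-1}\al'/L$ in (\ref{ress}), with $G_C$ and $G_S$ as given for the Ising case (the primed symbols are introduced here only for this illustration), one finds
\[
s=2^{L-1}\,\extr_{q,\qh'}\Big[\frac{\qh'(1-q)}{2}
+\int Dz\,\ln\big[2\cosh(\sqrt{\qh'}\,z)\big]+\al' G_E(q)\Big],
\]
which depends on $L$ only through the prefactor and through $\al'=\al L/2^{L-1}$. The entropy therefore changes sign at a fixed value of $\al'$, and with $\al_c(1)\cong 0.83$
\[
\al_c(2)\cong 0.83,\qquad \al_c(3)\cong 0.83\cdot\frac43\cong 1.11,\qquad
\al_c(4)\cong 0.83\cdot 2\cong 1.66 .
\]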
192:
193: If the trace over the Boolean functions in (\ref{defs}) is restricted to those
194: which can be realized by perceptrons with coupling vectors $\du_i$ the exact
195: mapping on the Ising perceptron no longer holds. Solving the
196: corresponding extremum conditions numerically for $L=3$ we find $\alpha_c
\cong 1.06$ for this case. The reduction of $\al_c$ compared to the
unrestricted case roughly matches the reduction in the logarithm of the number
of admissible Boolean functions, $1.06/1.11\cong\ln(14)/\ln(16)$.
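
Numerically, the comparison reads (a rounding check added for illustration)
\[
\frac{1.06}{1.11}\cong 0.955,\qquad
\frac{\ln 14}{\ln 16}\cong\frac{2.639}{2.773}\cong 0.952 .
\]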
200:
201: \begin{figure}[tb]\vspace*{-.75cm}
202: \hspace*{-1cm}\includegraphics[width=8cm,angle=270]{capacity16.ps}
203: \caption{\label{Isingcap} Fraction $f$ of $3 \al N$ random input-output
204: mappings implementable by
205: a MLN with $3N$ inputs and $N$ hidden units as function of $\al$ for $N=3$
206: (squares) and $N=5$ (circles). The couplings between hidden units and output
207: are fixed to $J_i=1$ for all $i$ and enumerations are performed over all
208: combinations of symmetric Boolean functions $B_i$ between input and hidden
layer. For every value of $\al$, 200 realizations of Gaussian inputs
were averaged over. The solid line gives the analytical result describing
211: the limit $N\to\infty$.}
212: \end{figure}
213:
214: % Finite synaptic depth
215:
216: It is possible to generalize the above analysis to the case of discrete
217: couplings with finite synaptic depth $l$ of the form
218: $J_i=\pm 1/l,\pm 2/l,..., \pm 1$ by building on the analysis of the analogous
case for the perceptron \cite{GuSt,Kanter}. In this case the additional order
parameter $\bar{q}=\sum_{i}(J_i^a)^2/N$ and its conjugate
$\hat{\bar{q}}$ have to be introduced. For $G_E$ we then again find
(\ref{GEstor}), now with $Q=\sqrt{q/(\bar{q}-q)}$. Moreover,
$G_C=-\hat{\bar{q}}\bar{q} + \qh q/2$ and, if all symmetric
Boolean functions are admissible,
\begin{align}
G_S&=\int\prod_{\bsxi} Dz_{\bsxi} \nonumber\\
& \ln \Tr_J \exp\Big(-\Big(\frac{\qh}{2}-\hat{\bar{q}}\Big)J^2\Big)
\prod_{\bsxi} 2\cosh\Big(J\,\sqrt{\frac{\qh}{2^{L-1}}}\; z_{\bsxi}\Big), \nonumber
\end{align}
230: with $\Tr_J$ denoting the trace over the $2l$ possible values of the couplings
231: $J_i$.
232: Using these results we have numerically calculated the storage capacity
233: $\al_c(l)$ for the simplest case $L=3$ as a function of the synaptic depth
$l$. The results are shown in Fig.~\ref{CDigital} together with a fit to the
asymptotic behavior. The capacity increases from $\alpha_c\cong 1.11$ in
the Ising case, $l=1$, to roughly $1.7$ for large $l$. It is rather difficult
237: to compare these analytical findings with numerical simulations since the
238: effects of the finite synaptic depth do not show up at the small values of
239: $N$ accessible to exact enumerations \cite{PBGDK}.
240:
241: \begin{figure}[tb]\vspace*{-.75cm}
242: \hspace*{-1cm}\includegraphics[width=8cm,angle=270]{Cont.ps}
243: \caption{\label{CDigital} Storage capacity of a MLN with $N\to\infty$
244: hidden units and $3N$ inputs with couplings $J_i$ between hidden layer
245: and output taking $2l$ discrete values. The inputs are mapped to the
246: hidden layer by symmetric Boolean functions $B_i$. The solid line is the
247: fit $\al_c\sim 1.70-0.91/l$
248: to the asymptotic behavior, the dashed line gives the replica
249: symmetric result for continuous couplings $J_i$.}
250: \end{figure}
251:
252: % continuous couplings
253:
254: To complete the analysis of the storage properties
255: we analyze the case of continuous couplings $\J$ between hidden
256: and output layer. It is convenient to eliminate the additional order parameter
257: $k$ necessary in this case to enforce the normalization $\J^2=N$ by
258: introducing $\Qh=\qh/(k+\qh)$. Within replica symmetry the quenched entropy
259: $s$ is then again of the form (\ref{ress}) with $G_C=0$, $G_E$ given by
260: (\ref{GEstor}), and the extremum taken now over $Q$ and $\Qh$. Moreover
261: \begin{align}
262: G_S=&\frac{1}{2}\ln(1-\frac{\Qh}{1+Q^2})\nonumber\\
263: &+\int\prod_ {\bsxi} Dz_{\bsxi}\ln \Tr_B\;\exp(\frac{\Qh}{2^L}\;
264: (\sum_{\bsxi} z_{\bsxi} B({\bsxi}))^2).
265: \end{align}
266: The storage capacity $\al_c$ can be obtained from these expressions in the
267: limits $Q\to\infty, \Qh\to\infty$ corresponding to $q\to 1$. This limit
268: indicates that different solutions of the storage problem may at most
269: differ in a non-extensive number of components $J_i$ and Boolean functions
270: $B_i$. We then find $G_E\sim -Q^2/4$ and, if all
271: Boolean functions are admissible, $G_S\sim \Qh(1/2+(2^{L-1}-1)/\pi)$ giving
272: rise to
273: \begin{equation}\label{alspherrs}
274: \al_c^{RS}=\frac{2+\frac{4}{\pi}(2^{L-1}-1)}{L}.
275: \end{equation}
276: For $L=3$ this yields $\al_c^{RS}=2/3+4/\pi\cong 1.94$. If only linearly
277: separable Boolean functions implementable by coupling vectors $\du_i$ are
278: considered the asymptotic behavior of $G_S$ is more difficult to obtain. For
279: the case $L=3$ we find $\al_c\cong 1.85$. Again the relative reduction of
280: $\al_c$ when compared to the unrestricted case is roughly given by the ratio
281: of the logarithms of the number of available Boolean functions per hidden
282: unit.
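
Evaluating (\ref{alspherrs}) for the first few values of $L$ gives (an illustration added here)
\[
\al_c^{RS}(1)=2,\qquad
\al_c^{RS}(2)=1+\frac{2}{\pi}\cong 1.64,\qquad
\al_c^{RS}(3)=\frac23+\frac4\pi\cong 1.94 ,
\]
where the case $L=1$ reproduces the familiar capacity of the perceptron with continuous couplings. For $L=3$ the ratio $1.85/1.94\cong 0.954$ is again close to $\ln 14/\ln 16\cong 0.952$.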
283:
284: It is possible to derive an upper bound for $\al_c$ as has been done for MLN
285: with a finite number of hidden units \cite{MiDu} by using some exact
286: results for the perceptron \cite{Cover}. For $L=3$ we find
287: $\al_c(L=3)\leq 2.394$ and the replica symmetric result is therefore within
288: the bound. For large $L$ the bound is given by
289: $\al_c(L\to\infty)\lesssim 2^{L-1}/L+\ln 2$ and shows the same scaling with
290: $L$ as (\ref{alspherrs}).
291: Nevertheless the replica symmetric result (\ref{alspherrs}) is very likely
to overestimate the storage capacity, as can be seen from Fig.~\ref{CDigital} in
which the result for $\al_c$ for $L=3$ is included as a horizontal dashed line.
294: Unlike the case of the perceptron \cite{GuSt} the values for $\alpha_c$
295: for finite synaptic depth seem not to approach the value for continuous
296: couplings when $l\to\infty$. It would hence be very interesting to
297: investigate the implications of replica symmetry breaking, both on the case
298: of continuous couplings and of couplings with finite synaptic depth
299: \cite{Robert}.
300:
301: % the generalization problem
302:
Let us finally elucidate the generalization problem, i.e., the ability of
the network to infer a rule from examples. To this end we consider, as usual,
two networks of the same type with the couplings and Boolean functions of one
of them (the ``teacher'') fixed at random. The other network (the ``student'')
307: receives a set of randomly chosen inputs $\ksi_i,\mu=1,...,\al L N$ together
308: with the corresponding outputs $\sigma^{\mu}_T$ generated by the teacher. The
309: task for the student is to imitate the teacher as well as possible. The
310: success in doing so is quantified by the generalization error $\eps$ defined
311: as the probability that a
312: {\it newly} chosen random input is classified differently by teacher and
313: student.
314:
315: As is well known the statistical mechanics analysis of the generalization
316: problem builds again on the expression (\ref{defs}) for the quenched entropy
317: with the number of replicas now tending to 1 rather than to 0
318: \cite{OpHa,EnvB}. A nice feature of this limit is that replica symmetry is
319: known to be stable. The order parameter $q$ defined in
320: (\ref{defq}) now gives the typical overlap between teacher and student and
321: determines the generalization error $\eps$ in a simple way. In the present
322: situation we have the standard relation $\eps=(\arccos q)/\pi$. Moreover
323: (\ref{GEstor}) is replaced by
324: \begin{equation}\label{GEgen}
325: G_E(q)=2 \int Dt\; H(Q\;t)\;\ln H(Q\;t).
326: \end{equation}
327: The case using Ising couplings $J_i=\pm 1$ and all symmetric Boolean functions
328: can again be mapped exactly on the Ising perceptron. Correspondingly there is
329: a {\it discontinuous} transition to perfect learning, $\eps=0$ for $\al>\al_d$
330: \cite{Gyo} with $\al_d=1.24\;2^{L-1}/L$. This transition occurs when all
331: Boolean functions of the student ``lock'' onto the corresponding input-hidden
332: mappings of the teacher and is also expected to occur in the case where only
333: a restricted set of Booleans can be implemented.
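
For orientation, some values implied by the relations above (an illustration added here): $\eps=(\arccos q)/\pi$ gives $\eps=1/2$ for $q=0$ and $\eps=0$ for $q=1$, and the threshold for perfect generalization becomes
\[
\al_d(1)\cong 1.24,\qquad \al_d(3)\cong 1.24\cdot\frac43\cong 1.65 ,
\]
to be compared with the storage capacity $\al_c(3)\cong 1.11$ of the same architecture.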
334:
335: For continuous couplings we find $G_C=Q^2\Qh/((1+Q^2)(1-\Qh))$ and
336: \begin{equation}\label{GSgen}
337: G_S=\frac{1}{2}\ln(1-\Qh)+\sqrt{1-\Qh}
338: \int\prod_ {\bsxi} Dz_{\bsxi}\; g(z_{\bsxi})\,\ln g(z_{\bsxi})
339: \end{equation}
340: where
341: \begin{equation}\label{GSh}
342: g(z_{\bsxi})=\Tr_B \exp(\frac{\Qh}{2^L}(\sum_{\bsxi}\,z_{\bsxi} B(\bsxi))^2).
343: \end{equation}
For small $\al$ this gives rise to $\eps\sim 1/2-\al L/(\pi^2 2^{L-1})$, which
coincides with the result for the perceptron for $L=1$, as it should. With
346: increasing $L$ the initial decay of the generalization error becomes slower
347: reflecting the increasing complexity and storage abilities of the network.
348: There is no retarded learning because of the non-zero
349: correlation between the hidden units and the output \cite{retlearn}.
350: For large $\al$ the generalization behaviour is dominated by the fine tuning
351: of the student couplings between hidden layer and output to the respective
352: couplings of the teacher resulting in the ubiquitous power law decay
353: $\eps\sim 0.625/(L\al)$.
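
As a worked illustration of the two limits for $L=3$ (numbers rounded; the expansion of the arccosine is the only step added here): since $\arccos q=\pi/2-q+O(q^3)$, the relation $\eps=(\arccos q)/\pi$ gives $\eps\simeq 1/2-q/\pi$ for small $q$, so the small-$\al$ result corresponds to $q\simeq \al L/(\pi 2^{L-1})$. For $L=3$
\[
\eps\simeq\frac12-\frac{3\al}{4\pi^2}\cong\frac12-0.076\,\al\quad(\al\to0),\qquad
\eps\simeq\frac{0.625}{3\al}\cong\frac{0.208}{\al}\quad(\al\to\infty).
\]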
354:
355: In conclusion we have quantitatively characterized the storage and
356: generalization abilities of a multilayer neural network with a number of
357: hidden units scaling as the input dimension. If the mapping from the input
358: to the hidden layer is realized by symmetric Boolean functions with $L$
359: inputs the capacity is found to be proportional to the logarithm of the
360: number of these Boolean functions divided by $L$. The more conventional case
361: in which the hidden units are the outputs of perceptrons with couplings
362: $\du_i$ is more difficult to analyze. However, speculating that the above
363: scaling holds true also in this case and observing that the logarithm of
364: the number of Boolean functions which can be implemented by a perceptron with
365: $L$ inputs is $O(L^2)$ we arrive at the interesting result that the number
366: of stored input-output relations {\it per weight} of the network is
367: proportional to $L$. This implies that doubling the number of couplings
368: in the network would increase the storage capacity by a factor of 2 making
369: the proposed architecture superior to MLN with few ($K\ll N$) hidden
370: units in which the storage capacity is known to increase at most
371: logarithmically with the number of weights.
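
The counting behind this estimate can be spelled out as follows (a sketch under the speculative scaling assumption stated above, with all constants omitted). With $\ln\#\{B_i\}\sim L^2$ the capacity scales as $\al_c\sim L^2/L=L$, so that
\[
\frac{\#\,\text{stored patterns}}{\#\,\text{weights}}
=\frac{\al_c L N}{N L+N}\sim\frac{L^2}{L+1}\sim L .
\]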
372:
373:
374:
375: \vspace*{.5cm}
376:
377:
378: {\bf Acknowledgment:} We have benefitted from discussions with Wolfgang
379: Kinzel, Robert Urbanczik, Peter Reimann, and Stephan Mertens.
We would like to thank the
Max-Planck-Institut f\"ur Physik komplexer Systeme in Dresden, where this
work was finished, for hospitality and the GIF for support.
383:
384:
385:
386: \begin{thebibliography}{99}
387: \vspace*{-1.5cm}
388: \bibitem{Gardner} E. Gardner, J. Phys. {\bf A21}, 257 (1988); E. Gardner and
389: B. Derrida, J. Phys. {\bf A21}, 271 (1988).
\bibitem{EnvB} A. Engel and C. Van den Broeck, {\it Statistical Mechanics of
Learning} (Cambridge University Press, Cambridge, 2001).
392: \bibitem{Cyb} G. Cybenko, Math. Contr. Sign. Syst. {\bf 2}, 303 (1989).
\bibitem{smMLN} E. Barkai, D. Hansel, and I. Kanter, Phys. Rev. Lett. {\bf 65},
394: 2312 (1990); E. Barkai, D. Hansel, and H. Sompolinsky, Phys. Rev.
395: {\bf A45}, 4146 (1992); A. Engel, H. M. K\"ohler, F. Tschepke,
396: H. Vollmayr, and A. Zippelius, Phys. Rev. {\bf A45}, 5790 (1992);
397: H. Schwarze and J. Hertz, Europhys. Lett. {\bf 20}, 375 (1992);
398: R. Monasson and R. Zecchina, Phys. Rev. Lett. {\bf 75}, 2432 (1995);
399: R. Urbanczik, Europhys. Lett. {\bf 35}, 553 (1996); one of the rare
400: exceptions is \cite{BKH}.
401: \bibitem{BKH} A. Bethge, R. K\"uhn, and H. Horner, J. Phys. {\bf A27}, 1929
402: (1994).
\bibitem{KrMe} W. Krauth and M. M\'ezard, J. Phys. (Paris) {\bf 50}, 3057 (1989).
\bibitem{GuSt} H. Gutfreund and Y. Stein, J. Phys. {\bf A23}, 2613 (1990).
\bibitem{Kanter} I. Kanter, Europhys. Lett. {\bf 17}, 181 (1992).
\bibitem{PBGDK} A. Priel, M. Blatt, T. Grossman, E. Domany, and I. Kanter,
Phys. Rev. {\bf E50}, 577 (1994).
408: \bibitem{MiDu} G. J. Mitchison and R. M. Durbin, Biol. Cybern. {\bf 60}, 345
409: (1989).
410: \bibitem{Cover} T. M. Cover, IEEE Trans. Electron. Comput. {\bf EC-14}, 326
411: (1965).
\bibitem{Robert} R. Urbanczik, Europhys. Lett. {\bf 26}, 233 (1994).
\bibitem{OpHa} M. Opper and D. Haussler, Phys. Rev. Lett. {\bf 66}, 2677
414: (1991).
415: \bibitem{Gyo} G. Gy\"orgyi, Phys. Rev. {\bf A41}, 7097 (1990).
\bibitem{retlearn} M. Biehl and A. Mietzner, J. Phys. {\bf A27}, 1885 (1994);
417: B. Schottky, J. Phys. {\bf A28}, 4515 (1995); C. Van den Broeck and
418: P. Reimann, Phys. Rev. Lett. {\bf 76}, 2188 (1996).
419: \end{thebibliography}
420:
421: \end{document}
422: