1: \documentclass[11pt]{article}
2: \usepackage{setspace}
3: \usepackage{epsfig}
4: \usepackage{palatino}
5: \usepackage{verbatim}
6: \usepackage{graphicx}
7: \usepackage{graphics}
8:
9: \bibliographystyle{apalike}
10: \usepackage[round]{natbib}
11:
12: \setlength{\textwidth}{6.0in} \setlength{\evensidemargin}{0.25in}
13: \setlength{\oddsidemargin}{0.25in} \setlength{\topmargin}{-0.0in}
14: \setlength{\textheight}{9.0in} \setlength{\headheight}{0.25in}
15: \setlength{\headsep}{0.3in} \setlength{\footskip}{0.7in}
16:
17: \def\tr{{\mathrm T}}
18: \def\rd{{\mathrm d}}
19: \def\ssum{{\mbox{\small{$\Sigma$}}}}
20: \def\ol#1{{\overline{#1}}}
21: %\def\ol#1{\bar{#1}}
22: \def\eq{\!=\!}
23: \def\deg{$^{\circ}$}
24: \def\b{\begin{equation}}
25: \def\e{\end{equation}}
26: \def\ba{\begin{eqnarray}}
27: \def\ea{\end{eqnarray}}
28: \def\rmax{$r_{\mathrm{max}}$}
29: \def\Emin{$E_{\mathrm{min}}$}
30: \def\sigmin{$\sigma_{\mathrm{min}}$}
31: \def\Epop{$E_{\mathrm{pop}}$}
32: \def\Erec{$E_{\mathrm{rec}}$}
33: \newcommand{\bm}[1]{\mbox{\boldmath $#1$}}
34: \renewcommand{\floatpagefraction}{0.9}
35: \renewcommand{\textfraction}{0.01}
36:
37: \begin{document}
38: \begin{center}
39: \begin{large}
40: {\bf WHEN RESPONSE VARIABILITY INCREASES \\
41: \vspace*{0.05in} NEURAL NETWORK ROBUSTNESS TO SYNAPTIC NOISE\\}
42: \vspace*{0.2in}
43: \end{large}
44: \vspace*{0.3in}
45: \begin{large}
46: {\bf Gleb Basalyga and Emilio Salinas}
47: \vspace*{0.3in}
48: \end{large} \\
49: Department of Neurobiology and Anatomy \\
50: Wake Forest University School of Medicine \\
51: Winston-Salem, NC 27157-1010 \\
52: E-mail: gbasalyg@wfubmc.edu, esalinas@wfubmc.edu \\
53: \vspace*{0.3in}
54: \today \\
55: \vspace*{0.3in}
56: {\small Preliminary version of paper \\
57: to appear in
58: \textbf{\textit{Neural Computation}}}
59: \vspace*{0.3in}
60: \end{center}
61:
62: \centerline{\textbf{Abstract}}
63:
64: \vspace{0.2in}
65:
66: \noindent
67: Cortical sensory neurons are known to be highly variable, in the sense
68: that responses evoked by identical stimuli often change dramatically
69: from trial to trial. The origin of this variability is uncertain, but
70: it is usually interpreted as detrimental noise that reduces the
71: computational accuracy of neural circuits. Here we investigate the
72: possibility that such response variability might, in fact, be
73: beneficial, because it may partially compensate for a decrease in
74: accuracy due to stochastic changes in the synaptic strengths of a
75: network. We study the interplay between two kinds of noise, response
76: (or neuronal) noise and synaptic noise, by analyzing their joint
77: influence on the accuracy of neural networks trained to perform
78: various tasks. We find an interesting, generic interaction: when
79: fluctuations in the synaptic connections are proportional to their
80: strengths (multiplicative noise), a certain amount of response noise
81: in the input neurons can significantly improve network performance,
82: compared to the same network without response noise. Performance is
83: enhanced because response noise and multiplicative synaptic noise are
84: in some ways equivalent. So, if the algorithm used to find the optimal
85: synaptic weights can take into account the variability of the model
86: neurons, it can also take into account the variability of the
87: synapses. Thus, the connection patterns generated with response noise
88: are typically more resistant to synaptic degradation than those
89: obtained without response noise. As a consequence of this interplay,
90: if multiplicative synaptic noise is present, it is better to have
91: response noise in the network than not to have it. These results are
92: demonstrated analytically for the most basic network consisting of two
93: input neurons and one output neuron performing a simple classification
94: task, but computer simulations show that the phenomenon persists in a
95: wide range of architectures, including recurrent (attractor) networks
96: and sensory-motor networks that perform coordinate transformations.
97: The results suggest that response variability could play an important
98: dynamic role in networks that continuously learn.
99:
100: \newpage
101: \section{Introduction}
102:
103: Neuronal networks face an inescapable tradeoff between learning new
104: associations and forgetting previously stored information. In
105: competitive learning models, this is sometimes referred to as the
106: stability-plasticity dilemma~\citep{carpenter87art2,hertz91b}: in
107: terms of inputs and outputs, learning to respond to new inputs will
108: interfere with the learned responses to familiar inputs. A
109: particularly severe form of performance degradation is known as
110: catastrophic interference~\citep{mccloskey89catastrophic}. It refers
111: to situations in which the learning of new information causes the
112: virtually complete loss of previously stored associations.
113:
114: Biological networks must face a similar problem, because once a task
115: has been mastered, plasticity mechanisms will inevitably produce
116: further changes in the internal structural elements, leading to
117: decreased performance. That is, within sub-networks that have already
118: learned to perform a specific function, synaptic plasticity must at
119: least partly appear as a source of noise. In the cortex, this problem
120: must be quite significant, given that even primary sensory areas show a
121: large capacity for reorganization~\citep{XMSJ95,KM98,CLG01}. Some
122: mechanisms, such as homeostatic regulation~\citep{TN00} and specific
123: types of synaptic modification rules~\citep{HB04}, may help alleviate
124: the problem, but by and large, how nervous systems cope with it
125: remains unknown.
126:
Another factor that is typically considered a limitation on the
computational capacity of neural circuits is response variability. The
activity of cortical
129: neurons is highly variable, as measured either by the temporal
130: structure of spike trains produced during constant stimulation
131: conditions, or by spike counts collected in a given time interval and
132: compared across identical behavioral
133: trials~\citep{Dean81,nc:Softky+Koch:1992,SK93,HSKD96}. Some of the
134: biophysical factors that give rise to this variability, such as the
135: balance between excitation and inhibition, have been
136: identified~\citep{SK93,SN94,SZ98}. But its functional significance, if
137: any, is not understood.
138:
139: Here we consider a possible relationship between the two sources of
140: randomness just discussed, whereby response variability helps
141: counteract the destabilizing effects of synaptic changes. Although
142: noise generally hampers performance, recent studies have shown that
143: in nonlinear dynamical systems such as neural networks this is not
144: always the case. The best known example is stochastic resonance, in
145: which noise enhances the sensitivity of sensory neurons to weak
146: periodic signals~\citep{LM96,Gammaitoni98SR,Nozaki99}, but noise may
147: play other constructive roles as well. For instance, when a system has
148: an internal source of noise, an externally added noise can reduce the
149: total noise of the output~\citep{Vilar-Rubi-2000}. Also, adding noise
150: to the synaptic connections of a network during learning produces
151: networks that, after training, are more robust to synaptic corruption
152: and have a higher capacity to generalize~\citep{murray94enhanced}.
153:
154: In this paper we study another beneficial effect of noise on neural
155: network performance. In this case, adding randomness to the neural
156: responses reduces the impact of fluctuations in synaptic strength.
157: That is, here, performance depends on two sources of variability,
158: response noise and synaptic noise, and adding some amount of response
159: noise produces better performance than having synaptic noise alone.
160: The reason for this paradoxical effect is that response noise acts as
161: a regularization factor that favors connectivity matrices with many
162: small synaptic weights over connectivity matrices with few large
163: weights, and this minimizes the impact of a synapse that is lost or
164: has a wrong value. We study this regularization effect in three
165: different cases: (1) a classification task, which in its simplest
166: instantiation can be studied analytically, (2) a sensory-motor
167: transformation, and (3) an attractor network that produces
168: self-sustained activity. For the latter two, the interaction between
169: noise terms is demonstrated by extensive numerical simulations.
170:
171: \section{General Framework}
172: \label{general}
173:
174: First we consider networks with two layers, an input layer that
175: contains $N$ sensory neurons and an output layer with $K$ output
176: neurons. A matrix $\bm{r}$ is used to denote the firing rates of the
177: input neurons in response to $M$ stimuli, so $r_{ij}$ is the firing
178: rate of input unit $i$ when stimulus $j$ is presented. These rates
179: have a mean component $\ol{\bm{r}}$ plus noise, as described in detail
180: below. The output units are driven by the first layer responses, such
181: that the firing rate of output unit $k$ evoked by stimulus $j$ is
182: \b
183: R_{kj} = \sum_{i=1}^N w_{ki} \, r_{ij} ,
184: \label{Rdriv}
185: \e
186: or in matrix form, $\bm{R}= \bm{w} \bm{r}$, where $\bm{w}$ is the
187: $K\!\times\!N$ matrix of synaptic connections between input and output
188: neurons. The output neurons also have a set of desired responses
189: $\bm{F}$, where $F_{kj}$ is the firing rate that output unit $k$
190: should produce when stimulus $j$ is presented. In other words, $\bm F$
191: contains target values that the outputs are supposed to learn. The
192: error $E$ is the mean squared difference between the actual driven
193: responses $R_{kj}$ and the desired ones,
194: \b
195: E = \left< \frac{1}{KM} \, \sum_{k=1}^K\sum_{j=1}^M
196: \left( R_{kj} - F_{kj} \right)^2
197: \right> ,
198: \label{error}
199: \e
200: or in matrix notation,
201: \b
202: E = \frac{1}{KM}
203: \left< \mbox{Tr}
204: \left[(\bm{w} \bm{r} - \bm{F})
205: (\bm{w} \bm{r} - \bm{F})^\tr
206: \right]
207: \right> .
208: \label{errormatrix}
209: \e
210: Here, $\mbox{Tr}(\bm{A}) = \sum_i A_{ii}$ is the trace of a matrix and
211: the angle brackets indicate an average over multiple trials, which
212: corresponds to multiple samples of the noise in the inputs $\bm{r}$.
213: The optimal synaptic connections $\ol{\bm{W}}$ are those that make the
214: error as small as possible. These can be found by computing the
215: derivative of Equation (\ref{errormatrix}) with respect to $\bm{w}$
216: (or with respect to $w_{ab}$, if the summations are written
217: explicitly) and setting the result equal to zero~\citep[see
218: e.g.,][]{GolubLoan96a}. These steps give
219: \b
220: \ol{\bm{W}} = \bm{F} \, \ol{\bm{r}}^\tr \bm{C}^{-1} ,
221: \label{wopt}
222: \e
223: where $\ol{\bm{r}}\eq \left<\bm{r}\right>$ and $\bm{C}^{-1}$ is the
224: inverse (or the pseudo-inverse) of the correlation matrix
225: $\bm{C} = \left<\bm{r} \bm{r}^\tr\right>$.
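
To make the minimization explicit, the following sketch expands the
trace in Equation (\ref{errormatrix}), assuming that the targets
$\bm{F}$ are fixed (noise-free) and that $\bm{C}$ is invertible:
\ba
  KM \, E & = & \mbox{Tr}\left[ \bm{w} \bm{C} \bm{w}^\tr \right]
            - 2 \, \mbox{Tr}\left[ \bm{w} \, \ol{\bm{r}} \bm{F}^\tr \right]
            + \mbox{Tr}\left[ \bm{F} \bm{F}^\tr \right] \nonumber \\
  KM \, \frac{\partial E}{\partial \bm{w}} & = &
            2 \left( \bm{w} \bm{C} - \bm{F} \, \ol{\bm{r}}^\tr \right) .
            \nonumber
\ea
Setting the gradient to zero gives $\bm{w} \bm{C} = \bm{F} \,
\ol{\bm{r}}^\tr$, which is Equation (\ref{wopt}) after multiplying by
$\bm{C}^{-1}$ on the right. Because $\bm{C}$ is positive semidefinite,
the error is convex in $\bm{w}$ and this stationary point is a minimum.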
226:
227: The general outline of the computer experiments proceeds in five
228: steps as follows. First, the matrix $\ol{\bm r}$ with the mean input
229: responses is generated together with the desired output responses
230: $\bm{F}$. These two quantities define the input-output transformation
231: that the network is supposed to implement. Second, response noise is
232: added to the mean input rates, such that
233: \b
234: r_{ij} = \ol{r}_{ij} (1 + \eta_{ij}).
235: \label{inputnoise1}
236: \e
237: The random variables $\eta_{ij}$ are independently drawn from a
238: distribution with zero mean and variance $\sigma_r^2$,
239: \ba
240: \left< \eta_{ij} \right > & = & 0 \nonumber \\
241: \left< \eta^2_{ij} \right> & = & \sigma_r^2 ,
242: \label{inputnoise1a}
243: \ea
244: where the brackets again denote an average over trials. We refer to
245: this as multiplicative noise. Third, the optimal connections are
246: found using Equation (\ref{wopt}). Note that these connections take
247: into account the response noise through its effect on the correlation
248: matrix $\bm{C}$. Fourth, the connections are corrupted by
249: multiplicative synaptic noise with variance $\sigma_W^2$, that is
250: \b
251: W_{ij} = \ol{W}_{ij} (1 + \epsilon_{ij}),
252: \label{weightnoisegeneral}
253: \e
254: where
255: \ba
256: \left< \epsilon_{ij} \right> & = & 0 \nonumber \\
257: \left< \epsilon^2_{ij} \right> & = & \sigma_W^2 .
258: \ea
259: Finally, the network's performance is evaluated. For this, we measure
260: the network error $E_W$, which is the square error obtained with the
261: optimal but corrupted weights $\bm{W}$, averaged over both types of
262: noise,
263: \b
264: E_W = \frac{1}{KM}
265: \left< \mbox{Tr} \left[
266: (\bm{W} \bm{r} - \bm{F}) (\bm{W} \bm{r} - \bm{F})^\tr
267: \right] \right> .
268: \label{errornet}
269: \e
270: Thus, the brackets in this case indicate an average over multiple
271: trials and multiple networks, i.e., multiple corruptions of the
272: optimal weights $\ol{\bm{W}}$.
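
The contribution of synaptic noise to $E_W$ can be isolated with a
short calculation. This is a sketch under two assumptions that go
slightly beyond what is stated above: the fluctuations
$\epsilon_{ki}$ are independent across synapses, and they are
independent of the response noise. Using the indices of Equation
(\ref{Rdriv}), we then have $\left< W_{ki} \right> = \ol{W}_{ki}$ and
$\left< W_{ki} W_{ki'} \right> = \ol{W}_{ki} \ol{W}_{ki'}
(1 + \sigma_W^2 \delta_{ii'})$, so that averaging Equation
(\ref{errornet}) over $\epsilon$ gives
\[
  E_W = E(\ol{\bm{W}}) + \frac{\sigma_W^2}{KM}
        \sum_{k=1}^K \sum_{i=1}^N \ol{W}_{ki}^2 \, C_{ii} ,
\]
where $E(\ol{\bm{W}})$ is Equation (\ref{errormatrix}) evaluated at
the uncorrupted weights and $C_{ii}$ is a diagonal element of the
correlation matrix. The extra term grows with the squared magnitude of
the mean weights, so anything that shrinks $\ol{\bm{W}}$ reduces the
cost of multiplicative synaptic noise.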
273:
274: The main result we report here is an interaction between the two types
275: of noise: in all the network architectures that we have explored, for
276: a fixed amount of synaptic noise $\sigma_W$, the best performance is
277: typically found when the response noise has a certain nonzero
278: variance. So, given that there is synaptic noise in the network, it is
279: better to have some response noise rather than to have none.
280:
281: Before addressing the first example, we should highlight some features
282: of the chosen noise models. Regarding response noise, Equations
283: (\ref{inputnoise1}, \ref{inputnoise1a}), other models were tested in
284: which the fluctuations were additive rather than multiplicative. Also,
285: Gaussian, uniform and exponential distributions were tested. The
286: results for all combinations were qualitatively the same, so the shape
287: of the response noise distribution does not seem to play an important
288: role; what counts is mainly the variance. On the other hand, the
289: benefit of response noise is observed only when the synaptic noise is
290: multiplicative; it disappears with additive synaptic noise. However,
291: we do test several variants of the multiplicative model, including one
292: in which the random variables $\epsilon_{ij}$ are drawn from a
Gaussian distribution and another in which they are binary, $0$ or $-1$.
294: The latter case represents a situation in which connections are
295: eliminated randomly with a fixed probability.
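
For the binary variant, the first two moments of the corrupted weights
follow directly. As an illustration, let $p$ denote the probability
that a connection is eliminated ($p$ is a symbol introduced here, not
one used elsewhere in the text), so that $\epsilon_{ij} = -1$ with
probability $p$ and $\epsilon_{ij} = 0$ otherwise. Then
\ba
  \left< W_{ij} \right> & = & (1 - p) \, \ol{W}_{ij} \nonumber \\
  \left< W_{ij}^2 \right> - \left< W_{ij} \right>^2
     & = & p \, (1 - p) \, \ol{W}_{ij}^2 . \nonumber
\ea
Unlike the Gaussian case, these fluctuations do not have zero mean,
but their standard deviation is still proportional to
$|\ol{W}_{ij}|$, which is the property that makes the noise
multiplicative.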
296:
297:
298: \section{Noise Interactions in a Classification Task}
299:
300: First we consider a task in which the two-layer, fully connected
301: network is used to approximate a binary function. The task is to
302: classify $M$ stimuli on the basis of the $N$ input firing rates evoked
303: by each stimulus. Only one output neuron is needed, so $K\eq1$. The
304: desired response of this output neuron is the classification function
305: \b
306: F_j = \left\{\begin{array}{l}
307: 1 \ \: \: \mbox{if } j \leq M/2 \\
308: 0 \ \: \: \mbox{else} ,
309: \end{array}\right.
310: \label{desiredoutput}
311: \e
312: where $j$ goes from 1 to $M$. Therefore, the job of the output unit is
313: to produce a 1 for the first $M/2$ input stimuli and a 0 for the rest.
314:
315:
316: \subsection{A Minimal Network}
317:
318: In order to obtain an analytical description of the noise
319: interactions, we first consider the simplest possible network that
320: exhibits the effect, which consists of two input neurons and two
321: stimuli. Thus, $N\eq M\eq 2$ and the desired output is
322: $\bm{F} = \left(1, 0\right)$. Note that, with a single output neuron,
323: the matrices $\bm{W}$ and $\bm{F}$ become row vectors. Now we proceed
324: according to the five steps outlined in the preceding section --- the
325: goal is to show analytically that, in the presence of synaptic noise,
326: performance is typically better for a nonzero amount of response
327: noise.
328:
329: The matrix of mean input firing rates is set to
330: \b
331: \ol{\bm{r}} = \left( \begin{array}{cc}
332: 1 & r_0 \\
333: r_0 & 1 \\
334: \end{array} \right) ,
335: \label{inputmeanmatrix}
336: \e
337: where $r_0$ is a parameter that controls the difficulty of the
338: classification. When it is close to 1, the pairs of responses evoked
339: by the two stimuli are very similar and large errors in the output are
340: expected; when it is close to 0, the input responses are most
341: different and the classification should be more accurate. After
342: combining the mean responses with multiplicative noise, as prescribed
343: by Equation (\ref{inputnoise1}), the input responses in a given trial
344: become
345: \b
346: \bm{r} = \left( \begin{array}{cc}
347: 1 + \eta_{11} & r_0 (1+\eta_{12}) \\
348: r_0 (1+\eta_{21}) & 1 + \eta_{22} \\
349: \end{array} \right) .
350: \label{multinputmeanmatrix}
351: \e
352: Assuming that the fluctuations are independent across neurons, the
353: correlation matrix is, therefore,
354: \b
355: \bm{C} = \left< \bm{r} \bm{r}^\tr \right>
356: = \left(\begin{array}{cc}
357: (1+r_0^2)(1+\sigma_r^2) & 2 r_0 \\
358: 2 r_0 & (1+r_0^2)(1+\sigma_r^2) \\
359: \end{array} \right) .
360: \label{multicorrelationmatrix}
361: \e
362: Next, after calculating the inverse of $\bm{C}$, Equation (\ref{wopt})
363: is used to find the optimal weights, which are
364: \ba
365: \ol{W}_1 & = & \frac{\sigma_r^2 (1+r_0^2) + (1-r_0^2)}
366: {(1+\sigma_r^2)^2 \, (1+r_0^2)^2 - 4 r_0^2} \nonumber \\
367: \ol{W}_2 & = & \frac{\sigma_r^2 (1+r_0^2) - (1-r_0^2)}
368: {(1+\sigma_r^2)^2 \, (1+r_0^2)^2 - 4 r_0^2} \: r_0 \, .
369: \label{multiwopt1}
370: \ea
371: Notice that these connections take into account the response
372: variability through their dependence on $\sigma_r$. The next step is
373: to corrupt these synaptic weights as prescribed by Equation
374: (\ref{weightnoisegeneral}), and substitute the resulting expressions
375: into Equation (\ref{errornet}). After making all the substitutions,
376: calculating the averages and simplifying, we obtain the average error,
377: %\b
378: % E_W = (1 + \sigma_W^2)
379: % (\ol{W}^2_1 + \ol{W}^2_2) (1 + \sigma_r^2) (1 + r_0^2)
380: % + 4 r_0 \ol{W}_1 \ol{W}_2
381: % - 2 r_0 \ol{W}_2 - 2 \ol{W}_1 + 1 .
382: %\e
383: \b
384: E_W = \frac{1}{2} \left(
385: \sigma_W^2 (\ol{W}^2_1 + \ol{W}^2_2)
386: (1 + \sigma_r^2) (1 + r_0^2)
387: - \ol{W}_1 - r_0 \ol{W}_2 + 1
388: \right) .
389: \label{multierror2}
390: \e
391: This is the average square difference between the desired and actual
392: responses of the output neuron given the two types of noise. It is a
function of only three parameters, $\sigma_r$, $\sigma_W$ and $r_0$,
394: because the optimal weights themselves depend on $\sigma_r$ and $r_0$.
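
As a quick consistency check (a sketch that uses only the equations
above), consider the limit without response noise. Setting
$\sigma_r\eq 0$ in Equations (\ref{multiwopt1}) and using
$(1+r_0^2)^2 - 4 r_0^2 = (1-r_0^2)^2$ gives
\[
  \ol{W}_1 = \frac{1}{1-r_0^2} , \qquad
  \ol{W}_2 = \frac{-r_0}{1-r_0^2} ,
\]
which is the first row of $\ol{\bm{r}}^{-1}$, as expected for a
noiseless, invertible input matrix. In this case
$\ol{W}_1 + r_0 \ol{W}_2 = 1$, and Equation (\ref{multierror2})
reduces to
\[
  E_W(\sigma_r\eq 0) = \frac{\sigma_W^2}{2} \,
        \frac{(1+r_0^2)^2}{(1-r_0^2)^2} .
\]
The error due to synaptic noise alone therefore grows without bound as
$r_0$ approaches 1, which is the regime in which the weights are
largest.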
395:
The interaction between noise terms for this simple $N\eq M\eq 2$ case
397: is illustrated in Fig.~1A, which plots the error as a function of
398: $\sigma_r$ with and without synaptic variability. Here, dashed and
399: solid lines represent the theoretical results given by Equations
400: (\ref{multiwopt1}, \ref{multierror2}) and symbols correspond to
401: simulation results averaged over $1000$ networks and $100$ trials per
402: network. Without synaptic noise (dashed line), the error increases
403: monotonically with $\sigma_r$, as one would normally expect when
404: adding response variability. In contrast, when $\sigma_W\eq 0.15$, 0.2
405: or 0.25 (solid lines), the error initially decreases and then starts
406: increasing again, slowly approaching the curve obtained with response
407: noise alone.
408:
409: \begin{figure*}[tb]
410: \centerline{\epsfig{figure=fig1.eps,width=5.0in}}
411: \caption{\label{WeightSpace}
412: Noise interaction for a simple network of two input neurons and one
413: output neuron ($K\eq 1$, $N\eq M\eq 2$). Both input responses and
synaptic weights were corrupted by multiplicative Gaussian noise. For
all curves, lines are theoretical results and symbols are
416: simulation results averaged over $1000$ networks and $100$ trials per
417: network. In all cases, $r_0\eq 0.8$.
418: (A) Average square difference between observed and desired output
419: responses, $E_W$, as a function of the standard deviation (SD) of the
420: response noise, $\sigma_r$. Squares and dashed line correspond to the
421: error without synaptic noise ($\sigma_{W}\eq 0$); circles and
422: continuous lines correspond to the error with synaptic noise
423: ($\sigma_{W}\eq 0.15, 0.20, 0.25$).
424: (B) Dependence of the (uncorrupted) optimal weights $\ol{\bm{W}}$ on
425: $\sigma_r$. }
426: \end{figure*}
427:
428: Figure 1B shows how the optimal weights depend on $\sigma_r$. The
429: solid lines were obtained from Equations (\ref{multiwopt1}) above.
430: The curves show that the effect of response noise is to decrease the
431: absolute values of the optimal synaptic weights. Intuitively, that is
432: why response variability is advantageous; smaller synaptic weights
433: also mean smaller synaptic fluctuations, because their standard
434: deviation (SD) is proportional to the mean values. So, there is a
435: tradeoff: the intrinsic effect of increasing $\sigma_r$ is to increase
436: the error, but with synaptic noise present, $\sigma_r$ also decreases
437: the magnitude of the weights, which lowers the impact of the synaptic
438: fluctuations. That the impact of synaptic noise grows directly with
439: the magnitude of the weights is also apparent from the first term in
440: Equation (\ref{multierror2}).
441:
442: The magnitude of the noise interaction can be quantified by the
443: ratio \Emin$/E_0$, where the numerator is the minimal value of the
444: error curve and the denominator is the error obtained when only
445: synaptic noise is present, that is, when $\sigma_r\eq 0$. The
446: minimum error \Emin\ occurs at the optimal value of $\sigma_r$,
447: denoted as \sigmin. The ratio \Emin$/E_0$ is equal to 1 if response
variability provides no advantage and approaches 0 as the optimal
response noise cancels more of the error due to synaptic noise. For the lowest
450: solid curve in Fig.~1A the ratio is approximately 0.8, so response
451: variability cancels about 20\% of the square error generated by
452: synaptic fluctuations. Note, however, that in these examples the
453: error is below $E_0$ for a large range of values of $\sigma_r$, not
454: only near \sigmin, so response noise may be beneficial even if it is
455: not precisely matched to the amount of synaptic noise.
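
To make the ratio concrete, here is a worked evaluation of Equations
(\ref{multiwopt1}) and (\ref{multierror2}) for $r_0\eq 0.8$ and
$\sigma_W\eq 0.15$. The value $\sigma_r^2\eq 0.02$ is chosen by hand
for illustration; it lies near the minimum but is not claimed to be
the exact optimum, and all numbers are rounded.
\ba
  \sigma_r^2 \eq 0: & & \ol{\bm{W}} \approx (2.78, \, -2.22), \quad
                       E_W \approx 0.233 \nonumber \\
  \sigma_r^2 \eq 0.02: & & \ol{\bm{W}} \approx (1.65, \, -1.10), \quad
                       E_W \approx 0.189 \nonumber
\ea
The ratio of the two errors is about 0.81, consistent with the value
of approximately 0.8 quoted above, and both weights shrink in absolute
value.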
456:
457: \begin{figure*}[tb!]
458: \centerline{\epsfig{figure=fig2.eps,width=5.0in}}
459: \caption{\label{MultiNoise1}
460: Optimal amount of response noise in the minimal classification
461: network. Same network with two sensory neurons and one output neuron
462: as in Fig.~1. Lines and symbols indicate theoretical and simulation
463: results, respectively, averaged over $1000$ networks and $100$ trials
464: per network.
465: (A) Strength of the noise interaction quantified by \Emin\ (dashed
466: line) and \Emin$/E_0$ (solid line), as a function of $\sigma_{W}$, which
467: determines the synaptic variability. Here and in B, $r_0\eq 0.8$.
468: (B) Optimal amount of response variability, \sigmin, as a function of
469: $\sigma_{W}$, for the same data in A\@.
470: (C) Strength of the noise interaction as a function of $r_0$, which
471: parameterizes the discriminability of the mean input responses evoked
472: by the two stimuli. Here and in D, $\sigma_W\eq 1$.
473: (D) \sigmin, as a function of $r_0$ for the same data in C\@. }
474: \end{figure*}
475:
476: Figure 2 further characterizes the strength of the interaction between
477: the two types of noise. Figures 2A, B show how the error and the
478: optimal amount of response variability vary as functions of
479: $\sigma_W$. These graphs indicate that the fraction of the error that
480: $\sigma_r$ is able to compensate for, as well as the optimal amount of
481: response noise, increases with the SD of the synaptic noise. The
482: minimum error, \Emin, grows steadily with $\sigma_W$ --- clearly,
483: $\sigma_r$ cannot completely compensate for synaptic corruption.
Also, $\sigma_W$ has to be larger than a critical value for the noise
485: interaction to be observed ($\sigma_W\!>\!0.1$, approximately).
486: However, except when synaptic noise is very small, the optimal
487: strategy is to add some response noise to the network.
488:
As in the previous figure, symbols and lines in Fig.~2 correspond to
490: simulation and theoretical results, respectively. To obtain the
491: latter, the key is to calculate \sigmin. This is done by, first,
492: substituting the optimal synaptic weights of Equation
493: (\ref{multiwopt1}) into the expression for the average error, Equation
494: (\ref{multierror2}), and second, calculating the derivative of the
495: error with respect to $\sigma_r^2$ and equating it to zero. The
496: resulting expression gives $\sigma^2_{\mathrm{min}}$ as a function of
497: the only two remaining parameters, $\sigma_W$ and $r_0$. The
498: dependence, however, is highly nonlinear, so in general the solution
499: is implicit:
500: \ba
501: \lefteqn{\sigma_r^8 \, (1 - \sigma_W^2) +
502: 2 \sigma_r^6 \, (1 + a^2(1 - 2 \sigma_W^2)) +
503: 6 \sigma_r^4 a^2 \, (1 - \sigma_W^2) + \mbox{} }
504: \hspace*{1.2cm} \nonumber \\
505: & & 2 \sigma_r^2 a^2 \, (1 + a^2 + 2 a^2 \sigma_W^2 - 4 \sigma_W^2) +
506: a^4 (1 + 3 \sigma_W^2) -
507: 4 a^2 \sigma_W^2 \:\: = \:\: 0 \, ,
508: \label{implicit}
509: \ea
510: where
511: \b
512: a \equiv \frac{1 - r_0^2}{1 + r_0^2} \, .
513: \e
514: The value of $\sigma_r$ that makes Equation (\ref{implicit}) true is
515: \sigmin. For Figs.~2A, B, the zero of the polynomial was found
516: numerically for each combination of $r_0$ and $\sigma_W$.
517:
Figures 2C, D show how \Emin, \Emin$/E_0$ and \sigmin\ depend on the
519: separation between evoked input responses, as parameterized by $r_0$.
520: For these two plots, we chose a special case in which \sigmin\ can be
521: obtained analytically from Equation (\ref{implicit}): $\sigma_W\eq 1$.
522: In this particular case the dependence of \sigmin\ on $r_0$ has a
523: closed form,
524: \b
525: \sigma_{\mathrm{min}}^2 = \frac{(1-r_0^2)^{2/3}}{1+r_0^2}
526: \left( (1+r_0)^{2/3} + (1-r_0)^{2/3} \right) .
527: \label{sigmamin}
528: \e
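To see where this closed form comes from, substitute $\sigma_W\eq 1$
into Equation (\ref{implicit}); the following is a sketch of the
algebra, with $x \equiv \sigma_r^2$ introduced here as shorthand. The
terms in $\sigma_r^8$ and $\sigma_r^4$ vanish, and for $0 < a < 1$ the
remainder can be divided by $2 (1-a^2)$, leaving the cubic
\[
  x^3 - 3 a^2 x - 2 a^2 = 0 .
\]
The quantity $a^4 (1 - a^2)$ that appears under the square root in
Cardano's formula is positive, so the cubic has a single real root,
\[
  x = \left[ a^2 \left( 1 + \sqrt{1-a^2} \, \right) \right]^{1/3} +
      \left[ a^2 \left( 1 - \sqrt{1-a^2} \, \right) \right]^{1/3} .
\]
Using $\sqrt{1-a^2} = 2 r_0/(1+r_0^2)$, so that
$1 \pm \sqrt{1-a^2} = (1 \pm r_0)^2/(1+r_0^2)$, recovers the
expression in Equation (\ref{sigmamin}).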
529: This function is shown in Fig.~2D. In general, the numerical
530: simulations are in good agreement with the theory, except that the
531: scatter in Fig.~2D tends to increase as $r_0$ approaches 0. This is
532: due to a key feature of the noise interaction, which is that it
533: depends on the overlap between input responses across stimuli. This
534: can be seen as follows.
535:
536: First, notice that in Fig.~2C the relative error approaches 1 as $r_0$
537: gets closer to 0. Thus, the noise interaction becomes weaker when
538: there is less overlap between input responses, which is precisely what
539: $r_0$ represents in Equation (\ref{inputmeanmatrix}). If there is no
540: overlap at all, the benefit of response noise vanishes. This fact
541: explains why more than one neuron is needed to observe the noise
542: interaction in the first place. This observation can be demonstrated
543: analytically by setting $r_0\eq 0$ in Equations (\ref{multiwopt1}) and
544: (\ref{multierror2}), in which case the average square error becomes
545: \b
546: E_W(r_0\eq 0) = \frac{1}{2} \left(
547: \frac{\sigma_W^2 - 1}{1 + \sigma_r^2} + 1
548: \right) .
549: \label{r0error}
550: \e
551: This result has interesting implications. If $\sigma_W^2\eq 1$,
552: response noise makes no difference, so there is no optimal value. If
553: $\sigma_W^2\!<\!1$, the error increases monotonically with response
554: noise, so the optimal value is 0. And if $\sigma_W^2\!>\!1$, the
555: optimal strategy is to add as much noise as possible! In this case,
556: the variance of the output neuron is so high that there is no hope of
557: finding a reasonable solution; the best thing to do is set the mean
558: weights to zero, disconnecting the output unit. Thus, without overlap,
559: either the synaptic noise is so high that the network is effectively
560: useless, or, if $\sigma_W$ is tolerable, response noise does not
561: improve performance. At $r_0\eq 0$, the numerical solutions oscillate
562: between these two extremes, producing an average error of 0.5
563: (leftmost point in Fig.~2C). In general, however, with non-zero
564: overlap there is a true optimal amount of response noise, and the more
565: overlap there is, the larger its benefit, as shown in Fig.~2C\@.
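
The three regimes at $r_0\eq 0$ can be read off from the derivative of
Equation (\ref{r0error}) with respect to the response variance; this
short calculation is added here as a sketch. With $r_0\eq 0$,
Equations (\ref{multiwopt1}) give $\ol{W}_1 = 1/(1+\sigma_r^2)$ and
$\ol{W}_2 = 0$, and
\[
  \frac{\partial E_W}{\partial \sigma_r^2} =
     \frac{1 - \sigma_W^2}{2 \, (1 + \sigma_r^2)^2} ,
\]
which is positive for $\sigma_W^2 < 1$, zero for $\sigma_W^2 = 1$ and
negative for $\sigma_W^2 > 1$, for every value of $\sigma_r$. Without
overlap the error is therefore monotonic in $\sigma_r$ and has no
interior minimum.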
566:
567: The simulation data points in Fig.~2 were obtained using fluctuations
568: $\epsilon$ and $\eta$ in Equations (\ref{weightnoisegeneral}) and
569: (\ref{multinputmeanmatrix}), respectively, sampled from Gaussian
570: distributions. The results, however, were virtually identical when the
571: distribution functions were either uniform or exponential. Thus, as
572: noted earlier, the exact shapes of the noise distributions do not
573: restrict the observed effect.
574:
575:
576: \subsection{Regularization by Noise}
577: \label{RegSect}
578:
579: Above, we mentioned that response noise tends to decrease the absolute
580: value of the optimal synaptic weights. Why is this? The reason is that
581: minimization of the mean square error in the presence of response
582: noise is mathematically equivalent to minimization of the same error
583: without response noise but with an imposed constraint forcing the
optimal weights to be small. This can be seen as follows.
585:
586: Consider Equation (\ref{wopt}), which specifies the optimal weights in
587: the two-layer network. Response noise enters into the expression
588: through the correlation matrix. By separating the input responses into
589: mean plus noise, we have
590: \ba
591: \bm{C} & = & \left< (\ol{\bm{r}} + \bm{\eta})
592: (\ol{\bm{r}} + \bm{\eta})^{\tr} \right>
593: \nonumber \\
594: & = & \ol{\bm{r}} \, \ol{\bm{r}}^{\tr} +
595: \left< \bm{\eta} \bm{\eta}^{\tr} \right>
596: \nonumber \\
597: & = & \ol{\bm{r}} \, \ol{\bm{r}}^{\tr} +
598: \bm{D}_{\!\sigma} \, ,
599: \label{newcorr}
600: \ea
601: where we have assumed that the noise is additive and uncorrelated
602: across neurons (additivity is considered for simplicity but is not
603: necessary). This results in the diagonal matrix $\bm{D}_{\!\sigma}$
604: containing the variances of individual units, such that element $j$
605: along the diagonal is the total variance, summed over all stimuli, of
606: input neuron $j$. Thus, uncorrelated response noise adds a diagonal
607: matrix to the correlation between average responses. In that case,
608: Equation (\ref{wopt}) can be rewritten as
609: \b
610: \ol{\bm{W}} = \bm{F} \, \ol{\bm{r}}^\tr
611: \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}
612: + \bm{D}_{\!\sigma}
613: \right)^{-1} .
614: \label{wopt1}
615: \e
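
As an illustration of the remark that additivity is not required, the
multiplicative model of Equation (\ref{inputnoise1}) can be treated in
the same way; the sketch below assumes, as before, that the
$\eta_{ij}$ are independent with variance $\sigma_r^2$. The
fluctuation of $r_{ij}$ is then $\ol{r}_{ij} \eta_{ij}$, and the
diagonal matrix has elements
\[
  \left( \bm{D}_{\!\sigma} \right)_{ii} =
      \sigma_r^2 \sum_{j=1}^M \ol{r}_{ij}^{\,2} .
\]
For the minimal network of Equation (\ref{inputmeanmatrix}), both
diagonal elements equal $\sigma_r^2 (1+r_0^2)$, and adding them to
\[
  \ol{\bm{r}} \, \ol{\bm{r}}^{\tr} =
     \left( \begin{array}{cc}
        1+r_0^2 & 2 r_0 \\
        2 r_0   & 1+r_0^2 \\
     \end{array} \right)
\]
reproduces Equation (\ref{multicorrelationmatrix}).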
616:
617: Now consider the mean square error without any noise but with an
618: additional term that penalizes large weights. To restrict, for
619: instance, the total synaptic weight provided by each input neuron, add
620: the penalty term
621: \b
   \frac{1}{KM} \sum_{i, j} \lambda_i \, w_{ji}^2
\label{wcost}
\e
to the original error expression, Equation (\ref{errormatrix}). Here,
$\lambda_i$ determines how much input neuron $i$ is taxed for its
total synaptic weight, $\sum_j w_{ji}^2$. Rewriting this as a trace,
the total error to be minimized in this case becomes
\b
  E = \frac{1}{KM} \left(
       \mbox{Tr}
           \left[(\bm{w} \ol{\bm{r}} - \bm{F})
                 (\bm{w} \ol{\bm{r}} - \bm{F})^\tr
           \right]
      +
      \mbox{Tr}\left(\bm{w} \bm{D}_{\!\lambda} \bm{w}^{\tr} \right)
      \right) ,
\e
639: where $\bm{D}_{\!\lambda}$ is a diagonal matrix that contains the
640: penalty coefficients $\lambda_i$ along the diagonal. The synaptic
641: weights that minimize this error function are given by
642: \b
\bm{w} = \bm{F} \, \ol{\bm{r}}^\tr
644: \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}
645: + \bm{D}_{\!\lambda}
646: \right)^{-1} \! .
647: \label{wopt2}
648: \e
649: But this solution has exactly the same form as Equation (\ref{wopt1}),
650: which minimizes the error in the presence of response noise alone,
651: without any other constraints. Therefore, adding response noise is
652: equivalent to imposing a constraint on the magnitude of the synaptic
653: weights, with more noise corresponding to smaller weights. The penalty
654: term in Equation (\ref{wcost}) can also be interpreted as a
655: regularization term, which refers to a common type of constraint used
656: to force the solution of an optimization problem to vary
657: smoothly~\citep{Hint89,Hayk99}. Therefore, as has been pointed out
658: previously~\citep{Bish95}, the effect of response fluctuations can be
659: described as regularization by noise.
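
As a sketch of why the two solutions coincide, the minimization behind
Equation (\ref{wopt2}) can be written out explicitly. We assume here
that the penalty is expressed in matrix form as
$\mbox{Tr}(\bm{w} \bm{D}_{\!\lambda} \bm{w}^{\tr})$, with
$\bm{D}_{\!\lambda}$ indexed by input neuron. Setting the derivative
of the total error with respect to $\bm{w}$ to zero gives
\[
\frac{2}{KM} \left[
(\bm{w} \ol{\bm{r}} - \bm{F}) \, \ol{\bm{r}}^{\tr}
+ \bm{w} \bm{D}_{\!\lambda} \right] = 0
\quad \Longrightarrow \quad
\bm{w} \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}
+ \bm{D}_{\!\lambda} \right) = \bm{F} \, \ol{\bm{r}}^{\tr} ,
\]
which is Equation (\ref{wopt2}) once the matrix in parentheses is
inverted. As a purely illustrative case (not one of the simulated
networks), take a single input neuron and a single stimulus with mean
rate $\ol{r}$ and desired output $F$. The optimal weight is then
$w \eq F \, \ol{r} / (\ol{r}^{\,2} + \lambda)$, which equals
$F/\ol{r}$ for $\lambda \eq 0$ and shrinks monotonically toward zero
as $\lambda$ grows; replacing $\lambda$ by the response variance gives
the same shrinkage for Equation (\ref{wopt1}).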
660:
661: In our model, we assumed that the fluctuations in synaptic connections
662: are proportional to their size. What happens, then, is that response
663: noise forces the optimal weights to be small, and this significantly
664: decreases the part of the error that depends on $\sigma_W$. In this
665: way, smaller synaptic weights --- and therefore a nonzero $\sigma_r$
666: --- typically lead to smaller output errors.
667:
668: Another way to look at the relationship between the two types of noise is
669: to calculate the optimal mean synaptic weights taking the synaptic
670: variability directly into account. For simplicity, suppose that there
671: is no response noise. Substitute Equation (\ref{weightnoisegeneral})
672: directly into Equation (\ref{errormatrix}) and minimize with respect
673: to $\ol{\bm{W}}$, now averaging over the synaptic fluctuations. With
674: multiplicative noise the result is again an expression similar to
675: Equations (\ref{wopt1}) and (\ref{wopt2}), where a correction
676: proportional to the synaptic variance is added to the diagonal of the
677: correlation matrix. In contrast, with additive synaptic noise the
678: resulting optimal weights are exactly the same as without any
679: variability, because this type of noise cannot be compensated for.
680: Therefore, the recipe for counteracting response noise is equivalent
681: to the recipe for counteracting multiplicative synaptic noise. An
682: argument outlining why this is generally true is presented in the
683: Discussion, Section~\ref{disc1}.
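
The calculation just described can be sketched for one output neuron
$k$ and one stimulus $j$. We assume, for illustration only, that
multiplicative synaptic noise has the form
$W_{ki} \eq \ol{W}_{ki} (1 + \epsilon_{ki})$ and additive noise the
form $W_{ki} \eq \ol{W}_{ki} + \epsilon_{ki}$, where the
$\epsilon_{ki}$ are independent with zero mean and variance
$\sigma_W^2$; the exact form of Equation (\ref{weightnoisegeneral})
may differ in detail. With $F_{kj}$ denoting the desired output,
averaging the squared output error over $\epsilon$ gives
\begin{eqnarray*}
\mbox{multiplicative:} \quad
\left\langle \left( \ssum_i W_{ki} \ol{r}_{ij} - F_{kj} \right)^2
\right\rangle
& = & \left( \ssum_i \ol{W}_{ki} \ol{r}_{ij} - F_{kj} \right)^2
+ \sigma_W^2 \, \ssum_i \ol{W}_{ki}^2 \, \ol{r}_{ij}^{\,2} \, , \\
\mbox{additive:} \quad
\left\langle \left( \ssum_i W_{ki} \ol{r}_{ij} - F_{kj} \right)^2
\right\rangle
& = & \left( \ssum_i \ol{W}_{ki} \ol{r}_{ij} - F_{kj} \right)^2
+ \sigma_W^2 \, \ssum_i \ol{r}_{ij}^{\,2} \, .
\end{eqnarray*}
In the multiplicative case the extra term is quadratic in the mean
weights, so after summing over stimuli it adds a diagonal matrix with
elements $\sigma_W^2 \sum_j \ol{r}_{ij}^{\,2}$ to
$\ol{\bm{r}} \, \ol{\bm{r}}^{\tr}$, in the same way as
$\bm{D}_{\!\sigma}$ and $\bm{D}_{\!\lambda}$ above. In the additive
case the extra term does not depend on $\ol{\bm{W}}$ and therefore
drops out of the minimization.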
684:
685:
686: \subsection{Classification in Larger Networks}
687:
688: When the simple classification task is extended to larger numbers of
689: first-layer neurons ($N\!>2$) and more input stimuli to classify
690: ($M\!>2$), an important question can be studied: how does the
691: interaction between synaptic and response noise depend on the
692: dimensionality of the problem, that is, on $N$ and $M$? To address
693: this issue we did the following. Each entry in the $N\times M$ matrix
694: $\ol{\bm{r}}$ of mean responses was taken from a uniform distribution
695: between 0 and 1. The desired output still consisted of a single
696: neuron's response given by Equation (\ref{desiredoutput}), as before.
697: So, each one of the $M$ input stimuli evoked a set of $N$ neuronal
698: responses, each set drawn from the same distribution, and the output
699: neuron had to divide the $M$ evoked firing rate patterns into two
700: categories. The optimal amount of response noise was found, and the
701: process was repeated for different combinations of $N$ and $M$\@.
702:
703: \begin{figure*}[tb!]
704: \centerline{\epsfig{figure=fig3.eps,width=5.0in}}
705: \caption{\label{LargeNets}
706: Interaction between synaptic noise and response noise during the
707: classification of $M$ input stimuli. For each stimulus, the mean
708: responses of $N$ input neurons were randomly selected from a uniform
709: distribution between 0 and 1. The output unit of the network had to
710: classify the $M$ response patterns by producing either a 1 or a 0. The
711: synaptic noise SD was $\sigma_{W}=0.5$. Results (circles) are averages
712: over $1000$ networks and $100$ trials per network. All data are from
713: computer simulations.
714: (A) Relative error, \Emin$/E_{0}$, as a function of the number of
715: input neurons, $N$\@. The number of stimuli was kept constant at
716: $M\eq 10$.
717: (B) Optimal value of the response noise SD, \sigmin, as a function of
718: the number of input neurons, $N$\@. Same simulations as in A\@.
719: (C) Relative error as a function of the number of input stimuli,
720: $M$\@. The number of input neurons was kept constant at $N\eq 10$.
721: (D) Optimal value of the response noise SD as a function of $M$ for
722: the same simulations as in C\@. }
723: \end{figure*}
724:
725: The results from these simulations are shown in Fig.~3. All data
726: points were obtained with the same amount of synaptic variability,
727: $\sigma_W\eq 0.5$. Each point represents an average over 1000
728: networks for which the optimal connections were corrupted. The amount
729: of response noise that minimized the error, averaged over those 1000
730: corruption patterns, was found numerically by calculating the average
731: error with the same mean responses and corruption patterns but
732: different $\sigma_r$. For each combination of $N$ and $M$, this
733: resulted in \sigmin, which is shown in panel B\@. The actual average
734: error obtained with $\sigma_r\eq$ \sigmin\ divided by the error for
735: $\sigma_r\eq 0$ is shown in panel A, as in the previous figure.
736: Interestingly, the benefit conferred by response noise depends
737: strongly on the difference between $N$ and $M$\@. With $M\eq 10$ input
738: stimuli, the effect of response noise is maximized when $N\eq 10$
neurons are used to encode them (Fig.~3A); and vice versa, when there
740: are $N\eq 10$ neurons in the network, the maximum effect is seen when
741: they encode $M\eq 10$ stimuli (Fig.~3C). Results with other numbers
742: (5, 20 and 40 stimuli or neurons) were the same: response noise always
743: had a maximum impact when $N\eq M$\@.
744:
745: This is not unreasonable. When there are many more neurons than
746: stimuli, a moderate amount of synaptic corruption causes only a small
747: error, because there is redundancy in the connectivity matrix. On the
748: other hand, when there are many more input stimuli than neurons, the
749: error is large anyway, because the $N$ neurons cannot possibly span
750: all the required dimensions, $M$\@. Thus, at both extremes, the impact
751: of synaptic noise is limited. In contrast, when $N\eq M$ there is no
752: redundancy but the output error can potentially be very small, so the
753: network is most sensitive to alterations in synaptic connectivity.
754: Thus, response noise makes a big difference when the number of
755: responses and the number of independent stimuli encoded are equal or
756: nearly so. In Figs.~3A, C, the relative error is not zero for
757: $N\eq M$, but it is quite small
758: (\Emin\ $\eq 0.23$, \Emin$/E_0 \eq 0.004$). This is primarily because
759: the error without any response noise, $E_0$, can be very large.
760: Interestingly, the optimal amount of response noise also seems to be
761: largest when $N\eq M$, as suggested by Figs.~3B, D\@.
762:
763: In contrast to previous examples, for all data points in Fig.~3 the
764: fluctuations in the synapses and in the firing rates, $\epsilon$ and
765: $\eta$, were drawn from uniform rather than Gaussian distributions.
766: As mentioned before, the variances of the underlying distributions
767: should matter but their shapes should not. Indeed, with the same
768: variances, results for Fig.~3 were virtually identical with Gaussian
769: or exponential distributions.
770:
771: A potential concern in this network is that, although the variability
772: of the output neuron depends on the interaction between the two types
773: of noise, perhaps the interaction is of little consequence with
774: respect to actual classification performance. The relevant measure for
775: this is the probability of correct classification, $p_c$. This
776: probability is obtained by comparing the distributions of output
777: responses to stimuli in one category versus the other, which is
778: typically done using standard methods from signal detection
779: theory~\citep{dayan-2001}. The algorithm underlying the calculation is
780: quite simple: in each trial, the stimulus is assumed to belong to
781: class 1 if the output firing rate is below a threshold, otherwise the
782: stimulus belongs to class 2. To obtain $p_c$, the results should be
783: averaged over trials and stimuli. Finally, note that an optimal
784: threshold should be used to obtain the highest possible $p_c$. We
785: performed this analysis on the data in Fig.~3. Indeed, $p_c$ also
786: depended non-monotonically on response variability. For instance, for
787: $N\eq M\eq 10$ the values with and without response noise were
788: $p_c(\sigma_r\!= $\sigmin$)\eq 0.83$ and
789: $p_c(\sigma_r\eq 0)\eq 0.75$,
790: where chance performance corresponds to 0.5. Also, the maximum benefit
791: of response noise occurred for $N\eq M$ and decreased quickly as the
792: difference between $N$ and $M$ grew, as in Figs.~3A, C. However, the
793: amount of response noise that maximized $p_c$ was typically about one
794: third of the amount that minimized the mean square error. Thus, the
795: best classification probability for $N\eq M\eq 10$ was
796: $p_c(\sigma_r\eq 0.13)\eq 0.91$.
797: Maximizing $p_c$ is not equivalent to minimizing the mean square
798: error; the two quantities weight differently the bias and variance of
the output response \citep[see][]{Hayk99}. Nevertheless, response noise
800: can also counteract part of the decrease in $p_c$ due to synaptic
801: noise, so its beneficial impact on classification performance is real.
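
The threshold procedure described above can be summarized in a
formula. As a sketch, assume that the two classes are equally likely
and denote the threshold by $\theta$ (a symbol introduced here for
illustration). Then
\[
p_c(\theta) = \frac{1}{2} \left[
P\left( R < \theta \mid \mbox{class 1} \right) +
P\left( R \geq \theta \mid \mbox{class 2} \right) \right] ,
\]
and the quantity of interest is the maximum of $p_c(\theta)$ over
$\theta$. If, in addition, the output responses to the two classes
were Gaussian with means $\mu_1 < \mu_2$ and a common SD $s$ (an
idealization that need not hold in the simulations), the optimal
threshold would lie midway between the means and
\[
p_c = \Phi\!\left( \frac{\mu_2 - \mu_1}{2 s} \right) ,
\]
where $\Phi$ is the cumulative standard normal distribution. This
makes explicit that $p_c$ depends on the separation between the two
output distributions relative to their spread, rather than on the
squared deviation from the desired values, which is one way to see why
the two criteria can favor different amounts of response noise.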
802:
803:
804: \section{Noise Interactions in a Sensory-Motor Network}
805:
806: To illustrate the interactions between synaptic and response noise in
807: a more biologically realistic situation, we apply the general approach
808: outlined in Section~\ref{general} to a well-known model of
809: sensory-motor integration in the brain. We consider the classic
810: coordinate transformation problem in which the location of an object,
811: originally specified in retinal coordinates, becomes independent of
812: gaze angle. This type of computation has been thoroughly studied both
813: experimentally~\citep{AES85,BASG95} and
814: theoretically~\citep{Zipser88,Salinas+Abbott:1995,PS97}, and is
815: thought to be the basis for generating representations of object
816: location relative to the body or the world. Also, the way in which
817: visual and eye-position signals are integrated here is an example of
818: what seems to be a general principle for combining different
819: information streams in the brain~\citep{ST00,Salinas+Sejnowski:2001}.
Such integration by `gain modulation' may have wide applicability in
821: diverse neural circuits~\citep{Salinas-2004-2}, so it represents a
822: plausible and general situation in which computational accuracy is
823: important.
824:
825: From the point of view of the phenomenon at hand, the constructive
826: effect of response noise, this example addresses an important issue:
827: whether the noise interaction is still observed when network
828: performance depends on a population of output neurons. In the
829: classification task, performance was quantified through a single
830: neuron's response, but in this case it depends on a nonlinear
831: combination of multiple firing rates, so maybe the impact of response
832: noise washes out in the population average. As shown below, this is
833: not the case.
834:
835: \begin{figure*}[tb!]
836: \centerline{\epsfig{figure=fig4.eps,height=6.0in}}
837: \caption{\label{inoutSMMaps}
838: Network model of a sensory-motor transformation. In this network,
839: $N\eq 400$, $K\eq 25$, $M\eq 400$. Target and movement directions, $x$
840: and $z$, respectively, vary between $-25$ and $25$, whereas gaze angle
841: $y$ varies between $-15$ and $15$. The graphs correspond to a single
842: trial in which $x\eq -10$, $y\eq 10$ and $z\eq x \! - \! y\eq -20$.
Neither response noise nor synaptic corruption was included in this
844: example.
845: (A) Firing rates of the 400 gain-modulated input neurons arranged
846: according to preferred stimulus location.
847: (B) Network architecture.
848: (C) Firing rates of the 25 output motor neurons arranged according to
849: preferred target location. }
850: \end{figure*}
851:
852: The sensory-motor network has, as before, a feedforward architecture
853: with two layers. The first layer contains $N$ gain-modulated sensory
854: units and the second or output layer contains $K$ motor units. Each
855: sensory neuron is connected to all output neurons through a set of
856: feedforward connections, as illustrated in Fig.~4B\@. The sensory
857: neurons are sensitive to two quantities, the location (or direction)
858: of a target stimulus $x$, which is in retinal coordinates, and the
859: gaze (or eye-position) angle $y$. The network is designed so that
860: the motor layer generates or encodes a movement in a direction $z$,
861: which represents the direction of the target relative to the head.
862: The idea is that the profile of activity of the output neurons should
863: have a single peak centered at direction $z$. The correct (i.e.,
864: desired) relationship between inputs and outputs is $z\eq x\!-\!y$,
865: which is approximately how the angles $x$ and $y$ should be combined
866: in order to generate a head-centered representation of target
867: direction~\citep{Zipser88,Salinas+Abbott:1995,PS97}. In other words,
868: $z$ is the quantity encoded by the output neurons and it should
869: relate to the quantities encoded by the sensory neurons through the
870: function $z(x, y)\eq x\!-\!y$. Many other functions are possible, but
871: as far as we can tell, the choice has little impact on the qualitative
872: effect of response noise.
873:
874: In this model, the mean firing rate of sensory neuron $i$ is
875: characterized by a product of two tuning functions, $f_i(x)$ and
876: $g_i(y)$, such that
877: \b
878: \ol{r}_i(x, y) = r_{\mathrm{max}} \,
879: f_i(x)\left(1 - D + D\, g_i(y)\right) + r_{B} ,
880: \label{rateGM}
881: \e
882: where $r_{B}\eq 4$ spikes/s is a baseline firing rate,
883: $r_{\mathrm{max}}\eq 35$ spikes/s and $D$ is the modulation depth,
884: which is set to 0.9 throughout. The sensory neurons are gain modulated
885: because they combine the information from their two inputs
886: nonlinearly. The amplitude --- but not the selectivity --- of a
887: visually-triggered response, represented by $f_i(x)$, depends on the
888: direction of gaze~\citep{AES85,BASG95,ST00}. Note that, in the
889: expression above, the second index of the mean rate $\ol{r}_{ij}$ has
890: been replaced by parentheses indicating a dependence on $x$ and $y$.
891: This is to simplify the notation; the responses can still be arranged
892: in a matrix $\ol{\bm{r}}$ if each value of the second index is
893: understood to indicate a particular combination of values of $x$ and
894: $y$. For example, if the rates were evaluated in a grid with 10 $x$
895: points and 10 $y$ points, the second index would run from 1 to 100,
896: covering all combinations. Indeed, this is how it is done in the
897: computer.
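
Two worked details may help fix ideas. First, because the sigmoidal
factor of Equation (\ref{ytun}) satisfies $0 < g_i(y) < 1$, Equation
(\ref{rateGM}) with the parameter values above bounds the mean rate at
the preferred location (where $f_i \eq 1$) by
\[
r_{\mathrm{max}} (1 - D) + r_{B} = 35 \times 0.1 + 4 = 7.5
\; < \; \ol{r}_i \; < \;
r_{\mathrm{max}} + r_{B} = 39 \;\; \mbox{spikes/s} ,
\]
so the gaze signal can change the response amplitude by up to a factor
of about five without shifting the peak. Second, one possible way to
build the second index (the ordering shown is only an illustrative
choice; any fixed ordering would do) is
\[
j = (p - 1) \, n_y + q ,
\qquad p = 1, \ldots, n_x , \quad q = 1, \ldots, n_y ,
\]
for the grid point $(x_p, y_q)$. For a grid of $10 \times 10$ points,
the pair $(p, q) \eq (3, 7)$ maps to $j \eq 27$, and $j$ runs from 1
to 100.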
898:
899: For simplicity, the tuning curves for different neurons in a given
900: layer are assumed to have the same shape but different preferred
901: locations or center points, which are always between $-25$ and $25$.
902: Visual responses are modeled as Gaussian tuning functions of stimulus
903: location $x$,
904: \b
905: f_i(x) = \exp\left(-\frac{\left(x - a_i\right)^2}{2\sigma_f^2}\right) ,
906: \label{xtun}
907: \e
908: where $a_i$ is the preferred location and $\sigma_f\eq 4$ is the
909: tuning curve width. The dependence on eye position is modeled using
910: sigmoidal functions of the gaze angle $y$,
911: \b
912: g_i(y) = \frac{1}{1 + \exp(-(b_i-y)/d_i)} \, ,
913: \label{ytun}
914: \e
915: where $b_i$ is the center point of the sigmoid and $d_i$ is chosen
916: randomly between $-7$ and $+7$ to make sure that the curves $g_i(y)$
917: have different slopes for different neurons in the array. In each
918: trial of the task, response variability is included by applying a
919: variant of Equation (\ref{inputnoise1}),
920: \b
921: r_{ij} = \ol{r}_{ij} + \sqrt{\ol{r}_{ij}} \, \eta_{ij} .
922: \label{inputnoise2}
923: \e
924: This makes the variance of the rates proportional to their means,
925: which in general is in good agreement with experimental data
926: \citep{Dean81,nc:Softky+Koch:1992,SK93,HSKD96}. This choice, however,
927: is not critical (see below). The desired response for each output
928: neuron is also described by a Gaussian,
929: \b
930: F_k(z) = r_{\mathrm{max}} \, \exp\!\left(
931: -\frac{\left(z - c_k\right)^2}{2\sigma_F^2}
932: \right) + r_{B} ,
933: \label{Fout}
934: \e
935: where $\sigma_F\eq 4$ and $c_k$ is the preferred target direction of
936: motor neuron $k$. This expression gives the intended response of
937: output unit $k$ in terms of the encoded quantity $z$. Keep in mind,
938: however, that the desired dependence on the sensory inputs is obtained
939: by setting $z\eq x\!-\!y$. When driven by the first-layer neurons,
940: the output rates are still calculated through a weighted sum,
941: \b
942: R_{k}(z) = R_{k}(x, y) = \sum_{i=1}^N W_{ki} \, r_{i}(x, y) .
943: \label{Rdriv1}
944: \e
945: This is equivalent to Equation (\ref{Rdriv}) but with the second index
946: defined implicitly through $x$ and $y$, as mentioned above. The
947: optimal synaptic connections $\ol{W}_{ki}$ are determined exactly as
948: before, using Equation~(\ref{wopt}).
949:
950: Typical profiles of activity for input and output neurons are shown in
951: Figs.~4A, C for a trial with $x\eq -10$ and $y\eq 10$. The sensory
952: neurons are arranged according to their preferred stimulus location
953: $a_i$, whereas the motor neurons are arranged according to their
954: preferred movement direction $c_k$. For this sample trial no
955: variability was included; the firing rate values in Fig.~4A are
956: scattered under a Gaussian envelope (given by Equation (\ref{xtun}))
957: because the gaze-dependent gain factors vary across cells. Also, the
958: output profile of activity is Gaussian and has a peak at the point
959: $z\eq -20$, which is exactly where it should be given that the correct
960: input-output transformation is $z\eq x\!-\!y$. With noise, the output
961: responses would be scattered around the Gaussian profile and the peak
962: would be displaced.
963:
964: The error used to measure network performance is, in this case,
965: \b
966: E_{\mathrm{pop}} = \left< \, \left| z - Z \right| \, \right> .
967: \label{SMMerror}
968: \e
969: This is the absolute difference, averaged over trials and networks,
970: between the desired movement direction $z$ --- the actual
971: head-centered target direction --- and the direction $Z$ that is
972: encoded by the center of mass of the output activity,
973: \b
974: Z = \frac{\sum_i \, (R_i - r_{\!B})^2 \, c_i}
975: {\sum_k \, (R_k - r_{\!B})^2} \, .
976: \label{centermass}
977: \e
978: Therefore, Equation (\ref{SMMerror}) gives the accuracy with which the
979: whole motor population represents the head-centered direction of the
980: target, whereas Equation (\ref{centermass}) provides the recipe to
981: read out such output activity. Now the idea is to corrupt the optimal
982: connections and evaluate \Epop\ using various amounts of response
983: noise to determine whether there is an optimum. Relative to the
984: previous examples, the key differences are, first, that the error in
985: (\ref{SMMerror}) represents a population average, and second, that
986: although the connections are set to minimize the average difference
987: between desired and driven firing rates, the performance criterion is
988: not based directly on it.
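
As a small numerical illustration of the readout in Equation
(\ref{centermass}), consider a hypothetical population of only three
motor neurons with preferred directions $c \eq (-2, 0, 2)$ and
baseline-subtracted rates $R - r_{\!B} \eq (1, 2, 3)$ spikes/s. Then
\[
Z = \frac{1 \cdot (-2) + 4 \cdot 0 + 9 \cdot 2}{1 + 4 + 9}
= \frac{16}{14} \approx 1.14 ,
\]
so the encoded direction is pulled toward the most active unit. If the
desired direction in that trial were $z \eq 1$, its contribution to
the average in Equation (\ref{SMMerror}) would be
$|z - Z| \approx 0.14$.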
989:
990: \begin{figure*}
991: \centerline{\epsfig{figure=fig5.eps,width=5.0in}}
992: \caption{\label{NoiseSMMaps}
993: Noise interaction for the sensory-motor network depicted in Fig.~4.
994: Results are averaged over $100$ networks and $100$ trials per network.
995: All data are from computer simulations.
996: (A) Average absolute deviation between actual and encoded target
997: locations, \Epop, as a function of response noise. Continuous lines
998: are for three probabilities of weight elimination, $p_W\eq 0.1$, 0.3
999: and 0.5; the dashed line corresponds to $p_W\eq 0$.
1000: (B) Magnitude of the noise interaction, measured by the relative error
1001: \Emin$/E_0$, as a function of the number of input neurons, $N$, for
1002: $p_W\eq 0.2$.
1003: (C) \Emin\ and \Emin$/E_0$ as functions of $p_W$.
1004: (D) Optimal response noise SD, \sigmin, as a function of $p_{W}$. }
1005: \end{figure*}
1006:
1007: Simulation results for this sensory-motor model are presented in
1008: Fig.~5. A total of 400 sensory and 25 output neurons were used. These
1009: units were tested with all combinations of 20 values of $x$ and 20
1010: values of $y$, uniformly spaced (thus, $M\eq 400$). Synaptic noise
1011: was generated by random weight elimination. This means that, after
1012: having set the connections to their optimal values given by
1013: Equation~(\ref{wopt}), each one was reset to zero with a probability
1014: $p_W$. Thus, on average, a fraction $p_W$ of the weights in each
1015: network was eliminated. As shown in Fig.~5A, when $p_W\! >\! 0$, the
1016: error between the encoded and the true target direction has a minimum
1017: with respect to $\sigma_r$. These error curves represent averages
1018: over 100 networks. Interestingly, the benefit of noise does not
1019: decrease when more sensory units are included in the first layer
1020: (Fig.~5B). That is, if $p_W$ is constant, the proportion of
1021: eliminated synapses does not change, so the error caused by synaptic
1022: corruption cannot be reduced simply by adding more neurons.
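
Random weight elimination can be related to the multiplicative
synaptic noise considered earlier. As a sketch, write each corrupted
weight as $W_{ki} \eq b_{ki} \, \ol{W}_{ki}$, where $b_{ki}$ is a
binary variable (introduced here for illustration) equal to 0 with
probability $p_W$ and to 1 otherwise, independently for each synapse.
Then
\[
\left\langle W_{ki} \right\rangle = (1 - p_W) \, \ol{W}_{ki} ,
\qquad
\mbox{Var}\left( W_{ki} \right) = p_W (1 - p_W) \, \ol{W}_{ki}^2 ,
\]
so the SD of each corrupted weight is proportional to the magnitude of
its optimal value, as with multiplicative noise, and in addition the
mean weight is scaled down by $1 - p_W$. For $p_W \eq 0.3$, for
example, the SD is $\sqrt{0.21} \approx 0.46$ times
$|\ol{W}_{ki}|$.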
1023:
1024: Figure 5C shows the minimum and relative errors as functions of $p_W$.
1025: This graph highlights the substantial impact that response noise has
1026: on this network: the relative error stays below 0.2 even when about a
1027: third of the synapses are eliminated. This is not only because the
1028: error without response noise is high, but also because the error with
1029: an optimal amount of noise stays low. For instance, with $p_W\eq 0.3$
1030: and $\sigma_r\eq$ \sigmin, the typical deviation from the correct
1031: target direction is about 2 units, whereas with $\sigma_r\eq 0$ the
1032: typical deviation is about 10. Response noise thus cuts the deviation
1033: by about a factor of five, and importantly, the resulting error is
1034: still small relative to the range of values of $z$, which spans 50
1035: units. Also, as observed in the classification task, in general it is
1036: better to include response noise even if $\sigma_r$ is not precisely
1037: matched to the amount of synaptic variability (Fig.~5A).
1038:
1039: Figure 5D plots \sigmin\ as a function of the probability of synaptic
1040: elimination. The optimal amount of response noise increases with $p_W$
1041: and reaches fairly high levels. For instance, at a value of 1, which
1042: corresponds to $p_W$ near 0.15, the variance of the firing rates is
1043: equal to their mean, because of Equation (\ref{inputnoise2}). We
1044: wondered whether the scaling law of the response noise would make any
1045: difference, so we reran the simulations with either additive noise (SD
1046: independent of mean) or noise with an SD proportional to the mean, as
1047: in Equation (\ref{inputnoise1}). Results in these two cases were very
1048: similar: \Emin\ and \Emin$/E_0$ varied very much like in Fig.~5C, and
1049: the optimal amount of noise grew monotonically with $p_W$, as in
1050: Fig.~5D\@.
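
For reference, the three scaling laws compared here can be written
side by side. With $\eta_{ij}$ of zero mean and variance
$\sigma_r^2$, and assuming that the proportional case takes the form
$r_{ij} \eq \ol{r}_{ij} (1 + \eta_{ij})$ (an assumed form, used only
for this comparison), the variance of the rate is
\[
\mbox{Var}(r_{ij}) =
\left\{
\begin{array}{ll}
\sigma_r^2 & \mbox{additive noise,} \\
\sigma_r^2 \, \ol{r}_{ij} & \mbox{Equation (\ref{inputnoise2}),} \\
\sigma_r^2 \, \ol{r}_{ij}^{\,2} & \mbox{SD proportional to the mean.}
\end{array}
\right.
\]
Only in the second case is the ratio of variance to mean independent
of the rate, which is why $\sigma_r \eq 1$ there corresponds to a
variance equal to the mean. Note that $\sigma_r$ carries different
units in the three cases, so its optimal values are not directly
comparable across them.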
1051:
1052:
1053: \section{Noise Interactions in a Recurrent Network}
1054:
1055: The networks discussed in the previous sections had a feedforward
1056: architecture, and in those cases the contribution of response noise to
1057: the correlation matrix between neuronal responses could be determined
1058: analytically. In contrast, in recurrent networks the dynamics are more
1059: complex and the effects of random fluctuations more difficult to
1060: ascertain. To investigate whether response noise can still counteract
1061: some of the effects of synaptic variability, we consider a recurrent
1062: network with a well-defined function and relatively simple dynamics
1063: characterized by attractor states. When the firing rates in this
1064: network are initialized at arbitrary values, they eventually stop
1065: changing, settling down at certain steady-state points in which some
1066: neurons fire intensely and others do not. The optimal weights sought
1067: are those that allow the network to settle at predefined sets of
1068: steady-state responses, and the error is thus defined in terms of the
1069: difference between the desired steady states and the observed ones. As
1070: before, response noise is taken into account when the optimal synaptic
1071: weights are generated, although in this case the correction it
1072: introduces (relative to the noiseless case) is an approximation.
1073:
1074: The attractor network consists of $N$ continuous-valued neurons, each
1075: of which is connected to all other units via feedback synaptic
connections~\citep{hertz91b}. With the proper connectivity, such a
1077: network can generate, without any tuned input, a steady-state profile
1078: of activity with a cosine or Gaussian
1079: shape~\citep{BBS95,CompteCortex00,Sali03}. Such stable `bump'-shaped
1080: activity is observed in various neural models, including those for
1081: cortical hypercolumns~\citep{Hansel-Sompolinsky-98}, head-direction
1082: cells~\citep{Zhang1996,nc:laing+chow:2001} and working memory
1083: circuits~\citep{CompteCortex00}. Below, we find the connection matrix
1084: that allows the network to exhibit a unimodal activity profile
1085: centered at any point within the array.
1086:
1087: \subsection{Optimal Synaptic Weights in a Recurrent Architecture}
1088:
1089: The dynamics of the network are determined by the equation
1090: \b
1091: \tau \frac{d r_i}{d t} = -r_i
1092: + h \! \left( \sum_j W_{ij} \, r_j\right)
1093: + \eta_i \, ,
1094: \label{RNNmain}
1095: \e
where $\tau\eq 10$ ms is the integration time constant, $r_i$ is the
1097: response of neuron $i$, and $h$ is the activation function of the
1098: cells, which relates total current to firing rate. The sigmoid
1099: function
1100: $h(x) = 1/(1 + \exp(-x))$
1101: is used, but this choice is not critical. As before, $\eta_i$
1102: represents the response fluctuations, which are drawn independently
1103: for each neuron in every time step. In this case they are Gaussian,
1104: with zero mean and a variance $\sigma_r^2/\Delta t$. The variance of
1105: $\eta_i$ is divided by the integration time step $\Delta t$ to
1106: guarantee that the variance of the rate $r_i$ remains independent of
1107: the time step~\citep{VanK01}.
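
A minimal sketch of how Equation (\ref{RNNmain}) might be integrated
numerically is the forward Euler update below; the integration scheme
is an assumption made here for illustration, and $\mathcal{N}(0, s^2)$
denotes a Gaussian with zero mean and variance $s^2$:
\[
r_i(t + \Delta t) = r_i(t) + \frac{\Delta t}{\tau} \left[
-r_i(t) + h\!\left( \ssum_j W_{ij} \, r_j(t) \right) + \eta_i(t)
\right] ,
\qquad
\eta_i(t) \sim \mathcal{N}\!\left( 0, \, \sigma_r^2 / \Delta t \right) .
\]
The noise contribution to each step, $(\Delta t / \tau) \, \eta_i$,
then has variance $\sigma_r^2 \, \Delta t / \tau^2$. Because a fixed
time interval contains a number of steps proportional to
$1 / \Delta t$, the variance accumulated over that interval does not
depend on the choice of $\Delta t$.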
1108:
1109: For our purposes, manipulating this type of network is easier if the
1110: equations are expressed in terms of the total input currents to the
1111: cells~\citep{hertz91b,dayan-2001}. If the current for neuron $i$ is
$u_i \eq \sum_j W_{ij} \, r_j$, then the equation
\b
\tau \frac{d u_i}{d t} = -u_i + \sum_j W_{ij}
\left( h(u_j) + \eta_j \right)
\label{RNNmain1}
\e
is equivalent to Equation (\ref{RNNmain}) above.
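The equivalence can be checked in one line, assuming that the weights
$W_{ij}$ are constant in time. Multiplying Equation (\ref{RNNmain}),
written for neuron $j$, by $W_{ij}$ and summing over $j$ gives
\[
\tau \frac{d u_i}{d t}
= \sum_j W_{ij} \, \tau \frac{d r_j}{d t}
= \sum_j W_{ij} \left( -r_j + h(u_j) + \eta_j \right)
= -u_i + \sum_j W_{ij} \left( h(u_j) + \eta_j \right) ,
\]
where the argument of $h$ has been identified with the current
$u_j \eq \sum_k W_{jk} \, r_k$.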
1119: \begin{figure*}
1120: \centerline{\epsfig{figure=fig6.eps,width=5.0in}}
1121: \caption{Steady-state responses of a recurrent neural network with 20
1122: neurons. Results show the input currents of all units after 1000 ms
1123: of simulation time, with responses evolving according to
1124: Equation~(\ref{RNNmain1}). Each neuron is labeled by an angle between
$-180$\deg\ and 180\deg.
(A) Steady-state responses for four sets of initial conditions with
peaks near units \mbox{$-90$\deg}, 0\deg, +90\deg\ and 180\deg. The
1128: observed activity profiles are indistinguishable from the desired
1129: Gaussian curves. Neither synaptic nor response noise were included in
1130: this example.
1131: (B) Steady-state responses with and without noise. The desired
1132: activity profile is indicated by the solid line. The dotted line
1133: corresponds to the activity observed with noise after 1000 ms of
1134: simulation time, having started with an initial condition equal to the
1135: desired steady state. Vertical lines indicate the locations of the
1136: corresponding centers of mass. The absolute deviation is 34\deg.
1137: Here, $\sigma_r\eq 0.3$ and $p_W\eq 0.02$. }
1138: \end{figure*}
1139: A stationary solution of Equation (\ref{RNNmain1}) without input noise
1140: is such that all derivatives become zero. This corresponds to an
1141: attractor state $\alpha$ for which
1142: \b
1143: u_i^{\alpha} = \sum_j W_{ij} \, h(u_j^{\alpha}) .
1144: \label{SScondition}
1145: \e
1146: The label $\alpha$ is used because the network may have several
1147: attractors or sets of fixed points. The desired steady-state currents
1148: are denoted as $U_i^{\alpha}$. These are Gaussian profiles of activity
1149: such that, during steady state $\alpha\eq 1$, neuron 1 is the most
1150: active (i.e., the Gaussian is centered at neuron 1), during steady
1151: state $\alpha\eq 2$, neuron 2 is the most active, and so on. Figure 6
1152: illustrates the activity of the network at four steady states in the
1153: absence of noise ($\sigma_W\eq 0\eq \sigma_r$). To make the network
1154: symmetric, the neurons were arranged in a ring, so their activity
1155: profiles wrap around. Because of this, each neuron is labeled with an
1156: angle. The observed currents $u_i$ settle down at values that are
1157: almost exactly equal to the desired ones, $U_i^{\alpha}$. The synaptic
1158: connections that achieved this match were found by enforcing the
1159: steady-state condition (\ref{SScondition}) for the desired attractors.
1160: That is, we minimized
1161: \b
1162: E = \frac{1}{N_A} \sum_{\alpha = 1}^{N_A} \sum_{i} \left(
1163: U_i^{\alpha} - \sum_j W_{ij} \, h(U_j^{\alpha})
1164: \right)^{\!2} ,
1165: \label{RNNerror}
1166: \e
1167: where $U_i^{\alpha}$ is a (wrap-around) Gaussian function of $i$
1168: centered at $\alpha$ and $N_A$ is the number of attractors; in the
1169: simulations $N_A$ is always equal to the number of neurons, $N$\@.
1170: This procedure leads to an expression for the optimal weights
1171: equivalent to Equation (\ref{wopt}). Thus, without response noise,
1172: \b
1173: \ol{\bm{W}} = \bm{L} \, \bm{C}^{-1} ,
1174: \label{RNNwopt}
1175: \e
1176: where
1177: \ba
1178: L_{ij} & = & \frac{1}{N_A} \sum_{\alpha}
1179: U_i^{\alpha} \, h(U_j^{\alpha}) \nonumber \\
1180: C_{ij} & = & \frac{1}{N_A} \sum_{\alpha}
1181: h(U_i^{\alpha}) \, h(U_j^{\alpha}) \, .
1182: \label{RNNcorr}
1183: \ea
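For completeness, the step from Equation (\ref{RNNerror}) to Equation
(\ref{RNNwopt}) can be sketched as follows. Differentiating the error
with respect to $W_{ij}$ and setting the result to zero gives
\[
\frac{\partial E}{\partial W_{ij}}
= -\frac{2}{N_A} \sum_{\alpha} \left(
U_i^{\alpha} - \ssum_k W_{ik} \, h(U_k^{\alpha})
\right) h(U_j^{\alpha}) = 0 ,
\]
that is, $L_{ij} \eq \sum_k W_{ik} \, C_{kj}$, or
$\bm{L} \eq \bm{W} \bm{C}$ in matrix form, which yields Equation
(\ref{RNNwopt}) provided that $\bm{C}$ is invertible.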
1184: To include the effects of response noise, we add a correction to the
1185: diagonal of the correlation matrix, as in the previous cases (see
1186: Section \ref{RegSect}). We thus set
1187: \b
1188: C_{ij} = \frac{1}{N_A} \sum_{\alpha}
1189: h(U_i^{\alpha}) h(U_j^{\alpha})
1190: + \delta_{ij} \, a \, \frac{\sigma_r^2}{2 \tau} ,
1191: \label{RNNapprox}
1192: \e
1193: where $a$ is a proportionality constant. The rationale for this is as
1194: follows.
1195:
1196: Strictly speaking, Equation (\ref{RNNmain1}) with response noise does
1197: not have a steady state. But consider the simpler case of a single
1198: variable $u$ with a constant asymptotic value $u_{\infty}$, such that
1199: \b
1200: \tau \frac{d u}{d t} = -u + u_{\infty} + \eta .
1201: \label{singleu}
1202: \e
1203: If the trajectory $u(t)$ from $t\eq 0$ to $t\eq T$ is calculated many
1204: times, starting from the same initial condition, the distribution of
1205: endpoints $u(T)$ has a well-defined mean and variance, which vary
1206: smoothly as functions of $T$\@. The mean is always equal to the
1207: endpoint that would be observed without noise, whereas for $T$ much
1208: longer than the integration time constant $\tau$, the variance is
equal to the variance of the fluctuations on the right-hand side of
1210: Equation (\ref{singleu}) divided by $2\tau$~\citep{VanK01}. These
1211: considerations suggest that we minimize
1212: \b
1213: E = \frac{1}{N_A} \sum_{\alpha,i} \left(
1214: U_i^{\alpha} - \sum_j W_{ij} \,
1215: \left( h(U_j^{\alpha}) + a \, \tilde{\eta}_j \right)
1216: \right)^{\!2} ,
1217: \label{RNNerror1}
1218: \e
1219: where the variance of $\tilde{\eta}_j$ is $\sigma_r^2/(2\tau)$. This
1220: leads to Equation (\ref{RNNwopt}) with the corrected correlation
1221: matrix given by (\ref{RNNapprox}).
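
The variance quoted above for the single-variable case can be verified
with a short calculation. As an assumption made only for this sketch,
let $\eta$ in Equation (\ref{singleu}) be Gaussian white noise with
zero mean and correlation
$\langle \eta(t) \, \eta(t') \rangle = \sigma_r^2 \, \delta(t - t')$.
The solution starting from $u(0)$ is then
\[
u(T) = u_{\infty} + \left( u(0) - u_{\infty} \right) e^{-T/\tau}
+ \frac{1}{\tau} \int_0^T e^{-(T - s)/\tau} \, \eta(s) \, ds ,
\]
whose mean is the noise-free endpoint,
$\langle u(T) \rangle = u_{\infty} + ( u(0) - u_{\infty} ) \, e^{-T/\tau}$,
and whose variance is
\[
\mathrm{Var}\left( u(T) \right) =
\frac{\sigma_r^2}{\tau^2} \int_0^T e^{-2 (T - s)/\tau} \, ds =
\frac{\sigma_r^2}{2 \tau} \left( 1 - e^{-2T/\tau} \right) .
\]
For $T$ much longer than $\tau$ the exponential term is negligible and
the variance approaches $\sigma_r^2/(2\tau)$.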
1222:
1223: \subsection{Performance of the Attractor Network}
1224:
1225: To evaluate the performance of this network, we compare the center of
1226: mass of the desired activity profile to that of the observed profile
1227: tracked during a period of time. For a particular attractor $\alpha$,
1228: the network is first initialized very close to that desired steady
1229: state, then Equation (\ref{RNNmain1}) is run for 1000 ms (100 time
1230: constants $\tau$), and the absolute difference between the initial and
1231: the current centers of mass is recorded during the last 500 ms. The
1232: error for the recurrent networks \Erec\ is defined as the absolute
1233: difference averaged over this time period and all attractor states,
i.e., all values of $\alpha$. Also, when there is synaptic noise, an
1235: additional average over networks is performed. This error function is
1236: similar to Equation (\ref{SMMerror}), except that the circular
1237: topology is taken into account. Thus, \Erec\ is the mean absolute
1238: difference between desired and observed centers of mass. It is
1239: expressed in degrees.
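
As a sketch of how the circular topology can be taken into account,
one standard construction is the following; the symbols $\theta_j$,
$r_j$, and $d$ are introduced here only for illustration, and the
exact definition used in the simulations may differ. If neuron $j$ is
labeled by the angle $\theta_j$ and the activity profile is
$r_j \ge 0$, the center of mass on the ring can be computed as
\[
\theta_{\mathrm{cm}} = \mathrm{atan2} \Big( \sum_j r_j \sin \theta_j , \;
\sum_j r_j \cos \theta_j \Big) ,
\]
mapped to the interval $[0^{\circ}, 360^{\circ})$, and the absolute
difference between two such angles as
\[
d(\theta, \theta') = \min \left( |\theta - \theta'| , \;
360^{\circ} - |\theta - \theta'| \right) .
\]
For example, centers of mass at $350^{\circ}$ and $10^{\circ}$ differ
by $20^{\circ}$ rather than $340^{\circ}$.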
1240:
1241: \begin{figure*}[t!]
1242: \centerline{\epsfig{figure=fig7.eps,width=5.0in}}
1243: \caption{Interaction between synaptic and response noise in
1244: recurrent networks. (A) Average absolute difference between desired
1245: and observed centers of mass as a function of $\sigma_r$. Units are
1246: degrees. The different curves are for $a\eq 0$, 1.5, 1 and 0.5, from
1247: left to right. The lowest curve (dashed) was obtained with $a\eq
1248: 0.5$, confirming that the synaptic weights are optimized when
1249: response noise is taken into account. (B) Average error \Erec\ as a
1250: function of response noise. Continuous lines are for three
1251: probabilities of weight elimination $p_W\eq 0.005$, 0.015 and 0.025;
1252: the dashed line corresponds to $p_W\eq 0$. Here and in the
1253: following panels, $a\eq 0.5$. (C) \Emin$/E_0$ (left y-axis) and
1254: \Emin\ (right y-axis) as functions of $p_W$. (D) Optimal response
1255: noise SD, \sigmin, as a function of $p_{W}$ for the same data in C.
1256: }
1257: \end{figure*}
1258:
1259: Before exploring the interaction between synaptic and response
1260: noise, we used \Erec\ to test whether the noise-dependent correction
1261: to the correlation matrix in Equation (\ref{RNNapprox}) was
1262: appropriate. To do this, a recurrent network without synaptic
1263: fluctuations was simulated multiple times with different values of
1264: the parameter $a$ and various amounts of response noise. The desired
1265: attractors were kept constant. The resulting error curves are shown
1266: in Fig.~7A\@. Each one gives the average absolute deviation between
1267: desired and observed centers of mass as a function of $\sigma_r$ for
1268: a different value of $a$. The dependence on $a$ was non-monotonic.
1269: The optimal value we found was 0.5, which corresponds to the lowest
1270: curve (dashed) in the figure. This curve was well below the one
1271: observed without adjusting the synaptic weights. Therefore, the
1272: correction was indeed effective.
1273:
1274: Figure 7B shows \Erec\ as a function of $\sigma_r$ when synaptic
1275: noise is also present in the recurrent network. The three solid
1276: curves correspond to nets in which synapses were randomly eliminated
1277: with probabilities $p_W\eq 0.005$, 0.015 and 0.025. As with previous
1278: network architectures, a non-zero amount of response noise improves
1279: performance relative to the case where no response noise is
1280: injected. In this case, however, the mean absolute error is already
about 25\deg\ at the point at which response noise starts making a
1282: difference, around $p_W\eq 0.005$ (Fig.\ 7C). This is not
1283: surprising: these types of networks are highly sensitive to changes
1284: in their synapses, so even small mismatches can lead to large
1285: errors~\citep{SLRT00,RSW03}. Also, Fig.~7C shows that the ratio
1286: \Emin$/E_0$ does not fall below 0.6, so the benefit of noise is not
1287: as large as in previous examples. The effect was somewhat weaker
1288: when synaptic variability was simulated using Gaussian noise with SD
1289: $\sigma_W$ instead of random synaptic elimination. Nevertheless, it
1290: is interesting that the interaction between synaptic and response
1291: noise is observed at all under these conditions, given that the
1292: response dynamics are richer and that the minimization of Equation
1293: (\ref{RNNerror1}) may not be the best way to produce the desired
1294: steady-state activity.
1295:
1296:
1297: \section{Discussion}
1298:
1299: \subsection{Why are Synaptic and Response Fluctuations Equivalent?}
1300: \label{disc1}
1301:
1302: We have investigated the simultaneous action of synaptic and response
1303: fluctuations on the performance of neural networks and found an
1304: interaction or equivalence between them: when synaptic noise is
1305: multiplicative, its effect is similar to that of response noise. At
1306: heart, this is a simple consequence of the product of responses and
1307: synaptic weights contained in most neural models, which has the form
1308: $\sum_j W_j r_j$. With multiplicative noise in one of the variables,
1309: this weighted sum turns into $\sum_j W_j (1 + \xi_j) r_j$, which is
1310: the same whether it is the synapse or the response that fluctuates. In
1311: either case, the total stochastic component $\sum_j W_j \xi_j r_j$
1312: scales with the synaptic weights. The same result is obtained with
1313: additive response noise. Additive synaptic noise behaves differently,
1314: however. It instead leads to a total fluctuation $\sum_j \xi_j r_j$
1315: that is independent of the mean weights. Evidently, in this case the
1316: mean values of the weights have no effect on the size of the
1317: fluctuations. Thus, the key requirement for some form of equivalence
1318: between the two noise sources is that the synaptic fluctuations must
1319: depend on the strength of the synapses.
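
The argument above can be made concrete with a short variance
calculation. As an assumption for this sketch, let the $\xi_j$ be
independent random variables with zero mean and common variance
$\sigma^2$, and let the responses $r_j$ and mean weights $W_j$ be held
fixed. Then
\begin{eqnarray*}
\mathrm{Var}\Big( \sum_j W_j \, \xi_j \, r_j \Big) & = &
\sigma^2 \sum_j W_j^2 \, r_j^2
\quad \mbox{(multiplicative)} \\
\mathrm{Var}\Big( \sum_j W_j \, \xi_j \Big) & = &
\sigma^2 \sum_j W_j^2
\quad \mbox{(additive, response)} \\
\mathrm{Var}\Big( \sum_j \xi_j \, r_j \Big) & = &
\sigma^2 \sum_j r_j^2
\quad \mbox{(additive, synaptic)} .
\end{eqnarray*}
Rescaling all the $W_j$ by a factor $c$ changes the first two
variances by $c^2$ but leaves the third unchanged, so adjusting the
mean weights can reduce the fluctuations in the first two cases but
not in the third.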
1320:
1321: This condition was applied to the three sets of simulations presented
1322: above, which corresponded to the classification of arbitrary response
1323: patterns, a sensory-motor transformation, and the generation of
1324: multiple self-sustained activity profiles. This selection of problems
1325: was meant to illustrate the generality of the observations outlined in
1326: the above paragraph. And indeed, although the three problems differed
1327: in many respects, the results were qualitatively the same.
1328:
1329: We should also point out that, in all the simulations, the criterion
1330: used to determine the optimality of the synaptic weights was based on
1331: a mean square error. But perhaps the noise interaction changes when a
1332: different criterion is used. To investigate this, we performed
1333: additional simulations of the small $2\! \times\! 1$ network in which
1334: the optimal synaptic weights were those that minimized a mean absolute
1335: deviation; thus, the square in Equation (\ref{error}) was substituted
1336: with an absolute value. In this case everything proceeded as before,
1337: except that the mean weight values $\ol{W}$ had to be found
1338: numerically. For this, the averages were performed explicitly and the
1339: downhill simplex method was used to search for the best
1340: weights~\citep{PFTV92}. The results, however, were very similar to
1341: those in Fig.~2A\@. Although the shapes of the curves were not
exactly the same, the relative and minimum errors found with the
absolute-value criterion varied with $\sigma_W$ in much the same way as
those found with the mean-square error criterion. Therefore, our conclusions do
1345: not seem to depend strongly on the specific function used to weight
1346: the errors and find the best synaptic connection values.
1347:
1348: \subsection{When Should Response Noise Increase?}
1349: \label{disc2}
1350:
1351: According to the argument above, the most general way to state our
1352: results is this: assuming that neuronal activities are determined by
1353: weighted sums, any mechanism that is able to dampen the impact of
1354: response noise will automatically reduce the impact of multiplicative
1355: synaptic noise as well. Furthermore, we suggest that under some
circumstances it is better to add more response noise and increase the
dampening factor than to ignore the synaptic fluctuations altogether.
1358: There are two conditions for this scenario to make sense. (1) The
1359: network must be highly sensitive to changes in connectivity. This can
1360: be seen, for instance, in Fig.~3A, which shows that the highest
1361: benefit of response noise occurs when the number of neurons matches
1362: the number of conditions to be satisfied --- it is at this point that
1363: the connections need to be most accurate. (2) The fluctuations in
1364: connectivity cannot be evaluated directly. That is, why not take into
1365: account the synaptic noise in exactly the same way as the response
1366: noise when the optimal connections are sought? For example, the
1367: average in Equation (\ref{errormatrix}) could also include an average
1368: over networks (synaptic fluctuations), in which case the optimal mean
1369: weights would depend not only on $\sigma_r$ but also on $\sigma_W$. In
1370: the simulations this could certainly be done, and would lead to
1371: smaller errors. But we explicitly consider the possibility that either
1372: $\sigma_W$ is unknown a priori, or there is no separate biophysical
1373: mechanism for implementing the corresponding corrections to the
1374: synaptic connections.
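
To illustrate what such a direct correction would look like, consider
a sketch in generic notation: a single output unit with target $F$,
inputs $r_j$, and weights $W_j (1 + \xi_j)$, where the $\xi_j$ are
assumed to be independent of each other and of the inputs, with zero
mean and variance $\sigma_W^2$. These symbols are chosen for
illustration and need not coincide with those of Equation
(\ref{errormatrix}). Averaging the squared error over both inputs and
synaptic fluctuations gives
\[
\Big\langle \Big( F - \sum_j W_j (1 + \xi_j) \, r_j \Big)^{2} \Big\rangle =
\Big\langle \Big( F - \sum_j W_j \, r_j \Big)^{2} \Big\rangle +
\sigma_W^2 \sum_j W_j^2 \, \langle r_j^2 \rangle ,
\]
because the cross term vanishes when the $\xi_j$ have zero mean.
Setting the derivative with respect to $W_k$ to zero then gives
\[
\langle F \, r_k \rangle = \sum_j W_j \left(
\langle r_j \, r_k \rangle +
\delta_{jk} \, \sigma_W^2 \, \langle r_k^2 \rangle \right) ,
\]
that is, the correlation matrix acquires a diagonal term proportional
to $\sigma_W^2$, analogous to the diagonal correction produced by
response noise.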
1375:
1376: Condition number 2 is not unreasonable. Realistic networks with high
1377: synaptic plasticity must incorporate mechanisms to ensure that ongoing
1378: learning does not disrupt their previously acquired functionality.
Thus, synaptic modification rules need to achieve two goals: to
1380: establish new associations that are relevant for the current
1381: behavioral task, and to make adjustments to prevent interference from
1382: other, future associations. The latter may be particularly difficult
1383: to achieve if learning rates change unpredictably with time. It is
1384: not clear whether plausible (e.g., local) synaptic modification
mechanisms could solve both problems simultaneously \citep[see][]{HB04},
but the present results suggest an alternative: synaptic
1387: modification rules could be used exclusively to learn new associations
1388: based on current information, whereas response noise could be used to
1389: indirectly make the connectivity more robust to synaptic fluctuations.
Although this mechanism evidently does not solve the problem of
1391: combining multiple learned associations, it might alleviate it. Its
1392: advantage is that, assuming that neural circuits have evolved to
1393: adaptively optimize their function in the face of true noise, simply
1394: increasing their response variability would generate synaptic
1395: connectivity patterns that are more resistant to fluctuations.
1396:
1397: \subsection{When is Synaptic Noise Multiplicative?}
1398: \label{disc3}
1399:
1400: The condition that noise should be multiplicative means that changes
1401: in synaptic weight should be proportional to the magnitude of the
1402: weight. Evidently, not all types of synaptic modification processes
1403: lead to fluctuations that can be statistically modeled as
multiplicative noise; for instance, saturation may prevent further
increases, thus restricting the variability of strong synapses.
1406: However, synaptic changes that generally increase with initial
1407: strength should be reasonably well approximated by the multiplicative
1408: model. Random synapse elimination fits this model because, if a weak
1409: synapse disappears, the change is small, whereas if a strong synapse
1410: disappears, the change is large. Thus, the magnitude of the changes
1411: correlates with initial strength. Another procedure that corresponds
1412: to multiplicative synaptic noise is this. Suppose the size of the
1413: synaptic changes is fixed, so that weights can only vary by
1414: $\pm \delta w$, but suppose also that the probability of suffering a
1415: change increases with initial synaptic strength. In this case, all
1416: changes are equal, but on average a population of strong synapses
would show higher variability than a population of weak ones. In
1418: simulations, the disruption caused by this type of synaptic corruption
1419: is indeed lessened by response noise (data not shown).
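
Both examples can be checked with a short calculation, given here as a
sketch with symbols introduced only for illustration. For random
elimination, write the perturbed weight as $W' = b \, W$, where
$b = 0$ with probability $p_W$ and $b = 1$ otherwise. Then
\[
\langle W' \rangle = (1 - p_W) \, W , \qquad
\mathrm{Var}(W') = p_W (1 - p_W) \, W^2 ,
\]
so the SD of the perturbation, $\sqrt{p_W (1 - p_W)} \, |W|$, is
proportional to the magnitude of the weight, as required of
multiplicative noise. For the second procedure, assume that a weight
changes by $+\delta w$ or $-\delta w$ with equal probability $q(W)/2$
and is unchanged otherwise. The change then has zero mean and variance
$q(W) \, (\delta w)^2$, which grows with synaptic strength whenever $q$
does. The particular choice $q(W) \propto W^2$ (an assumption, valid
only as long as $q \le 1$) would give an SD proportional to $|W|$.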
1420:
1421: \subsection{Final Remarks}
1422: \label{disc4}
1423:
1424: To summarize, the scenario we envision rests on five critical
1425: assumptions: (1) the activity of each neuron depends on
synaptically weighted sums of its (noisy) inputs, (2) network
1427: performance is highly sensitive to changes in synaptic connectivity,
1428: (3) synaptic changes unrelated to a function that has already been
1429: learned can be modeled as multiplicative noise, (4) synaptic
1430: modification mechanisms are able to take into account response noise,
1431: so synaptic strengths are adjusted to minimize its impact, but (5)
1432: synaptic modification mechanisms do not directly account for future
1433: learning. Under these conditions, our results suggest that increasing
1434: the variability of neuronal responses would, on average, result in
1435: more accurate performance. Although some of these assumptions may be
1436: rather restrictive, the diversity of synaptic plasticity mechanisms
1437: together with the high response variability observed in many areas of
1438: the brain make this constructive noise effect worth considering.
1439:
1440: \subsubsection*{Acknowledgments.}
1441: Research was supported by NIH grant NS044894.
1442:
1443: %\bibliography{neuroscience}
1444: \begin{thebibliography}{}
1445:
1446: \bibitem[Andersen et~al., 1985]{AES85}
1447: Andersen, R.~A., Essick, G.~K., and Siegel, R.~M. (1985).
1448: \newblock Encoding of spatial location by posterior parietal neurons.
1449: \newblock {\em Science}, 230:450--458.
1450:
1451: \bibitem[Ben-Yishai et~al., 1995]{BBS95}
1452: Ben-Yishai, R., Bar-Or, R.~L., and Sompolinsky, H. (1995).
1453: \newblock Theory of orientation tuning in visual cortex.
\newblock {\em Proc Natl Acad Sci USA}, 92:3844--3848.
1455:
1456: \bibitem[Bishop, 1995]{Bish95}
1457: Bishop, C.~M. (1995).
\newblock Training with noise is equivalent to {T}ikhonov regularization.
1459: \newblock {\em Neural Computation}, 7:108--116.
1460:
1461: \bibitem[Brotchie et~al., 1995]{BASG95}
1462: Brotchie, P.~R., Andersen, R.~A., Snyder, L.~H., and Goodman, S.~J.
1463: (1995).
1464: \newblock Head position signals used by parietal neurons to encode locations of
1465: visual stimuli.
1466: \newblock {\em Nature}, 375:232--235.
1467:
1468: \bibitem[Carpenter and Grossberg, 1987]{carpenter87art2}
1469: Carpenter, G.~A. and Grossberg, S. (1987).
\newblock {ART} 2: Self-organization of stable category recognition codes for
1471: analog input patterns.
1472: \newblock {\em Applied Optics}, 26:4919--4930.
1473:
1474: \bibitem[Compte et~al., 2000]{CompteCortex00}
1475: Compte, A., Brunel, N., Goldman-Rakic, P., and Wang, X.-J. (2000).
1476: \newblock Synaptic mechanisms and network dynamics underlying spatial working
1477: memory in a cortical network model.
\newblock {\em Cerebral Cortex}, 10:910--923.
1479:
1480: \bibitem[Crist et~al., 2001]{CLG01}
Crist, R.~E., Li, W., and Gilbert, C.~D. (2001).
1482: \newblock Learning to see: experience and attention in primary visual cortex.
1483: \newblock {\em Nature Neuroscience}, 4(4):519--525.
1484:
1485: \bibitem[Dayan and Abbott, 2001]{dayan-2001}
1486: Dayan, P. and Abbott, L. (2001).
1487: \newblock {\em Theoretical neuroscience: Computational and mathematical
1488: modeling of neural systems}.
1489: \newblock MIT Press.
1490:
1491: \bibitem[Dean, 1981]{Dean81}
1492: Dean, A. (1981).
1493: \newblock The variability of discharge of simple cells in the cat striate
1494: cortex.
1495: \newblock {\em Exp Brain Res}, 44:437--440.
1496:
1497: \bibitem[Gammaitoni et~al., 1998]{Gammaitoni98SR}
1498: Gammaitoni, L., H\"anggi, P., Jung, P., and Marchesoni, F. (1998).
1499: \newblock Stochastic resonance.
1500: \newblock {\em Rev. Mod. Phys.}, 70:223--287.
1501:
1502: \bibitem[Golub and van Loan, 1996]{GolubLoan96a}
1503: Golub, G.~H. and van Loan, C.~F. (1996).
1504: \newblock {\em Matrix Computations}.
\newblock The Johns Hopkins University Press, Baltimore, 3rd edition.
1506:
1507: \bibitem[Hansel and Sompolinsky, 1998]{Hansel-Sompolinsky-98}
1508: Hansel, D. and Sompolinsky, H. (1998).
1509: \newblock Modeling feature selectivity in local cortical circuits.
\newblock In Koch, C. and Segev, I., editors, {\em Methods in Neuronal
  Modeling: From Synapse to Networks}, pages 499--567. MIT Press, Cambridge,
1512: MA.
1513:
1514: \bibitem[Haykin, 1999]{Hayk99}
1515: Haykin, S. (1999).
1516: \newblock {\em Neural Networks. {A} Comprehensive Foundation}.
1517: \newblock Upper Saddle River, NJ: Prentice Hall.
1518:
1519: \bibitem[Hertz et~al., 1991]{hertz91b}
1520: Hertz, J., Krogh, A., and Palmer, R.~G. (1991).
1521: \newblock {\em Introduction to the Theory of Neural Computation}.
1522: \newblock Addison-Wesley, New York.
1523:
1524: \bibitem[Hinton, 1989]{Hint89}
1525: Hinton, G.~E. (1989).
1526: \newblock Connectionist learning procedures.
1527: \newblock {\em Artificial Intelligence}, 40:185--234.
1528:
1529: \bibitem[Holt et~al., 1996]{HSKD96}
1530: Holt, G.~R., Softky, W.~R., Koch, C., and Douglas, R.~J. (1996).
1531: \newblock Comparison of discharge variability in vitro and in vivo in cat
1532: visual cortex neurons.
\newblock {\em Journal of Neurophysiology}, 75:1806--1814.
1534:
1535: \bibitem[Hopfield and Brody, 2004]{HB04}
1536: Hopfield, J.~J. and Brody, C.~D. (2004).
1537: \newblock Learning rules and network repair in spike-timing-based computation
1538: networks.
1539: \newblock {\em Proc Natl Acad Sci USA}, 101:337--342.
1540:
1541: \bibitem[Kilgard and Merzenich, 1998]{KM98}
1542: Kilgard, M.~P. and Merzenich, M.~M. (1998).
1543: \newblock Plasticity of temporal information processing in the primary auditory
1544: cortex.
1545: \newblock {\em Nature Neuroscience}, 1:727--731.
1546:
1547: \bibitem[Laing and Chow, 2001]{nc:laing+chow:2001}
1548: Laing, C.~R. and Chow, C.~C. (2001).
1549: \newblock Stationary bumps in networks of spiking neurons.
1550: \newblock {\em Neural Computation}, 13(7):1473--1494.
1551:
1552: \bibitem[Levin and Miller, 1996]{LM96}
1553: Levin, J.~E. and Miller, J.~P. (1996).
1554: \newblock Broadband neural encoding in the cricket cercal sensory system
1555: enhanced by stochastic resonance.
1556: \newblock {\em Nature}, 380:165--168.
1557:
1558: \bibitem[McCloskey and Cohen, 1989]{mccloskey89catastrophic}
1559: McCloskey, M. and Cohen, N.~J. (1989).
1560: \newblock Catastrophic interference in connectionist networks: The sequential
1561: learning problem.
1562: \newblock {\em The Psychology of Learning and Motivation}, 24:109--165.
1563:
1564: \bibitem[Murray and Edwards, 1994]{murray94enhanced}
1565: Murray, A.~F. and Edwards, P.~J. (1994).
1566: \newblock Enhanced {MLP} performance and fault tolerance resulting from
1567: synaptic weight noise during training.
1568: \newblock {\em IEEE Transactions on Neural Networks}, 5(5):792--802.
1569:
1570: \bibitem[Nozaki et~al., 1999]{Nozaki99}
1571: Nozaki, D., Mar, D.~J., Grigg, P., and Collins, J.~J. (1999).
1572: \newblock Effects of colored noise on stochastic resonance in sensory neurons.
\newblock {\em Physical Review Letters}, 82:2402--2405.
1574:
1575: \bibitem[Pouget and Sejnowski, 1997]{PS97}
1576: Pouget, A. and Sejnowski, T.~J. (1997).
1577: \newblock Spatial transformations in the parietal cortex using basis functions.
1578: \newblock {\em Journal of Cognitive Neuroscience}, 9:222--237.
1579:
1580: \bibitem[Press et~al., 1992]{PFTV92}
1581: Press, W.~H., Teukolsky, S.~A., Vetterling, W.~T., and Flannery,
1582: B.~P. (1992).
1583: \newblock {\em Numerical Recipes in {C}}.
1584: \newblock Cambridge University Press, New York.
1585:
1586: \bibitem[Renart et~al., 2003]{RSW03}
Renart, A., Song, P., and Wang, X.-J. (2003).
1588: \newblock Robust spatial working memory through homeostatic synaptic scaling in
1589: heterogeneous cortical networks.
1590: \newblock {\em Neuron}, 38:473--485.
1591:
1592: \bibitem[Salinas, 2003]{Sali03}
1593: Salinas, E. (2003).
1594: \newblock Background synaptic activity as a switch between dynamical states in
1595: a network.
1596: \newblock {\em Neural Computation}, 15(7):1439--1475.
1597:
1598: \bibitem[Salinas, 2004]{Salinas-2004-2}
1599: Salinas, E. (2004).
1600: \newblock Context-dependent selection of visuomotor maps.
1601: \newblock {\em BMC Neuroscience}, 5(1):47.
1602:
1603: \bibitem[Salinas and Abbott, 1995]{Salinas+Abbott:1995}
1604: Salinas, E. and Abbott, L.~F. (1995).
1605: \newblock Transfer of coded information from sensory to motor networks.
1606: \newblock {\em Journal of Neuroscience}, 15:6461--6474.
1607:
1608: \bibitem[Salinas and Sejnowski, 2001]{Salinas+Sejnowski:2001}
1609: Salinas, E. and Sejnowski, T.~J. (2001).
1610: \newblock Gain modulation in the central nervous system: where behavior,
1611: neurophysiology and computation meet.
1612: \newblock {\em Neuroscientist}, 2:539--550.
1613:
1614: \bibitem[Salinas and Thier, 2000]{ST00}
1615: Salinas, E. and Thier, P. (2000).
1616: \newblock Gain modulation: a major computational principle of the central
1617: nervous system.
1618: \newblock {\em Neuron}, 27:15--21.
1619:
1620: \bibitem[Seung et~al., 2000]{SLRT00}
1621: Seung, H.~S., Lee, D.~D., Reis, B.~Y., and Tank, D.~W. (2000).
1622: \newblock Stability of the memory of eye position in a recurrent network of
1623: conductance-based model neurons.
1624: \newblock {\em Neuron}, 26:259--271.
1625:
1626: \bibitem[Shadlen and Newsome, 1994]{SN94}
1627: Shadlen, M.~N. and Newsome, W.~T. (1994).
1628: \newblock Noise, neural codes and cortical organization.
1629: \newblock {\em Curr. Opin. Neurobiol.}, 4:569--579.
1630:
1631: \bibitem[Softky and Koch, 1992]{nc:Softky+Koch:1992}
Softky, W.~R. and Koch, C. (1992).
1633: \newblock Cortical cells should fire regularly, but do not.
1634: \newblock {\em Neural Computation}, 4(5):643--646.
1635:
1636: \bibitem[Softky and Koch, 1993]{SK93}
1637: Softky, W.~R. and Koch, C. (1993).
1638: \newblock The highly irregular firing of cortical cells is inconsistent with
  temporal integration of random {EPSPs}.
1640: \newblock {\em Journal of Neuroscience}, 13:334--350.
1641:
1642: \bibitem[Stevens and Zador, 1998]{SZ98}
1643: Stevens, C.~F. and Zador, A.~M. (1998).
1644: \newblock Input synchrony and the irregular firing of cortical neurons.
1645: \newblock {\em Nature Neuroscience}, 1:210--217.
1646:
1647: \bibitem[Turrigiano and Nelson, 2000]{TN00}
1648: Turrigiano, G.~G. and Nelson, S.~B. (2000).
1649: \newblock Hebb and homeostasis in neuronal plasticity.
1650: \newblock {\em Curr Opin Neurobiol}, 10:358--364.
1651:
1652: \bibitem[{van Kampen}, 1992]{VanK01}
1653: {van Kampen}, N.~G. (1992).
1654: \newblock {\em Stochastic Processes in Physics and Chemistry}.
1655: \newblock Elsevier, Amsterdam.
1656:
1657: \bibitem[Vilar and Rubi, 2000]{Vilar-Rubi-2000}
1658: Vilar, J.~M.~G. and Rubi, J.~M. (2000).
1659: \newblock {Scaling of Noise and Constructive Aspects of Fluctuations}.
1660: \newblock {\em Lecture Notes in Physics, Berlin Springer Verlag}, 557:121.
1661:
1662: \bibitem[Wang et~al., 1995]{XMSJ95}
1663: Wang, X., Merzenich, M.~M., Sameshima, K., and Jenkins, W. (1995).
1664: \newblock Remodelling of hand representation in adult cortex determined by
1665: timing of tactile stimulation.
1666: \newblock {\em Nature}, 378:71--75.
1667:
1668: \bibitem[Zhang, 1996]{Zhang1996}
1669: Zhang, K. (1996).
1670: \newblock Representation of spatial orientation by the intrinsic dynamics of
1671: the head-direction cell ensemble: a theory.
1672: \newblock {\em Journal of Neuroscience}, 16(6):2112--2126.
1673:
1674: \bibitem[Zipser and Andersen, 1988]{Zipser88}
1675: Zipser, D. and Andersen, R.~A. (1988).
1676: \newblock A back-propagation programmed network that simulates response
1677: properties of a subset of posterior parietal neurons.
1678: \newblock {\em Nature}, 331:679--684.
1679:
1680: \end{thebibliography}
1681:
1682: \end{document}
1683: