
1: \documentclass[11pt]{article}

2: \usepackage{setspace}

3: \usepackage{epsfig}

4: \usepackage{palatino}

5: \usepackage{verbatim}

6: \usepackage{graphicx}

7: \usepackage{graphics}

8:

9: \bibliographystyle{apalike}

10: \usepackage[round]{natbib}

11:

12: \setlength{\textwidth}{6.0in} \setlength{\evensidemargin}{0.25in}

13: \setlength{\oddsidemargin}{0.25in} \setlength{\topmargin}{-0.0in}

14: \setlength{\textheight}{9.0in} \setlength{\headheight}{0.25in}

15: \setlength{\headsep}{0.3in} \setlength{\footskip}{0.7in}

16:

17: \def\tr{{\mathrm T}}

18: \def\rd{{\mathrm d}}

19: \def\ssum{{\mbox{\small{$\Sigma$}}}}

20: \def\ol#1{{\overline{#1}}}

21: %\def\ol#1{\bar{#1}}

22: \def\eq{\!=\!}

23: \def\deg{$^{\circ}$}

24: \def\b{\begin{equation}}

25: \def\e{\end{equation}}

26: \def\ba{\begin{eqnarray}}

27: \def\ea{\end{eqnarray}}

28: \def\rmax{$r_{\mathrm{max}}$}

29: \def\Emin{$E_{\mathrm{min}}$}

30: \def\sigmin{$\sigma_{\mathrm{min}}$}

31: \def\Epop{$E_{\mathrm{pop}}$}

32: \def\Erec{$E_{\mathrm{rec}}$}

33: \newcommand{\bm}[1]{\mbox{\boldmath $#1$}}

34: \renewcommand{\floatpagefraction}{0.9}

35: \renewcommand{\textfraction}{0.01}

36:

37: \begin{document}

38: \begin{center}

39: \begin{large}

40: {\bf WHEN RESPONSE VARIABILITY INCREASES  \\

41: \vspace*{0.05in} NEURAL NETWORK ROBUSTNESS TO SYNAPTIC NOISE\\}

42: \vspace*{0.2in}

43: \end{large}

44: \vspace*{0.3in}

45: \begin{large}

46: {\bf Gleb Basalyga and Emilio Salinas}

47: \vspace*{0.3in}

48: \end{large} \\

49: Department of Neurobiology and Anatomy \\

50: Wake Forest University School of Medicine \\

51: Winston-Salem, NC 27157-1010 \\

52: E-mail: gbasalyg@wfubmc.edu, esalinas@wfubmc.edu \\

53:  \vspace*{0.3in}

54: \today \\

55:  \vspace*{0.3in}

56: {\small Preliminary version of paper \\

57: to appear in

58: \textbf{\textit{Neural Computation}}}

59:  \vspace*{0.3in}

60: \end{center}

61:

62: \centerline{\textbf{Abstract}}

63:

64: \vspace{0.2in}

65:

66: \noindent

67: Cortical sensory neurons are known to be highly variable, in the sense

68: that responses evoked by identical stimuli often change dramatically

69: from trial to trial. The origin of this variability is uncertain, but

70: it is usually interpreted as detrimental noise that reduces the

71: computational accuracy of neural circuits. Here we investigate the

72: possibility that such response variability might, in fact, be

73: beneficial, because it may partially compensate for a decrease in

74: accuracy due to stochastic changes in the synaptic strengths of a

75: network. We study the interplay between two kinds of noise, response

76: (or neuronal) noise and synaptic noise, by analyzing their joint

77: influence on the accuracy of neural networks trained to perform

78: various tasks. We find an interesting, generic interaction: when

79: fluctuations in the synaptic connections are proportional to their

80: strengths (multiplicative noise), a certain amount of response noise

81: in the input neurons can significantly improve network performance,

82: compared to the same network without response noise. Performance is

83: enhanced because response noise and multiplicative synaptic noise are

84: in some ways equivalent. So, if the algorithm used to find the optimal

85: synaptic weights can take into account the variability of the model

86: neurons, it can also take into account the variability of the

87: synapses. Thus, the connection patterns generated with response noise

88: are typically more resistant to synaptic degradation than those

89: obtained without response noise.  As a consequence of this interplay,

90: if multiplicative synaptic noise is present, it is better to have

91: response noise in the network than not to have it. These results are

92: demonstrated analytically for the most basic network consisting of two

93: input neurons and one output neuron performing a simple classification

94: task, but computer simulations show that the phenomenon persists in a

95: wide range of architectures, including recurrent (attractor) networks

96: and sensory-motor networks that perform coordinate transformations.

97: The results suggest that response variability could play an important

98: dynamic role in networks that continuously learn.

99:

100: \newpage

101: \section{Introduction}

102:

103: Neuronal networks face an inescapable tradeoff between learning new

104: associations and forgetting previously stored information. In

105: competitive learning models, this is sometimes referred to as the

106: stability-plasticity dilemma~\citep{carpenter87art2,hertz91b}: in

107: terms of inputs and outputs, learning to respond to new inputs will

108: interfere with the learned responses to familiar inputs. A

109: particularly severe form of performance degradation is known as

110: catastrophic interference~\citep{mccloskey89catastrophic}. It refers

111: to situations in which the learning of new information causes the

112: virtually complete loss of previously stored associations.

113:

114: Biological networks must face a similar problem, because once a task

115: has been mastered, plasticity mechanisms will inevitably produce

116: further changes in the internal structural elements, leading to

117: decreased performance. That is, within sub-networks that have already

118: learned to perform a specific function, synaptic plasticity must at

119: least partly appear as a source of noise. In the cortex, this problem

120: must be quite significant, given that even primary sensory areas show a

121: large capacity for reorganization~\citep{XMSJ95,KM98,CLG01}. Some

122: mechanisms, such as homeostatic regulation~\citep{TN00} and specific

123: types of synaptic modification rules~\citep{HB04}, may help alleviate

124: the problem, but by and large, how nervous systems cope with it

125: remains unknown.

126:

127: Another factor that is typically considered a limitation on neural

128: computational capacity is response variability. The activity of cortical

129: neurons is highly variable, as measured either by the temporal

130: structure of spike trains produced during constant stimulation

131: conditions, or by spike counts collected in a given time interval and

132: compared across identical behavioral

133: trials~\citep{Dean81,nc:Softky+Koch:1992,SK93,HSKD96}. Some of the

134: biophysical factors that give rise to this variability, such as the

135: balance between excitation and inhibition, have been

136: identified~\citep{SK93,SN94,SZ98}. But its functional significance, if

137: any, is not understood.

138:

139: Here we consider a possible relationship between the two sources of

140: randomness just discussed, whereby response variability helps

141: counteract the destabilizing effects of synaptic changes. Although

142: noise generally hampers performance, recent studies have shown that

143: in nonlinear dynamical systems such as neural networks this is not

144: always the case. The best known example is stochastic resonance, in

145: which noise enhances the sensitivity of sensory neurons to weak

146: periodic signals~\citep{LM96,Gammaitoni98SR,Nozaki99}, but noise may

147: play other constructive roles as well. For instance, when a system has

148: an internal source of noise, an externally added noise can reduce the

149: total noise of the output~\citep{Vilar-Rubi-2000}. Also, adding noise

150: to the synaptic connections of a network during learning produces

151: networks that, after training, are more robust to synaptic corruption

152: and have a higher capacity to generalize~\citep{murray94enhanced}.

153:

154: In this paper we study another beneficial effect of noise on neural

155: network performance. In this case, adding randomness to the neural

156: responses reduces the impact of fluctuations in synaptic strength.

157: That is, here, performance depends on two sources of variability,

158: response noise and synaptic noise, and adding some amount of response

159: noise produces better performance than having synaptic noise alone.

160: The reason for this paradoxical effect is that response noise acts as

161: a regularization factor that favors connectivity matrices with many

162: small synaptic weights over connectivity matrices with few large

163: weights, and this minimizes the impact of a synapse that is lost or

164: has a wrong value. We study this regularization effect in three

165: different cases: (1) a classification task, which in its simplest

166: instantiation can be studied analytically, (2) a sensory-motor

167: transformation, and (3) an attractor network that produces

168: self-sustained activity. For the latter two, the interaction between

169: noise terms is demonstrated by extensive numerical simulations.

170:

171: \section{General Framework}

172: \label{general}

173:

174: First we consider networks with two layers, an input layer that

175: contains $N$ sensory neurons and an output layer with $K$ output

176: neurons. A matrix $\bm{r}$ is used to denote the firing rates of the

177: input neurons in response to $M$ stimuli, so $r_{ij}$ is the firing

178: rate of input unit $i$ when stimulus $j$ is presented.  These rates

179: have a mean component $\ol{\bm{r}}$ plus noise, as described in detail

180: below.  The output units are driven by the first layer responses, such

181: that the firing rate of output unit $k$ evoked by stimulus $j$ is

182: \b

183:     R_{kj} = \sum_{i=1}^N w_{ki} \, r_{ij} ,

184:     \label{Rdriv}

185: \e

186: or in matrix form, $\bm{R}= \bm{w} \bm{r}$, where $\bm{w}$ is the

187: $K\!\times\!N$ matrix of synaptic connections between input and output

188: neurons. The output neurons also have a set of desired responses

189: $\bm{F}$, where $F_{kj}$ is the firing rate that output unit $k$

190: should produce when stimulus $j$ is presented. In other words, $\bm F$

191: contains target values that the outputs are supposed to learn. The

192: error $E$ is the mean squared difference between the actual driven

193: responses $R_{kj}$ and the desired ones,

194: \b

195:     E = \left< \frac{1}{KM} \, \sum_{k=1}^K\sum_{j=1}^M

196:                \left( R_{kj} - F_{kj} \right)^2

197:         \right> ,

198:     \label{error}

199: \e

200: or in matrix notation,

201: \b

202:     E = \frac{1}{KM}

203:         \left< \mbox{Tr}

204:             \left[(\bm{w} \bm{r} - \bm{F})

205:                   (\bm{w} \bm{r} - \bm{F})^\tr

206:             \right]

207:         \right> .

208:     \label{errormatrix}

209: \e

210: Here, $\mbox{Tr}(\bm{A}) = \sum_i A_{ii}$ is the trace of a matrix and

211: the angle brackets indicate an average over multiple trials, which

212: corresponds to multiple samples of the noise in the inputs $\bm{r}$.

213: The optimal synaptic connections $\ol{\bm{W}}$ are those that make the

214: error as small as possible.  These can be found by computing the

215: derivative of Equation (\ref{errormatrix}) with respect to $\bm{w}$

216: (or with respect to $w_{ab}$, if the summations are written

217: explicitly) and setting the result equal to zero~\citep[see

218: e.g.,][]{GolubLoan96a}.  These steps give

219: \b

220:     \ol{\bm{W}} = \bm{F} \, \ol{\bm{r}}^\tr \bm{C}^{-1} ,

221:     \label{wopt}

222: \e

223: where $\ol{\bm{r}}\eq \left<\bm{r}\right>$ and $\bm{C}^{-1}$ is the

224: inverse (or the pseudo-inverse) of the correlation matrix

225: $\bm{C} = \left<\bm{r} \bm{r}^\tr\right>$.
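To make the step from Equation (\ref{errormatrix}) to Equation (\ref{wopt}) explicit, the calculation can be sketched as follows. This is the standard least-squares argument written in the notation above, not necessarily the derivation the authors had in mind, and it assumes that $\bm{w}$ is held fixed while averaging over the response noise. Expanding the trace and using $\ol{\bm{r}}\eq\left<\bm{r}\right>$ and $\bm{C}\eq\left<\bm{r}\bm{r}^\tr\right>$ gives
\[
    E = \frac{1}{KM} \, \mbox{Tr} \left[
            \bm{w} \bm{C} \bm{w}^\tr
            - 2 \, \bm{w} \, \ol{\bm{r}} \, \bm{F}^\tr
            + \bm{F} \bm{F}^\tr
        \right] .
\]
Because $\bm{C}$ is symmetric, the derivative with respect to $\bm{w}$ is
\[
    \frac{\partial E}{\partial \bm{w}} =
        \frac{2}{KM} \left( \bm{w} \bm{C} - \bm{F} \, \ol{\bm{r}}^\tr \right) ,
\]
and setting it to zero yields $\bm{w} \bm{C} = \bm{F} \, \ol{\bm{r}}^\tr$. Multiplying on the right by $\bm{C}^{-1}$ (or by the pseudo-inverse when $\bm{C}$ is singular) recovers Equation (\ref{wopt}).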

226:

227: The general outline of the computer experiments proceeds in five

228: steps as follows. First, the matrix $\ol{\bm r}$ with the mean input

229: responses is generated together with the desired output responses

230: $\bm{F}$. These two quantities define the input-output transformation

231: that the network is supposed to implement.  Second, response noise is

232: added to the mean input rates, such that

233: \b

234:     r_{ij} = \ol{r}_{ij} (1 + \eta_{ij}).

235:     \label{inputnoise1}

236: \e

237: The random variables $\eta_{ij}$ are independently drawn from a

238: distribution with zero mean and variance $\sigma_r^2$,

239: \ba

240:     \left< \eta_{ij} \right > & = & 0 \nonumber \\

241:     \left< \eta^2_{ij} \right> & = & \sigma_r^2 ,

242:     \label{inputnoise1a}

243: \ea

244: where the brackets again denote an average over trials. We refer to

245: this as multiplicative noise.  Third, the optimal connections are

246: found using Equation (\ref{wopt}). Note that these connections take

247: into account the response noise through its effect on the correlation

248: matrix $\bm{C}$. Fourth, the connections are corrupted by

249: multiplicative synaptic noise with variance $\sigma_W^2$, that is

250: \b

251:     W_{ij} = \ol{W}_{ij} (1 + \epsilon_{ij}),

252:     \label{weightnoisegeneral}

253: \e

254: where

255: \ba

256:     \left< \epsilon_{ij} \right> & = & 0 \nonumber \\

257:     \left< \epsilon^2_{ij} \right> & = & \sigma_W^2 .

258: \ea

259: Finally, the network's performance is evaluated. For this, we measure

260: the network error $E_W$, which is the square error obtained with the

261: optimal but corrupted weights $\bm{W}$, averaged over both types of

262: noise,

263: \b

264:     E_W = \frac{1}{KM}

265:           \left< \mbox{Tr} \left[

266:              (\bm{W} \bm{r} - \bm{F}) (\bm{W} \bm{r} - \bm{F})^\tr

267:           \right] \right> .

268:     \label{errornet}

269: \e

270: Thus, the brackets in this case indicate an average over multiple

271: trials and multiple networks, i.e., multiple corruptions of the

272: optimal weights $\ol{\bm{W}}$.
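The five steps above can be sketched numerically as follows. This is a minimal illustration rather than the authors' simulation code: the function name \texttt{simulate\_error}, its parameters, the default sample sizes and the choice of Gaussian noise are all assumptions made here. The correlation matrix is computed analytically for independent multiplicative noise instead of being estimated from samples.

```python
import numpy as np

def simulate_error(r_mean, F, sigma_r, sigma_w,
                   n_networks=200, n_trials=100, seed=0):
    """Monte Carlo estimate of E_W (hypothetical helper, Gaussian noise assumed)."""
    rng = np.random.default_rng(seed)
    # Step 1: r_mean (N x M) and F (K x M) define the input-output map.
    # Step 2: noisy responses are r = r_mean * (1 + eta), eta ~ N(0, sigma_r^2).
    # Step 3: optimal weights W = F r_mean^T C^{-1}, with
    #         C = r_mean r_mean^T + sigma_r^2 * diag(sum_j r_mean[i, j]^2).
    C = r_mean @ r_mean.T + sigma_r**2 * np.diag((r_mean**2).sum(axis=1))
    W_opt = F @ r_mean.T @ np.linalg.pinv(C)
    total = 0.0
    for _ in range(n_networks):
        # Step 4: corrupt the weights with multiplicative synaptic noise.
        W = W_opt * (1 + sigma_w * rng.standard_normal(W_opt.shape))
        # Step 5: mean squared error over trials, outputs and stimuli.
        eta = sigma_r * rng.standard_normal((n_trials,) + r_mean.shape)
        r = r_mean * (1 + eta)          # shape (n_trials, N, M)
        R = W @ r                       # shape (n_trials, K, M)
        total += np.mean((R - F) ** 2)
    return total / n_networks

# Example inputs: two input neurons, two stimuli, one output neuron.
r_mean = np.array([[1.0, 0.8], [0.8, 1.0]])
F = np.array([[1.0, 0.0]])
error_without_response_noise = simulate_error(r_mean, F, 0.0, 0.25)
error_with_response_noise = simulate_error(r_mean, F, 0.3, 0.25)
```

Comparing the two estimates at the end for a fixed $\sigma_W$ is one way to look for the interaction described in the next paragraph.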

273:

274: The main result we report here is an interaction between the two types

275: of noise: in all the network architectures that we have explored, for

276: a fixed amount of synaptic noise $\sigma_W$, the best performance is

277: typically found when the response noise has a certain nonzero

278: variance. So, given that there is synaptic noise in the network, it is

279: better to have some response noise rather than to have none.

280:

281: Before addressing the first example, we should highlight some features

282: of the chosen noise models. Regarding response noise, Equations

283: (\ref{inputnoise1}, \ref{inputnoise1a}), other models were tested in

284: which the fluctuations were additive rather than multiplicative. Also,

285: Gaussian, uniform and exponential distributions were tested.  The

286: results for all combinations were qualitatively the same, so the shape

287: of the response noise distribution does not seem to play an important

288: role; what counts is mainly the variance.  On the other hand, the

289: benefit of response noise is observed only when the synaptic noise is

290: multiplicative; it disappears with additive synaptic noise.  However,

291: we do test several variants of the multiplicative model, including one

292: in which the random variables $\epsilon_{ij}$ are drawn from a

293: Gaussian distribution and another in which they are binary, 0 or -1.

294: The latter case represents a situation in which connections are

295: eliminated randomly with a fixed probability.

296:

297:

298: \section{Noise Interactions in a Classification Task}

299:

300: First we consider a task in which the two-layer, fully connected

301: network is used to approximate a binary function. The task is to

302: classify $M$ stimuli on the basis of the $N$ input firing rates evoked

303: by each stimulus. Only one output neuron is needed, so $K\eq1$.  The

304: desired response of this output neuron is the classification function

305: \b

306:    F_j = \left\{\begin{array}{l}

307:             1 \ \: \: \mbox{if } j \leq M/2 \\

308:             0 \ \: \: \mbox{else} ,

309:          \end{array}\right.

310:    \label{desiredoutput}

311: \e

312: where $j$ goes from 1 to $M$. Therefore, the job of the output unit is

313: to produce a 1 for the first $M/2$ input stimuli and a 0 for the rest.

314:

315:

316: \subsection{A Minimal Network}

317:

318: In order to obtain an analytical description of the noise

319: interactions, we first consider the simplest possible network that

320: exhibits the effect, which consists of two input neurons and two

321: stimuli. Thus, $N\eq M\eq 2$ and the desired output is

322: $\bm{F} = \left(1, 0\right)$. Note that, with a single output neuron,

323: the matrices $\bm{W}$ and $\bm{F}$ become row vectors. Now we proceed

324: according to the five steps outlined in the preceding section --- the

325: goal is to show analytically that, in the presence of synaptic noise,

326: performance is typically better for a nonzero amount of response

327: noise.

328:

329: The matrix of mean input firing rates is set to

330: \b

331:    \ol{\bm{r}} = \left( \begin{array}{cc}

332:                    1 & r_0 \\

333:                    r_0 & 1 \\

334:                  \end{array} \right) ,

335:     \label{inputmeanmatrix}

336: \e

337: where $r_0$ is a parameter that controls the difficulty of the

338: classification. When it is close to 1, the pairs of responses evoked

339: by the two stimuli are very similar and large errors in the output are

340: expected; when it is close to 0, the input responses are most

341: different and the classification should be more accurate. After

342: combining the mean responses with multiplicative noise, as prescribed

343: by Equation (\ref{inputnoise1}), the input responses in a given trial

344: become

345: \b

346:    \bm{r} = \left( \begin{array}{cc}

347:               1 + \eta_{11}     &  r_0 (1+\eta_{12}) \\

348:               r_0 (1+\eta_{21}) &  1 + \eta_{22} \\

349:             \end{array} \right) .

350:    \label{multinputmeanmatrix}

351: \e

352: Assuming that the fluctuations are independent across neurons, the

353: correlation matrix is, therefore,

354: \b

355:    \bm{C} = \left< \bm{r} \bm{r}^\tr \right>

356:           = \left(\begin{array}{cc}

357:               (1+r_0^2)(1+\sigma_r^2)  &  2 r_0  \\

358:               2 r_0  &  (1+r_0^2)(1+\sigma_r^2)  \\

359:             \end{array} \right) .

360:    \label{multicorrelationmatrix}

361: \e

362: Next, after calculating the inverse of $\bm{C}$, Equation (\ref{wopt})

363: is used to find the optimal weights, which are

364: \ba

365:    \ol{W}_1 & = & \frac{\sigma_r^2 (1+r_0^2) + (1-r_0^2)}

366:                        {(1+\sigma_r^2)^2 \, (1+r_0^2)^2 - 4 r_0^2} \nonumber \\

367:    \ol{W}_2 & = & \frac{\sigma_r^2 (1+r_0^2) - (1-r_0^2)}

368:                        {(1+\sigma_r^2)^2 \, (1+r_0^2)^2 - 4 r_0^2} \: r_0 \, .

369:    \label{multiwopt1}

370: \ea

371: Notice that these connections take into account the response

372: variability through their dependence on $\sigma_r$. The next step is

373: to corrupt these synaptic weights as prescribed by Equation

374: (\ref{weightnoisegeneral}), and substitute the resulting expressions

375: into Equation (\ref{errornet}). After making all the substitutions,

376: calculating the averages and simplifying, we obtain the average error,

377: %\b

378: %   E_W = (1 + \sigma_W^2)

379: %            (\ol{W}^2_1 + \ol{W}^2_2) (1 + \sigma_r^2) (1 + r_0^2)

380: %         + 4 r_0 \ol{W}_1 \ol{W}_2

381: %         - 2 r_0 \ol{W}_2 - 2 \ol{W}_1 + 1 .

382: %\e

383: \b

384:    E_W = \frac{1}{2} \left(

385:             \sigma_W^2 (\ol{W}^2_1 + \ol{W}^2_2)

386:                        (1 + \sigma_r^2) (1 + r_0^2)

387:                           - \ol{W}_1 - r_0 \ol{W}_2 + 1

388:          \right) .

389:    \label{multierror2}

390: \e

391: This is the average square difference between the desired and actual

392: responses of the output neuron given the two types of noise. It is a

393: function only of three parameters, $\sigma_r$, $\sigma_W$ and $r_0$,

394: because the optimal weights themselves depend on $\sigma_r$ and $r_0$.
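One way to fill in the algebra leading to Equation (\ref{multierror2}) is sketched here; it assumes that the synaptic fluctuations are independent of each other and of the response noise, and it writes $W_i = \ol{W}_i (1 + \epsilon_i)$ for the single output neuron. Averaging Equation (\ref{errornet}) over the response noise first, with $K\eq 1$ and $M\eq 2$,
\[
   E_W = \frac{1}{2} \left<
            \bm{W} \bm{C} \bm{W}^\tr
            - 2 \, \bm{W} \, \ol{\bm{r}} \, \bm{F}^\tr
            + \bm{F} \bm{F}^\tr
         \right> ,
\]
where the remaining average is over the weights. Since $\left< W_i W_j \right> = \ol{W}_i \ol{W}_j \, (1 + \sigma_W^2 \delta_{ij})$,
\[
   \left< \bm{W} \bm{C} \bm{W}^\tr \right> =
      \ol{\bm{W}} \bm{C} \, \ol{\bm{W}}^\tr +
      \sigma_W^2 \sum_i \ol{W}_i^2 \, C_{ii} \, ,
   \hspace*{0.5cm}
   C_{11} = C_{22} = (1 + \sigma_r^2)(1 + r_0^2) .
\]
The optimal weights satisfy $\ol{\bm{W}} \bm{C} = \bm{F} \, \ol{\bm{r}}^\tr$, so
$\ol{\bm{W}} \bm{C} \, \ol{\bm{W}}^\tr = \ol{\bm{W}} \, \ol{\bm{r}} \, \bm{F}^\tr = \ol{W}_1 + r_0 \ol{W}_2$,
and $\bm{F} \bm{F}^\tr = 1$. Collecting terms gives
$E_W = \frac{1}{2} \left( \sigma_W^2 (\ol{W}_1^2 + \ol{W}_2^2)(1 + \sigma_r^2)(1 + r_0^2) - \ol{W}_1 - r_0 \ol{W}_2 + 1 \right)$,
in agreement with Equation (\ref{multierror2}).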

395:

396: The interaction between noise terms for this simple $N\eq M\eq 2$ case

397: is illustrated in Fig.~1A, which plots the error as a function of

398: $\sigma_r$ with and without synaptic variability.  Here, dashed and

399: solid lines represent the theoretical results given by Equations

400: (\ref{multiwopt1}, \ref{multierror2}) and symbols correspond to

401: simulation results averaged over $1000$ networks and $100$ trials per

402: network. Without synaptic noise (dashed line), the error increases

403: monotonically with $\sigma_r$, as one would normally expect when

404: adding response variability. In contrast, when $\sigma_W\eq 0.15$, 0.2

405: or 0.25 (solid lines), the error initially decreases and then starts

406: increasing again, slowly approaching the curve obtained with response

407: noise alone.

408:

409: \begin{figure*}[tb]

410: \centerline{\epsfig{figure=fig1.eps,width=5.0in}}

411: \caption{\label{WeightSpace}

412: Noise interaction for a simple network of two input neurons and one

413: output neuron ($K\eq 1$, $N\eq M\eq 2$). Both input responses and

414: synaptic weights were corrupted by multiplicative Gaussian noise. For

415: all curves, lines are theoretical results and symbols are

416: simulation results averaged over $1000$ networks and $100$ trials per

417: network. In all cases, $r_0\eq 0.8$.

418: (A) Average square difference between observed and desired output

419: responses, $E_W$, as a function of the standard deviation (SD) of the

420: response noise, $\sigma_r$.  Squares and dashed line correspond to the

421: error without synaptic noise ($\sigma_{W}\eq 0$); circles and

422: continuous lines correspond to the error with synaptic noise

423: ($\sigma_{W}\eq 0.15, 0.20, 0.25$).

424: (B) Dependence of the (uncorrupted) optimal weights $\ol{\bm{W}}$ on

425: $\sigma_r$. }

426: \end{figure*}

427:

428: Figure 1B shows how the optimal weights depend on $\sigma_r$. The

429: solid lines were obtained from Equations (\ref{multiwopt1}) above.

430: The curves show that the effect of response noise is to decrease the

431: absolute values of the optimal synaptic weights.  Intuitively, that is

432: why response variability is advantageous; smaller synaptic weights

433: also mean smaller synaptic fluctuations, because their standard

434: deviation (SD) is proportional to the mean values.  So, there is a

435: tradeoff: the intrinsic effect of increasing $\sigma_r$ is to increase

436: the error, but with synaptic noise present, $\sigma_r$ also decreases

437: the magnitude of the weights, which lowers the impact of the synaptic

438: fluctuations. That the impact of synaptic noise grows directly with

439: the magnitude of the weights is also apparent from the first term in

440: Equation (\ref{multierror2}).

441:

442: The magnitude of the noise interaction can be quantified by the

443: ratio \Emin$/E_0$, where the numerator is the minimal value of the

444: error curve and the denominator is the error obtained when only

445: synaptic noise is present, that is, when $\sigma_r\eq 0$.  The

446: minimum error \Emin\ occurs at the optimal value of $\sigma_r$,

447: denoted as \sigmin. The ratio \Emin$/E_0$ is equal to 1 if response

448: variability provides no advantage and approaches 0 as \sigmin\

449: cancels more of the error due to synaptic noise.  For the lowest

450: solid curve in Fig.~1A the ratio is approximately 0.8, so response

451: variability cancels about 20\% of the square error generated by

452: synaptic fluctuations. Note, however, that in these examples the

453: error is below $E_0$ for a large range of values of $\sigma_r$, not

454: only near \sigmin, so response noise may be beneficial even if it is

455: not precisely matched to the amount of synaptic noise.
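The ratio \Emin$/E_0$ can be evaluated directly from Equations (\ref{multiwopt1}) and (\ref{multierror2}). The sketch below does so by a simple grid search over $\sigma_r^2$; the function names, the grid range and the grid resolution are assumptions made here for illustration, and the code requires $0 \leq r_0 < 1$ and $\sigma_W > 0$ so that $E_0$ is finite and nonzero.

```python
import numpy as np

def optimal_weights(s, r0):
    """Optimal weights of the minimal network; s is sigma_r squared."""
    A = (1 + r0**2) * (1 + s)
    det = A**2 - 4 * r0**2
    w1 = (s * (1 + r0**2) + (1 - r0**2)) / det
    w2 = r0 * (s * (1 + r0**2) - (1 - r0**2)) / det
    return w1, w2

def error_EW(s, sigma_w, r0):
    """Average error E_W as a function of s = sigma_r squared."""
    w1, w2 = optimal_weights(s, r0)
    synaptic = sigma_w**2 * (w1**2 + w2**2) * (1 + s) * (1 + r0**2)
    return 0.5 * (synaptic - w1 - r0 * w2 + 1)

def noise_benefit(sigma_w, r0, s_grid=None):
    """Return (E_min / E_0, sigma_min) from a grid search (illustrative only)."""
    if s_grid is None:
        s_grid = np.linspace(0.0, 4.0, 40001)  # assumed search range for s
    errors = error_EW(s_grid, sigma_w, r0)
    i_min = int(np.argmin(errors))
    return errors[i_min] / errors[0], np.sqrt(s_grid[i_min])

# Example: the smallest synaptic noise level used in Fig. 1A.
ratio, sigma_min = noise_benefit(0.15, 0.8)
```

A ratio of exactly 1 together with a returned optimum of 0 indicates that, on the grid, no amount of response noise lowers the error.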

456:

457: \begin{figure*}[tb!]

458: \centerline{\epsfig{figure=fig2.eps,width=5.0in}}

459: \caption{\label{MultiNoise1}

460: Optimal amount of response noise in the minimal classification

461: network.  Same network with two sensory neurons and one output neuron

462: as in Fig.~1.  Lines and symbols indicate theoretical and simulation

463: results, respectively, averaged over $1000$ networks and $100$ trials

464: per network.

465: (A) Strength of the noise interaction quantified by \Emin\ (dashed

466: line) and \Emin$/E_0$ (solid line), as a function of $\sigma_{W}$, which

467: determines the synaptic variability.  Here and in B, $r_0\eq 0.8$.

468: (B) Optimal amount of response variability, \sigmin, as a function of

469: $\sigma_{W}$, for the same data in A\@.

470: (C) Strength of the noise interaction as a function of $r_0$, which

471: parameterizes the discriminability of the mean input responses evoked

472: by the two stimuli. Here and in D, $\sigma_W\eq 1$.

473: (D) \sigmin, as a function of $r_0$ for the same data in C\@.  }

474: \end{figure*}

475:

476: Figure 2 further characterizes the strength of the interaction between

477: the two types of noise.  Figures 2A, B show how the error and the

478: optimal amount of response variability vary as functions of

479: $\sigma_W$. These graphs indicate that the fraction of the error that

480: $\sigma_r$ is able to compensate for, as well as the optimal amount of

481: response noise, increases with the SD of the synaptic noise.  The

482: minimum error, \Emin, grows steadily with $\sigma_W$ --- clearly,

483: $\sigma_r$ cannot completely compensate for synaptic corruption.

484: Also, $\sigma_W$ has to be bigger than a critical value for the noise

485: interaction to be observed ($\sigma_W\!>\!0.1$, approximately).

486: However, except when synaptic noise is very small, the optimal

487: strategy is to add some response noise to the network.

488:

489: As in the previous figure, symbols and lines in Fig.~2 correspond to

490: simulation and theoretical results, respectively. To obtain the

491: latter, the key is to calculate \sigmin. This is done by, first,

492: substituting the optimal synaptic weights of Equation

493: (\ref{multiwopt1}) into the expression for the average error, Equation

494: (\ref{multierror2}), and second, calculating the derivative of the

495: error with respect to $\sigma_r^2$ and equating it to zero. The

496: resulting expression gives $\sigma^2_{\mathrm{min}}$ as a function of

497: the only two remaining parameters, $\sigma_W$ and $r_0$.  The

498: dependence, however, is highly nonlinear, so in general the solution

499: is implicit:

500: \ba

501:      \lefteqn{\sigma_r^8 \, (1 - \sigma_W^2) +

502:            2 \sigma_r^6 \, (1 + a^2(1 - 2 \sigma_W^2)) +

503:            6 \sigma_r^4 a^2 \, (1 - \sigma_W^2) + \mbox{} }

504:            \hspace*{1.2cm} \nonumber \\

505:      &  &  2 \sigma_r^2 a^2 \, (1 + a^2 + 2 a^2 \sigma_W^2 - 4 \sigma_W^2) +

506:            a^4 (1 + 3 \sigma_W^2) -

507:            4 a^2 \sigma_W^2 \:\: = \:\: 0 \, ,

508:      \label{implicit}

509: \ea

510: where

511: \b

512:    a \equiv \frac{1 - r_0^2}{1 + r_0^2} \, .

513: \e

514: The positive value of $\sigma_r$ that satisfies Equation (\ref{implicit}) is

515: \sigmin. For Figs.~2A, B, the zero of the polynomial was found

516: numerically for each combination of $r_0$ and $\sigma_W$.
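A minimal sketch of such a numerical solution is given below. It treats Equation (\ref{implicit}) as a quartic in $x = \sigma_r^2$ and uses \texttt{numpy.roots}; the function name \texttt{sigma\_min\_squared} and the tolerance on the imaginary part are assumptions made here, and the sketch is intended for $0 < \sigma_W \leq 1$, where the signs of the coefficients allow at most one positive root.

```python
import numpy as np

def sigma_min_squared(sigma_w, r0):
    """Positive root x = sigma_r^2 of the implicit polynomial, or 0.0 if none."""
    a2 = ((1 - r0**2) / (1 + r0**2)) ** 2   # a squared
    v = sigma_w**2
    coeffs = [
        1 - v,                                    # x^4 (i.e. sigma_r^8)
        2 * (1 + a2 * (1 - 2 * v)),               # x^3
        6 * a2 * (1 - v),                         # x^2
        2 * a2 * (1 + a2 + 2 * a2 * v - 4 * v),   # x^1
        a2**2 * (1 + 3 * v) - 4 * a2 * v,         # x^0
    ]
    roots = np.roots(coeffs)  # leading zero coefficients are stripped by numpy
    real_pos = [z.real for z in roots if abs(z.imag) < 1e-9 and z.real > 0]
    # No positive root means the error has no interior minimum in sigma_r.
    return min(real_pos) if real_pos else 0.0

# Example: optimal response-noise SD for r0 = 0.8 and several sigma_W values.
sigma_min_values = {sw: np.sqrt(sigma_min_squared(sw, 0.8))
                    for sw in (0.05, 0.15, 0.25, 1.0)}
```

When the constant term of the polynomial is positive, which for $r_0\eq 0.8$ happens below $\sigma_W \approx 0.11$, the sketch returns zero, consistent with the critical value of $\sigma_W$ mentioned earlier.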

517:

518: Figures 2C, D show how \Emin, \Emin/$E_0$ and \sigmin\ depend on the

519: separation between evoked input responses, as parameterized by $r_0$.

520: For these two plots, we chose a special case in which \sigmin\ can be

521: obtained analytically from Equation (\ref{implicit}): $\sigma_W\eq 1$.

522: In this particular case the dependence of \sigmin\ on $r_0$ has a

523: closed form,

524: \b

525:    \sigma_{\mathrm{min}}^2 = \frac{(1-r_0^2)^{2/3}}{1+r_0^2}

526:                \left( (1+r_0)^{2/3} + (1-r_0)^{2/3} \right) .

527:    \label{sigmamin}

528: \e

529: This function is shown in Fig.~2D.  In general, the numerical

530: simulations are in good agreement with the theory, except that the

531: scatter in Fig.~2D tends to increase as $r_0$ approaches 0. This is

532: due to a key feature of the noise interaction, which is that it

533: depends on the overlap between input responses across stimuli. This

534: can be seen as follows.

535:

536: First, notice that in Fig.~2C the relative error approaches 1 as $r_0$

537: gets closer to 0. Thus, the noise interaction becomes weaker when

538: there is less overlap between input responses, which is precisely what

539: $r_0$ represents in Equation (\ref{inputmeanmatrix}). If there is no

540: overlap at all, the benefit of response noise vanishes. This fact

541: explains why more than one neuron is needed to observe the noise

542: interaction in the first place.  This observation can be demonstrated

543: analytically by setting $r_0\eq 0$ in Equations (\ref{multiwopt1}) and

544: (\ref{multierror2}), in which case the average square error becomes

545: \b

546:    E_W(r_0\eq 0) = \frac{1}{2} \left(

547:                      \frac{\sigma_W^2 - 1}{1 + \sigma_r^2} + 1

548:                    \right) .

549:    \label{r0error}

550: \e

551: This result has interesting implications.  If $\sigma_W^2\eq 1$,

552: response noise makes no difference, so there is no optimal value.  If

553: $\sigma_W^2\!<\!1$, the error increases monotonically with response

554: noise, so the optimal value is 0.  And if $\sigma_W^2\!>\!1$, the

555: optimal strategy is to add as much noise as possible! In this case,

556: the variance of the output neuron is so high that there is no hope of

557: finding a reasonable solution; the best thing to do is set the mean

558: weights to zero, disconnecting the output unit. Thus, without overlap,

559: either the synaptic noise is so high that the network is effectively

560: useless, or, if $\sigma_W$ is tolerable, response noise does not

561: improve performance. At $r_0\eq 0$, the numerical solutions oscillate

562: between these two extremes, producing an average error of 0.5

563: (leftmost point in Fig.~2C). In general, however, with non-zero

564: overlap there is a true optimal amount of response noise, and the more

565: overlap there is, the larger its benefit, as shown in Fig.~2C\@.
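The three cases discussed above can be checked with a short calculation that uses only Equations (\ref{multiwopt1}) and (\ref{multierror2}). With $r_0\eq 0$ the optimal weights reduce to
\[
   \ol{W}_1 = \frac{1 + \sigma_r^2}{(1 + \sigma_r^2)^2}
            = \frac{1}{1 + \sigma_r^2} \, ,
   \hspace*{0.5cm}
   \ol{W}_2 = 0 ,
\]
so that
\[
   E_W = \frac{1}{2} \left(
            \frac{\sigma_W^2}{1 + \sigma_r^2}
            - \frac{1}{1 + \sigma_r^2} + 1
         \right) ,
\]
which is Equation (\ref{r0error}). Its derivative with respect to $\sigma_r^2$ is
\[
   \frac{\rd E_W}{\rd \sigma_r^2} =
      \frac{1 - \sigma_W^2}{2 \, (1 + \sigma_r^2)^2} \, ,
\]
whose sign is set entirely by $1 - \sigma_W^2$: it is zero for $\sigma_W^2\eq 1$, positive for $\sigma_W^2\!<\!1$ and negative for $\sigma_W^2\!>\!1$.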

566:

567: The simulation data points in Fig.~2 were obtained using fluctuations

568: $\epsilon$ and $\eta$ in Equations (\ref{weightnoisegeneral}) and

569: (\ref{multinputmeanmatrix}), respectively, sampled from Gaussian

570: distributions. The results, however, were virtually identical when the

571: distribution functions were either uniform or exponential.  Thus, as

572: noted earlier, the exact shapes of the noise distributions do not

573: restrict the observed effect.

574:

575:

576: \subsection{Regularization by Noise}

577: \label{RegSect}

578:

579: Above, we mentioned that response noise tends to decrease the absolute

580: value of the optimal synaptic weights. Why is this? The reason is that

581: minimization of the mean square error in the presence of response

582: noise is mathematically equivalent to minimization of the same error

583: without response noise but with an imposed constraint forcing the

584: optimal weights to be small. This can be seen as follows.

585:

586: Consider Equation (\ref{wopt}), which specifies the optimal weights in

587: the two-layer network.  Response noise enters into the expression

588: through the correlation matrix. By separating the input responses into

589: mean plus noise, we have

590: \ba

591:    \bm{C} & = & \left< (\ol{\bm{r}} + \bm{\eta})

592:                        (\ol{\bm{r}} + \bm{\eta})^{\tr} \right>

593:                 \nonumber \\

594:           & = & \ol{\bm{r}} \, \ol{\bm{r}}^{\tr} +

595:                 \left< \bm{\eta} \bm{\eta}^{\tr} \right>

596:                 \nonumber \\

597:           & = & \ol{\bm{r}} \, \ol{\bm{r}}^{\tr} +

598:                 \bm{D}_{\!\sigma} \, ,

599:    \label{newcorr}

600: \ea

601: where we have assumed that the noise is additive and uncorrelated

602: across neurons (additivity is considered for simplicity but is not

603: necessary). This results in the diagonal matrix $\bm{D}_{\!\sigma}$

604: containing the variances of individual units, such that element $j$

605: along the diagonal is the total variance, summed over all stimuli, of

606: input neuron $j$. Thus, uncorrelated response noise adds a diagonal

607: matrix to the correlation between average responses.  In that case,

608: Equation (\ref{wopt}) can be rewritten as

609: \b

610:     \ol{\bm{W}} = \bm{F} \, \ol{\bm{r}}^\tr

611:                   \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}

612:                          + \bm{D}_{\!\sigma}

613:                   \right)^{-1} .

614:     \label{wopt1}

615: \e

616:

617: Now consider the mean square error without any noise but with an

618: additional term that penalizes large weights. To restrict, for

619: instance, the total synaptic weight provided by each input neuron, add

620: the penalty term

621: \b

622:    \frac{1}{KM} \sum_{i, j} \lambda_j \, w_{ij}^2

623:    \label{wcost}

624: \e

625: to the original error expression, Equation (\ref{errormatrix}). Here,

626: $\lambda_j$ determines how much input neuron $j$ is taxed for its

627: total synaptic weight. Rewriting this as a trace, the total error to

628: be minimized in this case becomes

629: \b

630:     E = \frac{1}{KM} \left(

631:           \left< \mbox{Tr}

632:               \left[(\bm{w} \ol{\bm{r}} - \bm{F})

633:                     (\bm{w} \ol{\bm{r}} - \bm{F})^\tr

634:               \right]

635:           \right> +

636:           \mbox{Tr}\left(\bm{w} \bm{D}_{\!\lambda} \bm{w}^{\tr} \right)

637:         \right) ,

638: \e

639: where $\bm{D}_{\!\lambda}$ is a diagonal matrix that contains the

640: penalty coefficients $\lambda_j$ along the diagonal. The synaptic

641: weights that minimize this error function are given by

642: \b

643:    \bm{F} \, \ol{\bm{r}}^\tr

644:       \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}

645:              + \bm{D}_{\!\lambda}

646:       \right)^{-1} \! .

647:    \label{wopt2}

648: \e

649: But this solution has exactly the same form as Equation (\ref{wopt1}),

650: which minimizes the error in the presence of response noise alone,

651: without any other constraints. Therefore, adding response noise is

652: equivalent to imposing a constraint on the magnitude of the synaptic

653: weights, with more noise corresponding to smaller weights. The penalty

654: term in Equation (\ref{wcost}) can also be interpreted as a

655: regularization term, which refers to a common type of constraint used

656: to force the solution of an optimization problem to vary

657: smoothly~\citep{Hint89,Hayk99}. Therefore, as has been pointed out

658: previously~\citep{Bish95}, the effect of response fluctuations can be

659: described as regularization by noise.
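
The step from the penalized error to Equation (\ref{wopt2}) can be sketched as follows. This is only an outline. It assumes the convention in which $\bm{w}$ is a $K\times N$ matrix whose second index labels the input neurons, so that the $N\times N$ diagonal matrix $\bm{D}_{\!\lambda}$ acts on the input index and the penalty reads $\mbox{Tr}(\bm{w} \bm{D}_{\!\lambda} \bm{w}^{\tr})$. Without noise the angle brackets can be dropped, and setting the derivative with respect to $\bm{w}$ to zero gives
\begin{eqnarray*}
   \frac{\partial E}{\partial \bm{w}}
      & = & \frac{2}{KM} \left[
               (\bm{w} \ol{\bm{r}} - \bm{F}) \, \ol{\bm{r}}^{\tr}
               + \bm{w} \bm{D}_{\!\lambda}
            \right] = 0 , \\
   \bm{w} \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}
                 + \bm{D}_{\!\lambda} \right)
      & = & \bm{F} \, \ol{\bm{r}}^{\tr} .
\end{eqnarray*}
Multiplying on the right by the inverse of the matrix in parentheses yields Equation (\ref{wopt2}); the inverse exists whenever all penalty coefficients are positive, because a positive diagonal matrix is then added to a positive semidefinite one. For a single input neuron and a single output unit ($N\eq K\eq 1$) the expression reduces to
\[
   w = \frac{\sum_m F_m \, \ol{r}_m}{\sum_m \ol{r}_m^2 + \lambda} \, ,
\]
whose magnitude shrinks monotonically as $\lambda$ (or, in the noisy case, the summed response variance) grows.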

660:

661: In our model, we assumed that the fluctuations in synaptic connections

662: are proportional to their size. What happens, then, is that response

663: noise forces the optimal weights to be small, and this significantly

664: decreases the part of the error that depends on $\sigma_W$.  In this

665: way, smaller synaptic weights --- and therefore a nonzero $\sigma_r$

666: --- typically lead to smaller output errors.

667:

668: Another way to look at the relationship between the two types of noise is

669: to calculate the optimal mean synaptic weights taking the synaptic

670: variability directly into account. For simplicity, suppose that there

671: is no response noise. Substitute Equation (\ref{weightnoisegeneral})

672: directly into Equation (\ref{errormatrix}) and minimize with respect

673: to $\ol{\bm{W}}$, now averaging over the synaptic fluctuations. With

674: multiplicative noise the result is again an expression similar to

675: Equations (\ref{wopt1}) and (\ref{wopt2}), where a correction

676: proportional to the synaptic variance is added to the diagonal of the

677: correlation matrix.  In contrast, with additive synaptic noise the

678: resulting optimal weights are exactly the same as without any

679: variability, because this type of noise cannot be compensated for.

680: Therefore, the recipe for counteracting response noise is equivalent

681: to the recipe for counteracting multiplicative synaptic noise. An

682: argument outlining why this is generally true is presented in the

683: Discussion, Section~\ref{disc1}.
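
A sketch of this calculation, under stated assumptions, is as follows. Suppose, as an illustrative form of Equation (\ref{weightnoisegeneral}), that each weight is corrupted as $W_{ij} \eq \ol{W}_{ij}(1 + \epsilon_{ij})$, with independent, zero-mean fluctuations $\epsilon_{ij}$ of variance $\sigma_W^2$, and omit overall normalization constants. Averaging the squared error over $\epsilon$, the cross terms vanish and
\[
   \left< \sum_{i, m} \Big( \sum_j W_{ij} \, \ol{r}_{jm} - F_{im} \Big)^{2}
   \right>_{\!\epsilon}
   = \sum_{i, m} \Big( \sum_j \ol{W}_{ij} \, \ol{r}_{jm} - F_{im} \Big)^{2}
     + \sigma_W^2 \sum_{i, j} \ol{W}_{ij}^2 \sum_m \ol{r}_{jm}^2 .
\]
The second term is a quadratic penalty on the weights in which input neuron $j$ is taxed in proportion to $\sigma_W^2 \sum_m \ol{r}_{jm}^2$, so the minimum is at
\[
   \ol{\bm{W}} = \bm{F} \, \ol{\bm{r}}^\tr
      \left( \ol{\bm{r}} \, \ol{\bm{r}}^{\tr}
             + \sigma_W^2 \, \mbox{diag}(\ol{\bm{r}} \, \ol{\bm{r}}^{\tr})
      \right)^{-1} ,
\]
where $\mbox{diag}(\cdot)$ keeps only the diagonal elements of its argument. If instead the noise is additive, $W_{ij} \eq \ol{W}_{ij} + \epsilon_{ij}$, the extra term becomes $K \sigma_W^2 \sum_{j, m} \ol{r}_{jm}^2$, which does not depend on $\ol{\bm{W}}$ and therefore leaves the optimal weights unchanged.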

684:

685:

686: \subsection{Classification in Larger Networks}

687:

688: When the simple classification task is extended to larger numbers of

689: first-layer neurons ($N\!>2$) and more input stimuli to classify

690: ($M\!>2$), an important question can be studied: how does the

691: interaction between synaptic and response noise depend on the

692: dimensionality of the problem, that is, on $N$ and $M$?  To address

693: this issue we did the following. Each entry in the $N\times M$ matrix

694: $\ol{\bm{r}}$ of mean responses was taken from a uniform distribution

695: between 0 and 1.  The desired output still consisted of a single

696: neuron's response given by Equation (\ref{desiredoutput}), as before.

697: So, each one of the $M$ input stimuli evoked a set of $N$ neuronal

698: responses, each set drawn from the same distribution, and the output

699: neuron had to divide the $M$ evoked firing rate patterns into two

700: categories. The optimal amount of response noise was found, and the

701: process was repeated for different combinations of $N$ and $M$\@.

702:

703: \begin{figure*}[tb!]

704: \centerline{\epsfig{figure=fig3.eps,width=5.0in}}

705: \caption{\label{LargeNets}

706: Interaction between synaptic noise and response noise during the

707: classification of $M$ input stimuli. For each stimulus, the mean

708: responses of $N$ input neurons were randomly selected from a uniform

709: distribution between 0 and 1. The output unit of the network had to

710: classify the $M$ response patterns by producing either a 1 or a 0. The

711: synaptic noise SD was $\sigma_{W}=0.5$. Results (circles) are averages

712: over $1000$ networks and $100$ trials per network.  All data are from

713: computer simulations.

714: (A) Relative error, \Emin$/E_{0}$, as a function of the number of

715: input neurons, $N$\@.  The number of stimuli was kept constant at

716: $M\eq 10$.

717: (B) Optimal value of the response noise SD, \sigmin, as a function of

718: the number of input neurons, $N$\@. Same simulations as in A\@.

719: (C) Relative error as a function of the number of input stimuli,

720: $M$\@. The number of input neurons was kept constant at $N\eq 10$.

721: (D) Optimal value of the response noise SD as a function of $M$ for

722: the same simulations as in C\@. }

723: \end{figure*}

724:

725: The results from these simulations are shown in Fig.~3. All data

726: points were obtained with the same amount of synaptic variability,

727: $\sigma_W\eq 0.5$. Each point represents an average over 1000

728: networks for which the optimal connections were corrupted.  The amount

729: of response noise that minimized the error, averaged over those 1000

730: corruption patterns, was found numerically by calculating the average

731: error with the same mean responses and corruption patterns but

732: different $\sigma_r$. For each combination of $N$ and $M$, this

733: resulted in \sigmin, which is shown in panel B\@.  The actual average

734: error obtained with $\sigma_r\eq$ \sigmin\ divided by the error for

735: $\sigma_r\eq 0$ is shown in panel A, as in the previous figure.

736: Interestingly, the benefit conferred by response noise depends

737: strongly on the difference between $N$ and $M$\@. With $M\eq 10$ input

738: stimuli, the effect of response noise is maximized when $N\eq 10$

739: neurons are used to encode them (Fig.~3A); and vice versa, when there

740: are $N\eq 10$ neurons in the network, the maximum effect is seen when

741: they encode $M\eq 10$ stimuli (Fig.~3C).  Results with other numbers

742: (5, 20 and 40 stimuli or neurons) were the same: response noise always

743: had a maximum impact when $N\eq M$\@.

744:

745: This is not unreasonable. When there are many more neurons than

746: stimuli, a moderate amount of synaptic corruption causes only a small

747: error, because there is redundancy in the connectivity matrix. On the

748: other hand, when there are many more input stimuli than neurons, the

749: error is large anyway, because the $N$ neurons cannot possibly span

750: all the required dimensions, $M$\@. Thus, at both extremes, the impact

751: of synaptic noise is limited. In contrast, when $N\eq M$ there is no

752: redundancy but the output error can potentially be very small, so the

753: network is most sensitive to alterations in synaptic connectivity.

754: Thus, response noise makes a big difference when the number of

755: responses and the number of independent stimuli encoded are equal or

756: nearly so. In Figs.~3A, C, the relative error is not zero for

757: $N\eq M$, but it is quite small

758: (\Emin\ $\eq 0.23$, \Emin$/E_0 \eq 0.004$). This is primarily because

759: the error without any response noise, $E_0$, can be very large.

760: Interestingly, the optimal amount of response noise also seems to be

761: largest when $N\eq M$, as suggested by Figs.~3B, D\@.

762:

763: In contrast to previous examples, for all data points in Fig.~3 the

764: fluctuations in the synapses and in the firing rates, $\epsilon$ and

765: $\eta$, were drawn from uniform rather than Gaussian distributions.

766: As mentioned before, the variances of the underlying distributions

767: should matter but their shapes should not. Indeed, with the same

768: variances, results for Fig.~3 were virtually identical with Gaussian

769: or exponential distributions.

770:

771: A potential concern in this network is that, although the variability

772: of the output neuron depends on the interaction between the two types

773: of noise, perhaps the interaction is of little consequence with

774: respect to actual classification performance. The relevant measure for

775: this is the probability of correct classification, $p_c$. This

776: probability is obtained by comparing the distributions of output

777: responses to stimuli in one category versus the other, which is

778: typically done using standard methods from signal detection

779: theory~\citep{dayan-2001}. The algorithm underlying the calculation is

780: quite simple: in each trial, the stimulus is assumed to belong to

781: class 1 if the output firing rate is below a threshold, otherwise the

782: stimulus belongs to class 2. To obtain $p_c$, the results should be

783: averaged over trials and stimuli. Finally, note that an optimal

784: threshold should be used to obtain the highest possible $p_c$.  We

785: performed this analysis on the data in Fig.~3.  Indeed, $p_c$ also

786: depended non-monotonically on response variability.  For instance, for

787: $N\eq M\eq 10$ the values with and without response noise were

788: $p_c(\sigma_r\!= $\sigmin$)\eq 0.83$ and

789: $p_c(\sigma_r\eq 0)\eq 0.75$,

790: where chance performance corresponds to 0.5. Also, the maximum benefit

791: of response noise occurred for $N\eq M$ and decreased quickly as the

792: difference between $N$ and $M$ grew, as in Figs.~3A, C. However, the

793: amount of response noise that maximized $p_c$ was typically about one

794: third of the amount that minimized the mean square error. Thus, the

795: best classification probability for $N\eq M\eq 10$ was

796: $p_c(\sigma_r\eq 0.13)\eq 0.91$.

797: Maximizing $p_c$ is not equivalent to minimizing the mean square

798: error; the two quantities weight differently the bias and variance of

799: the output response \citep[see][]{Hayk99}. Nevertheless, response noise

800: can also counteract part of the decrease in $p_c$ due to synaptic

801: noise, so its beneficial impact on classification performance is real.
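
The threshold rule described above can be written compactly. As a sketch, assume that the two stimulus classes are equally likely and let $\theta$ denote the threshold (a symbol introduced here only for illustration). Then
\[
   p_c(\theta) = \frac{1}{2} \, P(R < \theta \mid \mbox{class 1})
               + \frac{1}{2} \, P(R \geq \theta \mid \mbox{class 2}) ,
\]
and the reported value corresponds to the maximum of $p_c(\theta)$ over $\theta$. If, in addition, the output responses in the two classes were Gaussian with means $\mu_1 < \mu_2$ and a common SD $\sigma_R$ (an idealization, not a property established for these simulations), the optimal threshold would lie midway between the means and
\[
   p_c = \Phi\!\left( \frac{\mu_2 - \mu_1}{2 \sigma_R} \right) ,
\]
where $\Phi$ is the standard normal cumulative distribution function. In that idealized case $p_c$ depends on the separation of the means and on the spread only through their ratio, whereas the mean square error adds the squared bias and the variance. This is one way to see why the two criteria need not select the same $\sigma_r$.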

802:

803:

804: \section{Noise Interactions in a Sensory-Motor Network}

805:

806: To illustrate the interactions between synaptic and response noise in

807: a more biologically realistic situation, we apply the general approach

808: outlined in Section~\ref{general} to a well-known model of

809: sensory-motor integration in the brain.  We consider the classic

810: coordinate transformation problem in which the location of an object,

811: originally specified in retinal coordinates, becomes independent of

812: gaze angle. This type of computation  has been thoroughly studied both

813: experimentally~\citep{AES85,BASG95} and

814: theoretically~\citep{Zipser88,Salinas+Abbott:1995,PS97}, and is

815: thought to be the basis for generating representations of object

816: location relative to the body or the world.  Also, the way in which

817: visual and eye-position signals are integrated here is an example of

818: what seems to be a general principle for combining different

819: information streams in the brain~\citep{ST00,Salinas+Sejnowski:2001}.

820: Such integration by `gain modulation' may have wide applicability in

821: diverse neural circuits~\citep{Salinas-2004-2}, so it represents a

822: plausible and general situation in which computational accuracy is

823: important.

824:

825: From the point of view of the phenomenon at hand, the constructive

826: effect of response noise, this example addresses an important issue:

827: whether the noise interaction is still observed when network

828: performance depends on a population of output neurons. In the

829: classification task, performance was quantified through a single

830: neuron's response, but in this case it depends on a nonlinear

831: combination of multiple firing rates, so maybe the impact of response

832: noise washes out in the population average. As shown below, this is

833: not the case.

834:

835: \begin{figure*}[tb!]

836: \centerline{\epsfig{figure=fig4.eps,height=6.0in}}

837: \caption{\label{inoutSMMaps}

838: Network model of a sensory-motor transformation. In this network,

839: $N\eq 400$, $K\eq 25$, $M\eq 400$. Target and movement directions, $x$

840: and $z$, respectively, vary between $-25$ and $25$, whereas gaze angle

841: $y$ varies between $-15$ and $15$. The graphs correspond to a single

842: trial in which $x\eq -10$, $y\eq 10$ and $z\eq x \! - \! y\eq -20$.

843: Neither response noise nor synaptic corruption was included in this

844: example.

845: (A) Firing rates of the 400 gain-modulated input neurons arranged

846: according to preferred stimulus location.

847: (B) Network architecture.

848: (C) Firing rates of the 25 output motor neurons arranged according to

849: preferred target location. }

850: \end{figure*}

851:

852: The sensory-motor network has, as before, a feedforward architecture

853: with two layers.  The first layer contains $N$ gain-modulated sensory

854: units and the second or output layer contains $K$ motor units. Each

855: sensory neuron is connected to all output neurons through a set of

856: feedforward connections, as illustrated in Fig.~4B\@. The sensory

857: neurons are sensitive to two quantities, the location (or direction)

858: of a target stimulus $x$, which is in retinal coordinates, and the

859: gaze (or eye-position) angle $y$.  The network is designed so that

860: the motor layer generates or encodes a movement in a direction $z$,

861: which represents the direction of the target relative to the head.

862: The idea is that the profile of activity of the output neurons should

863: have a single peak centered at direction $z$.  The correct (i.e.,

864: desired) relationship between inputs and outputs is $z\eq x\!-\!y$,

865: which is approximately how the angles $x$ and $y$ should be combined

866: in order to generate a head-centered representation of target

867: direction~\citep{Zipser88,Salinas+Abbott:1995,PS97}. In other words,

868: $z$ is the quantity encoded by the output neurons and it should

869: relate to the quantities encoded by the sensory neurons through the

870: function $z(x, y)\eq x\!-\!y$. Many other functions are possible, but

871: as far as we can tell, the choice has little impact on the qualitative

872: effect of response noise.

873:

874: In this model, the mean firing rate of sensory neuron $i$ is

875: characterized by a product of two tuning functions, $f_i(x)$ and

876: $g_i(y)$, such that

877: \b

878:    \ol{r}_i(x, y) = r_{\mathrm{max}} \,

879:                    f_i(x)\left(1 - D + D\, g_i(y)\right) + r_{B} ,

880:    \label{rateGM}

881: \e

882: where $r_{B}\eq 4$ spikes/s is a baseline firing rate,

883: $r_{\mathrm{max}}\eq 35$ spikes/s and $D$ is the modulation depth,

884: which is set to 0.9 throughout. The sensory neurons are gain modulated

885: because they combine the information from their two inputs

886: nonlinearly. The amplitude --- but not the selectivity --- of a

887: visually-triggered response, represented by $f_i(x)$, depends on the

888: direction of gaze~\citep{AES85,BASG95,ST00}. Note that, in the

889: expression above, the second index of the mean rate $\ol{r}_{ij}$ has

890: been replaced by parentheses indicating a dependence on $x$ and $y$.

891: This is to simplify the notation; the responses can still be arranged

892: in a matrix $\ol{\bm{r}}$ if each value of the second index is

893: understood to indicate a particular combination of values of $x$ and

894: $y$. For example, if the rates were evaluated in a grid with 10 $x$

895: points and 10 $y$ points, the second index would run from 1 to 100,

896: covering all combinations. Indeed, this is how the responses are

897: arranged in the computer simulations.

898:

899: For simplicity, the tuning curves for different neurons in a given

900: layer are assumed to have the same shape but different preferred

901: locations or center points, which are always between $-25$ and $25$.

902: Visual responses are modeled as Gaussian tuning functions of stimulus

903: location $x$,

904: \b

905:    f_i(x) =  \exp\left(-\frac{\left(x - a_i\right)^2}{2\sigma_f^2}\right) ,

906:    \label{xtun}

907: \e

908: where $a_i$ is the preferred location and $\sigma_f\eq 4$ is the

909: tuning curve width. The dependence on eye position is modeled using

910: sigmoidal functions of the gaze angle $y$,

911: \b

912:    g_i(y) = \frac{1}{1 + \exp(-(b_i-y)/d_i)} \, ,

913:    \label{ytun}

914: \e

915: where $b_i$ is the center point of the sigmoid and $d_i$ is chosen

916: randomly between $-7$ and $+7$ to make sure that the curves $g_i(y)$

917: have different slopes for different neurons in the array.  In each

918: trial of the task, response variability is included by applying a

919: variant of Equation (\ref{inputnoise1}),

920: \b

921:    r_{ij} = \ol{r}_{ij} + \sqrt{\ol{r}_{ij}} \, \eta_{ij} .

922:    \label{inputnoise2}

923: \e

924: This makes the variance of the rates proportional to their means,

925: which in general is in good agreement with experimental data

926: \citep{Dean81,nc:Softky+Koch:1992,SK93,HSKD96}. This choice, however,

927: is not critical (see below). The desired response for each output

928: neuron is also described by a Gaussian,

929: \b

930:    F_k(z) = r_{\mathrm{max}} \, \exp\!\left(

931:                   -\frac{\left(z - c_k\right)^2}{2\sigma_F^2}

932:             \right) + r_{B} ,

933:    \label{Fout}

934: \e

935: where $\sigma_F\eq 4$ and $c_k$ is the preferred target direction of

936: motor neuron $k$. This expression gives the intended response of

937: output unit $k$ in terms of the encoded quantity $z$. Keep in mind,

938: however, that the desired dependence on the sensory inputs is obtained

939: by setting $z\eq x\!-\!y$.  When driven by the first-layer neurons,

940: the output rates are still calculated through a weighted sum,

941: \b

942:     R_{k}(z) = R_{k}(x, y) = \sum_{i=1}^N W_{ki} \, r_{i}(x, y) .

943:     \label{Rdriv1}

944: \e

945: This is equivalent to Equation (\ref{Rdriv}) but with the second index

946: defined implicitly through $x$ and $y$, as mentioned above. The

947: optimal synaptic connections $\ol{W}_{ki}$ are determined exactly as

948: before, using Equation~(\ref{wopt}).

949:

950: Typical profiles of activity for input and output neurons are shown in

951: Figs.~4A, C for a trial with $x\eq -10$ and $y\eq 10$. The sensory

952: neurons are arranged according to their preferred stimulus location

953: $a_i$, whereas the motor neurons are arranged according to their

954: preferred movement direction $c_k$.  For this sample trial no

955: variability was included; the firing rate values in Fig.~4A are

956: scattered under a Gaussian envelope (given by Equation (\ref{xtun}))

957: because the gaze-dependent gain factors vary across cells. Also, the

958: output profile of activity is Gaussian and has a peak at the point

959: $z\eq -20$, which is exactly where it should be given that the correct

960: input-output transformation is $z\eq x\!-\!y$. With noise, the output

961: responses would be scattered around the Gaussian profile and the peak

962: would be displaced.
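
As a quick numerical check of these statements, using only the parameter values given above ($r_{\mathrm{max}}\eq 35$, $r_B\eq 4$, $D\eq 0.9$, $\sigma_F\eq 4$): because $0 < g_i(y) < 1$, the gain factor $1 - D + D\, g_i(y)$ lies between 0.1 and 1. By Equation (\ref{rateGM}), a sensory neuron whose preferred location coincides with the stimulus ($f_i\eq 1$) therefore has a mean rate between
\[
   35 \times 0.1 + 4 = 7.5
   \quad \mbox{and} \quad
   35 \times 1 + 4 = 39 \ \mbox{spikes/s} ,
\]
depending on its gaze sensitivity, which accounts for the scatter under the Gaussian envelope. For the output layer, $z \eq x - y \eq -10 - 10 \eq -20$, and Equation (\ref{Fout}) gives a desired rate of $39$ spikes/s for a motor neuron with $c_k \eq -20$ and
\[
   35 \, e^{-1/2} + 4 \approx 25.2 \ \mbox{spikes/s}
\]
for a motor neuron whose preferred direction is one tuning width away ($c_k \eq -16$ or $c_k \eq -24$).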

963:

964: The error used to measure network performance is, in this case,

965: \b

966:    E_{\mathrm{pop}} = \left< \, \left| z - Z \right| \, \right> .

967:    \label{SMMerror}

968: \e

969: This is the absolute difference, averaged over trials and networks,

970: between the desired movement direction $z$ --- the actual

971: head-centered target direction --- and the direction $Z$ that is

972: encoded by the center of mass of the output activity,

973: \b

974:     Z = \frac{\sum_i \, (R_i - r_{\!B})^2 \, c_i}

975:              {\sum_k \, (R_k - r_{\!B})^2} \, .

976:     \label{centermass}

977: \e

978: Therefore, Equation (\ref{SMMerror}) gives the accuracy with which the

979: whole motor population represents the head-centered direction of the

980: target, whereas Equation (\ref{centermass}) provides the recipe to

981: read out such output activity.  Now the idea is to corrupt the optimal

982: connections and evaluate \Epop\ using various amounts of response

983: noise to determine whether there is an optimum.  Relative to the

984: previous examples, the key differences are, first, that the error in

985: (\ref{SMMerror}) represents a population average, and second, that

986: although the connections are set to minimize the average difference

987: between desired and driven firing rates, the performance criterion is

988: not based directly on it.
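
To see why this readout recovers the encoded direction in the ideal case, suppose that the driven rates match the desired ones exactly, $R_k \eq F_k(z)$. Then, by Equation (\ref{Fout}), $(R_k - r_{\!B})^2 \eq r_{\mathrm{max}}^2 \exp\!\left(-(z - c_k)^2/\sigma_F^2\right)$, which is a Gaussian bump of width $\sigma_F/\sqrt{2}$ centered at $z$, and Equation (\ref{centermass}) becomes
\[
   Z = \frac{\sum_k \, c_k \, e^{-(z - c_k)^2/\sigma_F^2}}
            {\sum_k \, e^{-(z - c_k)^2/\sigma_F^2}} \, .
\]
If the preferred directions $c_k$ are evenly spaced (an assumption made here for the argument) and $z$ is not too close to the edge of the array, the weighting is symmetric about $z$, so $Z \approx z$ and the contribution to \Epop\ is close to zero. Squaring the rates narrows the bump, so units far from the peak carry relatively less weight in the readout.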

989:

990: \begin{figure*}

991: \centerline{\epsfig{figure=fig5.eps,width=5.0in}}

992: \caption{\label{NoiseSMMaps}

993: Noise interaction for the sensory-motor network depicted in Fig.~4.

994: Results are averaged over $100$ networks and $100$ trials per network.

995: All data are from computer simulations.

996: (A) Average absolute deviation between actual and encoded target

997: locations, \Epop, as a function of response noise. Continuous lines

998: are for three probabilities of weight elimination, $p_W\eq 0.1$, 0.3

999: and 0.5; the dashed line corresponds to $p_W\eq 0$.

1000: (B) Magnitude of the noise interaction, measured by the relative error

1001: \Emin$/E_0$, as a function of the number of input neurons, $N$, for

1002: $p_W\eq 0.2$.

1003: (C) \Emin\ and \Emin$/E_0$ as functions of  $p_W$.

1004: (D) Optimal response noise SD, \sigmin, as a function of $p_{W}$. }

1005: \end{figure*}

1006:

1007: Simulation results for this sensory-motor model are presented in

1008: Fig.~5. A total of 400 sensory and 25 output neurons were used.  These

1009: units were tested with all combinations of 20 values of $x$ and 20

1010: values of $y$, uniformly spaced (thus, $M\eq 400$).  Synaptic noise

1011: was generated by random weight elimination. This means that, after

1012: having set the connections to their optimal values given by

1013: Equation~(\ref{wopt}), each one was reset to zero with a probability

1014: $p_W$. Thus, on average, a fraction $p_W$ of the weights in each

1015: network was eliminated. As shown in Fig.~5A, when $p_W\! >\! 0$, the

1016: error between the encoded and the true target direction has a minimum

1017: with respect to $\sigma_r$. These error curves represent averages

1018: over 100 networks.  Interestingly, the benefit of noise does not

1019: decrease when more sensory units are included in the first layer

1020: (Fig.~5B).  This is because, if $p_W$ is constant, the proportion of

1021: eliminated synapses does not change, so the error caused by synaptic

1022: corruption cannot be reduced simply by adding more neurons.

1023:

1024: Figure 5C shows the minimum and relative errors as functions of $p_W$.

1025: This graph highlights the substantial impact that response noise has

1026: on this network: the relative error stays below 0.2 even when about a

1027: third of the synapses are eliminated. This is not only because the

1028: error without response noise is high, but also because the error with

1029: an optimal amount of noise stays low. For instance, with $p_W\eq 0.3$

1030: and $\sigma_r\eq$ \sigmin, the typical deviation from the correct

1031: target direction is about 2 units, whereas with $\sigma_r\eq 0$ the

1032: typical deviation is about 10. Response noise thus cuts the deviation

1033: by about a factor of five, and importantly, the resulting error is

1034: still small relative to the range of values of $z$, which spans 50

1035: units.  Also, as observed in the classification task, in general it is

1036: better to include response noise even if $\sigma_r$ is not precisely

1037: matched to the amount of synaptic variability (Fig.~5A).

1038:

1039: Figure 5D plots \sigmin\ as a function of the probability of synaptic

1040: elimination. The optimal amount of response noise increases with $p_W$

1041: and reaches fairly high levels. For instance, at a value of 1, which

1042: corresponds to $p_W$ near 0.15, the variance of the firing rates is

1043: equal to their mean, because of Equation (\ref{inputnoise2}). We

1044: wondered whether the scaling law of the response noise would make any

1045: difference, so we reran the simulations with either additive noise (SD

1046: independent of mean) or noise with an SD proportional to the mean, as

1047: in Equation (\ref{inputnoise1}).  Results in these two cases were very

1048: similar: \Emin\ and \Emin$/E_0$ varied very much like in Fig.~5C, and

1049: the optimal amount of noise grew monotonically with $p_W$, as in

1050: Fig.~5D\@.
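
For reference, the three scaling laws compared here can be summarized as follows. The additive and proportional forms are written out as assumptions consistent with the description in the text, with $\eta_{ij}$ having zero mean and variance $\sigma_r^2$ in all cases:
\[
   \begin{array}{lll}
      \mbox{additive:} &
         r_{ij} = \ol{r}_{ij} + \eta_{ij} , &
         \mbox{Var}(r_{ij}) = \sigma_r^2 , \\[4pt]
      \mbox{Equation (\ref{inputnoise2}):} &
         r_{ij} = \ol{r}_{ij} + \sqrt{\ol{r}_{ij}} \, \eta_{ij} , &
         \mbox{Var}(r_{ij}) = \ol{r}_{ij} \, \sigma_r^2 , \\[4pt]
      \mbox{proportional:} &
         r_{ij} = \ol{r}_{ij} \left( 1 + \eta_{ij} \right) , &
         \mbox{Var}(r_{ij}) = \ol{r}_{ij}^{\,2} \, \sigma_r^2 .
   \end{array}
\]
In the second case the ratio of variance to mean is $\sigma_r^2$, so $\sigma_r \eq 1$ corresponds to a variance equal to the mean, as stated above.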

1051:

1052:

1053: \section{Noise Interactions in a Recurrent Network}

1054:

1055: The networks discussed in the previous sections had a feedforward

1056: architecture, and in those cases the contribution of response noise to

1057: the correlation matrix between neuronal responses could be determined

1058: analytically. In contrast, in recurrent networks the dynamics are more

1059: complex and the effects of random fluctuations more difficult to

1060: ascertain. To investigate whether response noise can still counteract

1061: some of the effects of synaptic variability, we consider a recurrent

1062: network with a well-defined function and relatively simple dynamics

1063: characterized by attractor states. When the firing rates in this

1064: network are initialized at arbitrary values, they eventually stop

1065: changing, settling down at certain steady-state points in which some

1066: neurons fire intensely and others do not. The optimal weights sought

1067: are those that allow the network to settle at predefined sets of

1068: steady-state responses, and the error is thus defined in terms of the

1069: difference between the desired steady states and the observed ones. As

1070: before, response noise is taken into account when the optimal synaptic

1071: weights are generated, although in this case the correction it

1072: introduces (relative to the noiseless case) is an approximation.

1073:

1074: The attractor network consists of $N$ continuous-valued neurons, each

1075: of which is connected to all other units via feedback synaptic

1076: connections~\citep{hertz91b}.  With the proper connectivity, such

1077: network can generate, without any tuned input, a steady-state profile

1078: of activity with a cosine or Gaussian

1079: shape~\citep{BBS95,CompteCortex00,Sali03}. Such stable `bump'-shaped

1080: activity is observed in various neural models, including those for

1081: cortical hypercolumns~\citep{Hansel-Sompolinsky-98}, head-direction

1082: cells~\citep{Zhang1996,nc:laing+chow:2001} and working memory

1083: circuits~\citep{CompteCortex00}. Below, we find the connection matrix

1084: that allows the network to exhibit a unimodal activity profile

1085: centered at any point within the array.

1086:

1087: \subsection{Optimal Synaptic Weights in a Recurrent Architecture}

1088:

1089: The dynamics of the network are determined by the equation

1090: \b

1091:    \tau \frac{d r_i}{d t} = -r_i

1092:                          + h \! \left( \sum_j W_{ij} \, r_j\right)

1093:                          + \eta_i \, ,

1094:    \label{RNNmain}

1095: \e

1096: where $\tau\eq 10 $ is the integration time constant, $r_i$ is the

1097: response of neuron $i$, and $h$ is the activation function of the

1098: cells, which relates total current to firing rate.  The sigmoid

1099: function

1100: $h(x) = 1/(1 + \exp(-x))$

1101: is used, but this choice is not critical. As before, $\eta_i$

1102: represents the response fluctuations, which are drawn independently

1103: for each neuron in every time step.  In this case they are Gaussian,

1104: with zero mean and a variance $\sigma_r^2/\Delta t$. The variance of

1105: $\eta_i$ is divided by the integration time step $\Delta t$ to

1106: guarantee that the variance of the rate $r_i$ remains independent of

1107: the time step~\citep{VanK01}.
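
The role of the factor $1/\Delta t$ can be illustrated with a simple Euler discretization. This is a sketch under simplifying assumptions, not necessarily the integration scheme used in the simulations: the drive $h_i \eq h(\sum_j W_{ij} \, r_j)$ is treated as constant, and $a \eq \Delta t/\tau$. One time step then reads
\[
   r_i(t + \Delta t) = (1 - a) \, r_i(t) + a \left( h_i + \eta_i(t) \right) .
\]
With independent $\eta_i$ of variance $\sigma_r^2/\Delta t$ at each step, the stationary variance $V$ of $r_i$ satisfies $V \eq (1 - a)^2 V + a^2 \sigma_r^2/\Delta t$, so that
\[
   V = \frac{\sigma_r^2}{\tau \left( 2 - \Delta t/\tau \right)}
     \approx \frac{\sigma_r^2}{2 \tau}
   \quad \mbox{for} \quad \Delta t \ll \tau ,
\]
which is independent of the time step. Had the variance of $\eta_i$ been fixed at $\sigma_r^2$ instead, the same calculation would give $V \eq \sigma_r^2 \, \Delta t / \left(\tau (2 - \Delta t/\tau)\right)$, which vanishes as the time step is refined.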

1108:

1109: For our purposes, manipulating this type of network is easier if the

1110: equations are expressed in terms of the total input currents to the

1111: cells~\citep{hertz91b,dayan-2001}. If the current for neuron $i$ is

1112: $u_i \eq \sum_j W_{ij} \, r_j$, then

1113: \b

1114:    \tau \frac{d u_i}{d t} = -u_i + \sum_j W_{ij}

1115:                               \left( h(u_j) + \eta_j \right)

1116:    \label{RNNmain1}

1117: \e

1118: is equivalent to Equation (\ref{RNNmain}) above.
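
The equivalence follows in one line, assuming that the weights $W_{ij}$ are constant in time. Differentiating $u_i \eq \sum_j W_{ij} \, r_j$ and substituting Equation (\ref{RNNmain}) for each $d r_j/d t$, with $\sum_k W_{jk} \, r_k \eq u_j$ inside the activation function, gives
\[
   \tau \frac{d u_i}{d t}
      = \sum_j W_{ij} \, \tau \frac{d r_j}{d t}
      = \sum_j W_{ij} \left( -r_j + h(u_j) + \eta_j \right)
      = -u_i + \sum_j W_{ij} \left( h(u_j) + \eta_j \right) .
\]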

1119: \begin{figure*}

1120: \centerline{\epsfig{figure=fig6.eps,width=5.0in}}

1121: \caption{Steady-state responses of a recurrent neural network with 20

1122: neurons.  Results show the input currents of all units after 1000 ms

1123: of simulation time, with responses evolving according to

1124: Equation~(\ref{RNNmain1}). Each neuron is labeled by an angle between

1125: -180\deg\ and 180\deg.

1126: (A) Steady-state responses for four sets of initial conditions with

1127: peaks near units \mbox{-90\deg}, 0\deg, +90\deg\ and 180\deg. The

1128: observed activity profiles are indistinguishable from the desired

1129: Gaussian curves.  Neither synaptic nor response noise was included in

1130: this example.

1131: (B) Steady-state responses with and without noise. The desired

1132: activity profile is indicated by the solid line. The dotted line

1133: corresponds to the activity observed with noise after 1000 ms of

1134: simulation time, having started with an initial condition equal to the

1135: desired steady state. Vertical lines indicate the locations of the

1136: corresponding centers of mass. The absolute deviation is 34\deg.

1137: Here, $\sigma_r\eq 0.3$ and $p_W\eq 0.02$. }

1138: \end{figure*}

1139: A stationary solution of Equation (\ref{RNNmain1}) without input noise

1140: is such that all derivatives become zero. This corresponds to an

1141: attractor state $\alpha$ for which

1142: \b

1143:    u_i^{\alpha} = \sum_j W_{ij} \, h(u_j^{\alpha}) .

1144:    \label{SScondition}

1145: \e

1146: The label $\alpha$ is used because the network may have several

1147: attractors or sets of fixed points. The desired steady-state currents

1148: are denoted as $U_i^{\alpha}$. These are Gaussian profiles of activity

1149: such that, during steady state $\alpha\eq 1$, neuron 1 is the most

1150: active (i.e., the Gaussian is centered at neuron 1), during steady

1151: state $\alpha\eq 2$, neuron 2 is the most active, and so on. Figure 6

1152: illustrates the activity of the network at four steady states in the

1153: absence of noise ($\sigma_W\eq 0\eq \sigma_r$). To make the network

1154: symmetric, the neurons were arranged in a ring, so their activity

1155: profiles wrap around. Because of this, each neuron is labeled with an

1156: angle. The observed currents $u_i$ settle down at values that are

1157: almost exactly equal to the desired ones, $U_i^{\alpha}$. The synaptic

1158: connections that achieved this match were found by enforcing the

1159: steady-state condition (\ref{SScondition}) for the desired attractors.

1160: That is, we minimized

1161: \b

1162:    E = \frac{1}{N_A} \sum_{\alpha = 1}^{N_A} \sum_{i} \left(

1163:                U_i^{\alpha} - \sum_j W_{ij} \, h(U_j^{\alpha})

1164:        \right)^{\!2} ,

1165:    \label{RNNerror}

1166: \e

1167: where $U_i^{\alpha}$ is a (wrap-around) Gaussian function of $i$

1168: centered at $\alpha$ and $N_A$ is the number of attractors; in the

1169: simulations $N_A$ is always equal to the number of neurons, $N$\@.

1170: This procedure leads to an expression for the optimal weights

1171: equivalent to Equation (\ref{wopt}). Thus, without response noise,

1172: \b

1173:     \ol{\bm{W}} = \bm{L} \, \bm{C}^{-1} ,

1174:     \label{RNNwopt}

1175: \e

1176: where

1177: \ba

1178:     L_{ij} & = & \frac{1}{N_A} \sum_{\alpha}

1179:                  U_i^{\alpha} \, h(U_j^{\alpha}) \nonumber \\

1180:     C_{ij} & = & \frac{1}{N_A} \sum_{\alpha}

1181:                  h(U_i^{\alpha}) \, h(U_j^{\alpha}) \, .

1182:     \label{RNNcorr}

1183: \ea
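
The step from Equation (\ref{RNNerror}) to Equation (\ref{RNNwopt}) can be made explicit; the following is a sketch, assuming that $\bm{C}$ is invertible. Setting the derivative of $E$ with respect to an arbitrary weight $W_{kl}$ to zero gives
\[
   \frac{\partial E}{\partial W_{kl}} = -\frac{2}{N_A} \sum_{\alpha}
      \left( U_k^{\alpha} - \sum_j W_{kj} \, h(U_j^{\alpha}) \right)
      h(U_l^{\alpha}) = 0 ,
\]
which, using the definitions of $L_{ij}$ and $C_{ij}$ above, reads $L_{kl} \eq \sum_j W_{kj} \, C_{jl}$. In matrix form this is $\bm{L} \eq \ol{\bm{W}} \, \bm{C}$, and multiplying on the right by $\bm{C}^{-1}$ yields Equation (\ref{RNNwopt}).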

1184: To include the effects of response noise, we add a correction to the

1185: diagonal of the correlation matrix, as in the previous cases (see

1186: Section \ref{RegSect}). We thus set

1187: \b

1188:     C_{ij} = \frac{1}{N_A} \sum_{\alpha}

1189:                h(U_i^{\alpha}) h(U_j^{\alpha})

1190:                + \delta_{ij} \, a \, \frac{\sigma_r^2}{2 \tau} ,

1191:     \label{RNNapprox}

1192: \e

1193: where $a$ is a proportionality constant. The rationale for this is as

1194: follows.

1195:

1196: Strictly speaking, Equation (\ref{RNNmain1}) with response noise does

1197: not have a steady state. But consider the simpler case of a single

1198: variable $u$ with a constant asymptotic value $u_{\infty}$, such that

1199: \b

1200:     \tau \frac{d u}{d t} = -u + u_{\infty} + \eta .

1201:     \label{singleu}

1202: \e

1203: If the trajectory $u(t)$ from $t\eq 0$ to $t\eq T$ is calculated many

1204: times, starting from the same initial condition, the distribution of

1205: endpoints $u(T)$ has a well-defined mean and variance, which vary

1206: smoothly as functions of $T$\@. The mean is always equal to the

1207: endpoint that would be observed without noise, whereas for $T$ much

1208: longer than the integration time constant $\tau$, the variance is

1209: equal to the variance of the fluctuations on the right-hand side of

1210: Equation (\ref{singleu}) divided by $2\tau$~\citep{VanK01}. These

1211: considerations suggest that we minimize

1212: \b

1213:    E = \frac{1}{N_A} \sum_{\alpha,i} \left(

1214:               U_i^{\alpha} - \sum_j W_{ij} \,

1215:                   \left( h(U_j^{\alpha}) + a \, \tilde{\eta}_j \right)

1216:        \right)^{\!2} ,

1217:    \label{RNNerror1}

1218: \e

1219: where the variance of $\tilde{\eta}_j$ is $\sigma_r^2/(2\tau)$. This

1220: leads to Equation (\ref{RNNwopt}) with the corrected correlation

1221: matrix given by (\ref{RNNapprox}).
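
The factor $2\tau$ in the single-variable argument can be recovered with a short calculation. As a sketch, assume that $\eta$ in Equation (\ref{singleu}) is zero-mean white noise with $\langle \eta(t) \, \eta(t') \rangle \eq \sigma_r^2 \, \delta(t - t')$, where the angle brackets denote an average over repeated trajectories (this idealization and the bracket notation are assumptions made for illustration; a discretized simulation may scale the noise differently). The solution of Equation (\ref{singleu}) is then
\[
   u(T) = u_{\infty} + \left( u(0) - u_{\infty} \right) e^{-T/\tau}
          + \frac{1}{\tau} \int_0^T e^{-(T-s)/\tau} \, \eta(s) \, d s .
\]
The first two terms are the noiseless endpoint, which is therefore the mean of $u(T)$. The variance comes from the last term,
\[
   \left\langle \left( u(T) - \langle u(T) \rangle \right)^2 \right\rangle
      = \frac{\sigma_r^2}{\tau^2} \int_0^T e^{-2(T-s)/\tau} \, d s
      = \frac{\sigma_r^2}{2\tau} \left( 1 - e^{-2T/\tau} \right) ,
\]
which approaches $\sigma_r^2/(2\tau)$ for $T \gg \tau$.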

1222:

1223: \subsection{Performance of the Attractor Network}

1224:

1225: To evaluate the performance of this network, we compare the center of

1226: mass of the desired activity profile to that of the observed profile

1227: tracked during a period of time. For a particular attractor $\alpha$,

1228: the network is first initialized very close to that desired steady

1229: state, then Equation (\ref{RNNmain1}) is run for 1000 ms (100 time

1230: constants $\tau$), and the absolute difference between the initial and

1231: the current centers of mass is recorded during the last 500 ms.  The

1232: error for the recurrent networks \Erec\ is defined as the absolute

1233: difference averaged over this time period and all attractor states,

1234: i.e., all values of $\alpha$.  Also, when there is synaptic noise, an

1235: additional average over networks is performed.  This error function is

1236: similar to Equation (\ref{SMMerror}), except that the circular

1237: topology is taken into account.  Thus, \Erec\ is the mean absolute

1238: difference between desired and observed centers of mass.  It is

1239: expressed in degrees.
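
One common way to compute a center of mass on a ring, given here only as an illustrative assumption because the exact estimator is not restated in this section, is the population-vector angle. If unit $j$ is labeled by the angle $\theta_j$ and has current $u_j$, define
\[
   \theta_{\mathrm{cm}} = \arg \left( \sum_j u_j \, e^{i \theta_j} \right) ,
\]
where $i$ in the exponent is the imaginary unit. The absolute difference between a desired and an observed center of mass, $\theta_{\mathrm{cm}}^{\mathrm{des}}$ and $\theta_{\mathrm{cm}}^{\mathrm{obs}}$, is then taken along the shorter arc,
\[
   \Delta = \min \left(
      | \theta_{\mathrm{cm}}^{\mathrm{obs}} - \theta_{\mathrm{cm}}^{\mathrm{des}} | , \;
      360^{\circ} - | \theta_{\mathrm{cm}}^{\mathrm{obs}} - \theta_{\mathrm{cm}}^{\mathrm{des}} |
   \right) ,
\]
so that, for example, centers at $-170^{\circ}$ and $170^{\circ}$ differ by $20^{\circ}$ rather than $340^{\circ}$. With this convention the error cannot exceed $180^{\circ}$.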

1240:

1241: \begin{figure*}[t!]

1242: \centerline{\epsfig{figure=fig7.eps,width=5.0in}}

1243: \caption{Interaction between synaptic and response noise in

1244: recurrent networks. (A) Average absolute difference between desired

1245: and observed centers of mass as a function of $\sigma_r$. Units are

1246: degrees. The different curves are for $a\eq 0$, 1.5, 1 and 0.5, from

1247: left to right. The lowest curve (dashed) was obtained with $a\eq

1248: 0.5$, confirming that the synaptic weights are optimized when

1249: response noise is taken into account. (B) Average error \Erec\ as a

1250: function of response noise. Continuous lines are for three

1251: probabilities of weight elimination $p_W\eq 0.005$, 0.015 and 0.025;

1252: the dashed line corresponds to $p_W\eq 0$.  Here and in the

1253: following panels, $a\eq 0.5$. (C) \Emin$/E_0$ (left y-axis) and

1254: \Emin\ (right y-axis) as functions of $p_W$. (D) Optimal response

1255: noise SD, \sigmin, as a function of $p_{W}$ for the same data in C.

1256: }

1257: \end{figure*}

1258:

1259: Before exploring the interaction between synaptic and response

1260: noise, we used \Erec\ to test whether the noise-dependent correction

1261: to the correlation matrix in Equation (\ref{RNNapprox}) was

1262: appropriate. To do this, a recurrent network without synaptic

1263: fluctuations was simulated multiple times with different values of

1264: the parameter $a$ and various amounts of response noise. The desired

1265: attractors were kept constant.  The resulting error curves are shown

1266: in Fig.~7A\@. Each one gives the average absolute deviation between

1267: desired and observed centers of mass as a function of $\sigma_r$ for

1268: a different value of $a$. The dependence on $a$ was non-monotonic.

1269: The optimal value we found was 0.5, which corresponds to the lowest

1270: curve (dashed) in the figure. This curve was well below the one

1271: observed without adjusting the synaptic weights.  Therefore, the

1272: correction was indeed effective.

1273:

1274: Figure 7B shows \Erec\ as a function of $\sigma_r$ when synaptic

1275: noise is also present in the recurrent network. The three solid

1276: curves correspond to nets in which synapses were randomly eliminated

1277: with probabilities $p_W\eq 0.005$, 0.015 and 0.025. As with previous

1278: network architectures, a non-zero amount of response noise improves

1279: performance relative to the case where no response noise is

1280: injected. In this case, however,  the mean absolute error is already

1281: about 25\deg\ at the point at which response noise starts making a

1282: difference, around $p_W\eq 0.005$ (Fig.\ 7C). This is not

1283: surprising: these types of networks are highly sensitive to changes

1284: in their synapses, so even small mismatches can lead to large

1285: errors~\citep{SLRT00,RSW03}.  Also, Fig.~7C shows that the ratio

1286: \Emin$/E_0$ does not fall below 0.6, so the benefit of noise is not

1287: as large as in previous examples. The effect was somewhat weaker

1288: when synaptic variability was simulated using Gaussian noise with SD

1289: $\sigma_W$ instead of random synaptic elimination. Nevertheless, it

1290: is interesting that the interaction between synaptic and response

1291: noise is observed at all under these conditions, given that the

1292: response dynamics are richer and that the minimization of Equation

1293: (\ref{RNNerror1}) may not be the best way to produce the desired

1294: steady-state activity.

1295:

1296:

1297: \section{Discussion}

1298:

1299: \subsection{Why are Synaptic and Response Fluctuations Equivalent?}

1300: \label{disc1}

1301:

1302: We have investigated the simultaneous action of synaptic and response

1303: fluctuations on the performance of neural networks and found an

1304: interaction or equivalence between them: when synaptic noise is

1305: multiplicative, its effect is similar to that of response noise. At

1306: heart, this is a simple consequence of the product of responses and

1307: synaptic weights contained in most neural models, which has the form

1308: $\sum_j W_j r_j$. With multiplicative noise in one of the variables,

1309: this weighted sum turns into $\sum_j W_j (1 + \xi_j) r_j$, which is

1310: the same whether it is the synapse or the response that fluctuates. In

1311: either case, the total stochastic component $\sum_j W_j \xi_j r_j$

1312: scales with the synaptic weights. The same result is obtained with

1313: additive response noise.  Additive synaptic noise behaves differently,

1314: however. It instead leads to a total fluctuation $\sum_j \xi_j r_j$

1315: that is independent of the mean weights.  Evidently, in this case the

1316: mean values of the weights have no effect on the size of the

1317: fluctuations.  Thus, the key requirement for some form of equivalence

1318: between the two noise sources is that the synaptic fluctuations must

1319: depend on the strength of the synapses.
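
The argument can be made quantitative with a short calculation. As a sketch, assume that the $\xi_j$ are independent, with zero mean and variance $\sigma^2$ (the symbol $\sigma$ and the angle brackets, which denote an average over the noise, are notation introduced here for illustration). For multiplicative noise the stochastic component of the weighted sum has variance
\[
   \left\langle \Big( \sum_j W_j \, \xi_j \, r_j \Big)^{2} \right\rangle
      = \sigma^2 \sum_j W_j^2 \, r_j^2 ,
\]
whereas for additive synaptic noise it is
\[
   \left\langle \Big( \sum_j \xi_j \, r_j \Big)^{2} \right\rangle
      = \sigma^2 \sum_j r_j^2 .
\]
For additive response noise $\eta_j$ with variance $\sigma_r^2$, the corresponding variance is $\sigma_r^2 \sum_j W_j^2$, which also scales with the weights. Rescaling all the weights by a factor $c$ multiplies the multiplicative-noise and response-noise variances by $c^2$ and leaves the additive synaptic variance unchanged. Hence smaller weights attenuate multiplicative synaptic noise, as they do response noise, but have no influence on additive synaptic noise.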

1320:

1321: This condition was applied to the three sets of simulations presented

1322: above, which corresponded to the classification of arbitrary response

1323: patterns, a sensory-motor transformation, and the generation of

1324: multiple self-sustained activity profiles. This selection of problems

1325: was meant to illustrate the generality of the observations outlined in

1326: the above paragraph. And indeed, although the three problems differed

1327: in many respects, the results were qualitatively the same.

1328:

1329: We should also point out that, in all the simulations, the criterion

1330: used to determine the optimality of the synaptic weights was based on

1331: a mean square error. But perhaps the noise interaction changes when a

1332: different criterion is used. To investigate this, we performed

1333: additional simulations of the small $2\! \times\! 1$ network in which

1334: the optimal synaptic weights were those that minimized a mean absolute

1335: deviation; thus, the square in Equation (\ref{error}) was substituted

1336: with an absolute value. In this case everything proceeded as before,

1337: except that the mean weight values $\ol{W}$ had to be found

1338: numerically. For this, the averages were performed explicitly and the

1339: downhill simplex method was used to search for the best

1340: weights~\citep{PFTV92}. The results, however, were very similar to

1341: those in Fig.~2A\@.  Although the shapes of the curves were not

1342: exactly the same, the relative and minimum errors found with the

1343: absolute value varied as functions of $\sigma_W$ very much as they did

1344: with the mean-square error criterion. Therefore, our conclusions do

1345: not seem to depend strongly on the specific function used to weight

1346: the errors and find the best synaptic connection values.

1347:

1348: \subsection{When Should Response Noise Increase?}

1349: \label{disc2}

1350:

1351: According to the argument above, the most general way to state our

1352: results is this: assuming that neuronal activities are determined by

1353: weighted sums, any mechanism that is able to dampen the impact of

1354: response noise will automatically reduce the impact of multiplicative

1355: synaptic noise as well. Furthermore, we suggest that under some

1356: circumstances it is better to add more response noise and increase the

1357: dampening factor than to ignore the synaptic fluctuations altogether.

1358: There are two conditions for this scenario to make sense.  (1) The

1359: network must be highly sensitive to changes in connectivity.  This can

1360: be seen, for instance, in Fig.~3A, which shows that the highest

1361: benefit of response noise occurs when the number of neurons matches

1362: the number of conditions to be satisfied --- it is at this point that

1363: the connections need to be most accurate.  (2) The fluctuations in

1364: connectivity cannot be evaluated directly.  That is, why not take into

1365: account the synaptic noise in exactly the same way as the response

1366: noise when the optimal connections are sought?  For example, the

1367: average in Equation (\ref{errormatrix}) could also include an average

1368: over networks (synaptic fluctuations), in which case the optimal mean

1369: weights would depend not only on $\sigma_r$ but also on $\sigma_W$. In

1370: the simulations this could certainly be done, and would lead to

1371: smaller errors. But we explicitly consider the possibility that either

1372: $\sigma_W$ is unknown a priori, or there is no separate biophysical

1373: mechanism for implementing the corresponding corrections to the

1374: synaptic connections.

1375:

1376: Condition number 2 is not unreasonable. Realistic networks with high

1377: synaptic plasticity must incorporate mechanisms to ensure that ongoing

1378: learning does not disrupt their previously acquired functionality.

1379: Thus, synaptic modification rules need to achieve two goals: to

1380: establish new associations that are relevant for the current

1381: behavioral task, and to make adjustments to prevent interference from

1382: other, future associations. The latter may be particularly difficult

1383: to achieve if learning rates change unpredictably with time.  It is

1384: not clear whether plausible (e.g., local) synaptic modification

1385: mechanisms could solve both problems simultaneously \citep[see][]{HB04},

1386: but the present results suggest an alternative: synaptic

1387: modification rules could be used exclusively to learn new associations

1388: based on current information, whereas response noise could be used to

1389: indirectly make the connectivity more robust to synaptic fluctuations.

1390: Although this mechanism evidently does not solve the problem of

1391: combining multiple learned associations, it might alleviate it. Its

1392: advantage is that, assuming that neural circuits have evolved to

1393: adaptively optimize their function in the face of true noise, simply

1394: increasing their response variability would generate synaptic

1395: connectivity patterns that are more resistant to fluctuations.

1396:

1397: \subsection{When is Synaptic Noise Multiplicative?}

1398: \label{disc3}

1399:

1400: The condition that noise should be multiplicative means that changes

1401: in synaptic weight should be proportional to the magnitude of the

1402: weight.  Evidently, not all types of synaptic modification processes

1403: lead to fluctuations that can be statistically modeled as

1404: multiplicative noise; for instance, saturation may prevent positive

1405: increases, thus restricting the variability of strong synapses.

1406: However, synaptic changes that generally increase with initial

1407: strength should be reasonably well approximated by the multiplicative

1408: model.  Random synapse elimination fits this model because, if a weak

1409: synapse disappears, the change is small, whereas if a strong synapse

1410: disappears, the change is large. Thus, the magnitude of the changes

1411: correlates with initial strength. Another procedure that corresponds

1412: to multiplicative synaptic noise is this.  Suppose the size of the

1413: synaptic changes is fixed, so that weights can only vary by

1414: $\pm \delta w$, but suppose also that the probability of suffering a

1415: change increases with initial synaptic strength. In this case, all

1416: changes are equal, but on average a population of strong synapses

1417: would show higher variability than a population of weak ones. In

1418: simulations, the disruption caused by this type of synaptic corruption

1419: is indeed lessened by response noise (data not shown).
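
Both examples can be checked with a short calculation, given here as a sketch under stated assumptions; angle brackets denote averages over the random changes. For random elimination, suppose each weight $W$ is independently set to zero with probability $p_W$ and left unchanged otherwise. The change $\Delta W$ then equals $-W$ with probability $p_W$ and 0 otherwise, so
\[
   \langle \Delta W \rangle = -p_W \, W , \qquad
   \langle \Delta W^2 \rangle - \langle \Delta W \rangle^2
      = p_W \, (1 - p_W) \, W^2 ,
\]
and the SD of the change, $\sqrt{p_W (1 - p_W)} \, |W|$, is proportional to the magnitude of the weight, as multiplicative noise requires. For the fixed-step procedure, suppose a weight changes by $+\delta w$ or $-\delta w$ with equal probability, and that a change occurs with probability $q(W)$ (the function $q$ is a hypothetical name introduced here). Then $\langle \Delta W \rangle \eq 0$ and the variance is $q(W) \, \delta w^2$, which grows with synaptic strength whenever $q$ does; for instance, $q(W) \propto W^2$ gives an SD proportional to $|W|$.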

1420:

1421: \subsection{Final Remarks}

1422: \label{disc4}

1423:

1424: To summarize, the scenario we envision rests on five critical

1425: assumptions: (1) the activity of each neuron depends on

1426: synaptically weighted sums of its (noisy) inputs, (2) network

1427: performance is highly sensitive to changes in synaptic connectivity,

1428: (3) synaptic changes unrelated to a function that has already been

1429: learned can be modeled as multiplicative noise, (4) synaptic

1430: modification mechanisms are able to take into account response noise,

1431: so synaptic strengths are adjusted to minimize its impact, but (5)

1432: synaptic modification mechanisms do not directly account for future

1433: learning. Under these conditions, our results suggest that increasing

1434: the variability of neuronal responses would, on average, result in

1435: more accurate performance. Although some of these assumptions may be

1436: rather restrictive, the diversity of synaptic plasticity mechanisms

1437: together with the high response variability observed in many areas of

1438: the brain make this constructive noise effect worth considering.

1439:

1440: \subsubsection*{Acknowledgments.}

1441: Research was supported by NIH grant NS044894.

1442:

1443: %\bibliography{neuroscience}

1444: \begin{thebibliography}{}

1445:

1446: \bibitem[Andersen et~al., 1985]{AES85}

1447: Andersen, R.~A., Essick, G.~K., and Siegel, R.~M. (1985).

1448: \newblock Encoding of spatial location by posterior parietal neurons.

1449: \newblock {\em Science}, 230:450--458.

1450:

1451: \bibitem[Ben-Yishai et~al., 1995]{BBS95}

1452: Ben-Yishai, R., Bar-Or, R.~L., and Sompolinsky, H. (1995).

1453: \newblock Theory of orientation tuning in visual cortex.

1454: \newblock {\em PNAS}, 92:3844--3848.

1455:

1456: \bibitem[Bishop, 1995]{Bish95}

1457: Bishop, C.~M. (1995).

1458: \newblock Training with noise is equivalent to {T}ikhonov regularization.

1459: \newblock {\em Neural Computation}, 7:108--116.

1460:

1461: \bibitem[Brotchie et~al., 1995]{BASG95}

1462: Brotchie, P.~R., Andersen, R.~A., Snyder, L.~H., and Goodman, S.~J.

1463: (1995).

1464: \newblock Head position signals used by parietal neurons to encode locations of

1465:   visual stimuli.

1466: \newblock {\em Nature}, 375:232--235.

1467:

1468: \bibitem[Carpenter and Grossberg, 1987]{carpenter87art2}

1469: Carpenter, G.~A. and Grossberg, S. (1987).

1470: \newblock {ART} 2: Self-organization of stable category recognition codes for

1471:   analog input patterns.

1472: \newblock {\em Applied Optics}, 26:4919--4930.

1473:

1474: \bibitem[Compte et~al., 2000]{CompteCortex00}

1475: Compte, A., Brunel, N., Goldman-Rakic, P., and Wang, X.-J. (2000).

1476: \newblock Synaptic mechanisms and network dynamics underlying spatial working

1477:   memory in a cortical network model.

1478: \newblock {\em Cerebral Cortex}, 10:910--923.

1479:

1480: \bibitem[Crist et~al., 2001]{CLG01}

1481: Crist, R.~E., Li, W., and Gilbert, C.~D. (2001).

1482: \newblock Learning to see: experience and attention in primary visual cortex.

1483: \newblock {\em Nature Neuroscience}, 4(4):519--525.

1484:

1485: \bibitem[Dayan and Abbott, 2001]{dayan-2001}

1486: Dayan, P. and Abbott, L. (2001).

1487: \newblock {\em Theoretical neuroscience: Computational and mathematical

1488:   modeling of neural systems}.

1489: \newblock MIT Press.

1490:

1491: \bibitem[Dean, 1981]{Dean81}

1492: Dean, A. (1981).

1493: \newblock The variability of discharge of simple cells in the cat striate

1494:   cortex.

1495: \newblock {\em Exp Brain Res}, 44:437--440.

1496:

1497: \bibitem[Gammaitoni et~al., 1998]{Gammaitoni98SR}

1498: Gammaitoni, L., H\"anggi, P., Jung, P., and Marchesoni, F. (1998).

1499: \newblock Stochastic resonance.

1500: \newblock {\em Rev. Mod. Phys.}, 70:223--287.

1501:

1502: \bibitem[Golub and van Loan, 1996]{GolubLoan96a}

1503: Golub, G.~H. and van Loan, C.~F. (1996).

1504: \newblock {\em Matrix Computations}.

1505: \newblock The Johns Hopkins University Press, Baltimore, 3rd edition.

1506:

1507: \bibitem[Hansel and Sompolinsky, 1998]{Hansel-Sompolinsky-98}

1508: Hansel, D. and Sompolinsky, H. (1998).

1509: \newblock Modeling feature selectivity in local cortical circuits.

1510: \newblock In Koch, C. and Segev, I., editors, {\em Methods in Neuronal

1511:   Modeling: From Synapse to Networks.}, pages 499--567. MIT Press, Cambridge,

1512:   MA.

1513:

1514: \bibitem[Haykin, 1999]{Hayk99}

1515: Haykin, S. (1999).

1516: \newblock {\em Neural Networks. {A} Comprehensive Foundation}.

1517: \newblock Upper Saddle River, NJ: Prentice Hall.

1518:

1519: \bibitem[Hertz et~al., 1991]{hertz91b}

1520: Hertz, J., Krogh, A., and Palmer, R.~G. (1991).

1521: \newblock {\em Introduction to the Theory of Neural Computation}.

1522: \newblock Addison-Wesley, New York.

1523:

1524: \bibitem[Hinton, 1989]{Hint89}

1525: Hinton, G.~E. (1989).

1526: \newblock Connectionist learning procedures.

1527: \newblock {\em Artificial Intelligence}, 40:185--234.

1528:

1529: \bibitem[Holt et~al., 1996]{HSKD96}

1530: Holt, G.~R., Softky, W.~R., Koch, C., and Douglas, R.~J. (1996).

1531: \newblock Comparison of discharge variability in vitro and in vivo in cat

1532:   visual cortex neurons.

1533: \newblock {\em Journal of Neurophysiology}, 75:1806--1814.

1534:

1535: \bibitem[Hopfield and Brody, 2004]{HB04}

1536: Hopfield, J.~J. and Brody, C.~D. (2004).

1537: \newblock Learning rules and network repair in spike-timing-based computation

1538:   networks.

1539: \newblock {\em Proc Natl Acad Sci USA}, 101:337--342.

1540:

1541: \bibitem[Kilgard and Merzenich, 1998]{KM98}

1542: Kilgard, M.~P. and Merzenich, M.~M. (1998).

1543: \newblock Plasticity of temporal information processing in the primary auditory

1544:   cortex.

1545: \newblock {\em Nature Neuroscience}, 1:727--731.

1546:

1547: \bibitem[Laing and Chow, 2001]{nc:laing+chow:2001}

1548: Laing, C.~R. and Chow, C.~C. (2001).

1549: \newblock Stationary bumps in networks of spiking neurons.

1550: \newblock {\em Neural Computation}, 13(7):1473--1494.

1551:

1552: \bibitem[Levin and Miller, 1996]{LM96}

1553: Levin, J.~E. and Miller, J.~P. (1996).

1554: \newblock Broadband neural encoding in the cricket cercal sensory system

1555:   enhanced by stochastic resonance.

1556: \newblock {\em Nature}, 380:165--168.

1557:

1558: \bibitem[McCloskey and Cohen, 1989]{mccloskey89catastrophic}

1559: McCloskey, M. and Cohen, N.~J. (1989).

1560: \newblock Catastrophic interference in connectionist networks: The sequential

1561:   learning problem.

1562: \newblock {\em The Psychology of Learning and Motivation}, 24:109--165.

1563:

1564: \bibitem[Murray and Edwards, 1994]{murray94enhanced}

1565: Murray, A.~F. and Edwards, P.~J. (1994).

1566: \newblock Enhanced {MLP} performance and fault tolerance resulting from

1567:   synaptic weight noise during training.

1568: \newblock {\em IEEE Transactions on Neural Networks}, 5(5):792--802.

1569:

1570: \bibitem[Nozaki et~al., 1999]{Nozaki99}

1571: Nozaki, D., Mar, D.~J., Grigg, P., and Collins, J.~J. (1999).

1572: \newblock Effects of colored noise on stochastic resonance in sensory neurons.

1573: \newblock {\em Physical Review Letters}, 82:2402--2405.

1574:

1575: \bibitem[Pouget and Sejnowski, 1997]{PS97}

1576: Pouget, A. and Sejnowski, T.~J. (1997).

1577: \newblock Spatial transformations in the parietal cortex using basis functions.

1578: \newblock {\em Journal of Cognitive Neuroscience}, 9:222--237.

1579:

1580: \bibitem[Press et~al., 1992]{PFTV92}

1581: Press, W.~H., Teukolsky, S.~A., Vetterling, W.~T., and Flannery,

1582: B.~P. (1992).

1583: \newblock {\em Numerical Recipes in {C}}.

1584: \newblock Cambridge University Press, New York.

1585:

1586: \bibitem[Renart et~al., 2003]{RSW03}

1587: Renart, A., Song, P., and Wang, X.~J. (2003).

1588: \newblock Robust spatial working memory through homeostatic synaptic scaling in

1589:   heterogeneous cortical networks.

1590: \newblock {\em Neuron}, 38:473--485.

1591:

1592: \bibitem[Salinas, 2003]{Sali03}

1593: Salinas, E. (2003).

1594: \newblock Background synaptic activity as a switch between dynamical states in

1595:   a network.

1596: \newblock {\em Neural Computation}, 15(7):1439--1475.

1597:

1598: \bibitem[Salinas, 2004]{Salinas-2004-2}

1599: Salinas, E. (2004).

1600: \newblock Context-dependent selection of visuomotor maps.

1601: \newblock {\em BMC Neuroscience}, 5(1):47.

1602:

1603: \bibitem[Salinas and Abbott, 1995]{Salinas+Abbott:1995}

1604: Salinas, E. and Abbott, L.~F. (1995).

1605: \newblock Transfer of coded information from sensory to motor networks.

1606: \newblock {\em Journal of Neuroscience}, 15:6461--6474.

1607:

1608: \bibitem[Salinas and Sejnowski, 2001]{Salinas+Sejnowski:2001}

1609: Salinas, E. and Sejnowski, T.~J. (2001).

1610: \newblock Gain modulation in the central nervous system: where behavior,

1611:   neurophysiology and computation meet.

1612: \newblock {\em Neuroscientist}, 2:539--550.

1613:

1614: \bibitem[Salinas and Thier, 2000]{ST00}

1615: Salinas, E. and Thier, P. (2000).

1616: \newblock Gain modulation: a major computational principle of the central

1617:   nervous system.

1618: \newblock {\em Neuron}, 27:15--21.

1619:

1620: \bibitem[Seung et~al., 2000]{SLRT00}

1621: Seung, H.~S., Lee, D.~D., Reis, B.~Y., and Tank, D.~W. (2000).

1622: \newblock Stability of the memory of eye position in a recurrent network of

1623:   conductance-based model neurons.

1624: \newblock {\em Neuron}, 26:259--271.

1625:

1626: \bibitem[Shadlen and Newsome, 1994]{SN94}

1627: Shadlen, M.~N. and Newsome, W.~T. (1994).

1628: \newblock Noise, neural codes and cortical organization.

1629: \newblock {\em Curr. Opin. Neurobiol.}, 4:569--579.

1630:

1631: \bibitem[Softky and Koch, 1992]{nc:Softky+Koch:1992}

1632: Softky, W.~R. and Koch, C. (1992).

1633: \newblock Cortical cells should fire regularly, but do not.

1634: \newblock {\em Neural Computation}, 4(5):643--646.

1635:

1636: \bibitem[Softky and Koch, 1993]{SK93}

1637: Softky, W.~R. and Koch, C. (1993).

1638: \newblock The highly irregular firing of cortical cells is inconsistent with

1639:   temporal integration of random epsps.

1640: \newblock {\em Journal of Neuroscience}, 13:334--350.

1641:

1642: \bibitem[Stevens and Zador, 1998]{SZ98}

1643: Stevens, C.~F. and Zador, A.~M. (1998).

1644: \newblock Input synchrony and the irregular firing of cortical neurons.

1645: \newblock {\em Nature Neuroscience}, 1:210--217.

1646:

1647: \bibitem[Turrigiano and Nelson, 2000]{TN00}

1648: Turrigiano, G.~G. and Nelson, S.~B. (2000).

1649: \newblock Hebb and homeostasis in neuronal plasticity.

1650: \newblock {\em Curr Opin Neurobiol}, 10:358--364.

1651:

1652: \bibitem[{van Kampen}, 1992]{VanK01}

1653: {van Kampen}, N.~G. (1992).

1654: \newblock {\em Stochastic Processes in Physics and Chemistry}.

1655: \newblock Elsevier, Amsterdam.

1656:

1657: \bibitem[Vilar and Rubi, 2000]{Vilar-Rubi-2000}

1658: Vilar, J.~M.~G. and Rubi, J.~M. (2000).

1659: \newblock {Scaling of Noise and Constructive Aspects of Fluctuations}.

1660: \newblock {\em Lecture Notes in Physics, Berlin Springer Verlag}, 557:121.

1661:

1662: \bibitem[Wang et~al., 1995]{XMSJ95}

1663: Wang, X., Merzenich, M.~M., Sameshima, K., and Jenkins, W. (1995).

1664: \newblock Remodelling of hand representation in adult cortex determined by

1665:   timing of tactile stimulation.

1666: \newblock {\em Nature}, 378:71--75.

1667:

1668: \bibitem[Zhang, 1996]{Zhang1996}

1669: Zhang, K. (1996).

1670: \newblock Representation of spatial orientation by the intrinsic dynamics of

1671:   the head-direction cell ensemble: a theory.

1672: \newblock {\em Journal of Neuroscience}, 16(6):2112--2126.

1673:

1674: \bibitem[Zipser and Andersen, 1988]{Zipser88}

1675: Zipser, D. and Andersen, R.~A. (1988).

1676: \newblock A back-propagation programmed network that simulates response

1677:   properties of a subset of posterior parietal neurons.

1678: \newblock {\em Nature}, 331:679--684.

1679:

1680: \end{thebibliography}

1681:

1682: \end{document}

1683: