
1: %

2: % ****** Start of file apssamp.tex ******

3: %

4: %   This file is part of the APS files in the REVTeX 4 distribution.

5: %   Version 4.0 of REVTeX, August 2001

6: %

7: %   Copyright (c) 2001 The American Physical Society.

8: %

9: %   See the REVTeX 4 README file for restrictions and more information.

10: %

11: % TeX'ing this file requires that you have AMS-LaTeX 2.0 installed

12: % as well as the rest of the prerequisites for REVTeX 4.0

13: %

14: % See the REVTeX 4 README file

15: % It also requires running BibTeX. The commands are as follows:

16: %

17: %  1)  latex apssamp.tex

18: %  2)  bibtex apssamp

19: %  3)  latex apssamp.tex

20: %  4)  latex apssamp.tex

21: %

22:

23: %\documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}

24: \documentclass[preprint,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}

25:

26: % Some other (several out of many) possibilities

27: %\documentclass[preprint,aps]{revtex4}

28: %\documentclass[preprint,aps,draft]{revtex4}

29: %\documentclass[prb]{revtex4}% Physical Review B

30:

31: \usepackage{graphicx}% Include figure files

32: \usepackage{dcolumn}% Align table columns on decimal point

33: \usepackage{bm}% bold math

34:

35: %\nofiles

36:

37: \begin{document}

38:

39: %\setlength{\baselineskip}{1cm}

40:

41: %\preprint{APS/123-QED}

42:

43: \title{Search for optimal measure for discriminating spike trains with different randomness}

44:

45: \author{Keiji Miura}

46: \email{miura@ton.scphys.kyoto-u.ac.jp}

47: \affiliation{Department of Physics, Graduate School of Sciences, Kyoto University Kyoto 606-8502, Japan}

48: \affiliation{``Intelligent Cooperation and Control'', PRESTO, JST, c/o The University of Tokyo, Chiba 277-8561, Japan\\}

49:

50: \author{Masato Okada}

51: \email{okada@k.u-tokyo.ac.jp}

52: \affiliation{Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan}

53: \affiliation{``Intelligent Cooperation and Control'', PRESTO, JST, c/o The University of Tokyo, Chiba 277-8561, Japan\\}

54: \affiliation{Laboratory for Mathematical Neuroscience, RIKEN Brain Science Institute, Saitama 351-0198, Japan}

55:

56: \author{Shigeru Shinomoto}

57: \email{shinomoto@scphys.kyoto-u.ac.jp}

58: \affiliation{Department of Physics, Graduate School of Sciences, Kyoto University Kyoto 606-8502, Japan}

59:

60: \date{\today}% It is always \today, today,

61:              %  but any date may be explicitly specified

62:

63: \begin{abstract}

64: We wish to discriminate spike sequences based on the degree of irregularity. For this

65: purpose, we search for a rational expression of quadratic functions of

66: consecutive interspike intervals that efficiently measures

67: spiking irregularity. Under natural assumptions, the functional form of the coefficient

68: can be parameterized by a single parameter. The parameter is determined so as to

69: maximize the mutual information between the distributions of coefficients computed for

70: spike sequences derived from different renewal point processes. We find that the local

71: variation of interspike intervals, $L_V$ (Neural Comput. Vol. 15, pp. 2823-42, 2003), is

72: nearly optimal for spike sequences whose intrinsic irregularity is close to that of experimental data.

73: % Valid PACS numbers may be entered using the \verb+\pacs{#1}+ command.

74: \end{abstract}

75:

76: \pacs{Valid PACS appear here}% PACS, the Physics and Astronomy

77:                              % Classification Scheme.

78: %\keywords{Suggested keywords}%Use showkeys class option if keyword

79:                               %display desired

80: % gamma distribution, mutual information, neuroscience, information geometry, inter spike intervals

81:

82: \maketitle

83:

84: \section{\label{sec1}Introduction}

85:

86: It is important to extract as much information as possible from spike sequences when

87: looking for correlations between animal behaviors and neuronal activities

88: \cite{georgopoulos,miyashita,funahashi,fujita} or controlling prosthetic apparatuses by

89: neuronal activities \cite{chapin}. In many cases, however, only the mean firing rate is

90: considered and the timing information is not taken into account. Consideration of detailed

91: temporal structure of each spike train would help to decode brain signals more efficiently.

92: We would like to propose a measure that augments the information provided by the mean

93: firing rate.

94:

95: Coefficients that are functions of the interspike intervals (ISIs) are effective

96: in detecting spiking irregularity from a short spike train. For instance, the coefficient

97: of variation, $C_V$, is widely adopted as a measure of the variance of ISIs

98: \cite{cox,abbott,shinomoto1,shinomoto4}.

99: Recently, a measure of the local variation of interspike intervals,

100: $L_V$, was proposed \cite{shinomoto}, as a natural extension of

101: $C_{V2}$ which was designed to detect a stepwise variation of consecutive ISIs

102: \cite{holt}.  An analysis using $L_V$ revealed that \textit{in vivo} spike sequences are

103: not uniformly random, but possess specific characteristics that vary among individual

104: neurons. In addition, it was found that the neocortex consists of heterogeneous neurons

105: that differ not only from one cortical area to another, but also from one layer to another

106: in their spiking patterns \cite{shinomoto7}.

107:

108: In the present study, we try to modify $L_V$ in an attempt to find a better measure for

109: discriminating spike sequences based on the degree of irregularity. Namely, we examine

110: rational expressions of quadratic functions of consecutive interspike intervals for

111: suitability as coefficients for measuring spiking irregularity. Under reasonable

112: assumptions, the functional form of the coefficient is found to be parameterized by a

113: single parameter. The parameter is determined so as to maximize the mutual

114: information between the distributions of coefficients computed for finite size sample

115: sequences derived from different renewal gamma processes. It is found that $L_V$ is

116: not optimal for nearly random Poisson spike trains but optimal for more regular spike

117: trains.

118:

119: In Sec.~\ref{sec2}, we explain how we generated spike sequences with the same

120: firing rate but different intrinsic irregularity. We show that a gamma

121: distribution suffices for that purpose and that two parameters in the gamma

122: distribution can be chosen as orthogonal coordinates.

123: In Sec.~\ref{sec3} we explain $L_V$ and compare it with $C_V$.

124: We show that the attractiveness of $L_V$ stems from its symmetries.

125: In Sec.~\ref{sec4} we extend $L_V$ and show that, under reasonable assumptions,

126: the extension of $L_V$ can be parameterized by a single parameter.

127: In Sec.~\ref{sec5} we explain how we determined the optimal value of the

128: parameter using the maximization principle of mutual information.

129: In Sec.~\ref{sec6}, we determine the optimal value numerically.

130: In Sec.~\ref{sec7}, we describe our theory, developed using a Gaussian

131: approximation, for explaining the results.

132: In Sec.~\ref{TD}, we discuss two non-stationary cases.

133:

134:

135: \section{\label{sec2}Generating spike trains with different randomness}

136: In this section, we explain how to generate spike trains with the same firing

137: rate but different randomness.

138:

139: There are many ways to generate spike trains artificially.

140: For example, we can generate spike trains by using a network of spiking neuron

141: models.

142: However, we do not need to describe precise spike timing here, and

143: a simple mechanism is desirable.

144: Therefore, we assume that the mechanism is a renewal process and

145: that the interspike interval (ISI) follows a gamma distribution \cite{cox},

146: which is described as

147: \begin{equation}

148: p(T) = \frac{1}{\Gamma(\kappa)}\left(\frac{\kappa}{\mu}\right)^\kappa T^{\kappa-1}e^{-\frac{\kappa}{\mu} T},

149: \label{gamma}

150: \end{equation}

151: where $T$ denotes an ISI.

152: We generate ISIs from the distribution and align them to make a spike train.

153: The mean and variance of the ISIs are

154: \begin{equation}

155: \left\{ \begin{array}{c}

156: Ex(T)=\mu\\

157: Var(T)=\frac{\mu^2}{\kappa}.

158: \label{expectation}

159: \end{array}\right.

160: \end{equation}

161: The mean firing rate is obtained by taking the inverse of the mean ISI

162: \cite{lansky}.

163: The parameter $\kappa$ is a shape parameter; $\kappa=1$ corresponds to an exponential

164: distribution, and, as $\kappa$ increases, the distribution approaches a normal

165: distribution.

166: The exponential distribution corresponds to a Poisson process in which the

167: firing rate (hazard function) is constant with time independent of the

168: previous firing time. The spike train looks random.

169: As $\kappa$ increases, the variance of the ISIs decreases, and the ISIs become

170: regular.

171:

172: Our goal is to find an optimal measure for discriminating two spike trains

173: with different randomness independent of their mean firing rates.

174: A gamma distribution is suitable for that purpose.

175: First, we can control the mean firing rate and randomness independently by

176: changing the two parameters ($\mu$ and $\kappa$) in the distribution.

177: Next, experimental data can be well fitted by the distribution.

178: For example, Baker et al. showed that the spike patterns recorded from

179: primary and supplementary motor areas are explicable using a gamma

180: distribution \cite{baker}.

181:

182: % \subsection{orthogonal coordinates of gamma distribution}

183: We can transform the parameters in a gamma distribution arbitrarily.

184: For example, we can transform the parameters into $(\alpha,\lambda)$:

185: \begin{equation}

186: \left\{ \begin{array}{c}

187: \alpha = \kappa,\\

188: \lambda = \frac{\kappa}{\mu}.

189: \end{array}\right.

190: \end{equation}

191: The gamma distribution in these coordinates can be written as

192: \begin{equation}

193: p(T) = \frac{\lambda^\alpha}{\Gamma(\alpha)}T^{\alpha-1}e^{-\lambda T},

194: \end{equation}

195: where $\lambda$ is a rate (inverse scale) parameter.

196: The mean and variance of the ISIs can be written as functions of $\alpha$

197: and $\lambda$ as

198: \begin{equation}

199: \left\{ \begin{array}{c}

200: Ex(T)=\frac{\alpha}{\lambda},\\

201: Var(T)=\frac{\alpha}{\lambda^2}.

202: \end{array}\right.

203: \end{equation}
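
As a quick consistency check (a short worked substitution added for illustration, using only the definitions above), inserting $\alpha=\kappa$ and $\lambda=\kappa/\mu$ into these expressions recovers Eq.~(\ref{expectation}):
\begin{equation}
Ex(T)=\frac{\alpha}{\lambda}=\kappa\,\frac{\mu}{\kappa}=\mu,\qquad
Var(T)=\frac{\alpha}{\lambda^2}=\kappa\,\frac{\mu^2}{\kappa^2}=\frac{\mu^2}{\kappa}.
\end{equation}
The two parameterizations therefore describe the same family of distributions.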

204: Thus, there are many ways of writing (parameterizing) a gamma distribution.

205: We used the expression shown as Eq.~(\ref{gamma}) because $\mu$ corresponds to

206: the mean ISI and $\kappa$ is orthogonal to it in the sense of information

207: geometry \cite{amari2,amari3}.

208: The proof is shown in Appendix \ref{appendixA}.

209: We call the parameters of a gamma distribution coordinates because we regard

210: the family of gamma distributions as a manifold.

211: We would like to define randomness as information orthogonal to the firing

212: rate.

213: Therefore, we regard $\kappa$ as randomness in what follows.

214: We generate spike trains having different intrinsic randomness by using the

215: gamma distributions with different values of $\kappa$.

216:

217: \section{\label{sec3}$L_V$ and $C_V$}

218:

219: \begin{figure}[t]

220: \includegraphics[width=70mm]{TDGlv.ps}

221:  \caption{\label{TDGlv}$L_V$ for doubly stochastic gamma process with various

222: values of time constant $\tau$ and rate amplitude $\Delta$.}

223: \end{figure}

224:

225: The measure of local variation proposed by Shinomoto et al. \cite{shinomoto}

226: is defined as

227: \begin{equation}

228: L_V = \frac{1}{n-1}\sum_{i=1}^{n-1} \frac{3(T_i-T_{i+1})^2}{(T_i+T_{i+1})^2},

229: \end{equation}

230: where $T_i$ denotes the $i$-th ISI in a spike train.

231: The factor 3 is included so that the mean, $\overline{L_V}$, is 1 for a Poisson

232: process.

233: $L_V$ is large when consecutive ISIs differ.

234: It is dimensionless and invariant if all the ISIs are multiplied by a constant.
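
The normalization can be checked with a short calculation. This is only a sketch: it assumes that consecutive ISIs $T_i$ and $T_{i+1}$ are independent draws from the gamma distribution of Eq.~(\ref{gamma}), and the auxiliary variable $U$ is introduced here purely for illustration. Under this assumption the ratio $U=T_i/(T_i+T_{i+1})$ follows a beta distribution with both shape parameters equal to $\kappa$, so that $Ex(U)=\frac{1}{2}$ and $Var(U)=\frac{1}{4(2\kappa+1)}$. Because $(T_i-T_{i+1})/(T_i+T_{i+1})=2U-1$,
\begin{equation}
Ex(L_V) = 3\,Ex\left((2U-1)^2\right) = 12\,Var(U) = \frac{3}{2\kappa+1},
\end{equation}
which equals 1 for $\kappa=1$ (the Poisson case) and decreases toward 0 as $\kappa$ grows.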

235: The conventional $C_V$ is defined as \cite{holt}

236: \begin{equation}

237: C_V \equiv \frac{\sqrt{Var(T)}}{Ex(T)}.

238: \end{equation}
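
For orientation, a one-line evaluation for the stationary gamma process may help (a worked substitution added for illustration, using the mean and variance of Eq.~(\ref{expectation})):
\begin{equation}
C_V = \frac{\sqrt{\mu^2/\kappa}}{\mu} = \frac{1}{\sqrt{\kappa}}.
\end{equation}
In the stationary case, $C_V$ is thus independent of $\mu$ and determined by $\kappa$ alone; any difference between $C_V$ and $L_V$ can appear only when the firing rate varies in time.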

239: Next we examine the difference between $L_V$ and $C_V$ and calculate $L_V$ and

240: $C_V$ for the rate modulated gamma process.

241:

242: We define a rate modulated gamma process as an extension of the renewal gamma

243: process in which the firing rate, $\lambda(t)(=\frac{1}{\mu(t)})$, is

244: time-dependent while $\kappa$ is time-independent.

245: The spikes for the rate modulated gamma process are generated as follows

246: \cite{abbott,brown}.

247: Note that we consider only the case of integer $\kappa$.

248: A spike is generated with probability $\lambda(t) dt$ for every small time

249: step, $dt$.

250: To be precise, we generate a uniform random number and if it is less than

251: $\lambda(t) dt$, we generate a spike at that time step.

252: For the case where $\kappa$ is larger than 1, we keep every $\kappa$-th spike

253: and remove the others. What is left is the desired sequence.

254: In fact, for the case where $\lambda$ is constant over time, the spike

255: sequence generated in this way is equivalent to that generated from a renewal

256: gamma distribution with $\mu=\frac{1}{\lambda}$.

257:

258: Here we consider a doubly stochastic gamma process whose firing rate obeys the

259: Ornstein-Uhlenbeck process \cite{shinomoto6}.

260: We assume the firing rate, $\lambda$, satisfies

261: \begin{equation}

262: \frac{d\lambda}{dt}=-\frac{\lambda-\lambda_0}{\tau}+\Delta\sqrt{\frac{2}{\tau}}\xi(t),

263: \end{equation}

264: where $\xi$ is Gaussian white noise with $\langle\xi(t)\rangle=0$ and $\langle\xi(t)\xi(t')\rangle=\delta(t-t')$.

265:

266: \begin{figure}[t]

267: \includegraphics[width=70mm]{TDGcv.ps}

268:  \caption{\label{TDGcv}$C_V$ for doubly stochastic gamma process with various

269: values of time constant $\tau$ and rate amplitude $\Delta$.}

270: \end{figure}

271:

272: Fig.~\ref{TDGlv} and Fig.~\ref{TDGcv} show $L_V$ and $C_V$ with $\lambda_0=1$

273: for various values of time constant $\tau$ and rate amplitude $\Delta$.

274: For simplicity, we consider sufficiently long spike sequences and

275: assume that the values of $L_V$ and $C_V$ converge.

276: Fig.~\ref{TDGlv} shows that in the limit of a large time constant, the values

277: of $L_V$ converge to the value for the stationary case.

278: This means that the value of $L_V$ does not depend on the amplitude of the

279: firing rate and has one-to-one correspondence with $\kappa$ in this limit.

280: Fig.~\ref{TDGcv} shows that $C_V$ depends on both $\kappa$ and $\Delta$ and

281: does not have one-to-one correspondence with $\kappa$.

282: Therefore, $L_V$ is better than $C_V$ for discriminating the intrinsic

283: randomness of spike sequences.

284:

285: This attractive property seems to stem from the fact that $L_V$ is the sum of

286: the dimensionless terms of consecutive interspike intervals.

287: By ``dimensionless'' we mean that the numerator and denominator have the same

288: dimension.

289: Every term in $L_V$ is normalized locally by the average of two consecutive

290: interspike intervals instead of the global average.

291: Intuitively, because the firing rates for two consecutive interspike intervals

292: can be regarded as the same in the slow limit, the terms should be the same as

293: those for the stationary case.

294: On the other hand, $C_V$ measures the variation around the global mean of the ISIs

295: and can be large for both the case where the firing rate fluctuates

296: significantly and the case where the intrinsic randomness is large.

297: Therefore, we cannot distinguish the two cases based on the value of $C_V$.

298:

299: \section{\label{sec4}Measure of local variation}

300: We extend $L_V$ without losing its attractive property described in the

301: previous section and find a better measure of intrinsic randomness.

302: We do this by focusing on the ISI statistics and imposing three symmetry

303: conditions: (1) time translation invariance, (2) time-scale transformation

304: invariance, and (3) time inversion invariance.

305:

306: We assume the randomness of a spike train is constant over time and

307: define the extended $L_V$ as

308: \begin{equation}

309: \widetilde{L_V} = \frac{1}{n-1}\sum_{i=1}^{n-1} f(T_i,T_{i+1}),

310: \end{equation}

311: where $T_1,T_2,\ldots,T_n$ are the observed ISIs and $f(x,y)$ does not depend on

312: $i$ explicitly.

313: This form guarantees invariance under time translation ($i\rightarrow i+1$)

314: if $n$ is infinite.

315: Next, we assume that $f$ is invariant under the time-scale

316: transformation ($T\rightarrow kT$).

317: This requires that the denominator and numerator of $f$ have the same

318: dimension.

319: For simplicity, we assume that the dimension is two, so $f$ can be written as

320: \begin{equation}

321: f(x,y) = \frac{c_1 x^2 + c_2 xy + c_3 y^2}{c_4 x^2 + c_5 xy + c_6 y^2},

322: \end{equation}

323: which includes the original $L_V$ as a specific case.

324: In addition, because we do not distinguish increases from decreases in the

325: firing rate in terms of randomness, we impose time inversion invariance and

326: require

327: \begin{equation}

328: f(x,y) = f(y,x).

329: \end{equation}

330: Thus, $f$ can be written as

331: \begin{equation}

332: f(x,y) = \frac{c_1 x^2 + c_2 xy + c_1 y^2}{c_4 x^2 + c_5 xy + c_4 y^2}.

333: \end{equation}

334: Note that the absolute value of $L_V$ does not matter in discriminant

335: analysis, so we can add a constant to $f$ or multiply it by a nonzero constant.

336: Then, without loss of generality, $f$ can be written as

337: \begin{equation}

338: f(x,y) = \frac{xy}{x^2 + c_5 xy + y^2}.

339: \end{equation}
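
The reduction can be made explicit as follows. This is a sketch that assumes $c_4\neq0$; the shorthand $a=c_1/c_4$, $b=c_2/c_4$, and $c_5'=c_5/c_4$ is introduced only for this illustration. Dividing the numerator and denominator of the symmetric form by $c_4$ gives
\begin{equation}
f(x,y) = \frac{a(x^2+y^2) + b\,xy}{x^2 + c_5' xy + y^2}
       = a + (b - a c_5')\frac{xy}{x^2 + c_5' xy + y^2}.
\end{equation}
Dropping the additive constant $a$ and the multiplicative factor $b-ac_5'$ (assumed to be nonzero), and renaming $c_5'$ as $c_5$, leaves the form given above.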

340: In addition, we can rewrite the denominator using $c=c_5+2$:

341: \begin{equation}

342: f(x,y) = \frac{xy}{(x-y)^2 + c xy}.

343: \label{f}

344: \end{equation}

345: Because $(x-y)^2\geq0$ vanishes at $x=y$ while $xy>0$ for positive ISIs, the necessary and

346: sufficient condition that the denominator always be positive is $c>0$.

347:

348: As a result, $\widetilde{L_V}$ can be written as

349: \begin{equation}

350: \widetilde{L_V}(c) = \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{T_i T_{i+1}}{(T_i-T_{i+1})^2 + c T_i T_{i+1}}.

351: \end{equation}

352: Note that the original $L_V$ corresponds to the case of $c=4$.

353: In this way, the measures satisfying the symmetries have only one degree of

354: freedom and can be parametrized by a single parameter.
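
The correspondence with $c=4$ can be verified directly (a short check added for illustration). Since $(x-y)^2+4xy=(x+y)^2$, each term of the original $L_V$ satisfies
\begin{equation}
\frac{3(x-y)^2}{(x+y)^2} = 3 - 12\,\frac{xy}{(x+y)^2},
\end{equation}
so that $L_V = 3 - 12\,\widetilde{L_V}(4)$. The two quantities are related by an affine map and are therefore equivalent for the purpose of discrimination.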

355:

356: \begin{figure}[t]

357: \includegraphics[width=70mm]{TDGlv1.ps}

358:  \caption{\label{lv1}$\widetilde{L_V}(1)$ for doubly stochastic gamma process

359: with various values of time constant $\tau$ and rate amplitude $\Delta$.}

360: \end{figure}

361:

362: $\widetilde{L_V}$ should have one-to-one correspondence with $\kappa$

363: like $L_V$ because of its symmetries.

364: In fact, it has the same values as those for the stationary case in the

365: limit of a large time constant for the doubly stochastic gamma process.

366: Fig.~\ref{lv1} shows that $\widetilde{L_V}(1)$ is independent of the rate

367: amplitude, $\Delta$, and is a function of $\kappa$ in the limit.

368: The results for other values of $c$, for instance $\widetilde{L_V}(16)$,

369: remain the same.

370:

371: Thus, $\widetilde{L_V}(c)$ has one-to-one correspondence with $\kappa$.

372: However, this is not sufficient to make it a good measure.

373: So far, we have considered only spike sequences of infinite length.

374: However, in practical experimental situations, data sizes are limited, and

375: $\widetilde{L_V}(c)$ varies widely by trial around the mean.

376: Similarly, if spike sequences are generated using a gamma distribution,

377: $\widetilde{L_V}(c)$ varies by trial for the finite spike sequence.

378: In the discrimination of intrinsic randomness, roughly speaking, the smaller

379: the variance, the higher the hit rate.

380: Thus, we next search for an optimal value of the parameter $c$, where the variance

381: is the smallest.

382:

383: \section{\label{sec5}Mutual information maximization principle}

384: We use the mutual information maximization principle to determine an optimal

385: measure.

386: We assume that the firing rate is constant over time and spike sequences are

387: generated by a gamma distribution, as shown in Sec.~\ref{sec2}.

388: As shown in Sec.~\ref{sec3}, $\widetilde{L_V}$ does not depend on $\mu$.

389: Here we set $\mu=1$.

390: We consider the stationary case because it is tractable and

391: can be regarded as the slow change limit of the firing rate.

392: We show in Sec.~\ref{TD} that the optimal value of $c$ for the nonstationary

393: case does not differ significantly from that for the stationary case.

394:

395: The optimal parameter value is determined so as to maximize the mutual

396: information between the coefficients and randomness.

397: Here we assume that a spike train consists of 100 ISIs because this is the

398: typical length available from laboratory experiments.

399: $\widetilde{L_V}$ can be computed for a spike train, and the value of

400: $\widetilde{L_V}$ varies among spike trains.

401: Even if spike trains are generated from the same distribution, the values of

402: $\widetilde{L_V}$ can differ because the length of a spike train is finite.

403: As a result, the distribution of $\widetilde{L_V}$ can be obtained for one

404: parameter set of the gamma distribution.

405: Thus, two distributions can be obtained from two types of spike trains.

406: The mutual information can be computed from the two distributions.

407: The bigger the mutual information, the better randomness ($\kappa$) can be

408: discriminated based on the observed $\widetilde{L_V}$.

409:

410: Mutual information is calculated as follows.

411: Spike trains are generated from two gamma distributions with equal probability,

412: $\frac{1}{2}$.

413: The two distributions have different values of $\kappa$.

414: All the ISIs in a spike train are generated by using the same distribution.

415: We denote the distribution of $\widetilde{L_V}$ generated from the

416: $i(=1,2)$-th gamma distribution as $p(x|i)$;

417: $p(x)(=\frac{1}{2}p(x|1)+\frac{1}{2}p(x|2))$ represents the distribution of

418: $\widetilde{L_V}$ with no distinction of the source.

419: The entropy is defined as

420: \begin{equation}

421: H = -\int p(x) \ln p(x) dx.

422: \end{equation}

423: The noise entropy is defined as

424: \begin{equation}

425: H_{n}=-\frac{1}{2}\int p(x|1)\ln p(x|1) dx -\frac{1}{2}\int p(x|2)\ln p(x|2)dx.

426: \end{equation}

427: The mutual information is the difference,

428: \begin{equation}

429: I_m = H - H_{n}.

430: \end{equation}
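
Equivalently, the same quantity can be written in terms of Kullback-Leibler divergences. This is a standard identity, stated here as a sketch; the symbol $D$ is introduced only for this remark. Using $p(x)=\frac{1}{2}p(x|1)+\frac{1}{2}p(x|2)$ in the definitions above gives
\begin{equation}
I_m = \frac{1}{2}D\left(p(x|1)\,\|\,p(x)\right) + \frac{1}{2}D\left(p(x|2)\,\|\,p(x)\right),
\qquad
D(q\,\|\,p) = \int q(x)\ln\frac{q(x)}{p(x)}\,dx,
\end{equation}
which is the Jensen-Shannon divergence between $p(x|1)$ and $p(x|2)$. It satisfies $0\leq I_m\leq\ln 2$, and the upper bound, $\ln 2$ in natural units or 1 bit, is reached when the two distributions do not overlap.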

431:

432: The mutual information is the reduction in uncertainty about the spike trains

433: due to the knowledge of $\widetilde{L_V}$.

434: Measured in bits (that is, with base-2 logarithms in the definitions above), the mutual information is $0$ if the two distributions of $\widetilde{L_V}$ are

435: identical, so that they cannot be distinguished.

436: It is $1$ if the two distributions of $\widetilde{L_V}$ have no

437: overlap, so that only one sample of $\widetilde{L_V}$ is needed to distinguish

438: them.

439:

440: In the next section, we will show the results of a Monte Carlo simulation.

441: We calculated mutual information as a function of $c$ for various sets of

442: randomness, $\kappa_1$ and $\kappa_2$.

443:

444: \begin{figure}[t]

445:   \includegraphics[width=70mm]{minfo1.ps}

446:   \caption{\label{minfo1}Mutual information with $\kappa_1=1,\kappa_2=1.1$. Open circle denotes peak. Dotted line is for $c=4$ corresponding to original $L_V$. Mutual information has a peak with $c$ larger than $4$.}

447: \end{figure}

448:

449: \section{\label{sec6}Results}

450: Fig.~\ref{minfo1} shows the mutual information with $\kappa_1=1$ and

451: $\kappa_2=1.1$; $\kappa_1$ and $\kappa_2$ are the shape parameters of two

452: gamma distributions and $c$ is the parameter in $\widetilde{L_V}(c)$.

453: We set the number of ISIs per spike train, $n$, to 100.

454: The mutual information has a peak, whose location we denote by $c_{peak}$.

455: The vertical line represents $c=4$, which corresponds to the original $L_V$.

456: Since $c_{peak} (\approx16)$ is bigger,

457: the optimal coefficient in this case is not the original $L_V$ but

458: $\widetilde{L_V}(16)$.

459: However, $c_{peak}$ depends on various parameters, and we will examine how it

460: depends on the number of ISIs per spike train, $\kappa_1$ and $\kappa_2$, in

461: what follows.

462: We can use the maximum likelihood estimator of $\kappa$ as a measure instead

463: of $L_V$, and the peak value of the mutual information for $\kappa$ is 0.097.

464: (For the maximum likelihood estimator, see Appendix \ref{appendixB}.)

465: The peak value for $L_V$ is about 0.066, which is smaller than that for the

466: maximum likelihood estimator.

467: We nonetheless use $L_V$ because the maximum likelihood estimator cannot be

468: applied to the nonstationary case.

469: In the cases where the firing rate is time-dependent, the mutual information

470: for $L_V$ can be much higher than that for the maximum likelihood estimator,

471: as we will show in Sec.~\ref{TD}.

472:

473: \begin{figure}[t]

474:   \includegraphics[width=70mm]{minfo-nspike.ps}

475:   \caption{\label{minfo-nspike}Mutual information for various numbers of ISIs

476: per spike sequence with $\kappa_1=1,\kappa_2=1.1$.

477: Open circles denote peaks.

478: Dotted line is for $c=4$ corresponding to original $L_V$.

479: Peak location hardly depends on number of ISIs.}

480: \end{figure}

481:

482: Fig.~\ref{minfo-nspike} shows the mutual information for various numbers of

483: ISIs per spike train.

484: While the mutual information increases with the number of ISIs,

485: the peak location remains almost the same.

486: Although we show only the case for $\kappa_1=1,\kappa_2=1.1$,

487: the other cases have similar results.

488: Therefore, we set the number of ISIs per spike train to $100$.

489:

490: Fig.~\ref{minfo-0to1} shows the mutual information with $\kappa_1=1$ and

491: various $\kappa_2$.

492: As $d\kappa(=\kappa_2-\kappa_1)$ increases, the mutual information approaches

493: $1$.

494: The peak location remains almost unchanged $(c_{peak}\approx16)$.

495: For $\kappa_2=3.2$, the mutual information is almost $1$, and

496: the two distributions are completely distinguishable.

497: In general, $c_{peak}$ largely depends on $\kappa_1$ and is almost independent

498: of $\kappa_2$.

499:

500: \begin{figure}[t]

501:   \includegraphics[width=70mm]{minfo-0to1.ps}

502:   \caption{\label{minfo-0to1}Mutual information for various $\kappa_2$ with $\kappa_1=1$.

503: Lines are for $\kappa_2=0.1, 0.2, 0.4, 0.8, 1.6$ and $3.2$ from below.

504: Open circles denote peaks.

505: Dotted line is for $c=4$ corresponding to original $L_V$.

506: Peak location hardly depends on $\kappa_2$.}

507: \end{figure}

508:

509: Fig.~\ref{minfo-peak} shows the mutual information with $\kappa_2=1.3\kappa_1$

510: and various $\kappa_1$.

511: The peak location decreases with increasing $\kappa_1$.

512: For $\kappa_1=16$, the original $L_V$ is nearly optimal

513: ($c_{peak}\approx4\sqrt{2}$).

514: Since reported experimental data can be well fitted by a gamma distribution

515: with $\kappa\approx16$ \cite{baker}, $L_V$ seems to be optimal not for the

516: Poisson data but for the experimental data.

517:

518: \begin{figure}[t]

519:   \includegraphics[width=70mm]{minfo-peak.ps}

520:   \caption{\label{minfo-peak}Mutual information for various $\kappa_1$ with $\kappa_2=1.3\kappa_1$.

521: Open circles denote peaks.

522: Dotted line is for $c=4$ corresponding to original $L_V$.

523: Peak location decreases as $\kappa_1$ increases.}

524: \end{figure}

525:

526: \section{\label{sec7}Theoretical analysis}

527: In this section we analyze the property of the mutual information

528: theoretically.

529: For simplicity, we make two approximations.

530:

531: First, we consider the limit of a large number of ISIs per spike train and

532: approximate the distribution of $L_V$ by using the normal distribution.

533: Although this approximation is not good for $c\approx 0$,

534: the peak location is far larger than 0 and can be discussed within this

535: approximation.

536:

537: In addition, we consider the limit of small $d\kappa$.

538: In the limit, the mutual information can be written using the Fisher

539: information \cite{lehmann} as

540: \begin{equation}

541: I_m = \frac{1}{8}J(p(x,\kappa))d\kappa^2,

542: \label{eighth}

543: \end{equation}

544: where the Fisher information is defined as

545: \begin{equation}

546: J(p(x,\kappa))= Ex\left(\left(\frac{d\log p(x,\kappa)}{d\kappa}\right)^2\right).

547: \end{equation}

548: This relation can be easily derived.

549: We represent two $L_V$ distributions as

550: \begin{equation}

551: p_1(x)=\frac{1}{\sqrt{2\pi\sigma(\kappa)^2}}e^{-(x-m(\kappa))^2/2\sigma(\kappa)^2}

552: \end{equation}

553: and

554: \begin{equation}

555: p_2(x)=\frac{1}{\sqrt{2\pi\sigma(\kappa+d\kappa)^2}}e^{-(x-m(\kappa+d\kappa))^2/2\sigma(\kappa+d\kappa)^2} .

556: \end{equation}

557: Inserting these equations into the definition of the mutual information and

558: expanding by $d\kappa$ to the second order lead to the relation.
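
One way to organize this expansion is sketched below, under the stated small-$d\kappa$ assumption; the symbols $\bar{p}$ and $D$ are introduced here only for illustration. Writing $\bar{p}=\frac{1}{2}(p_1+p_2)$ and $D(q\,\|\,\bar{p})=\int q\ln(q/\bar{p})\,dx$, the mutual information is $I_m=\frac{1}{2}D(p_1\|\bar{p})+\frac{1}{2}D(p_2\|\bar{p})$. To leading order, $p_2-\bar{p}=-(p_1-\bar{p})=\frac{d\kappa}{2}\frac{\partial p}{\partial\kappa}$ and $D(q\,\|\,\bar{p})\simeq\frac{1}{2}\int (q-\bar{p})^2/\bar{p}\,dx$, so that
\begin{equation}
I_m \simeq \frac{1}{2}\int \frac{1}{p}\left(\frac{d\kappa}{2}\frac{\partial p}{\partial\kappa}\right)^2 dx
    = \frac{d\kappa^2}{8}\int p\left(\frac{\partial \ln p}{\partial\kappa}\right)^2 dx
    = \frac{1}{8}J\,d\kappa^2 ,
\end{equation}
in agreement with Eq.~(\ref{eighth}).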

559:

560: The Fisher information can be explicitly written as

561: \begin{equation}

562: J=\frac{m'(\kappa)^2+2\sigma'(\kappa)^2}{\sigma(\kappa)^2}.

563: \end{equation}

564: Because $\sigma^2$ is inversely proportional to the number of ISIs per spike train, $N$ (denoted $n$ above), $\sigma$ can be written as

565: \begin{equation}

566: \sigma = \frac{\sigma_0}{\sqrt{N}}.

567: \end{equation}

568: The Fisher information can then be approximated as

569: \begin{eqnarray}

570: J/N &=& \frac{m'(\kappa)^2+2\frac{1}{N}\sigma_0'(\kappa)^2}{\frac{1}{N}\sigma_0(\kappa)^2}\frac{1}{N}\nonumber\\

571:   &\simeq& \frac{m'(\kappa)^2}{\sigma_0(\kappa)^2},

572: \end{eqnarray}

573: where $m'$ and $\sigma_0$ depend only on $\kappa$ and $c$.

574: As a result, the mutual information can be written as

575: \begin{equation}

576: I_m = \frac{1}{8} \frac{m'(\kappa,c)^2}{\sigma_0(\kappa,c)^2} N d\kappa^2.

577: \end{equation}

578:

579: Thus, $I_m$ is proportional to $N$ and $d\kappa^2$.

580: The $c$ dependence of $I_m$ stems only from $m'$ and $\sigma_0$.

581: Therefore, when $N$ or $d\kappa$ changes, the absolute value of the mutual

582: information changes while the peak location does not change.

583: This is consistent with our numerical results in which the peak location

584: did not depend on $N$ and $d\kappa$.

585: The peak location can be explained by an interplay of $m'$ and $\sigma_0$.

586: However, $m$ and $\sigma_0$ cannot be predicted solely by our theory.

587: Numerical calculations are necessary for finding the peak location.

588:

589: \section{\label{TD}Nonstationary case}

590: We considered the discrimination of randomness for the stationary gamma

591: process in the previous sections.

592: However, it has been reported that experimental data can be explained by

593: the rate-modulated gamma process \cite{baker}.

594: Therefore, we consider the rate-modulated gamma process in this section.

595: We show two simple cases in which the firing rate decreases monotonically

596: or changes stepwise.

597:

598: \subsection{Monotonically decreasing firing rate}

599:

600: \begin{figure}[t]

601:   \includegraphics[width=70mm]{td-4.ps}

602:   \caption{\label{td-4}Mutual information for monotonically decreasing firing rate for various $r$ with $\kappa_1=4$ and $\kappa_2=5.2$.}

603: \end{figure}

604:

605: We consider a simple rate-modulated case and show that the peak location of

606: the mutual information, $c_{peak}$, tends not to change unless the firing rate

607: fluctuates significantly.

608: We generate the ISIs by again using a gamma distribution.

609: We assume that the mean ISI increases monotonically.

610: For simplicity, we set $\mu_i=r^i$, where $\mu_i$ denotes the mean of the

611: $i$-th ISI.

612: We simply align $n$ ISIs to make a single spike train as before.

613: The value of $\kappa$ does not change within the train.

614: The mutual information is calculated for two spike trains with different

615: values of $\kappa$.

616:

617: Fig.~\ref{td-4} shows the mutual information for $\kappa_1=4$ and

618: $\kappa_2=5.2$.

619: The peak location decreases gradually from the stationary value as $r$ increases.

620: However, only extreme cases, in which the firing rate decreases by a factor of

621: more than 1.5 from each ISI to the next, are plotted.

622: For realistic cases, $c_{peak}$ changes only slightly.

623: For example, the ratio between the last and first mean ISI is

624: \begin{equation}

625: \frac{\mu_n}{\mu_1}=r^{n-1},

626: \end{equation}

627: and the ratio is 2.678033 for $r=1.01$ and $n=100$ and 12527.83 for

628: $r=1.1$ and $n=100$.

629: This illustrates that the $1.5$ used for $r$ is extremely large.
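As a rough check, the ratio can be evaluated through $r^{n-1}=\exp[(n-1)\ln r]$. The first two rows below reproduce the values quoted above; the third row is an illustrative extrapolation that assumes $n=100$ were also used for $r=1.5$, which is our assumption for this example only:
\begin{eqnarray*}
r=1.01: && 99\ln 1.01 \simeq 0.985, \qquad \mu_n/\mu_1 \simeq 2.68,\\
r=1.1: && 99\ln 1.1 \simeq 9.436, \qquad \mu_n/\mu_1 \simeq 1.25\times 10^{4},\\
r=1.5: && 99\ln 1.5 \simeq 40.14, \qquad \mu_n/\mu_1 \simeq 2.7\times 10^{17}.
\end{eqnarray*}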

630: Similar results were obtained for different values of $\kappa$, so

631: $c_{peak}$ apparently tends not to change even if the firing rate fluctuates.

632: This result is not restricted to the decreasing firing rate case.

633: For example, the mean $\widetilde{L_V}$ remains the same if a small and a

634: large mean ISI appear alternately instead of the firing rate decreasing

635: monotonically.

636: It thus appears that the peak location of the mutual information is almost

637: independent of the firing rate if the variation in the firing rate is small.

638:

639: \subsection{stepwise changing firing rate}

640: % especially if the change of the firing rate is so slow that the firing rates for consecutive two ISIs are almost the same.

641: \begin{figure}[t]

642:   \includegraphics[width=70mm]{stairs.ps}

643:   \caption{\label{stairs}

644: Schematic diagram of stepwise increasing firing rate. Firing rate shifts from $1$ to $\lambda_2$ at $t=50$.}

645: \end{figure}

646:

647: So far we have considered only $\widetilde{L_V}(c)$.

648: However, the maximum likelihood estimator, $\hat{\kappa}$, should be better

649: for the stationary case.

650: Here we consider the case of a stepwise changing firing rate to show why

651: we favor $\widetilde{L_V}(c)$ nonetheless.

652: In short, $\hat{\kappa}$ is not good for the nonstationary case because it is

653: the maximum likelihood estimator for the stationary case, as shown in Appendix

654: \ref{appendixB}.

655: In principle, the firing rate at every small time bin can be estimated for the

656: nonstationary case.

657: However, doing so requires many spike sequences and the firing rate profile

658: must be the same for all the sequences.

659: Therefore, it is not practical for many realistic cases.

660: Instead we consider simple measures like $\widetilde{L_V}(c)$ and

661: $\hat{\kappa}$ even in the nonstationary case.

662: In this section, we compare $L_V$ and $\hat{\kappa}$ for the nonstationary

663: case.

664:

665: Consider the case in which the firing rate is stepwise increasing, as shown in

666: Fig.~\ref{stairs}.

667: At time $t=50$, it shifts from $1$ to $\lambda_2$.

668: Two types of spike trains, with $\kappa_1=16$ and $\kappa_2=20$, are generated

669: based on the firing rate profile.

670: Fig.~\ref{step-mle} shows the mutual information for these trains

671: when $L_V$ or $\hat{\kappa}$ is used as a measure.

672: The mutual information for $L_V$ is independent of $\lambda_2$ in the limit of

673: a large number of ISIs per train.

674: The reason is that $L_V$ is independent of the firing rate for the stationary

675: case and in this case the firing rate is constant over time except for the

676: discontinuous point.

677: The contribution of the term in $L_V$ that crosses the discontinuous point is

678: $O(1/n)$ and is small if the number of ISIs is large enough.

679: We plotted the value for the stationary case, neglecting the contribution for

680: simplicity.
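The $O(1/n)$ estimate above can be made explicit by a rough bound. We assume here the standard definition of the local variation, restated as an assumption because it is given outside this section:
\[
L_V = \frac{3}{n-1}\sum_{i=1}^{n-1}\left(\frac{T_i-T_{i+1}}{T_i+T_{i+1}}\right)^2 .
\]
Each summand lies between $0$ and $1$, since $|T_i-T_{i+1}|\le T_i+T_{i+1}$ for positive ISIs. The single summand whose two ISIs straddle the discontinuity can therefore shift $L_V$ by at most $3/(n-1)$, whatever the value of $\lambda_2$.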

681: On the other hand, the mutual information for $\hat{\kappa}$ decreases as

682: $\lambda_2$ increases.

683: For example, when the firing rate increases 1.5 times, the mutual information

684: for $L_V$ is larger than that for $\hat{\kappa}$.

685:

686: \begin{figure}[t]

687:   \includegraphics[width=70mm]{step-mle.ps}

688:   \caption{\label{step-mle}Mutual information for stepwise increasing firing rate with $\kappa_1=16$ and $\kappa_2=20$.}

689: \end{figure}

690:

691: Thus, for a stepwise increasing firing rate, $L_V$ is better than

692: $\hat{\kappa}$.

693: This type of sudden change can be observed when a visual stimulus is presented

694: to a monkey at a given time.

695: The result remains almost the same for the stepwise firing rate with multiple

696: discontinuous points in the limit of a large number of ISIs.

697: In addition, $\hat{\kappa}$ depends on both $\kappa$ and the amplitude of the

698: firing rate, as shown in Sec.~\ref{sec2} for $C_V$.

699: Therefore, $L_V$ is a better measure of intrinsic randomness.

700:

701: \section{summary and discussion}

702: In this study, we sought a measure more effective than the local variation of interspike

703: intervals, $L_V$, in discriminating spike trains based on the degree of intrinsic spiking

704: irregularity.

705:

706: We first compared characteristics of the conventional coefficient of variation, $C_V$,

707: and the local variation, $L_V$. The coefficient of variation, $C_V$, measures a global

708: variability of ISIs, and therefore depends on not only the local irregularity of ISIs but

709: also the rate fluctuation, which would naturally manifest itself in \textit{in vivo}

710: neuronal spiking conditions. In contrast, the local variation, $L_V$, measures only a

711: stepwise variability of ISIs, and therefore does not depend significantly on a rate

712: fluctuation. It was revealed that $L_V$ is superior to $C_V$ in detecting some intrinsic

713: spiking irregularity specific to individual neurons \textit{in vivo}

714: \cite{shinomoto,shinomoto7}.

715:

716: For a spike train of a finite number of ISIs derived from a given point process, the value

717: of $L_V$ as well as $C_V$ varies from trial to trial. The goodness of a

718: coefficient is quantified by its narrow distribution of values among

719: spike trains derived from the same point process and

720: the small overlap of this distribution with the distribution obtained

721: from spike trains derived from a different point process. In other words,

722: we sought a new coefficient that maximizes the mutual information between

723: spike sequences created from different renewal gamma processes.

724:

725: For this purpose, we adopted a rational expression of quadratic functions of

726: consecutive interspike intervals that has the same form as $L_V$, and

727: searched for the optimal parameter of the coefficient.  The optimal parameter of the

728: coefficient depends on the choice of the point processes that are to be

729: discriminated. It was found that the original $L_V$ is not optimal for near random

730: (Poisson) point processes, but is optimal for more regular spike trains. In this way, if we

731: have preliminary knowledge of the spiking irregularities of the point processes,

732: we are able to propose a better coefficient than the original $L_V$ for the purpose of

733: discriminating spike trains.

734:

735: We generated spike sequences entirely by using a stationary or rate-modulated gamma process.

736: The reason is as follows.

737: The Poisson process, in which the firing rate is represented as a function of time from

738: stimulus onset, is widely used in spike data analysis \cite{richmond}.

739: However, the statistical properties of spike sequences cannot be fully

740: captured by the rate-modulated Poisson process \cite{berry,reich,keat,pillow}.

741: In other words, spike probability depends on the past spike times due to the

742: so-called refractory period.

743: A gamma process is a Poisson process with an additional parameter representing

744: a kind of refractory period.

745: Baker et al. showed that the spike pattern recorded from primary and

746: supplementary motor areas is explicable by a gamma process \cite{baker}.

747:

748: We considered only mutual information as a measure for discriminating

749: two spike trains.

750: However, the Kullback-Leibler divergence $D(p_1,p_2)$ is also a well-known measure

751: of the dissimilarity of two distributions.

752: It is also proportional to the Fisher information,

753: $D=\frac{1}{2}J(p(x,\kappa))d\kappa^2$,

754: under the same approximation as described in Sec.~\ref{sec7}.

755: Note that the coefficient is $\frac{1}{2}$ instead of $\frac{1}{8}$, as seen in

756: Eq.~(\ref{eighth}), for the mutual information.

757: However, the coefficient is irrelevant to the peak location.

758: Thus, the Kullback-Leibler divergence leads to the same results as mutual

759: information.

760: Nonetheless, we used mutual information because it is symmetrical in terms of

761: two distributions.

762: The Kullback-Leibler divergence is not symmetrical.

763: Its value changes if the two distributions are interchanged.

764: The Kullback-Leibler divergence becomes symmetrical in the limit of a small

765: difference of two distributions, where it is proportional to the Fisher

766: information.
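The small-difference limit can be sketched as follows. We write $p_\kappa(x)=p(x,\kappa)$ and $J(\kappa)$ for the Fisher information at $\kappa$, and assume that $p$ is smooth enough in $\kappa$ for a Taylor expansion (a sketch under these assumptions, not a rigorous statement):
\begin{eqnarray*}
D(p_\kappa,p_{\kappa+d\kappa}) &=& \frac{1}{2}J(\kappa)\,d\kappa^2 + O(d\kappa^3),\\
D(p_{\kappa+d\kappa},p_\kappa) &=& \frac{1}{2}J(\kappa+d\kappa)\,d\kappa^2 + O(d\kappa^3)
 = \frac{1}{2}J(\kappa)\,d\kappa^2 + O(d\kappa^3).
\end{eqnarray*}
The two orderings agree at order $d\kappa^2$ and can differ only at order $d\kappa^3$.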

767:

768: In previous studies, various measures were computed for mathematical models

769: \cite{lansky,feng,shinomoto3}.

770: However, the focus was only on the expectations for the measures.

771: In discrimination tasks, the variance of a measure is more important than the expectation.

772: For example, consider the case in which the expectations of a measure for two

773: different types of spike sequences differ considerably.

774: If the variances are very large, discriminating the two sequences is difficult.

775: In addition, if the definition of a measure is changed, for example, by

776: multiplying it by a constant or adding a constant to it, the expectation changes, but the mutual

777: information does not.

778: Therefore, in this paper we focused on the variance and searched for the

779: measure that maximizes the mutual information.
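The invariance mentioned above can be checked directly within the Gaussian approximation used for the Fisher information. Suppose a measure $x$ with mean $m(\kappa)$ and standard deviation $\sigma(\kappa)$ is replaced by $y=ax+b$, where $a\neq 0$ and $b$ are arbitrary constants introduced only for this illustration. Then
\[
m_y = a\,m + b, \qquad \sigma_y = |a|\,\sigma, \qquad
\frac{(m_y')^2}{\sigma_y^2} = \frac{a^2\,(m')^2}{a^2\,\sigma^2} = \frac{(m')^2}{\sigma^2},
\]
so the leading term of the Fisher information, and hence the peak location, is unchanged even though the expectation shifts from $m$ to $a\,m+b$.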

780:

781:

782: \appendix

783: \section{\label{appendixA}Orthogonal coordinates for gamma distribution}

784: We show that $\kappa$ and $\mu$ are orthogonal coordinates in the sense of

785: information geometry.

786: The theory of information geometry is described elsewhere \cite{amari2,amari3},

787: and there are applications to neuroscience \cite{tatsuno,tatsuno2,nakahara}.

788:

789: For the purpose of proving the orthogonality, it suffices to demonstrate that

790: the Fisher information matrix is diagonal.

791: The Fisher information matrix is defined as

792: \begin{equation}

793: g_{ij} = \int^{\infty}_{0}\frac{\partial \log p(T)}{\partial \xi^i}\frac{\partial \log p(T)}{\partial \xi^j}p(T)dT,

794: \end{equation}

795: where $\xi^1=\mu$ and $\xi^2=\kappa$.

796: The log-likelihood can be written as

797: \begin{equation}

798: \log p(T) = \kappa \log(\frac{\kappa}{\mu}) + (\kappa -1) \log T - \log \Gamma(\kappa) - \frac{T \kappa}{\mu}.

799: \end{equation}

800: The derivatives of the log-likelihood are

801: \begin{equation}

802: \frac{\partial \log p(T)}{\partial \mu} = - \frac{\kappa}{\mu} + \frac{T \kappa}{\mu ^2}

803: \end{equation}

804: and

805: \begin{equation}

806: \frac{\partial \log p(T)}{\partial \kappa} = \log \frac{\kappa}{\mu} +1 + \log T - \psi (\kappa) - \frac{T}{\mu},

807: \end{equation}

808: where $\psi(\kappa)=(\log\Gamma(\kappa))'$.

809: The matrix elements can be written as

810: \begin{equation}

811: g_{\mu\mu}=\frac{\kappa}{\mu ^2},

812: \end{equation}

813: \begin{equation}

814: g_{\mu\kappa}=g_{\kappa\mu}=0,

815: \end{equation}

816: \begin{equation}

817: g_{\kappa\kappa}=\psi(\kappa)' - \frac{1}{\kappa}.

818: \end{equation}

819: Thus, the Fisher information matrix is diagonal at every point.

820: According to the theory of information geometry, it is always possible to

821: choose orthogonal coordinates for an exponential family of distributions that

822: includes the gamma distribution as a specific case.
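The matrix elements above can be cross-checked with the equivalent expression $g_{ij}=-E\left[\partial^2 \log p(T)/\partial\xi^i\partial\xi^j\right]$, which holds under the usual regularity conditions. Only $E[T]=\mu$ is needed:
\begin{eqnarray*}
-\frac{\partial^2 \log p(T)}{\partial \mu^2} &=& -\frac{\kappa}{\mu^2}+\frac{2T\kappa}{\mu^3}
 \quad\Rightarrow\quad g_{\mu\mu}=\frac{\kappa}{\mu^2},\\
-\frac{\partial^2 \log p(T)}{\partial \mu\,\partial \kappa} &=& \frac{1}{\mu}-\frac{T}{\mu^2}
 \quad\Rightarrow\quad g_{\mu\kappa}=0,\\
-\frac{\partial^2 \log p(T)}{\partial \kappa^2} &=& \psi'(\kappa)-\frac{1}{\kappa}
 \quad\Rightarrow\quad g_{\kappa\kappa}=\psi'(\kappa)-\frac{1}{\kappa}.
\end{eqnarray*}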

823:

824: The Fisher information matrix can be interpreted as follows.

825: When $\mu$ and $\kappa$ are estimated from a finite number of samples,

826: the estimated values are not necessarily the same as the true values.

827: The value of the maximum likelihood estimator varies depending on the sample

828: sets, and its variation around the true value can be approximated by a normal

829: distribution whose covariance matrix is the inverse of the Fisher matrix if the sample

830: size is sufficiently large \cite{lehmann}.

831: Thus, the diagonality of the Fisher matrix means that the variations in the

832: maximum likelihood estimators of $\mu$ and $\kappa$ are uncorrelated.
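Concretely, if the $n$ ISIs are independent, the Fisher matrix of the whole sample is $n\,g_{ij}$, and the approximate covariance matrix of $(\hat{\mu},\hat{\kappa})$, denoted here by $\Sigma$ (our notation for this sketch), reads
\[
\Sigma \simeq \frac{1}{n}
\left(\begin{array}{cc}
\displaystyle\frac{\mu^2}{\kappa} & 0\\[2mm]
0 & \displaystyle\frac{1}{\psi'(\kappa)-1/\kappa}
\end{array}\right).
\]
As a rough guide for regular spike trains, the standard large-$\kappa$ expansion $\psi'(\kappa)\simeq 1/\kappa+1/(2\kappa^2)$ gives a variance of $\hat{\kappa}$ of about $2\kappa^2/n$.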

833: %Note that $\hat{\mu}$ and the original $L_V$ are also uncorrelated.

834:

835: \section{\label{appendixB}Maximum likelihood estimation for gamma distribution}

836: Let $T_1,T_2,\ldots,T_n$ be the observed ISIs.

837: We would like to estimate the true values of $\mu$ and $\kappa$ from them.

838: The log-likelihood is defined as

839: \begin{equation}

840: l \equiv \ln(p(T_1)p(T_2)\cdots p(T_n))

841: \end{equation}

842: and can be written as

843: \begin{equation}

844: l = n \kappa \ln \frac{\kappa}{\mu} - n \ln\Gamma(\kappa) + (\kappa -1) \sum \ln T_i - \frac{\kappa}{\mu} \sum T_i .

845: \end{equation}

846: The maximum likelihood estimators must satisfy both

847: $\frac{\partial l}{\partial \mu} = 0$ and

848: $\frac{\partial l}{\partial \kappa} = 0$.

849: The derivatives of the log-likelihood are

850: \begin{equation}

851: \frac{\partial  l}{\partial \mu} = \frac{\kappa}{\mu^2}\sum T_i - n \frac{\kappa}{\mu}

852: \end{equation}

853: and

854: \begin{equation}

855: \frac{\partial l}{\partial \kappa} = \sum \ln T_i - \frac{1}{\mu}\sum T_i + n \ln\frac{\kappa}{\mu} + n -n\psi(\kappa).

856: \end{equation}

857: Then, $\hat{\mu}$ can be explicitly obtained as

858: \begin{equation}

859: \hat{\mu} = \frac{1}{n}\sum T_i,

860: \end{equation}

861: and $\hat{\kappa}$ must satisfy

862: \begin{equation}

863: \frac{1}{n}\sum \ln T_i - \ln\frac{1}{n}\sum T_i = \psi(\hat{\kappa})-\ln\hat{\kappa},

864: \label{mle}

865: \end{equation}

866: where $\psi(\hat{\kappa})=(\log\Gamma(\hat{\kappa}))'$.

867: This equation cannot be solved explicitly for $\hat{\kappa}$.

868: However, the right side of the equation is a monotonic function of

869: $\hat{\kappa}$, and we can obtain $\hat{\kappa}$ by numerical iteration.
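One possible iteration is Newton's method; this is a sketch, and the scheme actually used is not specified in the text. The symbols $s$, $f$ and $\kappa_k$ are notation introduced only for this sketch. Writing
\[
s = \ln\frac{1}{n}\sum T_i - \frac{1}{n}\sum \ln T_i \ \ge 0, \qquad
f(\kappa) = \ln\kappa - \psi(\kappa) - s,
\]
Eq.~(\ref{mle}) is equivalent to $f(\hat{\kappa})=0$, and the update is
\[
\kappa_{k+1} = \kappa_k - \frac{f(\kappa_k)}{f'(\kappa_k)}, \qquad
f'(\kappa) = \frac{1}{\kappa} - \psi'(\kappa).
\]
Because $\ln\kappa-\psi(\kappa)\simeq 1/(2\kappa)$ for large $\kappa$, the value $\kappa_0=1/(2s)$ is a plausible starting point for regular spike trains. If an update produces a nonpositive $\kappa_{k+1}$, the step can be shortened, for example by halving it.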

870:

871: Instead of a lengthy numerical iteration, we can use the moment estimator.

872: According to Eq.~(\ref{expectation}), we can estimate the true $\kappa$ from

873: the sample mean and variance:

874: \begin{equation}

875: \kappa=\frac{Ex(T)^2}{Var(T)}.

876: \label{cv}

877: \end{equation}

878: In fact, the right side of Eq.~(\ref{cv}) can be rewritten as

879: $\frac{1}{C_V^2}$.

880: Thus, we can regard $C_V$ as a moment estimator.

881: However, the moment estimator is worse than the maximum likelihood estimator,

882: especially when $\kappa$ is close to $1$ \cite{cox}.

883: Nevertheless, it is good as a first approximation, and we can use it as the

884: initial value of the numerical iteration in maximum likelihood estimation.

885:

886: Another way to avoid numerical iteration is to use the left side of

887: Eq.~(\ref{mle}) as a measure.

888: In discriminant analysis, we do not need to estimate $\kappa$ because

889: the left side of Eq.~(\ref{mle}) has one-to-one correspondence with

890: $\hat{\kappa}$ and has the same information as $\hat{\kappa}$.

891:

892: %\begin{acknowledgments}

893: %We are grateful to A, B, and C for discussion.

894: %The present work is supported by \dots.

895: %\end{acknowledgments}

896:

897: %\newpage %Just because of unusual number of tables stacked at end

898:

899: \bibliography{pre}% Produces the bibliography via BibTeX.

900:

901: \end{document}

902: %

903: % ****** End of file apssamp.tex ******


922: