% q-bio0505024/pre.tex
1: %
2: % ****** Start of file apssamp.tex ******
3: %
4: %   This file is part of the APS files in the REVTeX 4 distribution.
5: %   Version 4.0 of REVTeX, August 2001
6: %
7: %   Copyright (c) 2001 The American Physical Society.
8: %
9: %   See the REVTeX 4 README file for restrictions and more information.
10: %
11: % TeX'ing this file requires that you have AMS-LaTeX 2.0 installed
12: % as well as the rest of the prerequisites for REVTeX 4.0
13: %
14: % See the REVTeX 4 README file
15: % It also requires running BibTeX. The commands are as follows:
16: %
17: %  1)  latex apssamp.tex
18: %  2)  bibtex apssamp
19: %  3)  latex apssamp.tex
20: %  4)  latex apssamp.tex
21: %
22: 
23: %\documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
24: \documentclass[preprint,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
25: 
26: % Some other (several out of many) possibilities
27: %\documentclass[preprint,aps]{revtex4}
28: %\documentclass[preprint,aps,draft]{revtex4}
29: %\documentclass[prb]{revtex4}% Physical Review B
30: 
31: \usepackage{graphicx}% Include figure files
32: \usepackage{dcolumn}% Align table columns on decimal point
33: \usepackage{bm}% bold math
34: 
35: %\nofiles
36: 
37: \begin{document}
38: 
39: %\setlength{\baselineskip}{1cm}
40: 
41: %\preprint{APS/123-QED}
42: 
43: \title{Search for optimal measure for discriminating spike trains with different randomness}
44: 
45: \author{Keiji Miura}
46: \email{miura@ton.scphys.kyoto-u.ac.jp}
47: \affiliation{Department of Physics, Graduate School of Sciences, Kyoto University Kyoto 606-8502, Japan}
48: \affiliation{``Intelligent Cooperation and Control'', PRESTO, JST, c/o The University of Tokyo, Chiba 277--8561, Japan\\}
49: 
50: \author{Masato Okada}
51: \email{okada@k.u-tokyo.ac.jp}
52: \affiliation{Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan}
53: \affiliation{``Intelligent Cooperation and Control'', PRESTO, JST, c/o The University of Tokyo, Chiba 277-8561, Japan\\}
54: \affiliation{Laboratory for Mathematical Neuroscience, RIKEN Brain Science Institute, Saitama 351-0198, Japan}
55: 
56: \author{Shigeru Shinomoto}
57: \email{shinomoto@scphys.kyoto-u.ac.jp}
58: \affiliation{Department of Physics, Graduate School of Sciences, Kyoto University Kyoto 606-8502, Japan}
59: 
60: \date{\today}% It is always \today, today,
61:              %  but any date may be explicitly specified
62: 
63: \begin{abstract}
We wish to discriminate spike sequences based on their degree of irregularity. For this
purpose, we search for a rational expression of quadratic functions of
consecutive interspike intervals that efficiently measures
spiking irregularity. Under natural assumptions, the functional form of the coefficient
can be parameterized by a single parameter. The parameter is determined so as to
maximize the mutual information between the distributions of coefficients computed for
spike sequences derived from different renewal point processes. We find that the local
variation of interspike intervals, $L_V$ (Neural Comput. Vol. 15, pp. 2823-42, 2003), is
nearly optimal for spike trains whose intrinsic irregularity is close to that of experimental data.
73: % Valid PACS numbers may be entered using the \verb+\pacs{#1}+ command.
74: \end{abstract}
75: 
76: \pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
77:                              % Classification Scheme.
78: %\keywords{Suggested keywords}%Use showkeys class option if keyword
79:                               %display desired
80: % gamma distribution, mutual information, neuroscience, information geometry, inter spike intervals
81: 
82: \maketitle
83: 
84: \section{\label{sec1}Introduction}
85: 
86: It is important to extract as much information as possible from spike sequences when 
87: looking for correlations between animal behaviors and neuronal activities 
88: \cite{georgopoulos,miyashita,funahashi,fujita} or controlling prosthetic apparatuses by 
89: neuronal activities \cite{chapin}. In many cases, however, only the mean firing rate is
considered, and the timing information is ignored. Taking the detailed
temporal structure of each spike train into account would help decode brain signals more efficiently.
We propose a measure that augments the information provided by the mean
firing rate.
94: 
Coefficients that are functions of the interspike intervals (ISIs) are effective
in detecting spiking irregularity from a short spike train. For instance, the coefficient
of variation, $C_V$, is widely adopted as a measure of the variability of ISIs
98: \cite{cox,abbott,shinomoto1,shinomoto4}.
99: Recently, a measure of the local variation of interspike intervals, 
100: $L_V$, was proposed \cite{shinomoto}, as a natural extension of 
$C_{V2}$, which was designed to detect stepwise variations of consecutive ISIs
102: \cite{holt}.  An analysis using $L_V$ revealed that \textit{in vivo} spike sequences are 
103: not uniformly random, but possess specific characteristics that vary among individual 
104: neurons. In addition, it was found that the neocortex consists of heterogeneous neurons 
105: that differ not only from one cortical area to another, but also from one layer to another 
106: in their spiking patterns \cite{shinomoto7}.
107: 
108: In the present study, we try to modify $L_V$ in an attempt to find a better measure for 
109: discriminating spike sequences based on the degree of irregularity. Namely, we examine 
110: rational expressions of quadratic functions of consecutive interspike intervals for 
111: suitability as coefficients for measuring spiking irregularity. Under reasonable 
112: assumptions, the functional form of the coefficient is found to be parameterized by a 
113: single parameter. The parameter is determined so as to maximize the mutual 
information between the distributions of coefficients computed for finite-size sample
sequences derived from different renewal gamma processes. We find that $L_V$ is
not optimal for nearly random (Poisson) spike trains but is nearly optimal for more regular spike
trains.
118: 
119: In Sec.~\ref{sec2}, we explain how we generated spike sequences with the same 
120: firing rate but different intrinsic irregularity. We show that a gamma 
distribution suffices for that purpose and that the two parameters of the gamma
distribution can be chosen as orthogonal coordinates.
In Sec.~\ref{sec3}, we explain $L_V$ and compare it with $C_V$.
We show that the attractiveness of $L_V$ stems from its symmetries.
In Sec.~\ref{sec4}, we extend $L_V$ and show that, under reasonable assumptions,
the extension of $L_V$ can be parameterized by a single parameter.
In Sec.~\ref{sec5}, we explain how we determine the optimal value of the
parameter using the mutual information maximization principle.
In Sec.~\ref{sec6}, we determine the optimal value numerically.
In Sec.~\ref{sec7}, we develop a theory based on a Gaussian
approximation that explains the results.
In Sec.~\ref{TD}, we discuss two nonstationary cases.
133: 
134: 
135: \section{\label{sec2}Generating spike trains with different randomness}
136: In this section, we explain how to generate spike trains with the same firing
137: rate but different randomness.
138: 
139: There are many ways to generate spike trains artificially.
140: For example, we can generate spike trains by using a network of spiking neuron
141: models.
However, we do not need to model the mechanism of spike generation in detail here, and
a simple statistical description is desirable.
Therefore, we assume that spikes are generated by a renewal process and
that the ISIs follow a gamma distribution \cite{cox},
which is given by
147: \begin{equation}
148: p(T) = \frac{1}{\Gamma(\kappa)}\left(\frac{\kappa}{\mu}\right)^\kappa T^{\kappa-1}e^{-\frac{\kappa}{\mu} T},
149: \label{gamma}
150: \end{equation}
151: where $T$ denotes an ISI.
We draw ISIs independently from this distribution and concatenate them to form a spike train.
153: The mean and variance of the ISIs are
154: \begin{equation}
155: \left\{ \begin{array}{c}
156: Ex(T)=\mu\\
157: Var(T)=\frac{\mu^2}{\kappa}.
158: \label{expectation}
159: \end{array}\right.
160: \end{equation}
161: The mean firing rate is obtained by taking the inverse of the mean ISI 
162: \cite{lansky}.
The parameter $\kappa$ is a shape parameter; $\kappa=1$ corresponds to an exponential
distribution, and, as $\kappa$ increases, the distribution approaches a normal
distribution.
The exponential distribution corresponds to a Poisson process, in which the
firing rate (hazard function) is constant in time and independent of the
time of the previous spike; the resulting spike train is completely random.
As $\kappa$ increases, the variance of the ISIs decreases, and the ISIs become
more regular.
171: 
172: Our goal is to find an optimal measure for discriminating two spike trains
173: with different randomness independent of their mean firing rates.
174: A gamma distribution is suitable for that purpose.
175: First, we can control the mean firing rate and randomness independently by 
176: changing the two parameters ($\mu$ and $\kappa$) in the distribution.
Second, experimental data can be well fitted by the distribution.
For example, Baker et al. showed that the spike patterns recorded from
primary and supplementary motor areas are well described by a gamma
distribution \cite{baker}.
181: 
182: % \subsection{orthogonal coordinates of gamma distribution}
The gamma distribution can be parameterized in many ways.
For example, we can transform the parameters into $(\alpha,\lambda)$:
185: \begin{equation}
186: \left\{ \begin{array}{c}
187: \alpha = \kappa,\\
188: \lambda = \frac{\kappa}{\mu}.
189: \end{array}\right.
190: \end{equation}
The gamma distribution in these coordinates can be written as
192: \begin{equation}
193: p(T) = \frac{\lambda^\alpha}{\Gamma(\alpha)}T^{\alpha-1}e^{-\lambda T},
194: \end{equation}
where $\lambda$ is a rate (inverse scale) parameter.
196: The mean and variance of the ISIs can be written as functions of $\alpha$ 
197: and $\lambda$ as
198: \begin{equation}
199: \left\{ \begin{array}{c}
200: Ex(T)=\frac{\alpha}{\lambda},\\
201: Var(T)=\frac{\alpha}{\lambda^2}.
202: \end{array}\right.
203: \end{equation}
204: Thus, there are many ways of writing (parameterizing) a gamma distribution.
205: We used the expression shown as Eq.~(\ref{gamma}) because $\mu$ corresponds to
206: the mean ISI and $\kappa$ is orthogonal to it in the sense of information
207: geometry \cite{amari2,amari3}.
The proof is given in Appendix~\ref{appendixA}.
We call the parameters of a gamma distribution coordinates because we regard
the family of gamma distributions as a manifold.
We define randomness as the information orthogonal to the firing
rate.
Therefore, we regard $\kappa$ as the measure of randomness in what follows.
214: We generate spike trains having different intrinsic randomness by using the
215: gamma distributions with different values of $\kappa$.
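As a brief sketch of why $\mu$ and $\kappa$ are orthogonal (the full proof is given in Appendix~\ref{appendixA}), one can compute the off-diagonal element of the Fisher information matrix directly from Eq.~(\ref{gamma}). Writing $\ell=\ln p(T)$ and using $Ex(T)=\mu$ from Eq.~(\ref{expectation}),
\begin{eqnarray*}
\ell &=& -\ln\Gamma(\kappa)+\kappa\ln\kappa-\kappa\ln\mu+(\kappa-1)\ln T-\frac{\kappa}{\mu}T,\\
\frac{\partial^2\ell}{\partial\mu\,\partial\kappa} &=& -\frac{1}{\mu}+\frac{T}{\mu^2},\\
g_{\mu\kappa} &=& -Ex\left(\frac{\partial^2\ell}{\partial\mu\,\partial\kappa}\right) = \frac{1}{\mu}-\frac{Ex(T)}{\mu^2} = 0.
\end{eqnarray*}
The diagonal elements are $g_{\mu\mu}=\kappa/\mu^2$ and $g_{\kappa\kappa}=\psi'(\kappa)-1/\kappa$, where $\psi'$ denotes the trigamma function. By contrast, in the $(\alpha,\lambda)$ coordinates $\partial^2\ell/\partial\alpha\,\partial\lambda=1/\lambda$, so $g_{\alpha\lambda}=-1/\lambda\neq0$ and those coordinates are not orthogonal.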
216: 
217: \section{\label{sec3}$L_V$ and $C_V$}
218: 
219: \begin{figure}[t]
220: \includegraphics[width=70mm]{TDGlv.ps}
221:  \caption{\label{TDGlv}$L_V$ for doubly stochastic gamma process with various
222: values of time constant $\tau$ and rate amplitude $\Delta$.}
223: \end{figure}
224: 
225: The measure of local variation proposed by Shinomoto et al. \cite{shinomoto}
226: is defined as
227: \begin{equation}
228: L_V = \frac{1}{n-1}\sum_{i=1}^{n-1} \frac{3(T_i-T_{i+1})^2}{(T_i+T_{i+1})^2},
229: \end{equation}
where $T_i$ denotes the $i$-th ISI in a spike train.
The factor 3 is included so that the mean value $\overline{L_V}$ is 1 for a Poisson
process.
$L_V$ is large when consecutive ISIs differ strongly.
It is dimensionless and invariant when all the ISIs are multiplied by a constant.
The conventional $C_V$ is defined as \cite{holt}
236: \begin{equation}
237: C_V \equiv \frac{\sqrt{Var(T)}}{Ex(T)}.
238: \end{equation}
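As a worked check of the normalization, both coefficients can be evaluated in closed form for the stationary gamma process of Sec.~\ref{sec2}. $C_V$ follows directly from Eq.~(\ref{expectation}). For $L_V$, we use the standard fact that, for two independent gamma-distributed ISIs with the same $\kappa$ and $\mu$, the ratio $R_i=T_i/(T_i+T_{i+1})$ follows a beta distribution with both parameters equal to $\kappa$, so that $Ex(R_i)=1/2$ and $Var(R_i)=1/[4(2\kappa+1)]$:
\begin{eqnarray*}
C_V &=& \frac{\sqrt{\mu^2/\kappa}}{\mu} = \frac{1}{\sqrt{\kappa}},\\
\frac{(T_i-T_{i+1})^2}{(T_i+T_{i+1})^2} &=& (2R_i-1)^2, \qquad Ex\left((2R_i-1)^2\right) = 4\,Var(R_i) = \frac{1}{2\kappa+1},\\
Ex(L_V) &=& \frac{3}{2\kappa+1}.
\end{eqnarray*}
Both expressions equal 1 for the Poisson case $\kappa=1$, both decrease as the ISIs become more regular, and neither depends on $\mu$.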
To examine the difference between $L_V$ and $C_V$, we next calculate both
for the rate-modulated gamma process.
241: 
We define a rate-modulated gamma process as an extension of the renewal gamma
process in which the firing rate, $\lambda(t)(=\frac{1}{\mu(t)})$, is
time-dependent while $\kappa$ is time-independent.
The spikes of the rate-modulated gamma process are generated as follows
\cite{abbott,brown}.
Note that we consider only integer values of $\kappa$.
In every small time step $dt$, a candidate spike is generated with probability $\kappa\lambda(t)\,dt$.
To be precise, we draw a uniform random number and, if it is less than
$\kappa\lambda(t)\,dt$, we generate a candidate spike at that time step.
For $\kappa$ larger than 1, we keep every $\kappa$-th candidate spike
and remove the others; the remaining spikes form the desired sequence.
254: In fact, for the case where $\lambda$ is constant over time, the spike
255: sequence generated in this way is equivalent to that generated from a renewal
256: gamma distribution with $\mu=\frac{1}{\lambda}$.
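The equivalence for a constant rate can be checked directly. The sum of $\kappa$ independent exponential intervals with rate $\nu$ has the gamma (Erlang) density
\[
p(T) = \frac{\nu^\kappa}{\Gamma(\kappa)}T^{\kappa-1}e^{-\nu T},
\]
which coincides with Eq.~(\ref{gamma}) for $\mu=\kappa/\nu$. Keeping every $\kappa$-th spike of a Poisson process therefore yields ISIs with mean $1/\lambda$, as required by $\lambda=1/\mu$, when the underlying Poisson rate is $\nu=\kappa\lambda$.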
257: 
Here we consider a doubly stochastic gamma process whose firing rate follows an
Ornstein-Uhlenbeck process \cite{shinomoto6}.
260: We assume the firing rate, $\lambda$, satisfies
261: \begin{equation}
262: \frac{d\lambda}{dt}=-\frac{\lambda-\lambda_0}{\tau}+\Delta\sqrt{\frac{2}{\tau}}\xi(t),
263: \end{equation}
where $\xi$ is Gaussian white noise with $\langle\xi(t)\rangle=0$ and $\langle\xi(t)\xi(t')\rangle=\delta(t-t')$.
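The roles of $\tau$ and $\Delta$ can be read off from the standard stationary statistics of the Ornstein-Uhlenbeck process. A process obeying $dx/dt=-x/\tau+s\,\xi(t)$ has stationary variance $s^2\tau/2$ and an exponentially decaying autocorrelation; with $x=\lambda-\lambda_0$ and $s=\Delta\sqrt{2/\tau}$, this gives
\begin{eqnarray*}
\langle\lambda(t)\rangle &=& \lambda_0,\\
\langle(\lambda(t)-\lambda_0)(\lambda(t')-\lambda_0)\rangle &=& \Delta^2 e^{-|t-t'|/\tau},
\end{eqnarray*}
so $\Delta$ is the standard deviation of the firing rate and $\tau$ is its correlation time.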
265: 
266: \begin{figure}[t]
267: \includegraphics[width=70mm]{TDGcv.ps}
268:  \caption{\label{TDGcv}$C_V$ for doubly stochastic gamma process with various 
269: values of time constant $\tau$ and rate amplitude $\Delta$.}
270: \end{figure}
271: 
272: Fig.~\ref{TDGlv} and Fig.~\ref{TDGcv} show $L_V$ and $C_V$ with $\lambda_0=1$ 
273: for various values of time constant $\tau$ and rate amplitude $\Delta$.
274: For simplicity, we consider sufficiently long spike sequences and 
275: assume that the values of $L_V$ and $C_V$ converge.
276: Fig.~\ref{TDGlv} shows that in the limit of a large time constant, the values
277: of $L_V$ converge to the value for the stationary case.
This means that the value of $L_V$ does not depend on the amplitude of the
rate fluctuations and has a one-to-one correspondence with $\kappa$ in this limit.
Fig.~\ref{TDGcv} shows that $C_V$ depends on both $\kappa$ and $\Delta$ and
does not have a one-to-one correspondence with $\kappa$.
282: Therefore, $L_V$ is better than $C_V$ for discriminating the intrinsic 
283: randomness of spike sequences.
284: 
285: This attractive property seems to stem from the fact that $L_V$ is the sum of
286: the dimensionless terms of consecutive interspike intervals.
287: By ``dimensionless'' we mean that the numerator and denominator have the same
288: dimension.
289: Every term in $L_V$ is normalized locally by the average of two consecutive
290: interspike intervals instead of the global average.
Intuitively, because the firing rates for two consecutive interspike intervals
can be regarded as the same in the slow limit, each term should have the same distribution as
in the stationary case.
On the other hand, $C_V$ measures the spread of the ISIs around their global mean
and can be large both when the firing rate fluctuates
significantly and when the intrinsic randomness is large.
297: Therefore, we cannot distinguish the two cases based on the value of $C_V$.
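A simple worked example shows how rate fluctuations and intrinsic randomness are confounded in $C_V$. Assume, for illustration only, that the rate varies so slowly that each ISI is drawn from Eq.~(\ref{gamma}) with a mean $\mu$ that is itself random, with mean $\bar{\mu}$ and variance $s^2$ (this ignores the fact that more ISIs fall into high-rate epochs). Since $Ex(T)=\bar{\mu}$, the law of total variance gives
\begin{eqnarray*}
Var(T) &=& Ex\left(\frac{\mu^2}{\kappa}\right)+Var(\mu) = \frac{\bar{\mu}^2+s^2}{\kappa}+s^2,\\
C_V^2 &=& \frac{Var(T)}{Ex(T)^2} = \frac{1}{\kappa}\left(1+\frac{s^2}{\bar{\mu}^2}\right)+\frac{s^2}{\bar{\mu}^2}.
\end{eqnarray*}
A given $C_V$ can thus arise from a small $\kappa$ with a constant rate or from a larger $\kappa$ with a fluctuating rate, whereas each term of $L_V$ involves only two neighboring ISIs, which in this limit share the same $\mu$.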
298: 
299: \section{\label{sec4}Measure of local variation}
300: We extend $L_V$ without losing its attractive property described in the
301: previous section and find a better measure of intrinsic randomness.
302: We do this by focusing on the ISI statistics and imposing three symmetry
303: conditions: (1) time translation invariance, (2) time-scale transformation
304: invariance, and (3) time inversion invariance.
305: 
306: We assume the randomness of a spike train is constant over time and 
307: define the extended $L_V$ as
308: \begin{equation}
309: \widetilde{L_V} = \frac{1}{n-1}\sum_{i=1}^{n-1} f(T_i,T_{i+1}),
310: \end{equation}
where $T_1,T_2,\ldots,T_n$ are the observed ISIs and $f(x,y)$ does not depend on
312: $i$ explicitly.
313: This form guarantees invariance under time translation ($i\rightarrow i+1$)
314: if $n$ is infinite.
Next, we assume that $f$ is invariant under the time-scale
transformation ($T\rightarrow kT$).
This requires that the numerator and denominator of $f$ have the same
dimension, i.e., be homogeneous of the same degree in the ISIs.
For simplicity, we assume that this degree is two, so $f$ can be written as
320: \begin{equation}
321: f(x,y) = \frac{c_1 x^2 + c_2 xy + c_3 y^2}{c_4 x^2 + c_5 xy + c_6 y^2},
322: \end{equation}
323: which includes the original $L_V$ as a specific case.
324: In addition, because we do not distinguish increases from decreases in the
325: firing rate in terms of randomness, we impose time inversion invariance and
326: require 
327: \begin{equation}
328: f(x,y) = f(y,x).
329: \end{equation}
330: Thus, $f$ can be written as
331: \begin{equation}
332: f(x,y) = \frac{c_1 x^2 + c_2 xy + c_1 y^2}{c_4 x^2 + c_5 xy + c_4 y^2}.
333: \end{equation}
Note that adding a constant to $f$ or multiplying it by a nonzero constant does not affect discriminant
analysis, so we may shift and rescale $f$ freely.
336: Then, without loss of generality, $f$ can be written as
337: \begin{equation}
338: f(x,y) = \frac{xy}{x^2 + c_5 xy + y^2}.
339: \end{equation}
340: In addition, we can rewrite the denominator using $c=c_5+2$:
341: \begin{equation}
342: f(x,y) = \frac{xy}{(x-y)^2 + c xy}.
343: \label{f}
344: \end{equation}
Since ISIs are positive, $(x-y)^2\ge0$ vanishes only at $x=y$, where the denominator reduces to $cx^2$;
hence the denominator is positive for all positive $x$ and $y$ if and only if $c>0$.
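The reduction to a single parameter performed above can be made explicit. Assuming $c_4\neq0$ and subtracting the constant $c_1/c_4$ from the symmetric form of $f$ gives
\begin{eqnarray*}
f(x,y)-\frac{c_1}{c_4} &=& \frac{(c_2c_4-c_1c_5)\,xy}{c_4\left(c_4x^2+c_5xy+c_4y^2\right)}\\
&=& \frac{c_2c_4-c_1c_5}{c_4^2}\,\frac{xy}{x^2+(c_5/c_4)\,xy+y^2}.
\end{eqnarray*}
Provided $c_2c_4\neq c_1c_5$ (otherwise $f$ is constant), $f$ is therefore an affine function of $xy/[x^2+(c_5/c_4)xy+y^2]$; relabeling $c_5/c_4$ as $c_5$ gives the one-parameter form, and the identity $x^2+c_5xy+y^2=(x-y)^2+(c_5+2)xy$ gives Eq.~(\ref{f}) with $c=c_5+2$.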
347: 
348: As a result, $\widetilde{L_V}$ can be written as
349: \begin{equation}
350: \widetilde{L_V}(c) = \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{T_i T_{i+1}}{(T_i-T_{i+1})^2 + c T_i T_{i+1}}.
351: \end{equation}
352: Note that the original $L_V$ corresponds to the case of $c=4$.
353: In this way, the measures satisfying the symmetries have only one degree of
freedom and can be parameterized by a single parameter.
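To verify the correspondence with the original $L_V$, set $c=4$, so that the denominator becomes $(T_i+T_{i+1})^2$. Each term of $L_V$ can then be rewritten as
\[
\frac{3(T_i-T_{i+1})^2}{(T_i+T_{i+1})^2} = \frac{3\left[(T_i+T_{i+1})^2-4T_iT_{i+1}\right]}{(T_i+T_{i+1})^2} = 3-\frac{12\,T_iT_{i+1}}{(T_i+T_{i+1})^2},
\]
and averaging over $i$ gives $L_V=3-12\,\widetilde{L_V}(4)$. Because $L_V$ is a decreasing affine function of $\widetilde{L_V}(4)$, the two carry the same information for discrimination.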
355: 
356: \begin{figure}[t]
357: \includegraphics[width=70mm]{TDGlv1.ps}
358:  \caption{\label{lv1}$\widetilde{L_V}(1)$ for doubly stochastic gamma process
359: with various values of time constant $\tau$ and rate amplitude $\Delta$.}
360: \end{figure}
361: 
Because of its symmetries, $\widetilde{L_V}$ should have a one-to-one correspondence with $\kappa$,
like $L_V$.
364: In fact, it has the same values as those for the stationary case in the
365: limit of a large time constant for the doubly stochastic gamma process.
366: Fig.~\ref{lv1} shows that $\widetilde{L_V}(1)$ is independent of the rate 
367: amplitude, $\Delta$, and is a function of $\kappa$ in the limit.
The same holds for other values of $c$, for instance $\widetilde{L_V}(16)$.
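One way to see why the rate amplitude drops out is to rewrite each term in terms of $q_i=T_iT_{i+1}/(T_i+T_{i+1})^2$, which lies in $(0,1/4]$ and is unchanged when both ISIs are multiplied by the same constant. Since $(T_i-T_{i+1})^2=(T_i+T_{i+1})^2-4T_iT_{i+1}$,
\[
\frac{T_iT_{i+1}}{(T_i-T_{i+1})^2+cT_iT_{i+1}} = \frac{q_i}{1+(c-4)\,q_i}.
\]
When the rate is effectively constant over two consecutive ISIs, $q_i$ has the same distribution as in the stationary case. For the gamma process, $q_i=R_i(1-R_i)$ with $R_i=T_i/(T_i+T_{i+1})$ beta distributed with both parameters equal to $\kappa$, independently of $\mu$, so the mean of $\widetilde{L_V}(c)$ is a function of $\kappa$ and $c$ only.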
370: 
Thus, $\widetilde{L_V}(c)$ has a one-to-one correspondence with $\kappa$.
However, this is not sufficient to make it a good measure.
So far we have considered only spike sequences of infinite length.
In practical experimental situations, however, data sizes are limited, and
$\widetilde{L_V}(c)$ varies widely from trial to trial around its mean.
Similarly, for finite spike sequences generated from a gamma distribution,
$\widetilde{L_V}(c)$ varies from trial to trial.
In the discrimination of intrinsic randomness, roughly speaking, the smaller
the variance relative to the change of the mean with $\kappa$, the higher the hit rate.
Thus, we next search for the optimal value of the parameter $c$, for which the distributions
obtained for different $\kappa$ are best separated.
382: 
383: \section{\label{sec5}Mutual information maximization principle}
384: We use the mutual information maximization principle to determine an optimal
385: measure.
386: We assume that the firing rate is constant over time and spike sequences are 
387: generated by a gamma distribution, as shown in Sec.~\ref{sec2}.
388: As shown in Sec.~\ref{sec3}, $\widetilde{L_V}$ does not depend on $\mu$.
389: Here we set $\mu=1$.
390: We consider the stationary case because it is tractable and
391: can be regarded as the slow change limit of the firing rate.
392: We show in Sec.~\ref{TD} that the optimal value of $c$ for the nonstationary
393: case does not differ significantly from that for the stationary case.
394: 
395: The optimal parameter value is determined so as to maximize the mutual
information between the coefficient and the randomness parameter $\kappa$.
397: Here we assume that a spike train consists of 100 ISIs because this is the 
398: typical length available from laboratory experiments.
399: $\widetilde{L_V}$ can be computed for a spike train, and the value of 
400: $\widetilde{L_V}$ varies among spike trains.
401: Even if spike trains are generated from the same distribution, the values of 
402: $\widetilde{L_V}$ can differ because the length of a spike train is finite.
403: As a result, the distribution of $\widetilde{L_V}$ can be obtained for one 
404: parameter set of the gamma distribution.
405: Thus, two distributions can be obtained from two types of spike trains.
406: The mutual information can be computed from the two distributions.
The larger the mutual information, the better the randomness ($\kappa$) can be
discriminated based on the observed $\widetilde{L_V}$.
409: 
410: Mutual information is calculated as follows.
411: Spike trains are generated from two gamma distributions with equal probability,
412: $\frac{1}{2}$.
The two distributions have different values of $\kappa$.
414: All the ISIs in a spike train are generated by using the same distribution.
415: We denote the distribution of $\widetilde{L_V}$ generated from the 
416: $i(=1,2)$-th gamma distribution as $p(x|i)$;
417: $p(x)(=\frac{1}{2}p(x|1)+\frac{1}{2}p(x|2))$ represents the distribution of 
418: $\widetilde{L_V}$ with no distinction of the source.
419: The entropy is defined as
420: \begin{equation}
421: H = -\int p(x) \ln p(x) dx.
422: \end{equation}
423: The noise entropy is defined as
424: \begin{equation}
425: H_{n}=-\frac{1}{2}\int p(x|1)\ln p(x|1) dx -\frac{1}{2}\int p(x|2)\ln p(x|2)dx.
426: \end{equation}
427: The mutual information is the difference,
428: \begin{equation}
429: I_m = H - H_{n}.
430: \end{equation}
431: 
The mutual information is the reduction in uncertainty about which gamma distribution generated a spike train
due to knowledge of $\widetilde{L_V}$.
The mutual information is $0$ if the two distributions of $\widetilde{L_V}$ are
identical, so that they cannot be distinguished.
It takes its maximum value, $\ln 2$ (i.e., 1 bit), if the two distributions of $\widetilde{L_V}$ have no
overlap, so that a single sample of $\widetilde{L_V}$ suffices to distinguish
them.
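Using $p(x)=\frac{1}{2}p(x|1)+\frac{1}{2}p(x|2)$ to split $H$ into two halves, the definition can be rewritten as an average of Kullback-Leibler divergences, which makes both limiting values explicit:
\[
I_m = \frac{1}{2}\int p(x|1)\ln\frac{p(x|1)}{p(x)}dx+\frac{1}{2}\int p(x|2)\ln\frac{p(x|2)}{p(x)}dx.
\]
Each integral is non-negative and vanishes when $p(x|1)=p(x|2)$. Because $p(x)\ge\frac{1}{2}p(x|i)$, each logarithm is at most $\ln 2$, with equality wherever the other distribution vanishes; hence $0\le I_m\le\ln 2$ in the natural-logarithm units used above, which corresponds to 1 bit when logarithms are taken to base 2.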
439: 
In the next section, we show the results of Monte Carlo simulations.
We calculated the mutual information as a function of $c$ for various pairs of
randomness parameters, $\kappa_1$ and $\kappa_2$.
443: 
444: \begin{figure}[t]
445:   \includegraphics[width=70mm]{minfo1.ps}
  \caption{\label{minfo1}Mutual information with $\kappa_1=1,\kappa_2=1.1$. Open circle denotes peak. Dotted line is for $c=4$ corresponding to original $L_V$. Mutual information peaks at a value of $c$ larger than $4$.}
447: \end{figure}
448: 
449: \section{\label{sec6}Results}
450: Fig.~\ref{minfo1} shows the mutual information with $\kappa_1=1$ and 
451: $\kappa_2=1.1$; $\kappa_1$ and $\kappa_2$ are the shape parameters of two
452: gamma distributions and $c$ is the parameter in $\widetilde{L_V}(c)$.
We set the number of ISIs per spike train, $n$, to 100.
The mutual information has a peak, whose location we denote by $c_{peak}$.
The dotted vertical line represents $c=4$, which corresponds to the original $L_V$.
Since $c_{peak}\approx16$ is larger than 4,
457: the optimal coefficient in this case is not the original $L_V$ but
458: $\widetilde{L_V}(16)$.
459: However, $c_{peak}$ depends on various parameters, and we will examine how it 
460: depends on the number of ISIs per spike train, $\kappa_1$ and $\kappa_2$, in
461: what follows.
We can also use the maximum likelihood estimator of $\kappa$ as a measure instead
of $L_V$; the mutual information obtained with this estimator is 0.097.
464: (For the maximum likelihood estimator, see Appendix \ref{appendixB}.)
465: The peak value for $L_V$ is about 0.066, which is smaller than that for the 
466: maximum likelihood estimator.
467: We nonetheless use $L_V$ because the maximum likelihood estimator cannot be
468: applied to the nonstationary case.
469: In the cases where the firing rate is time-dependent, the mutual information
470: for $L_V$ can be much higher than that for the maximum likelihood estimator,
471: as we will show in Sec.~\ref{TD}.
472: 
473: \begin{figure}[t]
474:   \includegraphics[width=70mm]{minfo-nspike.ps}
475:   \caption{\label{minfo-nspike}Mutual information for various numbers of ISIs
476: per spike sequence with $\kappa_1=1,\kappa_2=1.1$.
477: Open circles denote peaks.
478: Dotted line is for $c=4$ corresponding to original $L_V$.
Peak location hardly depends on the number of ISIs.}
480: \end{figure}
481: 
482: Fig.~\ref{minfo-nspike} shows the mutual information for various numbers of
483: ISIs per spike train.
484: While the mutual information increases with the number of ISIs,
485: the peak location remains almost the same.
486: Although we show only the case for $\kappa_1=1,\kappa_2=1.1$,
the other cases yield similar results.
488: Therefore, we set the number of ISIs per spike train to $100$.
489: 
490: Fig.~\ref{minfo-0to1} shows the mutual information with $\kappa_1=1$ and 
491: various $\kappa_2$.
492: As $d\kappa(=\kappa_2-\kappa_1)$ increases, the mutual information approaches
493: $1$.
494: The peak location remains almost unchanged $(c_{peak}\approx16)$.
495: For $\kappa_2=3.2$, the mutual information is almost $1$, and 
496: the two distributions are completely distinguishable.
In general, $c_{peak}$ depends mainly on $\kappa_1$ and is almost independent
of $\kappa_2$.
499: 
500: \begin{figure}[t]
501:   \includegraphics[width=70mm]{minfo-0to1.ps}
502:   \caption{\label{minfo-0to1}Mutual information for various $\kappa_2$ with $\kappa_1=1$.
503: Lines are for $\kappa_2=0.1, 0.2, 0.4, 0.8, 1.6$ and $3.2$ from below.
504: Open circles denote peaks.
505: Dotted line is for $c=4$ corresponding to original $L_V$.
Peak location hardly depends on $\kappa_2$.}
507: \end{figure}
508: 
509: Fig.~\ref{minfo-peak} shows the mutual information with $\kappa_2=1.3\kappa_1$
510: and various $\kappa_1$.
511: The peak location decreases with increasing $\kappa_1$.
512: For $\kappa_1=16$, the original $L_V$ is nearly optimal
513: ($c_{peak}\approx4\sqrt{2}$).
Since reported experimental data can be well fitted by a gamma distribution
with $\kappa\approx16$ \cite{baker}, $L_V$ appears to be nearly optimal not for
Poisson spike trains but for experimental data.
517: 
518: \begin{figure}[t]
519:   \includegraphics[width=70mm]{minfo-peak.ps}
520:   \caption{\label{minfo-peak}Mutual information for various $\kappa_1$ with $\kappa_2=1.3\kappa_1$.
521: Open circles denote peaks.
522: Dotted line is for $c=4$ corresponding to original $L_V$.
523: Peak location decreases as $\kappa_1$ increases.}
524: \end{figure}
525: 
\section{\label{sec7}Theoretical analysis}
In this section, we analyze the properties of the mutual information
theoretically.
For simplicity, we make two approximations.
530: 
First, we consider the limit of a large number of ISIs per spike train and
approximate the distribution of $\widetilde{L_V}(c)$ by a normal distribution.
Although this approximation is poor for $c\approx 0$,
the peak location is far larger than 0 and can be discussed within this
approximation.
536: 
537: In addition, we consider the limit of small $d\kappa$.
538: In the limit, the mutual information can be written using the Fisher
539: information \cite{lehmann} as
540: \begin{equation}
541: I_m = \frac{1}{8}J(p(x,\kappa))d\kappa^2,
542: \label{eighth}
543: \end{equation}
544: where the Fisher information is defined as
545: \begin{equation}
J(p(x,\kappa))= Ex\left(\left(\frac{\partial\ln p(x,\kappa)}{\partial\kappa}\right)^2\right).
547: \end{equation}
548: This relation can be easily derived.
We represent the two distributions of $\widetilde{L_V}(c)$ as
550: \begin{equation}
551: p_1(x)=\frac{1}{\sqrt{2\pi\sigma(\kappa)^2}}e^{-(x-m(\kappa))^2/2\sigma(\kappa)^2}
552: \end{equation}
553: and
554: \begin{equation}
555: p_2(x)=\frac{1}{\sqrt{2\pi\sigma(\kappa+d\kappa)^2}}e^{-(x-m(\kappa+d\kappa))^2/2\sigma(\kappa+d\kappa)^2} .
556: \end{equation}
Inserting these expressions into the definition of the mutual information and
expanding to second order in $d\kappa$ leads to Eq.~(\ref{eighth}).
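A sketch of this expansion, valid for any smooth family and not only the Gaussian one: write $p_2=p_1+\delta$ with $\delta\simeq(\partial p_1/\partial\kappa)\,d\kappa$, so that the mixture is $p=p_1+\delta/2$. Because $p=\frac{1}{2}p_1+\frac{1}{2}p_2$, the definitions of $H$ and $H_n$ give $I_m=\frac{1}{2}\int p_1\ln(p_1/p)\,dx+\frac{1}{2}\int p_2\ln(p_2/p)\,dx$. Expanding the logarithms to second order in $\delta$ and using $\int\delta\,dx=0$ yields
\begin{eqnarray*}
\int p_1\ln\frac{p_1}{p}\,dx \;\simeq\; \int p_2\ln\frac{p_2}{p}\,dx &\simeq& \frac{1}{8}\int\frac{\delta^2}{p_1}\,dx,\\
I_m &\simeq& \frac{1}{8}\int\frac{1}{p_1}\left(\frac{\partial p_1}{\partial\kappa}\right)^2dx\;d\kappa^2=\frac{1}{8}J\,d\kappa^2,
\end{eqnarray*}
where the last step uses $\int p^{-1}(\partial p/\partial\kappa)^2dx=Ex\left((\partial\ln p/\partial\kappa)^2\right)$. The Gaussian form is needed only to evaluate $J$ explicitly.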
559: 
560: The Fisher information can be explicitly written as
561: \begin{equation}
562: J=\frac{m'(\kappa)^2+2\sigma'(\kappa)^2}{\sigma(\kappa)^2}.
563: \end{equation}
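This expression follows directly from the Gaussian form. With $z=(x-m(\kappa))/\sigma(\kappa)$, which is a standard normal variable,
\begin{eqnarray*}
\frac{\partial\ln p(x,\kappa)}{\partial\kappa} &=& \frac{m'(\kappa)}{\sigma(\kappa)}\,z+\frac{\sigma'(\kappa)}{\sigma(\kappa)}\,(z^2-1),\\
J &=& \frac{m'(\kappa)^2}{\sigma(\kappa)^2}Ex(z^2)+\frac{2m'(\kappa)\sigma'(\kappa)}{\sigma(\kappa)^2}Ex(z^3-z)+\frac{\sigma'(\kappa)^2}{\sigma(\kappa)^2}Ex\left((z^2-1)^2\right),
\end{eqnarray*}
and the standard normal moments $Ex(z^2)=1$, $Ex(z^3-z)=0$, and $Ex((z^2-1)^2)=2$ give the result.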
Because $\widetilde{L_V}$ is an average over the $N$ ISIs of a spike train, $\sigma^2$ is inversely proportional to $N$ for large $N$, and $\sigma$ can be written as
565: \begin{equation}
566: \sigma = \frac{\sigma_0}{\sqrt{N}}.
567: \end{equation}
For large $N$, the Fisher information per ISI can then be approximated as
\begin{eqnarray}
J/N &=& \frac{m'(\kappa)^2+\frac{2}{N}\sigma_0'(\kappa)^2}{\sigma_0(\kappa)^2}\nonumber\\
571:   &\simeq& \frac{m'(\kappa)^2}{\sigma_0(\kappa)^2},
572: \end{eqnarray}
where $m'=\partial m/\partial\kappa$ and $\sigma_0$ depend only on $\kappa$ and $c$.
574: As a result, the mutual information can be written as
575: \begin{equation}
576: I_m = \frac{1}{8} \frac{m'(\kappa,c)^2}{\sigma_0(\kappa,c)^2} N d\kappa^2.
577: \end{equation}
578: 
579: Thus, $I_m$ is proportional to $N$ and $d\kappa^2$.
The dependence of $I_m$ on $c$ enters only through $m'$ and $\sigma_0$.
581: Therefore, when $N$ or $d\kappa$ changes, the absolute value of the mutual
582: information changes while the peak location does not change.
This is consistent with our numerical results, in which the peak location
hardly depended on $N$ and $d\kappa$.
The peak location is determined by the interplay of $m'$ and $\sigma_0$.
However, our theory alone does not predict $m$ and $\sigma_0$ as functions of $c$,
so numerical calculations are necessary to find the peak location.
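As an illustration of what the numerical calculation has to supply, the mean is available in closed form in the special case $c=4$ for the stationary gamma process. Using $R_i=T_i/(T_i+T_{i+1})$, which is beta distributed with both parameters equal to $\kappa$, one finds
\[
m(\kappa,4)=Ex\left(\frac{T_iT_{i+1}}{(T_i+T_{i+1})^2}\right)=Ex\left(R_i(1-R_i)\right)=\frac{\kappa}{2(2\kappa+1)},\qquad m'(\kappa,4)=\frac{1}{2(2\kappa+1)^2}.
\]
The quantity $\sigma_0$ is harder to obtain because consecutive terms share an ISI. For a renewal process, terms two or more steps apart are independent, so for large $N$, $\sigma_0^2\simeq Var(f_i)+2\,Cov(f_i,f_{i+1})$ with $f_i=f(T_i,T_{i+1})$, and the covariance involves the joint statistics of three consecutive ISIs.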
588: 
\section{\label{TD}Nonstationary case}
590: We considered the discrimination of randomness for the stationary gamma
591: process in the previous sections.
However, it has been reported that experimental data are well described by
the rate-modulated gamma process \cite{baker}.
Therefore, this section treats the rate-modulated gamma process.
We examine two simple cases in which the firing rate decreases monotonically
596: or changes stepwise.
597: 
\subsection{Monotonically decreasing firing rate}
599: 
600: \begin{figure}[t]
601:   \includegraphics[width=70mm]{td-4.ps}
602:   \caption{\label{td-4}Mutual information for monotonically decreasing firing rate for various $r$ with $\kappa_1=4$ and $\kappa_2=5.2$.}
603: \end{figure}
604: 
We consider a simple rate-modulated case and show that the peak location of
the mutual information, $c_{peak}$, tends not to change even if the firing rate
changes considerably over the course of a spike train.
We again generate the ISIs from a gamma distribution.
We assume that the mean ISI increases monotonically.
For simplicity, we set $\mu_i=r^i$ with $r>1$, where $\mu_i$ denotes the mean of the
$i$-th ISI.
We simply concatenate $n$ ISIs to make a single spike train, as before.
613: The value of $\kappa$ does not change within the train.
614: The mutual information is calculated for two spike trains with different 
615: values of $\kappa$.
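A minimal Python sketch of this generative model follows. It draws the $i$-th ISI from a gamma distribution with shape $\kappa$ and mean $\mu_i=r^i$ and reports the Monte Carlo mean of the original $L_V$, used here as a stand-in for $\widetilde{L_V}(c)$ (an assumption for illustration). The helper names, trial counts, and random seeds are hypothetical.

```python
import random
import statistics


def local_variation(isis):
    """Original L_V of a sequence of ISIs."""
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (len(isis) - 1)


def rate_modulated_gamma_isis(kappa, n, r, rng):
    """ISIs T_1..T_n with fixed shape kappa and mean mu_i = r**i (r > 1: decreasing rate)."""
    return [rng.gammavariate(kappa, r ** i / kappa) for i in range(1, n + 1)]


def mean_lv(kappa, r, n=100, trials=500, seed=1):
    """Monte Carlo mean of L_V over trains generated with modulation factor r."""
    rng = random.Random(seed)
    return statistics.fmean(
        [local_variation(rate_modulated_gamma_isis(kappa, n, r, rng)) for _ in range(trials)]
    )


if __name__ == "__main__":
    for r in (1.0, 1.01, 1.1, 1.5):
        print(r, mean_lv(4.0, r))
```

Since each term of the original $L_V$ is scale invariant, only the ratio $r$ between consecutive means matters, not the absolute firing rate.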

Fig.~\ref{td-4} shows the mutual information for $\kappa_1=4$ and $\kappa_2=5.2$.
The peak location decreases gradually from its stationary value as $r$ increases.
However, the plotted values of $r$ are extreme: for $r=1.5$, the mean ISI grows by a factor of 1.5 from each ISI to the next.
For realistic values of $r$, $c_{peak}$ changes only slightly.
For example, the ratio between the last and first mean ISIs is
\begin{equation}
\frac{\mu_n}{\mu_1}=r^{n-1},
\end{equation}
which equals 2.678033 for $r=1.01$ and 12527.83 for $r=1.1$ when $n=100$.
Hence, $r=1.5$ corresponds to an extremely strong modulation of the firing rate.
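The ratios quoted above follow directly from $\mu_n/\mu_1=r^{n-1}$; a short Python check (the function name is hypothetical):

```python
def last_to_first_mean_isi_ratio(r, n):
    """mu_n / mu_1 = r**(n - 1) for the schedule mu_i = r**i."""
    return r ** (n - 1)


if __name__ == "__main__":
    for r in (1.01, 1.1, 1.5):
        print(f"r = {r}: mu_n / mu_1 = {last_to_first_mean_isi_ratio(r, 100):.6g}")
```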

Similar results were obtained for other values of $\kappa$; thus, $c_{peak}$ appears to be robust to moderate fluctuations of the firing rate.
This result is not restricted to a decreasing firing rate.
For example, the mean of $\widetilde{L_V}$ remains the same if small and large mean ISIs alternate instead of the mean ISI increasing monotonically.
It thus appears that the peak location of the mutual information is almost independent of the firing rate if the variation in the firing rate is small.
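The alternating case can be checked with a short Monte Carlo sketch in Python, again using the original $L_V$ in place of $\widetilde{L_V}(c)$ (an assumption for illustration). Each term of the original $L_V$ depends only on the ratio of two consecutive ISIs and is symmetric under their exchange, so a schedule whose consecutive means differ by factors $r$ and $1/r$ should give the same expected value as one in which they always differ by $r$. The helper names, $r=1.5$, and the seed are hypothetical.

```python
import random
import statistics


def local_variation(isis):
    """Original L_V of a sequence of ISIs."""
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (len(isis) - 1)


def compare_monotone_and_alternating(kappa, r, n=100, trials=1000, seed=2):
    """Mean L_V for mu_i = r**i versus mu_i alternating between 1 and r.

    The same unit-mean gamma draws are reused for both schedules
    (common random numbers) to reduce Monte Carlo noise in the comparison.
    """
    rng = random.Random(seed)
    monotone, alternating = [], []
    for _ in range(trials):
        g = [rng.gammavariate(kappa, 1.0 / kappa) for _ in range(n)]  # mean-1 draws
        monotone.append(local_variation([r ** i * x for i, x in enumerate(g, 1)]))
        alternating.append(
            local_variation([(r if i % 2 == 0 else 1.0) * x for i, x in enumerate(g, 1)])
        )
    return statistics.fmean(monotone), statistics.fmean(alternating)


if __name__ == "__main__":
    print(compare_monotone_and_alternating(4.0, 1.5))
```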

\subsection{stepwise changing firing rate}
% especially if the change of the firing rate is so slow that the firing rates for consecutive two ISIs are almost the same.
\begin{figure}[t]
  \includegraphics[width=70mm]{stairs.ps}
  \caption{\label{stairs}
Schematic diagram of a stepwise increasing firing rate. The firing rate shifts from $1$ to $\lambda_2$ at $t=50$.}
\end{figure}

So far we have considered only $\widetilde{L_V}(c)$.
However, for the stationary case the maximum likelihood estimator, $\hat{\kappa}$, should perform better.
Here we consider a stepwise changing firing rate to show why we nonetheless favor $\widetilde{L_V}(c)$.
In short, $\hat{\kappa}$ performs poorly in the nonstationary case because it is the maximum likelihood estimator under the assumption of stationarity, as shown in Appendix~\ref{appendixB}.
In principle, the firing rate in every small time bin can be estimated in the nonstationary case.
However, this requires many spike sequences, all sharing the same firing rate profile, which is impractical in many realistic situations.
We therefore consider simple measures such as $\widetilde{L_V}(c)$ and $\hat{\kappa}$ even in the nonstationary case.
In this subsection, we compare $L_V$ and $\hat{\kappa}$ for a nonstationary firing rate.

Consider the case in which the firing rate increases stepwise, as shown in Fig.~\ref{stairs}.
At time $t=50$, the rate shifts from $1$ to $\lambda_2$.
Two types of spike trains, with $\kappa_1=16$ and $\kappa_2=20$, are generated from this firing rate profile.
Fig.~\ref{step-mle} shows the mutual information for these trains when $L_V$ or $\hat{\kappa}$ is used as the measure.
The mutual information for $L_V$ is independent of $\lambda_2$ in the limit of a large number of ISIs per train.
This is because $L_V$ does not depend on the firing rate in the stationary case, and here the firing rate is constant except at the discontinuity.
The terms of $L_V$ that straddle the discontinuity contribute only $O(1/n)$, which is negligible when the number of ISIs is large enough.
For simplicity, we neglected this contribution and plotted the value for the stationary case.
In contrast, the mutual information for $\hat{\kappa}$ decreases as $\lambda_2$ increases.
For example, when the firing rate increases by a factor of 1.5, the mutual information for $L_V$ is larger than that for $\hat{\kappa}$.
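The contrast between $L_V$ and $\hat{\kappa}$ can be sketched in Python under a simplification: instead of switching the rate at $t=50$, the hypothetical generator below switches the mean ISI from $1$ to $1/\lambda_2$ after the first half of the ISIs. Following Appendix~\ref{appendixB}, the left side of Eq.~(\ref{mle}) is used in place of $\hat{\kappa}$, since the two are in one-to-one correspondence. The sketch compares mean values only, not the mutual information, and all names and sizes are illustrative.

```python
import math
import random
import statistics


def local_variation(isis):
    """Original L_V of a sequence of ISIs."""
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (len(isis) - 1)


def log_mean_gap(isis):
    """Left side of Eq. (mle): mean of ln T_i minus ln of the mean T_i (always <= 0)."""
    n = len(isis)
    return sum(math.log(t) for t in isis) / n - math.log(sum(isis) / n)


def stepwise_isis(kappa, n, lam2, rng):
    """Hypothetical ISI-indexed step: first n//2 ISIs have mean 1, the rest mean 1/lam2."""
    return [
        rng.gammavariate(kappa, (1.0 if i < n // 2 else 1.0 / lam2) / kappa)
        for i in range(n)
    ]


def mean_measures(kappa, lam2, n=100, trials=500, seed=3):
    """Monte Carlo means of L_V and of the MLE statistic for a given step height."""
    rng = random.Random(seed)
    lv, gap = [], []
    for _ in range(trials):
        isis = stepwise_isis(kappa, n, lam2, rng)
        lv.append(local_variation(isis))
        gap.append(log_mean_gap(isis))
    return statistics.fmean(lv), statistics.fmean(gap)


if __name__ == "__main__":
    for lam2 in (1.0, 1.5, 2.0):
        print(lam2, mean_measures(16.0, lam2))
```

In this sketch, the mean of $L_V$ changes only through the term that straddles the step, whereas the left side of Eq.~(\ref{mle}) becomes more negative as $\lambda_2$ grows, which corresponds to a smaller $\hat{\kappa}$, as if the train were more irregular.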

\begin{figure}[t]
  \includegraphics[width=70mm]{step-mle.ps}
  \caption{\label{step-mle}Mutual information for a stepwise increasing firing rate with $\kappa_1=16$ and $\kappa_2=20$.}
\end{figure}

Thus, for a stepwise increasing firing rate, $L_V$ is better than $\hat{\kappa}$.
Such a sudden change can be observed, for example, when a visual stimulus is presented to a monkey at a given time.
The result remains almost the same for a stepwise firing rate with multiple discontinuities, in the limit of a large number of ISIs.
In addition, $\hat{\kappa}$ depends on both $\kappa$ and the amplitude of the firing-rate modulation, as shown for $C_V$ in Sec.~\ref{sec2}.
Therefore, $L_V$ is a better measure of intrinsic randomness.

\section{summary and discussion}
In this study, we sought a measure more effective than the local variation of interspike
intervals, $L_V$, for discriminating spike trains by the degree of intrinsic spiking
irregularity.

We first compared the characteristics of the conventional coefficient of variation, $C_V$,
and the local variation, $L_V$. The coefficient of variation, $C_V$, measures the global
variability of ISIs and therefore depends not only on the local irregularity of ISIs but
also on rate fluctuations, which naturally arise under \textit{in vivo}
neuronal spiking conditions. In contrast, the local variation, $L_V$, measures only the
stepwise variability of consecutive ISIs and therefore does not depend significantly on rate
fluctuations. It has been shown that $L_V$ is superior to $C_V$ in detecting the intrinsic
spiking irregularity specific to individual neurons \textit{in vivo}
\cite{shinomoto,shinomoto7}.

For a spike train consisting of a finite number of ISIs drawn from a given point process, the value
of $L_V$, like that of $C_V$, varies from trial to trial. A good
coefficient has a narrow distribution of values among
spike trains derived from the same point process and
little overlap between this distribution and the distribution obtained
from spike trains derived from a different point process. In other words,
we sought a new coefficient that maximizes the mutual information between its value and
the renewal gamma process from which a spike sequence was generated.

For this purpose, we adopted a ratio of quadratic functions of
consecutive ISIs of the same form as $L_V$ and
searched for the optimal value of its parameter. The optimal parameter
depends on the point processes to be
discriminated. We found that the original $L_V$ is not optimal for nearly random
(Poisson) point processes but is optimal for more regular spike trains. Thus, given
prior knowledge of the spiking irregularity of the point processes,
we can propose a coefficient better than the original $L_V$ for
discriminating spike trains.

Throughout, we generated spike sequences from stationary or rate-modulated gamma processes, for the following reason.
The rate-modulated Poisson process, in which the firing rate is represented as a function of time from
stimulus onset, is widely used in spike data analysis \cite{richmond}.
However, the statistical properties of spike sequences cannot be fully
captured by the rate-modulated Poisson process \cite{berry,reich,keat,pillow},
because the spike probability depends on past spike times through the
so-called refractory period.
A gamma process generalizes the Poisson process with an additional shape parameter, $\kappa$, which represents
a kind of refractory period; $\kappa=1$ recovers the Poisson process.
Baker et al. showed that spike patterns recorded from the primary and
supplementary motor areas can be explained by a gamma process \cite{baker}.

We considered only the mutual information as a measure for discriminating
two spike trains.
However, the Kullback-Leibler divergence $D(p_1,p_2)$ is another well-known measure
of the dissimilarity between two distributions.
Under the same approximation as described in Sec.~\ref{sec7}, it is also proportional to the Fisher information:
$D=\frac{1}{2}J(p(x,\kappa))d\kappa^2$.
Note that the coefficient is $\frac{1}{2}$ instead of the $\frac{1}{8}$ in
Eq.~(\ref{eighth}) for the mutual information.
However, a constant coefficient does not affect the peak location.
Thus, the Kullback-Leibler divergence leads to the same results as the mutual
information.
Nonetheless, we used the mutual information because it is symmetric with respect to the
two distributions, whereas the Kullback-Leibler divergence is not:
its value changes if the two distributions are interchanged.
The Kullback-Leibler divergence becomes symmetric only in the limit of a small
difference between the two distributions, where it is proportional to the Fisher
information.
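The relation $D\approx\frac{1}{2}J\,d\kappa^2$ and the asymmetry of the Kullback-Leibler divergence can be illustrated with the gamma ISI densities themselves, for which $D$ has a closed form. In the minimal Python sketch below, the family of gamma densities at fixed mean $\mu$ stands in for the distribution $p(x,\kappa)$ of a measure (an illustrative assumption), so that $J$ is $g_{\kappa\kappa}$ of Appendix~\ref{appendixA}. The digamma and trigamma functions are implemented with the standard recurrence plus asymptotic series rather than taken from a library; the helper names are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0, computed the same way."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def kl_gamma_same_mean(k1, k2, mu=1.0):
    """KL(p1 || p2) between gamma ISI densities with common mean mu and shapes k1, k2."""
    b1, b2 = k1 / mu, k2 / mu  # rate parameters
    return ((k1 - k2) * digamma(k1) - math.lgamma(k1) + math.lgamma(k2)
            + k2 * (math.log(b1) - math.log(b2)) + k1 * (b2 - b1) / b1)


def fisher_kappa(k):
    """g_kappa_kappa = psi'(kappa) - 1/kappa."""
    return trigamma(k) - 1.0 / k


if __name__ == "__main__":
    dk = 0.01
    print(kl_gamma_same_mean(4.0, 4.0 + dk), 0.5 * fisher_kappa(4.0) * dk ** 2)
    print(kl_gamma_same_mean(4.0, 8.0), kl_gamma_same_mean(8.0, 4.0))  # asymmetric
```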

In previous studies, various measures were computed for mathematical models
\cite{lansky,feng,shinomoto3}.
However, those studies focused only on the expectations of the measures.
In discrimination tasks, the variance of a measure is more important than its expectation.
For example, even if the expectations of a measure for two
different types of spike sequences differ considerably,
discriminating the two types is difficult if the variances are very large.
In addition, if a measure is redefined, for example, multiplied by a nonzero constant
or shifted by a constant, its expectation changes, but the mutual
information does not.
Therefore, in this paper we focused on the variance and searched for the
measure that maximizes the mutual information.
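The invariance argument can be illustrated with a simple histogram estimate of the mutual information between an equiprobable label (which of two processes generated a train) and the value of a measure. This Python sketch is a generic plug-in estimator, not necessarily the procedure used in this paper; the bin count, sample sizes, and Gaussian test data are hypothetical.

```python
import math
import random


def binned_mutual_information(x1, x2, bins=30):
    """Mutual information (bits) between an equiprobable class label and a scalar measure.

    x1, x2: values of the measure for spike trains from process 1 and process 2.
    The histogram spans the pooled range of the data (assumed non-degenerate).
    """
    lo, hi = min(x1 + x2), max(x1 + x2)
    width = (hi - lo) / bins

    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        return [c / len(xs) for c in counts]

    p1, p2 = histogram(x1), histogram(x2)
    mi = 0.0
    for a, b in zip(p1, p2):
        marginal = 0.5 * a + 0.5 * b  # P(bin) under equal priors
        for p in (a, b):
            if p > 0:
                mi += 0.5 * p * math.log2(p / marginal)
    return mi


if __name__ == "__main__":
    rng = random.Random(4)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    x2 = [rng.gauss(1.0, 1.0) for _ in range(5000)]
    shifted1 = [3.0 * x + 7.0 for x in x1]
    shifted2 = [3.0 * x + 7.0 for x in x2]
    print(binned_mutual_information(x1, x2), binned_mutual_information(shifted1, shifted2))
```

Because the bins span the pooled data range, a positive rescaling and shift of the measure leaves the estimate unchanged (up to rounding at bin edges), whereas a larger variance at the same mean separation lowers it.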


\appendix
\section{\label{appendixA}Orthogonal coordinates for gamma distribution}
We show that $\kappa$ and $\mu$ are orthogonal coordinates in the sense of
information geometry.
The theory of information geometry is described elsewhere \cite{amari2,amari3},
and it has been applied to neuroscience \cite{tatsuno,tatsuno2,nakahara}.

For the purpose of proving the orthogonality, it suffices to demonstrate that
the Fisher information matrix is diagonal.
The Fisher information matrix is defined as
\begin{equation}
g_{ij} = \int^{\infty}_{0}\frac{\partial \log p(T)}{\partial \xi^i}\frac{\partial \log p(T)}{\partial \xi^j}p(T)dT,
\end{equation}
where $\xi^1=\mu$ and $\xi^2=\kappa$.
The log-likelihood can be written as
\begin{equation}
\log p(T) = \kappa \log\left(\frac{\kappa}{\mu}\right) + (\kappa -1) \log T - \log \Gamma(\kappa) - \frac{T \kappa}{\mu}.
\end{equation}
The derivatives of the log-likelihood are
\begin{equation}
\frac{\partial \log p(T)}{\partial \mu} = - \frac{\kappa}{\mu} + \frac{T \kappa}{\mu ^2}
\end{equation}
and
\begin{equation}
\frac{\partial \log p(T)}{\partial \kappa} = \log \frac{\kappa}{\mu} +1 + \log T - \psi (\kappa) - \frac{T}{\mu},
\end{equation}
where $\psi(\kappa)=(\log\Gamma(\kappa))'$.
The matrix elements can be written as
\begin{equation}
g_{\mu\mu}=\frac{\kappa}{\mu ^2},
\end{equation}
\begin{equation}
g_{\mu\kappa}=g_{\kappa\mu}=0,
\end{equation}
\begin{equation}
g_{\kappa\kappa}=\psi'(\kappa) - \frac{1}{\kappa}.
\end{equation}
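One way to verify the vanishing off-diagonal element directly is as follows.
Since $\partial \log p(T)/\partial \mu = (\kappa/\mu^2)(T-\mu)$ has zero mean, the terms of $\partial \log p(T)/\partial \kappa$ that do not depend on $T$ drop out, and
\begin{equation}
g_{\mu\kappa} = \frac{\kappa}{\mu^2}\left[Cov(T,\log T) - \frac{Var(T)}{\mu}\right].
\end{equation}
For the gamma distribution, $Var(T)=\mu^2/\kappa$, $Ex(\log T)=\psi(\kappa)+\log(\mu/\kappa)$, and $Ex(T\log T)=\mu[\psi(\kappa+1)+\log(\mu/\kappa)]$; hence, using $\psi(\kappa+1)=\psi(\kappa)+1/\kappa$,
\begin{equation}
Cov(T,\log T)=\mu[\psi(\kappa+1)-\psi(\kappa)]=\frac{\mu}{\kappa}=\frac{Var(T)}{\mu},
\end{equation}
so the bracket vanishes and $g_{\mu\kappa}=0$.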
Thus, the Fisher information matrix is diagonal at every point.
According to the theory of information geometry, orthogonal coordinates can always be
chosen for an exponential family of distributions, of which the gamma
distribution is a special case.

The Fisher information matrix has the following statistical meaning.
When $\mu$ and $\kappa$ are estimated from a finite number of samples,
the estimated values are not necessarily equal to the true values.
The values of the maximum likelihood estimators vary from one sample
set to another, and for a sufficiently large sample size $n$ their variation around the true values can be approximated by a normal
distribution whose covariance matrix is the inverse of $n$ times the Fisher information matrix \cite{lehmann}.
Thus, the diagonality of the Fisher information matrix means that the variations in the
maximum likelihood estimators of $\mu$ and $\kappa$ are asymptotically uncorrelated.
%Note that $\hat{\mu}$ and the original $L_V$ are also uncorrelated.

\section{\label{appendixB}Maximum likelihood estimation for gamma distribution}
Let $T_1,T_2,\ldots,T_n$ be the observed ISIs.
We would like to estimate the true values of $\mu$ and $\kappa$ from them.
The log-likelihood is defined as
\begin{equation}
l \equiv \ln(p(T_1)p(T_2)\cdots p(T_n))
\end{equation}
and can be written as
\begin{equation}
l = n \kappa \ln \frac{\kappa}{\mu} - n \ln\Gamma(\kappa) + (\kappa -1) \sum \ln T_i - \frac{\kappa}{\mu} \sum T_i .
\end{equation}
The maximum likelihood estimators must satisfy both
$\frac{\partial l}{\partial \mu} = 0$ and
$\frac{\partial l}{\partial \kappa} = 0$.
The derivatives of the log-likelihood are
\begin{equation}
\frac{\partial  l}{\partial \mu} = \frac{\kappa}{\mu^2}\sum T_i - n \frac{\kappa}{\mu}
\end{equation}
and
\begin{equation}
\frac{\partial l}{\partial \kappa} = \sum \ln T_i - \frac{1}{\mu}\sum T_i + n \ln\frac{\kappa}{\mu} + n -n\psi(\kappa).
\end{equation}
Then, $\hat{\mu}$ can be obtained explicitly as
\begin{equation}
\hat{\mu} = \frac{1}{n}\sum T_i,
\end{equation}
and $\hat{\kappa}$ must satisfy
\begin{equation}
\frac{1}{n}\sum \ln T_i - \ln\frac{1}{n}\sum T_i = \psi(\hat{\kappa})-\ln\hat{\kappa},
\label{mle}
\end{equation}
where $\psi(\hat{\kappa})=(\log\Gamma(\hat{\kappa}))'$.
This equation cannot be solved explicitly for $\hat{\kappa}$.
However, the right side of the equation is a monotonically increasing function of
$\hat{\kappa}$, because its derivative, $\psi'(\hat{\kappa})-1/\hat{\kappa}$, equals the Fisher information $g_{\kappa\kappa}>0$ of Appendix~\ref{appendixA}; hence $\hat{\kappa}$ can be obtained by numerical iteration.
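A minimal Python sketch of such an iteration is bisection on a logarithmic scale, which relies only on the monotonicity of the right side of Eq.~(\ref{mle}). The digamma implementation (recurrence plus asymptotic series), the bracket $[10^{-3},10^{4}]$, the tolerance, and the function names are illustrative assumptions.

```python
import math
import random


def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def kappa_mle_bisection(isis, lo=1e-3, hi=1e4, rel_tol=1e-12):
    """Solve Eq. (mle) for kappa by bisection.

    psi(k) - ln k increases monotonically toward 0, so the root is bracketed
    by [lo, hi] whenever the left side lies between its values at lo and hi.
    """
    n = len(isis)
    lhs = sum(math.log(t) for t in isis) / n - math.log(sum(isis) / n)
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if digamma(mid) - math.log(mid) < lhs:
            lo = mid  # right side still too small: the root lies above mid
        else:
            hi = mid
        if hi / lo - 1.0 < rel_tol:
            break
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    rng = random.Random(5)
    sample = [rng.gammavariate(4.0, 2.0 / 4.0) for _ in range(5000)]  # kappa = 4, mu = 2
    print(kappa_mle_bisection(sample))
```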

Instead of a lengthy numerical iteration, we can use the moment estimator.
According to Eq.~(\ref{expectation}), $\kappa$ is related to the mean and variance of the ISIs by
\begin{equation}
\kappa=\frac{Ex(T)^2}{Var(T)},
\label{cv}
\end{equation}
so replacing $Ex(T)$ and $Var(T)$ with the sample mean and variance yields an estimate of $\kappa$.
The right side of Eq.~(\ref{cv}) is $1/C_V^2$.
Thus, we can regard $C_V$ as a moment estimator.
However, the moment estimator is less accurate than the maximum likelihood estimator,
especially when $\kappa$ is close to $1$ \cite{cox}.
Nevertheless, it is good as a first approximation, and we can use it as the
initial value of the numerical iteration in maximum likelihood estimation.
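Combining the two ideas, the moment estimate $1/C_V^2$ can seed a Newton iteration for Eq.~(\ref{mle}), whose derivative $\psi'(\kappa)-1/\kappa$ is available in closed form. The following Python sketch is one possible implementation under these assumptions; the function names, the digamma and trigamma approximations, and the step-halving safeguard are hypothetical choices.

```python
import math
import random
import statistics


def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0, computed the same way."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def kappa_moment(isis):
    """Moment estimator: sample mean**2 / sample variance, i.e. 1 / C_V**2."""
    mean = statistics.fmean(isis)
    return mean * mean / statistics.variance(isis, mean)


def kappa_mle_newton(isis, max_iter=50, rel_tol=1e-12):
    """Newton iteration on psi(k) - ln k = left side of Eq. (mle), started at 1 / C_V**2."""
    n = len(isis)
    lhs = sum(math.log(t) for t in isis) / n - math.log(sum(isis) / n)
    k = kappa_moment(isis)
    for _ in range(max_iter):
        residual = digamma(k) - math.log(k) - lhs
        slope = trigamma(k) - 1.0 / k  # equals g_kappa_kappa > 0
        k_next = k - residual / slope
        if k_next <= 0.0:
            k_next = 0.5 * k  # keep the iterate in the domain kappa > 0
        if abs(k_next - k) <= rel_tol * k:
            return k_next
        k = k_next
    return k


if __name__ == "__main__":
    rng = random.Random(6)
    sample = [rng.gammavariate(2.0, 1.5 / 2.0) for _ in range(5000)]  # kappa = 2, mu = 1.5
    print(kappa_moment(sample), kappa_mle_newton(sample))
```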

Another way to avoid numerical iteration is to use the left side of
Eq.~(\ref{mle}) itself as a measure.
In discriminant analysis, we do not need to estimate $\kappa$, because
the left side of Eq.~(\ref{mle}) is in one-to-one correspondence with
$\hat{\kappa}$ and therefore carries the same information as $\hat{\kappa}$.

%\begin{acknowledgments}
%We are grateful to A, B, and C for discussion.
%The present work is supported by \dots.
%\end{acknowledgments}

%\newpage %Just because of unusual number of tables stacked at end

\bibliography{pre}% Produces the bibliography via BibTeX.

\end{document}
%
% ****** End of file apssamp.tex ******