1: \documentclass[a4,12pt]{article}
2: \usepackage{latexsym}
3: %\usepackage[light]{draftcopy}
4: \usepackage{epsfig}
5: %\usepackage{bookman}
6: %\usepackage{graphicx}
7: \oddsidemargin=0cm
8: \evensidemargin=0cm
9: \textwidth=16cm
10: \paperwidth=21cm
11: \textwidth=18.6cm
12: %\textheight=24.7cm
13: \oddsidemargin=-0.5in
14: \evensidemargin=-0.5in
15: %\topmargin=-0.6in
16: \usepackage{amsmath,amstext,amsfonts}
17: \def\bm#1{\mbox{\boldmath $#1$}}
18: \def\teigi{\stackrel{\rm def}{=}}
19: \def\hatena{\stackrel{\boldmath ?}{=}}
20: %%
21: \makeatletter
22: \renewcommand{\theequation}{%
23: \thesection.\arabic{equation}}
24: \@addtoreset{equation}{section}
25: \makeatother
26: \tolerance=6000
27:
28:
29:
30:
\title{Multiplicative Algorithm for Orthogonal Groups \\and
Independent Component Analysis }
33: \author{Toshinao {\sc
34: Akuzawa}\thanks{akuzawa@brain.riken.go.jp}\vspace{0.3cm}\\
35: \vspace{0.5cm}\\
36: Brain Science Institute \\
37: {\it RIKEN}\\
38: %%{\small(The Institute of Physical and Chemical Research)}\\
39: {\small 2-1 Hirosawa, Wako, Saitama 351-0198, Japan}}
40: \date{{\it \today}}
41:
42: \begin{document}
43: \maketitle
44: \abstract{
45: The multiplicative Newton-like method developed by the author\:{\it et\:al.}
46: is extended to
47: the situation where the dynamics is restricted to
48: the orthogonal group.
49: A general framework is
50: constructed without specifying the cost function.
Though the restriction to the orthogonal groups makes the problem
somewhat complicated,
an explicit expression for the size of each update step is obtained.
54: This algorithm is exactly second-order-convergent.
55: The global instability inherent in the Newton method is remedied by
56: a Levenberg-Marquardt-type variation.
57: The method thus constructed can readily be applied to the independent
58: component analysis.
59: Its remarkable performance is illustrated by a
60: numerical simulation.
61:
62:
63: % In the case of the independent component analysis the restriction
64: % corresponds to the prewhitening of the data.
65: }
66: \section{Overview}
67: \label{intro}
Many optimization problems take the form
``Find an optimal matrix under the constraints (1).. (2).. {\it etc}.''
70: Some of these can be considered as optimizations on Lie groups.
For groups, the fundamental operation
is multiplication, whereas addition is unnatural.
73: %(Imagine the compound interest rate on your bank account.)
74: In consideration of this fact,
75: we have constructed a multiplicative Newton-like algorithm
76: for maximizing the kurtosis (a good barometer for the independence) in
77: \cite{akuzawa8}. There the dynamics takes place on the coset
78: $GL(1,{\Bbb R})^{N}\backslash GL(N,{\Bbb R})$.
79: We can apply the techniques
80: developed in \cite{akuzawa8} to many other optimization problems.
The coset structure $GL(1,{\Bbb R})^{N}\backslash GL(N,{\Bbb R})$ is,
however,
characteristic of the independent component
analysis (ICA). This reflects
the fact that independence has nothing to do with scaling.
The redundancy
resulting from the invariance of the model under componentwise scaling
must be eliminated for a rigorous discussion, and this redundancy
corresponds
to $GL(1,{\Bbb R})^{N}$.
91:
Another way to eliminate this redundancy is
prewhitening.
Prewhitening is a linear transformation of the observed data
which maps
the covariance matrix to the unit matrix.
If we deal with prewhitened data, we can legitimately narrow
the search range to the orthogonal group.
The aim of this letter is the construction of a multiplicative
algorithm
for the orthogonal groups.
102:
103:
104: The framework is as follows.
105: %Suppose that
An $N$-dimensional prewhitened random variable with components
$\{X_i|1\le i \le N\}$ is available,
and it is assumed to originate from
some unknown mutually independent components $\{Y_i^*|1\le i \le N\}$.
The goal of the ICA is to find the map
$\{X_i\} \mapsto \{Y_i^*\}$.
112: We restrict ourselves to
113: the linear independent component analysis.
114: There
115: we want to find a linear transformation $C^*:{X}=(X_1,\cdots,X_N)'\mapsto
116: { Y^*}=(Y_1^*,\cdots,Y_N^*)'=C^*{ X}$ which
117: minimizes some cost function that measures the independence.
118: Since we are assuming that the data is already prewhitened, the
119: covariance matrix of $X$ is the $N\times N$ unit matrix.
120: If we do not take into account errors in the prewhitening,
121: the optimal point $C^*$ must belong to $O(N)$.
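As a brief check of this statement (a sketch only; the symbols $\tilde X$ and
$V$ are introduced here for illustration, and we assume zero-mean data with a
positive definite covariance matrix), let $\tilde X$ denote the raw
observations and $V=E(\tilde X\tilde X')$, where the prime denotes
transposition. One possible prewhitening is $X=V^{-1/2}\tilde X$, since
\begin{eqnarray*}
E(XX')=V^{-1/2}\,E(\tilde X\tilde X')\,V^{-1/2}=V^{-1/2}VV^{-1/2}=I_N~,
\end{eqnarray*}
where $I_N$ is the $N\times N$ unit matrix.
If the independent components are normalized to unit variance, then
$E(Y^*Y^{*\prime})=I_N$ as well, and $Y^*=C^*X$ gives
\begin{eqnarray*}
I_N=E(Y^*Y^{*\prime})=C^*E(XX')C^{*\prime}=C^*C^{*\prime}~,
\end{eqnarray*}
which is the defining relation of $O(N)$.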
122:
123:
124: Giving up the analytical solution,
125: we consider a sequence,
126: \begin{eqnarray}
127: \label{eq:intro1}
128: C(0),~ C{(1)},~ C{(2)},~ C{(3)},~\cdots\cdots~,
129: \end{eqnarray}
130: which converges to the optimal solution $C^*$.
131: The sequence $\{C(t)\}$
132: % which specifies the
133: %linear transformation
134: is generated by the left-multiplication of another sequence of
135: orthogonal
136: matrices $\{D(t)\}$.
137: Each $D(t)$ is specified by the coordinate
138: $\Delta(t)$ which satisfies $D(t)={\rm e}^{\Delta(t)}$.
139: We assume that $\Delta(t)$ is
140: an $N\times N$ skew-symmetric
141: matrix,
142: which implies that $D(t)$ belongs to
143: the identity component of $O(N)$.
144: In practice the procedure is as follows.
As an initial condition we set $C(0)$.
At each step $t$
we introduce
$\Delta(t)$ and
define $C({t+1})$ by $C({t+1})={\rm e}^{\Delta({t})}C(t)$.
150: Under these settings, we determine $\Delta(t)$ by using the Newton method
151: %for the second order
152: %expansion of the cost function
153: with respect to
154: the matrix elements of
155: $\Delta(t)$. That is,
156: we evaluate the cost function at $C({t+1})$
157: by expanding it around $C({t})$
158: in terms of the elements of
159: $\Delta({t})$ up to the second order.
Then $\Delta(t)$ is chosen as the (unique) critical point of
161: this second order
162: expansion.
163: We iteratively follow these procedures until we obtain a satisfactory
164: solution.
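For illustration, consider the smallest non-trivial case $N=2$ (the angle
$\theta$ and the matrix $J$ below are notation introduced only for this
example). Every $2\times 2$ skew-symmetric matrix can be written as
$\Delta=\theta J$ with
\begin{eqnarray*}
J=\left(\begin{array}{cc} 0 & -1\\ 1 & 0 \end{array}\right)~,\qquad J^2=-I_2~,
\end{eqnarray*}
where $I_2$ is the $2\times 2$ unit matrix, so that the exponential series sums to
\begin{eqnarray*}
{\rm e}^{\Delta}=\cos\theta\, I_2+\sin\theta\, J
=\left(\begin{array}{cc} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{array}\right)~.
\end{eqnarray*}
This is a rotation with determinant one. A single update therefore rotates
$C(t)$ by the angle $\theta$, and a factor with determinant $-1$ is never
produced in this way.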
165:
166: This letter is organized as follows.
167: In Section \ref{sec:mult}
168: we will give a complete description of
169: a new multiplicative updating method for the orthogonal groups.
170: This section is the main part of this letter. Since our formulation
171: does not depend on the details of the cost function
172: the method can be useful for many problems other than the ICA.
173: The performance of
174: our method including the second-order-convergence is discussed in
175: Section \ref{sec:per1}.
176: Section \ref{sec:appl} is a survey of possible applications of our
177: method.
The algorithm constructed in Section \ref{sec:mult}
can be regarded as a pure-Newton method on the orthogonal groups.
To achieve global convergence, we must modify the method. This is
accomplished in
Section \ref{sec:practice}. Section
183: \ref{sec:practice} also includes a numerical examination of
184: the performance of our
185: method. Section \ref{sec:summ} is a summary.
186:
187: \section{Multiplicative updating on $O(N)$}
188: \label{sec:mult}
189: We assume that the cost function $F$ takes the form,
190: \begin{eqnarray}
191: \label{eq:a1}
192: F(Y)=\sum_{i=1}^NE(f_i(Y_i))~,
193: \end{eqnarray}
194: where each $f_i:{\Bbb R}\rightarrow{\Bbb R}$ is an unspecified function.
Throughout this letter we denote by $E(\cdot)$ the expectation.
We will determine
the concrete procedure
in the manner of the Newton method.
First, we introduce
maps
$R$ and $U_{i}~(1\le i\le N)$ from
$N$-dimensional
data sets to $N \times N$ matrices
by
206: \begin{eqnarray}
207: \label{eq:a2}
208: [R(Y)]_{ki}=E\left(\frac{\partial f_i(Y_i)}{\partial Y_i}Y_k\right)
209: \end{eqnarray}
210: and
211: \begin{eqnarray}
212: \label{eq:a3}
213: [ U_{i}(Y)]_{kl}=U_{ikl}(Y)= E\left(\frac{\partial^2 f_i(Y_i)}{\partial
214: Y_i^2}Y_k Y_l\right)~.
215: \end{eqnarray}
216: The goal is the construction of a sequence
217: $\{Y(t)\}$ of the estimates of the independent components, which
218: converges to the optimal point $Y^*$.
219: %We suppose that
220: Within the framework of the linear analysis, we consider that
221: this sequence is derived from another sequence
222: $\{C(t)\}$ of the linear transformation by the relation
223: $Y(t)=C(t)X$,
224: where $X$ are the original data. Thus if we restate the problem,
225: the task is to
226: determine
227: a sequence $\{C(t)\}$.
We assume that
for each $t\in {\Bbb N}^{+}$
the estimates of the independent components at time $t$
and the estimates
at time $t+1$ are related by
233: \begin{eqnarray}
234: \label{eq:a4}
235: Y{(t+1)}=D{(t)}Y{(t)}~
236: \end{eqnarray}
237: or equivalently
238: \begin{eqnarray}
239: \label{eq:a4bb}
240: C{(t+1)}=D{(t)}C{(t)}~,
241: \end{eqnarray}
242: where $D{(t)}$ is some orthogonal matrix to be fixed.
243: Our method is characterized by this left-multiplicative updating rule.
244: As mentioned in the previous section,
245: we assume that
246: each $D(t)$ always belongs to the identity component of the
247: orthogonal group $O(N)$.
248: This assumption is reasonable, for example, if the original data $X$
249: are already prewhitened in the case of the ICA.
250: % we suppose that the original data $X$ are already prewhitened.
251: %In this case we can legitimately
Under this restriction,
$D{(t)}$ is specified by an $N\times N$ anti-symmetric matrix $\Delta{(t)}$,
which satisfies
255: \begin{eqnarray}
256: \label{eq:a5}
257: \exp(\Delta{(t)})=D{(t)}~.
258: \end{eqnarray}
For brevity we omit the argument $(t)$ and denote $Y(t+1)$ by $Z$.
The cost function $F(Z)$ is expanded in terms of $\{\Delta_{ij}\}$ as
261: \begin{eqnarray}
262: \label{eq:a6}
263: F(Z)=F(Y)+{\rm tr}(\Delta R(Y))+{\rm
264: tr}\left(\frac{\Delta^2}{2}R(Y)\right)
265: +\frac{1}{2}\sum_{i,k,l}\Delta_{ik}\Delta_{il}U_{ikl}(Y)
266: +O(\Delta^3)~.
267: \end{eqnarray}
268: %By partially differentiating (\ref{eq:a6}),
Throughout this letter
we denote by $O(\Delta^k)$ polynomials in the matrix elements of $\Delta$
that contain no terms of degree less than $k$. This symbol should not be
confused with the symbol for the orthogonal groups, such as $O(N)$.
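The origin of the individual terms in (\ref{eq:a6}) can be traced as follows
(a sketch, assuming that each $f_i$ is sufficiently smooth and that
differentiation and expectation may be interchanged; $f_i^{(1)}$ and
$f_i^{(2)}$ are shorthand, used only here, for the first and second
derivatives of $f_i$). Since $Z={\rm e}^{\Delta}Y$,
\begin{eqnarray*}
Z_i=Y_i+\sum_k\Delta_{ik}Y_k+\frac{1}{2}\sum_k[\Delta^2]_{ik}Y_k+O(\Delta^3)~,
\end{eqnarray*}
and a Taylor expansion of $f_i$ around $Y_i$ gives
\begin{eqnarray*}
E(f_i(Z_i))&=&E(f_i(Y_i))
+\sum_k\left(\Delta_{ik}+\frac{1}{2}[\Delta^2]_{ik}\right)E\left(f_i^{(1)}(Y_i)Y_k\right)\\
&&+\frac{1}{2}\sum_{k,l}\Delta_{ik}\Delta_{il}E\left(f_i^{(2)}(Y_i)Y_kY_l\right)
+O(\Delta^3)~.
\end{eqnarray*}
Summing over $i$ and inserting the definitions (\ref{eq:a2}) and
(\ref{eq:a3}), for example
$\sum_{i,k}\Delta_{ik}R_{ki}={\rm tr}(\Delta R)$, reproduces (\ref{eq:a6}).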
273: As in the usual Newton method,
274: we truncate the expansion (\ref{eq:a6}) at the second order with
275: respect to $\{\Delta_{ij}\}$.
276: Then $\Delta$ in this step is determined as the coordinate of the
277: critical point of this truncated expansion.
278: The partial derivative of (\ref{eq:a6}) is more convenient for the purpose.
279: It reads
280: \begin{eqnarray}
281: \label{eq:a7}
282: \frac{\partial F(Z)}{\partial \Delta_{kl}}
283: =R_{lk}+\frac{1}{2}\left[\Delta R+R\Delta
284: \right]_{lk}+\sum_p \Delta_{kp}U_{klp}+O(\Delta^2)~,
285: \end{eqnarray}
286: where we have omitted the argument $Y$ for $R$ and $U$.
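Each term of (\ref{eq:a7}) follows from (\ref{eq:a6}) by differentiating
componentwise, treating all $N^2$ elements of $\Delta$ as independent
variables at this stage (a worked sketch; the constraint of skew symmetry is
imposed only later):
\begin{eqnarray*}
\frac{\partial}{\partial\Delta_{kl}}\sum_{a,b}\Delta_{ab}R_{ba}&=&R_{lk}~,\\
\frac{\partial}{\partial\Delta_{kl}}\,\frac{1}{2}\sum_{a,b,c}\Delta_{ab}\Delta_{bc}R_{ca}
&=&\frac{1}{2}\left(\sum_c\Delta_{lc}R_{ck}+\sum_aR_{la}\Delta_{ak}\right)
=\frac{1}{2}\left[\Delta R+R\Delta\right]_{lk}~,\\
\frac{\partial}{\partial\Delta_{kl}}\,\frac{1}{2}\sum_{i,p,q}\Delta_{ip}\Delta_{iq}U_{ipq}
&=&\sum_p\Delta_{kp}U_{klp}~,
\end{eqnarray*}
where the last line uses the symmetry $U_{ipq}=U_{iqp}$, which is evident
from (\ref{eq:a3}).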
287: Now let us introduce a map $\rm cs$ (the column string) as in the previous
288: article
289: \cite{akuzawa8}:
290: \begin{eqnarray}
291: \label{eq:a14}
292: {\rm Mat}(N,{\Bbb F}) &\rightarrow& {\Bbb F}^{N^2}\\
293: A=\left(
294: \begin{array}{cccc}
295: A_{11}& A_{12}&\cdots &A_{1N}\\
296: A_{21} &\multicolumn{3}{c}{\dotfill}\\
297: \multicolumn{4}{c}{\dotfill}\\
298: A_{N1} &\multicolumn{2}{c}{\dotfill}&A_{NN}
299: \end{array}
300: \right) &\mapsto&
301: {\rm cs}(A)=
302: (A_{11}~ A_{21}~ \cdots~ A_{N1}~~ A_{12}~ A_{22}~\cdots ~A_{NN})'~,\nonumber
303: \end{eqnarray}
where ${\rm Mat}(N,{\Bbb F})$ is the set of
$N\times N$ matrices over some unspecified field $\Bbb F$.
We denote transposition by the superscript $\prime$ and
the complex conjugate by $\dagger$.
For the orthogonal groups it is simpler to move to
the framework of the column string than in the case of
$GL(1,{\Bbb R})^N\backslash GL(N,{\Bbb R})$.
Neglecting the $O(\Delta^2)$ terms,
the right-hand side of (\ref{eq:a7}) is straightforwardly rewritten as
313: \begin{eqnarray}
314: \label{eq:a8}
315: R_{lk}&+&\frac{1}{2}\left[\Delta R+R\Delta
316: \right]_{lk}+\sum_p \Delta_{kp}U_{klp}\nonumber\\
317: &&=\left[{\rm cs}(R)+\frac{1}{2}\left(
318: R'\otimes I_N+I_N\otimes R
319: \right) {\rm cs}(\Delta)+
320: \big(\bigoplus_k U_k\big) T{\rm cs}(\Delta)\right]_{l+(k-1)N}~,
321: \end{eqnarray}
322: where
323: the symbol ``$\bigoplus$'' stands for the direct sum,
324: \begin{eqnarray}
325: \label{eq:tiu1}
326: \bigoplus_{k=1}^N U_{k}=
327: \left(
328: \begin{array}{lllll}
329: U_1 & 0 & \multicolumn{2}{c}{\cdots\cdots} & 0\\
330: 0& U_2 & 0 & \multicolumn{2}{c}{\cdots\cdots}\\
331: \multicolumn{5}{c}{\dotfill}\\
332: \multicolumn{5}{c}{\dotfill}\\
333: 0& \multicolumn{2}{c}{\cdots\cdots}& U_{N-1}& 0 \\
334: 0& 0& \multicolumn{2}{c}{\cdots\cdots}& U_{N} \\
335: \end{array}
336: \right)~,
337: \end{eqnarray}
338: $T$ is an $N^2\times N^2$ matrix
339: defined by
340: \begin{eqnarray}
341: \label{eq:a15}
342: {\rm cs}(A')=T{\rm cs}(A) ~\mbox{\rm for~} A\in {\rm Mat}(N,{\Bbb F})~,
343: \end{eqnarray}
344: and $I_N$ is the $N\times N$ unit matrix.
345: We denote the tensor product by $\otimes$ as usual.
346: The ``transposition'' $T$ is also considered as
347: an intertwiner between two equivalent representations:
348: \begin{eqnarray}
349: %\nonumber
350: T(A\otimes B)T=B\otimes A~.
351: \end{eqnarray}
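As a concrete illustration for $N=2$ (assuming the usual convention that
$A\otimes B$ is the block matrix whose $(i,j)$ block is $A_{ij}B$), the
column string and the matrix $T$ read
\begin{eqnarray*}
{\rm cs}(A)=(A_{11}~A_{21}~A_{12}~A_{22})'~,\qquad
T=\left(\begin{array}{cccc}
1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1
\end{array}\right)~,
\end{eqnarray*}
so that $T$ simply exchanges the entries $A_{21}$ and $A_{12}$.
The step from (\ref{eq:a7}) to (\ref{eq:a8}) rests on the standard identity
\begin{eqnarray*}
{\rm cs}(AXB)=(B'\otimes A)\,{\rm cs}(X)~,
\end{eqnarray*}
which gives ${\rm cs}(\Delta R)=(R'\otimes I_N){\rm cs}(\Delta)$ and
${\rm cs}(R\Delta)=(I_N\otimes R){\rm cs}(\Delta)$.
The third term of (\ref{eq:a8}) arises because the $k$-th block of
$T{\rm cs}(\Delta)={\rm cs}(\Delta')$ is the $k$-th row of $\Delta$, on which
the block $U_k$ of the direct sum acts.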
The orthogonal group $O(N)$ has fewer degrees of freedom than the
general linear group.
The canonical basis of the Lie algebra ${\frak o}(N)$ of $O(N)$ consists of
$N(N-1)/2$
anti-symmetric
matrices. We will introduce some operators which enable us to move to
the coordinates based on
the canonical basis of ${\frak o}(N)$.
First, we introduce an $N^2\times N^2$ matrix $H$ by
361: \begin{eqnarray}
362: \label{eq:a9}
363: H=\sum_{i>j}H^{(i,j)}~,
364: \end{eqnarray}
365: where $H^{(i,j)}$ is a $\pi/4$ rotation between
366: the $j+N(i-1)$-th component and the $i+N(j-1)$-th component:
367: \begin{eqnarray}
368: \label{eq:a10}
369: H^{(i,j)}_{kl}= \left\{
370: \begin{array}{ccl}
\frac{1}{\sqrt{2}}~~~~&\mbox{\rm for}&k=j+N(i-1),~~l=j+N(i-1)\\
-\frac{1}{\sqrt{2}}~~~~&\mbox{\rm for}&k=j+N(i-1),~~l=i+N(j-1)\\
\frac{1}{\sqrt{2}}~~~~&\mbox{\rm for}&k=i+N(j-1),~~l=j+N(i-1)\\
\frac{1}{\sqrt{2}}~~~~&\mbox{\rm for}&k=i+N(j-1),~~l=i+N(j-1)\\
375: 0~~~~&&\mbox{\rm otherwise. }
376: \end{array}
377: \right.
378: \end{eqnarray}
379: The projection operator $P_D$,
380: \begin{eqnarray}
381: \label{eq:a18}
382: P_D&=&{\rm diag}(p_1,\cdots,p_{N^2})~,\nonumber\\
383: &&\left\{
384: \begin{array}{ll}
385: p_k=1 ~~~\mbox{\rm for}~~ k=N(i-1)+i,1\le i\le N~\\
386: p_k=0~~~~ \mbox{\rm otherwise}~,
387: \end{array}
388: \right.
389: \end{eqnarray}
390: is used to extract the diagonal
391: elements of a matrix from its image by $\rm cs$.
Then the coordinate transformation
is realized by left-multiplying column string vectors by
\begin{eqnarray}
\label{eq:a12.1}
H+P_D~.
\end{eqnarray}
399: We need to introduce two more
400: projection operators $P_S$ and $P_A$ defined by
401: \begin{eqnarray}
402: \label{eq:a11}
403: P_S&=&{\rm diag}(p_1,p_2,\cdots,p_{N^2})\\
404: P_A&=&{\rm diag}(1-p_1,1-p_2,\cdots,1-p_{N^2})~,%\nonumber
405: \end{eqnarray}
406: where
407: \begin{eqnarray}
408: \label{eq:a12}
409: p_k=\left\{
410: \begin{array}{ccl}
411: 1&\mbox{\rm if}&{}^{\exists}(i,j);~~ j\le i~~ \mbox{\rm and}~~k=i+N(j-1)\\
412: 0&&\mbox{\rm otherwise}.
413: \end{array}
414: \right.
415: \end{eqnarray}
By letting $P_S$ and $P_A$ act from the left on
column string vectors rotated by $H+P_D$,
we can extract,
respectively,
the symmetric components
and the anti-symmetric components of the matrices.
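For $N=2$ these operators can be written out explicitly (an illustration
obtained by reading every index offset in (\ref{eq:a10}) with $N=2$; the
angle $\theta$ is notation introduced only here). The only pair with $i>j$ is
$(i,j)=(2,1)$, and
\begin{eqnarray*}
H+P_D=\left(\begin{array}{cccc}
1&0&0&0\\
0&\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}&0\\
0&-\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}&0\\
0&0&0&1
\end{array}\right)~,\qquad
P_S={\rm diag}(1,1,0,1)~,\qquad P_A={\rm diag}(0,0,1,0)~.
\end{eqnarray*}
Acting on ${\rm cs}(A)=(A_{11}~A_{21}~A_{12}~A_{22})'$ this gives
\begin{eqnarray*}
(H+P_D){\rm cs}(A)=\left(A_{11}~~\frac{A_{21}+A_{12}}{\sqrt{2}}~~
\frac{A_{12}-A_{21}}{\sqrt{2}}~~A_{22}\right)'~,
\end{eqnarray*}
so that $P_S$ keeps the diagonal elements and the symmetric combination,
while $P_A$ keeps the anti-symmetric combination.
For a skew-symmetric $\Delta$ with $\Delta_{21}=-\Delta_{12}=\theta$ the
rotated vector is $(0~~0~~{-\sqrt{2}\theta}~~0)'$, and its symmetric
components vanish.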
423: Then the conditions
424: for the critical point of the second-order-expansion,
425: which must be satisfied by $\Delta$, are
426: translated into the following two conditions.
First, the symmetric components of $\Delta$ must vanish.
428: This condition is expressed as
429: \begin{eqnarray}
430: \label{eq:11.91}
431: \left[(H+P_D){\rm cs}(\Delta)\right]_{j+(i-1)N}=0
432: \qquad\mbox{\rm for}\quad i\le j
433: \quad\bigg(\Longleftrightarrow
434: P_S(H+P_D){\rm cs}(\Delta) =0\bigg)~.
435: \end{eqnarray}
436: Secondly, for the anti-symmetric components
437: the condition for the critical point is transformed to
438: \begin{eqnarray}
439: \label{eq:a11.9}
440: \left[(H+P_D){\rm cs}(R)+(H+P_D)W
441: {\rm cs}(\Delta)
442: \right]_{j+(i-1)N}~=0 \qquad\mbox{\rm for}\quad i>j~,
443: \end{eqnarray}
444: where we have set
445: \begin{eqnarray}
446: \label{eq:a14b}
447: W=\frac{1}{2}\left(
448: R'\otimes I_N+I_N\otimes R
449: \right) +
450: \big(\bigoplus_k U_k\big) T~.
451: \end{eqnarray}
452: %Now
453: %one can see that
454: % (\ref{eq:a8})
455: The conditions (\ref{eq:11.91}) and (\ref{eq:a11.9}) are
456: combined into an equation,
457: \begin{eqnarray}
458: \label{eq:a13-1}
459: P_A(H+P_D){\rm cs}(R)+
460: \bigg[P_A (H+P_D) W (H+P_D)' P_A +P_S
461: \bigg](H+P_D) {\rm cs}(\Delta)
462: =0~.
463: \end{eqnarray}
464: Note that
465: \begin{eqnarray}
466: \label{eq:a12.2}
467: P_A(H+P_D)=P_AH~.
468: \end{eqnarray}
469: The optimal $\Delta$ is immediately obtained from (\ref{eq:a13-1}):
470: \begin{eqnarray}
471: \label{eq:a13}
472: {\rm cs}(\Delta)&=&
473: -(H+P_D)'\bigg[P_A (H+P_D) W (H+P_D)' P_A +P_S
474: \bigg]^{-1}P_A(H+P_D){\rm cs}(R)~\nonumber\\
475: &=&
476: -H'\left(P_A H W H' P_A +P_S
477: \right)^{-1}P_AH{\rm cs}(R)~.
478: \end{eqnarray}
479: Thus we have obtained the explicit updating rule.
480: By iterating the procedure in this section from a starting point
481: sufficiently close
482: to the
483: optimal one,
484: the sequences $\{C(t)\}$ and $\{Y(t)\}$ converge to
485: the optimal solutions.
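For $N=2$ the updating rule (\ref{eq:a13}) reduces to a scalar Newton step,
which provides a simple check of the formula (a worked sketch, assuming that
the denominator below does not vanish; the angle $\theta$ and the number $w$
are notation introduced only here).
Write $\Delta_{21}=-\Delta_{12}=\theta$. Inserting $N=2$ into
(\ref{eq:a10}) and (\ref{eq:a14b}), the only non-zero entry of
$P_AHWH'P_A$ is
\begin{eqnarray*}
w=\frac{1}{2}\left(R_{11}+R_{22}-U_{122}-U_{211}\right)~,
\end{eqnarray*}
the only non-zero component of $P_AH{\rm cs}(R)$ is
$(R_{12}-R_{21})/\sqrt{2}$, and (\ref{eq:a13}) becomes
\begin{eqnarray*}
\theta=\frac{R_{12}-R_{21}}{2w}
=\frac{R_{12}-R_{21}}{R_{11}+R_{22}-U_{122}-U_{211}}~.
\end{eqnarray*}
The same result follows directly from (\ref{eq:a6}), which for this $\Delta$
reads
\begin{eqnarray*}
F(Z)=F(Y)+\theta\,(R_{12}-R_{21})
+\frac{\theta^2}{2}\left(U_{122}+U_{211}-R_{11}-R_{22}\right)+O(\theta^3)~.
\end{eqnarray*}
The critical point of the quadratic truncation is the $\theta$ given above,
and $w$ equals minus one half of the second derivative of $F$ with respect
to $\theta$.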
486:
487: \section{Performance (theoretical aspects)}\label{sec:per1}
488: %Our method has very desirable convergence properties.
489: The second-order-convergence is one of the main advantages of this
490: method.
Indeed, this algorithm is rigorously second-order-convergent. The
proof is
almost the same as in \cite{akuzawa8}, so we omit it in
this letter.
495:
496: Sometimes we have to
497: deal with large matrices to apply the technique here constructed.
498: Let us examine the situation.
499: The $N^2\times N^2$ matrix $P_A HW H' P_A +P_S$ is
500: a direct sum of an $N(N-1)/2\times N(N-1)/2$ matrix
501: and an $N(N+1)/2\times N(N+1)/2$ unit matrix.
502: Within the $N(N-1)/2\times N(N-1)/2$
503: block
504: the number of non-zero off-diagonal elements is
505: no more than ${N(N-1)(N-2)}$.
506: So this is a very sparse matrix when $N$ becomes large.
Of course, if $N$ becomes extremely large, our method requires a large
amount of memory. But owing to the sparseness, it remains a
practical tool for problems with considerably large $N$.
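To quantify this statement (simple arithmetic on the bound quoted above, for
$N\ge 3$; the abbreviation $m$ is introduced only here), let $m=N(N-1)/2$ be
the size of the non-trivial block. It has
\begin{eqnarray*}
m(m-1)=\frac{1}{4}N(N-1)(N-2)(N+1)
\end{eqnarray*}
off-diagonal elements, so the fraction of them that can be non-zero is at
most
\begin{eqnarray*}
\frac{N(N-1)(N-2)}{m(m-1)}=\frac{4}{N+1}~.
\end{eqnarray*}
For $N=10$ this is at most $720$ out of $1980$ elements (about $36\%$),
and for $N=100$ at most $970200$ out of $24497550$ (about $4\%$).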
510: \begin{figure}[htbp]
511: \begin{center}
512: \epsfig{file=sparse.eps, scale=0.3}
513: \caption{\small $N=10$. The black dots denote non-zero elements of $P_A H W H' P_A +P_S$. }
514: \end{center}
515: \end{figure}
516:
517: As is often the case with the Newton method, % \cite{akuzawa8}
518: the global convergence is not assured by this algorithm.
519: %So first few steps must be treated separately.
Fortunately, it is possible to cure this fault.
We present a remedy for the global instability in
Section \ref{sec:practice}.
523:
524: %it is not assured that this method
525: %converges globally.
526:
527:
528: \section{Applications to ICA}\label{sec:appl}
529: So far we have not specified the cost function beyond the assumption
530: that
531: the cost function is a sum of the form (\ref{eq:a1}).
532: Many of the cost functions for the independent component analysis
533: belong to this class.
534: \subsection{Kullback-Leibler information}
535: The Kullback-Leibler information,
536: \begin{eqnarray}
537: \label{eq:ka9}
538: \int \prod_{i=1}^Ndy_i P(y)\bigg\{\ln P(y)- \sum_{i=1}^N \ln
539: P_i(y_i)\bigg\}
540: ~,
541: \end{eqnarray}
542: is a good measure for the independence.
543: Here $P$ is the joint probability density function of $\{Y_i\}$ and
544: $P_i$ is the probability density function of the $i$-th component.
We have already restricted ourselves to the case where the Jacobian of
the transformation has absolute value one. Then
547: the minimization of the Kullback-Leibler information is equivalent to
548: the minimization of
549: \begin{eqnarray}
550: \label{eq:bb1.1}
551: -\int \prod_i dY_i P(Y)\sum_{i=1}^N\ln P_i(Y_i)
552: =\sum_{i=1}^N E(-\ln P_i(Y_i)) ~ .
553: \end{eqnarray}
Thus we can legitimately
transform
the Kullback-Leibler information
to a cost
function of the
form (\ref{eq:a1}) by
setting
561: \begin{eqnarray}
562: \label{eq:bb1}
563: f_i(\cdot)= -\ln P_i(\cdot)~.
564: \end{eqnarray}
To determine the optimal solution
we must evaluate the densities $\{P_i\}$
and their derivatives. Robust estimation
of these quantities may not be an easy task~\cite{silverman1,cox1}.
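Two short computations lie behind this subsection (a sketch; the symbols
$P_X$ and $\psi_i$ are introduced only here, and the densities $P_i$ are
treated as fixed, sufficiently smooth functions during a single step, which
is an assumption of this illustration). First, for $Y=CX$ with
$|\det C|=1$ the joint density is $P(y)=P_X(C^{-1}y)$, where $P_X$ is the
density of $X$, and therefore
\begin{eqnarray*}
\int \prod_{i=1}^N dy_i\, P(y)\ln P(y)
=\int \prod_{i=1}^N dx_i\, P_X(x)\ln P_X(x)
\end{eqnarray*}
does not depend on $C$, so that only the second term of (\ref{eq:ka9}) has
to be minimized. Secondly, with the logarithmic derivative
$\psi_i(y)=\frac{d}{dy}\ln P_i(y)$, the choice (\ref{eq:bb1}) turns
(\ref{eq:a2}) and (\ref{eq:a3}) into
\begin{eqnarray*}
[R(Y)]_{ki}=-E\left(\psi_i(Y_i)Y_k\right)~,\qquad
U_{ikl}(Y)=-E\left(\frac{d\psi_i}{dy}(Y_i)\,Y_kY_l\right)~.
\end{eqnarray*}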
569:
570: \subsection{Cumulant of fourth order}\label{subsec:cum}
571: The kurtosis of a random variable $A$ is defined by
572: \begin{eqnarray}
573: {\kappa(A)}
574: =\frac{E(A^4)}{(E(A^2))^2}-3~.
575: \end{eqnarray}
576: The kurtosis is related to the cumulant of the fourth order,
577: \begin{eqnarray}
578: % \nonumber
579: Cum^{(4)}(A)=E(A^4)-3(E(A^2))^2~,
580: \end{eqnarray}
581: by
582: \begin{eqnarray}%\nonumber
583: {\kappa(A)}=\frac{ Cum^{(4)}(A)}{(E(A^2))^2}~.
584: \end{eqnarray}
585: For prewhitened data the kurtosis equals the cumulant of the fourth
586: order.
As is well known~\cite{hyvarinen1,akuzawa8},
independent components can be extracted in many cases
by maximizing the absolute values of the kurtoses. Our method
is applicable
by setting
592: \begin{eqnarray}
593: \label{eq:kur1}
594: f_i=-\kappa^2
595: \end{eqnarray}
596: for all $i$.
597: If it is known a priori that all the sources $\{Y_i^*\}$ have positive
598: kurtoses, we may use the kurtosis itself and set
599: \begin{eqnarray}
600: \label{eq:kur2}
601: f_i=-\kappa~.
602: \end{eqnarray}
For these cost functions, $R$, $\{U_i\}$, and the other
quantities needed for determining each step are calculated easily
from the observed data.
Thus applying our method to these cost functions is a highly practical and
reasonable choice.
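To make this explicit for the choice (\ref{eq:kur2}) (one possible reading,
spelled out as a sketch; the sample size $n$ and the samples $y^{(s)}$ are
notation introduced only here): for prewhitened data an orthogonal update
preserves $E(Y_i^2)=1$, so that $\kappa(Y_i)=E(Y_i^4)-3$ and the cost takes
the form (\ref{eq:a1}) with
\begin{eqnarray*}
f_i(y)=3-y^4~.
\end{eqnarray*}
The definitions (\ref{eq:a2}) and (\ref{eq:a3}) then give
\begin{eqnarray*}
[R(Y)]_{ki}=-4E(Y_i^3Y_k)~,\qquad U_{ikl}(Y)=-12E(Y_i^2Y_kY_l)~,
\end{eqnarray*}
and with $n$ samples $y^{(1)},\cdots,y^{(n)}$ of $Y$ the expectations can be
replaced by sample averages, for example
\begin{eqnarray*}
[R(Y)]_{ki}\approx-\frac{4}{n}\sum_{s=1}^{n}\left(y^{(s)}_i\right)^3y^{(s)}_k~.
\end{eqnarray*}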
608: \section{Levenberg-Marquardt-type variation and performance in practice}
609: \label{sec:practice}
610: The pure-Newton updating rule (\ref{eq:a13}) has a
611: poor global convergence property.
612: This drawback is remedied by
613: the Levenberg-Marquardt-type variation\cite{numerical1}.
First, we modify (\ref{eq:a13})
as
616: \begin{eqnarray}
617: \label{eq:lev1}
618: {\rm cs}(\Delta)&=&
619: -H'\left(P_A H W H' P_A +P_S+\lambda I_{N^2}
620: \right)^{-1}P_AH{\rm cs}(R)~.
621: \end{eqnarray}
622: The initial value $\lambda_0$ for $\lambda$ is fixed at some positive value.
623: We also fix a real number $\alpha(>1)$.
624: (In the following example we set $\lambda_0=50$ and $\alpha=10$.)
625: Then the procedure at time $t$ is as follows:
626: \renewcommand{\labelenumi}{\roman{enumi})}
627: \begin{enumerate}
628: \item
629: Calculate $\Delta$ by (\ref{eq:lev1}).
630: \item
631: If $F({\rm e}^{\Delta}Y(t))$ is larger than $F(Y(t))$,
632: multiply $\lambda$
633: by $\alpha$ and go back to i).
634: \item
635: Otherwise,
636: multiply $\lambda$ by $1/\alpha$ and proceed to the next time step $t+1$.
637: \end{enumerate}
The other parts of the algorithm are exactly the same as in the
pure-Newton version in Section \ref{sec:mult}.
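The effect of $\lambda$ is most easily seen for $N=2$ (a worked sketch; the
angle $\theta$ and the number $w$ are notation introduced only here).
With $\Delta_{21}=-\Delta_{12}=\theta$, the only non-zero entry of
$P_AHWH'P_A$ is $w=\frac{1}{2}(R_{11}+R_{22}-U_{122}-U_{211})$, and
(\ref{eq:lev1}) reduces to
\begin{eqnarray*}
\theta(\lambda)=\frac{R_{12}-R_{21}}{2(w+\lambda)}~.
\end{eqnarray*}
For $\lambda\rightarrow 0$ this is the pure-Newton step of (\ref{eq:a13}),
while for large $\lambda$ it becomes a short step,
$\theta\simeq(R_{12}-R_{21})/(2\lambda)$, proportional to the first
derivative $R_{12}-R_{21}$ of $F$ with respect to $\theta$
(cf. (\ref{eq:a6})).
Read literally, $\theta(\lambda)$ then has the same sign as this derivative,
so the first-order change $\theta\,(R_{12}-R_{21})$ of $F$ is non-negative.
When $F$ is to be decreased, as the test in ii) assumes, it is therefore
advisable to check, for the sign conventions of one's own implementation,
the sign with which the damping term enters the anti-symmetric block.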
640:
641:
642: Let us examine the real performance of
643: our method under this setting.
644: For the cost function we choose the kurtosis as in Subsection
645: \ref{subsec:cum}.
The source signals are three synthesizer-generated
wav files (Fig.~\ref{fig:2}).
648: \begin{figure}[htbp]
649: \begin{center}
650: %\epsfig{file=sample.eps,width=15cm,height=3cm}
651: \epsfig{file=sample.eps,scale=0.4}
652: \caption{Sample data generated by a synthesizer (by courtesy of
653: N.Murata).}
654: \label{fig:2}
655: \end{center}
656: \end{figure}
Pseudo-observed data are generated by mixing the sources with
a random matrix,
659: \begin{eqnarray}
660: \label{eq:tr1}
661: A=I_3+S,
662: \end{eqnarray}
663: where each element of $S$ is distributed uniformly on $(-1/2,1/2)$.
The residual crosstalk of the signals
demixed by our method
is
$1.29\%$ on average. One hundred iterations
of this problem take about $122$ seconds (CPU time) on
our workstation.
For reference, we have also solved the same demixing problem
with FastICA~\cite{fastica1}.
In this case the residual crosstalk
is
$1.36\%$ on average, and
one hundred
iterations take about $156$ seconds on
the same workstation.
Since the author's familiarity with the FastICA package is limited,
this comparison should be taken as indicative only.
It does suggest, however,
that our method also performs well in practice.
682:
683: \section{Summary}\label{sec:summ}
684: We have constructed a new algorithm for finding a
685: critical point
686: of broad classes of cost functions %defined
687: on the orthogonal groups. This method is second-order-convergent
688: since it is in essence the Newton method.
689: The method here constructed is an extension (or a restriction) of
690: the multiplicative updating method
691: developed in our
692: previous work\cite{akuzawa8}. The constraint for $\Delta$ from the nature
693: of the orthogonal groups makes the
694: problem a little complicated. We have, however, obtained a rigorous and
695: explicit updating rule.
We have also constructed
a Levenberg-Marquardt-type variation, which is suitable for
practical purposes.
699: The global instability inherent in the Newton method is remedied in
700: this version.
701: % the Kullback-Leibler information, the kurtosis, {\it etc.},
702: % suitable for
703: %the
704: %purpose.
705: Since our discussion does not depend on the
706: detail of the cost function,
707: this method is applicable to many concrete problems.
708: The relatively mild assumption (\ref{eq:a1}) on the form of the cost
709: function, however, implies that
710: our algorithm is especially
711: suitable for
712: the ICA.
713: %we can choose arbitrary functions for
714: %$\{f_i\}$.
715: % readily our method
716: %by
717: %prewhitening data.
718: %The potential of our method
Its practical utility for the ICA
has been illustrated here by a numerical simulation.
721:
722:
723: %Let us conclude the a
To summarize,
our algorithm has theoretical virtues such as
rigorous second-order convergence and an explicit formulation.
It also provides,
in practice,
a fast and powerful tool for the
ICA and many other problems.
733:
734:
735:
736: %Since it does not require prewhitening,
737:
738: \section*{Acknowledgments}
739: The author would like to thank Noboru Murata and Shun-ichi Amari for
740: valuable
741: discussions and comments.
742: %\bibliography{mybib}
743: \begin{thebibliography}{6}
744:
745: \bibitem[A.Hyv\"arinen,1997]{hyvarinen1}
746: A.Hyv\"arinen (1997).
747: \newblock A Fast Fixed-Point Algorithm for Independent Component Analysis.
748: \newblock {\em Neural Computation\/}, {\em 9\/}, 1483--1492.
749:
\bibitem[B.W.Silverman,1986]{silverman1}
B.W.Silverman (1986).
752: \newblock {\em Density Estimation for Statistics and Data Analysis\/}.
753: \newblock London: Chapman \& Hall.
754:
755: \bibitem[D.Cox,1985]{cox1}
D.Cox (1985).
757: \newblock A Penalty Method for Nonparametric Estimation of the Logarithmic
758: Derivative of a Density Function.
759: \newblock {\em Ann.Inst.Statist.Math.\/}, {\em 37\/}, 271--288.
760:
761: \bibitem[Hurri {\em et~al.\/},1998]{fastica1}
Hurri, J., G\"avert, H., S\"arel\"a, J., \& Hyv\"arinen, A. (1998).
763: \newblock FastICA package for MATLAB.
764: \newblock http://www.cis.hut.fi/projects/ica/fastica/.
765:
766: \bibitem[T.Akuzawa \& N.Murata,1999]{akuzawa8}
767: T.Akuzawa \& N.Murata (1999).
\newblock Multiplicative Nonholonomic/Newton-like Algorithm.
769: \newblock {\em preprint \\(available from
770: http://www.islab.brain.riken.go.jp/\~{}akuzawa/)\/}.
771:
772: \bibitem[W.H.Press {\em et~al.\/},1988]{numerical1}
773: W.H.Press, B.P.Flannery, S.A.Teukolsky, \& W.T.Vetterling (1988).
774: \newblock {\em Numerical Recipes in C\/}.
775: \newblock Cambridge: Cambridge U.P.
776:
777: \end{thebibliography}
778:
779: \end{document}
780: