0002:cs0002006/cs0002006

1: \documentclass[a4paper,12pt]{article}

2: \usepackage{latexsym}

3: \oddsidemargin=0cm

4: \evensidemargin=0cm

5: \textwidth=16cm

6: \paperwidth=21cm

7: \textwidth=18.6cm

8: %\textheight=24.7cm

9: \oddsidemargin=-0.5in

10: \evensidemargin=-0.5in

11: %\topmargin=-0.6in

12: \usepackage{amsmath,amstext,amsfonts}

13: \def\bm#1{\mbox{\boldmath $#1$}}

14: \def\teigi{\stackrel{\rm def}{=}}

15: \def\hatena{\stackrel{\boldmath ?}{=}}

16: %\bibliographystyle{mybstfeb96}

17: %\bibliographystyle{mybst1996}

18: %\bibliographystyle{bstforNEu}

19: %\bibliographystyle{BSTforNEU}

20: %\bibliographystyle{apalike}

21: %\bibliographystyle{apahack}

22: %%

23: \makeatletter

24:   \renewcommand{\theequation}{%

25:      \thesection.\arabic{equation}}

26:   \@addtoreset{equation}{section}

27: \makeatother

28: \tolerance=6000

29:

30:

31:

32:

33: \title{Multiplicative Nonholonomic/Newton-like Algorithm}

34: \author{Toshinao {\sc

35:     Akuzawa}\thanks{akuzawa@islab.brain.riken.go.jp}\vspace{0.3cm}\\

36: and\vspace{0.3cm}\\

37: Noboru {\sc Murata}

38: \vspace{0.5cm}\\

39: Brain Science Institute \\

40: {\it RIKEN}\\

41: %%{\small(The Institute of Physical and Chemical Research)}\\

42: {\small 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan}}

43: \date{{\it October 19, 1999}}

44:

45: \begin{document}

46: \maketitle

47: \abstract{We construct new algorithms  from scratch,

48: which use the  fourth order cumulant  of stochastic

49: variables for the cost function.

50: The

51: multiplicative updating rule

52: here constructed  is natural from the

53: homogeneous nature of the Lie group

54: and has numerous merits for

55: the   rigorous treatment of the

56: dynamics.

57: As one consequence, the second order convergence is shown.

58: For the cost function,

59: functions invariant under

60: the componentwise scaling

61: are chosen.

62: By

63: identifying

64: points  which can be transformed to each other by the scaling,

65:  we assume that the dynamics is in a coset space.

66: In our method, a point can move in any direction in this coset.

67: Thus,

68:  no  prewhitening is  required.

69:  }

70: \section{Introduction}

71: \label{intro}

72: Suppose that $N$-dimensional stochastic

73: variables $\{X_i|1\le i \le N\}$ are observed.

74: Independent component analysis (ICA) seeks a map

75:  $X \mapsto Y$ such that the components of $Y$ are mutually independent.

76: In this letter  we restrict ourselves to

77:  the linear independent component analysis.

78: There

79: we want to find a linear transformation $C:{\bf X}=(X_1,\cdots,X_N)'\mapsto

80: {\bf Y}=(Y_1,\cdots,Y_N)'=C{\bf X}$ which

81:  minimizes some cost function that measures the independence.

82: Hereafter we  denote transposition by the superscript $\prime$ and

83: the complex conjugate by $\dagger$.

84:

85: There can be  many candidates for the cost function.

86: For example

87: the Kullback-Leibler information

88: is a good measure for the independence.

89: In this case

90: the problem is translated to

91:   the minimization of

92: $ -\sum_{i=1}^N\int dy_i P_i(y_i)\ln  P_i(y_i)$, where

93: $P_i$ is the probability density function of the $i$-th component.

94: It is obvious that we must evaluate $P_i$'s to find the optimal

95: solution. Robust estimation

96: of the probability density functions  is not an easy  task,

97: and even when it is possible it may be computationally expensive.

98:

99: An alternative idea is to make use of  the cumulant of the fourth

100: order, or the kurtosis\cite{hyvarinen1}, which we will adopt in this letter.

101: The fourth order cumulant vanishes for

102: the  normal distribution.  So, this cost function is robust against

103: Gaussian random noise.

104: We will construct algorithms where a matrix, which specifies the

105: linear transformation, is updated by the left-multiplication of a

106: matrix $D={\rm e}^{\Delta}$.

107: This expression implies that $D$ belongs to

108: $GL(N,{\boldmath R})$ (more accurately,

109: the component of $GL(N,{\boldmath R})$ connected to the unit element),

110: which ensures the

111: conservation of the rank.

112: The specification of $D$ by the coordinate $\Delta$

113: has many advantages

114: since it has a compatibility with   the homogeneous nature of the Lie group.

115:

116: There are variations for the form of the cost

117: function. We will show our definitions in the following two sections, which

118: are chosen to be invariant under componentwise scaling.

119: This invariance is crucial for

120: a rigorous treatment of the convergence properties.

121: Moreover, this invariance allows us to

122: identify

123: points  in $GL(N,{\boldmath R})$ which are transformed into each

124: other by the

125: scaling.

126: Then  we can legitimately restrict   the dynamics to a coset space

127: which is introduced by this identification.

128:

129: Under these settings, we determine $\Delta$ by using the Newton method

130: for the second order

131: expansion of the cost function with respect to $\{\Delta_{ij}\}$. It

132: is assumed

133: that the diagonal elements of $\Delta$ are zeros,

134: which does not impose any restrictions.

135: That is, a point can move in  any direction in this coset by a

136: left-multiplication of ${\rm e}^{\Delta}$.

137: Thus

138: it is not necessary for our  method to prewhiten the data.

139: It is also not required

140: that the

141: optimal solution is  the maximum or the minimum of the

142: cost function. Indeed,  the sole requirement is that

143: the optimal point  is a saddle point of the cost function

144: since our method

145: is in principle the Newton method.

146: These are great advantages of our method.

147:

148: %This property  is unique to our method  and

149: %that does not causes any serious problem if the starting point is

150: %close enough.

151:

152:

153:

154: Our strategy is as follows.

155: As an initial condition we set $C_0$.

156: For  $t>0~(t\in{\bf N}^{+})$,

157: we introduce an  $N\times N$ matrix $\Delta_t$ and

158: denote $C_{t}$ as  $C_{t}={\rm e}^{\Delta_{t}}C_{t-1}$.

159: Next, we evaluate the cost function at $C_{t}$

160: by using the expansion around $C_{t-1}$

161: with respect to the elements of

162: $\Delta_{t}$ up to the second order.

163: Then    $\Delta_t$ is chosen as a saddle point of

164: this second order

165: expansion.

166: We iteratively follow these procedures until we obtain a satisfactory

167: solution.

168:

169:

170: This letter is organized as follows.

171: In Section \ref{kurt1} the main part of our algorithm is  constructed,

172: where the cost function  is essentially identical to the sum of

173: kurtoses.

174: We adopt the square of the kurtoses for the cost function

175: in Section \ref{kurt2}.

176: Explicit expressions for the optimal

177: $\Delta$ (up to the second order)

178: are obtained both in Sections \ref{kurt1} and \ref{kurt2}.

179:  Section \ref{iteration} is a short section  where  we show how

180: each updating step is combined to obtain the optimal $C$.

181: In Section \ref{secconv} the convergence property of our algorithm is

182: discussed. Section \ref{disc} contains conclusions and discussions.

183: \section{Multiplicative update algorithm}

184: \label{kurt1}

185: \subsection{Expansion of the cost function }

186: Let us start by defining the cost function:

187: \begin{eqnarray}

188:   \label{eq:e1}

189: &&f(C,X)=\sum_i f_i(C,X)~,

190:   \end{eqnarray}

191: where $f_i$'s are the fourth order moments

192: of components

193: divided by the square of their variances,

194: \begin{eqnarray}

195:   \label{eq:e1.1}

196: &&  f_i(C,X)=\frac{E((CX)_i^4)}{E((CX)_i^2)^2}~.

197: \end{eqnarray}

198: In this letter we denote by $E(A)$ the expectation  of

199: $A$.

200: Obviously

201: the cost function $f$ coincides with the sum of kurtoses of all the components

202: up to  an additive constant.
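To make the constant explicit (a short check, assuming, as seems implicit here, that each component $Y_i=(CX)_i$ has zero mean, so that its kurtosis is $E(Y_i^4)/E(Y_i^2)^2-3$):
\begin{eqnarray*}
&&f(C,X)=\sum_{i=1}^N\left[\frac{E(Y_i^4)}{E(Y_i^2)^2}-3\right]+3N~.
\end{eqnarray*}
For a zero-mean Gaussian component $E(Y_i^4)=3E(Y_i^2)^2$, so such a component contributes exactly $3$ to $f$.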

203: We set $D={\rm e}^{\Delta}$ and

204:  expand $f(D,Y)$  %(\ref{eq:e1})

205:  in terms of the elements of $\Delta$.

206: %and $K={\rm e}^{-\Delta}-1$,

207: For example, the fourth and second moments are expanded as follows:

208:  \begin{eqnarray}

209:   \label{eq:e2}

210: E((DY)_i^4)

211: &=&

212: E(Y_i^4)+4\sum_{p}(\Delta_{ip}+(\frac{\Delta^2}{2})_{ip})E(Y_i^3Y_p)

213: +6\sum_{p,q}\Delta_{ip}\Delta_{iq}E(Y_i^2Y_pY_q)+O(\Delta^3)~\nonumber\\

214: %\end{eqnarray}

215: %\begin{eqnarray}

216: %  \label{eq:3}

217: E((DY)_i^2)

218: &=&

219: E(Y_i^2)+2\sum_{p}(\Delta_{ip}+(\frac{\Delta^2}{2})_{ip})E(Y_iY_p)

220: +\sum_{p,q}\Delta_{ip}\Delta_{iq}E(Y_pY_q)+O(\Delta^3)~.

221: \end{eqnarray}

222: Hereafter we denote by

223:  $O(\Delta^k)$   polynomials in the matrix elements of $\Delta$ which

224: do not contain terms of degree less than $k$.

225: For  brevity's sake

226: we introduce the following notations:

227: \begin{eqnarray}

228:   \label{eq:e3.1}

229: &&  \sigma_i^{(k)}=|E(Y_i^k)|^{1/k}~,\\

230: &&  R^{(k)}_{pi}=\frac{E(Y_i^k Y_p)}{(\sigma^{(2)}_i)^{k+1}}~,\\

231: &&  U^{(k,i)}_{pq}=\frac{E(Y_i^kY_p Y_q)}{(\sigma^{(2)}_i)^{k+2}}~,

232: \end{eqnarray}

233: and

234: \begin{eqnarray}

235:   \label{eq:e3.2}

236:   && \kappa_i={(\sigma^{(4)}_i)^4}/{(\sigma^{(2)}_i)^4}~.

237: \end{eqnarray}

238: Using the quantities defined above we can  show that  the

239: cost function is expanded as

240: \begin{eqnarray}

241:   \label{eq:e4}

242:  f_i(D,Y)

243: &=&\bigg[

244: \kappa_i+4\big[(\Delta+\frac{\Delta^2}{2})R^{(3)}\big]_{ii}

245: +6\big[

246: \Delta U^{(2,i)}\Delta'

247: \big]_{ii}

248: +O(\Delta^3)

249: \bigg]\nonumber\\

250: &&~~\times

251: \bigg[

252: 1-4\big[(\Delta+\frac{\Delta^2}{2})R^{(1)}\big]_{ii}

253: -2\big[

254: \Delta U^{(0,i)}\Delta'

255: \big]_{ii}

256: +12\big[

257: \Delta R^{(1)}

258: \big]_{ii}^2

259: +O(\Delta^3)

260: \bigg]\nonumber\\

261: &=&\kappa_i - 4\big[(\Delta+\frac{\Delta^2}{2})(\kappa_i

262: R^{(1)}-R^{(3)})\big]_{ii}

263: +2\big[

264: \Delta (3U^{(2,i)}-\kappa_i  U^{(0,i)})\Delta'

265: \big]_{ii}\nonumber\\

266: &&~~

267: +12\kappa_i\big[

268: \Delta R^{(1)}

269: \big]_{ii}^2

270: -16\big[

271: \Delta R^{(1)}

272: \big]_{ii}\big[

273: \Delta R^{(3)}

274: \big]_{ii}+O(\Delta^3)~

275: \end{eqnarray}

276: by  straightforward calculations.
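The only step in (\ref{eq:e4}) that is not a direct substitution is the inversion of the squared variance. As a sketch, write $a=2\big[(\Delta+\frac{\Delta^2}{2})R^{(1)}\big]_{ii}+\big[\Delta U^{(0,i)}\Delta'\big]_{ii}$ (the symbol $a$ is introduced here only for illustration), so that $E((DY)_i^2)/(\sigma^{(2)}_i)^2=1+a+O(\Delta^3)$. Then
\begin{eqnarray*}
(1+a)^{-2}&=&1-2a+3a^2+O(a^3)\\
&=&1-4\big[(\Delta+\frac{\Delta^2}{2})R^{(1)}\big]_{ii}
-2\big[\Delta U^{(0,i)}\Delta'\big]_{ii}
+12\big[\Delta R^{(1)}\big]_{ii}^2+O(\Delta^3)~,
\end{eqnarray*}
which is the second bracket in (\ref{eq:e4}); here $3a^2=12\big[\Delta R^{(1)}\big]_{ii}^2+O(\Delta^3)$ was used.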

277: Next, we evaluate  partial derivatives of the cost function

278: by the matrix elements of $\Delta$.

279:  %We  need only terms up to $O(\Delta^2)$.

280:  Partially differentiating  (\ref{eq:e4}),

281: %It follows that the partial derivative of $f(C,Y)$ becomes

282: we get an expression,

283: \begin{eqnarray}

284:   \label{eq:e5}

285: &&  \frac{\partial f({\rm e}^{\Delta},Y)}{\partial \Delta_{kl}}=

286: -4\big[K-R^{(3)}\big]_{lk}

287: -2\big[(K-R^{(3)})\Delta+\Delta(K-R^{(3)})\big]_{lk}\nonumber\\

288: &&+4\big[

289:  (3U^{(2,k)}-\kappa_k  U^{(0,k)})\Delta'

290: \big]_{lk}

291: +24K_{lk}\big[\Delta R^{(1)}

292: \big]_{kk}

293: -16R^{(1)}_{lk}\big[\Delta R^{(3)}

294: \big]_{kk}

295: -16 R^{(3)}_{lk}\big[\Delta R^{(1)}

296: \big]_{kk}\nonumber\\

297: &&+O(\Delta^2)~,

298: \end{eqnarray}

299: where $K$ is an $N\times N$ matrix defined by

300: \begin{eqnarray}

301:   \label{eq:e5.9}

302: &&K_{pq}=\kappa_q  R^{(1)}_{pq}~.

303: \end{eqnarray}

304: We want to determine $\Delta$  for which

305:  the partial derivatives

306: with respect to  $\Delta_{kl}~(k\ne

307:  l)$

308: of the cost function

309:  vanish under the condition that

310:  $\Delta_{ii}=0$ for $1\le i \le N$.

311: We neglect $O(\Delta^3)$ terms in the cost function.

312: Thus the  right-hand side of (\ref{eq:e5}) is

313: regarded as a polynomial of

314: % the elements of $\Delta$

315:  $\{\Delta_{kl}\}$

316: of at most first order and it is  always possible

317: in principle to

318:  determine $\Delta$ which satisfies the above condition.

319: % for which

320: % (\ref{eq:e5}) vanishes.

321: It is, at the same time, not easy  to  describe  the problem

322: in a  form which is valid

323: for

324: arbitrary $N$.

325: In the following subsection we will introduce  a transparent and unified

326: method for handling the partial derivatives of $f$.

327: %Before  this subsection by

328: We leave this subsection by

329: introducing $N\times N$ matrices

330: \begin{eqnarray}

331:   \label{eq:e6}

332: &&  V^{(i)}=3U^{(2,i)}-\kappa_i  U^{(0,i)}~

333: \end{eqnarray}

334: and

335: \begin{eqnarray}

336:   \label{eq:e6.1}

337: % &&  Q=R^{(1)}-R^{(3)}~.

338:  &&  Q=K-R^{(3)}~

339: \end{eqnarray}

340: for later convenience.

341: \subsection{Expression by tensor product and determination of $\Delta$}

342: The expression (\ref{eq:e5}) is quite  complicated and   not

343: convenient for our purpose,

344: `` determine $\Delta$, where

345: all the partial derivatives  vanish''.

346: Fortunately by  mapping  the relations between elements of

347: $N\times N$ matrices  to those of   $N^2\times

348: N^2$ matrices, we can handle the problem transparently.

349: %,  the problem can be rewritten in a general form.

350: Some preparations

351: are needed.

352: First, let us introduce a map $\rm cs$:

353: \begin{eqnarray}

354:   \label{eq:a14}

355:   {\rm Mat}(N,{\boldmath F}) &\rightarrow& {\boldmath F}^{N^2}\nonumber\\

356: A=\left(

357:   \begin{array}{cccc}

358:  A_{11}& A_{12}&\cdots &A_{1N}\\

359: A_{21} &\multicolumn{3}{c}{\dotfill}\\

360: \multicolumn{4}{c}{\dotfill}\\

361: A_{N1} &\multicolumn{2}{c}{\dotfill}&A_{NN}

362:   \end{array}

363: \right) &\mapsto&

364: {\rm cs}(A)=

365: (A_{11}~ A_{21}~ \cdots~ A_{N1}~ A_{12}~ A_{22}~\cdots~ A_{NN})'~,\nonumber\\

366: \end{eqnarray}

367: where $\boldmath F$ is an unspecified  field.

368: We also introduce

369: two useful operators $T$ and $P$.

370: The ``intertwiner'' $T$ is  an $N^2\times N^2$ matrix

371: defined by

372: \begin{eqnarray}

373:   \label{eq:a15}

374:   {\rm cs}(A')=T{\rm cs}(A) ~\mbox{\rm for~} A\in  {\rm Mat}(N,{\boldmath F})~.

375: \end{eqnarray}

376: The projection  operator $P$,

377: \begin{eqnarray}

378:   \label{eq:a18}

379: P&=&{\rm diag}(p_1,\cdots,p_{N^2})~,\nonumber\\

380: &&\left\{

381: \begin{array}{ll}

382:  p_k=1 ~~~\mbox{\rm for}~~ k=N(i-1)+i,1\le i\le N~\\

383:  p_k=0~~~~ \mbox{\rm otherwise}~,

384: \end{array}

385: \right.

386: \end{eqnarray}

387:  is used to extract the ``diagonal''

388: elements of a matrix from its image by $\rm cs$.
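As a small illustration of these definitions (a sketch for $N=2$, worked out here and not taken from the original text), the map ${\rm cs}$ gives ${\rm cs}(A)=(A_{11}~A_{21}~A_{12}~A_{22})'$, and the two operators become
\begin{eqnarray*}
&&T=\left(\begin{array}{cccc}1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1\end{array}\right)~,~~~
P=\left(\begin{array}{cccc}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&1\end{array}\right)~,
\end{eqnarray*}
so that $T{\rm cs}(A)=(A_{11}~A_{12}~A_{21}~A_{22})'={\rm cs}(A')$ and $P{\rm cs}(A)=(A_{11}~0~0~A_{22})'$. In particular, $T^2=I_{N^2}$.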

389:

390: On this setting we can rewrite (\ref{eq:e5}) as

391: \begin{eqnarray}

392:   \label{eq:e7}

393:   \frac{\partial f({\rm e}^{\Delta},Y)}{\partial \Delta_{kl}}&=&

394: \bigg[ -4{\rm cs}(Q)

395: -2\big[I_N\otimes Q+T(I_N\otimes Q')T\big]{\rm cs}(\Delta)

396: +4

397: \big\{\bigoplus_{i=1}^N V^{(i)}\big\}

398: {\rm cs}(\Delta')

399: \nonumber\\&&

400: +

401: \bigg\{24(I_N \otimes K)P(I\otimes R^{(1)})'

402: -16 ( I_N \otimes R^{(1)})P(I\otimes R^{(3)})'\nonumber\\

403: &&-16 (I_N\otimes R^{(3)})P(I\otimes R^{(1)})'

404: \bigg\}

405: {\rm cs}(\Delta')

406: \bigg]_{l+N(k-1)}~,

407: \end{eqnarray}

408: where $I_N$ is the $N\times N$ unit matrix and

409: \begin{eqnarray}

410:   \label{eq:tiu1}

411: \bigoplus_{i=1}^N V^{(i)}=

412: \left(

413:   \begin{array}{lllll}

414: V^{(1)} & 0 & \multicolumn{2}{c}{\cdots\cdots} & 0\\

415: 0& V^{(2)} & 0 & \multicolumn{2}{c}{\cdots\cdots}\\

416:  \multicolumn{5}{c}{\dotfill}\\

417:  \multicolumn{5}{c}{\dotfill}\\

418:  0& \multicolumn{2}{c}{\cdots\cdots}& V^{(N-1)}& 0   \\

419: 0& 0& \multicolumn{2}{c}{\cdots\cdots}& V^{(N)}   \\

420:    \end{array}

421: \right)~.

422: \end{eqnarray}

423: %where $E_N$ is an $N\times N$ matrix of ones.

424: We make use of the following fact:\\

425: For $X\in {\rm Mat}(N,{\boldmath F})$

426: \begin{eqnarray}

427:   \label{eq:e8f}

428:   T(I_N\otimes X)T=X\otimes I_N~.

429: \end{eqnarray}

430: See  Appendix \ref{app:prf} for the proof of  (\ref{eq:e8f}).

431: Then (\ref{eq:e7}) becomes

432: \begin{eqnarray}

433:   \label{eq:e77}

434: &&  \frac{\partial f({\rm e}^{\Delta},Y)}{\partial \Delta_{kl}}=

435:  -4[{\rm cs}(Q)]_{l+N(k-1)}

436: +\big[

437: W

438: {\rm cs}(\Delta)

439: \big]_{l+N(k-1)}~,\nonumber\\

440: \end{eqnarray}

441: where

442: \begin{eqnarray}

443:   \label{eq:e8}

444: W&=&

445: -2\big(I_N\otimes Q+Q'\otimes I_N\big)

446: +4

447: \big\{\bigoplus_{i=1}^N V^{(i)}\big\}

448: T

449: +

450: \bigg[24(I_N\otimes K)P(I\otimes R^{(1)})'

451: \nonumber\\&&

452: -16 (I_N \otimes R^{(1)})P(I\otimes R^{(3)})'

453: -16 (I_N \otimes R^{(3)})P(I\otimes R^{(1)})'

454: \bigg]

455: T~.\nonumber\\

456: \end{eqnarray}

457: Now let us determine $\Delta$.

458: Remember that we are going along the spirit of the Newton method.

459: Thus we want to find $\Delta$ which satisfies

460: the  conditions

461: \begin{eqnarray}

462:   \label{eq:e10}

463:     \frac{\partial f({\rm e}^{\Delta},Y)}{\partial

464:     \Delta_{kl}}=0+O(\Delta^2)~~

465: \mbox{\rm for } 1\le k,l \le N,~k\ne l

466: \end{eqnarray}

467: and

468: \begin{eqnarray}

469:   \label{eq:e11}

470:   \Delta_{kk}=0 ~~\mbox{\rm for}~~ 1\le k\le N~.

471: \end{eqnarray}

472: The conditions (\ref{eq:e11}) make the problem rather complicated.

473: Fortunately,

474: by using $P$

475: we can combine %%%transform

476: the conditions  (\ref{eq:e10}) and  (\ref{eq:e11}) into

477:   a matrix equation :

478: \begin{eqnarray}

479:   \label{eq:e19}

480: \Big[(I_{N^2}-P)

481: W(I_{N^2}-P)

482: +P

483: \Big]

484: {\rm cs}(\Delta)-4(I_{N^2}-P){\rm cs}(Q)=0~.

485: \end{eqnarray}

486: Immediately it follows that

487: \begin{eqnarray}

488:   \label{eq:e20}

489: {\rm cs}(\Delta)=4

490: \Big[(I_{N^2}-P)

491: W

492: (I_{N^2}-P)

493: +P

494: \Big]^{-1}

495: (I_{N^2}-P){\rm cs}(Q)~.

496: \end{eqnarray}

497: Thus we have obtained $\Delta$, which specifies a saddle point of

498: the  expansion of

499: $f({\rm e}^{\Delta},Y)$ up to the second order.

500: Note that the quantities on the right-hand side of (\ref{eq:e20}) are easily

501: estimated

502: from the

503: observed data.

504: So, an updating is determined by (\ref{eq:e20}) without any

505: ambiguities.
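To see what (\ref{eq:e20}) amounts to in the smallest case, consider $N=2$ (an illustrative sketch; $W_{ab}$ denotes the $(a,b)$ element of the $4\times 4$ matrix $W$). Here $P={\rm diag}(1,0,0,1)$, so the first and fourth rows of (\ref{eq:e19}) give $\Delta_{11}=\Delta_{22}=0$, and the remaining two rows reduce to the $2\times 2$ linear system
\begin{eqnarray*}
&&\left(\begin{array}{cc}W_{22}&W_{23}\\W_{32}&W_{33}\end{array}\right)
\left(\begin{array}{c}\Delta_{21}\\ \Delta_{12}\end{array}\right)
=4\left(\begin{array}{c}Q_{21}\\ Q_{12}\end{array}\right)~,
\end{eqnarray*}
which has a unique solution provided $W_{22}W_{33}-W_{23}W_{32}\ne 0$.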

506:

507: \section{Case $\rm I\!I$:  square of kurtosis}

508: %~(kurtosis)${\bm{}^2}$}

509: \label{kurt2}

510: Obviously, points where kurtosis

511: vanishes do not play any special role  for

512: the cost function  $f$ in Section \ref{kurt1}. The optimal solution, however,

513: contains components with zero kurtoses

514: when the number of the sources is less than that of the observation channels.

515: Thus,

516: in this section  we treat  a slightly different

517: % algorithm, where

518:   cost function, which  is the sum,

519: \begin{eqnarray}

520:   \label{eq:se1}

521: &&{\bm f}(C,X)=\sum_i {\bm f}_i(C,X)~,

522:   \end{eqnarray}

523: of the square of the kurtoses,

524: \begin{eqnarray}

525:   \label{eq:se1.1}

526: &&  {\bm f}_i(C,X)=\left[\frac{E((CX)_i^4)}{E((CX)_i^2)^2}-3\right]^2~.

527: \end{eqnarray}

528: %Computations needed for evaluating

529: As in the last section, we want to know the saddle point

530: $D={\rm  e}^{\Delta}$ of

531: the expansion of ${\bm

532:   f_i}(D,Y)$ in

533: terms of $\{\Delta_{ij}\}$ up to the second order.

534: We do not describe the details of the calculations in this section,

535: which are

536:  carried out %accomplished

537: in almost the same way as in Section \ref{kurt1}.

538: First, the expansion of ${\bm

539:   f_i}(D,Y)$ is evaluated as

540: \begin{eqnarray}

541:   \label{eq:se4}

542:  {\bm f}_i(D,Y)

543: &=&(\kappa_i-3)^2 - 8\big[(\Delta+\frac{\Delta^2}{2})(

544: R^{(1)}\kappa_i-R^{(3)})\big]_{ii}(\kappa_i-3)\nonumber\\

545: &&+4\big[

546: \Delta (3U^{(2,i)}-\kappa_i  U^{(0,i)})\Delta'

547: \big]_{ii}(\kappa_i-3)

548: +16\big[

549: \Delta (R^{(1)}\kappa_i-R^{(3)})

550: \big]_{ii}^2

551: \nonumber\\

552: &&

553: +24(\kappa_i-3)\kappa_i\big[

554: \Delta R^{(1)}

555: \big]_{ii}^2

556: -32(\kappa_i-3)\big[

557: \Delta R^{(1)}

558: \big]_{ii}\big[

559: \Delta R^{(3)}

560: \big]_{ii}+O(\Delta^3)~.

561: \end{eqnarray}

562: Next, we introduce  $N\times N$ matrices $\bm K$, $\{{\bm

563:   V}^{(i)}|1\le i\le N\}$,

564: $\bm S$, and $\bm Q$

565: defined respectively by

566: \begin{eqnarray}

567:   \label{eq:se5.9}

568: &&{\bm K}_{pq}=  2R^{(1)}_{pq}(\kappa_q-3)\kappa_q~,

569: \end{eqnarray}

570: \begin{eqnarray}

571:   \label{eq:se6}

572: &&  {\bm V}^{(i)}=2(\kappa_i-3)(3U^{(2,i)}-\kappa_i  U^{(0,i)})~,

573: \end{eqnarray}

574: \begin{eqnarray}

575:   \label{eq:se6.001}

576:   {\bm S}={\rm diag}(2(\kappa_i-3))~,

577: \end{eqnarray}

578: and

579: \begin{eqnarray}

580:   \label{eq:se6.1}

581:  && {\bm Q}_{pq}=2(\kappa_q-3)(R^{(1)}_{pq}\kappa_q-R^{(3)}_{pq})~.

582: \end{eqnarray}

583: We also rename $Q$ of (\ref{eq:e6.1}) as $\bm q$ in order to avoid confusion:

584: \begin{eqnarray}

585:   \label{eq:se6.2}

586:  && {\bm q}_{pq}=(R^{(1)}_{pq}\kappa_q-R^{(3)}_{pq})~.

587: \end{eqnarray}

588: Now

589: we proceed to the expression by using the tensor product.

590: We can show  that the gradients of the cost function have the

591: following expression:

592: \begin{eqnarray}

593:   \label{eq:se77}

594: &&  \frac{\partial {\bm f}({\rm e}^{\Delta},Y)}{\partial \Delta_{kl}}=

595:  -4[{\rm cs}({\bm Q})]_{l+N(k-1)}

596: +\big[

597: {\bm W}

598: {\rm cs}(\Delta)

599: \big]_{l+N(k-1)}+O(\Delta^2)~,\nonumber\\

600: \end{eqnarray}

601: where

602: \begin{eqnarray}

603:   \label{eq:se8}

604: {\bm W}&=&

605: -2\big(I_N\otimes {\bm Q}+{\bm Q'}\otimes I_N\big)

606: +4

607: \big\{\bigoplus_{i=1}^N {\bm V}^{(i)}\big\}

608: T

609: +

610: \bigg[24( I_N\otimes {\bm K})P(I\otimes R^{(1)})'

611: \nonumber\\&&

612: +32( I_N\otimes {\bm q})P(I_N\otimes {\bm q})'

613: -16 ( I_N\otimes R^{(1)}{\bm S})P(I\otimes R^{(3)})'

614: \nonumber\\&&

615: -16 ( I_N\otimes R^{(3)}{\bm S})P(I\otimes R^{(1)})'

616: \bigg]

617: T~.

618: \end{eqnarray}

619: This expression is completely analogous to  (\ref{eq:e77}).

620: Thus  the coordinate $\Delta$ of the saddle point of the second order

621: expansion

622: is determined by

623: \begin{eqnarray}

624:   \label{eq:se20}

625: {\rm cs}(\Delta)=4

626: \Big[(I_{N^2}-P)

627: {\bm W}

628: (I_{N^2}-P)

629: +P

630: \Big]^{-1}

631: (I_{N^2}-P){\rm cs}({\bm Q})~.

632: \end{eqnarray}

633: %In many cases we obtain almost the same results through the two

634: %cost functions in Section \ref{kurt1} and Section \ref{kurt2}.

635: %algorithms.

636: In many cases the two cost functions in Section

637: \ref{kurt1} and Section \ref{kurt2}  yield almost the same results.

638: As  implied at the beginning of this section,

639: the main difference between these two lies in the points where the kurtosis of

640: one of the components vanishes.

641: These points indeed constitute saddle points of

642:  the   cost function

643: ${\bm f}$, while  it is impossible to  capture them by the

644: algorithm in Section \ref{kurt1}.

645: Thus, we must choose an appropriate method for individual problems

646: having this difference in mind.
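The role of the zero-kurtosis points can be read off from the first-order terms (a brief check based on (\ref{eq:se1.1}) and (\ref{eq:se6.1})). Since ${\bm f}_i=(f_i-3)^2$,
\begin{eqnarray*}
&&\frac{\partial {\bm f}_i({\rm e}^{\Delta},Y)}{\partial \Delta_{kl}}
=2\big(f_i({\rm e}^{\Delta},Y)-3\big)
\frac{\partial f_i({\rm e}^{\Delta},Y)}{\partial \Delta_{kl}}~,
\end{eqnarray*}
and at $\Delta=0$ this gives $\partial {\bm f}/\partial \Delta_{kl}=-4{\bm Q}_{lk}=-8(\kappa_k-3){\bm q}_{lk}$. Hence all derivatives with respect to the $k$-th row of $\Delta$ vanish when $\kappa_k=3$, whatever the value of ${\bm q}_{lk}$, whereas for the cost function $f$ of Section \ref{kurt1} the corresponding derivative $-4Q_{lk}$ vanishes only if $Q_{lk}=0$.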

647: %This  will be

648: %revisited  in Section

649: %{\ref{disc}}.

650:

651:

652: \section{Iteration of updating}

653: \label{iteration}

654: Now we have obtained the updating rules. It is not necessary to tune the

655: learning rate. At first sight, (\ref{eq:e20})

656: and (\ref{eq:se20})

657: look complicated.

658: They are, however, easily implemented with numerical tools such as MATLAB.

659: (The source codes will be available from our Web-site. )

660: Starting from $C_0$,

661: $C_t$ for positive $t$ is obtained from $C_{t-1}$ by left multiplication by

662: ${\rm e}^{\Delta_t}$, where

663: $\Delta_t$ is determined by setting $Y=C_{t-1}X$,

664: i.e.,

665: \begin{eqnarray}

666:   \label{eq:b1}

667:   C_t={\rm e}^{\Delta_{t}}{\rm e}^{\Delta_{t-1}}{\rm e}^{\Delta_{t-2}}\cdots{\rm e}^{\Delta_{1}}C_0~.

668: \end{eqnarray}

669: If $\Delta_t$ becomes sufficiently small, we can stop the iteration and exit the

670: process.
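Each iteration requires the moments of $Y=C_{t-1}X$. One common way to estimate them (an assumption made here for illustration; the estimator is not specified in the text) is to replace every expectation by a sample average over $M$ observations $y^{(m)}=C_{t-1}x^{(m)}$, $m=1,\cdots,M$, where $M$, $x^{(m)}$ and $y^{(m)}$ are symbols introduced only for this sketch:
\begin{eqnarray*}
&&\hat{E}(Y_i^kY_p)=\frac{1}{M}\sum_{m=1}^{M}\big(y^{(m)}_i\big)^k y^{(m)}_p~,~~~
\hat{E}(Y_i^kY_pY_q)=\frac{1}{M}\sum_{m=1}^{M}\big(y^{(m)}_i\big)^k y^{(m)}_p y^{(m)}_q~.
\end{eqnarray*}
These estimates are then inserted into the definitions of $\sigma^{(k)}_i$, $R^{(k)}$, $U^{(k,i)}$ and $\kappa_i$ in Section \ref{kurt1} to form $Q$ and $W$ (or their boldface counterparts).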

671:

672: \section{Second order convergence}

673: \label{secconv}

674: First, we adopt  the notation of Section \ref{kurt1}.

675: The following discussion  is, however, valid for the algorithm in Section

676: \ref{kurt2} if we  substitute the quantities  $f$,  $W$, and so on by

677: their boldface counterparts.

678: Let us  start this section by introducing some additional notations.

679: We set

680: \begin{eqnarray}

681:   \label{eq:pr1}

682:   G= GL(N,{\boldmath R})

683: \end{eqnarray}

684: and

685: \begin{eqnarray}

686:   \label{eq:prd2}

687:   K= GL(1,{\boldmath R})^{\oplus N}~.

688: \end{eqnarray}

689: We also define the coset space  $K\backslash G$ by

690: introducing  the equivalence relation

691: \begin{eqnarray}

692:   \label{eq:pr3}

693: g' g^{-1}\in K

694: \Longleftrightarrow

695:  g\sim g'

696: \end{eqnarray}

697: to $G$. That is, $K\backslash G\cong\{Kg|g\in G\}$.

698: Our method is

699: understood as

700: an orthodox adaptation of the Newton method to this

701: coset space $K\backslash G$.

702: Note that

703: the cost function $F(\cdot)\teigi f(\cdot,Y)$ on $G$

704: % defined by (\ref{eq:e1})

705: %and (\ref{eq:e1.1})

706:  satisfies the relation

707: \begin{eqnarray}

708:   \label{eq:pr4}

709: F(g)=F(Kg)~.

710: \end{eqnarray}
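This relation can be checked directly (a short verification; the symbols $k$ and $\lambda_i$ are introduced here for illustration). For $k={\rm diag}(\lambda_1,\cdots,\lambda_N)$ with all $\lambda_i\ne 0$ we have $(kgY)_i=\lambda_i(gY)_i$, and therefore
\begin{eqnarray*}
&&f_i(kg,Y)=\frac{\lambda_i^4E((gY)_i^4)}{\big(\lambda_i^2E((gY)_i^2)\big)^2}
=\frac{E((gY)_i^4)}{E((gY)_i^2)^2}=f_i(g,Y)~.
\end{eqnarray*}
Summing over $i$ gives $F(kg)=F(g)$.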

711: So $F$ is  naturally considered as a function on $K\backslash G$.

712: This is the reason for our choice of  the cost function.

713: Thus, the second-order convergence immediately follows if

714: the correction to the  error in the  coordinates

715: resulting from the  multiplicative nature is properly evaluated.

716:

717: At time $t$, a point $g$ on $K\backslash G$ is specified by

718: the coordinate $X^{(t)}(g) \in{\frak m}$ such that

719: \begin{eqnarray}

720:   \label{eq:prf101}

721:   {\rm e}^{X^{(t)}(g)}C_t\sim g~,

722: \end{eqnarray}

723: where $\frak m$ is the set of $N\times N$ matrices whose diagonal

724: elements are zeros.

725: Actually, this statement  is not self-evident; its proof

726: will be given

727: elsewhere.

728: Define $F_t$, the representation of the cost function at $t$,   by

729: \begin{eqnarray}

730:   \label{eq:prf102}

731:   F_t(X)=F(  {\rm e}^{X}C_t)~.

732: \end{eqnarray}

733: Here we introduce an $(N^2-N)\times N^2$ matrix $\tilde P$ by

734: removing the $(i+N(i-1))$-th rows from the unit $N^2\times N^2$

735: matrix where $i=N,N-1,\cdots, 2,1$.

736: We will denote by $\boldmath H^{(t)}$ the  Hessian,

737: \begin{eqnarray}

738:   \label{eq:prf102.11}

739:   {\boldmath H}^{(t)}_{kl}=\frac{\partial^2 F_t(X)}

740: {\partial ({\tilde P}{\rm cs}(X))_k\partial ({\tilde P}{\rm cs}(X))_l}

741: \end{eqnarray}

742: Note that if we set

743: \begin{eqnarray}

744:   \label{eq:prf103}

745: h_t(X)=\left.

746: T\bigg((I_{N^2}-P)

747: W(I_{N^2}-P)

748: +P\bigg)\right|_{C={\rm e}^X C_t}~,

749: \end{eqnarray}

750: the Hessian is written as

751: \begin{eqnarray}

752:   \label{eq:prf103.1}

753:   {\boldmath H}^{(t)}={\tilde P}h_t{\tilde P}' ~.

754: \end{eqnarray}

755: Suppose that in some neighborhood of the optimal solution $g_*$,

756: ${\boldmath H}^{(t)}(X)$

757: is Lipschitz continuous for some $t$:

758: \begin{eqnarray}

759:   \label{eq:prf104}

760:   ||{\boldmath H}^{(t)}(X)-{\boldmath H}^{(t)}(X')||\le L ||X-X'||~,

761: \end{eqnarray}

762: where $||A||$ is the norm of a  matrix $A$ regarded as an element of Euclidean space,

763: \begin{eqnarray}

764:   \label{eq:norm1}

765:   ||A||^2={\rm tr}(AA^{\dagger})~.

766: \end{eqnarray}

767: We set

768: \begin{eqnarray}

769:   \label{eq:prf104.001}

770: \beta=||H^{(t)}(X^{(t)}(g_*))^{-1} ||  ~.

771: \end{eqnarray}

772: There exists a positive real number $r$,

773:  for which

774: % neighborhood of $g_*$,

775: \begin{eqnarray}

776:   \label{eq:prf104.002}

777:   ||H^{(t)}(X^{(t)}(g))^{-1} || <2\beta~~\mbox{\rm for}~

778: \forall g\in B^{(t)}(g_*,r)\teigi\bigg\{g\bigg|r> ||X^{(t)}(g)-X^{(t)}(g_*)||~\bigg\}

779: \end{eqnarray}

780:  is satisfied.

781: Then

782: it is known that

783: for all $g\in B^{(t)}(g_*,{\rm min}(r,(2\beta L)^{-1}))$,

784: \begin{eqnarray}

785:   \label{eq:prf104.003}

786:   ||X^{(t)}(C_{t+1})-X^{(t)}(g_*)||\le  \beta L ||X^{(t)}(C_{t})-X^{(t)}(g_*)||^2

787: \end{eqnarray}

788: and

789: \begin{eqnarray}

790:   \label{eq:prf104.004}

791:   ||X^{(t)}(C_{t+1})-X^{(t)}(g_*)||\le \frac{1}{2} ||X^{(t)}(C_{t})-X^{(t)}(g_*)||

792: \end{eqnarray}

793: are fulfilled. Thus the second order convergence in this norm  is shown.

794: Unfortunately, this norm is not invariant and is  unnatural.

795: (A natural   metric on $K\backslash G$

796:  is  one which is  invariant  under the parallel transformation,

797: %where the parallel transformation

798: which is induced by the action

799:  of  elements in $K\backslash G$

800:  from the right-hand side.) It suffices in practice, however.

801:

802:

803: \section{Discussions}

804: \label{disc}

805: \subsection{Nonholonomy?}

806: Our method is  related to the nonholonomic method

807: by

808: Amari, Chen, and Cichocki\cite{amari-chen-cichocki1}.

809: In essence our method is a Newton

810:  approach to the same problem, the optimization without prewhitening.

811: Let us   set

812: \begin{eqnarray}

813:   \label{eq:conc11}

814:   {\rm e}^{z} = {\rm e}^{x}{\rm e}^{y}

815: \end{eqnarray}

816: for $x,y\in {\frak gl}(N,{\boldmath R})$.

817: Then it is obvious that $z$ does not necessarily belong to $\frak m$

818:  even if $x,y\in {\frak m}$ (that is,

819:  the $z_{ii}$'s do not always  vanish

820: when $x_{ii}=y_{ii}=0$ for $1\le i\le N$).

821: This may be explained by using the concept of nonholonomy.

822:  The degree of freedom in each step, however, equals the dimension

823: of the space $K\backslash G$ in our setting. The nonholonomic nature

824: emerges when we go back to $G=GL(N,{\bm R})$ again.
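As a small illustration of (\ref{eq:conc11}) (a sketch using the Baker-Campbell-Hausdorff expansion, which is not written out in the text), we have $z=x+y+\frac{1}{2}[x,y]+\cdots$, where $[x,y]=xy-yx$ and the omitted terms are of third and higher order in $x$ and $y$. For $N=2$, take the following elements of $\frak m$, with small real numbers $a$ and $b$ introduced only for this example:
\begin{eqnarray*}
&&x=\left(\begin{array}{cc}0&a\\0&0\end{array}\right)~,~~~
y=\left(\begin{array}{cc}0&0\\b&0\end{array}\right)~,~~~
[x,y]=\left(\begin{array}{cc}ab&0\\0&-ab\end{array}\right)~.
\end{eqnarray*}
Thus $z_{11}=ab/2+\cdots$ and $z_{22}=-ab/2+\cdots$, which are already nonzero at second order whenever $ab\ne 0$.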

825:

826: There exist several

827: studies\cite{takeuchi1,helgason1,helgason2,helgason3,akuzawa5} which

828: deal with

829: cosets

830: like $K\backslash G$  or the right coset $G/K$

831:  when $K$ is a maximal compact subgroup of $G$.

832: Unfortunately, what we are studying is the case where $K$ is not a

833: maximal compact subgroup of $G$.

834: So, for example

835: it is necessary to show

836:  whether the  coordinate (\ref{eq:prf101})  is justified or not.

837: As mentioned above, further studies including this justification

838:  will  appear  elsewhere.

839:

840: \subsection{Global convergence}

841: % On the other hand,

842: The

843:  first few  steps must be treated carefully, since this method shares

844: the poor global convergence behavior  inherent in

845: the  Newton method. Fortunately,

846: there exist methods which can

847: handle the earlier stage. For example, the nonholonomic gradient

848:  method\cite{amari-chen-cichocki1}

849: may be applicable.

850: Another possibility is to construct a nonholonomic fixed-point

851:  algorithm which uses the kernel method.

852: These methods are suitable for  capturing the optimal point which

853:  contains components with zero kurtoses. There

854:  we must, of course,     use the method in Section \ref{kurt2}.

855: If it is not necessary to worry about these zero kurtosis components,

856: there is little difference between the two methods described in

857: Section \ref{kurt1} and Section \ref{kurt2}.

858:

859: \subsection{Conclusions}

860: We have constructed a new  algorithm for finding an optimal point in a

861: matrix space, where we have  introduced a new

862: multiplicative updating method.

863: %does not

864: %

865: %requie

866: %prewhitening.

867: The algorithm is in essence the Newton method on a

868: coset.

869: So it converges quite rapidly and it can capture the saddle point.

870: Since it does not require prewhitening,

871: it is not necessary to worry about the error resulting from the

872: prewhitening.

873: Indeed, it is  possible to increase

874: the kurtosis slightly  for data preprocessed by

875: the FastICA\cite{fastica1}.

876:

877:

878: \begin{thebibliography}{8}

879:

880: \bibitem[A.Hyv\"arinen,1997]{hyvarinen1}

881: A.Hyv\"arinen (1997).

882: \newblock A Fast Fixed-Point Algorithm for Independent Component Analysis.

883: \newblock {\em Neural Computation\/}, {\em 9\/}, 1483--1492.

884:

885: \bibitem[Amari {\em et~al.\/},1997]{amari-chen-cichocki1}

886: Amari, S., Chen, T.-P., \& Cichocki, A. (1997).

887: \newblock Non-holonomic Constraints in Learning Algorithms for Blind Source

888:   Separation.

889: \newblock {\em preprint\/}.

890:

891: \bibitem[Hurri {\em et~al.\/},1998]{fastica1}

892: Hurri, J., G\"avert, H., S\"arel\"a, J., \& Hyv\"arinen, A. (1998).

893: \newblock FastICA package for MATLAB.

894: \newblock http://www.cis.hut.fi/projects/ica/fastica/.

895:

896: \bibitem[M.Takeuchi,1994]{takeuchi1}

897: M.Takeuchi (1994).

898: \newblock {\em Modern Spherical Functions\/}.

899: \newblock Amer. Math. Soc.

900:

901: \bibitem[S.Helgason,1962]{helgason2}

902: S.Helgason (1962).

903: \newblock {\em Differential Geometry and Symmetric Spaces\/}.

904: \newblock Academic Press.

905:

906: \bibitem[S.Helgason,1978]{helgason1}

907: S.Helgason (1978).

908: \newblock {\em Differential Geometry, Lie Groups and Symmetric Spaces\/}.

909: \newblock New York: Academic Press.

910:

911: \bibitem[S.Helgason,1984]{helgason3}

912: S.Helgason (1984).

913: \newblock {\em Groups and Geometric Analysis\/}.

914: \newblock Academic Press.

915:

916: \bibitem[T.Akuzawa \& M.Wadati,1998]{akuzawa5}

917: T.Akuzawa \& M.Wadati (1998).

918: \newblock Diffusions on symmetric spaces of type A${\rm I\!I\!I}$ and random

919:   matrix theories for rectangular matrices.

920: \newblock {\em J.Phys.A\/}, {\em 31\/}, 1713--1732.

921:

922: \end{thebibliography}

923:

924:

925: \appendix

926: \section*{Appendix}

927: \section{Proof  of (\ref{eq:e8f})}

928: \label{app:prf}

929: \begin{quote}

930: %{Proof}:

931: For $B\in GL(N,{\boldmath F})$ and $1\le i,j\le N$,

932: \begin{eqnarray}

933:   \label{eq:proof1}

934: [  T(X\otimes Y)T {\rm cs}(B)]_{i+N(j-1)}

935: &=&[  (X\otimes Y)T {\rm cs}(B)]_{j+N(i-1)}\nonumber\\

936: &&=X_{ip}Y_{jq}(B')_{qp}

937: =(YB'X')_{ji}~.

938: \end{eqnarray}

939: On the other hand

940: \begin{eqnarray}

941:   \label{eq:proof12}

942: [  (Y\otimes X) {\rm cs}(B)]_{i+N(j-1)}

943: &&=Y_{jp}X_{iq}B_{qp}

944: =(YB'X')_{ji}~.

945: \end{eqnarray}

946: This proves the statement since $\rm cs$  is bijective. $\Box$

947: \end{quote}

948:

949:

950:

951:

952:

953:

954:

955:

956: \end{document}

957:

958:

959:

960:

961:

962:

963:

964:

965: