0610:math0610438/JFD.tex

1: \documentclass[a4paper,numberrefs]{icarnws}

2:

3: \usepackage[latin2]{inputenc}

4: \usepackage[T1]{fontenc}

5: \usepackage{amsmath}

6: \usepackage{amssymb}

7: \usepackage{graphicx}

8: \usepackage{xspace}

9: \usepackage[mathscr]{eucal}

10: \usepackage{url}

11: \usepackage{subfig}

12: \usepackage{times}

13: \usepackage{tabularx}

14: \usepackage{amsthm}

15:

16: \newcommand{\R}{\ensuremath{\mathbb{R}}}

17: \newcommand{\C}{\ensuremath{\mathbb{C}}}

18: \newcommand{\K}{\ensuremath{\mathbb{K}}}

19: \newcommand{\RICA}{\mbox{$\mathbb{R}$-ICA}\xspace}

20: \newcommand{\CICA}{\mbox{$\mathbb{C}$-ICA}\xspace}

21: \newcommand{\RISA}{\mbox{$\mathbb{R}$-ISA}\xspace}

22: \newcommand{\CISA}{\mbox{$\mathbb{C}$-ISA}\xspace}

23: \newcommand{\KICA}{\mbox{$\mathbb{K}$-ICA}\xspace}

24: \newcommand{\KISA}{\mbox{$\mathbb{K}$-ISA}\xspace}

25: \newcommand{\pv}{\ensuremath{\varphi}}

26: \newcommand{\G}{\ensuremath{\mathscr{G}}}

27: \newcommand{\F}{\ensuremath{\mathscr{F}}}

28:

29: \renewcommand{\vec}{\mathbf}

30:

31: \newtheorem{theorem}{Theorem}

32: \newtheorem*{proof2}{Proof}

33: \newtheorem{note}{Note}

34:

35: \title{REAL AND COMPLEX INDEPENDENT SUBSPACE ANALYSIS BY GENERALIZED VARIANCE}

36:

37: \author{

38:   \bf Zolt{\'a}n Szab{\'o} \hspace*{0.5cm} Andr{\'a}s L{\H{o}}rincz\\ % Author's names are \bf

39:   Department of Information Systems, E\"{o}tv\"{o}s Lor{\'a}nd University, \\

40:   P{\'a}zm{\'a}ny P. s{\'e}t{\'a}ny 1/C, Budapest H-1117, Hungary \\

41:   Research Group on Intelligent Information Systems \\

42:   Hungarian Academy of Sciences\\

43:   WWW home page: \url{http://nipg.inf.elte.hu}\\

44:   \tt szzoli@cs.elte.hu, andras.lorincz@elte.hu% e-mail addresses are \tt

45: }

46:

47: \begin{document}

48: \maketitle

49:

50: \begin{abstract}

51: Here, we address the problem of Independent Subspace Analysis (ISA). We develop a technique that (i) builds upon joint

52: decorrelation for a set of functions, (ii) can be related to kernel based techniques, (iii) can be interpreted as a

53: self-adjusting, self-grouping neural network solution, (iv) can be used both for real and for complex problems, and (v)

54: can be a first step towards large scale problems. Our numerical examples extend to ISA tasks of a few hundred dimensions.

55: \end{abstract}

56:

57: \keywords{Independent Subspace Analysis, joint \mbox{f-decorrelation}} % Put 2-5 keywords here.

58:

59: \section{INTRODUCTION}

60: Uncovering independent processes is of high importance, because it breaks combinatorial explosion \citep{poczos06noncombinatorial}. In

61: cases such as Smart Dust, the problem is vital, because (i) elements have limited computational capacity and (ii) communication over long

62: distances is prohibitively expensive. Self-adjusting, self-grouping neural network solutions may come to our rescue in such settings. We present

63: such an approach for Independent Subspace Analysis (ISA). The extension of ISA to Independent Process Analysis is straightforward under

64: certain conditions \citep{poczos06noncombinatorial}.

65:

66: The paper is organized as follows. The \KISA model is introduced in Section~\ref{sec:KISA-model}.

67: Section~\ref{sec:KISA-method} presents our method. Illustrations are provided in Section~\ref{sec:illustrations}.

68:

69: \section{THE \KISA MODEL}\label{sec:KISA-model}

70: Section~\ref{subsec:KISA-eqs} defines the \KISA task to be studied, and Section~\ref{subsec:KISA-ambiguitites} treats the ambiguities of the

71: model.

72:

73: \subsection{The \KISA Equations}\label{subsec:KISA-eqs}

74: We treat real and complex ISA tasks: Let $\K\in\{\R,\C\}$. Assume that we observe the mixture of multidimensional independent i.i.d.

75: sampled sources (\emph{components}):

76: \begin{align}

77:     \vec{z}(t)&=\vec{A}\vec{s}(t),&

78:     \vec{s}(t)&=[\vec{s}^1(t);\ldots;\vec{s}^M(t)],

79: \end{align} where $D=\sum_{m=1}^M d_m$ is the total dimension of the components and $\vec{A}\in\K^{D\times D}$ is the invertible \emph{mixing matrix}.

80: The task is to recover the hidden components \mbox{$\vec{s}^m(t)\in\K^{d_m}$} by means of observations  $\vec{z}(t)\in

81: \K^D$. If $\K=\R$ ($\K=\C$), then we speak of the Real (Complex) ISA [i.e., \RISA (\CISA)] task. The special

82: $d_m=1$ $(\forall m)$ case is Real (Complex) Independent Component Analysis [i.e., \RICA (\CICA)].

83:

84: \subsection{Ambiguities of the \KISA Model}\label{subsec:KISA-ambiguitites}

85: Identification of the \KISA model is ambiguous. However, ambiguities are simple: hidden components $\vec{s}^m$ can be determined up to

86: permutation among subspaces and up to invertible transformation within subspaces. Details about \RISA and \CISA can be found in

87: \citep{theis04uniqueness1} and \citep{szabo06separation}, respectively.

88:

89: Ambiguities within subspaces can be lessened: given our assumption on the invertibility of matrix $\vec{A}$, we can

90: assume without any loss of generality that both the sources and the observation are \emph{white}, that is,

91: \begin{eqnarray}

92: E[\vec{s}]&=&\vec{0},\quad \mathrm{cov}\left[\vec{s}\right]=\vec{I}_D,\label{eq:white1}\\

93: E[\vec{z}]&=&\vec{0},\quad \mathrm{cov}\left[\vec{z}\right]=\vec{I}_D,\label{eq:white2}

94: \end{eqnarray}

95: where $E[\cdot]$ denotes expectation and $\vec{I}_D$ is the \mbox{$D$-dimensional} identity matrix. Now, the

96: $\vec{s}^m$ sources are determined up to (i) permutation \emph{and} orthogonal transformation in the real case and (ii)

97: permutation \emph{and} unitary transformation in the complex case.

98:

99: \section{\KISA BY JOINT DECORRELATION}\label{sec:KISA-method}

100: Components $\vec{s}^m$ are estimated by a neural network, which aims to `decorrelate' (see below) the

101: $\vec{y}^m\in\K^{d_m}$ parts of the $\K^D\ni\vec{y}(t)=[\vec{y}^1(t);\ldots;\vec{y}^M(t)]$ output of the network. The

102: network executes mapping $\vec{z}\mapsto L(\vec{z},\Theta)$ with network parameter $\Theta$.

103:

104: \subsection{Neural Network Candidates ($L$)}

105: If an RNN with feedforward ($\vec{F}$) and recurrent ($\vec{R}$) connections is chosen, then the network assumes the form

106: \begin{equation}

107: \dot{\vec{y}}(\tau)=-\vec{y}(\tau)+\vec{F}\vec{z}(t)-\vec{R}\vec{y}(\tau)

108: \end{equation}

109: and thus, upon relaxation, it realizes the

110: \begin{equation}

111: \vec{y}(t)=(\vec{I}_D+\vec{R})^{-1}\vec{F}\vec{z}(t)=L(\vec{z}(t);\vec{F},\vec{R})\label{eq:network I-O2}

112: \end{equation}

113: input-output mapping \citep{base06blind,amari95recurrent}. Another natural choice is a network with feedforward

114: connections $\vec{W}$ that executes mapping

115: \begin{equation}

116: \vec{y}(t)=\vec{W}\vec{z}(t)=L(\vec{z}(t);\vec{W}).\label{eq:feedforward-network:IO}

117: \end{equation}
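The relaxation step behind Eq.~\eqref{eq:network I-O2} can be spelled out as follows. This is a sketch under two assumptions that are not stated above: the input $\vec{z}(t)$ is held fixed while the dynamics in $\tau$ settles, and $\vec{I}_D+\vec{R}$ is invertible. At a fixed point $\dot{\vec{y}}=\vec{0}$, so
\begin{align*}
\vec{0}&=-\vec{y}+\vec{F}\vec{z}(t)-\vec{R}\vec{y},\\
(\vec{I}_D+\vec{R})\vec{y}&=\vec{F}\vec{z}(t),\\
\vec{y}&=(\vec{I}_D+\vec{R})^{-1}\vec{F}\vec{z}(t).
\end{align*}
In this reading, the feedforward mapping of Eq.~\eqref{eq:feedforward-network:IO} is obtained with $\vec{W}=(\vec{I}_D+\vec{R})^{-1}\vec{F}$.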

118:

119: \subsection{Cost Function of \KISA}

120: The neural network estimates hidden sources $\vec{s}^m$ by non-linear ($\vec{f}$) decorrelation of $\vec{y}^m$s,

121: components of network output $\vec{y}$. Formally:

122:

123: Let us denote the empirical $\vec{f}$-covariance matrix of $\vec{y}(t)$ and $\vec{y}^m(t)$ for function

124: $\vec{f}=[\vec{f}^1;\ldots;\vec{f}^M]$ over $[1,T]$ by

125: \begin{align}

126:      \vec{\Sigma}_{\K}(\vec{f},T)&=\widehat{\mathrm{cov}}\left(\vec{f}\left[\pv_{\K}(\vec{y})\right],\vec{f}\left[\pv_{\K}(\vec{y})\right]\right),\label{eq:cov_C0}

127:      \\

128:      \vec{\Sigma}_{\K}^{i,j}(\vec{f},T)&=\widehat{\mathrm{cov}}\left(\vec{f}^i\left[\pv_{\K}\left(\vec{y}^i\right)\right],\vec{f}^j\left[\pv_{\K}\left(\vec{y}^j\right)\right]\right),\label{eq:cov_C}

129: \end{align}

130: respectively, where $i,j=1,\ldots,M$, $\pv_{\R}(\vec{v})=\vec{v}$, $\pv_{\C}$ is the mapping

131: \begin{equation}

132:      \pv_{\C}:\C^L\ni\vec{v}\mapsto\vec{v}\otimes\left[\begin{array}{c}\Re(\cdot)\\\Im(\cdot)\end{array}\right]\in\R^{2L}.

133: \end{equation}

134: Here, $\Re(\cdot)$ [$\Im(\cdot)$] denotes the real (imaginary) part, $\otimes$ is the Kronecker-product. Then

135: minimization of the following non-negative cost function (in $\Theta$)

136: \begin{equation}

137:     Q_{\Theta}(\vec{f},T):=-\frac{1}{2}\log\left\{\frac{\det[\vec{\Sigma}_{\K}(\vec{f},T)]}{\prod_{m=1}^M\det[\vec{\Sigma}_{\K}^{m,m}(\vec{f},T)]}\right\}\label{IPA-cost}

138: \end{equation}

139: gives rise to \emph{pairwise}\footnote{We note that, unlike in the \mbox{1-dimensional} case ($d=1$), pairwise

140: independence is \emph{not} equivalent to mutual independence. Nonetheless, in our numerical experience it is an efficient

141: approximation.} \mbox{$\vec{f}$-uncorrelatedness}:

142: \begin{theorem}\label{thm:equiv}

143: For the separation carried out by the network minimizing cost function \eqref{IPA-cost}, the following statements are equivalent:

144: \begin{enumerate}

145:     \item[i)]\label{thm:equiv:i)}

146:         $\vec{f}$-uncorrelatedness: $\vec{\Sigma}_{\K}^{i,j}(\vec{f},T)=0$\quad($\forall i\ne j$).

147:     \item[ii)]

148:         $Q_{\Theta}$ is minimal: $Q_{\Theta}(\vec{f},T)=0$.

149: \end{enumerate}

150: \end{theorem}

151:

152: \begin{proof2}[sketch]

153: The statement follows from the inequality related to the multi-dimensional Shannon differential entropy $H$: Let

154: $\vec{u}=[\vec{u}^1;\ldots;\vec{u}^M]\in\R^D$ $(\vec{u}^m\in\R^d)$ denote a random variable. Then

155: \begin{equation}

156: H(\vec{u}^1,\ldots,\vec{u}^M)\le\sum_{m=1}^MH(\vec{u}^m),

157: \end{equation}

158: and equality holds iff $\vec{u}^m$s are independent. Hint: one can choose $\vec{u}$ as a normal random variable with covariance

159: $\vec{\Sigma}_{\K}(\vec{f},T)$ and insert the expression of the entropy of normal variables.

160: \end{proof2}
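The hint can be made explicit as follows. This is a sketch for the real case, assuming that $\vec{\Sigma}:=\vec{\Sigma}_{\R}(\vec{f},T)$ is positive definite and writing $\vec{\Sigma}^{m,m}$ for its diagonal blocks. For a normal $\vec{u}\in\R^D$ with covariance $\vec{\Sigma}$, the marginal $\vec{u}^m\in\R^d$ is normal with covariance $\vec{\Sigma}^{m,m}$, and
\begin{align*}
H(\vec{u})&=\frac{1}{2}\log\left[(2\pi e)^D\det\vec{\Sigma}\right],&
H(\vec{u}^m)&=\frac{1}{2}\log\left[(2\pi e)^d\det\vec{\Sigma}^{m,m}\right].
\end{align*}
Since $D=Md$, the $(2\pi e)$ factors cancel in the difference:
\begin{equation*}
\sum_{m=1}^MH(\vec{u}^m)-H(\vec{u})=-\frac{1}{2}\log\left\{\frac{\det\vec{\Sigma}}{\prod_{m=1}^M\det\vec{\Sigma}^{m,m}}\right\}=Q_{\Theta}(\vec{f},T)\ge 0.
\end{equation*}
Equality holds iff the $\vec{u}^m$s are independent, which for jointly normal variables is equivalent to $\vec{\Sigma}^{i,j}=\vec{0}$ for all $i\ne j$.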

161:

162: \begin{note}

163: For the special case $\K=\R$, \mbox{$\Theta=(\vec{F},\vec{R})$}, \mbox{$\vec{f}(\vec{z})=\vec{z}$} and $d=1$, see \citep{base06blind}.

164: \end{note}

165:

166: \begin{note}

167: Cost function $Q_{\Theta}$ of \eqref{IPA-cost} is attractive from the point of view of computing its gradient. This

168: gradient for the case of an RNN architecture [see Eq.~\eqref{eq:network I-O2}] may give rise to self-organization

169: \citep{base06blind}.

170: \end{note}

171:

172: \begin{note}

173: For real random variables, the separation targeted by cost function \eqref{IPA-cost} can be related to a more general principle,

174: the Kernel Generalized Variance (KGV) technique \citep{bach02kernel}. This technique aims to separate the $\vec{y}^m$ components of

175: $\vec{y}$, the transformed form of input $\vec{z}$. To this end, KGV estimates mutual information $I(\vec{y}^1,\ldots,\vec{y}^M)$ in

176: Gaussian approximation\footnote{A complex variable is normal if its image using map $\pv_{\C}$ is real multivariate normal

177: \citep{eriksson06complex}. Thus, relation $I(\vec{y}^1,\ldots,\vec{y}^M)=I[\pv_{\C}(\vec{y}^1),\ldots,\pv_{\C}(\vec{y}^M)]\quad

178: \vec{y}^m\in\C^d$ extends the KGV based interpretation to the complex case, too [see Eqs.~\eqref{eq:cov_C0}-\eqref{eq:cov_C}].} by means of

179: the covariance matrix of variable $\vec{y}$.  Here, the transformation of the KGV technique is realized by the neural network parameterized

180: with variable $\Theta$ and by the function $\vec{f}$.

181: \end{note}

182:

183: \begin{note}\label{note:KC}

184: We note that KGV is related to the kernel covariance (KC) method \citep{gretton03kernel}, which makes use of the supremum of

185: \mbox{1-dimensional} covariances as a measure of independence. Our approximation may also be improved by minimizing $Q_{\Theta}(\vec{f},T)$

186: on $\F(\ni\vec{f})$, i.e., on a set of functions.

187: \end{note}

188:

189: \subsection{The \KISA Algorithm}\label{subsec:KISA-alg}

190: Below, our proposed \KISA method is introduced. A decomposition principle called \KISA Separation Theorem has been

191: formulated in \citep{szabo06separation}. It says that (under certain conditions) the \KISA task can be solved in 2

192: steps: In the first step, 1-dimensional \KICA estimation is executed, which provides separation matrix

193: $\vec{W}_{\text{\KICA}}$ and estimated sources $\hat{\vec{s}}_{\text{\KICA}}$. In the second step, the optimal permutation

194: of the \KICA elements ($\hat{\vec{s}}_{\text{\KICA}}$) is searched for, that is, the \KICA elements are grouped.

195:

196: This principle is adapted to linear feedforward neural networks [see Eq.~\eqref{eq:feedforward-network:IO}]

197: here.\footnote{For the sake of simplicity we assume that all components have the same dimension, i.e., $d=d_m(\forall

198: m)$.} Separation matrix \mbox{$\vec{W}=\vec{W}_{\text{\KISA}}$} is searched in the form

199: \begin{equation}

200:     \vec{W}_{\text{\KISA}}=\vec{P}\vec{W}_{\text{\KICA}},

201: \end{equation}

202: where matrix $\vec{P}\in\R^{D\times D}$ denotes the desired permutation matrix. We search for the hidden sources

203: $\vec{s}^m$ by pairwise decorrelation of the components $\vec{y}^m$ of the output of the network using function

204: manifold $\F$ (for $\F$, see Note~\ref{note:KC}). Thus, given Theorem~\ref{thm:equiv}, our cost function is:

205: \begin{equation}

206:     Q(\F,T,\vec{P}):=\sum_{\vec{f}\in\F}\left\|\vec{M}\circ\vec{\Sigma}_{\K}(\vec{f},T,\vec{P})\right\|^2\rightarrow\min_{\vec{P}}.

207: \end{equation}

208: Here: (i) $\F$ denotes a set of functions, each function maps $\R^D\to \R^D$ (if $\K=\C$ then $\R^{2D}\to \R^{2D}$), and each function

209: acts on each coordinate separately, (ii) $\circ$ denotes pointwise multiplication (Hadamard product), (iii) $\vec{M}$ masks according to

210: the subspaces \{\mbox{$\vec{M}=\vec{E}_D-\vec{I}_M\otimes \vec{E}_d$}, where all elements of matrix \mbox{$\vec{E}_D\in\R^{D\times D}$} and

211: \mbox{$\vec{E}_d\in\R^{d\times d}$} are equal to 1 [if $\K=\C$ then  $\vec{E}_D$ ($\vec{E}_d$) is replaced by $\vec{E}_{2D}$

212: ($\vec{E}_{2d}$)]\}, (iv) $\left\|\cdot\right\|^2$ denotes the square of the Frobenius norm (sum of squares of the elements), (v) in

213: $\vec{\Sigma}_{\K}(\vec{f},T,\vec{P})$, \mbox{$\vec{y}=\vec{P}\hat{\vec{s}}_{\text{\KICA}}$}, and (vi) $\vec{P}$ is the $D\times D$

214: permutation matrix to be determined.
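As a small worked instance of the mask in item (iii), consider the real case with $M=2$, $d=2$ and $D=4$ (these sizes are our illustrative choice, not taken from the experiments):
\begin{equation*}
\vec{M}=\vec{E}_4-\vec{I}_2\otimes\vec{E}_2=
\begin{pmatrix}
0&0&1&1\\
0&0&1&1\\
1&1&0&0\\
1&1&0&0
\end{pmatrix}.
\end{equation*}
Thus $\vec{M}\circ\vec{\Sigma}_{\R}(\vec{f},T,\vec{P})$ keeps only those entries of the $\vec{f}$-covariance matrix that couple coordinates of different subspaces, and $Q(\F,T,\vec{P})$ is the sum of their squares over $\vec{f}\in\F$.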

215:

216: Greedy permutation search is applied: two coordinates belonging to different subspaces are exchanged if the exchange lowers cost function

217: $Q(\F,T,\cdot)$. Note: Greedy search could be replaced by a \emph{global} one at the price of a higher computational burden \citep{szabo06cross}.

218: Table~\ref{tab:KISA-pseudocode} contains the pseudocode of our technique.

219:

220: \begin{table}

221:   \centering

222:   \caption{\KISA Algorithm - pseudocode}\label{tab:KISA-pseudocode}

223:   \begin{minipage}{\textwidth}

224:   \begin{tabular}{|l|}

225:         \hline

226:         \textbf{Input of the algorithm}\\

227:         \verb|  |observation: $\{\mathbf{z}(t)\}_{t=1,\ldots,T}$\\

228:         \textbf{Optimization}\footnote{Let $\G^1, \ldots, \G^M$ denote the indices of the $1^{st}, \ldots , M^{th}$\\

229:              subspaces, i.e., $\G^m:=\{(m-1)d+1,\ldots,md\}$, and\\

230:              permutation matrix $\vec{P}_{pq}$ exchanges coordinates $p$ and $q$.}\\

231:         \verb|  |\textbf{\KICA}: on whitened observation $\vec{z}$, \\

232:         \verb|        |$\Rightarrow$ $\hat{\vec{s}}_{\text{\KICA}}$ estimation\\

233:         \verb|  |\textbf{Permutation search}\\

234:         \verb|    |$\vec{P}:=\vec{I}_D$\\

235:         \verb|    |repeat\\

236:         \verb|      |sequentially for $\forall p\in\G^{m_1},q\in\G^{m_2}$\\

237:         \verb|                     |$(m_1\ne m_2):$\\

238:         \verb|        |if $Q(\F,T,\vec{P}_{pq}\vec{P})<Q(\F,T,\vec{P})$\\

239:         \verb|          |$\vec{P}:=\vec{P}_{pq}\vec{P}$\\

240:         \verb|        |end\\

241:         \verb|    |until $Q(\F,T,\cdot)$ no longer decreases in the \emph{sweep} above\\

242:         \textbf{Estimation}\\

243:         \verb|  |$\hat{\vec{s}}_{\text{\KISA}}=\vec{P}\hat{\vec{s}}_{\text{\KICA}}$\\

244:         \hline

245:   \end{tabular}

246:   \end{minipage}

247: \end{table}
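The permutation search of Table~\ref{tab:KISA-pseudocode} can be sketched in plain Python as follows. This is an illustrative sketch for the real case, not the authors' implementation. It assumes that every function in $\F$ applies the same scalar function to each coordinate, so that the $\vec{f}$-covariance of the permuted output is simply the permuted $\vec{f}$-covariance of the \KICA estimate. The function names, the list-based permutation representation and the tolerance \texttt{tol} are our own choices, and the toy covariance matrix is hypothetical.

```python
import math

# Function set F used in the simulations: cos(z) and cos(2z), applied coordinate-wise.
F = [math.cos, lambda x: math.cos(2.0 * x)]


def f_covariance(samples, f):
    """Empirical covariance of f applied coordinate-wise to each sample.

    samples: list of T real vectors (lists) of length D, e.g. the ICA estimates.
    """
    T, D = len(samples), len(samples[0])
    vals = [[f(x) for x in row] for row in samples]
    means = [sum(v[i] for v in vals) / T for i in range(D)]
    cov = [[0.0] * D for _ in range(D)]
    for v in vals:
        c = [v[i] - means[i] for i in range(D)]
        for i in range(D):
            for j in range(D):
                cov[i][j] += c[i] * c[j] / T
    return cov


def masked_cost(covs, perm, d):
    """Sum over f of the squared covariance entries coupling different subspaces.

    perm[i] is the index of the ICA coordinate placed at output position i.
    """
    D = len(perm)
    total = 0.0
    for cov in covs:
        for i in range(D):
            for j in range(D):
                if i // d != j // d:  # mask M: keep cross-subspace entries only
                    total += cov[perm[i]][perm[j]] ** 2
    return total


def greedy_permutation_search(covs, d, tol=1e-12):
    """Greedy sweeps over pairs (p, q) from different subspaces (tol is an assumption)."""
    D = len(covs[0])
    perm = list(range(D))  # P := I_D
    best = masked_cost(covs, perm, d)
    improved = True
    while improved:  # repeat sweeps while the cost keeps decreasing
        improved = False
        for p in range(D):
            for q in range(p + 1, D):
                if p // d == q // d:
                    continue  # same subspace: the exchange cannot change the cost
                perm[p], perm[q] = perm[q], perm[p]  # P := P_pq P
                cost = masked_cost(covs, perm, d)
                if cost < best - tol:
                    best, improved = cost, True
                else:
                    perm[p], perm[q] = perm[q], perm[p]  # undo the exchange
    return perm, best


# Hypothetical toy input: coordinates (0, 2) and (1, 3) are coupled.
toy_cov = [[1.0, 0.0, 0.5, 0.0],
           [0.0, 1.0, 0.0, 0.5],
           [0.5, 0.0, 1.0, 0.0],
           [0.0, 0.5, 0.0, 1.0]]
perm, best = greedy_permutation_search([toy_cov], d=2)
```

With real data, one would first build one covariance matrix per function with \texttt{f\_covariance} from the \KICA estimate and pass the resulting list to the search. Precomputing these matrices once means that each candidate exchange only re-indexes them.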

248:

249: \section{ILLUSTRATIONS}\label{sec:illustrations}

250: The \KISA identification algorithm of Section~\ref{subsec:KISA-alg} is illustrated below (due to the lack of space,

251: illustrations are provided for $\K=\R$ only). Test cases are introduced in Section~\ref{subsec:databases}. The quality

252: of the solutions will be measured by the normalized Amari-distance (Section~\ref{subsec:amaridist}). Numerical results

253: are provided in Section~\ref{subsec:simulations}.

254:

255: \subsection{Databases}\label{subsec:databases}

256: Two databases were defined to study our algorithm. The databases are illustrated in Fig.~\ref{fig:database-Aw} and

257: Fig.~\ref{fig:database-d-spherical}.

258:

259: \subsubsection{The $A\omega$ Database}

260: Here, hidden sources $\vec{s}^m$ are uniformly distributed on the 2-dimensional images ($d=2$) of the letters

261: A--Z and $\alpha$--$\omega$. This is called database $A\omega$, which has 50 components ($M=50$); see

262: Fig.~\ref{fig:database-Aw} for an illustration. This test falls outside of the (known) validity domain of the \RISA

263: Separation Theorem.

264:

265: \begin{figure}%

266: \centering%

267: \includegraphics[width=3cm]{database_A-omega.eps}

268: \caption[]{Database $A\omega$. 100-dimensional task of 50 pieces of 2-dimensional components ($D=100$, $M=50$, \mbox{$d=2$}).

269: Hidden sources are uniformly distributed variables on the letters of the English and the Greek alphabets.}%

270: \label{fig:database-Aw}%

271: \end{figure}

272:

273: \subsubsection{The $d$-Spherical Database}

274: Here, hidden sources $\vec{s}^m$ are spherically symmetric random variables that have representation of the form

275: \mbox{$\vec{v}\stackrel{\mathrm{distr}}{=}\rho\vec{u}^{(d)}$}, where $\vec{u}^{(d)}$ is uniformly distributed on the $d$-dimensional unit

276: sphere, and $\rho$ is a non-negative scalar random variable independent of $\vec{u}^{(d)}$ ($\stackrel{\mathrm{distr}}{=}$ denotes equality

277: in distribution). This \emph{$d$-spherical} database: (i) can be scaled in dimension $d$, (ii) satisfies conditions of the \RISA Separation

278: Theorem, and (iii) can be defined by $\rho$. (See \citep{fang90symmetric,frahm04generalized} for spherical variables.) Our choices for

279: $\rho$ are shown in Fig.~\ref{fig:database-d-spherical}.

280:

281: \begin{figure}%

282: \centering%

283: \subfloat[][]{\includegraphics[trim=100 27 70 40,scale=0.17]{database_d-spherical_1.eps}}\hfill%

284: \subfloat[][]{\includegraphics[trim=85 30 67 40,scale=0.15]{database_d-spherical_2.eps}}\hfill%

285: \subfloat[][]{\includegraphics[trim=90 30 75 40,scale=0.16]{database_d-spherical_3.eps}}\hfill%

286: \caption[]{Database $d$-spherical. Stochastic representation of the 3 ($M=3$) hidden sources. (a): $\rho$ is

287: uniform on $[0,1]$, (b): $\rho$ is exponential with parameter $\mu=1$, and (c): $\rho$ is lognormal with parameters $\mu=0, \sigma=1$, respectively.}%

288: \label{fig:database-d-spherical}%

289: \end{figure}

290:

291:

292: \subsection{Normalized Amari-distance}\label{subsec:amaridist}

293: The precision of our algorithm was measured by the normalized Amari-distance as follows. The optimal estimation of the \RISA model provides

294: matrix \mbox{$\vec{B}:=\vec{W}\vec{A}\in\R^{D\times D}$}, a block-permutation matrix made of $d\times d$ sized blocks. Let us decompose

295: matrix $\vec{B}\in\R^{D\times D}$ into $d\times d$ blocks: \mbox{$\vec{B}=\left[\vec{B}^{i,j}\right]_{i,j=1,\ldots,M}$}. Let $b^{i,j}$

296: denote the sum of the absolute values of the elements of matrix \mbox{$\vec{B}^{i,j}\in\R^{d\times d}$}. Then the \emph{normalized} version

297: of the Amari-distance \citep{amari96new} (as it was introduced in \citep{szabo06cross} for \RISA) is defined as:

298: \begin{eqnarray}

299:     r(\vec{B})&:=&\frac{1}{2M(M-1)}\left[\sum_{i=1}^M\left(\frac{

300:     \sum_{j=1}^Mb^{ij}}{\max_jb^{ij}}-1\right)\right. + \nonumber \\

301:     &&\qquad \qquad\quad\left.\sum_{j=1}^M\left(\frac{ \sum_{i=1}^Mb^{ij}}{\max_ib^{ij}}-1\right)\right].

302: \end{eqnarray}

303: For matrix $\vec{B}$ we have that $0\le r(\vec{B})\le 1$, and $r(\vec{B})=0$ if and only if $\vec{B}$ is a block-permutation matrix with

304: $d\times d$ sized blocks. Thus, $r=0$ corresponds to perfect estimation (0\% error), $r=1$ is the worst estimation (100\% error). This

305: performance measure can be used for $\K=\C$, too.
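The definition of $r(\vec{B})$ above translates directly into code. The following is a minimal plain-Python sketch for the real case. The function name and the two test matrices are our own illustrative choices, and the sketch assumes $M\ge 2$ and that no block row or block column of $\vec{B}$ vanishes identically.

```python
def normalized_amari_distance(B, d):
    """Normalized Amari-distance of a D x D matrix B (list of rows), block size d.

    Assumes D = M*d with M >= 2 and that no block row or block column of B
    is identically zero (otherwise the maxima in the denominators vanish).
    """
    D = len(B)
    M = D // d
    # b[i][j]: sum of the absolute values of the elements of block B^{i,j}
    b = [[sum(abs(B[i * d + r][j * d + c]) for r in range(d) for c in range(d))
          for j in range(M)] for i in range(M)]
    # First sum of the definition: one term per block row
    row_term = sum(sum(b[i]) / max(b[i]) - 1.0 for i in range(M))
    # Second sum of the definition: one term per block column
    col_term = 0.0
    for j in range(M):
        col = [b[i][j] for i in range(M)]
        col_term += sum(col) / max(col) - 1.0
    return (row_term + col_term) / (2.0 * M * (M - 1))


# Hypothetical examples: a block-permutation matrix and an all-ones matrix.
block_perm = [[0, 0, 1, 2],
              [0, 0, 3, 4],
              [5, 6, 0, 0],
              [7, 8, 0, 0]]
all_ones = [[1] * 4 for _ in range(4)]
r_perfect = normalized_amari_distance(block_perm, d=2)  # block-permutation case
r_worst = normalized_amari_distance(all_ones, d=2)      # all blocks equally large
```

The first matrix has exactly one non-zero block in every block row and block column, so it attains the lower bound $r=0$. The second one spreads the mass evenly over all blocks and attains the upper bound $r=1$.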

306:

307: \subsection{Simulations}\label{subsec:simulations}

308: Results on databases \emph{$A\omega$} and \emph{$d$-spherical} are provided here. In our simulations, the number of samples $T$ of the observations

309: $\vec{z}(t)$ was varied: $1000 \le T \le 30000$. Mixing matrix $\vec{A}$ was chosen randomly from the orthogonal group. Manifold $\F$ was

310: $\F:=\{\vec{z}\mapsto \cos(\vec{z}), \vec{z}\mapsto \cos(2\vec{z})\}$ (functions operated on coordinates separately). Scaling properties of

311: the approximation were studied for database \mbox{$d$-spherical} by changing the value of $d$ between $20$ and $110$ [i.e., the number of

312: subspaces ($M$) was fixed, but the dimension of the subspaces was increased]. For each parameter setting [$T$ for database $A\omega$, $(T,d)$ for

313: database $d$-spherical], the results of ten experiments were averaged. The quality of the solutions was measured by the Amari-error (see

314: Section~\ref{subsec:amaridist}). We have chosen FastICA \citep{hyvarinen97fast} for the \RICA module (see Table~\ref{tab:KISA-pseudocode}).

315:

316: Precision of our method is shown: (i) for database $A\omega$ in Fig.~\ref{fig:demo:A-omega} as a function of sample

317: number, (ii) for database $d$-spherical in Fig.~\ref{fig:amari-vs-T:d-spherical} as a function of sample number and

318: source dimension ($d$) (for details, see Table~\ref{tab:amari-dists-d-spherical}). The figures demonstrate that the

319: algorithm was able to uncover the hidden components with high precision. In the case of database $d$-spherical the

320: Amari-error decreases according to a power law $r(T)\propto T^{-c}$ $(c>0)$.
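A power law $r(T)\propto T^{-c}$ is a straight line of slope $-c$ on a log-log plot, since $\log r=-c\log T+\mathrm{const}$. One simple way to estimate $c$ from measured errors is an ordinary least-squares fit in log-log coordinates, sketched below in plain Python. The function name is ours, and the data in the example are synthetic values generated from an exact power law, not results from the paper.

```python
import math


def fit_power_law_exponent(sample_sizes, errors):
    """Least-squares slope of log(error) versus log(T); returns c in r(T) ~ T^(-c)."""
    xs = [math.log(t) for t in sample_sizes]
    ys = [math.log(r) for r in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted line has slope -c


# Synthetic illustration only: errors generated as r(T) = 2 * T^(-0.5).
T_values = [1000, 2000, 5000, 10000, 30000]
synthetic_errors = [2.0 * T ** -0.5 for T in T_values]
c_hat = fit_power_law_exponent(T_values, synthetic_errors)
```

On such noise-free synthetic input, the fit recovers the exponent used to generate the data up to rounding. On measured errors, the estimate depends on the range of $T$ over which the curve is approximately linear.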

321:

322: In our numerical simulations, the number of sweeps before the iteration of the permutation optimization stopped (see

323: Table~\ref{tab:KISA-pseudocode}) varied between 2 and 6.

324:

325: \begin{figure}%

326: \centering%

327: \subfloat[][]{\includegraphics[width=6.1cm]{amari_A-omega.eps}}%

328: \subfloat[][]{\includegraphics[width=2.3cm]{hat_A-omega.eps}}\\%

329: \caption[]{Estimations on database $A\omega$. (a) Amari-error as a function of the number of samples.

330: Average$\pm$deviation for $30000$ samples: $0.58\%\pm0.04$, (b) estimation with average error for $30000$ samples:

331: the hidden components are recovered up to permutation and orthogonal transformation (\RISA ambiguity).}%

332: \label{fig:demo:A-omega}%

333: \end{figure}

334:

335: \begin{figure}

336: \centering

337: \includegraphics[width=6.1cm]{amari_d-spherical.eps}

338: \caption{Estimations of database $d$-spherical: Amari-error as a function of the number of samples on log-log scale for

339: subspaces of different dimensions ($d$). Task dimension: $D$. The error curves are approximately linear, so the errors scale according to

340: a power law, $r(T)\propto T^{-c}$ $(c>0)$. For numerical values, see Table~\ref{tab:amari-dists-d-spherical}.}

341: \label{fig:amari-vs-T:d-spherical}

342: \end{figure}

343:

344: \begin{table}

345:     \centering

346:     \caption{Amari-error for database $d$-spherical, for different $d$ values: average $\pm$ deviation. Number of samples: $T=30000$.}

347:     \label{tab:amari-dists-d-spherical}

348:     \begin{tabular}{|c|c|c|}

349:     \hline

350:         $d=20$ & $d=30$ &$d=40$\\

351:     \hline

352:         $1.40\%$ $(\pm 0.03)$ & $1.71\%$ $(\pm 0.03)$& $1.99\%$ $(\pm 0.03)$\\

353:     \hline\hline

354:        $d=50$ & $d=60$& $d=70$\\

355:     \hline

356:           $2.23\%$ $(\pm 0.03)$& $2.44\%$ $(\pm 0.03)$ & $2.65\%$ $(\pm 0.03)$\\

357:     \hline\hline

358:         $d=80$& $d=90$& $d=100$\\

359:     \hline

360:          $2.85\%$ $(\pm 0.03)$& $3.03\%$ $(\pm 0.04)$&$3.19\%$ $(\pm 0.02)$\\

361:     \hline\hline

362:         $d=110$&\multicolumn{2}{c}{}\\

363:     \cline{1-1}

364:         $3.37\%$ $(\pm 0.03)$&\multicolumn{2}{c}{}\\

365:     \cline{1-1}

366:     \end{tabular}

367: \end{table}

368:

369: \begin{thebibliography}{13}

370: \providecommand{\natexlab}[1]{#1} \providecommand{\url}[1]{\texttt{#1}} \expandafter\ifx\csname urlstyle\endcsname\relax

371:   \providecommand{\doi}[1]{doi: #1}\else

372:   \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi

373:

374: \bibitem[Amari et~al.(1995)Amari, Cichocki, and Yang]{amari95recurrent}

375: S.~Amari, A.~Cichocki, and H.H. Yang.

376: \newblock Recurrent neural networks for blind separation of sources.

377: \newblock NOLTA'95, pp. 37-42, 1995.

378:

379: \bibitem[Amari et~al.(1996)Amari, Cichocki, and Yang]{amari96new}

380: S.~Amari, A.~Cichocki, and H.~Yang.

381: \newblock A new learning algorithm for blind signal separation.

382: \newblock \emph{Advances in Neural Information Processing Systems}, 1996.

383:

384: \bibitem[Bach and Jordan(2002)]{bach02kernel}

385: Francis~R. Bach and Michael~I. Jordan.

386: \newblock Kernel {ICA}.

387: \newblock \emph{JMLR}, 3:\penalty0 1--48, 2002.

388:

389: \bibitem[Eriksson and Koivunen(2006)]{eriksson06complex}

390: Jan Eriksson and Visa Koivunen.

391: \newblock Complex random vectors and {ICA} models: Identifiability, uniqueness

392:   and separability.

393: \newblock \emph{{IEEE} {TIT}}, 52\penalty0 (3), 2006.

394:

395: \bibitem[Fang et~al.(1990)Fang, Kotz, and Ng]{fang90symmetric}

396: Kai-Tai Fang, Samuel Kotz, and Kai~Wang Ng.

397: \newblock \emph{Symmetric multivariate and related distributions}.

398: \newblock Chapman and Hall, 1990.

399:

400: \bibitem[Frahm(2004)]{frahm04generalized}

401: Gabriel Frahm.

402: \newblock \emph{Generalized elliptical distributions: Theory and applications}.

403: \newblock PhD thesis, University of K{\"o}ln, 2004.

404:

405: \bibitem[Gretton et~al.(2003)Gretton, Herbrich, and Smola]{gretton03kernel}

406: A.~Gretton, R.~Herbrich, and A.~Smola.

407: \newblock The kernel mutual information.

408: \newblock In \emph{IEEE ICASSP}, volume~4, pages 880--883, 2003.

409:

410: \bibitem[Hyv{\"a}rinen and Oja(1997)]{hyvarinen97fast}

411: A.~Hyv{\"a}rinen and E.~Oja.

412: \newblock A fast fixed-point algorithm for independent component analysis.

413: \newblock \emph{Neural Computation}, 9\penalty0 (7):\penalty0 1483--1492, 1997.

414:

415: \bibitem[Meyer-B{\"a}se et~al.(2006)Meyer-B{\"a}se, Gruber, Theis, and

416:   Foo]{base06blind}

417: Anke Meyer-B{\"a}se, Peter Gruber, Fabian Theis, and Simon Foo.

418: \newblock Blind source separation based on self-organizing neural network.

419: \newblock \emph{Engng. Appl. of AI}, 19:\penalty0 305--311, 2006.

420:

421: \bibitem[P\'oczos and L\H{o}rincz(2006)]{poczos06noncombinatorial}

422: B.~P\'oczos and A.~L\H{o}rincz.

423: \newblock Non-combinatorial estimation of independent {AR} sources.

424: \newblock \emph{Neurocomp.}, 2006.

425: \newblock accepted.

426:

427: \bibitem[Szab\'o et~al.(2006{\natexlab{a}})Szab\'o, P\'oczos, and

428:   L\H{o}rincz]{szabo06cross}

429: Z.~Szab\'o, B.~P\'oczos, and A.~L\H{o}rincz.

430: \newblock Cross-entropy optimization for independent process analysis.

431: \newblock In \emph{Proc. of {ICA}}, LNCS 3889, pages 909--916. Springer-Verlag,

432:   2006{\natexlab{a}}.

433:

434: \bibitem[Szab\'o et~al.(2006{\natexlab{b}})Szab\'o, P\'oczos, and L{\H

435:   o}rincz]{szabo06separation}

436: Z.~Szab\'o, B.~P\'oczos, and A.~L{\H o}rincz.

437: \newblock Separation theorem for $\mathbb{K}$-independent subspace analysis

438:   with sufficient conditions.

439: \newblock Technical report, E\"otv\"os Lor\'and University, Budapest,

440:   2006{\natexlab{b}}.

441: \newblock \url{http://arxiv.org/abs/math.ST/0608100}.

442:

443: \bibitem[Theis(2004)]{theis04uniqueness1}

444: F.J. Theis.

445: \newblock Uniqueness of complex and multidimensional independent component

446:   analysis.

447: \newblock \emph{Signal Processing}, 84\penalty0 (5):\penalty0 951--956, 2004.

448: \end{thebibliography}

449:

450: \end{document}

451: