1: \documentclass[a4paper,numberrefs]{icarnws}
2:
3: \usepackage[latin2]{inputenc}
4: \usepackage[T1]{fontenc}
5: \usepackage{amsmath}
6: \usepackage{amssymb}
7: \usepackage{graphicx}
8: \usepackage{xspace}
9: \usepackage[mathscr]{eucal}
10: \usepackage{url}
11: \usepackage{subfig}
12: \usepackage{times}
13: \usepackage{tabularx}
14: \usepackage{amsthm}
15:
16: \newcommand{\R}{\ensuremath{\mathbb{R}}}
17: \newcommand{\C}{\ensuremath{\mathbb{C}}}
18: \newcommand{\K}{\ensuremath{\mathbb{K}}}
19: \newcommand{\RICA}{\mbox{$\mathbb{R}$-ICA}\xspace}
20: \newcommand{\CICA}{\mbox{$\mathbb{C}$-ICA}\xspace}
21: \newcommand{\RISA}{\mbox{$\mathbb{R}$-ISA}\xspace}
22: \newcommand{\CISA}{\mbox{$\mathbb{C}$-ISA}\xspace}
23: \newcommand{\KICA}{\mbox{$\mathbb{K}$-ICA}\xspace}
24: \newcommand{\KISA}{\mbox{$\mathbb{K}$-ISA}\xspace}
25: \newcommand{\pv}{\ensuremath{\varphi}}
26: \newcommand{\G}{\ensuremath{\mathscr{G}}}
27: \newcommand{\F}{\ensuremath{\mathscr{F}}}
28:
29: \renewcommand{\vec}{\mathbf}
30:
31: \newtheorem{theorem}{Theorem}
32: \newtheorem*{proof2}{Proof}
33: \newtheorem{note}{Note}
34:
35: \title{REAL AND COMPLEX INDEPENDENT SUBSPACE ANALYSIS BY GENERALIZED VARIANCE}
36:
37: \author{
38: \bf Zolt{\'a}n Szab{\'o} \hspace*{0.5cm} Andr{\'a}s L{\H{o}}rincz\\ % Author's names are \bf
39: Department of Information Systems, E\"{o}tv\"{o}s Lor{\'a}nd University, \\
40: P{\'a}zm{\'a}ny P. s{\'e}t{\'a}ny 1/C, Budapest H-1117, Hungary \\
41: Research Group on Intelligent Information Systems \\
42: Hungarian Academy of Sciences\\
43: WWW home page: \url{http://nipg.inf.elte.hu}\\
44: \tt szzoli@cs.elte.hu, andras.lorincz@elte.hu% e-mail addresses are \tt
45: }
46:
47: \begin{document}
48: \maketitle
49:
50: \begin{abstract}
51: Here, we address the problem of Independent Subspace Analysis (ISA). We develop a technique that (i) builds upon joint
52: decorrelation for a set of functions, (ii) can be related to kernel based techniques, (iii) can be interpreted as a
53: self-adjusting, self-grouping neural network solution, (iv) can be used both for real and for complex problems, and (v)
54: can be a first step towards large scale problems. Our numerical examples extend to ISA tasks of a few hundred dimensions.
55: \end{abstract}
56:
57: \keywords{Independent Subspace Analysis, joint \mbox{f-decorrelation}} % Put 2-5 keywords here.
58:
59: \section{INTRODUCTION}
60: Uncovering independent processes is of high importance, because it breaks combinatorial explosion \citep{poczos06noncombinatorial}. In
61: applications such as Smart Dust, the problem is vital, because (i) the elements have limited computational capacity and (ii) communication over
62: long distances is prohibitively expensive. Self-adjusting, self-grouping neural network solutions may come to our rescue in such cases. We present
63: such an approach for Independent Subspace Analysis (ISA). The extension of ISA to Independent Process Analysis is straightforward under
64: certain conditions \citep{poczos06noncombinatorial}.
65:
66: The paper is organized as follows. The \KISA model is introduced in Section~\ref{sec:KISA-model}.
67: Section~\ref{sec:KISA-method} presents our method. Illustrations are provided in Section~\ref{sec:illustrations}.
68:
69: \section{THE \KISA MODEL}\label{sec:KISA-model}
70: Section~\ref{subsec:KISA-eqs} defines the \KISA task to be studied, and Section~\ref{subsec:KISA-ambiguitites} treats the ambiguities of the
71: model.
72:
73: \subsection{The \KISA Equations}\label{subsec:KISA-eqs}
74: We treat real and complex ISA tasks: Let $\K\in\{\R,\C\}$. Assume that we observe the mixture of multidimensional independent i.i.d.
75: sampled sources (\emph{components}):
76: \begin{align}
77: \vec{z}(t)&=\vec{A}\vec{s}(t),&
78: \vec{s}(t)&=[\vec{s}^1(t);\ldots;\vec{s}^M(t)],
79: \end{align} where $D=\sum_{m=1}^M d_m$ is the total dimension of the components and $\vec{A}\in\K^{D\times D}$ is the invertible \emph{mixing matrix}.
80: The task is to recover the hidden components \mbox{$\vec{s}^m(t)\in\K^{d_m}$} from the observations $\vec{z}(t)\in
81: \K^D$. If $\K=\R$ ($\K=\C$), then we speak of the Real (Complex) ISA [i.e., \RISA (\CISA)] task. The special
82: case $d_m=1$ $(\forall m)$ is Real (Complex) Independent Component Analysis [i.e., \RICA (\CICA)].
83:
84: \subsection{Ambiguities of the \KISA Model}\label{subsec:KISA-ambiguitites}
85: Identification of the \KISA model is ambiguous. However, the ambiguities are simple: hidden components $\vec{s}^m$ can be determined up to
86: permutation among subspaces and up to invertible transformation within subspaces. Details about \RISA and \CISA can be found in
87: \citep{theis04uniqueness1} and \citep{szabo06separation}, respectively.
88:
89: Ambiguities within subspaces can be lessened: given our assumption on the invertibility of matrix $\vec{A}$, we can
90: assume without any loss of generality that both the sources and the observation are \emph{white}, that is,
91: \begin{eqnarray}
92: E[\vec{s}]&=&\vec{0},\quad \mathrm{cov}\left[\vec{s}\right]=\vec{I}_D,\label{eq:white1}\\
93: E[\vec{z}]&=&\vec{0},\quad \mathrm{cov}\left[\vec{z}\right]=\vec{I}_D,\label{eq:white2}
94: \end{eqnarray}
95: where $E[\cdot]$ denotes the expectation and $\vec{I}_D$ is the \mbox{$D$-dimensional} identity matrix. Now, the
96: $\vec{s}^m$ sources are determined up to (i) permutation \emph{and} orthogonal transformation in the real case and (ii)
97: permutation \emph{and} unitary transformation in the complex case.
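
To see why whitening leaves only orthogonal (unitary) freedom, a short check may help. It is a sketch under the assumption that \eqref{eq:white1}--\eqref{eq:white2} hold and that the covariance of a complex variable is defined with the conjugate transpose; the symbol $\vec{A}^*$ [transpose of $\vec{A}$ for $\K=\R$, conjugate transpose for $\K=\C$] is introduced only for this remark. Since $\vec{z}=\vec{A}\vec{s}$,
\[
\vec{I}_D=\mathrm{cov}\left[\vec{z}\right]=\vec{A}\,\mathrm{cov}\left[\vec{s}\right]\vec{A}^*=\vec{A}\vec{A}^*,
\]
that is, $\vec{A}$ is orthogonal for $\K=\R$ and unitary for $\K=\C$. By the same argument, any invertible transformation within a subspace that keeps $\mathrm{cov}\left[\vec{s}^m\right]=\vec{I}_{d_m}$ must itself be orthogonal (unitary).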
98:
99: \section{\KISA BY JOINT DECORRELATION}\label{sec:KISA-method}
100: Components $\vec{s}^m$ are estimated by a neural network, which aims to `decorrelate' (see below) the
101: $\vec{y}^m\in\K^{d_m}$ parts of the $\K^D\ni\vec{y}(t)=[\vec{y}^1(t);\ldots;\vec{y}^M(t)]$ output of the network. The
102: network executes mapping $\vec{z}\mapsto L(\vec{z},\Theta)$ with network parameter $\Theta$.
103:
104: \subsection{Neural Network Candidates ($L$)}
105: If we choose a recurrent neural network (RNN) with feedforward ($\vec{F}$) and recurrent ($\vec{R}$) connections, then the network assumes the form
106: \begin{equation}
107: \dot{\vec{y}}(\tau)=-\vec{y}(\tau)+\vec{F}\vec{z}(t)-\vec{R}\vec{y}(\tau)
108: \end{equation}
109: and thus, upon relaxation, it realizes the
110: \begin{equation}
111: \vec{y}(t)=(\vec{I}_D+\vec{R})^{-1}\vec{F}\vec{z}(t)=L(\vec{z}(t);\vec{F},\vec{R})\label{eq:network I-O2}
112: \end{equation}
113: input-output mapping \citep{base06blind,amari95recurrent}. Another natural choice is a network with feedforward
114: connections $\vec{W}$ that executes mapping
115: \begin{equation}
116: \vec{y}(t)=\vec{W}\vec{z}(t)=L(\vec{z}(t);\vec{W}).\label{eq:feedforward-network:IO}
117: \end{equation}
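
The input-output mapping \eqref{eq:network I-O2} can be checked directly. The following is a sketch under assumptions not stated above: input $\vec{z}(t)$ is held fixed during relaxation, and all eigenvalues of $\vec{I}_D+\vec{R}$ have positive real part, so that the dynamics converge. At the stationary point $\dot{\vec{y}}=\vec{0}$, hence
\[
\vec{0}=-\vec{y}+\vec{F}\vec{z}-\vec{R}\vec{y}
\quad\Longrightarrow\quad
(\vec{I}_D+\vec{R})\vec{y}=\vec{F}\vec{z}
\quad\Longrightarrow\quad
\vec{y}=(\vec{I}_D+\vec{R})^{-1}\vec{F}\vec{z}.
\]
In this sense, the relaxed RNN acts as a feedforward network of the form \eqref{eq:feedforward-network:IO} with $\vec{W}=(\vec{I}_D+\vec{R})^{-1}\vec{F}$.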
118:
119: \subsection{Cost Function of \KISA}
120: The neural network estimates hidden sources $\vec{s}^m$ by non-linear ($\vec{f}$) decorrelation of the $\vec{y}^m$s,
121: the components of network output $\vec{y}$. Formally:
122:
123: Let us denote the empirical $\vec{f}$-covariance matrix of $\vec{y}(t)$ and $\vec{y}^m(t)$ for function
124: $\vec{f}=[\vec{f}^1;\ldots;\vec{f}^M]$ over $[1,T]$ by
125: \begin{align}
126: \vec{\Sigma}_{\K}(\vec{f},T)&=\widehat{\mathrm{cov}}\left(\vec{f}\left[\pv_{\K}(\vec{y})\right],\vec{f}\left[\pv_{\K}(\vec{y})\right]\right),\label{eq:cov_C0}
127: \\
128: \vec{\Sigma}_{\K}^{i,j}(\vec{f},T)&=\widehat{\mathrm{cov}}\left(\vec{f}^i\left[\pv_{\K}\left(\vec{y}^i\right)\right],\vec{f}^j\left[\pv_{\K}\left(\vec{y}^j\right)\right]\right),\label{eq:cov_C}
129: \end{align}
130: respectively, where $i,j=1,\ldots,M$, $\pv_{\R}(\vec{v})=\vec{v}$, $\pv_{\C}$ is the mapping
131: \begin{equation}
132: \pv_{\C}:\C^L\ni\vec{v}\mapsto\vec{v}\otimes\left[\begin{array}{c}\Re(\cdot)\\\Im(\cdot)\end{array}\right]\in\R^{2L}.
133: \end{equation}
134: Here, $\Re(\cdot)$ [$\Im(\cdot)$] denotes the real (imaginary) part, $\otimes$ is the Kronecker-product. Then
135: minimization of the following non-negative cost function (in $\Theta$)
136: \begin{equation}
137: Q_{\Theta}(\vec{f},T):=-\frac{1}{2}\log\left\{\frac{\det[\vec{\Sigma}_{\K}(\vec{f},T)]}{\prod_{m=1}^M\det[\vec{\Sigma}_{\K}^{m,m}(\vec{f},T)]}\right\}\label{IPA-cost}
138: \end{equation}
139: gives rise to \emph{pairwise}\footnote{We note that -- unlike in the \mbox{1-dimensional} case, i.e., unlike for $d=1$ -- pairwise
140: independence is \emph{not} equivalent to mutual independence. Nonetheless, according to our numerical experience, it is an efficient
141: approximation.} \mbox{$\vec{f}$-uncorrelatedness}:
142: \begin{theorem}\label{thm:equiv}
143: For the separation carried out by the network minimizing cost function \eqref{IPA-cost}, the following statements are equivalent:
144: \begin{enumerate}
145: \item[i)]\label{thm:equiv:i)}
146: $\vec{f}$-uncorrelatedness: $\vec{\Sigma}_{\K}^{i,j}(\vec{f},T)=0$\quad($\forall i\ne j$).
147: \item[ii)]
148: $Q_{\Theta}$ is minimal: $Q_{\Theta}(\vec{f},T)=0$.
149: \end{enumerate}
150: \end{theorem}
151:
152: \begin{proof2}[sketch]
153: The statement follows from the inequality related to the multi-dimensional Shannon differential entropy $H$: Let
154: $\vec{u}=[\vec{u}^1;\ldots;\vec{u}^M]\in\R^D$ $(\vec{u}^m\in\R^d)$ denote a random variable. Then
155: \begin{equation}
156: H(\vec{u}^1,\ldots,\vec{u}^M)\le\sum_{m=1}^MH(\vec{u}^m),
157: \end{equation}
158: and equality holds iff the $\vec{u}^m$s are independent. Hint: one can choose $\vec{u}$ as a normal random variable with covariance
159: $\vec{\Sigma}_{\K}(\vec{f},T)$ and insert the expression for the entropy of normal variables.
160: \end{proof2}
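
The hint can be spelled out as follows for $\K=\R$; this is a sketch that assumes $\vec{\Sigma}_{\R}(\vec{f},T)$ is positive definite and that all subspaces have dimension $d$ (so $D=Md$). For a normal variable $\vec{u}\in\R^D$ with covariance $\vec{\Sigma}=\vec{\Sigma}_{\R}(\vec{f},T)$ and diagonal blocks $\vec{\Sigma}^{m,m}=\vec{\Sigma}_{\R}^{m,m}(\vec{f},T)$,
\[
H(\vec{u})=\frac{1}{2}\log\left[(2\pi e)^{D}\det(\vec{\Sigma})\right],\qquad
H(\vec{u}^m)=\frac{1}{2}\log\left[(2\pi e)^{d}\det(\vec{\Sigma}^{m,m})\right].
\]
The constants cancel because $\sum_{m=1}^M d=D$, and therefore
\[
\sum_{m=1}^MH(\vec{u}^m)-H(\vec{u}^1,\ldots,\vec{u}^M)
=-\frac{1}{2}\log\left\{\frac{\det(\vec{\Sigma})}{\prod_{m=1}^M\det(\vec{\Sigma}^{m,m})}\right\}
=Q_{\Theta}(\vec{f},T)\ge 0.
\]
For jointly normal variables, independence of the $\vec{u}^m$s is equivalent to vanishing off-diagonal covariance blocks, so equality holds exactly when $\vec{\Sigma}^{i,j}=0$ $(\forall i\ne j)$.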
161:
162: \begin{note}
163: For the special case $\K=\R$, \mbox{$\Theta=(\vec{F},\vec{R})$}, \mbox{$\vec{f}(\vec{z})=\vec{z}$} and $d=1$, see \citep{base06blind}.
164: \end{note}
165:
166: \begin{note}
167: Cost function $Q_{\Theta}$ of \eqref{IPA-cost} is attractive from the point of view of computing its gradient. This
168: gradient for the case of an RNN architecture [see Eq.~\eqref{eq:network I-O2}] may give rise to self-organization
169: \citep{base06blind}.
170: \end{note}
171:
172: \begin{note}
173: For real random variables, the separation aimed at by cost function \eqref{IPA-cost} can be related to a more general principle,
174: the Kernel Generalized Variance (KGV) technique \citep{bach02kernel}. This technique aims to separate the $\vec{y}^m$ components of
175: $\vec{y}$, the transformed form of input $\vec{z}$. To this end, KGV estimates mutual information $I(\vec{y}^1,\ldots,\vec{y}^M)$ in
176: Gaussian approximation\footnote{A complex variable is normal if its image under the map $\pv_{\C}$ is real multivariate normal
177: \citep{eriksson06complex}. Thus, the relation $I(\vec{y}^1,\ldots,\vec{y}^M)=I[\pv_{\C}(\vec{y}^1),\ldots,\pv_{\C}(\vec{y}^M)]$
178: ($\vec{y}^m\in\C^d$) extends the KGV based interpretation to the complex case, too [see Eqs.~\eqref{eq:cov_C0}-\eqref{eq:cov_C}].} by means of
179: the covariance matrix of variable $\vec{y}$. Here, the transformation of the KGV technique is realized by the neural network parameterized
180: with variable $\Theta$ and by the function $\vec{f}$.
181: \end{note}
182:
183: \begin{note}\label{note:KC}
184: We note that KGV is related to the kernel covariance (KC) method \citep{gretton03kernel}, which makes use of the supremum of
185: \mbox{1-dimensional} covariances as a measure of independence. Our approximation may also be improved by minimizing $Q_{\Theta}(\vec{f},T)$
186: on $\F(\ni\vec{f})$, i.e., on a set of functions.
187: \end{note}
188:
189: \subsection{The \KISA Algorithm}\label{subsec:KISA-alg}
190: Below, our proposed \KISA method is introduced. A decomposition principle called the \KISA Separation Theorem has been
191: formulated in \citep{szabo06separation}. It says that (under certain conditions) the \KISA task can be solved in 2
192: steps: In the first step, 1-dimensional \KICA estimation is executed, which provides separation matrix
193: $\vec{W}_{\text{\KICA}}$ and estimated sources $\hat{\vec{s}}_{\text{\KICA}}$. In the second step, the optimal permutation
194: of the \KICA elements ($\hat{\vec{s}}_{\text{\KICA}}$) is searched for, i.e., the \KICA elements are grouped.
195:
196: This principle is adapted to linear feedforward neural networks [see Eq.~\eqref{eq:feedforward-network:IO}]
197: here.\footnote{For the sake of simplicity we assume that all components have the same dimension, i.e., $d=d_m$ $(\forall
198: m)$.} Separation matrix \mbox{$\vec{W}=\vec{W}_{\text{\KISA}}$} is searched for in the form
199: \begin{equation}
200: \vec{W}_{\text{\KISA}}=\vec{P}\vec{W}_{\text{\KICA}},
201: \end{equation}
202: where matrix $\vec{P}\in\R^{D\times D}$ denotes the desired permutation matrix. We search for the hidden sources
203: $\vec{s}^m$ by pairwise decorrelation of the components $\vec{y}^m$ of the output of the network using function
204: manifold $\F$ (see Note~\ref{note:KC}). Thus, given Theorem~\ref{thm:equiv}, our cost function is:
205: \begin{equation}
206: Q(\F,T,\vec{P}):=\sum_{\vec{f}\in\F}\left\|\vec{M}\circ\vec{\Sigma}_{\K}(\vec{f},T,\vec{P})\right\|^2\rightarrow\min_{\vec{P}}.
207: \end{equation}
208: Here: (i) $\F$ denotes a set of functions, each function mapping $\R^D\to \R^D$ (if $\K=\C$ then $\R^{2D}\to \R^{2D}$), and each function
209: acts on each coordinate separately, (ii) $\circ$ denotes pointwise multiplication (Hadamard product), (iii) $\vec{M}$ masks according to
210: the subspaces \{\mbox{$\vec{M}=\vec{E}_D-\vec{I}_M\otimes \vec{E}_d$}, where all elements of matrix \mbox{$\vec{E}_D\in\R^{D\times D}$} and
211: \mbox{$\vec{E}_d\in\R^{d\times d}$} are equal to 1 [if $\K=\C$ then $\vec{E}_D$ ($\vec{E}_d$) is replaced by $\vec{E}_{2D}$
212: ($\vec{E}_{2d}$)]\}, (iv) $\left\|\cdot\right\|^2$ denotes the square of the Frobenius norm (sum of squares of the elements), (v) in
213: $\vec{\Sigma}_{\K}(\vec{f},T,\vec{P})$, \mbox{$\vec{y}=\vec{P}\hat{\vec{s}}_{\text{\KICA}}$}, and (vi) $\vec{P}$ is the $D\times D$
214: permutation matrix to be determined.
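
As a small illustration of the mask in item (iii), take the hypothetical sizes $M=2$ and $d=2$ (so $D=4$, $\K=\R$); these values are chosen only for the example:
\[
\vec{M}=\vec{E}_4-\vec{I}_2\otimes\vec{E}_2=
\left[\begin{array}{cccc}
0&0&1&1\\
0&0&1&1\\
1&1&0&0\\
1&1&0&0
\end{array}\right].
\]
Thus, the Hadamard product with $\vec{M}$ keeps only those entries of the $\vec{f}$-covariance matrix that couple \emph{different} subspaces, and the squared Frobenius norm sums their squares.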
215:
216: Greedy permutation search is applied: two coordinates belonging to different subspaces are exchanged if the exchange lowers cost function
217: $Q(\F,T,\cdot)$. Note: Greedy search could be replaced by a \emph{global} one at a higher computational cost \citep{szabo06cross}.
218: Table~\ref{tab:KISA-pseudocode} contains the pseudocode of our technique.
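
For orientation, the cost of one sweep of the greedy search can be estimated as follows. The count assumes equal subspace dimensions $d$ and that each unordered coordinate pair $(p,q)$ is tested once, which is only one possible reading of the pseudocode. There are $\binom{M}{2}$ pairs of subspaces and $d^2$ coordinate pairs for each of them, so one sweep evaluates
\[
\binom{M}{2}d^2=\frac{D(D-d)}{2}
\]
candidate exchanges. For example, $M=50$ and $d=2$ give $1225\cdot 4=4900$ evaluations of $Q(\F,T,\cdot)$ per sweep.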
219:
220: \begin{table}
221: \centering
222: \caption{\KISA Algorithm - pseudocode}\label{tab:KISA-pseudocode}
223: \begin{minipage}{\textwidth}
224: \begin{tabular}{|l|}
225: \hline
226: \textbf{Input of the algorithm}\\
227: \verb| |observation: $\{\mathbf{z}(t)\}_{t=1,\ldots,T}$\\
228: \textbf{Optimization}\footnote{Let $\G^1, \ldots, \G^M$ denote the indices of the $1^{st}, \ldots , M^{th}$\\
229: subspaces, i.e., $\G^m:=\{(m-1)d+1,\ldots,md\}$, and\\
230: permutation matrix $\vec{P}_{pq}$ exchanges coordinates $p$ and $q$.}\\
231: \verb| |\textbf{\KICA}: on whitened observation $\vec{z}$, \\
232: \verb| |$\Rightarrow$ $\hat{\vec{s}}_{\text{\KICA}}$ estimation\\
233: \verb| |\textbf{Permutation search}\\
234: \verb| |$\vec{P}:=\vec{I}_D$\\
235: \verb| |repeat\\
236: \verb| |sequentially for $\forall p\in\G^{m_1},q\in\G^{m_2}$\\
237: \verb| |$(m_1\ne m_2):$\\
238: \verb| |if $Q(\F,T,\vec{P}_{pq}\vec{P})<Q(\F,T,\vec{P})$\\
239: \verb| |$\vec{P}:=\vec{P}_{pq}\vec{P}$\\
240: \verb| |end\\
241: \verb| |until $Q(\F,T,\cdot)$ no longer decreases in the \emph{sweep} above\\
242: \textbf{Estimation}\\
243: \verb| |$\hat{\vec{s}}_{\text{\KISA}}=\vec{P}\hat{\vec{s}}_{\text{\KICA}}$\\
244: \hline
245: \end{tabular}
246: \end{minipage}
247: \end{table}
248:
249: \section{ILLUSTRATIONS}\label{sec:illustrations}
250: The \KISA identification algorithm of Section~\ref{subsec:KISA-alg} is illustrated below (due to the lack of space,
251: illustrations are provided for $\K=\R$ only). Test cases are introduced in Section~\ref{subsec:databases}. The quality
252: of the solutions will be measured by the normalized Amari-distance (Section~\ref{subsec:amaridist}). Numerical results
253: are provided in Section~\ref{subsec:simulations}.
254:
255: \subsection{Databases}\label{subsec:databases}
256: Two databases were defined to study our algorithm. The databases are illustrated in Fig.~\ref{fig:database-Aw} and
257: Fig.~\ref{fig:database-d-spherical}.
258:
259: \subsubsection{The $A\omega$ Database}
260: Here, hidden sources $\vec{s}^m$ are uniformly distributed on 2-dimensional images ($d=2$) of the letters
261: A--Z and $\alpha$--$\omega$. This is called database $A\omega$, which has 50 components ($M=50$); see
262: Fig.~\ref{fig:database-Aw} for an illustration. This test falls outside of the (known) validity domain of the \RISA
263: Separation Theorem.
264:
265: \begin{figure}%
266: \centering%
267: \includegraphics[width=3cm]{database_A-omega.eps}
268: \caption[]{Database $A\omega$. A 100-dimensional task with 50 components, each 2-dimensional ($D=100$, $M=50$, \mbox{$d=2$}).
269: Hidden sources are uniformly distributed variables on the letters of the English and the Greek alphabets.}%
270: \label{fig:database-Aw}%
271: \end{figure}
272:
273: \subsubsection{The $d$-Spherical Database}
274: Here, hidden sources $\vec{s}^m$ are spherically symmetric random variables that have a stochastic representation of the form
275: \mbox{$\vec{v}\stackrel{\mathrm{distr}}{=}\rho\vec{u}^{(d)}$}, where $\vec{u}^{(d)}$ is uniformly distributed on the $d$-dimensional unit
276: sphere, and $\rho$ is a non-negative scalar random variable independent of $\vec{u}^{(d)}$ ($\stackrel{\mathrm{distr}}{=}$ denotes equality
277: in distribution). This \emph{$d$-spherical} database: (i) can be scaled in dimension $d$, (ii) satisfies conditions of the \RISA Separation
278: Theorem, and (iii) can be defined by $\rho$. (See \citep{fang90symmetric,frahm04generalized} for spherical variables.) Our choices for
279: $\rho$ are shown in Fig.~\ref{fig:database-d-spherical}.
280:
281: \begin{figure}%
282: \centering%
283: \subfloat[][]{\includegraphics[trim=100 27 70 40,scale=0.17]{database_d-spherical_1.eps}}\hfill%
284: \subfloat[][]{\includegraphics[trim=85 30 67 40,scale=0.15]{database_d-spherical_2.eps}}\hfill%
285: \subfloat[][]{\includegraphics[trim=90 30 75 40,scale=0.16]{database_d-spherical_3.eps}}\hfill%
286: \caption[]{Database $d$-spherical. Stochastic representation of the 3 ($M=3$) hidden sources. (a): $\rho$ is
287: uniform on $[0,1]$, (b): $\rho$ is exponential with parameter $\mu=1$, and (c): $\rho$ is lognormal with parameters $\mu=0, \sigma=1$, respectively.}%
288: \label{fig:database-d-spherical}%
289: \end{figure}
290:
291:
292: \subsection{Normalized Amari-distance}\label{subsec:amaridist}
293: The precision of our algorithm was measured by the normalized Amari-distance as follows. The optimal estimation of the \RISA model provides
294: matrix \mbox{$\vec{B}:=\vec{W}\vec{A}\in\R^{D\times D}$}, a block-permutation matrix made of $d\times d$ sized blocks. Let us decompose
295: matrix $\vec{B}\in\R^{D\times D}$ into $d\times d$ blocks: \mbox{$\vec{B}=\left[\vec{B}^{i,j}\right]_{i,j=1,\ldots,M}$}. Let $b^{i,j}$
296: denote the sum of the absolute values of the elements of matrix \mbox{$\vec{B}^{i,j}\in\R^{d\times d}$}. Then the \emph{normalized} version
297: of the Amari-distance \citep{amari96new} (as it was introduced in \citep{szabo06cross} for \RISA) is defined as:
298: \begin{eqnarray}
299: r(\vec{B})&:=&\frac{1}{2M(M-1)}\left[\sum_{i=1}^M\left(\frac{
300: \sum_{j=1}^Mb^{i,j}}{\max_jb^{i,j}}-1\right)\right. + \nonumber \\
301: &&\qquad \qquad\quad\left.\sum_{j=1}^M\left(\frac{ \sum_{i=1}^Mb^{i,j}}{\max_ib^{i,j}}-1\right)\right].
302: \end{eqnarray}
303: For matrix $\vec{B}$ we have that $0\le r(\vec{B})\le 1$, and $r(\vec{B})=0$ if and only if $\vec{B}$ is a block-permutation matrix with
304: $d\times d$ sized blocks. Thus, $r=0$ corresponds to perfect estimation (0\% error), $r=1$ is the worst estimation (100\% error). This
305: performance measure can be used for $\K=\C$, too.
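
As a quick sanity check of the normalization, consider a hypothetical $M=2$ example (the numbers are chosen only for illustration) with $b^{1,1}=1$, $b^{1,2}=0.1$, $b^{2,1}=0.2$ and $b^{2,2}=1$. The two row terms are $1.1/1-1=0.1$ and $1.2/1-1=0.2$, the two column terms are $1.2/1-1=0.2$ and $1.1/1-1=0.1$, so
\[
r(\vec{B})=\frac{1}{2\cdot 2\cdot 1}\left[(0.1+0.2)+(0.2+0.1)\right]=0.15,
\]
i.e., a 15\% error. At the other extreme, if all $b^{i,j}$ are equal and positive, then each of the $2M$ terms equals $M-1$ and $r(\vec{B})=1$.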
306:
307: \subsection{Simulations}\label{subsec:simulations}
308: Results on databases \emph{$A\omega$} and \emph{$d$-spherical} are provided here. In our simulations, the number of samples of observations
309: $\vec{z}(t)$ was varied: $1000 \le T \le 30000$. Mixing matrix $\vec{A}$ was chosen randomly from the orthogonal group. Manifold $\F$ was
310: $\F:=\{\vec{z}\mapsto \cos(\vec{z}), \vec{z}\mapsto \cos(2\vec{z})\}$ (functions operated on coordinates separately). Scaling properties of
311: the approximation were studied for database \mbox{$d$-spherical} by changing the value of $d$ between $20$ and $110$ [i.e., the number of
312: subspaces ($M$) was fixed, but the dimension of the subspaces was increased]. For each parameter setting [$T$ for database $A\omega$, $(T,d)$ for
313: database $d$-spherical], ten experiments were averaged. The quality of the solutions was measured by the Amari-error (see
314: Section~\ref{subsec:amaridist}). We have chosen FastICA \citep{hyvarinen97fast} for the \RICA module (see Table~\ref{tab:KISA-pseudocode}).
315:
316: The precision of our method is shown (i) for database $A\omega$ in Fig.~\ref{fig:demo:A-omega} as a function of sample
317: number, and (ii) for database $d$-spherical in Fig.~\ref{fig:amari-vs-T:d-spherical} as a function of sample number and
318: source dimension ($d$) (for details, see Table~\ref{tab:amari-dists-d-spherical}). The figures demonstrate that the
319: algorithm was able to uncover the hidden components with high precision. In the case of database $d$-spherical the
320: Amari-error decreases according to the power law $r(T)\propto T^{-c}$ $(c>0)$.
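
Under the power-law assumption, the exponent can be read off a log-log plot. Since $\log r(T)=-c\log T+\mathrm{const}$, any two sample numbers $T_1<T_2$ on the linear part of a curve give the estimate
\[
c\approx-\frac{\log r(T_2)-\log r(T_1)}{\log T_2-\log T_1}.
\]
This two-point formula is only a sketch; a least-squares line fitted to all measured points would be a more robust choice.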
321:
322: In our numerical simulations, the number of sweeps before the iteration of the permutation optimization stopped (see
323: Table~\ref{tab:KISA-pseudocode}) varied between 2 and 6.
324:
325: \begin{figure}%
326: \centering%
327: \subfloat[][]{\includegraphics[width=6.1cm]{amari_A-omega.eps}}%
328: \subfloat[][]{\includegraphics[width=2.3cm]{hat_A-omega.eps}}\\%
329: \caption[]{Estimations on database $A\omega$. (a) Amari-error as a function of the number of samples.
330: Average$\pm$deviation for $30000$ samples: $0.58\%\pm0.04$, (b) estimation with average error for $30000$ samples:
331: the hidden components are recovered up to permutation and orthogonal transformation (\RISA ambiguity).}%
332: \label{fig:demo:A-omega}%
333: \end{figure}
334:
335: \begin{figure}
336: \centering
337: \includegraphics[width=6.1cm]{amari_d-spherical.eps}
338: \caption{Estimations on database $d$-spherical: Amari-error as a function of the number of samples on log-log scale for
339: subspaces of different dimensions ($d$). Task dimension: $D$. The curves are approximately linear, so the errors scale according to
340: a power law, $r(T)\propto T^{-c}$ $(c>0)$. For numerical values, see Table~\ref{tab:amari-dists-d-spherical}.}
341: \label{fig:amari-vs-T:d-spherical}
342: \end{figure}
343:
344: \begin{table}
345: \centering
346: \caption{Amari-error for database $d$-spherical, for different $d$ values: average $\pm$ deviation. Number of samples: $T=30000$.}
347: \label{tab:amari-dists-d-spherical}
348: \begin{tabular}{|c|c|c|}
349: \hline
350: $d=20$ & $d=30$ &$d=40$\\
351: \hline
352: $1.40\%$ $(\pm 0.03)$ & $1.71\%$ $(\pm 0.03)$& $1.99\%$ $(\pm 0.03)$\\
353: \hline\hline
354: $d=50$ & $d=60$& $d=70$\\
355: \hline
356: $2.23\%$ $(\pm 0.03)$& $2.44\%$ $(\pm 0.03)$ & $2.65\%$ $(\pm 0.03)$\\
357: \hline\hline
358: $d=80$& $d=90$& $d=100$\\
359: \hline
360: $2.85\%$ $(\pm 0.03)$& $3.03\%$ $(\pm 0.04)$&$3.19\%$ $(\pm 0.02)$\\
361: \hline\hline
362: $d=110$&\multicolumn{2}{c}{}\\
363: \cline{1-1}
364: $3.37\%$ $(\pm 0.03)$&\multicolumn{2}{c}{}\\
365: \cline{1-1}
366: \end{tabular}
367: \end{table}
368:
369: \begin{thebibliography}{13}
370: \providecommand{\natexlab}[1]{#1} \providecommand{\url}[1]{\texttt{#1}} \expandafter\ifx\csname urlstyle\endcsname\relax
371: \providecommand{\doi}[1]{doi: #1}\else
372: \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi
373:
374: \bibitem[Amari et~al.(1995)Amari, Cichocki, and Yang]{amari95recurrent}
375: S.~Amari, A.~Cichocki, and H.H. Yang.
376: \newblock Recurrent neural networks for blind separation of sources.
377: \newblock NOLTA'95, pp. 37-42, 1995.
378:
379: \bibitem[Amari et~al.(1996)Amari, Cichocki, and Yang]{amari96new}
380: S.~Amari, A.~Cichocki, and H.~Yang.
381: \newblock A new learning algorithm for blind signal separation.
382: \newblock \emph{Advances in Neural Information Processing Systems}, 1996.
383:
384: \bibitem[Bach and Jordan(2002)]{bach02kernel}
385: Francis~R. Bach and Michael~I. Jordan.
386: \newblock Kernel {ICA}.
387: \newblock \emph{JMLR}, 3:\penalty0 1--48, 2002.
388:
389: \bibitem[Eriksson and Koivunen(2006)]{eriksson06complex}
390: Jan Eriksson and Visa Koivunen.
391: \newblock Complex random vectors and {ICA} models: Identifiability, uniqueness
392: and separability.
393: \newblock \emph{{IEEE} {TIT}}, 52\penalty0 (3), 2006.
394:
395: \bibitem[Fang et~al.(1990)Fang, Kotz, and Ng]{fang90symmetric}
396: Kai-Tai Fang, Samuel Kotz, and Kai~Wang Ng.
397: \newblock \emph{Symmetric multivariate and related distributions}.
398: \newblock Chapman and Hall, 1990.
399:
400: \bibitem[Frahm(2004)]{frahm04generalized}
401: Gabriel Frahm.
402: \newblock \emph{Generalized elliptical distributions: Theory and applications}.
403: \newblock PhD thesis, University of K{\"o}ln, 2004.
404:
405: \bibitem[Gretton et~al.(2003)Gretton, Herbrich, and Smola]{gretton03kernel}
406: A.~Gretton, R.~Herbrich, and A.~Smola.
407: \newblock The kernel mutual information.
408: \newblock In \emph{IEEE ICASSP}, volume~4, pages 880--883, 2003.
409:
410: \bibitem[Hyv{\"a}rinen and Oja(1997)]{hyvarinen97fast}
411: A.~Hyv{\"a}rinen and E.~Oja.
412: \newblock A fast fixed-point algorithm for independent component analysis.
413: \newblock \emph{Neural Computation}, 9\penalty0 (7):\penalty0 1483--1492, 1997.
414:
415: \bibitem[Meyer-B{\"a}se et~al.(2006)Meyer-B{\"a}se, Gruber, Theis, and
416: Foo]{base06blind}
417: Anke Meyer-B{\"a}se, Peter Gruber, Fabian Theis, and Simon Foo.
418: \newblock Blind source separation based on self-organizing neural network.
419: \newblock \emph{Engng. Appl. of AI}, 19:\penalty0 305--311, 2006.
420:
421: \bibitem[P\'oczos and L\H{o}rincz(2006)]{poczos06noncombinatorial}
422: B.~P\'oczos and A.~L\H{o}rincz.
423: \newblock Non-combinatorial estimation of independent {AR} sources.
424: \newblock \emph{Neurocomp.}, 2006.
425: \newblock accepted.
426:
427: \bibitem[Szab\'o et~al.(2006{\natexlab{a}})Szab\'o, P\'oczos, and
428: L\H{o}rincz]{szabo06cross}
429: Z.~Szab\'o, B.~P\'oczos, and A.~L\H{o}rincz.
430: \newblock Cross-entropy optimization for independent process analysis.
431: \newblock In \emph{Proc. of {ICA}}, LNCS 3889, pages 909--916. Springer-Verlag,
432: 2006{\natexlab{a}}.
433:
434: \bibitem[Szab\'o et~al.(2006{\natexlab{b}})Szab\'o, P\'oczos, and L{\H
435: o}rincz]{szabo06separation}
436: Z.~Szab\'o, B.~P\'oczos, and A.~L{\H o}rincz.
437: \newblock Separation theorem for $\mathbb{K}$-independent subspace analysis
438: with sufficient conditions.
439: \newblock Technical report, E\"otv\"os Lor\'and University, Budapest,
440: 2006{\natexlab{b}}.
441: \newblock \url{http://arxiv.org/abs/math.ST/0608100}.
442:
443: \bibitem[Theis(2004)]{theis04uniqueness1}
444: F.J. Theis.
445: \newblock Uniqueness of complex and multidimensional independent component
446: analysis.
447: \newblock \emph{Signal Processing}, 84\penalty0 (5):\penalty0 951--956, 2004.
448: \end{thebibliography}
449:
450: \end{document}
451: