\begin{abstract}
\begin{quote}
  We offer new insights into two commonly used updates for
    the online $k$-PCA problem, namely Krasulina's and
    Oja's updates. We show that Krasulina's update
    corresponds to a projected gradient descent step on the
    Stiefel manifold of orthonormal $k$-frames,
    while Oja's update amounts to a gradient descent step using the unprojected gradient.
    Following these observations, we derive a more
    \emph{implicit} form of Krasulina's $k$-PCA
    update, i.e., a version that uses the information of the
    future gradient as much as possible. Most
    interestingly, our implicit Krasulina
    update avoids the costly QR-decomposition step
    by bypassing the orthonormality constraint. We show
    that the new update in fact corresponds to an online EM
    step applied to a probabilistic $k$-PCA model. The probabilistic view of the
    updates allows us to combine multiple models in a
    distributed setting. We show experimentally
    that the implicit Krasulina update yields
    superior convergence while being significantly faster.
    We also provide strong evidence that the new update
    can benefit from parallelism and is more stable
    with respect to the tuning of the learning rate.
\end{quote}
\end{abstract}
