5d793b49b28e037e.tex
1: \begin{abstract}
2:   In many applications, such as classification of images or videos, it
3:   is of interest to develop a framework that handles tensor data directly,
4:   rather than vectorizing the data in an ad hoc way, which raises computational
5:   and under-sampling issues. In this paper, we study convergence and
6:   statistical properties of two-dimensional canonical correlation
7:   analysis \citep{Lee2007Two} under an assumption that data come from
8:   a probabilistic model. We show that the power method, when carefully
9:   initialized, converges to the optimum, and we provide a finite-sample bound. Then
10:   we extend this framework to tensor-valued data and propose the
11:   higher-order power method, which is commonly used in tensor
12:   decomposition, to extract the canonical directions. Our method can
13:   be used effectively in a large-scale data setting by solving the
14:   inner least squares problem with stochastic gradient descent, and
15:   we justify convergence via the theory of Lojasiewicz's inequalities
16:   without any assumption on the data-generating process or the initialization. For practical
17:   applications, we further develop (a) an inexact updating scheme
18:   which allows us to use state-of-the-art stochastic gradient
19:   descent algorithms, (b) an effective initialization scheme which
20:   alleviates the problem of local optima in non-convex optimization,
21:   and (c) a deflation procedure for extracting several canonical
22:   components. Empirical analyses on challenging data, including gene
23:   expression and air pollution indexes in Taiwan, show the
24:   effectiveness and efficiency of the proposed methodology. Our
25:   results fill a missing, but crucial, part in the literature on
26:   tensor data.
27: \end{abstract}
28: