\begin{abstract}
In many applications, such as image or video classification, it is
preferable to develop a framework that works directly with tensor
data rather than transforming the data into vectors in an ad hoc way,
since vectorization causes computational and under-sampling issues.
In this paper, we study the convergence and statistical properties of
two-dimensional canonical correlation analysis \citep{Lee2007Two}
under the assumption that the data come from a probabilistic model.
We show that the power method, when carefully initialized, converges
to the optimum, and we provide a finite-sample bound. Then
we extend this framework to tensor-valued data and propose using the
higher-order power method, which is commonly used in tensor
decomposition, to extract the canonical directions. Our method can be
applied to large-scale data by solving the inner least-squares
problem with stochastic gradient descent, and we establish its
convergence via the theory of {\L}ojasiewicz inequalities, without
any assumptions on the data-generating process or the initialization.
For practical
applications, we further develop (a) an inexact updating scheme
that allows the use of state-of-the-art stochastic gradient descent
algorithms, (b) an effective initialization scheme that alleviates
the problem of local optima in non-convex optimization,
and (c) a deflation procedure for extracting several canonical
components. Empirical analyses of challenging data, including gene
expression data and air pollution indexes in Taiwan, demonstrate the
effectiveness and efficiency of the proposed methodology. Our
results fill a crucial gap in the literature on tensor data.
\end{abstract}
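The alternating structure of the two-dimensional problem can be illustrated with a minimal NumPy sketch. It assumes the 2DCCA objective of \citet{Lee2007Two}, i.e. maximizing the correlation between $l_x^\top X r_x$ and $l_y^\top Y r_y$ over left and right directions, and alternates between fixing the right directions and fixing the left ones. With one pair fixed, each block reduces to an ordinary CCA, which is solved here by power iteration on the least-squares updates. The function names \texttt{leading\_cca} and \texttt{two\_d\_cca}, the ridge term \texttt{reg}, the all-ones initialization, and the iteration counts are hypothetical choices for illustration only; this is not the initialization, stochastic solver, or deflation scheme analysed in the paper.

```python
import numpy as np


def leading_cca(U, V, n_iter=200, reg=1e-6, seed=0):
    """Leading canonical pair of column-centered U (N x a) and V (N x b).

    Power iteration on the CCA least-squares updates; `reg` is a
    hypothetical ridge term added to the covariances for stability.
    """
    N = U.shape[0]
    Cuu = U.T @ U / N + reg * np.eye(U.shape[1])
    Cvv = V.T @ V / N + reg * np.eye(V.shape[1])
    Cuv = U.T @ V / N
    b = np.random.default_rng(seed).standard_normal(V.shape[1])
    b /= np.sqrt(b @ Cvv @ b)
    for _ in range(n_iter):
        # Regress the current V-projection on U, then rescale to unit variance.
        a = np.linalg.solve(Cuu, Cuv @ b)
        a /= np.sqrt(a @ Cuu @ a)
        # Regress the current U-projection on V, then rescale to unit variance.
        b = np.linalg.solve(Cvv, Cuv.T @ a)
        b /= np.sqrt(b @ Cvv @ b)
    # Weights and their (ridge-regularized) correlation.
    return a, b, float(a @ Cuv @ b)


def two_d_cca(X, Y, n_outer=20):
    """Alternating 2DCCA sketch for X of shape (N, m, n) and Y of shape (N, p, q)."""
    X = X - X.mean(axis=0)  # center over samples
    Y = Y - Y.mean(axis=0)
    # Hypothetical initialization of the right directions.
    rx = np.ones(X.shape[2]) / np.sqrt(X.shape[2])
    ry = np.ones(Y.shape[2]) / np.sqrt(Y.shape[2])
    for _ in range(n_outer):
        # Right directions fixed: CCA between X_i r_x (length m) and Y_i r_y (length p).
        lx, ly, rho = leading_cca(X @ rx, Y @ ry)
        # Left directions fixed: CCA between X_i^T l_x (length n) and Y_i^T l_y (length q).
        rx, ry, rho = leading_cca(np.einsum("imn,m->in", X, lx),
                                  np.einsum("ipq,p->iq", Y, ly))
    return lx, rx, ly, ry, rho
```

Each half-step maximizes the correlation over one pair of directions while the other pair is held fixed, so, up to the ridge term and the accuracy of the inner power iteration, the returned correlation does not decrease across outer iterations. The sketch makes no claim of reaching the global optimum.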