\begin{abstract}

The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning.
Regularized linear CCA methods can be seen to generalize Partial Least Squares (PLS) and can be unified within a Generalized Eigenvalue Problem (GEP) framework.
However, classical algorithms for these linear methods are computationally infeasible for large-scale data.
Extensions to Deep CCA show great promise, but current training procedures are slow and complicated.
First, we propose a novel unconstrained objective that characterizes the top subspace of GEPs.
Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives.
These methods show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks.
This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants.
Finally, we not only match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
\end{abstract}
