\begin{abstract}
The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning.
Regularized linear CCA methods can be seen to generalize Partial Least Squares (PLS), and both can be unified within a Generalized Eigenvalue Problem (GEP) framework.
However, classical algorithms for these linear methods are computationally infeasible for large-scale data.
Extensions to Deep CCA show great promise, but current training procedures are slow and complicated.
First, we propose a novel unconstrained objective that characterizes the top subspace of GEPs.
Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, obtained simply by applying stochastic gradient descent (SGD) to the corresponding CCA objectives.
These methods converge far faster and recover higher correlations than the previous state of the art on all standard CCA and Deep CCA benchmarks.
This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants.
Finally, we not only match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyperparameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
\end{abstract}