\begin{abstract}
Many loss functions in representation learning are invariant under a continuous symmetry transformation.
For example, the loss function of word embeddings~\citep{mikolov_distributed_2013} remains unchanged if we simultaneously rotate all word and context embedding vectors.
We show that representation learning models for time series possess an approximate continuous symmetry that leads to slow convergence of gradient descent.
We propose a new optimization algorithm that speeds up convergence using ideas from gauge theory in physics.
Our algorithm leads to orders of magnitude faster convergence and to more interpretable representations, as we show for dynamic extensions of matrix factorization and word embedding models.
We further present an example application of our proposed algorithm that translates modern words into their historic equivalents.
\end{abstract}
