\begin{abstract}
Many loss functions in representation learning are invariant under a continuous symmetry transformation.
For example, the loss function of word embeddings~\citep{mikolov_distributed_2013} remains unchanged if we simultaneously rotate all word and context embedding vectors.
We show that representation learning models for time series possess an approximate continuous symmetry that leads to slow convergence of gradient descent.
We propose a new optimization algorithm that speeds up convergence using ideas from gauge theory in physics.
Our algorithm leads to orders of magnitude faster convergence and to more interpretable representations, as we show for dynamic extensions of matrix factorization and word embedding models.
We further present an example application of our proposed algorithm that translates modern words into their historical equivalents.
\end{abstract}