\begin{abstract}
%We discuss the trade-offs of different optimal transport metrics for their use as learning losses.
%Their emerging use can be attributed to the impressive breakthroughs in speed achieved by Cuturi's relaxed transport metric, the Sinkhorn Distance.
%We focus on one-dimensional distributions (\eg~Power Spectral Densities), for which the optimal Earth Mover's Distance ($\EMD$) can be calculated efficiently.
%We derive a closed-form solution for its gradient that allows a non-iterative calculation.
%However, we reveal convergence issues for gradient descent learning, which are confirmed in synthetic tests.
%We counter this by suggesting a relaxed form $\EMD^\rho$ with equivalent complexity that converges faster.
%We also show that the $\EMD$ gradient affects the entire output space, which provides considerable advantages over non-transport metrics (\eg~Mean Squared Error).
%For problems with smooth output spaces it provides a significant boost in convergence speed, as we demonstrate on a polysomnography data set.
%In this case, the model converges within the first quarter of the epoch, demonstrating that generalization is achieved using little data.
%\end{abstract}