\begin{abstract}
Regularized optimal transport (OT) is increasingly used
as a loss or as a matching layer in neural networks.
Entropy-regularized OT can be computed using the Sinkhorn
algorithm, but it leads to fully dense transportation plans,
meaning that all sources are (fractionally) matched with all targets.
To address this issue,
several works have investigated quadratic regularization instead.
This regularization preserves
sparsity and leads to unconstrained and smooth (semi) dual objectives
that can be solved with off-the-shelf gradient methods.
Unfortunately, quadratic regularization does not give direct control over
the cardinality (number of nonzeros) of the transportation plan.
%
In this paper, we propose a new approach for OT
with explicit cardinality constraints on the transportation plan.
Our work is motivated by an application to sparse mixtures of experts,
where OT can be used to match input tokens such as image patches
with expert models such as neural networks. Cardinality constraints ensure
that at most $k$ tokens are matched with an expert, which is crucial
for computational performance.
Despite the nonconvexity of cardinality constraints, we show that the
corresponding (semi) dual problems are tractable and can be solved with
first-order gradient methods. Our method can be thought of as a middle ground
between unregularized OT (recovered when $k$ is small enough)
and quadratically regularized OT (recovered when $k$ is large
enough). The smoothness of the objectives increases as $k$
increases, giving rise to a trade-off
between convergence speed and sparsity of the optimal plan.
\end{abstract}
