\begin{abstract}
Regularized optimal transport (OT) is increasingly used
as a loss or as a matching layer in neural networks.
Entropy-regularized OT can be computed using the Sinkhorn
algorithm, but it leads to fully dense transportation plans,
meaning that all sources are (fractionally) matched with all targets.
To address this issue,
several works have investigated quadratic regularization instead.
This regularization preserves
sparsity and leads to unconstrained and smooth (semi-)dual objectives
that can be solved with off-the-shelf gradient methods.
Unfortunately, quadratic regularization does not give direct control over
the cardinality (number of nonzeros) of the transportation plan.
%
In this paper, we propose a new approach for OT
with explicit cardinality constraints on the transportation plan.
Our work is motivated by an application to sparse mixtures of experts,
where OT can be used to match input tokens, such as image patches,
with expert models, such as neural networks. Cardinality constraints ensure
that at most $k$ tokens are matched with each expert, which is crucial
for computational efficiency.
Despite the nonconvexity of cardinality constraints, we show that the
corresponding (semi-)dual problems are tractable and can be solved with
first-order gradient methods. Our method can be thought of as a middle ground
between unregularized OT (recovered when $k$ is small enough)
and quadratically regularized OT (recovered when $k$ is large
enough). The smoothness of the objectives increases as $k$
increases, giving rise to a trade-off
between convergence speed and sparsity of the optimal plan.
\end{abstract}
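The NumPy sketch below contrasts the two behaviours summarized above: Sinkhorn returns a plan with no zero entries, while a cardinality-constrained semi-dual yields at most $k$ nonzeros per column (tokens per expert). It is a minimal illustration under stated assumptions, not the method as implemented in this paper: the regularizer $\frac{\gamma}{2}\|t\|^2$, placing the constraint on columns, the function names (`sinkhorn`, `project_simplex`, `project_sparse_simplex`, `sparse_semidual_ot`) and all parameter values are hypothetical. The assumed semi-dual is $F(\alpha) = \langle \alpha, a\rangle - \sum_j \max_{t \in b_j\Delta^m,\ \|t\|_0 \le k} \big[\langle t, \alpha - c_j\rangle - \frac{\gamma}{2}\|t\|^2\big]$. It is concave even though the constraint set is nonconvex, because a convex conjugate is always convex. Its gradient is $a - T\mathbf{1}$, where column $j$ of $T$ is the projection of $(\alpha - c_j)/\gamma$ onto the $k$-sparse scaled simplex. That projection keeps the $k$ largest entries and projects them onto the simplex.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Entropy-regularized OT via Sinkhorn iterations (returns a dense plan)."""
    K = np.exp(-C / eps)                  # Gibbs kernel, strictly positive
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                   # rescale to match row marginals
        v = b / (K.T @ u)                 # rescale to match column marginals
    return u[:, None] * K * v[None, :]


def project_simplex(x, z):
    """Euclidean projection of x onto {t >= 0, sum(t) = z} (sort-based)."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - z
    ind = np.arange(1, len(x) + 1)
    cond = u - css / ind > 0
    rho = ind[cond][-1]
    theta = css[cond][-1] / rho
    return np.maximum(x - theta, 0.0)


def project_sparse_simplex(x, z, k):
    """Projection onto the scaled simplex with at most k nonzeros:
    keep the k largest entries of x, project them, zero out the rest."""
    if k >= len(x):
        return project_simplex(x, z)
    top = np.argpartition(-x, k - 1)[:k]  # indices of the k largest entries
    out = np.zeros_like(x)
    out[top] = project_simplex(x[top], z)
    return out


def sparse_semidual_ot(a, b, C, k, gamma=0.1, n_iter=2000):
    """Hypothetical sketch: gradient ascent on the assumed quadratic
    semi-dual with at most k nonzeros in each column of the plan."""
    m, n = C.shape
    alpha = np.zeros(m)
    lr = gamma / n                        # 1/L step when k >= m (smooth case)
    for _ in range(n_iter):
        S = (alpha[:, None] - C) / gamma
        T = np.stack([project_sparse_simplex(S[:, j], b[j], k)
                      for j in range(n)], axis=1)
        alpha += lr * (a - T.sum(axis=1))  # semi-dual gradient a - T 1
    return T


# Illustrative data: 8 tokens (rows) matched to 4 experts (columns).
rng = np.random.default_rng(0)
m, n, k = 8, 4, 3
a = np.full(m, 1.0 / m)
b = np.full(n, 1.0 / n)
C = rng.random((m, n))

P_dense = sinkhorn(a, b, C)
T_sparse = sparse_semidual_ot(a, b, C, k)
print("nonzeros per column (Sinkhorn):", (P_dense > 0).sum(axis=0))
print("nonzeros per column (k-sparse):", (T_sparse > 0).sum(axis=0))
```

In this sketch each column of `T_sparse` satisfies its marginal $b_j$ exactly by construction, whereas the row marginals $a$ are only matched approximately, to the extent that the fixed-step ascent on $\alpha$ has converged. Raising `k` toward `m` recovers the plain quadratically regularized projection.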