\begin{abstract}
The Straight-Through (ST) estimator is a widely used technique for back-propagating gradients through discrete random variables.
%, in part due to its simplicity and effectiveness.
However, despite its effectiveness, the method lacks theoretical justification.
%, which is demonstrably a biased estimator,
In this paper, we show that ST can be interpreted as the simulation of the projected Wasserstein gradient flow (pWGF).
Building on this interpretation, we establish a theoretical foundation that justifies the convergence properties of ST.
Further, we propose another pWGF estimator variant, which exhibits superior performance on distributions with infinite support, \emph{e.g.}, Poisson distributions.
% we show the convergence of ST, and propose other variants that could converge faster.
% not only for commonly used distribution like Bernoulli and Categorical distributions, but also for distributions with infinite support such as Poisson distributions.
Empirically, we show that ST and our proposed estimator, when applied to different types of discrete structures (including both Bernoulli and Poisson latent variables), exhibit comparable or even better performance relative to other state-of-the-art methods.
Our results uncover the origin of the widespread adoption of the ST estimator, and represent a helpful step towards exploring alternative gradient estimators for discrete variables.

%binary latent variable models show that our approach is comparable in performance to other state-of-the-art methods.
\end{abstract}
