749b34b10d45d818.tex
1: \begin{abstract}
2: Let $V_* : \mathbb{R}^d \to \mathbb{R}$ be a (possibly non-convex) potential function, and consider the probability measure $\pi \propto e^{-V_*}$.
3: When $\pi$ exhibits multiple modes, it is known that sampling techniques based on Wasserstein gradient flows of the Kullback-Leibler (KL) divergence (e.g. Langevin Monte Carlo) suffer from slow rates of convergence, since the dynamics are unable to easily traverse between modes.
4: In stark contrast, the work of \cite{lu2019accelerating,lu2022birth}
5: has shown that the gradient flow of the KL divergence with respect to the Fisher-Rao (FR) geometry exhibits a convergence rate to $\pi$ that is \textit{independent} of the potential function.
6: In this short note, we complement these existing results in the literature by providing an explicit expansion of $\text{KL}(\rho_t^{\text{FR}}\|{\pi})$ in terms of $e^{-t}$, where $(\rho_t^{\text{FR}})_{t\geq 0}$ is the FR gradient flow of the KL divergence.
7: In turn, we are able to provide a clean asymptotic convergence rate, where the burn-in time is guaranteed to be finite.
8: Our proof is based on two ingredients: a similarity between FR gradient flows and simulated annealing with linear scaling, and standard facts about cumulant generating functions.
9: We conclude with simple synthetic experiments that demonstrate that our theoretical findings are indeed tight.
10: Based on our numerics, we conjecture that, in some cases, the asymptotic rates of convergence for Wasserstein-Fisher-Rao gradient flows are related to this expansion.
11: \end{abstract}
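\paragraph{Illustrative sketch.}
To make the connection with annealing concrete, the following is a minimal sketch under standard conventions, not a statement of the results of this note. The symbols $\rho_0$ (an initial density with $\text{KL}(\rho_0\|\pi) < \infty$) and $u_t$ are introduced here for illustration only. The FR gradient flow of the KL divergence is commonly written as
\begin{equation*}
\partial_t \rho_t^{\text{FR}} = -\rho_t^{\text{FR}} \left( \log\frac{\rho_t^{\text{FR}}}{\pi} - \text{KL}(\rho_t^{\text{FR}}\|\pi) \right).
\end{equation*}
Setting $u_t = \log(\rho_t^{\text{FR}}/\pi)$, this reads $\partial_t u_t = -u_t + \text{KL}(\rho_t^{\text{FR}}\|\pi)$, so that $u_t = e^{-t} u_0 + c(t)$ for some $c(t)$ that does not depend on the spatial variable. Exponentiating and normalizing suggests the geometric interpolation
\begin{equation*}
\rho_t^{\text{FR}} \propto \rho_0^{\,e^{-t}} \, \pi^{\,1 - e^{-t}},
\end{equation*}
in which $e^{-t}$ plays the role of an annealing parameter; this is the sense in which an expansion of $\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in powers of $e^{-t}$ is natural.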
12: