\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
Gradient descent (GD) is known to converge quickly for convex objective functions, but it can become trapped at local minima when the objective is non-convex.
Langevin dynamics (LD), on the other hand, can explore the state space and find global minima, but to give accurate estimates it must run with a small discretization step size and a weak stochastic force, both of which generally slow its convergence. This paper shows that these two algorithms %\blue{and their non-swapping variants} 
can ``collaborate'' through a simple exchange mechanism, in which they swap their current positions whenever LD attains a lower objective function value.
This idea can be seen as the singular limit of the replica-exchange technique from the sampling literature. We show that this new algorithm converges linearly to the global minimum with high probability, assuming the objective function is strongly convex in a neighborhood of the unique global minimum. By replacing gradients with stochastic gradients and adding a suitable threshold to the exchange mechanism, our algorithm can also be used in online settings. We also study non-swapping variants of the algorithm, which achieve similar performance.
We further verify our theoretical results through numerical experiments and observe that the proposed algorithm outperforms running GD or LD alone.\end{abstract}
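
As an informal sketch of the exchange mechanism summarized in the abstract, one iteration could be written as follows. The notation is an assumption made for illustration and is not fixed by the abstract: $F$ denotes the objective, $x_n$ and $y_n$ the GD and LD iterates, $h$ the step size, $\gamma$ the strength of the stochastic force, $\xi_n$ a standard Gaussian vector, and $t_0 \ge 0$ an exchange threshold, with $t_0 = 0$ corresponding to the plain swap rule.
\[
x_{n+1}' = x_n - h \nabla F(x_n), \qquad
y_{n+1}' = y_n - h \nabla F(y_n) + \sqrt{2\gamma h}\,\xi_n ,
\]
\[
(x_{n+1}, y_{n+1}) =
\left\{
\begin{array}{ll}
(y_{n+1}', x_{n+1}') & \mbox{if } F(y_{n+1}') < F(x_{n+1}') - t_0, \\
(x_{n+1}', y_{n+1}') & \mbox{otherwise.}
\end{array}
\right.
\]
Under this reading, the GD iterate $x_n$ always holds the better of the two candidate positions, while the LD iterate $y_n$ keeps exploring.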