\begin{abstract}
The Hidden Markov Model (HMM) is one of the mainstays of statistical
modeling of discrete time series, with applications including speech
recognition, computational biology, computer vision and econometrics.
Estimating an HMM from its observation process is often addressed via
the Baum--Welch algorithm, which is known to be susceptible to local
optima. In this paper, we first give a general characterization of
the basin of attraction associated with any global optimum of the
population likelihood. By exploiting this characterization, we
provide non-asymptotic finite-sample guarantees on the Baum--Welch
updates, guaranteeing geometric convergence to a small ball of radius
on the order of the minimax rate around a global optimum. As a
concrete example, we prove a linear rate of convergence for a hidden
Markov mixture of two isotropic Gaussians given a suitable mean
separation and an initialization within a ball of large radius around
(one of) the true parameters. To our knowledge, these are the first
rigorous local convergence guarantees to global optima for the
Baum--Welch algorithm in a setting where the likelihood function is
nonconvex. We complement our theoretical results with thorough
numerical simulations studying the convergence of the Baum--Welch
algorithm and illustrating the accuracy of our predictions.
\end{abstract}