\begin{abstract}
\label{abstract}
We present a machine learning framework for modeling protein dynamics. Our
approach uses $L_1$-regularized, reversible hidden Markov models to
understand large protein datasets generated via molecular dynamics
simulations. Our model is motivated by three design principles: (1) the
requirement of massive scalability; (2) the need to adhere to relevant
physical laws; and (3) the necessity of providing accessible
interpretations, critical for both cellular biology and rational drug
design. We present an EM algorithm for learning and introduce a model
selection criterion based on the physical notion of convergence in
relaxation timescales. We contrast our model with standard methods in
biophysics and demonstrate improved robustness. We implement our algorithm
on GPUs and apply the method to two large protein simulation datasets,
generated respectively on the NCSA Blue Waters supercomputer and the
Folding@Home distributed computing network. Our analysis identifies the
conformational dynamics of the ubiquitin protein that are critical to
cellular signaling and elucidates the stepwise activation mechanism of the
c-Src kinase protein.
\end{abstract}
