\begin{abstract}
\label{abstract}
We present a machine learning framework for modeling protein dynamics. Our
approach uses $L_1$-regularized, reversible hidden Markov models to
understand large protein datasets generated via molecular dynamics
simulations. Our model is motivated by three design principles: (1) the
requirement of massive scalability; (2) the need to adhere to relevant
physical laws; and (3) the necessity of providing accessible
interpretations, critical for both cellular biology and rational drug
design. We present an EM algorithm for learning and introduce a model
selection criterion based on the physical notion of convergence in
relaxation timescales. We contrast our model with standard methods in
biophysics and demonstrate improved robustness. We implement our algorithm
on GPUs and apply the method to two large protein simulation datasets,
generated on the NCSA Blue Waters supercomputer and the Folding@Home
distributed computing network, respectively. Our analysis identifies the
conformational dynamics of the ubiquitin protein critical to cellular
signaling and elucidates the stepwise activation mechanism of the c-Src
kinase protein.
\end{abstract}
