21f30f9812985ae1.tex
1: \begin{abstract}
2: 
3: The rise of Advanced Persistent Threats (APTs) has created a need for robustness against long-running, stealthy attacks that circumvent existing cryptographic security guarantees.
4: \flipit\ is a security game that models attacker-defender interactions in advanced scenarios such as APTs. Previous work has extensively analyzed non-adaptive strategies in \flipit, but adaptive strategies arise naturally in practical interactions, as players receive feedback during the game. We model the \flipit\ game as a Markov Decision Process and introduce \Q{}, an adaptive strategy for \flipit{} based on temporal difference reinforcement learning. We prove theoretical results on the convergence of our new strategy against an opponent playing a Periodic strategy. We confirm our analysis experimentally through an extensive evaluation of \Q{} against specific opponents. \Q{} converges to the optimal adaptive strategy for Periodic and Exponential opponents when using the associated state spaces. Finally, we introduce a generalized \Q{} strategy with a composite state space that outperforms a Greedy strategy for several opponent distributions, including Periodic and Uniform, without prior knowledge of the opponent's strategy. We also release an OpenAI Gym environment for \flipit{} to facilitate future research.
5: 
6: %On average, our strategy performs better than existing adaptive \flipit\ strategies when playing against a Periodic opponent. We also found that our Reinforcement Learning strategy converges to optimal against certain randomized strategies such as Exponential.
7: 
8: %We demonstrate one of the first applications of Reinforcement Learning to security, by
9: 
10: 
11: \keywords{Security games \and \flipit{} \and Reinforcement learning \and Adaptive strategies \and Markov Decision Processes \and Online learning.}
12: \end{abstract}