abstract:21f30f9812985ae1.tex

1: \begin{abstract}

2:

3: A rise in Advanced Persistent Threats (APTs) has introduced a need for robustness against long-running, stealthy attacks that circumvent existing cryptographic security guarantees.

4: \flipit\ is a security game that models attacker-defender interactions in advanced scenarios such as APTs. Previous work extensively analyzed non-adaptive strategies in \flipit, but adaptive strategies arise naturally in practical interactions as players receive feedback during the game. We model the \flipit\ game as a Markov Decision Process and introduce \Q{}, an adaptive strategy for \flipit{} based on temporal difference reinforcement learning. We prove theoretical results on the convergence of our new strategy against an opponent playing a Periodic strategy. We confirm our analysis experimentally through extensive evaluation of \Q{} against specific opponents. \Q{} converges to the optimal adaptive strategy for Periodic and Exponential opponents using associated state spaces. Finally, we introduce a generalized \Q{} strategy with a composite state space that outperforms a Greedy strategy for several distributions, including Periodic and Uniform, without prior knowledge of the opponent's strategy. We also release an OpenAI Gym environment for \flipit{} to facilitate future research.

5:

6: %On average, our strategy performs better than existing adaptive \flipit\ strategies when playing against a Periodic opponent. We also found that our Reinforcement Learning strategy converges to optimal against certain randomized strategies such as Exponential.

7:

8: %We demonstrate one of the first applications of Reinforcement Learning to security, by

9:

10:

11: \keywords{Security games \and \flipit{} \and Reinforcement learning \and Adaptive strategies \and Markov Decision Processes \and Online learning.}

12: \end{abstract}