abstract:b90fe06ce6fe96ad.tex

1: \begin{abstract}

2: A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment.

3: Active interaction in the honeynet yields a high reward but also introduces high implementation costs and the risk that attackers exploit the honeynet itself.

4: In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off.

5: In particular, we design adaptive long-term engagement policies that are shown to be risk-averse, cost-effective, and time-efficient.

6: Numerical results demonstrate that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them long enough to obtain valuable threat information, while keeping the penetration probability low.

7: The results also show that the expected utility is robust to attackers across a wide range of persistence and intelligence levels.

8: Finally, we apply reinforcement learning to the SMDP to overcome the \textit{curse of modeling}.

9: With a prudent choice of the learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.

10:

11: \keywords{Reinforcement Learning \and Semi-Markov Decision Processes \and Active Defense \and Honeynet \and Risk Quantification}

12: \end{abstract}

13: