\begin{abstract}
A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers into conducting adversarial behaviors in a controlled and monitored environment.
Active interaction in the honeynet yields a high reward but also introduces high implementation costs and the risk of adversarial honeynet exploitation.
In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off.
In particular, we design adaptive long-term engagement policies that are shown to be risk-averse, cost-effective, and time-efficient.
Numerical results demonstrate that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them long enough to obtain valuable threat information, while keeping the penetration probability low.
The results also show that the expected utility is robust against attackers with a wide range of persistence and intelligence levels.
Finally, we apply reinforcement learning to the SMDP to address the \textit{curse of modeling}.
Under a prudent choice of the learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.

\keywords{Reinforcement Learning \and Semi-Markov Decision Processes \and Active Defense \and Honeynet \and Risk Quantification}
\end{abstract}
