
\begin{abstract}

In edge computing, users' service profiles must be migrated in response to user mobility, and reinforcement learning (RL) frameworks have been proposed for this purpose.

However, these frameworks do not account for occasional server failures, which, although rare, can disrupt the smooth and safe functioning of latency-sensitive edge computing applications such as autonomous driving and real-time obstacle detection, because users' computing jobs can no longer be completed.

Because these failures occur with low probability, it is difficult for RL algorithms, which are inherently data-driven, to learn an optimal service migration policy for both typical and rare-event scenarios.

We therefore introduce FIRE, a rare-event adaptive resilience framework that integrates importance sampling into reinforcement learning to place backup services. To learn an optimal policy, we sample rare events at a rate proportional to their contribution to the value function. Our framework balances the service migration trade-off between delay and migration costs against the costs of failure and the costs of backup placement and migration.

We propose an importance-sampling-based Q-learning algorithm and prove its boundedness and convergence to optimality. We then propose novel eligibility-trace, linear function approximation, and deep Q-learning versions of our algorithm so that it scales to real-world scenarios. We also extend our framework to cater to users with different risk tolerances towards failure.

Finally, we use trace-driven experiments to show that our algorithm reduces costs in the event of failures.

\end{abstract}
