\begin{abstract}
In this paper, a lifelong learning problem is studied for an Internet of Things (IoT) system.
In the considered model, each IoT device aims to balance the tradeoff between its information freshness and its energy consumption by controlling its computational resource allocation at each time slot in dynamic environments.
An unmanned aerial vehicle (UAV) is deployed as a flying base station to enable the IoT devices to adapt to novel environments.
To this end, a new lifelong reinforcement learning algorithm is proposed, which the UAV uses to adapt the operation of each device at each visit.
By using the experience from previously visited devices and environments, the UAV can help devices adapt faster to future states of their environment.
To do so, a knowledge base shared by all devices is maintained at the UAV. Simulation results show that the proposed algorithm can converge $25\%$ to $50\%$ faster than a policy gradient baseline algorithm that optimizes each device's decision-making problem in isolation.

%AoI and energy in isolation.
%yield a $25\%\sim50\%$ improvement in the convergence speed, compared to policy gradient baseline algorithm that optimizes each device's AoI and energy in isolation.
%improve the convergence speed by 25%-50%,
\end{abstract}
