\begin{abstract}
Maintaining the freshness of information in the Internet of Things
(IoT) is a critical yet challenging problem. In this paper, we study
cooperative data collection using multiple Unmanned Aerial Vehicles
(UAVs) with the objective of minimizing the total average Age of Information
(AoI). To optimize the data collection process, we account for the
kinematic, energy, trajectory, and collision-avoidance constraints of
the UAVs. Specifically, each UAV has limited on-board energy; it takes
off from its initial location and flies over the sensor nodes to collect
update packets in cooperation with the other UAVs. Each UAV must land
at its final destination at the end of the specified mission duration
with non-negative residual energy, i.e., it must retain enough energy
to complete its mission. To enhance information freshness, we design
both the trajectories of the UAVs and the transmission scheduling of
the sensor nodes. We model the multi-UAV data collection problem as a
Decentralized Partially Observable Markov Decision Process (Dec-POMDP),
since each UAV does not know the dynamics of the environment and can
observe only a subset of the sensor nodes.
To address these challenges, we propose a multi-agent Deep
Reinforcement Learning (DRL)-based algorithm with centralized learning
and decentralized execution. In addition to reward shaping, we use
action masks to filter out invalid actions, thereby ensuring that the
constraints are met. Simulation results demonstrate that the proposed
algorithm significantly reduces the total average AoI compared with
the baseline algorithms, and that the action mask method improves the
convergence speed of the proposed algorithm.
\end{abstract}