\begin{abstract}
\boldmath
 {In this paper, the problem of data pre-storing and routing in dynamic, resource-constrained cube satellite networks is studied. In such a network, each cube satellite delivers requested data to the user clusters under its coverage. %, and use ground base stations (BSs) to relay data to each user in the clusters using integrated access and backhaul (IAB) operations for reduced transmission delays. 
 A group of ground gateways routes and pre-stores selected data on the satellites so that ground users can be served directly from the pre-stored data. This pre-storing and routing design problem is formulated as a decentralized Markov decision process (Dec-MDP) whose goal is to find the strategy that maximizes the pre-store hit rate, i.e., the fraction of users served directly from the pre-stored data. To obtain this strategy, a distributed distribution-robust meta reinforcement learning (D$^2$-RMRL) algorithm is proposed that combines three key ingredients: value decomposition, to reach the global optimum in a distributed setting with minimum communication overhead; meta-learning, to obtain an optimal initialization that reduces the training time under dynamic conditions; and pre-training, to further speed up the meta-training procedure.
%Analytical results show that the proposed D$^2$-RMRL algorithm is guaranteed to converge to an optimal solution. 
Simulation results show that, using the proposed value decomposition and meta-training techniques, the satellite networks can achieve a $31.8\%$ improvement in pre-store hits and a $40.7\%$ improvement in convergence speed compared to a baseline reinforcement learning algorithm. Moreover, the proposed pre-training mechanism shortens the meta-learning procedure by up to $43.7\%$.
}
%
\end{abstract}