\begin{abstract}
This paper considers a resource allocation problem in a reconfigurable intelligent surface (RIS)-assisted cellular network, where several Internet-of-Things (IoT) devices send data to a base station (BS) with or without the help of an RIS.
The objective is to maximize the sum rate of all IoT devices by selecting the optimal RIS and spreading factor (SF) for each device.
Since the IoT devices have no prior information on the RISs or the channel state information (CSI),
achieving this goal requires a distributed resource allocation framework with low complexity and learning capability.
We therefore model this problem as a two-stage multi-player multi-armed bandit (MPMAB) framework that learns the optimal RIS and SF sequentially.
We then put forth an exploration and exploitation boosting (E2Boost) algorithm that solves this two-stage MPMAB problem by combining the $\epsilon$-greedy algorithm, the Thompson sampling (TS) algorithm, and a non-cooperative game method.
We derive an upper bound on the regret of the proposed algorithm, $\mathcal{O}(\log_2^{1+\delta} T)$, which grows polylogarithmically with the time horizon $T$.
Numerical results show that the E2Boost algorithm achieves the best performance compared with existing methods and converges quickly.
More importantly, thanks to the two-stage allocation mechanism, the proposed algorithm is insensitive to the number of RIS--SF combinations,
which benefits high-density networks.
\end{abstract}
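To illustrate the two-stage idea summarized above, the following is a minimal Python sketch of a single device that first picks an RIS with the $\epsilon$-greedy rule and then picks an SF for that RIS with Beta--Bernoulli Thompson sampling. It is not the E2Boost algorithm: it assumes a single device (so the multi-player collisions and the non-cooperative game stage are omitted), a fixed $\epsilon$, and a Bernoulli transmission-success reward used as a hypothetical proxy for the achievable rate. The names \texttt{BetaArm}, \texttt{simulate}, and \texttt{success\_prob} are illustrative assumptions.

```python
import random


class BetaArm:
    """Beta-Bernoulli posterior for one SF arm (used by Thompson sampling)."""

    def __init__(self):
        self.alpha = 1.0  # 1 + number of observed successes
        self.beta = 1.0   # 1 + number of observed failures

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        self.alpha += reward
        self.beta += 1 - reward


def simulate(success_prob, horizon, epsilon, seed=0):
    """Hypothetical single-device two-stage bandit.

    success_prob[r][s] is the (assumed) success probability when the device
    uses RIS r and SF s. Returns how often each (RIS, SF) pair was chosen.
    """
    rng = random.Random(seed)
    n_ris, n_sf = len(success_prob), len(success_prob[0])
    ris_counts = [0] * n_ris
    ris_means = [0.0] * n_ris
    # One set of SF posteriors per RIS: the SF is learned given the chosen RIS.
    sf_post = [[BetaArm() for _ in range(n_sf)] for _ in range(n_ris)]
    picks = [[0] * n_sf for _ in range(n_ris)]

    for _ in range(horizon):
        # Stage 1: epsilon-greedy RIS selection (each RIS is tried once first).
        untried = [r for r in range(n_ris) if ris_counts[r] == 0]
        if untried:
            r = untried[0]
        elif rng.random() < epsilon:
            r = rng.randrange(n_ris)
        else:
            r = max(range(n_ris), key=lambda i: ris_means[i])

        # Stage 2: Thompson sampling over the SFs of the chosen RIS.
        s = max(range(n_sf), key=lambda j: sf_post[r][j].sample(rng))

        # Observe a Bernoulli reward and update both stages.
        reward = 1 if rng.random() < success_prob[r][s] else 0
        sf_post[r][s].update(reward)
        ris_counts[r] += 1
        ris_means[r] += (reward - ris_means[r]) / ris_counts[r]
        picks[r][s] += 1

    return picks


if __name__ == "__main__":
    probs = [[0.2, 0.3, 0.25], [0.4, 0.9, 0.5]]
    print(simulate(probs, horizon=5000, epsilon=0.05, seed=1))
```

With these assumed probabilities, one would expect most pulls to concentrate on RIS index 1 with SF index 1. Because the SF is searched only within the selected RIS, each stage explores a set whose size is the number of RISs or the number of SFs rather than their product, which is the intuition behind the insensitivity to the number of RIS--SF combinations.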