\begin{abstract}
In this paper, a Deep Q-Network (DQN)
based multi-agent multi-user power allocation algorithm is proposed for hybrid networks composed of radio frequency (RF) and visible
light communication (VLC) access points (APs). The users are
capable of multihoming, which can bridge RF and VLC links for
accommodating their bandwidth requirements. By leveraging a
non-cooperative multi-agent DQN algorithm, where each AP is
an agent, an online power allocation strategy is developed to
optimize the transmit power for providing users' required data
rate. Our simulation results demonstrate that the median training
convergence time of the DQN-based algorithm is $90\%$ shorter than that of the Q-Learning
(QL) based algorithm. The DQN-based algorithm converges to
the desired user rate in half the time on average, with a
convergence rate of $96.1\%$ compared to the QL-based algorithm's
convergence rate of $72.3\%$. Additionally, thanks to its continuous
state-space definition, the DQN-based power allocation algorithm
provides average user data rates closer to the target rates than
the QL-based algorithm when it converges.
\end{abstract}