\begin{abstract}
This paper proposes an online decentralized channel allocation scheme based on contextual multi-armed bandit (CMAB) learning for densely deployed wireless local area networks (WLANs).
The communication quality in WLANs is significantly affected by the channel each access point (AP) uses and by the carrier sense relationships between APs.
To the best of our knowledge, conventional MAB-based channel allocation schemes dedicated to WLANs do not use any information other than the observed reward for learning.
Therefore, we aim to validate the effectiveness of prior information (i.e., the channels of neighboring APs) in improving the system throughput.
To this end, we propose contention-driven feature extraction (CDFE), which leverages the channels of neighboring APs to extract features corresponding to the adjacency relation of the contention graph.
CMAB learning with CDFE enables each AP to distinguish the impact of neighboring APs on the observed throughput.
Furthermore, we address the problem of non-convergence, i.e., the channel allocation cycle, which is an inherent difficulty in selfish decentralized learning.
To prevent such a cycle, we propose penalized JointLinUCB (P-JLinUCB), whose key idea is to apply a discount parameter to the reward obtained when an AP exploits a channel different from the one it used before the learning round.
To incorporate the effect of this discounted reward into a linear model, we also add a penalty term to the feature vector.
A numerical evaluation indicates that CMAB-based channel allocation using CDFE improves the system throughput.
Moreover, the proposed P-JLinUCB significantly reduces the number of channel adjustments, i.e., prevents cycles, without degrading the system throughput.
\end{abstract}