1: \begin{abstract}
2:   This paper proposes an online decentralized channel allocation scheme based on contextual multi-armed bandit (CMAB) learning in a densely deployed wireless local area network (WLAN) environment. 
3:   The communication quality in WLANs is significantly affected by the carrier sense relationships among access points (APs) and by the channels they use.
4:   To the best of our knowledge, conventional multi-armed bandit (MAB)-based channel allocation schemes dedicated to WLANs use no information other than the observed reward for learning.
5:   Therefore, we aim to validate the effectiveness of prior information (i.e., the channels of neighboring APs) in improving the system throughput.
6:   To this end, we propose contention-driven feature extraction (CDFE), which leverages the channels of neighboring APs to extract features corresponding to the adjacency relation of the contention graph.
7:   CMAB learning with CDFE enables each AP to distinguish the impact of neighboring APs on the observed throughput.
8:   Furthermore, we address the problem of non-convergence---the channel allocation cycle---which is an inherent difficulty in selfish decentralized learning.
9:   To prevent such cycles, we propose penalized JointLinUCB (P-JLinUCB), whose key idea is to apply a discount parameter to the reward obtained when the exploited channel differs before and after a learning round.
10:   To incorporate the effect of this discounted reward into a linear model, we also add a penalty term to the feature vector.
11:   A numerical evaluation indicates that CMAB-based channel allocation using CDFE improves the system throughput.
12:   Moreover, the proposed P-JLinUCB significantly reduces the number of channel adjustments, i.e., prevents cycles, without degrading the system throughput.
13: \end{abstract}
14: