ed2b8408911d11b1.tex
1: \begin{abstract}
2: Deep reinforcement learning (DRL) has been shown to be successful in many application domains.
3: %
4: Combining recurrent neural networks (RNNs) with DRL further extends DRL to non-Markovian environments by capturing temporal information.
5: %
6: However, training both DRL and RNNs is known to be challenging and requires a large amount of training data to achieve convergence.
7: %
8: In many targeted applications, such as those in fifth-generation (5G) cellular communication, the environment is highly dynamic while the available training data is very limited.
9: %
10: Therefore, it is extremely important to develop DRL strategies that can capture the temporal correlation of the dynamic environment while requiring limited training overhead.
11: %
12: In this paper, we introduce the deep echo state Q-network (DEQN), which can adapt to a highly dynamic environment in a short period of time with limited training data.
13: %
14: We evaluate the performance of the introduced DEQN method in a dynamic spectrum sharing (DSS) scenario; DSS is a promising technology for increasing spectrum utilization in 5G and future 6G networks.
15: %
16: Compared to the conventional spectrum management policy, which grants a fixed spectrum band to a single system for exclusive access, DSS allows a secondary system to share the spectrum with the primary system.
17: %
18: Our work sheds light on the application of an efficient DRL framework in highly dynamic environments with limited available training data.
19: \end{abstract}
20: