\begin{abstract}
Resource allocation is an important issue in cognitive radio
systems. It can be carried out through negotiation among secondary
users. However, negotiation may incur significant overhead, since it
must be repeated frequently to track the rapidly changing activity of
primary users. In this paper, a channel selection scheme without
negotiation is considered for multi-user, multi-channel cognitive
radio systems. To avoid the collisions caused by the lack of
coordination, each secondary user learns how to select channels from
its own experience. Multi-agent reinforcement learning (MARL) is
applied in the framework of Q-learning by treating the opponent
secondary users as part of the environment. The dynamics of
Q-learning are illustrated using a Metrick-Polak plot. A rigorous
proof of the convergence of Q-learning is provided via the similarity
between Q-learning and the Robbins-Monro algorithm, together with a
convergence analysis of the corresponding ordinary differential
equation (via a Lyapunov function). Examples are given, and the
learning performance is evaluated by numerical simulations.
\end{abstract}