\begin{abstract}

Resource allocation is an important issue in cognitive radio
systems. It can be carried out through negotiation among secondary
users. However, such negotiation may incur significant overhead,
since it must be repeated frequently as the activity of primary
users changes rapidly. In this paper, a channel selection scheme
without negotiation is considered for multi-user, multi-channel
cognitive radio systems. To avoid the collisions caused by the lack
of coordination, each secondary user learns how to select channels
from its own experience. Multi-agent reinforcement learning (MARL)
is applied in the framework of Q-learning by treating the opponent
secondary users as part of the environment. The dynamics of the
Q-learning are illustrated using Metrick-Polak plots. A rigorous
proof of the convergence of Q-learning is provided by exploiting
the similarity between Q-learning and the Robbins-Monro algorithm,
together with a Lyapunov-function analysis of the convergence of
the corresponding ordinary differential equation. Illustrative
examples are presented, and the performance of learning is
evaluated through numerical simulations.

\end{abstract}
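
As a point of reference for the convergence argument summarized in
the abstract, a generic stateless form of the Q-learning recursion
is sketched below. The notation is assumed here for illustration and
is not necessarily the formulation used in the body of the paper:
$Q^i_t(a)$ is secondary user $i$'s Q-value for channel $a$ at step
$t$, $a^i_t$ is the channel it selects, $r^i_t$ is its reward (for
instance, $1$ for a successful transmission and $0$ for a collision,
which is also an assumption), $\alpha_t$ is the step size, and
$\tau > 0$ is a Boltzmann temperature that governs exploration.
\[
Q^i_{t+1}(a) = Q^i_t(a) + \alpha_t\,\mathbf{1}\{a^i_t = a\}
\bigl(r^i_t - Q^i_t(a)\bigr),
\qquad
\pi^i_t(a) = \frac{\exp\bigl(Q^i_t(a)/\tau\bigr)}
{\sum_{b}\exp\bigl(Q^i_t(b)/\tau\bigr)}.
\]
Read as a Robbins-Monro stochastic approximation, the step sizes
would satisfy the usual conditions
\[
\sum_{t} \alpha_t = \infty, \qquad \sum_{t} \alpha_t^2 < \infty .
\]
Taking the expectation of the increment, with
$\bar r^i(a;\pi^{-i})$ denoting user $i$'s expected reward on
channel $a$ when the other users follow the policies $\pi^{-i}$,
gives the mean ordinary differential equation
\[
\dot{Q}^i(a) = \pi^i(a)\bigl(\bar r^i(a;\pi^{-i}) - Q^i(a)\bigr),
\]
since $\Pr\{a^i_t = a\} = \pi^i_t(a)$. Because $\pi^i$ depends on
$Q^i$ and $\bar r^i$ depends on the other users' Q-values, the
equations for all users are coupled, and it is the asymptotic
behavior of this coupled system that a Lyapunov-function argument
would address.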
