\begin{abstract}
%Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods which suffer from the \textit{curse of dimensionality}
%and the \textit{curse of modeling}.
To overcome the \textit{curses of dimensionality and modeling} of Dynamic Programming (DP) methods for solving Markov Decision Process (MDP) problems,
Reinforcement Learning (RL) methods are adopted in practice.
Unlike traditional RL algorithms, which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm that
exploits the ordered multi-threshold structure of the optimal policy, if such a structure exists. We prove that the proposed algorithm converges asymptotically to the optimal policy. Owing to the reduction
in the size of the policy space, the proposed algorithm offers substantial improvements in storage and computational complexity over classical
RL algorithms.
%\textcolor{blue}{We illustrate this using an example from queuing theory.}
%In this paper, we aim to
%obtain
%the optimal admission control policy in a system where different classes of customers are present.
%Using DP techniques, we prove that it is optimal to admit the $i^{\rm{th}}$ class of customers only up to a threshold $\tau(i)$ which is a non-increasing function of $i$.
Simulation results demonstrate that the proposed algorithm converges faster than other RL algorithms.

%The techniques
%presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and
%communication networking.
\end{abstract}