\begin{abstract}
%Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods which suffer from the \textit{curse of dimensionality}
%and the \textit{curse of modeling}. 
To overcome the \textit{curses of dimensionality and modeling}
%and the \textit{curse of modeling} 
that Dynamic Programming (DP) methods face when solving Markov Decision Process (MDP) problems,
%these issues, 
Reinforcement Learning (RL) methods are adopted in practice.
Unlike traditional RL algorithms,
which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm that
exploits the ordered multi-threshold structure of the optimal policy whenever such a structure exists. We prove that the proposed algorithm converges asymptotically to the optimal policy. Because it searches
a reduced policy space, the proposed algorithm offers substantial improvements in storage and computational complexity over classical
RL algorithms.
%\textcolor{blue}{We illustrate this using an example from queuing theory.}
%In this paper, we aim to 
%obtain 
%the optimal admission control policy in a system where different classes of customers are present.
%Using DP techniques, we prove that it is optimal to admit the $i^{\rm{th}}$  class of customers only upto a threshold $\tau(i)$ which is a non-increasing function of $i$.
Simulation results show that the proposed algorithm converges faster than other RL algorithms.
%The techniques 
%presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and 
%communication networking.
\end{abstract}