\begin{abstract}
%Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods which suffer from the \textit{curse of dimensionality}
%and the \textit{curse of modeling}.
Dynamic Programming (DP) methods for solving Markov Decision Process (MDP) problems suffer from the
\textit{curses of dimensionality and modeling}.
%and the \textit{curse of modeling}
%these issues,
To overcome these curses, Reinforcement Learning (RL) methods are adopted in practice.
Unlike traditional RL algorithms,
which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm that
exploits the ordered multi-threshold structure of the optimal policy, when such a structure exists. We prove that the proposed algorithm converges asymptotically to the optimal policy. Because it searches a reduced
policy space, the proposed algorithm offers significant improvements in storage and computational complexity over classical
RL algorithms.
%\textcolor{blue}{We illustrate this using an example from queuing theory.}
%In this paper, we aim to
%obtain
%the optimal admission control policy in a system where different classes of customers are present.
%Using DP techniques, we prove that it is optimal to admit the $i^{\rm{th}}$ class of customers only up to a threshold $\tau(i)$ which is a non-increasing function of $i$.
Simulation results show that the proposed algorithm converges faster than other RL algorithms.
%The techniques
%presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and
%communication networking.
\end{abstract}
