\begin{abstract}
%Classical Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods.
%However, DP techniques suffer from the \textit{curse of dimensionality} and become computationally intractable
%in the face of large state spaces. Furthermore, DP methods suffer from the \textit{curse of modeling} since
%the computation of optimal policy requires the knowledge of transition probabilities.
Reinforcement Learning (RL) algorithms are widely used to overcome the \textit{curse of dimensionality} and
the \textit{curse of modeling} that Dynamic Programming (DP) methods face
when solving classical Markov Decision Process (MDP) problems.
In this paper, we consider an infinite-horizon average-reward MDP problem and prove that a threshold policy is optimal under certain
conditions.
Traditional RL techniques do not exploit the threshold structure of the optimal policy while learning. We propose
a new RL algorithm that exploits this known threshold structure by restricting the feasible
policy space during learning. We establish that the proposed algorithm converges to the optimal policy.
Compared with traditional RL algorithms, it significantly improves convergence speed and reduces computational and storage complexity.
The proposed technique applies to a wide variety of optimization problems,
including energy-efficient data transmission and queue management.
Simulations demonstrate the improvement in
convergence speed of the proposed algorithm over other RL algorithms.
\end{abstract}