\begin{abstract}
%Classical Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods.
%However, DP techniques suffer from the \textit{curse of dimensionality} and become computationally intractable
%in the face of large state spaces. Furthermore, DP methods suffer from the \textit{curse of modeling} since
%the computation of optimal policy requires the knowledge of transition probabilities.
Reinforcement Learning (RL) algorithms are a popular way to overcome the \textit{curse of dimensionality} and the \textit{curse of modeling}
that Dynamic Programming (DP) methods face when solving classical Markov Decision Process (MDP) problems.
In this paper, we consider an infinite-horizon average-reward MDP and prove that a threshold policy is optimal under certain conditions.
Traditional RL techniques do not exploit the threshold nature of the optimal policy while learning.
We propose a new RL algorithm that exploits this known threshold structure by restricting learning to a reduced feasible policy space,
and we establish that the proposed algorithm converges to the optimal policy.
Compared with traditional RL algorithms, it provides a significant improvement in convergence speed and a reduction in computational and storage complexity.
The proposed technique can be applied to a wide variety of optimization problems, including energy-efficient data transmission and queue management.
Simulations demonstrate the faster convergence of the proposed algorithm relative to other RL algorithms.
\end{abstract}