\begin{abstract}
Reinforcement learning (RL)
problems over general state and action spaces are notoriously challenging.
In contrast to the tabular setting, one cannot enumerate all the states
and then iteratively update the policy for each state. This prevents the application
of many well-studied RL methods, especially those with provable convergence
guarantees. In this paper, we first present
a substantial generalization of the recently developed policy mirror descent method to
deal with general state and action spaces. We introduce new approaches
to incorporate function approximation into this method, so that no explicit policy parameterization is needed.
Moreover, we present a novel policy dual averaging method to which possibly simpler function approximation
techniques can be applied. We establish linear convergence to global optimality
or sublinear convergence to stationarity
for these methods applied to different classes of RL problems under exact policy evaluation.
We then define proper notions of approximation error for
policy evaluation and investigate their impact on the convergence of these methods
applied to general-state RL problems with either finite or continuous action spaces.
To the best of our knowledge, these algorithmic frameworks
and their convergence analyses are new to the literature.
\edits{Preliminary numerical results demonstrate the robustness of the aforementioned methods and show that they can be competitive with state-of-the-art RL algorithms.}
\end{abstract}