\begin{abstract}

Reinforcement learning (RL)
problems over general state and action spaces are notoriously challenging.
In contrast to the tabular setting, one cannot enumerate all the states
and then iteratively update the policy for each state. This prevents the application
of many well-studied RL methods, especially those with provable convergence
guarantees. In this paper, we first present
a substantial generalization of the recently developed policy mirror descent method to
deal with general state and action spaces. We introduce new approaches
to incorporate function approximation into this method, so that no explicit policy parameterization is needed.
Moreover, we present a novel policy dual averaging method to which possibly simpler function approximation
techniques can be applied. We establish a linear rate of convergence to global optimality
or sublinear convergence to stationarity
for these methods when applied to different classes of RL problems under exact policy evaluation.
We then define proper notions of approximation errors for
policy evaluation and investigate their impact on the convergence of these methods
applied to general-state RL problems with either finite-action or continuous-action spaces.
To the best of our knowledge, these algorithmic frameworks
and their convergence analysis are new in the literature.
\edits{Preliminary numerical results demonstrate the robustness of the aforementioned methods and show that they can be competitive with state-of-the-art RL algorithms.}
\end{abstract}
