\begin{abstract}

This paper proposes and analyzes two new policy learning methods, regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization.
Assuming access to exact policy evaluation, both proposed approaches are proved to converge linearly to optimal policies of the regularized LQC problem.
Moreover, the IPO method achieves a super-linear convergence rate once it enters a local region around the optimal policy.
Finally, when the optimal policy for a reinforcement learning (RL) problem with a known environment is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to attain a super-linear convergence rate, provided the two environments are sufficiently close.
The performance of the proposed algorithms is supported by numerical examples.

\end{abstract}