\begin{abstract}
Deep Actor-Critic algorithms, which combine Actor-Critic methods with deep
neural networks (DNNs), are among the most prevalent reinforcement
learning algorithms for decision-making problems in simulated environments.
However, existing deep Actor-Critic algorithms are not yet mature enough
to solve realistic problems with non-convex stochastic constraints
and a high cost of interacting with the environment. In this paper, we
propose a single-loop deep Actor-Critic (SLDAC) algorithmic framework
for general constrained reinforcement learning (CRL) problems. In
the actor step, the constrained stochastic successive convex approximation
(CSSCA) method is applied to handle the non-convex stochastic objective
and constraints. In the critic step, the critic DNNs are updated only
once or a finite number of times per iteration, which simplifies the
algorithm to a single-loop framework. Existing works, by contrast, require
enough critic updates in each iteration for the inner loop to converge
sufficiently well. Moreover,
the variance of the policy gradient estimate is reduced by reusing
observations from the old policy. The single-loop design and the observation
reuse effectively reduce the agent-environment interaction cost and
the computational complexity. Although the single-loop design and the
observation reuse make the policy gradient estimate biased, we prove
that SLDAC with a feasible initial point converges to a Karush-Kuhn-Tucker
(KKT) point of the original problem almost surely. Simulations show
that the SLDAC algorithm can achieve superior performance with much
lower interaction cost.
\end{abstract}
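
For orientation, the KKT point mentioned in the abstract can be stated for a generic constrained problem. The notation below ($\theta$ for the policy parameters, $J_0$ for the objective, $J_i$ for the constraint functions, $\lambda_i$ for the multipliers, and $m$ for the number of constraints) is assumed here purely for illustration and is not taken from the abstract. The functions are assumed differentiable, and the precise problem and regularity conditions are those of the paper itself. Consider
\[
\min_{\theta} \; J_0(\theta) \quad \mathrm{s.t.} \quad J_i(\theta) \le 0, \qquad i = 1, \dots, m .
\]
Under this assumed form, a point $\theta^{\ast}$ is a KKT point if there exist multipliers $\lambda_1, \dots, \lambda_m$ such that
\[
\nabla J_0(\theta^{\ast}) + \sum_{i=1}^{m} \lambda_i \nabla J_i(\theta^{\ast}) = 0,
\qquad
\lambda_i \ge 0, \quad J_i(\theta^{\ast}) \le 0, \quad \lambda_i J_i(\theta^{\ast}) = 0, \qquad i = 1, \dots, m .
\]
The first condition is stationarity of the Lagrangian. The remaining three are dual feasibility, primal feasibility, and complementary slackness.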