\begin{abstract} \label{abstract}
Actor-critic algorithms have shown remarkable empirical success on challenging decision-making problems. Despite this effectiveness, their theoretical underpinnings remain relatively unexplored, especially under neural network parametrization. In this paper, we study a natural actor-critic algorithm that uses neural networks to represent the critic, with the aim of establishing sample complexity guarantees for it. To this end, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L), which estimates the $Q$-function in each iteration by solving a convex optimization problem. We establish that NAC2L attains a sample complexity of $\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$. Existing sample complexity results in the literature hold only for tabular or linear MDPs; in contrast, our result holds for countable state spaces and does not require a linear or low-rank structure on the MDP.
\end{abstract}