\begin{abstract}
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. In this paper, we develop a model-free proximal policy search framework for the mean-variance function with a finite-sample error bound analysis (to local optima). Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, but no finite-sample analysis has yet been attempted. Our starting point is a reformulation of the original mean-variance function using Fenchel duality, from which we derive a stochastic coordinate descent policy search algorithm. We provide both an asymptotic convergence guarantee for the solution of the last iteration and a convergence rate for a randomly picked solution, and we demonstrate the algorithm's applicability on several benchmark domains.
\end{abstract}