05a8b1fedb9e7dce.tex
1: \begin{abstract}
2: Risk-constrained reinforcement learning (RCRL) aims to reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints.
3: However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality.
4: To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained RL algorithm, \emph{spectral-risk-constrained policy optimization (SRCPO)}, a bilevel optimization approach that utilizes the duality of spectral risk measures.
5: In this bilevel structure, the outer problem optimizes the dual variables derived from the risk measures, while the inner problem finds an optimal policy given these dual variables.
6: The proposed method, to the best of our knowledge, is the first to guarantee convergence to an optimum in the tabular setting.
7: Furthermore, the proposed method has been evaluated on continuous control tasks, where it showed the best performance among the RCRL algorithms that satisfy the constraints.
8: \end{abstract}
9: