
\begin{abstract}

Risk-constrained reinforcement learning (RCRL) has been developed to reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints.

However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality.

To overcome the difficulties posed by this nonlinearity, we propose \emph{spectral-risk-constrained policy optimization (SRCPO)}, a spectral risk measure-constrained RL algorithm that takes a bilevel optimization approach and exploits the duality of spectral risk measures.

In the bilevel optimization structure, the outer problem optimizes dual variables derived from the risk measures, while the inner problem finds an optimal policy given these dual variables.

To the best of our knowledge, the proposed method is the first to guarantee convergence to an optimum in the tabular setting.

Furthermore, the proposed method has been evaluated on continuous control tasks, where it achieved the best performance among the RCRL algorithms that satisfy the constraints.

\end{abstract}
