\begin{abstract}
This paper studies risk-averse mean-variance optimization in
infinite-horizon discounted Markov decision processes (MDPs). The
variance metric considered captures reward variability over the
whole process, with future deviations discounted to their present
values. This discounted mean-variance optimization yields a reward
function that depends on a discounted mean, and this dependency renders
traditional dynamic programming methods inapplicable because it
destroys a crucial property: time consistency. To deal with this
unorthodox problem, we introduce a pseudo mean that transforms the
intractable MDP into a standard one with a redefined reward function,
and we derive a discounted mean-variance performance
difference formula. With the pseudo mean, we propose a unified
algorithmic framework with a bilevel optimization structure for
discounted mean-variance optimization. The framework unifies a
variety of algorithms for several variance-related problems,
including, but not limited to, risk-averse variance and
mean-variance optimization in discounted and average MDPs.
Furthermore, the proposed framework can supply the convergence
analyses that are missing from the literature. Taking
value iteration as an example, we develop a discounted mean-variance
value iteration algorithm and prove its convergence to a local
optimum with the aid of a Bellman local-optimality equation.
Finally, we conduct a numerical experiment on portfolio management
to validate the proposed algorithm.
\end{abstract}