\begin{abstract}
Dynamic optimization of mean and variance in Markov decision
processes (MDPs) is a long-standing challenge that stems from the
failure of dynamic programming. In this paper, we propose a new
approach to find the globally optimal policy for combined metrics of
steady-state mean and variance in an infinite-horizon undiscounted
MDP. By introducing the concepts of pseudo mean and pseudo variance,
we convert the original problem to a bilevel MDP problem, where the
inner problem is a standard MDP that optimizes the pseudo
mean-variance and the outer problem is a single-parameter selection
problem that optimizes the pseudo mean. We use the sensitivity
analysis of MDPs to derive the properties of this bilevel problem.
By solving inner standard MDPs for pseudo mean-variance
optimization, we can identify worse policy spaces dominated by
optimal policies of the pseudo problems. We propose an optimization
algorithm that finds the globally optimal policy by repeatedly
removing worse policy spaces. We study the convergence and
complexity of the algorithm. A further policy dominance property is
proposed to improve the efficiency of the algorithm. Numerical
experiments demonstrate the performance and efficiency of our
algorithms. To the best of our knowledge, our algorithm is the first
that efficiently finds the globally optimal policy of mean-variance
optimization in MDPs. These results also hold when only the variance
metric is minimized in MDPs.
\end{abstract}