\begin{abstract}
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.
CPT works by distorting probabilities and is more general than both classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control.
Applying CPT in the RL setting raises two particular challenges: estimating the CPT objective requires estimating the {\it entire distribution} of the value function, and control requires finding a {\it randomized} optimal policy.
Our proposed estimation scheme uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure based on simultaneous perturbation stochastic approximation (SPSA), a well-known simulation optimization technique.
We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application.
%empirically demonstrate the usefulness of our algorithms.
\end{abstract}
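The empirical-distribution estimator and the SPSA-based optimization loop described above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the algorithms analyzed in this paper: the utility functions $u^{+}, u^{-}$ and weighting functions $w^{+}, w^{-}$ take the Tversky and Kahneman (1992) forms with their published parameter estimates, which the abstract does not specify, and all helper names (\texttt{cpt\_estimate}, \texttt{spsa\_gradient}, \texttt{spsa\_ascent}, \texttt{simulate}) are hypothetical. Plugging the empirical distribution of $n$ samples into the CPT integrals reduces each of them to a weighted sum over the order statistics $X_{[1]} \le \dots \le X_{[n]}$, with weights given by differences of the weighting function at multiples of $1/n$.

```python
import random

# Assumption: Tversky-Kahneman (1992) functional forms and parameter
# estimates; the abstract does not fix u^+, u^-, w^+ or w^-.
def u_plus(x, alpha=0.88):
    """Utility of gains; zero for losses."""
    return x ** alpha if x >= 0 else 0.0


def u_minus(x, beta=0.88, lam=2.25):
    """Non-negative disutility of losses; zero for gains."""
    return lam * (-x) ** beta if x < 0 else 0.0


def weight(p, eta):
    """Inverse-S probability weighting with weight(0) = 0 and weight(1) = 1."""
    return p ** eta / (p ** eta + (1 - p) ** eta) ** (1 / eta)


def cpt_estimate(samples, eta_plus=0.61, eta_minus=0.69):
    """Plug-in CPT-value estimate built from the empirical distribution."""
    xs = sorted(samples)  # order statistics X_[1] <= ... <= X_[n]
    n = len(xs)
    gains = losses = 0.0
    for i, x in enumerate(xs, start=1):
        # Gains: decision weight from the upper tail, P(X >= X_[i]) = (n+1-i)/n.
        gains += u_plus(x) * (weight((n + 1 - i) / n, eta_plus)
                              - weight((n - i) / n, eta_plus))
        # Losses: decision weight from the lower tail, P(X <= X_[i]) = i/n.
        losses += u_minus(x) * (weight(i / n, eta_minus)
                                - weight((i - 1) / n, eta_minus))
    return gains - losses


def spsa_gradient(objective, theta, delta=0.1, rng=random):
    """Two-sided SPSA gradient estimate with Rademacher (+/-1) perturbations."""
    perturb = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * d for t, d in zip(theta, perturb)]
    minus = [t - delta * d for t, d in zip(theta, perturb)]
    diff = objective(plus) - objective(minus)
    # Every coordinate reuses the same two objective evaluations.
    return [diff / (2 * delta * d) for d in perturb]


def spsa_ascent(simulate, theta, steps=200, step_size=0.05, n_samples=500,
                delta=0.1, seed=0):
    """Gradient ascent on the estimated CPT-value of a parameterized simulator.

    simulate(theta, n, rng) is a hypothetical user-supplied function that
    returns n sampled returns under parameter vector theta.
    """
    rng = random.Random(seed)

    def objective(th):
        # Inner loop: estimate the CPT-value from fresh simulated samples.
        return cpt_estimate(simulate(th, n_samples, rng))

    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(objective, theta, delta, rng)
        theta = [t + step_size * g for t, g in zip(theta, grad)]
    return theta
```

For instance, passing a hypothetical \texttt{simulate} that returns episode returns of a policy with parameters \texttt{theta} makes \texttt{spsa\_ascent} perform approximate ascent on the estimated CPT-value. With $\eta^{+} = \eta^{-} = 1$ the weighting is the identity and the estimate reduces to a sample average of utilities, which is a convenient sanity check. The constant step size is a simplification; SPSA's convergence analysis normally relies on diminishing step-size and perturbation sequences.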
