1: \begin{abstract}
2: The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems.
3: In these scenarios, one is often interested in aligning behaviors of the system with the preferences of one or more human users.
4: Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently.
5: In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment.
6: In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility.
7: In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost.
8: We introduce the notion of the CPT-value of an action taken in a state, and establish the convergence of an iterative dynamic programming-based approach to estimate this quantity.
9: We develop two algorithms to enable agents to learn policies to optimize the CPT-value, and evaluate these algorithms in environments where a target state has to be reached while avoiding obstacles.
10: We demonstrate that the behaviors the agent learns using these algorithms are better aligned with those of a human user who might be placed in the same environment, and that this alignment is significantly better than that of a baseline that optimizes an expected utility.
11: \end{abstract}
12: