\begin{abstract}
Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms.
In this paper, we review this analogy across four problem classes using a unified solution characterization, which allows for a systematic transformation of algorithms from one domain to the other.
In particular, we identify equivalent optimization and control algorithms that have already been noted in the existing literature, though mostly in a scattered way.
With this unifying framework in mind, we adopt the quasi-Newton method from convex optimization to introduce a novel control algorithm, termed quasi-policy iteration (QPI).
Specifically, QPI is based on a novel approximation of the ``Hessian'' matrix in the policy iteration algorithm, obtained by exploiting two linear structural constraints specific to MDPs and by allowing for the incorporation of prior information on the transition probability kernel.
Although the proposed algorithm has the same computational complexity as value iteration, it exhibits an empirical convergence behavior similar to that of policy iteration, with very low sensitivity to the discount factor.

\smallskip
\noindent \textsc{Keywords:} Dynamic programming, reinforcement learning, optimization algorithms, quasi-Newton methods, Markov decision processes.

\end{abstract}