\begin{abstract}
We present a \emph{Pontryagin-Guided Direct Policy Optimization} (PG-DPO) framework for Merton's portfolio problem, unifying modern neural-network-based policy parameterization with the costate (adjoint) viewpoint from Pontryagin's Maximum Principle (PMP). Instead of approximating the value function (as in ``Deep BSDE''), we track a policy-fixed backward SDE for the adjoint variables, allowing each gradient update to align with continuous-time PMP conditions. This setup yields locally optimal consumption and investment policies that are closely tied to classical stochastic control. We further incorporate an alignment penalty that nudges the learned policy toward Pontryagin-derived solutions, enhancing both convergence speed and training stability. Numerical experiments confirm that PG-DPO effectively accommodates both consumption and investment, achieving strong performance and interpretability without requiring large offline datasets or model-free reinforcement learning.
\end{abstract}
