1: \begin{abstract} % Abstract of not more than 200 words.
2: We consider the framework of the transfer-entropy-regularized Markov Decision Process (TERMDP), in which a weighted sum of the classical state-dependent cost and the transfer entropy from the state random process to the control random process is minimized.
3: Although TERMDP is generally a nonconvex optimization problem, we derive an analytical necessary optimality condition expressed as a finite set of nonlinear equations, based on which an iterative forward-backward computational procedure similar to the Arimoto--Blahut algorithm is proposed. Convergence of the proposed algorithm to a stationary point of the considered TERMDP is established.
4: Applications of TERMDP are discussed in the context of networked control systems theory and non-equilibrium thermodynamics.
5: The proposed algorithm is applied to an information-constrained maze navigation problem, through which we study how the price of information qualitatively alters the optimal decision policies.
6: \end{abstract}
7: