\begin{abstract}
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state, with the goal of maximizing a cumulative reward. Broadly, there are two families of algorithms for solving RL problems: value-based and policy-based methods, the latter designed to learn a probabilistic parametric policy that maps states to actions. Most contemporary approaches implement this policy using a neural network (NN). However, NNs often face issues related to convergence, architecture design, and hyper-parameter selection, and they tend to underutilize the redundancies of the state-action representations (e.g., locally similar states). This paper proposes multi-linear mappings to efficiently estimate the parameters of the RL policy. More precisely, we leverage the PARAFAC decomposition to design \emph{tensor low-rank} policies. The key idea is to collect the policy parameters into a tensor and leverage tensor-completion techniques to enforce \emph{low rank}. We establish theoretical guarantees for the proposed methods for various policy classes and validate their efficacy through numerical experiments. Specifically, we demonstrate that \emph{tensor low-rank} policy models reduce computational and sample complexities compared to NN models while achieving similar rewards.
\end{abstract}
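To make the parameter savings of a low-rank policy concrete, the following is a minimal sketch of a rank-$R$ PARAFAC parameterization of a softmax policy. It is not the authors' implementation. It assumes, for illustration only, a state discretized along $D$ dimensions with $N_1,\dots,N_D$ bins and $A$ discrete actions, so that the action logits form a tensor $\boldsymbol{\Theta}\in\mathbb{R}^{N_1\times\cdots\times N_D\times A}$ with $\boldsymbol{\Theta}=\sum_{r=1}^{R}\mathbf{a}_r^{(1)}\circ\cdots\circ\mathbf{a}_r^{(D+1)}$. The helper names (\texttt{cp\_logits}, \texttt{softmax\_policy}, \texttt{full\_tensor}) and all sizes are hypothetical, and the sketch only evaluates the policy; it does not show how the factors are learned.

```python
import numpy as np

# Assumption (illustrative only): discretized states, discrete actions,
# softmax over action logits stored as a rank-R PARAFAC (CP) tensor.

def cp_logits(factors, state_idx):
    """Action logits for one discretized state.

    factors[d] has shape (N_d, R) for state dimension d; factors[-1] has
    shape (A, R) for the action mode. state_idx is a tuple (i_1, ..., i_D).
    """
    # Elementwise product of the selected state-factor rows -> shape (R,)
    weights = np.ones(factors[0].shape[1])
    for d, i in enumerate(state_idx):
        weights = weights * factors[d][i]
    # theta[i_1, ..., i_D, a] = sum_r weights[r] * factors[-1][a, r]
    return factors[-1] @ weights

def softmax_policy(factors, state_idx):
    """Probability of each action in the given state."""
    logits = cp_logits(factors, state_idx)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

def full_tensor(factors):
    """Dense (N_1, ..., N_D, A) tensor rebuilt from the factors (for checking only)."""
    rank = factors[0].shape[1]
    theta = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(rank):
        outer = factors[0][:, r]
        for f in factors[1:]:
            outer = np.multiply.outer(outer, f[:, r])  # sum of rank-one terms
        theta += outer
    return theta

rng = np.random.default_rng(0)
state_sizes, num_actions, rank = (10, 10, 10), 4, 3
factors = [rng.normal(size=(n, rank)) for n in state_sizes]
factors.append(rng.normal(size=(num_actions, rank)))

dense_params = int(np.prod(state_sizes)) * num_actions  # entries of the dense tensor
cp_params = rank * (sum(state_sizes) + num_actions)     # entries of the CP factors
probs = softmax_policy(factors, (2, 5, 7))
```

With $D=3$, $N_d=10$, $A=4$ and $R=3$, the dense tensor has $4000$ entries, whereas the factors hold $R\,(\sum_d N_d + A) = 102$ parameters. The function \texttt{full\_tensor} is included only to check that the factored logits agree with the dense reconstruction.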