
\begin{abstract}

Learning and planning in partially observable domains are among the most difficult problems in reinforcement learning. Traditional methods treat these two problems as independent, resulting in a classical two-stage paradigm: first learn the environment dynamics, then plan accordingly. This approach, however, decouples the two problems and can consequently lead to algorithms that are sample-inefficient and time-consuming. In this paper, we propose a novel algorithm that integrates learning and planning. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We show empirically on two domains that our approach is more sample- and time-efficient than classical methods.


% Model-based solutions to partial observability first learn a transition model, which is then used to find the optimal policy. In this case, reward information is decoupled from the feature extraction, thus failing to prune out irrelevant features. We propose a novel control algorithm for partially observable tasks which, by combining learning and planning into one step, leads to better sample and time complexity.

% Optimal control in partially observable domains has been considered one of the most difficult problems in reinforcement learning.

% One classical solution is to first model the dynamics of the environment, then combine the model with a planning algorithm to form an optimal policy.

% However, this approach separates the reward information from the learned dynamics and leads to a sample-inefficient algorithm. Moreover, the planning algorithm is often iterative, and its convergence is often time-consuming.

% In this paper, we propose a novel planning algorithm for partially observable environments that incorporates reward information

\end{abstract}
