abstract:797b0bacebe50159.tex

1: \begin{abstract}

2: Plain reinforcement learning (RL) can suffer from loss of convergence, constraint violations, and unpredictable performance.

3: Commonly, RL agents undergo extensive learning stages to achieve acceptable functionality.

4: This is in contrast to classical control algorithms, which are typically model-based.

5: One direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC).

6: This, however, introduces new hyper-parameters related to the prediction horizon.

7: Furthermore, RL is usually concerned with Markov decision processes.

8: However, most real environments are not discrete in time.

9: The actual physical setting of RL consists of a digital agent and a continuous-time dynamical system.

10: Hence, there is yet another hyper-parameter: the agent sampling time.

11: In this paper, we investigate the effects of the prediction horizon and the sampling time on two hybrid RL-MPC agents in a case study of mobile robot parking, a canonical control problem.

12: We benchmark the agents against a simple variant of MPC.

13: The sampling time exhibited a ``sweet spot'' behavior, whereas the RL agents showed advantages at shorter horizons.

14: %Remarkably,

15: %This paper is concerned with a canonical control example of a mobile robot parking problem where two variations of RL, that employ prediction, namely, a roll-out and so-called stacked Q-learning, are compared to each other and to the benchmark -- a model-predictive controller.

16: %The environment is modeled in a hybrid setting: the system dynamics are continuous, whereas the agent (the controller) is digital, which corresponds to the real physical conditions.

17: %Effects of controller discretization, prediction step size, and prediction horizon length are investigated, whereas some interesting tendencies are observed.

18: %Remarkably, the stacked RL approach demonstrated superior performance in some scenarios.

19: \end{abstract}

20: