1: \begin{abstract}
2: Plain reinforcement learning (RL) can be prone to loss of convergence, constraint violations, and unexpected performance.
3: Commonly, RL agents undergo extensive learning stages to achieve acceptable functionality.
4: This is in contrast to classical control algorithms, which are typically model-based.
5: One direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC).
6: This, however, introduces new hyper-parameters related to the prediction horizon.
7: Furthermore, RL is usually concerned with Markov decision processes.
8: However, most real environments are not time-discrete.
9: The actual physical setting of RL consists of a digital agent and a time-continuous dynamical system.
10: There is thus yet another hyper-parameter: the agent sampling time.
11: In this paper, we investigate the effects of the prediction horizon and the sampling time on two hybrid RL-MPC agents in a case study of mobile robot parking, a canonical control problem.
12: We benchmark the agents against a simple variant of MPC.
13: The sampling time exhibited a ``sweet spot'' behavior, whereas the RL agents showed advantages at shorter horizons.
14: %Remarkably, 
15: %This paper is concerned with a canonical control example of a mobile robot parking problem where two variations of RL, that employ prediction, namely, a roll-out and so-called stacked Q-learning, are compared to each other and to the benchmark -- a model-predictive controller.
16: %The environment is modeled in a hybrid setting: the system dynamics are continuous, whereas the agent (the controller) is digital, which corresponds to the real physical conditions.
17: %Effects of controller discretization, prediction step size, and prediction horizon length are investigated, whereas some interesting tendencies are observed.
18: %Remarkably, the stacked RL approach demonstrated superior performance in some scenarios.
19: \end{abstract}
20: