\begin{abstract}
  Traditional reinforcement learning (RL) generates discrete control policies,
  assigning one action per cycle. These policies are usually implemented in a
  fixed-frequency control loop. This rigidity presents challenges because the
  optimal control frequency is task-dependent; suboptimal frequencies increase
  computational demands and reduce exploration efficiency. Variable Time Step
  Reinforcement Learning (VTS-RL) addresses these issues with adaptive control
  frequencies: it executes actions only when necessary, which reduces
  computational load, and it extends the action space to include action
  durations. In this paper, we introduce the Multi-Objective Soft Elastic
  Actor-Critic (MOSEAC) method to perform VTS-RL, validating it through
  theoretical analysis and experimentation in simulation and on real robots.
  Results show faster convergence, better training results, and reduced energy
  consumption compared with other variable- or fixed-frequency approaches.
\end{abstract}