
\begin{abstract}

In reinforcement learning (RL), it is challenging for an agent to master a task that requires a specific sequence of actions, because rewards are sparse.

To address this problem, reverse curriculum generation (RCG) provides a reverse expansion approach that automatically generates a curriculum for the agent to learn from.

More specifically, RCG gradually moves the initial state distribution from the neighborhood of the goal to states farther away as training proceeds.

However, the initial state distribution generated at each iteration might be biased, which can cause the policy to overfit or slow down the reverse expansion.

When RCG is used to train actor-critic (AC) based RL algorithms, this poor generalization and slow convergence might be induced by the tight coupling within an AC pair.

Therefore, we propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics.

We empirically demonstrate that the proposed approach can improve both the performance and the convergence of RCG, and that it can also be applied to other AC-based RL algorithms that adapt the initial state distribution.

\end{abstract}
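
As an illustrative sketch of the critic exchange step only (the notation below is our assumption and is not fixed by the description above), suppose that $N$ actor-critic pairs $(\pi_i, V_i)$, $i = 1, \dots, N$, are trained in parallel and that critics are exchanged every $K$ iterations according to some permutation $\sigma$ of $\{1, \dots, N\}$. One possible exchange rule is then
\[
  (\pi_i, V_i) \;\leftarrow\; (\pi_i, V_{\sigma(i)}), \qquad i = 1, \dots, N,
\]
applied whenever the iteration counter is a multiple of $K$, so that each actor is subsequently evaluated by a critic that was trained alongside a different actor. A cyclic shift, $\sigma(i) = (i \bmod N) + 1$, is one simple choice; the symbols $N$, $K$, and $\sigma$ are hypothetical and serve only to make the idea concrete.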
