\begin{abstract}
In reinforcement learning (RL), sparse rewards make it challenging for an agent to master a task that requires a specific sequence of actions.
To address this problem, reverse curriculum generation (RCG) uses a reverse expansion approach to automatically generate a curriculum for the agent.
More specifically, RCG shifts the initial state distribution from the neighborhood of the goal toward increasingly distant states as training proceeds.
However, the initial state distribution generated at each iteration may be biased, which can cause the policy to overfit or slow the rate of reverse expansion.
When RCG is trained with actor-critic (AC) based RL algorithms, this poor generalization and slow convergence may be induced by the tight coupling within an AC pair.
Therefore, we propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics.
We empirically demonstrate that the proposed approach improves both the performance and the convergence of RCG, and that it can also be applied to other AC-based RL algorithms that adapt the initial state distribution.
\end{abstract}