\begin{abstract}

In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent.

In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly and one that represents them implicitly.

We show that the intrinsic reward used in the implicit algorithm is biased in stochastic environments, which causes convergence to suboptimal solutions.

To correct this behavior and achieve maximal empowerment, we propose two methods, based respectively on a transitional probability model and a Gaussian mixture model.

We substantiate our claims through rigorous mathematical derivations and experimental analyses.

\end{abstract}
