4e9ded215c04ed60.tex
1: \begin{abstract}
2: In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. 
3: In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and one that represents them implicitly.
4: We show that the intrinsic reward used in the latter is subject to bias in stochastic environments, causing convergence to suboptimal solutions.
5: To correct this behavior and achieve maximal empowerment, we propose two methods, based respectively on a transitional probability model and a Gaussian mixture model.
6: We substantiate our claims through rigorous mathematical derivations and experimental analyses. 
7: \end{abstract}
8: