\begin{abstract}
Nowozin \textit{et al.}~showed
last year how to extend the GAN \textit{principle} to all $f$-divergences. The
approach is elegant, but it falls short of a full description of the supervised game and
says little about the key player, the generator. For example,
what does the generator actually converge to if solving the GAN game means convergence in some
space of parameters? What hints does this give for the generator's design, and how does it
compare with the flourishing but almost exclusively experimental literature on the
subject?

In this paper, we unveil a broad class of distributions for which such
convergence happens, namely deformed exponential families (a wide
superset of exponential families), and we show tight connections with the three
other key GAN parameters: loss, game and architecture. In particular, we show that current deep architectures are
able to factorize a very large number of
such densities using an especially compact design, which displays the power of deep architectures and their concinnity in
the $f$-GAN game. This result holds under a sufficient condition on the
\textit{activation functions}, which turns out to be
satisfied by popular choices. The key to our results is a variational
generalization of an old theorem that relates the KL divergence between regular exponential
families to divergences between their natural
parameters. We complete this picture with additional results and experimental insights on
how these results may be used to ground further improvements of GAN
architectures, via (i) a principled design of the activation
functions in the generator and (ii) an explicit integration of the link function of proper composite losses in the discriminator.
\end{abstract}