\begin{abstract}

All existing 3D-from-2D generators are designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location and orientation, and the camera always points to the center of the scene.
This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses.
In this work, we develop \textit{\modelfullname\ (\modelname)}: a 3D synthesis framework with more general assumptions about the training data, and show that it scales to very challenging datasets, like ImageNet.
Our model is based on three new ideas.
First, we incorporate an \textit{inaccurate} off-the-shelf depth estimator into 3D GAN training via a special depth adaptation module to handle the imprecision.
Second, we create a flexible camera model, together with a regularization strategy, whose distribution parameters are learned during training.
Finally, we extend the recent ideas of transferring knowledge from pretrained classifiers into GANs to patch-wise trained models by employing a simple distillation-based technique on top of the discriminator.
This technique achieves more stable training than existing methods and speeds up convergence by at least 40\%.
We evaluate our model on four datasets: SDIP Dogs $256^2$, SDIP Elephants $256^2$, LSUN Horses $256^2$, and ImageNet $256^2$, and demonstrate that 3DGP outperforms the recent state of the art in terms of both texture and geometry quality.

\begin{center}
Code and visualizations: \projecthref
\end{center}

\end{abstract}
