\begin{abstract}

A recent trend in generative modeling is building 3D-aware generators from 2D image collections.
To induce the 3D bias, such models typically rely on volumetric rendering, which is expensive to employ at high resolutions.
Over the past months, more than ten works have addressed this scaling issue by training a separate 2D decoder to upsample a low-resolution image (or a feature tensor) produced by a pure 3D generator.
But this solution comes at a cost: not only does it break multi-view consistency (i.e., shape and texture change when the camera moves), but it also learns geometry at low fidelity.
In this work, we show that it is possible to obtain a high-resolution 3D generator with state-of-the-art image quality by following a completely different route: simply training the model patch-wise.
We revisit and improve this optimization scheme in two ways.
First, we design a location- and scale-aware discriminator to work on patches of different proportions and spatial positions.
Second, we modify the patch sampling strategy to use an annealed beta distribution, which stabilizes training and accelerates convergence.
The resulting model, named \modelname, is an efficient, high-resolution, pure 3D generator, and we test it on four datasets (two introduced in this work) at $256^2$ and $512^2$ resolutions.
It obtains state-of-the-art image quality and high-fidelity geometry, and trains \({\approx} 2.5 \times\) \textit{faster} than its upsampler-based counterparts.

\begin{center}
Code/data/visualizations: \href{\projecturl}{\projecturl}
\end{center}
\end{abstract}
