abstract:16c6b0fa50729371.tex

1: \begin{abstract}

2:

3: A recent trend in generative modeling is building 3D-aware generators from 2D image collections.

4: To induce the 3D bias, such models typically rely on volumetric rendering, which is expensive to employ at high resolutions.

5: Over the past few months, more than ten works have addressed this scaling issue by training a separate 2D decoder to upsample a low-resolution image (or a feature tensor) produced by a pure 3D generator.

6: But this solution comes at a cost: not only does it break multi-view consistency (i.e., shape and texture change when the camera moves), but it also learns geometry at low fidelity.

7: In this work, we show that a high-resolution 3D generator with state-of-the-art image quality can be obtained by following a completely different route: simply training the model patch-wise.

8: We revisit and improve this optimization scheme in two ways.

9: First, we design a location- and scale-aware discriminator to work on patches of different proportions and spatial positions.

10: Second, we modify the patch sampling strategy, basing it on an annealed beta distribution, to stabilize training and accelerate convergence.

11: The resulting model, named \modelname, is an efficient, high-resolution, pure 3D generator, and we test it on four datasets (two introduced in this work) at $256^2$ and $512^2$ resolutions.

12: It obtains state-of-the-art image quality and high-fidelity geometry, and trains \({\approx} 2.5 \times\) \textit{faster} than its upsampler-based counterparts.

13:

14: \begin{center}

15: Code/data/visualizations: \href{\projecturl}{\projecturl}

16: \end{center}

17: \end{abstract}

18: