\begin{abstract}

We propose a novel approach for 3D video synthesis that represents multi-view video recordings of a dynamic real-world scene in a compact yet expressive form, enabling high-quality view synthesis and motion interpolation.
Our approach carries the high quality and compactness of static neural radiance fields over to a model-free, dynamic setting.
At the core of our approach is a novel time-conditioned neural radiance field that represents scene dynamics using a set of compact latent codes.

To exploit the fact that changes between adjacent frames of a video are typically small and locally consistent, we propose two novel strategies for efficient training of our neural network:
1) a hierarchical training scheme, and
2) an importance sampling strategy that selects the next rays for training based on the temporal variation of the input videos.
In combination, these two strategies significantly accelerate training, lead to fast convergence, and enable high-quality results.

Our learned representation is highly compact: it represents a 10-second, 30~FPS multi-view video recording captured by 18 cameras with a model size of just 28\,MB.
We demonstrate that our method can render high-fidelity, wide-angle novel views at resolutions above 1K, even for highly complex and dynamic scenes.
An extensive qualitative and quantitative evaluation shows that our approach outperforms the current state of the art.
Additional videos and information are available at \url{https://neural-3d-video.github.io}.

\end{abstract}
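The importance sampling strategy above is described only at a high level. As an illustration, the following Python/NumPy sketch shows one way to bias ray selection toward temporally varying pixels. It is not the paper's implementation, and the weighting is an assumption: each ray of one camera (frame $t$, pixel $(y, x)$) is weighted by its mean absolute deviation from that pixel's temporal median, floored at a hypothetical parameter \texttt{alpha}, and rays are then drawn with probability proportional to these weights. The function names \texttt{temporal\_variation\_weights} and \texttt{sample\_rays} are likewise hypothetical.

```python
import numpy as np


def temporal_variation_weights(video: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Hypothetical per-ray sampling weights for one camera.

    video: float array of shape (T, H, W, 3) with values in [0, 1].
    Returns an array of shape (T, H, W). The weighting (deviation from the
    temporal median, floored at alpha) is an assumption for illustration.
    """
    median = np.median(video, axis=0, keepdims=True)   # (1, H, W, 3) static "background"
    deviation = np.abs(video - median).mean(axis=-1)   # (T, H, W), averaged over RGB
    # Floor at alpha so static pixels keep a nonzero chance of being sampled.
    return np.maximum(deviation, alpha)


def sample_rays(weights: np.ndarray, n_rays: int, seed: int = 0):
    """Draw n_rays (t, y, x) indices with probability proportional to weights."""
    rng = np.random.default_rng(seed)
    probs = weights.ravel() / weights.sum()
    flat = rng.choice(probs.size, size=n_rays, replace=True, p=probs)
    return np.unravel_index(flat, weights.shape)


# Toy input: a static gray video in which one 2x2 patch flickers every other frame.
T, H, W = 10, 8, 8
video = np.full((T, H, W, 3), 0.5)
video[::2, :2, :2] = 1.0

w = temporal_variation_weights(video)
t, y, x = sample_rays(w, n_rays=1000)
patch_fraction = np.mean((y < 2) & (x < 2))  # share of sampled rays inside the patch
print(f"fraction of rays in flickering patch: {patch_fraction:.2f}")
```

In this toy input, the flickering patch covers 6.25\% of all rays but receives 25\% of the sampling probability, so training would revisit dynamic pixels four times as often as under uniform sampling. Lowering \texttt{alpha} concentrates samples further on moving content at the cost of rarely revisiting the static background.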
