\begin{abstract}
We propose a novel approach for 3D video synthesis that represents multi-view video recordings of a dynamic real-world scene in a compact yet expressive form, enabling high-quality view synthesis and motion interpolation.
Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting.
At the core of our approach is a novel time-conditioned neural radiance field that represents scene dynamics using a set of compact latent codes.
To exploit the fact that changes between adjacent frames of a video are typically small and locally consistent, we propose two novel strategies for efficient training of our neural network:
1) an efficient hierarchical training scheme, and
2) an importance sampling strategy that selects the next rays for training based on the temporal variation of the input videos.
In combination, these two strategies significantly speed up training, lead to fast convergence, and enable high-quality results.
Our learned representation is highly compact: it represents a 10-second, 30 FPS multi-view video recording from 18 cameras with a model size of just 28\,MB.
We demonstrate that our method can render high-fidelity wide-angle novel views at over 1K resolution, even for highly complex and dynamic scenes.
We perform an extensive qualitative and quantitative evaluation showing that our approach outperforms the current state of the art.
We include additional video and information at: \url{https://neural-3d-video.github.io}.
\end{abstract}