\begin{abstract}
\vspace{-0.3cm}
Learning-based visual relocalizers exhibit leading pose accuracy, but require hours or days of training.
Since training must be repeated for each new scene, long training times make learning-based relocalization impractical for most applications, despite its promise of high accuracy.
In this paper, we show how such a system can achieve the same accuracy in less than 5 minutes of training.
We start from the obvious: a relocalization network can be split into a scene-agnostic feature backbone and a scene-specific prediction head.
Less obvious: using an MLP prediction head allows us to optimize across thousands of viewpoints simultaneously in every training iteration.
This leads to stable and extremely fast convergence.
Furthermore, we replace effective but slow end-to-end training through a robust pose solver with a curriculum over a reprojection loss.
Our approach does not require privileged knowledge, such as depth maps or a 3D model, for speedy training.
Overall, our approach is up to 300x faster in mapping than state-of-the-art scene coordinate regression, while keeping accuracy on par.
Code is available at \url{https://nianticlabs.github.io/ace}
\end{abstract}