abstract:65c316d6378d292e.tex

1: \begin{abstract}

2: \vspace{-0.3cm}

3: Learning-based visual relocalizers exhibit leading pose accuracy, but require hours or days of training.

4: Since training must be repeated for every new scene, long training times make learning-based relocalization impractical for most applications, despite its promise of high accuracy.

5: In this paper, we show how such a system can achieve the same accuracy in less than 5 minutes.

6: We start from the obvious: a relocalization network can be split into a scene-agnostic feature backbone and a scene-specific prediction head.

7: Less obvious: using an MLP prediction head allows us to optimize across thousands of viewpoints simultaneously in every training iteration.

8: This leads to stable and extremely fast convergence.

9: Furthermore, we replace effective but slow end-to-end training through a robust pose solver with a curriculum over a reprojection loss.

10: Our approach does not require privileged knowledge, such as depth maps or a 3D model, for speedy training.

11: Overall, our approach is up to 300$\times$ faster in mapping than state-of-the-art scene coordinate regression, while keeping accuracy on par.

12: Code is available: \url{https://nianticlabs.github.io/ace}

13: \end{abstract}

14: