\begin{abstract}
\justifying High-definition (HD) maps provide abundant and precise static environmental information about the driving scene and serve as a fundamental and indispensable component for planning in autonomous driving systems.
In this paper, we present \textbf{Map} \textbf{TR}ansformer, an end-to-end framework for online vectorized HD map construction.
We propose a unified permutation-equivalent modeling approach,
\ie, modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process.
We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning.
To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision.
The proposed method copes well with map elements of arbitrary shape.
It runs at real-time inference speed and achieves state-of-the-art performance on both the nuScenes and Argoverse2 datasets.
Extensive qualitative results show stable and robust map construction quality in complex and diverse driving scenes.
Code and more demos are available at \url{https://github.com/hustvl/MapTR} to facilitate further studies and applications.
\end{abstract}
