\begin{abstract}
In recent years, two-stage multimodal object detection methods based on deep learning have garnered significant attention.
However, the detection accuracy of these methods drops notably on occluded 3D objects, and current two-stage methods converge slowly during training.
This paper introduces HPC-Net, a high-precision and rapidly convergent object detection network.
HPC-Net comprises three key components:
(1) \textbf{RP} (Replaceable Pooling), a set of interchangeable pooling methods applied to 3D voxels and 2D BEV images, which improves the network's detection accuracy, speed, robustness, and generalizability;
(2) \textbf{DACConv} (Depth Accelerated Convergence Convolution), which integrates two convolution strategies, one applied to each input feature map and one applied to each input channel, to preserve the network's feature extraction ability (i.e., high accuracy) while significantly accelerating convergence; and
(3) \textbf{MEFEM} (Multi-Scale Extended Receptive Field Feature Extraction Module), which addresses the low detection accuracy on heavily occluded and truncated 3D objects by employing a multi-scale feature fusion strategy and expanding the receptive field of the feature extraction module.
HPC-Net currently holds \textbf{the top position\footnote{As of the paper's completion date, October 10, 2023.}} in the \textbf{KITTI Car 2D Object Detection Ranking}. In the \textbf{KITTI Car 3D Object Detection Ranking}, it \textbf{ranks fourth overall and first in hard mode}.
\end{abstract}