abstract:255a14b504018ff5.tex

1: \begin{abstract}

2: \vspace{-0.33cm}

3:     Monocular \threeD detectors achieve remarkable performance on cars and smaller objects.

4:     However, their performance drops on larger objects, which can lead to fatal accidents.

5:     Some attribute these failures to training data scarcity or to the receptive field requirements of large objects.

6:     In this paper, we highlight this understudied problem of generalization to large objects.

7:     We find that modern frontal detectors struggle to generalize to large objects even on nearly balanced datasets.

8:     We argue that the cause of failure is the sensitivity of depth regression losses to the noise of larger objects.

9:     To bridge this gap, we comprehensively investigate regression and dice losses, examining their robustness under varying error levels and object sizes.

10:     For a simplified case, we mathematically prove that the dice loss leads to superior noise robustness and model convergence for large objects compared to regression losses.

11:     Leveraging our theoretical insights, we propose \methodName (\methodNameFull) as the first step towards generalizing to large objects.

12:     \methodName effectively integrates BEV segmentation on foreground objects for \threeD detection, with the segmentation head trained with the dice loss.

13:     \methodName achieves state-of-the-art (SoTA) results on the \kittiThreeSixty leaderboard and improves existing detectors on the \nuscenes leaderboard, particularly for large objects.

14: \end{abstract}

15: