255a14b504018ff5.tex
1: \begin{abstract}
2: \vspace{-0.33cm}
3:     Monocular \threeD detectors achieve remarkable performance on cars and smaller objects. 
4:     However, their performance drops on larger objects, which can lead to fatal accidents. 
5:     Some attribute these failures to training data scarcity or to the receptive field requirements of large objects.
6:     In this paper, we highlight this understudied problem of generalization to large objects.
7:     We find that modern frontal detectors struggle to generalize to large objects even on nearly balanced datasets.
8:     We argue that the cause of failure is the sensitivity of depth regression losses to the noise of larger objects.
9:     To bridge this gap, we comprehensively investigate regression and dice losses, examining their robustness under varying error levels and object sizes.
10:     For a simplified case, we mathematically prove that the dice loss leads to superior noise robustness and model convergence for large objects compared to regression losses.
11:     Leveraging our theoretical insights, we propose \methodName (\methodNameFull) as the first step towards generalizing to large objects.
12:     \methodName effectively integrates BEV segmentation of foreground objects for \threeD detection, with the segmentation head trained using the dice loss.
13:     \methodName achieves SoTA results on the \kittiThreeSixty leaderboard and improves existing detectors on the \nuscenes leaderboard, particularly for large objects. 
14: \end{abstract}
15: