1: \begin{abstract}
2: \vspace{-0.33cm}
3: Monocular \threeD detectors achieve remarkable performance on cars and smaller objects.
4: However, their performance drops on larger objects, which can lead to fatal accidents.
5: Some attribute these failures to training data scarcity or to the receptive field requirements of large objects.
6: In this paper, we highlight this understudied problem of generalization to large objects.
7: We find that modern frontal detectors struggle to generalize to large objects even on nearly balanced datasets.
8: We argue that the cause of failure is the sensitivity of depth regression losses to the noise of larger objects.
9: To bridge this gap, we comprehensively investigate regression and dice losses, examining their robustness under varying error levels and object sizes.
10: For a simplified case, we mathematically prove that the dice loss yields better noise robustness and model convergence for large objects than regression losses.
11: Leveraging our theoretical insights, we propose \methodName (\methodNameFull) as the first step towards generalizing to large objects.
12: \methodName effectively integrates BEV segmentation on foreground objects for \threeD detection, with the segmentation head trained with the dice loss.
13: \methodName achieves SoTA results on the \kittiThreeSixty leaderboard and improves existing detectors on the \nuscenes leaderboard, particularly for large objects.
14: \end{abstract}
15: