\begin{abstract}
  Deep Convolutional Neural Networks (DCNNs) commonly use generic
  `max-pooling' (MP) layers to extract deformation-invariant features, but we
  argue in favor of a more refined treatment. First, we introduce {\em
    epitomic convolution} as a building block alternative to the common
  convolution-MP cascade of DCNNs; while having identical complexity to MP,
  Epitomic Convolution allows for parameter sharing across different filters,
  resulting in faster convergence and better generalization. Second, we
  introduce a Multiple Instance Learning approach to explicitly accommodate
  global translation and scaling when training a DCNN exclusively with class
  labels. For this we rely on a {\em `patchwork'} data structure that
  efficiently lays out all image scales and positions as candidates to a
  DCNN. Factoring global and local deformations allows a DCNN to `focus its
  resources' on the treatment of non-rigid deformations and yields a
  substantial classification accuracy improvement. Third, further pursuing
  this idea, we develop an efficient DCNN sliding window object detector that
  employs explicit search over position, scale, and aspect ratio. We
  provide competitive image classification and localization results on the
  ImageNet dataset and object detection results on the Pascal VOC 2007
  benchmark.
\end{abstract}
