abstract:b6b941c4e0f0b6c9.tex

1: \begin{abstract}

2: Fine-grained recognition is challenging because subtle, local inter-class differences must be distinguished in the presence of large intra-class variations such as pose.

3: A key to addressing this problem is localizing discriminative parts so that pose-invariant features can be extracted.

4: However, ground-truth part annotations can be expensive to acquire.

5: Moreover, it is hard to define parts for many fine-grained classes.

6: This work introduces \textbf{Fully Convolutional Attention Networks (FCANs)}, a reinforcement learning framework that optimally glimpses local discriminative regions and adapts to different fine-grained domains.

7: Compared to previous methods, our approach enjoys three advantages:

8: %1) the three components including feature extraction, visual attention and fine-grained classification are unified in an end-to-end system;

9: 1) the weakly-supervised reinforcement learning procedure requires no expensive part annotations;

10: 2) the fully-convolutional architecture speeds up both training and testing;

11: 3) the greedy reward strategy accelerates the convergence of learning.

12: % it is capable of simultaneous focusing its glimpse on multiple visual attention regions.

13: % Feng: I am not sure this is an advantage

14: We demonstrate the effectiveness of our method with extensive experiments on four challenging fine-grained benchmark datasets: CUB-200-2011, Stanford Dogs, Stanford Cars, and Food-101.

15: \end{abstract}

16: