\begin{abstract}
  Class-agnostic object tracking is particularly difficult in cluttered environments, because target-specific discriminative models cannot be learned \emph{a priori}. Inspired by how the human visual cortex uses spatial attention and separate ``where'' and ``what'' processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single-object tracking in videos. The first attention layer discards most of the background by selecting a region that contains the object of interest, while the subsequent layers focus on visual features \emph{particular} to the tracked object.
  The framework is fully differentiable and can be trained in a purely data-driven fashion with gradient-based methods. To improve training convergence, we augment the loss function with terms for several auxiliary tasks relevant to tracking. We evaluate the proposed model on two datasets: pedestrian tracking on the KTH activity recognition dataset, and the more difficult KITTI object tracking dataset.
\end{abstract}