\begin{abstract}
		Robustness and discriminative power are two fundamental requirements in visual object tracking.
		%
		In most tracking paradigms, however, we find that the features extracted by the popular Siamese-like networks cannot fully discriminate the tracked targets from distractor objects, hindering such trackers from meeting both requirements simultaneously.
		%
		While most methods focus on designing robust correlation operations, we propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
		%
		In contrast to Siamese-like feature extraction, our network deeply embeds cross-image feature correlation into multiple layers of the feature network.
		%
		By extensively matching the features of the template and search images across multiple layers, it can suppress non-target features, resulting in instance-varying feature extraction.
		%
		The output features of the search image can be used directly to predict target locations without an extra correlation step.
		%
		Moreover, our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than existing methods.
		%
		Extensive experiments show that our method achieves state-of-the-art results while running in real time. Our feature networks can also be seamlessly applied to existing tracking pipelines to improve their tracking performance.
		%
		Code will be made available.
	\end{abstract}