1: \begin{abstract}
2: 		
3: 		Robustness and discrimination power are two fundamental requirements in visual object tracking. 
4: 		%
5: 		In most tracking paradigms, we find that the features extracted by the popular Siamese-like networks cannot fully discriminate between the tracked targets and distractor objects, which prevents these trackers from meeting both requirements simultaneously.
6: 		%
7: 		While most methods focus on designing robust correlation operations, we propose a novel target-dependent feature network inspired by the self-/cross-attention scheme. 
8: 		%
9: 		In contrast to Siamese-like feature extraction,
10: 		%
11: 		our network deeply embeds cross-image feature correlation in multiple layers of the feature network. 
12: 		%
13: 		By extensively matching the features of the two images through multiple layers, it is able to suppress non-target features, resulting in instance-varying feature extraction.
14: 		%
15: 		The output features of the search image can be directly used for predicting target locations without an extra correlation step.
16: 		%
17: 		Moreover, our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than existing methods.
18: 		%
19: 		Extensive experiments show that our method achieves state-of-the-art results while running in real time. Our feature networks can also be applied seamlessly to existing tracking pipelines to improve tracking performance. %
20: 		Code will be available.
21: 		
22: 	\end{abstract}
23: