\begin{abstract}
Self-supervised video hashing (SSVH) is a practical task in video indexing and retrieval.
Although Transformers are predominant in SSVH for their impressive temporal modeling capabilities, they often suffer from computational and memory inefficiencies.
Drawing inspiration from Mamba, an advanced state-space model, we explore its potential in SSVH to achieve a better balance between efficacy and efficiency.
We introduce \modelname{}, a Mamba-based video hashing model with an improved self-supervised learning paradigm.
Specifically, we design bidirectional Mamba layers for both the encoder and decoder; thanks to a data-dependent selective scanning mechanism with linear complexity in sequence length, these layers capture temporal relationships both effectively and efficiently.
In our learning strategy, we transform global semantics in the feature space into semantically consistent and discriminative hash centers, and then apply a center alignment loss as a global learning signal.
Our self-local-global (SLG) paradigm significantly improves learning efficiency, leading to faster and better convergence.
Extensive experiments demonstrate \modelname{}'s improvements over state-of-the-art methods, superior transferability, and scalable advantages in inference efficiency.
\end{abstract}
