\begin{abstract}
Self-supervised video hashing (SSVH) is a practical task in video indexing and retrieval.
Although Transformers are predominant in SSVH for their impressive temporal modeling capabilities, they often suffer from computational and memory inefficiencies.
Drawing inspiration from Mamba, an advanced state-space model, we explore its potential in SSVH to achieve a better balance between efficacy and efficiency.
We introduce \modelname{}, a Mamba-based video hashing model with an improved self-supervised learning paradigm.
Specifically, we design bidirectional Mamba layers for both the encoder and decoder, which capture temporal relationships effectively and efficiently thanks to a data-dependent selective scanning mechanism with linear complexity in sequence length.
In our learning strategy, we transform global semantics in the feature space into semantically consistent and discriminative hash centers, and then apply a center alignment loss as a global learning signal.
Our self-local-global (SLG) paradigm significantly improves learning efficiency, leading to faster and better convergence.
Extensive experiments demonstrate that \modelname{} improves over state-of-the-art methods, offers superior transferability, and provides scalable advantages in inference efficiency.
\end{abstract}