
\begin{abstract}

Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks.

With advances in sensor manufacturing and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they struggle to generalize to unseen attacks and deployment conditions.

These challenges arise from

(1) modality unreliability, where some modalities, such as depth and infrared, undergo significant domain shifts across environments, causing unreliable information to spread during cross-modal feature fusion,

and (2) modality imbalance, where over-reliance on a dominant modality during training hinders the convergence of the others, reducing effectiveness against attack types that cannot be distinguished using the dominant modality alone.

To address modality unreliability, we propose the \textbf{U}ncertainty-Guided Cross-\textbf{Adapter} (\textbf{\adName}), which identifies unreliably detected regions within each modality and suppresses their impact on other modalities.

For modality imbalance, we propose a \textbf{Re}balanced Modality \textbf{Grad}ient Modulation (\textbf{\gradName}) strategy to rebalance the convergence speed of all modalities by adaptively adjusting their gradients.

In addition, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released at \url{https://github.com/OMGGGGG/mmdg}.


\end{abstract}
