616484b26c8ab08e.tex
1: \begin{abstract}
2: Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. 
3: With advancements in sensor manufacturing and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they struggle to generalize to unseen attacks and deployment conditions. 
4: These challenges arise from 
5: (1) modality unreliability, where some modalities, such as depth and infrared, undergo significant domain shifts across environments, so that unreliable information spreads during cross-modal feature fusion, 
6: and (2) modality imbalance, where over-reliance on a dominant modality during training hinders the convergence of the others, reducing effectiveness against attack types that cannot be distinguished using the dominant modality alone.
7: To address modality unreliability, we propose the \textbf{U}ncertainty-Guided Cross-\textbf{Adapter} (\textbf{\adName}) to identify unreliably detected regions within each modality and suppress their impact on other modalities. 
8: For modality imbalance, we propose a \textbf{Re}balanced Modality \textbf{Grad}ient Modulation (\textbf{\gradName}) strategy to rebalance the convergence speed of all modalities by adaptively adjusting their gradients.
9: In addition, we provide the first large-scale benchmark for evaluating multi-modal FAS performance under domain generalization scenarios. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Source code and protocols will be released at \url{https://github.com/OMGGGGG/mmdg}.
10: 
11: \end{abstract}
12: