\begin{abstract}
Robust explanations of machine learning models are critical to establishing human trust in the models.
Due to limited cognitive capacity,
most humans can interpret only the top few salient features.
It is therefore critical to make the top salient features robust to adversarial attacks,
especially those against the more vulnerable gradient-based explanations.
Existing defenses
measure robustness using $\ell_p$-norms,
which offer weaker protection.
% Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks.
% % and many works propose defense strategies for better robustness.
% However,
% existing works
% % on either attack or defense side,
% measure the explanation robustness based on $\ell_p$-norm,
% which can be counter-intuitive to humans, 
% who only pay attention to the top few salient features.
We define explanation thickness to measure the ranking stability of salient features,
% We then present a new practical adversarial attacking goal for manipulating explanation rankings.
% To mitigate the ranking-based attacks 
and derive tractable surrogate bounds of the thickness to design the \textit{R2ET} algorithm, which efficiently maximizes the thickness and anchors the top salient features.
Theoretically, we prove a connection between R2ET and adversarial training.
% ; we further formulate a constrained multi-objective optimization problem and an attacking algorithm to prove a \textit{global} convergence rate, linking thickness maximization to ranking stability.
Experiments with a wide spectrum of network architectures and data modalities,
including brain networks,
demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
% We also experimentally check the coherence between the thickness and the explanation robustness.
% even more robust than widely accepted Hessian-based curvature smoothing approaches.
\end{abstract}
