1: \begin{abstract}

2: Robust explanations of machine learning models are critical to establishing human trust in those models.

3: Because of limited cognitive capacity,

4: most humans can only interpret the top few salient features.

5: It is therefore critical to make the top salient features robust to adversarial attacks,

6: especially those against the more vulnerable gradient-based explanations.

7: Existing defenses

8: measure robustness using $\ell_p$-norms,

9: which offer weaker protection.

10: % Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks.

11: % % and many works propose defense strategies for better robustness.

12: % However,

13: % existing works

14: % % on either attack or defense side,

15: % measure the explanation robustness based on $\ell_p$-norm,

16: % which can be counter-intuitive to humans,

17: % who only pay attention to the top few salient features.

18: We define explanation thickness to measure the ranking stability of salient features,

19: % We then present a new practical adversarial attacking goal for manipulating explanation rankings.

20: % To mitigate the ranking-based attacks

21: and derive tractable surrogate bounds of the thickness, from which we design the \textit{R2ET} algorithm to efficiently maximize the thickness and anchor the top salient features.

22: Theoretically, we prove a connection between R2ET and adversarial training.

23: % ; we further formulate a constrained multi-objective optimization problem and an attacking algorithm to prove a \textit{global} convergence rate, linking thickness maximization to ranking stability.

24: Experiments with a wide spectrum of network architectures and data modalities,

25: including brain networks,

26: demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.

27: % We also experimentally check the coherence between the thickness and the explanation robustness.

28: % even more robust than widely accepted Hessian-based curvature smoothing approaches.

29: \end{abstract}

30: