abstract:ca663c0194132b3b.tex

1: \begin{abstract}

2: \noindent Machine learning models are used in many sensitive areas where, besides predictive accuracy, their comprehensibility is also important.

3: Interpretability of prediction models is necessary to determine their biases and the causes of their errors, and is a prerequisite for users' confidence.

4: For complex state-of-the-art black-box models, post-hoc model-independent explanation techniques are an established solution. Popular and effective techniques, such as IME, LIME, and SHAP, perturb instance features to explain individual predictions. Recently, \citet{adversarial} called their robustness into question by showing that their outcomes can be manipulated because of the poor perturbation sampling they employ. This weakness would allow Dieselgate-type cheating by owners of sensitive models, who could deceive inspection and hide potentially unethical or illegal biases in their predictive models. This could undermine public trust in machine learning models and give rise to legal restrictions on their use.

5:

6: We show that better sampling in these explanation methods prevents malicious manipulations. The proposed sampling uses data generators that learn the training set distribution and generate new perturbation instances that are much more similar to the training set. We show that the improved sampling increases the robustness of LIME and SHAP, while the previously untested method IME is already the most robust of the three.

7:

8: %and speeds up the convergence of the IME explanation method.

9: \end{abstract}

10: