\begin{abstract}
Machine Learning (ML) techniques can facilitate the automation of \underline{mal}icious soft\underline{ware} (malware for short) detection, but they suffer from evasion attacks. Many studies counter such attacks heuristically, lacking theoretical guarantees and defense effectiveness. In this paper, we propose a new adversarial training framework, termed \underline{P}rincipled \underline{A}dversarial Malware \underline{D}etection (PAD), which offers convergence guarantees for robust optimization methods. PAD relies on a learnable convex measurement that quantifies distribution-wise discrete perturbations to protect malware detectors from adversaries,
whereby adversarial training of smooth detectors can be performed in a theoretically principled manner. To promote defense effectiveness, we instantiate PAD with a new mixture of attacks that enhances deep neural network-based measurements and malware detectors. Experimental results on two Android malware datasets show that: (i) the proposed method significantly outperforms state-of-the-art defenses; (ii) it can harden ML-based malware detection against 27 evasion attacks with detection accuracies greater than 83.45\%, at the cost of an accuracy decrease smaller than 2.16\% in the absence of attacks; and
(iii) it matches or outperforms many anti-malware scanners in VirusTotal against realistic adversarial malware.
\end{abstract}