
\begin{abstract}


Deep machine learning models are increasingly deployed in the wild to provide services to users. Adversaries may steal the knowledge of these valuable models by training substitute models on the inference results of the targeted deployed models. Recent data-free model stealing methods have been shown to be effective at extracting the knowledge of the target model without using real query examples, but they assume rich inference information, e.g., class probabilities and logits. Moreover, they are all based on competing generator-substitute networks and hence suffer from training instability. In this paper, we propose a data-free model stealing framework, \alg, which is based on collaborative generator-substitute networks and only requires the target model to provide label predictions for synthetic query examples. The core of our method is a model stealing optimization consisting of two collaborative models: (i) the substitute model, which imitates the target model through the synthetic query examples and their inferred labels, and (ii) the generator, which synthesizes images such that the confidence of the substitute model over each query example is maximized. We propose a novel coordinate descent training procedure and analyze its convergence. We also empirically evaluate the trained substitute model on three datasets and its application to black-box adversarial attacks. Our results show that the accuracy of our trained substitute model and the adversarial attack success rate over it can be up to 33\% and 40\% higher, respectively, than those of state-of-the-art data-free black-box attacks.


\end{abstract}
