\begin{abstract}
	We consider the problem of measuring how much a system
	reveals about its secret inputs.
	We work in the black-box setting: we assume no prior
	knowledge of the system's internals, and we run
	the system on chosen secrets and measure its leakage from the resulting outputs.
	Our goal is to estimate the Bayes risk,
	from which several of the most
	popular leakage measures (e.g., min-entropy
	leakage) can be derived.
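	As an illustration, the symbols below are introduced only for this purpose.
	Consider a finite set of secrets $\mathcal{S}$ with prior $\pi$, and view the system
	as a channel $C(y \mid s)$ over a finite set of outputs $\mathcal{Y}$.
	The Bayes risk $R^{*}$ is then the probability that an adversary who makes the optimal
	(maximum a posteriori) guess of the secret after observing the output guesses wrongly.
	Min-entropy leakage $\mathcal{L}$ follows from it directly:
	\[
		R^{*} = 1 - \sum_{y \in \mathcal{Y}} \max_{s \in \mathcal{S}} \pi(s)\, C(y \mid s),
		\qquad
		\mathcal{L} = \log_2 \frac{1 - R^{*}}{\max_{s \in \mathcal{S}} \pi(s)} .
	\]
	When the prior is known, an estimate of $R^{*}$ therefore yields an estimate of $\mathcal{L}$.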

	The state-of-the-art approach to estimating these
	leakage measures is the frequentist paradigm,
	which approximates the system's internals from
	the observed frequencies of its input-output pairs.
	Unfortunately, this approach does not scale to
	systems with large output spaces, for which it requires
	too many input-output examples.
	For the same reason, it cannot be applied to systems
	with continuous outputs (e.g., timing side channels, network traffic).
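	For intuition, consider a simplified plug-in form of the frequentist estimate.
	This is a sketch only and not necessarily the exact estimator implemented by \leakiest.
	From $n$ input-output examples, let $N(s, y)$ be the number of times secret $s$ produced output $y$. Then
	\[
		\hat{R}_{\mathrm{freq}} = 1 - \frac{1}{n} \sum_{y} \max_{s} N(s, y) .
	\]
	If every observed output is distinct, as is typical with continuous outputs, each $y$ contributes exactly $1$ to the sum.
	The estimate is then $\hat{R}_{\mathrm{freq}} = 0$ whatever the true risk, which wrongly suggests that the secret is always guessed correctly.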

	In this paper, we exploit an analogy between
	Machine Learning (ML) and black-box leakage estimation
	to show that the Bayes risk
	of a system can be estimated using a class
	of ML methods, the \textit{universally consistent}
	learning rules. These rules can exploit patterns in the
	input-output examples to speed up the convergence of the estimates,
	while retaining formal optimality guarantees.
	We focus on one family of them,
	the nearest neighbor rules,
	and show that they
	significantly reduce
	the number of black-box queries required
	for a precise estimate
	whenever nearby outputs tend to be produced by
	the same secret;
	furthermore, some of them can
	handle systems with continuous outputs.
	We illustrate the applicability of
	these techniques on both synthetic and real-world data,
	and we compare them with \leakiest, the state-of-the-art tool
	based on the frequentist approach.
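	For reference, the standard facts behind these claims assume i.i.d.\ examples and write $R^{*}$ for the Bayes risk.
	A learning rule $g_n$ trained on $n$ examples is universally consistent if its expected error probability converges to $R^{*}$ for every joint distribution of secrets and outputs.
	By Stone's theorem, the $k_n$-nearest neighbor rule on real-valued output vectors, with distance ties broken at random, is universally consistent whenever
	\[
		k_n \to \infty \quad\mbox{and}\quad \frac{k_n}{n} \to 0 \qquad (n \to \infty),
	\]
	for example with $k_n = \lceil \log n \rceil$.
	This motivates using the error of such a rule, measured on held-out examples, as an estimate of $R^{*}$.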
\end{abstract}