\begin{abstract}
We consider the problem of measuring how much a system
reveals about its secret inputs.
We work in the black-box setting: we assume no prior
knowledge of the system's internals, and we run
the system on chosen secrets and measure its leakage from the respective outputs.
Our goal is to estimate the Bayes risk,
from which one can derive some of the most
popular leakage measures (e.g., min-entropy
leakage).

The state-of-the-art method for estimating these
leakage measures is the frequentist paradigm,
which approximates the system's internals by
looking at the frequencies of its inputs and outputs.
Unfortunately, this does not scale to
systems with large output spaces, where it would require
too many input-output examples.
Consequently, it also cannot be applied to systems
with continuous outputs (e.g., timing side channels, network traffic).

In this paper, we exploit an analogy between
Machine Learning (ML) and black-box leakage estimation
to show that the Bayes risk
of a system can be estimated by using a class
of ML methods: the \textit{universally consistent}
learning rules. These rules can exploit patterns in the
input-output examples to improve the estimates' convergence,
while retaining formal optimality guarantees.
We focus on a subset of them,
the nearest neighbor rules;
we show that they
significantly reduce
the number of black-box queries required
for a precise estimate
whenever nearby outputs tend to be produced by
the same secret;
furthermore, some of them can
tackle systems with continuous outputs.
We illustrate the applicability of
these techniques on both synthetic and real-world data,
and we compare them with the state-of-the-art tool,
\leakiest, which is based on the frequentist approach.
\end{abstract}
