\begin{abstract}
Lossy compression and clustering fundamentally involve a decision
about which features are relevant and which are not. The information
bottleneck method (IB) by Tishby, Pereira, and Bialek formalized this
notion as an information-theoretic optimization problem and proposed
an optimal tradeoff between throwing away as many bits as possible
and selectively keeping those that are most important. In the IB,
compression is measured by mutual information. Here, we introduce an
alternative formulation, which we call the deterministic information
bottleneck (DIB), that replaces mutual information with entropy and
that we argue better captures this notion of compression. As suggested
by its name, the solution to the DIB problem turns out to be a deterministic
encoder, or hard clustering, as opposed to the stochastic encoder,
or soft clustering, that is optimal under the IB. We compare the IB
and DIB on synthetic data, showing that the IB and DIB perform similarly
in terms of the IB cost function, but that the DIB significantly outperforms
the IB in terms of the DIB cost function. We also find empirically
that the DIB offers a considerable gain in computational efficiency
over the IB across a range of convergence parameters. Our derivation
of the DIB also suggests a method for continuously interpolating between
the soft clustering of the IB and the hard clustering of the DIB.
\end{abstract}