
\begin{abstract}
Lossy compression and clustering fundamentally involve a decision
about which features are relevant and which are not. The information
bottleneck method (IB) by Tishby, Pereira, and Bialek formalized this
notion as an information-theoretic optimization problem and proposed
an optimal tradeoff between throwing away as many bits as possible
and selectively keeping those that are most important. In the IB,
compression is measured by mutual information. Here, we introduce an
alternative formulation, which we call the deterministic information
bottleneck (DIB), that replaces mutual information with entropy and
that we argue better captures this notion of compression. As suggested
by its name, the solution to the DIB problem turns out to be a deterministic
encoder, or hard clustering, as opposed to the stochastic encoder,
or soft clustering, that is optimal under the IB. We compare the IB
and DIB on synthetic data, showing that the IB and DIB perform similarly
in terms of the IB cost function, but that the DIB significantly outperforms
the IB in terms of the DIB cost function. We also find empirically
that the DIB offers a considerable gain in computational efficiency
over the IB across a range of convergence parameters. Our derivation
of the DIB also suggests a method for continuously interpolating between
the soft clustering of the IB and the hard clustering of the DIB.
\end{abstract}