abstract:8fa0ea9192fe128f.tex

1: \begin{abstract}

2:

3: %\vspace*{-.04cm}

4:

5:

6: We show that the herding procedure of~\citet{welling2009herding} takes exactly the form of a standard convex optimization algorithm---namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations.

7: Our experiments shed more light on the learning bias of herding: they indicate that

8: while we can improve over herding on the task of approximating

9: integrals, the original herding algorithm

10: more often approaches the maximum

11: entropy distribution.

12:

13: \vspace*{-.1cm}

14:

15:

16: \end{abstract}

17: