\begin{abstract}

%\vspace*{-.04cm}

We show that the herding procedure of~\citet{welling2009herding} takes exactly the form of a standard convex optimization algorithm, namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations.
Our experiments shed more light on the learning bias of herding: they indicate that, while we can improve over herding on the task of approximating integrals, the original herding algorithm more often approaches the maximum entropy distribution.

\vspace*{-.1cm}

\end{abstract}