1: \begin{abstract}
2: Iterative hard thresholding (IHT) is a projected gradient descent algorithm known to achieve state-of-the-art performance on a wide range of structured estimation problems, such as sparse inference.
3: In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions.
4: %thus is a promising candidate for learning distributions subject to structured sparsity constraints. Motivated with its success, we consider IHT for learning sparse discrete distributions.
5: We study the hardness of using IHT on the space of measures.
6: As a practical alternative, we propose a greedy approximate projection that captures appropriate notions of sparsity in distributions while satisfying the simplex constraint, and we investigate the convergence behavior of the resulting procedure in various settings.
7: Our results show, both in theory and in practice, that IHT can achieve state-of-the-art results for learning sparse distributions.
8: \end{abstract}
9: