\begin{abstract}
The aggregation and denoising of crowd-labeled data is a task
that has gained increased significance with the advent of
crowdsourcing platforms and massive datasets. In this paper, we
propose a permutation-based model for crowd-labeled data that is a
significant generalization of the commonly used Dawid-Skene model, and
introduce a new error metric by which to compare different estimators.
Working in a high-dimensional non-asymptotic framework that allows
both the number of workers and the number of tasks to scale, we derive
optimal rates of convergence for the permutation-based model. We show
that the permutation-based model offers significant robustness in
estimation due to its richness, while surprisingly incurring only a
small additional statistical penalty as compared to the Dawid-Skene model.
Finally, we propose a computationally efficient method, called the
{\sc OBI-WAN} estimator, that is uniformly optimal over a class
intermediate between the permutation-based and the Dawid-Skene models,
and is uniformly consistent over the entire permutation-based model
class. In contrast, the guarantees for estimators available in the prior
literature are suboptimal over the original Dawid-Skene model.
\end{abstract}

21: