\begin{abstract}
The aggregation and denoising of crowd-labeled data is a task
that has gained increased significance with the advent of
crowdsourcing platforms and massive datasets. In this paper, we
propose a permutation-based model for crowd-labeled data that is a
significant generalization of the common Dawid-Skene model, and
introduce a new error metric by which to compare different estimators.
Working in a high-dimensional non-asymptotic framework that allows
both the number of workers and tasks to scale, we derive optimal rates
of convergence for the permutation-based model. We show that the
permutation-based model offers significant robustness in estimation
due to its richness, while surprisingly incurring only a small
additional statistical penalty as compared to the Dawid-Skene model.
Finally, we propose a computationally efficient method, called the
{\sc OBI-WAN} estimator, that is uniformly optimal over a class
intermediate between the permutation-based and the Dawid-Skene models,
and is uniformly consistent over the entire permutation-based model
class. In contrast, the guarantees for estimators available in prior
literature are sub-optimal over the original Dawid-Skene model.
\end{abstract}
