\begin{abstract}
	Traditional error detection approaches require user-defined parameters and rules. Thus, the user has to know both the error detection system and the data.
	However, we can also formulate error detection as a semi-supervised classification problem that requires only domain expertise.
	The challenges for such an approach are twofold: (1)~to represent the data in a way that enables a classification model to identify various kinds of data errors, and (2)~to pick the most promising data values for learning.
	In this paper, we address these challenges with \system{}, our new example-driven error detection method.
	First, we present a new two-dimensional multi-classifier sampling strategy for active learning.
	Second, we propose novel multi-column features.
	Combined, these techniques let the classification task converge quickly while achieving high detection accuracy.
	On several real-world datasets, \system{} requires, on average, less than 1\%~of the labels to outperform existing error detection approaches.

	This report extends the peer-reviewed paper \emph{ED2: A Case for Active Learning in Error Detection}~\cite{neutatz2019ed2}. All source code related to this project is available on GitHub.\footnote{\url{https://github.com/BigDaMa/ExampleDrivenErrorDetection}}
\end{abstract}