\begin{abstract}
	Traditional error detection approaches require user-defined parameters and rules, so the user has to know both the error detection system and the data.
	However, we can also formulate error detection as a semi-supervised classification problem that only requires domain expertise.
	The challenges for such an approach are twofold: (1)~to represent the data in a way that enables a classification model to identify various kinds of data errors, and (2)~to pick the most promising data values for learning.
	In this paper, we address these challenges with \system{}, our new example-driven error detection method.
	First, we present a new two-dimensional multi-classifier sampling strategy for active learning.
	Second, we propose novel multi-column features.
	Combined, these techniques make the classification task converge quickly while achieving high detection accuracy.
	On several real-world datasets, \system{} requires, on average, labels for less than 1\%~of the data to outperform existing error detection approaches.

	This report extends the peer-reviewed paper \emph{ED2: A Case for Active Learning in Error Detection}~\cite{neutatz2019ed2}. All source code related to this project is available on GitHub\footnote{\url{https://github.com/BigDaMa/ExampleDrivenErrorDetection}}.
\end{abstract}