
\begin{abstract}

  Entity resolution aims to identify records that represent the same real-world entity in one or more datasets. A major challenge in learning-based entity resolution is how to reduce the labeling cost of training. Because the number of candidate record pairs grows quadratically with the number of records, labeling is a costly task that requires significant effort from human experts. However, without sufficient training data, a powerful machine learning model may overfit. This challenge is further aggravated when the underlying data distribution is highly imbalanced, as is common in entity resolution applications. Inspired by recent advances in generative adversarial networks (GANs), in this paper we propose a novel deep learning method, called \textsc{ErGAN}, to address this challenge. \textsc{ErGAN} consists of two key components, a label generator and a discriminator, which are optimized alternately through adversarial learning. To alleviate the issues of overfitting and highly imbalanced distributions, we design two novel modules for diversity and propagation, which can greatly improve the model's generalization power. We theoretically prove that \textsc{ErGAN} can overcome the mode collapse and convergence problems of the original GAN. We also conduct extensive experiments to empirically verify the labeling and learning efficiency of \textsc{ErGAN}. The experimental results show that \textsc{ErGAN} outperforms all state-of-the-art baselines, including supervised, semi-supervised, and unsupervised learning methods.

\end{abstract}
