\begin{abstract}
Training a classifier over a large number of classes, known as ``extreme classification'',
has become a topic of major interest with applications
in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost
proportional to the number of classes~$C$, which is often prohibitively expensive.
A popular scalable softmax approximation relies on uniform negative
sampling, which suffers from slow convergence due to a poor signal-to-noise ratio.
In this paper, we propose a simple training method for drastically enhancing the gradient signal
by drawing negative samples from an adversarial model that mimics the data distribution.
Our contributions are three-fold: (i)~an adversarial sampling mechanism that
produces negative samples at a cost only logarithmic in~$C$, thus still resulting in cheap gradient updates;
(ii)~a mathematical proof that this adversarial sampling minimizes the gradient
variance while any bias due to non-uniform sampling can be removed;
(iii)~experimental results on large-scale data sets that
show a reduction of the training time by an order of magnitude relative to several competitive baselines.

16: \end{abstract}
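To make the two technical claims above concrete (negative samples at a cost logarithmic in~$C$, and removable bias from non-uniform sampling), the following Python sketch may help. It is \emph{not} the authors' adversarial model. It assumes a fixed proposal $q(y)$ that does not depend on the input, stored in a binary sum tree so that each draw takes $\lceil \log_2 C \rceil$ steps, and it corrects the sampling bias with the standard sampled-softmax ``$\log q$'' adjustment, which may differ from the correction derived in the paper. The names \texttt{SumTreeSampler} and \texttt{sampled\_softmax\_loss} are hypothetical.

```python
import math
import random


class SumTreeSampler:
    """Hypothetical helper (not from the paper): samples class indices from a
    fixed, input-independent categorical proposal q over C classes in
    O(log C) time per draw."""

    def __init__(self, weights):
        self.num_classes = len(weights)
        # Pad the leaf count up to a power of two so the tree is complete.
        self.size = 1
        while self.size < self.num_classes:
            self.size *= 2
        # Node k has children 2k and 2k+1; leaves live at size..2*size-1.
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            if w < 0:
                raise ValueError("weights must be non-negative")
            self.tree[self.size + i] = float(w)
        # Each internal node stores the total weight of its subtree.
        for k in range(self.size - 1, 0, -1):
            self.tree[k] = self.tree[2 * k] + self.tree[2 * k + 1]
        if self.tree[1] <= 0.0:
            raise ValueError("at least one weight must be positive")

    def prob(self, y):
        """Normalized proposal probability q(y)."""
        return self.tree[self.size + y] / self.tree[1]

    def sample(self, rng):
        """Walk from the root to a leaf, picking each child in proportion to
        its subtree mass; the walk takes log2(size) steps."""
        u = rng.random() * self.tree[1]
        k = 1
        while k < self.size:
            left = self.tree[2 * k]
            # Go left if u falls in the left mass, or if the right subtree is
            # empty (guards against floating-point round-off).
            if u < left or self.tree[2 * k + 1] == 0.0:
                k = 2 * k
            else:
                u -= left
                k = 2 * k + 1
        return k - self.size


def sampled_softmax_loss(scores, true_class, sampler, num_negatives, rng):
    """Cross-entropy over the true class plus num_negatives sampled classes.

    Assumption: the bias from the non-uniform proposal is corrected with the
    standard sampled-softmax "log q" adjustment, i.e. log q(y) is subtracted
    from every logit; the paper's own correction may differ. Requires
    q(y) > 0 for the true class and does not remove accidental hits."""
    negatives = [sampler.sample(rng) for _ in range(num_negatives)]
    candidates = [true_class] + negatives
    logits = [scores[y] - math.log(sampler.prob(y)) for y in candidates]
    # Numerically stable log-sum-exp over the corrected logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]
```

Each update touches only the true class and the \texttt{num\_negatives} sampled scores rather than all $C$ scores; in a real model only those scores would be computed, whereas the sketch indexes a precomputed list for brevity.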