\begin{abstract}
This paper introduces the checkered regression model, a nonlinear generalization of logistic regression.
More precisely, this new binary classifier relies on the multivariate function $\frac{1}{2}\left( 1 + \tanh\left(\frac{z_1}{2}\right)\times\dots\times\tanh\left(\frac{z_m}{2}\right) \right)$,
which coincides with the usual sigmoid function in the univariate case $m=1$.
While the decision boundary of logistic regression consists of a single hyperplane, our method is shown to tessellate the feature space with
any given number $m\ge 1$ of hyperplanes and then to ``color'' the resulting tiles in a checkered fashion (say, ``white'' for predicted label $1$ and ``black'' for $0$).
In the bivariate case $m=2$, it leads to a smooth XOR model obtained by replacing the Boolean variables (either $0$ or $1$) used in
the XOR logical operator with sigmoids taking continuous values in the whole interval $(0,1)$.
In particular, we show that our smooth XOR model significantly outperforms standard ReLU neural networks (while using far fewer parameters)
on nonlinear classification tasks such as the XOR Gaussian mixture problem.
To fit the model's parameters to labeled data, we rely on a classical empirical risk minimization framework based on
the cross-entropy loss. % that can be optimized through stochastic gradient descent.
A multiclass version of our approach is also proposed: it is defined as the circular convolution of several SoftArgMax vectors
(which can be computed efficiently via the fast Fourier transform).
Lastly, we provide a global convergence analysis of gradient descent that holds for a class of functions including the loss
of our model, namely functionals equal to the negative logarithm of a sum of log-concave functions.
\end{abstract}
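
\noindent
The two special cases stated in the abstract can be checked by hand. The following is only an illustrative sketch: the notation $\sigma(z) = 1/(1+e^{-z})$ for the sigmoid and the shorthands $a = \sigma(z_1)$, $b = \sigma(z_2)$ are assumptions of this note and are not fixed by the abstract. Since $\tanh(z/2) = 2\sigma(z) - 1$, the univariate case $m=1$ gives
\[
\frac{1}{2}\left( 1 + \tanh\left(\frac{z}{2}\right) \right) = \frac{1}{2}\left( 1 + 2\sigma(z) - 1 \right) = \sigma(z),
\]
and the bivariate case $m=2$ gives
\[
\frac{1}{2}\left( 1 + (2a-1)(2b-1) \right) = 2ab - a - b + 1 = ab + (1-a)(1-b).
\]
For Boolean $a, b \in \{0,1\}$ the right-hand side equals $1$ exactly when $a = b$. Flipping the sign of either $z_1$ or $z_2$ replaces the corresponding sigmoid by its complement, because $\sigma(-z) = 1 - \sigma(z)$, and yields $a(1-b) + (1-a)b$, which is the XOR of $a$ and $b$ on Boolean inputs. Presumably such a sign can be absorbed into the parameters that define $z_1$ and $z_2$, although the abstract does not specify this convention.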
