\begin{abstract}
\mixup{} is a regularization technique that creates new training samples as convex combinations of the original training points. This simple technique has shown strong empirical performance and is widely used as a component of semi-supervised learning methods such as MixMatch~\citep{berthelot2019mixmatch} and Interpolation Consistency Training (ICT)~\citep{verma2019interpolation}. In this paper, we look at \mixup{} through a \emph{representation learning} lens in a semi-supervised learning setup.
In particular, we study the role of \mixup{} in promoting linearity in the learned network representations. To this end, we ask two questions:
(1) how does the \mixup{} loss, which enforces linearity in the \emph{last} network layer, propagate that linearity to the \emph{earlier} layers?
and
(2) how does enforcing a stronger \mixup{} loss on more than two data points affect the convergence of training?
We empirically investigate these properties of \mixup{} on vision datasets such as CIFAR-10, CIFAR-100 and SVHN.
Our results show that supervised \mixup{} training does not make \emph{all} the network layers linear;
in fact, the \emph{intermediate layers} become more non-linear during \mixup{} training than in a network trained \emph{without} \mixup{}.
However, when \mixup{} is used as an unsupervised loss, we observe that all the network layers become more linear, resulting in faster training convergence.
\end{abstract}