1: \begin{abstract}
2: In this paper we study \emph{shallow} neural network functions that are linear combinations of compositions of an activation function with \emph{quadratic} functions, which replace the standard
3: \emph{affine linear} functions, often called neurons.
4: We show the universality of this approximation and prove convergence rate results based on the theory of wavelets and statistical learning.
5: We show for simple test cases that this ansatz requires a smaller number of neurons than standard affine linear neural networks. Moreover, we investigate the efficiency of this approach for clustering tasks with the MNIST data set.
6: Similar observations are made when comparing \emph{deep (multi-layer)} networks.
7: \end{abstract}
8: