1: \begin{abstract}
2: We consider a general non-linear model where the signal is a finite
3: mixture of an unknown, possibly increasing, number of features drawn
4: from a continuous dictionary parameterized by a real non-linear
5: parameter. The signal is observed with Gaussian (possibly
6: correlated) noise in either a continuous or a discrete setup. We
7: propose an off-the-grid optimization method, that is, a method that
8: does not rely on any discretization of the parameter space, to
9: estimate both the non-linear parameters of the features and the
10: linear parameters of the mixture.
11:
12:
13: We use recent results on the geometry of off-the-grid methods
14: %, see Poon, Keriven and Peyr\'e (2020),
15: to derive a minimal separation condition on the true underlying non-linear parameters under which interpolating certificate functions
16: can be constructed. Using in addition tail bounds for suprema of Gaussian
17: processes, we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is, up to logarithmic factors, similar to the rates attained by the Lasso predictor in the linear regression model. We also establish
18: convergence rates that quantify, with high probability, the quality of
19: estimation for both the linear and the non-linear parameters.
20:
21: \end{abstract}
22: