\begin{abstract}
Empirical risk minimization over classes of functions with bounded variation norm, in one of its several versions, has a long history, starting with Total Variation Denoising \citep{Rudin_et_al_1992}, and has been considered in several recent articles, in particular \cite{Fang-Guntuboyina-Sen-2019} and \cite{vdL_2015}. In this article, we consider empirical risk minimization over the class $\mathcal{F}_d$ of càdlàg functions over $[0,1]^d$ with bounded sectional variation norm (also called Hardy--Krause variation).

We show how a certain representation of functions in $\mathcal{F}_d$ allows us to bound the bracketing entropy of sieves of $\mathcal{F}_d$, and therefore to derive rates of convergence in nonparametric function estimation. Specifically, for sieves whose growth is controlled by some rate $a_n$, we show that the empirical risk minimizer has rate of convergence $O_P(n^{-1/3} (\log n)^{2(d-1)/3} a_n)$. Remarkably, the dimension affects the rate in $n$ only through the logarithmic factor, making this method especially appropriate for high-dimensional problems.

In particular, we show that, for nonparametric regression over sieves of càdlàg functions with bounded sectional variation norm, this upper bound on the rate of convergence holds for least-squares estimators in the random design setting with sub-exponential errors.
\end{abstract}