\begin{abstract}
The superior performance of ensemble methods with infinitely many models is well known.
Most of these methods are based on optimization problems in infinite-dimensional spaces with some regularization;
% for instance, kernel methods use $L^2$-regularization and boosting methods use $L^1$-regularization with the non-negative constraint.
for instance, boosting methods and convex neural networks use $L^1$-regularization with a non-negativity constraint.
% Since handling $L^1$-regularization is more difficult than handling $L^2$-regularization, most boosting methods rely on early stopping to approximately solve
However, because $L^1$-regularization is difficult to handle, these problems are typically solved only inexactly, by early stopping or a rough approximation.
In this paper, we propose a new ensemble learning method that operates in a space of probability measures;
that is, our method handles the $L^1$-constraint and the non-negativity constraint rigorously.
This optimization is realized by a general-purpose stochastic optimization method, proposed here, that learns probability measures through a parameterization by transport maps on base models.
Running the method yields a transport map that outputs an {\it infinite ensemble} and forms a residual-type network.
From the perspective of functional gradient methods, we give a convergence rate as fast as that of a stochastic optimization method for finite-dimensional nonconvex problems.
Moreover, we show an {\it interior optimality property} of a local optimality condition used in our analysis.
\end{abstract}