\begin{abstract}
In many numerical simulations, stochastic gradient descent (SGD) type optimization methods
perform very effectively in the training of deep neural networks (DNNs), but to this day
it remains an open research problem to provide a mathematical convergence analysis
that rigorously explains the success of SGD type optimization methods in the training
of DNNs. In this work, we study SGD type optimization methods in the training of
fully connected feedforward DNNs with rectified linear unit (ReLU) activation.
We first establish general regularity properties for the risk functions and
their generalized gradient functions appearing in the training of such DNNs
and, thereafter, we investigate the plain vanilla SGD optimization method
in the training of such DNNs under the assumption that the target function
under consideration is a constant function.
Specifically, we prove that if the learning rates (the step sizes of the SGD optimization method)
are sufficiently small but not $L^1$-summable and if the target function is constant,
then the expectation of the risk of the considered SGD process in the training of such DNNs
converges to zero as the number of SGD steps increases to infinity.
\end{abstract}
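To make the setting described above concrete, the following is a minimal, illustrative Python sketch. It is not the method, analysis, or code of this work. It assumes a hypothetical shallow ReLU network $\mathcal{N}_\theta(x) = c + \sum_j v_j \max\{w_j x + b_j, 0\}$ with one-dimensional input, inputs $X$ uniformly distributed on $[0,1]$, a constant target $\xi$, and the convention $\mathrm{ReLU}'(z) = \mathbb{1}_{(0,\infty)}(z)$ for the generalized gradient. The learning rates $\gamma_n = 0.01/\sqrt{1 + n/1000}$ are an assumed example of step sizes that are small but not summable, and the risk is approximated by an average over a uniform grid. All function names, the network width, and the initialization are hypothetical choices made for illustration.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def realization(theta, x):
    # Hypothetical shallow ReLU network: c + sum_j v_j * max(w_j * x + b_j, 0).
    w, b, v, c = theta
    return c + sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v))


def generalized_gradient(theta, x, xi):
    # Gradient of the sample loss (N_theta(x) - xi)^2, using the assumed
    # convention ReLU'(z) = 1 if z > 0 and 0 otherwise (also at z = 0).
    w, b, v, _ = theta
    r = 2.0 * (realization(theta, x) - xi)
    gw, gb, gv = [], [], []
    for wj, bj, vj in zip(w, b, v):
        z = wj * x + bj
        active = 1.0 if z > 0.0 else 0.0
        gw.append(r * vj * active * x)
        gb.append(r * vj * active)
        gv.append(r * relu(z))
    return gw, gb, gv, r  # the derivative with respect to the output bias c is r


def risk(theta, xi, grid=201):
    # Grid approximation of E[(N_theta(X) - xi)^2] for X ~ U[0, 1].
    xs = [k / (grid - 1) for k in range(grid)]
    return sum((realization(theta, x) - xi) ** 2 for x in xs) / grid


def sgd(theta, xi, steps, gamma, rng):
    # Plain vanilla SGD: one fresh sample X_{n+1} ~ U[0, 1] per step.
    w, b, v, c = list(theta[0]), list(theta[1]), list(theta[2]), theta[3]
    for n in range(steps):
        x = rng.random()
        gw, gb, gv, gc = generalized_gradient((w, b, v, c), x, xi)
        lr = gamma(n)
        w = [p - lr * g for p, g in zip(w, gw)]
        b = [p - lr * g for p, g in zip(b, gb)]
        v = [p - lr * g for p, g in zip(v, gv)]
        c = c - lr * gc
    return w, b, v, c


def gamma(n):
    # Assumed step sizes: bounded by 0.01, while sum_n gamma(n) diverges.
    return 0.01 / math.sqrt(1.0 + n / 1000.0)


rng = random.Random(0)
width, xi = 4, 1.0
theta0 = (
    [rng.uniform(-0.5, 0.5) for _ in range(width)],
    [rng.uniform(-0.5, 0.5) for _ in range(width)],
    [rng.uniform(-0.5, 0.5) for _ in range(width)],
    0.0,
)
initial_risk = risk(theta0, xi)
theta = sgd(theta0, xi, 20000, gamma, rng)
final_risk = risk(theta, xi)
print(f"risk before SGD: {initial_risk:.4f}, after SGD: {final_risk:.6f}")
```

The seeded generator makes the run reproducible; no particular printed values are claimed here. Swapping the assumed step sizes for a summable sequence such as $0.01/(n+1)^2$ bounds the total step length $\sum_n \gamma_n$, so the iterates can stall before the risk becomes small, which is one way to see why the non-summability condition appears in the abstract.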
