\begin{abstract}
In many numerical simulations stochastic gradient descent (SGD) type optimization methods 
perform very effectively in the training of deep neural networks (DNNs), but to this day 
it remains an open research problem to provide a mathematical convergence analysis 
which rigorously explains the success of SGD type optimization methods in the training 
of DNNs. In this work we study SGD type optimization methods in the training of 
fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. 
We first establish general regularity properties for the risk functions and 
their generalized gradient functions appearing in the training of such DNNs. 
Thereafter, we investigate the plain vanilla SGD optimization method 
in the training of such DNNs under the assumption that the target function 
under consideration is a constant function. 
Specifically, we prove that if the learning rates (the step sizes of the SGD optimization method) 
are sufficiently small but not $L^1$-summable and if 
the target function is a constant function, then the expectation of the risk 
of the considered SGD process converges to zero in the training of such DNNs 
as the number of SGD steps increases to infinity.
\end{abstract}
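
% Illustrative restatement of the main result. The symbols below are
% assumptions introduced for this sketch; the abstract fixes no notation.
The main convergence statement can be sketched in symbols as follows. The notation is assumed here purely for illustration: $\gamma_n \geq 0$, $n = 1, 2, \dots$, denotes the learning rates, $\Theta_n$ denotes the SGD process after $n$ steps, $\mathcal{L}$ denotes the risk function, and $\mathbf{E}$ denotes expectation. Reading ``sufficiently small'' as a uniform bound on the learning rates is likewise an assumption, and the precise smallness condition is not specified here.
\[
  \sum_{n=1}^{\infty} \gamma_n = \infty
  \quad \mbox{and} \quad
  \sup_{n} \gamma_n \ \mbox{sufficiently small}
  \quad \Longrightarrow \quad
  \lim_{n \to \infty} \mathbf{E}\bigl[ \mathcal{L}(\Theta_n) \bigr] = 0 .
\]
For instance, a constant sequence $\gamma_n = \gamma$ with a small enough $\gamma > 0$ is not summable, whereas $\gamma_n = n^{-2}$ is summable and therefore excluded by the first condition.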