\begin{abstract}

The stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning.
This work investigates SGD from the viewpoint of graduated optimization, a widely applied approach to non-convex optimization problems.
In the graduated optimization approach, a series of smoothed optimization problems, which can be constructed in various ways, is solved instead of the actual optimization problem.
In this work, a formal formulation of graduated optimization is provided based on the nonnegative approximate identity, which generalizes the idea of Gaussian smoothing.
An asymptotic convergence result is also established using techniques from variational analysis.
Then, we show that the traditional SGD method can be applied to solve the smoothed optimization problem.
Monte Carlo integration is used to estimate the gradient of the smoothed problem, which may be consistent with distributed computing schemes in real-life applications.
Under the assumptions on the actual optimization problem, the convergence results of SGD for the smoothed problem can be derived straightforwardly.
Numerical examples provide evidence that the graduated optimization approach may yield more accurate training results in certain cases.

\end{abstract}