
\begin{abstract}

Stochastic gradient descent (with a mini-batch) is one of the most common iterative algorithms used in machine learning. It is computationally cheap to implement, and recent literature suggests that it may also have implicit regularization properties that prevent overfitting. This paper analyzes the properties of stochastic gradient descent from a theoretical standpoint to help bridge the gap between theoretical and empirical results. Assuming smoothness, the Polyak-\L{}ojasiewicz inequality, and sub-Gaussian stochasticity, we prove high-probability bounds on the convergence rate that match existing bounds in expectation. As an application of our convergence results, we combine them with existing uniform stability and generalization bounds to bound the true risk of the iterates of stochastic gradient descent. We find that, for a certain number of epochs, the convergence and generalization balance in such a way that the true risk bound approaches the empirical minimum as the number of samples goes to infinity.

\end{abstract}
