
\begin{abstract}

Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems.

However, the averaging technique, which combines all iterative solutions into a single solution, is still under-explored.

While some increasingly weighted averaging schemes have been considered in the literature, existing work is mostly restricted to strongly convex objective functions and to the convergence of the optimization error.

It remains unclear how these averaging schemes affect the convergence of {\it both optimization error and generalization error} (two equally important components of testing error) for {\bf non-strongly convex objectives, including non-convex problems}.

In this paper, we {\it fill the gap} by comprehensively analyzing increasingly weighted averaging for convex, strongly convex, and non-convex objective functions in terms of both optimization error and generalization error.

In particular, we analyze a family of increasingly weighted averaging schemes, where the weight for the solution at iteration $t$ is proportional to $t^{\alpha}$ ($\alpha > 0$).
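
As a sketch of this family, in notation assumed here for illustration (with $x_t$ denoting the SGD solution at iteration $t$ and $T$ the total number of iterations), the averaged solution can be written as
\[
  \bar{x}_T = \frac{\sum_{t=1}^{T} t^{\alpha} x_t}{\sum_{t=1}^{T} t^{\alpha}},
\]
so that $\alpha = 0$ would correspond to uniform averaging, while larger $\alpha$ places more weight on later iterations.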

We show how $\alpha$ affects the optimization error and the generalization error, and exhibit the trade-off caused by $\alpha$.

Experiments demonstrate this trade-off and the effectiveness of polynomially increasing weighted averaging compared with other averaging schemes on a wide range of problems, including deep learning.

\end{abstract}
