\begin{abstract}
Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems.
However, the averaging technique, which combines all iterative solutions into a single solution, is still under-explored.
While some increasingly weighted averaging schemes have been considered in the literature, existing works are mostly restricted to strongly convex objective functions and to the convergence of the optimization error.
It remains unclear how these averaging schemes affect the convergence of {\it both optimization error and generalization error} (two equally important components of testing error) for {\bf non-strongly convex objectives, including non-convex problems}.
In this paper, we {\it fill this gap} by comprehensively analyzing increasingly weighted averaging on convex, strongly convex and non-convex objective functions in terms of both optimization error and generalization error.
In particular, we analyze a family of increasingly weighted averaging schemes, where the weight for the solution at iteration $t$ is proportional to $t^{\alpha}$ ($\alpha > 0$).
We show how $\alpha$ affects the optimization error and the generalization error, and exhibit the trade-off caused by $\alpha$.
Experiments demonstrate this trade-off and the effectiveness of polynomially increasing weighted averaging compared with other averaging schemes for a wide range of problems, including deep learning.
\end{abstract}