
1: \begin{abstract}

2: 	Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while, a common but poorly understood practice. We study model averaging as a variance-reducing mechanism and describe two ways in which the frequency of averaging affects convergence.

3:    For convex objectives,

4:    %we show that it is the gradient variance envelope that dictates whether frequent averaging is beneficial.

5:    we show that the benefit of frequent averaging depends on the gradient variance envelope.

6:    For non-convex objectives, we illustrate that this benefit depends on the presence of multiple optimal points.

7:    %has a desirable property when the goal is a low variance solution.

8:    We complement our findings with multicore experiments on both synthetic and

9:    real data.

10:   \iflong

11:   { \color{green}

12:   On one extreme,

13:   one takes a single average at the end of execution; a method referred to

14:   as one-shot averaging. On the other extreme, models are averaged after

15:   every step. This is equivalent to mini-batching. Intuitively, the former

16:   is hardware efficient, while the latter can lead to convergence in fewer steps.

17:   More generally, one can choose to average the models after any number

18:   of steps, a parameter that lets us explore the full spectrum of this

19:   hardware efficiency vs. statistical efficiency trade-off.

20:     The question then

21:   becomes: how frequently should we average to optimize for wall-clock

22:   time? We share some analytic insight on the geometry of the objective

23:   function. If the variance of evaluated gradients grows far

24:   from the optimum, frequent averaging improves statistical efficiency.

25:   Otherwise, it is as good as one-shot averaging, while incurring extra

26:   communication costs at the expense of hardware efficiency. We support these

27:   insights with a set of experiments.

28:   }

29:   \fi

30:

31: \end{abstract}

32: