\begin{abstract}
The \lion{} optimizer
has emerged as a promising competitor to \adamw{}
for training large AI models,
offering advantages in memory, computation, and sample efficiency.
In this paper, we introduce \mavolion{}, an adaptation of \lion{} to distributed training environments.
Leveraging the sign operator in \lion{},
\mavolion{}
only requires the workers to
communicate binary or low-precision vectors
to the central server,
significantly reducing the communication cost.
Our theoretical analysis confirms the convergence of \mavolion{}. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems. Notably, \mavolion{} attains performance comparable to standard \lion{} or \adamw{} applied to aggregated gradients, while requiring significantly less communication bandwidth. This property is particularly advantageous for training large models. We also show that \mavolion{} offers a better performance-bandwidth trade-off than existing efficient distributed methods such as deep gradient compression and ternary gradients.
\end{abstract}