
\begin{abstract}

This work presents $\BAdam$, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. $\BAdam$ offers a memory-efficient approach to full-parameter finetuning of large language models and reduces the running time of the backward pass thanks to the chain rule property. Experimentally, we apply $\BAdam$ to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using \emph{a single RTX3090-24GB GPU}. The results indicate that $\BAdam$ exhibits superior convergence behavior compared to LoRA and LOMO. Furthermore, our downstream performance evaluation of the instruction-tuned models on MT-bench shows that $\BAdam$ modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare $\BAdam$ with Adam on a medium-sized task, namely finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that $\BAdam$ is capable of narrowing the performance gap with Adam. Our code is available at \url{https://github.com/Ledzy/BAdam}.

\end{abstract}
