
\begin{abstract}

We introduce \sysname (\shortname), a simple and effective methodology for highly efficient and accurate LLM training with extremely long sequences. \shortname partitions each input sequence into mini-sequences and processes them iteratively, reducing intermediate memory usage. Combined with activation recomputation, it yields significant memory savings in both the forward and backward passes. In experiments with Llama3-8B, we measure no degradation in throughput or convergence with \shortname, even on sequences 12$\times$ longer than standard implementations support, owing to careful memory optimizations. \shortname is fully general and implementation-agnostic, and it requires minimal code changes to integrate with existing LLM training frameworks. %Moreover, it also supports scaling to extremely long sequences in distributed settings.

\end{abstract}
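The memory effect of processing mini-sequences can be illustrated with a back-of-the-envelope calculation. As a sketch, assume a layer whose intermediate output has width $W$ per token (for example, the vocabulary-sized logits of the language-modeling head), a sequence of $N$ tokens split evenly into $M$ mini-sequences, $b$ bytes per element, and, with activation recomputation, only one mini-sequence's intermediate tensor resident at a time. These symbols are introduced here for illustration only:
\[
\mathrm{Mem}_{\mathrm{full}} = N \cdot W \cdot b,
\qquad
\mathrm{Mem}_{\mathrm{mini}} = \frac{N}{M} \cdot W \cdot b = \frac{\mathrm{Mem}_{\mathrm{full}}}{M}.
\]
For instance, taking the Llama3 vocabulary size $W = V = 128256$, $N = 8192$ tokens, fp32 logits ($b = 4$), and $M = 8$ (all illustrative choices, not the experimental settings reported here):
\[
8192 \times 128256 \times 4\,\mathrm{B} = 4008\,\mathrm{MiB}
\quad\longrightarrow\quad
\frac{4008\,\mathrm{MiB}}{8} = 501\,\mathrm{MiB}.
\]
This counts only the single intermediate tensor; model weights, optimizer state, and other activations are not included.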
