
\begin{abstract}

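% De-emphasize benchmarks as the objective and emphasize weight analysis; benchmarks should be used only for the final evaluation.

% The heuristic dynamic adjustment considers both the original distribution D and the target distribution D', adjusting by source S.

% The first weight-convergence analysis of models at the 34B scale and above, including 70B models.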

%%This paper introduces the HeuriMentor (HM) Framework, designed to enhance the efficiency of Large Language Model (LLM) training.

%The framework encompasses the Aquila2 series, which includes bilingual models ranging from 7 to 70 billion parameters.

This paper introduces the Aquila2 series, a family of bilingual models with 7, 34, and 70 billion parameters.

These models are trained with HeuriMentor (HM), a novel framework that provides real-time insight into model convergence and improves both the training process and data management. HM comprises three components: the Adaptive Training Engine (ATE), the Training State Monitor (TSM), and the Data Management Unit (DMU). Together, they enable precise monitoring of the model's training progress and efficient optimization of the data distribution, thereby improving training effectiveness. Extensive evaluations show that the Aquila2 series performs competitively on both English and Chinese benchmarks. Moreover, Aquila2-34B shows only a slight drop in performance when quantized to Int4. We have made our training code\footnote{https://github.com/FlagOpen/FlagScale} and model weights\footnote{https://github.com/FlagAI-Open/Aquila2} publicly available to support ongoing research and application development.

\end{abstract}
