abstract:99424837514308f7.tex

1: \begin{abstract}

2: \begin{comment}

3: deployment challenge

4: -> quantization

5:

6: -> 4 bit as a standard, sub4bits, huge accuracy loss compared to full precision.

7: ( -> existing works PTQ + QAT, still a lot of potential for improvement)->

8:

9: we propose xxx to unleash xxx, particularly in extreme low bit (3,2) settings

10: \end{comment}

11:

12: Scaling up Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges.

13: Weight quantization has emerged as a widely adopted solution for reducing memory and computational demands.

14: This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit).

15: Specifically, BitDistiller first applies a tailored asymmetric quantization and clipping technique to preserve the fidelity of quantized weights as far as possible. It then introduces a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, used in a self-distillation manner, to enable faster convergence and superior model performance.

16: Empirical evaluations demonstrate that BitDistiller significantly surpasses existing methods in both 3-bit and 2-bit configurations on general language understanding and complex reasoning benchmarks.

17: Notably, BitDistiller is also more cost-effective, requiring less data and fewer training resources. The code is available at \url{https://github.com/DD-DuDa/BitDistiller}.

18:

19: %BitDistiller tackles two fundamental challenges in low-bit QAT with KD: preserving the fidelity of quantized weights and effectively transferring knowledge in distillation.

20:

21:

22: \end{abstract}
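
As a rough illustration of the asymmetric quantization and clipping step named in the abstract, the generic textbook form of $b$-bit asymmetric uniform quantization over a clipping range is sketched below. This is not necessarily the tailored variant used by BitDistiller; the symbols $w$, $\hat{w}$, $s$, $z$, $\alpha$, $\beta$, and $b$ are notation assumed here for illustration only.
\[
s = \frac{\beta - \alpha}{2^{b} - 1}, \qquad
z = -\mathrm{round}\!\left(\frac{\alpha}{s}\right), \qquad
\hat{w} = s \left( \mathrm{clamp}\!\left( \mathrm{round}\!\left(\frac{w}{s}\right) + z,\; 0,\; 2^{b} - 1 \right) - z \right)
\]
Here $[\alpha, \beta]$ is the clipping range for a group of weights, $s$ is the step size, $z$ is the integer zero point, and $\hat{w}$ is the dequantized weight. Without clipping, $\alpha = \min w$ and $\beta = \max w$; narrowing the range gives finer resolution for the bulk of the weights at the cost of saturating outliers. At $b = 2$ only four levels are available, which is why the choice of range matters most in the sub-4-bit regime.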

23: