\begin{abstract}
  Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application.
  The most successful and most commonly used fine-tuning method updates the pre-trained weights via low-rank adaptation (LoRA).
  LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights.
  Recent works focus either on \emph{weight-driven} initialization or on learning adaptive ranks during training.
  Both approaches have so far only been investigated in isolation, which results in slow convergence or a uniform rank distribution and, in turn, sub-optimal performance.
  We propose to enhance LoRA by initializing the new weights in a \emph{data-driven} manner, based on the singular value decomposition (SVD) of minibatches of activation vectors.
  We then initialize the LoRA matrices with the obtained right-singular vectors, re-distribute ranks among all weight matrices to explain the maximal amount of variance, and continue with the standard LoRA fine-tuning procedure.
  This results in our new method, \textbf{E}xplained \textbf{V}ariance \textbf{A}daptation (EVA).
  We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning.
  EVA converges faster than competing methods and attains the highest average score across a multitude of tasks in each domain.
\end{abstract}
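The data-driven initialization and rank redistribution summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions. The helper names \texttt{explained\_variance\_components} and \texttt{eva\_style\_init} are hypothetical. Each layer's explained-variance ratio is taken as a squared singular value divided by the sum of all squared singular values. A single SVD on one minibatch per layer stands in for however the method actually processes minibatches. Finally, $B$ is zero-initialized, following standard LoRA practice, so that fine-tuning starts from the pre-trained weights.

```python
import numpy as np


def explained_variance_components(x: np.ndarray, max_rank: int):
    """SVD of one activation minibatch x with shape (batch, d_in).

    Returns the top right-singular vectors (rows of vt) and, for each, its share
    of the total squared singular-value mass (assumed "explained variance ratio").
    """
    # Thin SVD: x = U @ diag(s) @ vt, with s sorted in descending order.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    ratios = s**2 / np.sum(s**2)
    k = min(max_rank, vt.shape[0])
    return vt[:k], ratios[:k]


def eva_style_init(activations: dict, d_out: dict, rank_budget: int, max_rank: int):
    """Hypothetical helper: data-driven LoRA init with global rank redistribution.

    activations: layer name -> (batch, d_in) inputs to that layer's weight matrix.
    d_out: layer name -> output dimension of that weight matrix.
    rank_budget: total number of LoRA ranks shared across all layers.
    max_rank: maximum number of candidate components considered per layer.
    """
    comps = {name: explained_variance_components(x, max_rank)
             for name, x in activations.items()}

    # Pool candidate components from all layers and keep the rank_budget
    # components with the largest explained-variance ratios overall.
    candidates = [(ratio, name) for name, (_, ratios) in comps.items() for ratio in ratios]
    candidates.sort(key=lambda c: c[0], reverse=True)  # stable, so per-layer order is kept
    ranks = {name: 0 for name in activations}
    for _, name in candidates[:rank_budget]:
        ranks[name] += 1

    lora = {}
    for name, (vt, _) in comps.items():
        r = ranks[name]
        # A (r, d_in): the layer's top-r right-singular vectors of its activations.
        a = vt[:r].copy()
        # B (d_out, r): zeros (assumed, as in standard LoRA), so B @ A = 0 initially.
        b = np.zeros((d_out[name], r))
        lora[name] = (a, b)
    return ranks, lora


# Toy usage with two hypothetical layers "q" and "v".
rng = np.random.default_rng(0)
# "q": activations concentrated along a single direction (nearly rank one).
x_q = 10 * rng.normal(size=(64, 1)) @ rng.normal(size=(1, 16)) + 0.01 * rng.normal(size=(64, 16))
# "v": isotropic activations spread over all 16 input dimensions.
x_v = rng.normal(size=(64, 16))
ranks, lora = eva_style_init({"q": x_q, "v": x_v}, {"q": 32, "v": 32},
                             rank_budget=4, max_rank=4)
print(ranks)  # {'q': 1, 'v': 3}
```

Because candidates are ranked globally rather than per layer, a layer whose activation variance is spread over many directions can receive more ranks than a layer dominated by one direction. In the toy example, \texttt{q} keeps one rank and \texttt{v} receives three, while the total stays at the shared budget.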
