\begin{abstract}
Diffusion models often suffer from slow convergence, and existing efficient training techniques, such as Parameter-Efficient Fine-Tuning (PEFT), focus primarily on fine-tuning pre-trained models.
However, these methods are limited when models of varying sizes must be adapted for real-world deployment.
We propose FINE, a method based on the \textit{Learngene} framework that uses pre-trained models to initialize downstream networks while accounting for both model size and task-specific requirements.
FINE decomposes pre-trained knowledge into a product of matrices $U$, $\Sigma$, and $V$, where $U$ and $V$ are shared across network blocks and $\Sigma$ is layer-specific. This decomposition enables flexible recombination of knowledge tailored to different tasks and model sizes.
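Schematically, and with the layer index $l$, the block count $L$, and the symbol $W_l$ introduced here only as illustrative notation, the weights of the $l$-th block can be pictured as
\[
  W_l \approx U \, \Sigma_l \, V, \qquad l = 1, \dots, L,
\]
so that, for instance, a downstream network with a different number of blocks would reuse the shared $U$ and $V$ and introduce only its own $\Sigma_l$.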
During initialization, FINE fine-tunes $\Sigma$ on a small subset of data while keeping the learngene parameters fixed, making it, to our knowledge, the first approach to account for both task and model size at initialization.
We present a comprehensive benchmark for evaluating learngene-based methods on image generation tasks and demonstrate FINE's superior performance. FINE consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across various model sizes. It also provides significant computational and storage savings of approximately $3n\times$ and $5\times$, respectively, where $n$ is the number of models. Additionally, FINE's ability to adapt to both model size and task yields average performance improvements of 1.8\% and 1.2\% across multiple downstream datasets, demonstrating its versatility and efficiency.
\end{abstract}