
\begin{abstract}

Diffusion Transformers (DiT) have attracted significant research attention, but they converge slowly. In this paper, we aim to accelerate DiT training without any architectural modification.

We identify two issues in the training process. First, certain training strategies do not perform consistently well across different data. Second, the effectiveness of supervision at specific timesteps is limited.

In response, we make the following contributions:

(1) We introduce a new perspective for interpreting why these strategies fail. Specifically, we slightly extend the definition of Signal-to-Noise Ratio (SNR) and propose observing the Probability Density Function (PDF) of SNR to understand what underlies a strategy's robustness across data.

(2) We conduct extensive experiments, reporting over one hundred experimental results, to empirically derive a unified acceleration strategy from the perspective of the PDF.

(3) We develop a new supervision method that further accelerates DiT training.

Based on these contributions, we propose \textbf{FasterDiT}, an exceedingly simple and practical design strategy. With only a few lines of code modified, it achieves 2.30 FID on ImageNet at 256$\times$256 resolution with 1000 iterations, which is comparable to DiT (2.27 FID) while training 7$\times$ faster.

\end{abstract}
