
\begin{abstract}%

Diffusion models have shown exceptional capabilities in generating realistic videos.

Yet their training has been largely confined to offline settings, where models can train repeatedly on i.i.d.\ data until convergence.

% This work explores the feasibility of training diffusion models from a semantically continuous video stream, where video frames arrive sequentially and consecutive frames are highly correlated.

This work explores the feasibility of training diffusion models from a semantically continuous video stream, where highly correlated video frames arrive sequentially, one at a time.

To investigate this, we introduce two novel benchmarks for continual video generative modeling, \textit{Lifelong Bouncing Balls} and \textit{Windows 95 Maze Screensaver}, each containing over a million video frames generated by navigating stationary environments.

Surprisingly, our experiments show that diffusion models can be effectively trained online using experience replay, achieving performance comparable to models trained with i.i.d.\ samples given the same number of gradient steps.

\end{abstract}
