\begin{abstract}%
Diffusion models have shown exceptional capabilities in generating realistic videos.
Yet, their training has been predominantly confined to offline environments where models can repeatedly train on i.i.d. data to convergence.
% This work explores the feasibility of training diffusion models from a semantically continuous video stream, where video frames arrive sequentially and consecutive frames are highly correlated.
This work explores the feasibility of training diffusion models from a semantically continuous video stream, where correlated video frames arrive sequentially, one at a time.
To investigate this, we introduce two novel continual video generative modeling benchmarks, \textit{Lifelong Bouncing Balls} and \textit{Windows 95 Maze Screensaver}, each containing over a million video frames generated by navigating stationary environments.
Surprisingly, our experiments show that diffusion models can be effectively trained online using experience replay, achieving performance comparable to models trained with i.i.d. samples given the same number of gradient steps.
\end{abstract}
