\begin{abstract}
Large-scale simulations or scientific experiments produce petabytes of data per run.
This poses massive challenges for I/O and storage when scientific analysis workflows are run manually offline.
Unsupervised deep-learning techniques that extract patterns and non-linear relations from these large amounts of data provide a way to build scientific understanding from raw data, reducing the need for manual pre-selection of analysis steps, but they require exascale compute and memory to process the full dataset.
In this paper, we demonstrate a heterogeneous streaming workflow in which plasma simulation data is streamed directly to a machine learning (ML) application that trains a model on the simulation data in transit, completely circumventing the capacity-constrained filesystem bottleneck.
This workflow employs openPMD to provide a high-level interface for describing scientific data and ADIOS2 to transfer data volumes that exceed the capabilities of the filesystem.
We employ experience replay to avoid catastrophic forgetting when learning continually from this non-steady-state process, and we adapt it to improve model convergence while learning in transit.
As a proof of concept, we approach the ill-posed inverse problem of predicting particle dynamics from radiation in a PIConGPU particle-in-cell simulation of the Kelvin-Helmholtz instability (KHI).
We detail hardware-software co-design challenges as we scale PIConGPU to the full Frontier system, ranked first on the June 2024 Top500 list.
\end{abstract}
