\begin{abstract}
Large-scale simulations and scientific experiments produce petabytes
of data per run. This poses massive I/O and storage challenges when
scientific analysis workflows are run manually offline.
Unsupervised deep learning techniques that extract patterns and non-linear
relations from these large amounts of data offer a way to build scientific
understanding directly from raw data, reducing the need to pre-select
analysis steps manually. However, they require exascale compute and memory to process the full available dataset.
In this paper, we demonstrate a heterogeneous streaming workflow in which plasma simulation data are streamed directly to a machine learning (ML) application that trains a model on the simulation data in transit, completely circumventing the capacity-constrained filesystem bottleneck. The workflow uses openPMD to provide a high-level interface for describing scientific data and ADIOS2 to transfer data volumes that exceed the capabilities of the filesystem.
To avoid catastrophic forgetting while learning continually from this non-steady-state process, we employ experience replay and adapt it to improve model convergence during
in-transit learning.
As a proof of concept, we address the ill-posed inverse problem of predicting
particle dynamics from radiation in a particle-in-cell simulation of
the Kelvin--Helmholtz instability (KHI) performed with PIConGPU.
We detail the hardware-software co-design challenges encountered in scaling PIConGPU to the full Frontier system, ranked first on the June 2024 TOP500 list.
\end{abstract}