\begin{abstract}
We examine matrix-free full multigrid (FMG) with full approximation storage (FAS), an efficient and scalable nonlinear solver with low work and memory complexity for many classes of discretized partial differential equations (PDEs), in the context of current trends in computer architectures.
In the 1970s, Brandt proposed an extremely low-memory FMG-FAS algorithm that has several attractive properties for reducing costs on modern, memory-centric machines but has not been developed since.
This method, \textit{segmental refinement} (SR), has very low memory requirements because the finest grids need not be held in memory at any one time; instead they can be ``swept'' through, computing the coarse-grid correction and any quantities of interest along the way, which allows an orders-of-magnitude reduction in memory usage.
The algorithm offers two ideas that are useful for exploiting future architectures effectively: improved data locality and reuse via ``vertical'' processing of the multigrid algorithm, and the method of $\tau$-corrections, which removes the need to store the fine grid(s).
This report develops a parallel generalization of the original sweeping technique and explores its algorithmic details using the 1D model problem.
We show that FMG-FAS-SR can work as originally predicted, solving systems accurately enough to maintain the convergence rate of the discretization with a single FMG iteration, and that the parallel algorithm provides a natural way to fully exploit the parallelism available in FMG.
The parallel algorithm is naturally expressed in asynchronous, data-driven programming models, in line with current directions in programming models for extreme-scale machines.
\end{abstract}
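
\noindent For reference, the $\tau$-correction named in the abstract can be written in the standard two-grid FAS form. The notation is an assumption made here for illustration and is not taken from the abstract: $A_h$ and $A_H$ denote the fine- and coarse-grid (possibly nonlinear) operators, $f_h$ the fine-grid right-hand side, $I_h^H$ the residual restriction, $\hat{I}_h^H$ the solution restriction, and $u_h$ the current fine-grid approximation. Under those assumptions the coarse-grid problem reads
\begin{equation}
A_H(u_H) = I_h^H f_h + \tau_h^H, \qquad \tau_h^H = A_H\bigl(\hat{I}_h^H u_h\bigr) - I_h^H A_h(u_h).
\end{equation}
The coarse problem depends on the fine grid only through the restricted quantities $I_h^H f_h$, $\hat{I}_h^H u_h$, and $\tau_h^H$. As a sketch of the argument rather than a full account of SR, this suggests why fine-grid data need not be retained once those restricted quantities have been accumulated.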