\begin{abstract}
A multigrid scheme is proposed \textcolor{black}{for the pressure equation of the unsteady incompressible flow equations, allowing} efficient implementation on \textcolor{black}{clusters of} modern CPUs, Many Integrated Core (MIC) devices, and graphics processing units (GPUs).
\textcolor{black}{It is shown that the total number of synchronization events can be significantly reduced} when a deep $2h$ grid hierarchy is replaced with a two-level scheme using $16h$--$32h$ restriction, which matches the width of the SIMD engines of modern CPUs and GPUs.
In addition, optimal memory transfer is ensured, since no strided memory access is required.
\textcolor{black}{We report an increased arithmetic intensity of the smoothing steps compared to the conventional additive correction multigrid (ACM); in terms of runtime, however, this is counterbalanced by the reduced number} of expensive restriction steps.
\textcolor{black}{A systematic construction methodology for the coarse grid stencil is also presented, which helps to moderate the excess arithmetic intensity associated with the aggressive coarsening.}
Our higher-order interpolated stencil improves the convergence rate by minimizing spurious interference between the coarse- and fine-scale solutions.
The method is demonstrated by solving the pressure equation for 2D incompressible fluid flow.
The benchmark setups cover shear-driven laminar flow in a cavity and direct numerical simulation (DNS) of a turbulent jet.
\textcolor{black}{We compare our scheme to the ACM in terms of the arithmetic intensity of the iterations and the number of synchronization calls required.
Strong scaling results are also reported for our scheme} using a hybrid OpenCL/MPI parallelization.
%structured grid
\end{abstract}