\begin{abstract}
In this paper, we present algorithms and implementations for the end-to-end GPU acceleration of matrix-free low-order-refined preconditioning of high-order finite element problems.
The methods described here allow for the construction of effective preconditioners for high-order problems with optimal memory usage and computational complexity.
The preconditioners are based on the construction of a spectrally equivalent low-order discretization on a refined mesh, which is then amenable to, for example, algebraic multigrid preconditioning.
The constants of equivalence are independent of mesh size and polynomial degree.
For vector finite element problems in $\Hcurl$ and $\Hdiv$ (e.g.\ for electromagnetic or radiation diffusion problems), a specially constructed interpolation--histopolation basis is used to ensure fast convergence.
Detailed performance studies are carried out to analyze the efficiency of the GPU algorithms.
The kernel throughput of each of the main algorithmic components is measured, and the strong and weak parallel scalability of the methods is demonstrated.
The different relative weighting and significance of the algorithmic components on GPUs and CPUs are discussed.
Results on problems involving adaptively refined nonconforming meshes are shown, and the use of the preconditioners on a large-scale magnetic diffusion problem using all spaces of the finite element de Rham complex is illustrated.
\end{abstract}