\begin{abstract}
We present a novel approach to fast on-the-fly low-order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients, optimized for matrix-free implementations.
Our approach introduces a new operator that is obtained by appropriately scaling the reference stiffness matrix from the constant coefficient case.
Assuming sufficient regularity, an a priori analysis shows
that solutions obtained by this approach are unique and
converge with asymptotically optimal order in the $H^1$- and $L^2$-norms
on hierarchical hybrid grids.
For the pre-asymptotic regime, we present a local modification that guarantees uniform ellipticity of the operator.
Cost considerations show that our novel approach requires roughly one third of the floating-point operations of a classical finite element assembly scheme employing nodal integration.
Our theoretical considerations are illustrated by numerical tests
that confirm the expectations with respect to accuracy and run-time.
A large-scale application with more than a
hundred billion ($1.6\cdot10^{11}$) % powers in the abstract ALWAYS get lost at some point
degrees of freedom executed on 14\,310 compute cores
demonstrates the efficiency of the new scaling approach.
\end{abstract}
