\begin{abstract}
We evaluate AI-assisted generative capabilities 
on fundamental numerical kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV, Jacobi stencil, and CG. We test the generated kernels across a variety of programming models supported by each language, including (1) C\texttt{++} (e.g., OpenMP [including offload], OpenACC, Kokkos, SYCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offload] and OpenACC), (3) Python (e.g., NumPy, Numba, CuPy, and pyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and KernelAbstractions.jl). 
We use GitHub Copilot, powered by OpenAI Codex and available in Visual Studio Code as of April 2023, to generate a large number of implementations from simple \texttt{<kernel> + <programming model> + <optional hints>} prompt variants. To quantify and compare the results, we propose a proficiency metric based on the first 10 suggestions returned for each prompt. 
Results suggest that the OpenAI Codex outputs for C\texttt{++} correlate with the adoption and maturity of programming models. For example, OpenMP and CUDA score highly, whereas HIP still lags behind. 
We found that prompts in either a targeted language such as Fortran or the more general-purpose Python can benefit from adding code keywords, while Julia prompts perform acceptably well for its mature programming models (e.g., Threads and CUDA.jl). 
We expect these benchmarks to provide a point of reference for each programming model's community.
Overall, understanding the convergence of large language models, AI, and HPC is crucial because this convergence is evolving rapidly and is redefining human-computer interaction.
\end{abstract}