\begin{abstract}
Accurate predictions of defect formation energies are crucial for understanding defect properties in materials, such as the favorable defect types and their concentrations, and for investigating the interplay between defects and other functional properties of materials. To overcome the computational expense of density-functional theory (DFT) calculations in large supercells, several graph neural network (GNN) models have been proposed to predict defect formation energies. However, their performance is limited, both because defect formation energies depend strongly on the local atomic configuration near the defect sites and because of the over-smoothing problem of GNNs. Herein, we demonstrate that persistent homology features, which characterize the topological structure of the local chemical environment around each atomic site, encode the structural information of defects (such as vacancies and substitutions). This information includes the size of the defects, the number of defects near each atom, and the distance to those defects. Using a dataset comprising a wide spectrum of \ch{O}-based perovskites with vacancies of various elemental types as an example, we construct and train GNN models to predict neutral vacancy formation energies. We show that incorporating the persistent homology features, together with proper choices of graph pooling operations, significantly increases the prediction accuracy and resolves the lack of convergence with respect to supercell size observed in previous GNN models. Furthermore, using defective \ch{BaTiO3} with multiple substitutions and multiple vacancies as examples, we show that our GNN model can also accurately predict defect-defect interaction energies. These results suggest that persistent homology features are highly effective for predicting defect-related properties and can be integrated into the broad family of advanced GNN models for future defect studies.
\end{abstract}
