\begin{abstract}
    Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications; however, their evaluations still lack uncertainty quantification.
    In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap.
    Statistically, our estimator achieves the parametric convergence rate and asymptotic normality, which enables valid inference.
    Computationally, our method is efficient and can be accelerated by parallel computing and by pre-storing intermediate results.
    On simulated datasets with known ground truth, we show that our approach effectively controls the type I error and achieves power comparable with commonly used metrics.
    Furthermore, we demonstrate the performance of our method in evaluating diffusion models on real image datasets with statistical confidence.
\end{abstract}