
\begin{abstract}

    Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications; however, their evaluations still lack uncertainty quantification.

    In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap.

    Statistically, our estimator attains the parametric convergence rate and is asymptotically normal, which enables valid inference.

    Computationally, our method is efficient and can be accelerated by parallel computing and by reusing pre-stored intermediate results.

    On simulated datasets with known ground truth, we show that our approach effectively controls the type I error and achieves power comparable to that of commonly used metrics.

    Furthermore, we demonstrate that our method can evaluate diffusion models on real image datasets with statistical confidence.

\end{abstract}
