abstract:1b0e6b5791ee49a2.tex

1: \begin{abstract}

2: Large-scale public datasets with high-quality annotations are rarely available for intelligent medical imaging research, due to data privacy concerns and the cost of annotations.

3: In this paper, we release SynFundus-1M, a high-quality synthetic dataset containing over \textbf{one million} fundus images covering \textbf{eleven disease types}.

4: Furthermore, we deliberately assign four readability labels to the key regions of the fundus images.

5: To the best of our knowledge, SynFundus-1M is currently the largest fundus dataset with the most sophisticated annotations.

6: Leveraging over 1.3 million private authentic fundus images from various scenarios, we trained a powerful Denoising Diffusion Probabilistic Model, named SynFundus-Generator.

7: The images in SynFundus-1M are generated by SynFundus-Generator under predefined conditions.

8: To demonstrate the value of SynFundus-1M, we design extensive experiments covering the following two aspects:

9: 1) Authenticity of the images: we randomly mix the synthetic images with authentic fundus images and find that experienced annotators can hardly distinguish the synthetic images from the authentic ones. Moreover, we show that disease-related visual features (e.g., lesions) are well simulated in the synthetic images.

10: 2) Effectiveness for downstream fine-tuning and pretraining: we demonstrate that retinal disease diagnosis models based on either convolutional neural network (CNN) or Vision Transformer (ViT) architectures benefit from SynFundus-1M. Compared with models pretrained on commonly used datasets, models pretrained on SynFundus-1M not only achieve superior performance but also converge faster on various downstream tasks.

11: SynFundus-1M is already publicly available to the open-source community.

12:

13: \keywords{Fundus image \and Synthetic images \and Pretraining}

14: \end{abstract}

15: