\begin{abstract}
Optimization-based regularization methods have been effective at mitigating data heterogeneity in medical federated learning, particularly at improving the performance of underrepresented clients. However, these methods often reduce overall model accuracy and slow convergence. In this paper, we demonstrate that Vision Transformers can substantially improve the performance of underrepresented clients without a significant trade-off in overall accuracy. We attribute this improvement to the Vision Transformer's ability to capture long-range dependencies within the input data.

% In this paper, we developed a federated learning method, FedMHA, to tackle the challenges of data heterogeneity and fairness in the medical domain. The method employs the Vision Transformer architecture and Multi-Head Attention mechanism, and its performance is assessed on lung cancer CT scans with various levels of data heterogeneity, as denoted by distinct $\alpha_{\mathrm{LDA}}$ values. A thorough comparison of FedMHA with other advanced federated learning models, such as FedAvg, FedAvg ResNet, FedBN, and FedProx, is conducted. The results indicate that the alignment of encoder results performs better than these models, especially in highly heterogeneous settings, and is more appropriate for large-scale federated learning situations involving a greater number of clients. Additionally, the study examines the effect of incorporating weighted averaging during model aggregation, revealing that taking into account clients' training sample size can improve the overall accuracy of the federated learning system. This research underscores the significance of addressing fairness and generalization in federated learning for medical image analysis and promotes further investigation in this field.
\end{abstract}