\begin{abstract}
  In this work, we study how to harness massive amounts of heterogeneous, weak computing power to collaboratively train large-scale models on dispersed datasets.
  % To the best of our knowledge,
  To improve both efficiency and accuracy in resource-adaptive collaborative learning,
  we take the first step toward simultaneously addressing four challenges: \textit{unstructured pruning}, \textit{varying submodel architectures}, \textit{knowledge loss}, and \textit{stragglers}.
  % in a united framework.
% we design a framework ${Co\text{-}S}^2{P}$ to release the potential of various resource-limited computing power and dispersed datasets to train one single large-scale model collaboratively.
% In detail, ${Co\text{-}S}^2{P}$ is
We propose a novel semi-asynchronous collaborative training framework, ${Co\text{-}S}^2{P}$, which addresses these challenges through data distribution-aware structured pruning and a cross-block knowledge transfer mechanism.
Furthermore, we prove theoretically that ${Co\text{-}S}^2{P}$ achieves an asymptotically optimal convergence rate of $O(1/\sqrt{N^*EQ})$.
Finally, we conduct extensive experiments on a real-world hardware testbed, in which 16 heterogeneous Jetson devices collaboratively train large-scale models with up to 0.11 billion parameters.
The experimental results demonstrate that ${Co\text{-}S}^2{P}$ improves accuracy by up to 8.8\% and resource utilization by up to 1.2$\times$ compared with state-of-the-art methods, while reducing memory consumption by approximately 22\% and training time by about 24\% across all resource-limited devices.
\end{abstract}