\begin{abstract}
Speech separation has advanced considerably with the highly successful permutation invariant training (PIT) approach, yet the frequent label assignment switching that occurs during PIT training remains a problem when faster convergence and higher achievable performance are desired.
In this paper, we propose self-supervised pre-training to stabilize label assignment when training speech separation models.
Experiments with several types of self-supervised approaches, several typical speech separation models, and two different datasets show that substantial improvements are achievable if a suitable self-supervised approach is chosen.
% The progress is large enough for us to achieve the same performance with only one-third to two-thirds of the training epochs for Conv-TasNet.
% Among several types of self-supervised tasks, speech-enhancement-based pre-training tasks prove significantly effective in our experiments. When using off-the-shelf pre-trained models, training duration could be shortened to one-third to two-thirds. Furthermore, even taking pre-training time into account, the entire training process could still be shorter without a performance drop when using a larger batch size.

\end{abstract}