abstract:1185b0c33c8d07e4.tex

1: \begin{abstract}

2: Speech separation has advanced considerably with the highly successful permutation invariant training (PIT) approach, but the frequent label assignment switching that occurs during PIT training remains a problem when faster convergence and better achievable performance are desired.

3: In this paper, we propose self-supervised pre-training to stabilize the label assignment when training the speech separation model.

4: Experiments over several types of self-supervised approaches, several typical speech separation models, and two different datasets show that substantial improvements are achievable if a proper self-supervised approach is chosen.

5: % The amount of progress is even large enough for us to achieve the same performance with only one-third to two-third of training epochs for Conv-TasNet.

6: % Among several types of self-supervised tasks, speech enhancement based pre-training tasks show significant effectiveness in our experiments. When using off-the-shelf pre-trained models, training duration could be shortened to one-third to two-thirds. Furthermore, even taking pre-training time into account, the entire training process could still be shorter without a performance drop when using a larger batch size.

7:

8: \end{abstract}

9: