
\begin{abstract}

  We propose to learn surrogate functions of universal speech priors for determined blind speech separation.

  Deep speech priors are highly desirable due to their high modelling power, but they are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), because deriving the required surrogate function is difficult and not always possible.

  Instead, we do away with exact majorization and directly approximate the surrogate.

  Taking advantage of iterative source steering (ISS) updates, we backpropagate the permutation-invariant separation loss through multiple iterations of AuxIVA.

  ISS lends itself well to this task due to its lower complexity and lack of matrix inversion.

  Experiments show large improvements in terms of scale-invariant signal-to-distortion ratio (SDR) and word error rate compared to baseline methods.

  Training is done on two-speaker mixtures, and we experiment with two losses, SDR and coherence.

  We find that the learnt approximate surrogate generalizes well to mixtures of three and four speakers without any modification.

  We also demonstrate generalization to a different variant of the AuxIVA update equations.

  The SDR loss leads to the fastest convergence in terms of iterations, while the coherence loss leads to the lowest word error rate (WER).

  We obtain up to a \SI{36}{\percent} reduction in WER.

\end{abstract}
