bd98fa584105a45d.tex
1: \begin{abstract}
2:   We propose to learn surrogate functions of universal speech priors for determined blind speech separation.
3:   Deep speech priors are desirable for their high modelling power, but they are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), since deriving the required surrogate function is difficult and not always possible.
4:   Instead, we do away with exact majorization and directly approximate the surrogate.
5:   Taking advantage of iterative source steering (ISS) updates, we backpropagate the permutation-invariant separation loss through multiple iterations of AuxIVA.
6:   ISS lends itself well to this task due to its lower complexity and lack of matrix inversion.
7:   Experiments show large improvements in terms of scale-invariant signal-to-distortion ratio (SDR) and word error rate compared to baseline methods.
8:   Training is done on two-speaker mixtures, and we experiment with two losses, SDR and coherence.
9:   We find that the learnt approximate surrogate generalizes well to mixtures of three and four speakers without any modification.
10:   We also demonstrate generalization to a different variation of the AuxIVA update equations.
11:   The SDR loss leads to the fastest convergence in terms of iterations, while the coherence loss leads to the lowest word error rate (WER).
12:   We obtain up to a \SI{36}{\percent} reduction in WER.
13: \end{abstract}
14: