\begin{abstract}
We propose to learn surrogate functions of universal speech priors for determined blind speech separation.
Deep speech priors are highly desirable owing to their high modelling power, but they are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), because deriving the required surrogate function is difficult and not always possible.
Instead, we dispense with exact majorization and directly approximate the surrogate.
Taking advantage of iterative source steering (ISS) updates, we backpropagate the permutation-invariant separation loss through multiple iterations of AuxIVA.
ISS lends itself well to this task due to its lower complexity and the absence of matrix inversions.
Experiments show large improvements in terms of scale-invariant signal-to-distortion ratio (SDR) and word error rate (WER) compared to baseline methods.
Training is done on two-speaker mixtures, and we experiment with two losses, SDR and coherence.
We find that the learnt approximate surrogate generalizes well to mixtures of three and four speakers without any modification.
We also demonstrate generalization to a different variant of the AuxIVA update equations.
The SDR loss leads to the fastest convergence in terms of iterations, while coherence leads to the lowest WER.
We obtain as much as a \SI{36}{\percent} reduction in WER.
\end{abstract}