
\begin{abstract}

Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks that involve generating natural language sentences, such as machine translation, image captioning, and speech recognition. Performance has been further improved by leveraging unlabeled data, often in the form of a language model. In this work, we present the Cold Fusion method, which leverages a pre-trained language model \textbf{during training}, and show its effectiveness on the speech recognition task. We show that Seq2Seq models with Cold Fusion make better use of language information, enjoying i) faster convergence and better generalization, and ii) almost complete transfer to a new domain while using less than 10\% of the labeled training data.

% Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks which involve generating natural language sentences. However, the amount of supervised data typically required to learn a decent model can be quite large limiting its applicability to tasks or domains suffering from a paucity of data. We present a simple algorithm called Cold Fusion which enjoys i) faster convergence and better generalization, and ii) almost complete transfer to a new domain while using less than 10\% of the labeled target domain data. Cold Fusion Seq2Seq models learn to use an external language model, naturally leveraging a large volume of unlabeled texts. The specialization in text generation and the main task of interest allows Seq2Seq to efficiently focus on solving the task at hand with fewer data. We demonstrate these capabilities on the speech recognition task.

\end{abstract}