abstract:0d80fe8a3a726428.tex

1: \begin{abstract}

2: We propose a novel neural sequence prediction method based on \textit{error-correcting output codes} that avoids exact softmax normalization and allows for a tradeoff between speed and performance. Instead of minimizing measures between the predicted probability distribution and true distribution, we use error-correcting codes to represent both predictions and outputs.

3: % \newline

4: Secondly, we propose multiple ways to improve accuracy and convergence rates by maximizing the separability between codes that correspond to classes proportional to word embedding similarities.

5:

6: Lastly, we introduce our main contribution called \textit{Latent Variable Mixture Sampling}, a technique that is used to mitigate exposure bias, which can be integrated into training latent variable-based neural sequence predictors such as ECOC.

7: % \newline

8: This involves mixing the latent codes of past predictions and past targets in one of two ways: (1) according to a predefined sampling schedule or (2) a differentiable sampling procedure whereby the mixing probability is learned throughout training by replacing the greedy argmax operation with a smooth approximation. ECOC-NSP leads to consistent improvements on language modelling datasets and the proposed Latent Variable mixture sampling methods are found to perform well for text generation tasks such as image captioning.

9: \end{abstract}

10: