\begin{abstract}
This paper presents a challenge to the community: Generative adversarial networks (GANs) can perfectly align independent English word embeddings induced using {\em the same}~algorithm, based on distributional information alone, but fail to do so for embeddings induced using two different embedding algorithms. {\em Why is that?} We believe that understanding why is key to understanding {\em both} modern word embedding algorithms {\em and} the limitations and instability dynamics of GANs. This paper shows that (a) in all the cases where alignment fails, there exists a linear transform between the two embeddings (so algorithm biases do not lead to non-linear differences), and (b) similar effects cannot easily be obtained by varying hyper-parameters. One plausible suggestion, based on our initial experiments, is that the differences in the inductive biases of the embedding algorithms give rise to an optimization landscape that is riddled with local optima, leaving only a very small basin of convergence, but we present this more as a challenge paper than as a technical contribution.
%We study the limitations of unsupervised bilingual dictionary algorithms -- learning to align monolingual embeddings by exploiting common, global structure -- by starting with the seemingly trivial case of unsupervised alignment of two embedding spaces induced from identical corpora. We observe, however, that if the embeddings are learned by different embedding algorithms, unsupervised alignment fails, suggesting that embedding algorithms do not learn unbiased representations of global structure. We then manipulate the target corpus and/or the target embeddings in several ways: scrambling sentences, normalizing vectors, changing embedding algorithm, subsampling the data, etc. These controlled experiments on semi-synthetic data enable us to study the limitations of unsupervised bilingual dictionary algorithms, finding that XXX is key to their performance.
\end{abstract}