942822a86978ec69.tex
1: \begin{abstract}
2: We propose a novel Wasserstein method with a distillation mechanism that jointly learns word embeddings and topics.
3: The method builds on the observation that the Euclidean distance between word embeddings can serve as the underlying distance in a Wasserstein topic model.
4: The word distributions of topics, their optimal transports to the word distributions of documents, and the embeddings of words are learned in a unified framework. 
5: When learning the topic model, we leverage a distilled underlying distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. 
6: This strategy provides robust guidance for updating the word embeddings, improving algorithmic convergence.
7: As an application, we focus on patient admission records, for which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions. It obtains superior performance on clinically meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.
8: \end{abstract}
9: