1: \begin{abstract}
2: This paper introduces a new concept called ``transferable visual words'' (TransVW), aiming to achieve annotation efficiency for deep learning in medical image analysis. Medical imaging---focusing on particular parts of the body for defined clinical purposes---generates images of great similarity in anatomy across patients and yields sophisticated anatomical patterns across images, which are associated with rich {\em semantics} about human anatomy and which are natural {\em visual words}. We show that these visual words can be automatically harvested according to anatomical consistency via self-discovery, and that the self-discovered visual words can serve as strong yet free supervision signals for deep models to learn semantics-enriched generic image representation via self-supervision (self-classification and self-restoration).
3: Our extensive experiments demonstrate the annotation efficiency of TransVW, which offers higher performance and faster convergence at reduced annotation cost in several applications.
4: TransVW has several important advantages: (1) it is a fully autodidactic scheme that exploits the semantics of visual words for self-supervised learning and requires no expert annotation; (2) visual word learning is an add-on strategy that complements existing self-supervised methods and boosts their performance; and (3) the learned image representation yields semantics-enriched models, which have proven to be more robust and generalizable, saving annotation effort for a variety of applications through transfer learning.
5: Our code, pre-trained models, and curated visual words are available at \href{https://github.com/JLiangLab/TransVW}{https://github.com/JLiangLab/TransVW}.
6: \end{abstract}
7: