
\begin{abstract}
This work presents a method for visual text recognition without
using any paired supervisory data.
We formulate the text recognition task as one of aligning the
conditional distribution of strings predicted from given text images
with lexically valid strings sampled from target corpora.
This enables fully automated, unsupervised learning from just line-level text images and unpaired text-string samples, obviating the need for large aligned datasets.
We present a detailed analysis of various aspects of the proposed method,
namely: (1) the impact of the length of training sequences on convergence,
(2) the relation between character frequencies and the order in which they are learnt,
(3) the generalisation ability of our recognition network to inputs of arbitrary lengths,
and (4) the impact of varying the text corpus on recognition accuracy.
Finally, we demonstrate excellent text recognition accuracy on both
synthetically generated text images and scanned images of real printed books,
using no labelled training examples.
\end{abstract}
