abstract:f928ee53d7246d68.tex

1: \begin{abstract}

2: \vspace{-0.3cm}

3: Text detection and recognition in natural images have long been treated as two separate tasks that are processed sequentially.

4:   Training the two tasks in a unified framework is non-trivial due to significant differences in

5:   %learning difficulties and convergence rates.

6:   optimisation difficulties.

7: In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in one shot.

8: Our main contributions are three-fold:

9:   1) we propose a novel \emph{text-alignment} layer that allows the model to precisely compute convolutional features of a text instance in arbitrary orientation, which is key to boosting performance;

10:   2) we introduce a character attention mechanism that uses character spatial information as explicit supervision, leading to large improvements in recognition;

11:   3) these two techniques, together with a new RNN branch for word recognition, are integrated seamlessly into a single model that is end-to-end trainable. This allows the two tasks to work collaboratively by sharing convolutional features, which is critical for identifying challenging text instances.

12: Our model achieves impressive results in end-to-end recognition on the ICDAR2015 \cite{ICDAR2015} dataset,

13:   significantly advancing the most recent results \cite{Busta2017}, improving the F-measure from $(0.54, 0.51, 0.47)$

14:   to $(0.82, 0.77, 0.63)$,

15:   when using a strong, weak, and generic lexicon, respectively.

16:   Thanks to joint training, our method can also serve as a strong detector, achieving new state-of-the-art detection performance on

17:   two datasets.

18: \vspace{-0.5cm}

19: \end{abstract}

20: