abstract:4a1955fb186c84df.tex

1: \begin{abstract}

2: Scene text recognition is an important and challenging task in computer vision. However, most prior work focuses on recognizing words from a pre-defined vocabulary, whereas real-world applications contain many out-of-vocabulary (OOV) words.

3: %Recognizing out-of-vocabulary (OOV) words remains a challenge, and some studies suggest distinguishing between in-vocabulary (IV) and OOV words.

4: In this paper, we propose Pseudo-OCR, a novel open-vocabulary text recognition framework for recognizing OOV words. The key challenge in this task is the lack of OOV training data. To address it, we first propose a pseudo-label generation module that leverages character detection and image inpainting to produce a large amount of pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds, so it more closely resembles real-world conditions.

5: Secondly, to reduce noise in the pseudo data, we present a semantic checking mechanism that filters the generated samples and retains the semantically meaningful ones.

6: Thirdly, we introduce a quality-aware margin loss to improve training with pseudo data. The loss combines a margin-based term that strengthens classification ability with a quality-aware term that penalizes low-quality samples in both the real and the pseudo data.

7: %loss to increase inter-class distances and reduce intra-class distances, moreover the quality detector could decrease the low-quality image influence for training converge.introduce an approach that optimizes the geodesic distance margins to reduce the impact of noisy samples in training data on model convergence during training. A novel text quality adaptive mechanism has been introduced to dynamically adjust the margin of each class.

8: Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on eight datasets and ranks first in the ICDAR2022 challenge.

9: %The code and models will be publicly available at \url{https://github.com/xuhuaren/Pseudo-OCR}.

10: \end{abstract}

11: