abstract:3b3812486babc9d0.tex

1: \begin{abstract}

2: Collecting training data for deep learning is tedious, which makes it hard to build a comprehensive database.

3: In this paper, we propose a generative model trained with synthetic images rendered from 3D models, which reduces the burden of collecting real training data and makes the background conditions more varied.

4: Our architecture is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network based on multi-triplet cost training. This design avoids over-fitting to the monotonous surfaces of synthetic objects and exploits the accurate information available for synthetic images, such as object poses and lighting conditions, which is helpful for recognizing regular photos.

5: First, our generative model with metric learning uses additional foreground object channels, generated by the semantic foreground object reconstruction sub-network, to recognize the original input images.

6: A pose-based multi-triplet cost function is used for metric learning, which makes it possible to train an effective categorical classifier purely on synthetic data.

7: Second, we design a coordinated training strategy in which adaptive noise is applied to the inputs of both concatenated sub-networks, so that they benefit from each other and avoid the inharmonious parameter tuning caused by their different convergence speeds.

8: Our architecture achieves a state-of-the-art accuracy of 50.5\% on the ShapeNet database despite the data migration obstacle from synthetic images to real photos.

9: This pipeline makes it possible to perform recognition on real images using only 3D models.

10: \end{abstract}

11: