
\begin{abstract}

Data simulation engines like Unity are becoming an increasingly important data source that allows us to acquire ground-truth labels conveniently. Moreover, we can flexibly edit the \emph{content} of an image in the engine, such as objects (position, orientation) and environments (illumination, occlusion).

When simulated data are used as a training set, their editable content can be leveraged to mimic the distribution of real-world data, thereby reducing the content difference between the synthetic and real domains.

This paper explores content adaptation in the context of semantic segmentation, where complex street scenes are fully synthesized from a first-person driver perspective using 19 classes of virtual objects and are controlled by 23 attributes.

To optimize the attribute values and obtain a training set whose content is similar to that of real-world data, we propose a scalable discretization-and-relaxation (SDR) approach.

%We formulate the attribute optimization as a distribution mapping problem that maps random attribute value to optimized one.

Under a reinforcement learning framework, we formulate attribute optimization as a random-to-optimized mapping problem using a neural network.

Our method has three characteristics.

1) Instead of editing attributes of individual objects, we focus on global attributes that have a large influence on the scene structure, such as object density and illumination.

2) Attributes are quantized to discrete values to reduce the search space and training complexity.

3) Correlated attributes are jointly optimized in a group to avoid meaningless scene structures and find better convergence points.

Experiments show that our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy compared with existing synthetic training sets.

\end{abstract}
