
\begin{abstract}

We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features.
Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task.
We enable a neural network to group the representations of different objects iteratively through a differentiable mechanism.
We achieve very fast convergence by allowing the system to amortize the joint iterative inference of the groupings and their representations.
In contrast to many other recently proposed methods for multi-object scenes, our system does not assume its inputs to be images and can therefore directly handle other modalities.
We evaluate our method on multi-digit classification of very cluttered images that require texture segmentation.
Remarkably, by making use of the grouping mechanism, our method achieves better classification performance than convolutional networks despite being fully connected.
Furthermore, we observe that our system greatly improves upon the semi-supervised result of a baseline Ladder network on our dataset.
These results provide evidence that grouping is a powerful tool for improving sample efficiency.

\end{abstract}
