\begin{abstract}
We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features.
Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task.
We enable a neural network to group the representations of different objects in an iterative manner through a differentiable mechanism.
We achieve very fast convergence by allowing the system to amortize the joint iterative inference of the groupings and their representations.
In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities.
We evaluate our method on multi-digit classification of very cluttered images that require texture segmentation.
Remarkably, our method achieves improved classification performance over convolutional networks despite being fully connected, by making use of the grouping mechanism.
Furthermore, we observe that our system greatly improves upon the semi-supervised result of a baseline Ladder network on our dataset.
These results are evidence that grouping is a powerful tool that can help to improve sample efficiency.
\end{abstract}
