ac3c90ae4e0b9fd7.tex
1: \begin{abstract}
2: \label{sec:abstract}
3: Throughout the evolution of the neural networks more specialized cells were
4: added to the set of basic building blocks. These cells aim to improve training
5: convergence, increase the overall performance, and reduce the number of required
6: labels, all while preserving the expressive power of the universal network.
7: Inspired by the partitioning of the human visual perception system between the eyes and the
8: cerebral cortex, we present TPNET, which offloads universal and
9: application-specific CNN from the bulk processing  of the high resolution pixel
10: data and performs the translation-variant image correction while delegating all
11: non-linear decision making to the network.
12:  
13: In this work, we explore application of TPNET to 3D perception with a
14: narrow-baseline (0.0001-0.0025) quad stereo camera and prove that a trained
15: network provides a disparity prediction from the 2D phase correlation output by
16: the Tile Processor (TP) that is twice as accurate as the prediction from a
17: carefully hand-crafted algorithm. The TP in turn reduces the dimensions of the
18: input features of the network and provides instrument-invariant and
19: translation-invariant data, making real-time high resolution stereo 3D
20: perception feasible and easing the requirement to have a complete end-to-end
21: network.
22: \end{abstract}
23: