
\begin{abstract}
\label{sec:abstract}

Throughout the evolution of neural networks, more specialized cells have been
added to the set of basic building blocks. These cells aim to improve training
convergence, increase overall performance, and reduce the number of required
labels, all while preserving the expressive power of the universal network.
Inspired by the partitioning of the human visual perception system between the eyes and the
cerebral cortex, we present TPNET, which relieves both universal and
application-specific CNNs of the bulk processing of high-resolution pixel
data and performs translation-variant image correction, while delegating all
non-linear decision making to the network.


In this work, we explore the application of TPNET to 3D perception with a
narrow-baseline (0.0001--0.0025) quad stereo camera and show that a trained
network provides a disparity prediction from the 2D phase correlation output of
the Tile Processor (TP) that is twice as accurate as the prediction from a
carefully hand-crafted algorithm. The TP, in turn, reduces the dimensionality of the
input features of the network and provides instrument-invariant and
translation-invariant data, making real-time high-resolution stereo 3D
perception feasible and easing the requirement for a complete end-to-end
network.

\end{abstract}
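
For orientation, the 2D phase correlation referred to in the abstract can be
written in its textbook Fourier form. This is a sketch under assumed notation
and is not necessarily the exact transform implemented by the TP. The symbols
$f_1$, $f_2$ (a pair of corresponding image tiles), $F_1$, $F_2$ (their 2D
Fourier transforms), and $\varepsilon$ (a small regularization constant) are
introduced here for illustration only.
\begin{equation}
R(u,v) = \frac{F_1^{*}(u,v)\,F_2(u,v)}{\left|F_1^{*}(u,v)\,F_2(u,v)\right| + \varepsilon},
\qquad
r(x,y) = \mathcal{F}^{-1}\{R\}(x,y)
\end{equation}
If the tiles differ by a pure translation, $f_2(x,y) = f_1(x - d_x,\, y - d_y)$,
then $F_2(u,v) = F_1(u,v)\,e^{-2\pi i (u d_x + v d_y)}$, and for
$\varepsilon \to 0$ the correlation $r(x,y)$ reduces to a single peak at
$(d_x, d_y)$. A simple estimate of the shift is therefore
\begin{equation}
(\hat{d}_x, \hat{d}_y) = \arg\max_{(x,y)} \, r(x,y).
\end{equation}
In practice the peak is broadened and distorted by noise, occlusions, and
multiple depths within a tile, which is where a hand-crafted peak analysis or a
trained network can differ in accuracy.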
