abstract:4086be5a804c298f.tex

\begin{abstract}
 Deep neural networks use skip connections
to improve training convergence.
However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements.
In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach.
We argue that while a network's skip connections are needed for the network to learn, they can later be removed or shortened to provide a more hardware-efficient implementation with minimal to no accuracy loss.
We introduce \tool, a codesign tool whose hardware-aware training algorithm gradually removes or shortens a fully trained network's skip connections to lower their hardware cost.
\DIFdelbegin \DIFdel{The optimized hardware designs improve }\DIFdelend \DIFaddbegin \tool \DIFadd{improves }\DIFaddend resource utilization by up to 34\% for BRAMs, 13\% for FFs, and 16\% for LUTs \DIFdelbegin \DIFdel{.
}\DIFdelend \DIFaddbegin \DIFadd{for hls4ml architectures.
}\tool \DIFadd{increases performance by 30\% and reduces memory bandwidth by 45\% for a 2D processing element array architecture.
}

\DIFaddend \end{abstract}