\begin{abstract}
Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractical due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an efficient end-to-end method based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, including Inception-v3, AmoebaNet, Transformer-XL, and WaveNet, our method achieves on average a 16\% improvement over human experts and a 9.2\% improvement over the prior art, with 15$\times$ faster convergence. To further reduce the computation cost, we pre-train the policy network on a set of dataflow graphs and use a superposition network to fine-tune it on each individual graph, achieving state-of-the-art performance on large held-out graphs with over 50k nodes, such as an 8-layer GNMT.

\end{abstract}