\begin{abstract}
Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods, such as those of \citet{glorot2010understanding} and \citet{he2015delving}, fail to produce mainnet weights at the correct scale when applied directly to a hypernet. %The result is a dependence on ad-hoc solutions.
We develop principled techniques for weight initialization in hypernets and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
% todo: best if we can quantify the improvement here
% good if we can say they work with sgd and adam, whereas before they only work with adam.
% I think the punchline we are looking for in an initialization paper is to show the existence of models that fail in a spectacular way on a bad init, rather than faster convergence by X%
% killer app: is probably hypernet generating hypernet generating hypernet ...
\end{abstract}
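To make the scale mismatch concrete, consider a simplified setting that we assume purely for illustration; it is not necessarily the analysis or the remedy developed in this paper. Suppose the last layer of the hypernet is linear with weights $H \in \mathbb{R}^{(d_{\mathrm{out}} d_{\mathrm{in}}) \times d_e}$ and maps an embedding $e \in \mathbb{R}^{d_e}$ to the flattened weight matrix $W \in \mathbb{R}^{d_{\mathrm{out}} \times d_{\mathrm{in}}}$ of one mainnet layer, where $(ij)$ indexes the row of $H$ that produces $W_{ij}$. Assume further that all entries of $H$ and $e$ are independent with zero mean. Then the summands below are uncorrelated, and
\begin{align*}
W_{ij} &= \sum_{k=1}^{d_e} H_{(ij)k}\, e_k, \\
\operatorname{Var}(W_{ij}) &= \sum_{k=1}^{d_e} \operatorname{Var}(H_{(ij)k}) \operatorname{Var}(e_k) = d_e \operatorname{Var}(H) \operatorname{Var}(e).
\end{align*}
Applying the fan-in initialization of \citet{he2015delving} to $H$, i.e.\ $\operatorname{Var}(H) = 2/d_e$, gives $\operatorname{Var}(W_{ij}) = 2\operatorname{Var}(e)$, whereas the same rule applied to the mainnet layer directly would target $\operatorname{Var}(W_{ij}) = 2/d_{\mathrm{in}}$. In this toy setting the generated weights are off by a factor of $d_{\mathrm{in}} \operatorname{Var}(e)$, which grows with the width of the mainnet layer; choosing $\operatorname{Var}(H) = 2/(d_e\, d_{\mathrm{in}} \operatorname{Var}(e))$ would restore the target variance.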