\begin{abstract}

Models with fewer parameters are necessary for the neural control of memory-limited, performant robots.
Finding these smaller neural network architectures can be time-consuming.
We propose HyperPPO, an on-policy reinforcement learning algorithm that uses graph hypernetworks to estimate the weights of multiple neural architectures simultaneously.
Our method estimates weights for networks that are much smaller than commonly used networks yet encode highly performant policies.
We obtain multiple trained policies at once while maintaining sample efficiency, letting the user pick a network architecture that satisfies their computational constraints.
We show that our method scales well: more training resources produce faster convergence to higher-performing architectures.
We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie 2.1 quadrotor.
Website: \href{https://sites.google.com/usc.edu/hyperppo}{https://sites.google.com/usc.edu/hyperppo}

\end{abstract}