
\begin{abstract}
Models with fewer parameters are necessary for the neural control of memory-limited, performant robots.
Finding these smaller neural network architectures can be time-consuming.
We propose HyperPPO, an on-policy reinforcement learning algorithm that uses graph hypernetworks to estimate the weights of multiple neural architectures simultaneously.
Our method estimates weights for networks that are much smaller than those in common use yet encode highly performant policies.
We obtain multiple trained policies at the same time while maintaining sample efficiency, and let the user pick a network architecture that satisfies their computational constraints.
We show that our method scales well: more training resources produce faster convergence to higher-performing architectures.
We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor.
Website: \href{https://sites.google.com/usc.edu/hyperppo}{https://sites.google.com/usc.edu/hyperppo}
\end{abstract}
