\begin{abstract}
With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to single points of failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance depends strongly on the choice of penalty parameters, which are usually selected heuristically. %At the same time, they also introduce new challenges such as the need to select appropriate penalty parameter values. Although these parameter values strongly affect convergence rates, they are usually selected heuristically in an ad-hoc manner.
In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM, with the goal of minimizing the number of iterations until convergence.
%and present the design of the reward function and the state and action spaces for the proposed
We train our RL policy using deep Q-learning and show that it can significantly accelerate convergence (up to a 59\% reduction in the number of iterations compared to existing curvature-informed penalty parameter selection methods).
Furthermore, we show that our RL policy shows promise for generalization, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50\% reduction in iterations).
This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
%even under varying loads and network contingencies. We demonstrate our RL model using AC optimal power flow test instances.
\end{abstract}