\begin{abstract}
We consider the model of a token-based joint auto-scaling and load balancing strategy, proposed in a recent paper by Mukherjee, Dhara,
Borst, and van Leeuwaarden~\cite{MDBL17} (SIGMETRICS~'17), which offers an efficient, scalable implementation and yet achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers $N\to\infty$.
In the above work, the asymptotic results are obtained \emph{under the assumption that the queues have fixed-size finite buffers}, and therefore the fundamental question of stability of the proposed scheme with infinite buffers was left open.
In this paper, we address this fundamental stability question.
System stability under the usual subcritical load assumption is not automatic.
Moreover, stability may \emph{not} even hold for all $N$.
The key challenge stems from the fact that the process \emph{lacks monotonicity}, which has been the primary tool for establishing stability in load balancing models.
We develop a novel method to prove that the subcritically loaded system is stable for \emph{large enough}~$N$, and establish convergence of the steady-state distributions to the optimal one as $N \to \infty$.
The method goes beyond state-of-the-art techniques: it uses an induction-based idea and a ``weak monotonicity'' property of the model. This technique is of independent interest and may have broader applicability.
\end{abstract}