\begin{abstract}
We consider the token-based joint auto-scaling and load balancing strategy proposed in Chapter~\ref{chap:energy1}, which admits an efficient, scalable implementation and achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers $N\to\infty$. In Chapter~\ref{chap:energy1}, the asymptotic results were obtained \emph{under the assumption that the queues have fixed-size finite buffers}, and therefore the fundamental question of stability with infinite buffers was left open. In this chapter, we address this stability question. System stability under the usual subcritical load assumption is not automatic; moreover, stability may \emph{not} even hold for all $N$. The key challenge stems from the fact that the process \emph{lacks monotonicity}, which has been the primary tool for establishing stability in load balancing models. We develop a novel method to prove that the subcritically loaded system is stable for \emph{large enough}~$N$, and establish convergence of the steady-state distributions to the optimal one as $N \to \infty$.
The method advances the state of the art with an induction-based idea that exploits a weak monotonicity property of the model.
This method is of independent interest and may have broader applicability.
\end{abstract}