\begin{abstract}
This paper considers simulation-based optimization of the performance of a regime-switching stochastic system over a finite set of feasible configurations. Inspired by the stochastic fictitious play learning rules in game theory, we propose an adaptive simulation-based search algorithm that uses a smooth best-response sampling strategy and tracks the set of global optima, while distributing the search so that most of the effort is spent simulating the system performance at the global optima. The algorithm converges weakly to the set of global optima even when the observations are correlated, provided that a weak law of large numbers holds. Numerical examples show that, for finite sample lengths, the proposed scheme converges faster than several existing random search and pure exploration methods in the literature.
\end{abstract}