\begin{abstract}
This paper proposes a new algorithm, referred to as GMAB, that combines concepts from multi-armed bandits, a subfield of reinforcement learning, with random search strategies from genetic algorithms to solve discrete stochastic optimization problems via simulation.
In particular, we focus on noisy large-scale problems, which often involve many dimensions as well as multiple local optima.
Our aim is to combine the ability of multi-armed bandits to cope with noisy simulation observations with the ability of genetic algorithms to handle high-dimensional solution spaces containing an enormous number of feasible solutions.
To this end, a multi-armed bandit framework serves as the foundation, in which every simulation observation is incorporated into GMAB's memory. Based on this memory, genetic operators guide the search, as they provide powerful tools for both exploration and exploitation.
The empirical results demonstrate that GMAB outperforms benchmark algorithms from the literature on a wide variety of test problems. In all experiments, GMAB required considerably fewer simulations to reach solutions of similar or (far) better quality than those produced by existing methods.
At the same time, GMAB's runtime overhead is extremely small owing to the proposed tree-based implementation of its memory.
Furthermore, we prove that GMAB converges to the set of global optima as the simulation effort goes to infinity.
\end{abstract}