\begin{abstract}

This paper proposes a new algorithm, referred to as GMAB, that combines concepts from the reinforcement learning domain of multi-armed bandits and random search strategies from the domain of genetic algorithms to solve discrete stochastic optimization problems via simulation.

In particular, the focus is on noisy large-scale problems, which often involve a multitude of dimensions as well as multiple local optima.

Our aim is to combine the property of multi-armed bandits to cope with volatile simulation observations with the ability of genetic algorithms to handle high-dimensional solution spaces accompanied by an enormous number of feasible solutions.

For this purpose, a multi-armed bandit framework serves as a foundation, where each observed simulation is incorporated into the memory of GMAB. Based on this memory, genetic operators guide the search, as they provide powerful tools for exploration as well as exploitation.

The empirical results demonstrate that GMAB outperforms benchmark algorithms from the literature across a wide variety of test problems. In all experiments, GMAB required considerably fewer simulations to achieve solutions that are similar to or far better than those generated by existing methods.

At the same time, GMAB's runtime overhead is extremely small due to the suggested tree-based implementation of its memory.

Furthermore, we prove its convergence to the set of global optima as the simulation effort goes to infinity.

\end{abstract}
