\begin{abstract}
In this paper, we address the exponential growth in the number of arms in the multivariate Multi-Armed Bandit (Multivariate-MAB) problem when the hierarchy of arm dimensions is taken into account. We propose a Thompson Sampling (TS) based framework called path planning (TS-PP), which uses paths in a graph (formed by trees) to model the arm reward success rate with $m$-way dimension interactions and adopts TS as a heuristic search for arm selection. This structure combats the curse of dimensionality naturally through a serial process that focuses on one dimension at each step. To the best of our knowledge, we are the first to solve the Multivariate-MAB problem using trees with a graph path planning strategy that borrows ideas from Monte-Carlo tree search. The proposed tree-based method has advantages over traditional models such as general linear regression. Studies on real data and simulations support these claims: the method achieves faster convergence, more efficient allocation to the optimal arm, and lower cumulative regret.
\end{abstract}