
\begin{abstract}

The dueling bandit is a learning framework in which the feedback available to the learner is restricted to a noisy comparison between a pair of actions.

In this work, we address a dueling bandit problem based on a cost function over a continuous space.

We propose a stochastic mirror descent algorithm and show that it achieves an $O(\sqrt{T\log T})$ regret bound under strong convexity and smoothness assumptions on the cost function.

We then clarify the equivalence between regret minimization in the dueling bandit and convex optimization of the cost function.

Moreover, by considering a lower bound in convex optimization, we show that our algorithm achieves the optimal convergence rate in convex optimization and the optimal regret in the dueling bandit, up to a logarithmic factor.


\end{abstract}
