\begin{abstract}
The dueling bandit is a learning framework in which the feedback available to the learner is restricted to a noisy comparison between a pair of actions.
In this work, we address a dueling bandit problem based on a cost function over a continuous space.
We propose a stochastic mirror descent algorithm
and show that it achieves an $O(\sqrt{T\log T})$ regret bound under strong convexity and smoothness assumptions on the cost function.
We then clarify the equivalence between regret minimization in dueling bandit and convex optimization of the cost function.
Moreover, by considering a lower bound in convex optimization,
we show that our algorithm achieves the optimal convergence rate in convex optimization and the optimal regret in dueling bandit, up to a logarithmic factor.
\end{abstract}