\begin{abstract}
Conversational contextual bandits elicit user preferences by occasionally querying for explicit feedback on key-terms to accelerate learning.
However, existing approaches have two limitations that hurt their performance.
First, information gained from key-term-level conversations and arm-level recommendations is not appropriately incorporated to speed up learning.
Second, existing works have never considered asking explorative key-terms, which is important for quickly eliciting the user's potential interests in various domains and thus for accelerating the convergence of user preference estimation.
To tackle these issues, we first propose ``ConLinUCB'', a general framework for conversational bandits with better information incorporation, which combines arm-level and key-term-level feedback to estimate user preference in a single step at each round. Based on this framework, we further design two bandit algorithms with explorative key-term selection strategies, ConLinUCB-BS and ConLinUCB-MCR. We prove tighter regret upper bounds for our proposed algorithms. In particular, ConLinUCB-BS achieves a regret bound of $O(d\sqrt{T\log T})$, improving on the previous bound of $O(d\sqrt{T}\log T)$. Extensive experiments on synthetic and real-world data show significant advantages of our algorithms over the classic ConUCB algorithm in learning accuracy (up to 54\% improvement) and computational efficiency (up to 72\% improvement), demonstrating their potential benefit for recommender systems.
\end{abstract}
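To make the idea of estimating user preference "in a single step" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that arm-level feedback (arm feature x, reward r) and key-term-level feedback (key-term feature x~, feedback r~) enter one shared ridge-regression estimate with equal weight, i.e. theta = M^{-1} b with M = lam*I + sum x x^T + sum x~ x~^T and b = sum r x + sum r~ x~, and that arms are then chosen by a LinUCB-style upper confidence bound. The class `ConLinUCBSketch`, its methods, and the parameters `lam` and `alpha` are hypothetical names. The key-term rule `select_key_term_max_uncertainty` is a simplified uncertainty heuristic standing in for explorative selection; it is not the exact ConLinUCB-BS or ConLinUCB-MCR criterion.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


class ConLinUCBSketch:
    """Hypothetical sketch: one shared ridge estimate fed by BOTH arm-level and
    key-term-level feedback (equal weighting is an assumption)."""

    def __init__(self, d: int, lam: float = 1.0, alpha: float = 1.0) -> None:
        self.alpha = alpha  # exploration width, assumed constant here
        # M^{-1} starts as (lam * I)^{-1} and is kept current via Sherman-Morrison.
        self.m_inv: Matrix = [
            [(1.0 / lam) if i == j else 0.0 for j in range(d)] for i in range(d)
        ]
        self.b: Vector = [0.0] * d

    def theta(self) -> Vector:
        """Current preference estimate theta = M^{-1} b."""
        return mat_vec(self.m_inv, self.b)

    def _update(self, x: Vector, feedback: float) -> None:
        # Rank-one update of M^{-1} after M <- M + x x^T:
        # (M + x x^T)^{-1} = M^{-1} - (M^{-1} x)(M^{-1} x)^T / (1 + x^T M^{-1} x),
        # valid because M^{-1} is symmetric.
        mx = mat_vec(self.m_inv, x)
        denom = 1.0 + dot(x, mx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.m_inv[i][j] -= mx[i] * mx[j] / denom
        self.b = [bi + feedback * xi for bi, xi in zip(self.b, x)]

    def observe_arm(self, x: Vector, reward: float) -> None:
        self._update(x, reward)

    def observe_key_term(self, x_tilde: Vector, feedback: float) -> None:
        # Key-term feedback updates the SAME M and b, so both sources are
        # combined in one estimate rather than in two separate steps.
        self._update(x_tilde, feedback)

    def ucb(self, x: Vector) -> float:
        width = math.sqrt(max(dot(x, mat_vec(self.m_inv, x)), 0.0))
        return dot(self.theta(), x) + self.alpha * width

    def select_arm(self, arms: List[Vector]) -> int:
        return max(range(len(arms)), key=lambda k: self.ucb(arms[k]))

    def select_key_term_max_uncertainty(self, key_terms: List[Vector]) -> int:
        # Simplified explorative rule (assumption): ask about the key-term whose
        # direction is currently least explored, i.e. largest x^T M^{-1} x.
        return max(
            range(len(key_terms)),
            key=lambda k: dot(key_terms[k], mat_vec(self.m_inv, key_terms[k])),
        )
```

For example, with `d=2` and `lam=1.0`, observing arm `[1.0, 0.0]` with reward `1.0` gives an estimate of `[0.5, 0.0]`; the second coordinate is still unexplored, so the uncertainty rule asks about key-term `[0.0, 1.0]`, and a key-term feedback of `1.0` then moves the estimate to `[0.5, 0.5]`. The Sherman-Morrison update avoids re-inverting M every round, which is one common way to keep per-round cost at O(d^2).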