\begin{abstract}
Online controlled experiments have emerged as the industry gold standard for assessing new web features.
As new web algorithms proliferate, experimentation platforms face increasing demand for faster online experiments, which motivates adaptive traffic testing methods that identify the best variant sooner by allocating traffic efficiently.
This paper proposes four Bayesian batch bandit algorithms (\textbf{NB-TS}, \textbf{WB-TS}, \textbf{NB-TTTS}, \textbf{WB-TTTS}) for eBay's experimentation platform that use batch summary statistics of a goal metric without incurring new engineering technical debt.
In particular, the novel \textbf{WB-TTTS} proves to be an efficient, trustworthy, and robust alternative to fixed-horizon A/B testing.
Another novel contribution is to incorporate the trustworthiness of best-arm identification algorithms into the evaluation criteria and to highlight the severe false positive inflation that arises when best arms are equivalent.
To gain the trust of experimenters, an experimentation platform must consider both efficiency and trustworthiness; however, to the best of the authors' knowledge, trustworthiness is rarely discussed despite its importance.
This paper shows that Bayesian bandits without neutral posterior reshaping, particularly naive Thompson sampling (\textbf{NB-TS}), are untrustworthy because they can always declare one arm the best even when several best arms are equivalent.
To restore trustworthiness, we present a novel finding that connects the convergence distribution of the posterior probabilities of optimality of equivalent best arms to neutral posterior reshaping, which controls false positives.
Lastly, this paper presents lessons learned from eBay's experience, as well as a thorough evaluation.
We hope that this paper is useful to other industrial practitioners and inspires academic researchers interested in the trustworthiness of adaptive traffic experimentation.
\end{abstract}
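The false-positive mechanism summarized in the abstract can be explored with a short simulation. The sketch below is a generic batch Thompson sampling loop (probability matching). It is not the paper's \textbf{NB-TS} or \textbf{WB-TTTS}. It assumes a binary goal metric with independent Beta(1, 1) priors, and the function names \texttt{prob\_best} and \texttt{run\_batch\_ts} are hypothetical. Each batch records only per-arm summary statistics (trials and successes). Traffic for the next batch is then split in proportion to a Monte Carlo estimate of each arm's posterior probability of being best, with no posterior reshaping.

```python
import numpy as np


def prob_best(successes, trials, n_draws=10_000, rng=None):
    """Monte Carlo estimate of P(arm k has the highest rate | data).

    Assumes a Bernoulli goal metric with independent Beta(1, 1) priors
    (an illustrative choice, not necessarily the paper's model).
    """
    rng = np.random.default_rng() if rng is None else rng
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    # Posterior for arm k is Beta(1 + s_k, 1 + n_k - s_k); one column per arm.
    draws = rng.beta(1.0 + successes, 1.0 + trials - successes,
                     size=(n_draws, successes.size))
    # Fraction of joint draws in which each arm has the largest sampled rate.
    wins = np.bincount(draws.argmax(axis=1), minlength=successes.size)
    return wins / n_draws


def run_batch_ts(true_rates, n_batches=50, batch_size=1_000, seed=0):
    """Plain batch Thompson sampling (probability matching), no reshaping.

    Returns the final traffic allocation, i.e. the estimated posterior
    probability that each arm is best after the last batch.
    """
    rng = np.random.default_rng(seed)
    true_rates = np.asarray(true_rates, dtype=float)
    k = true_rates.size
    successes = np.zeros(k)
    trials = np.zeros(k)
    alloc = np.full(k, 1.0 / k)  # the first batch is split uniformly
    for _ in range(n_batches):
        # Route this batch's traffic according to the current allocation.
        n = rng.multinomial(batch_size, alloc)
        # Only per-arm batch summaries (trials, successes) are accumulated.
        successes += rng.binomial(n, true_rates)
        trials += n
        alloc = prob_best(successes, trials, rng=rng)
    return alloc
```

To probe the equivalent-best-arms case, one could call \texttt{run\_batch\_ts([0.1, 0.1], seed=s)} over many seeds and count how often either arm's posterior probability of being best exceeds a decision threshold such as 0.95. The resulting frequency is a rough estimate of this unreshaped sampler's false positive rate. Both the threshold and the rates are illustrative assumptions.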
