\begin{abstract}
Online controlled experiments have emerged as the industry gold standard for assessing new web features.
As new web algorithms proliferate, experimentation platforms face increasing demand for faster online experiments,
which motivates adaptive traffic testing methods that speed up identification of the best variant by allocating traffic efficiently.
This paper proposes four Bayesian batch bandit algorithms (\textbf{NB-TS}, \textbf{WB-TS}, \textbf{NB-TTTS}, \textbf{WB-TTTS}) for eBay's experimentation platform, which use summary batch statistics of a goal metric without incurring new engineering technical debt.
In particular, the novel \textbf{WB-TTTS} is shown to be an efficient, trustworthy and robust alternative to fixed-horizon A/B testing.
Another novel contribution is to incorporate the trustworthiness of best-arm identification algorithms into the evaluation criteria and to highlight the severe false positive inflation that occurs when several arms are equally best.
To gain the trust of experimenters, an experimentation platform must consider both efficiency and trustworthiness;
however, to the best of the authors' knowledge, trustworthiness is rarely discussed despite its importance.
This paper shows that Bayesian bandits without neutral posterior reshaping, particularly naive Thompson sampling (\textbf{NB-TS}), are untrustworthy because they always end up declaring one arm the best, even when the best arms are equivalent.
To restore trustworthiness, this paper uncovers a novel connection between the convergence distribution of the posterior probabilities of optimality of equivalent best arms and neutral posterior reshaping, which controls false positives.
Lastly, this paper presents lessons learned from eBay's experience, as well as a thorough evaluation.
We hope that this paper is useful to other industrial practitioners and inspires academic researchers interested in the trustworthiness of adaptive traffic experimentation.
\end{abstract}