abstract:a1b81d2f7aec02cc.tex

1: \begin{abstract}

2: % \alf{Current length is 5 pages before the experiments.}

3:

4: % \alf{ToDo: 1. Add a notation section. 2. Discuss how to write about the dual bound. 3. Fix Literature review and probably remove the note. 4. Reduce experiments length and decide what to add and how to write them. 5. Discuss any change on the experiments. 6 Appendix: write all proofs for current setting. 7. Appendix: Add more details of the 2nd price experiment. 8. (Minor) Re-think how to write $\alpha b$ which represents the vector $(\alpha_1 b_1, \dots, \alpha_K b_K)$}

5:

6:

7: % \alf{My understanding is that ICML requires short abstracts of up to six lines. Also, with Paul we discuss that the terminology of 'revenue' and 'cost' could be skipped since this is not just a revenue management problem. } \\

8:

9: We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and must make a decision adaptively. The revenue and cost functions depend on the context vector and on a fixed but possibly unknown parameter vector that must be learned. We propose a novel offline benchmark and a new algorithm that combines an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we establish an $O(\sqrt{T})$ regret bound as well as an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter vector is unknown and must be learned, we show that the regret and the constraint violations are bounded by the sum of the previous $O(\sqrt{T})$ terms and terms that depend directly on the convergence of the learning process.

10: \end{abstract}

11: