abstract:e2bcab7869aec91b.tex

1: \begin{abstract}

2: Online task scheduling plays an integral role in task-intensive applications such as cloud computing and crowdsourcing.

3: Optimal scheduling can enhance system performance, typically measured by the reward-to-cost ratio, under some task arrival distribution.

4: On the one hand, both reward and cost depend on the task context (e.g., evaluation metric) and remain black boxes in practice.

5: This makes reward and cost hard to model and thus unknown before decision making.

6: On the other hand, task arrivals are sensitive to factors such as unpredictable system fluctuations, so a prior estimate or the conventional assumption about the arrival distribution (e.g., Poisson) may fail.

7: This raises another practical yet often neglected challenge: an uncertain task arrival distribution.

8: Towards effective scheduling in a stationary environment with these uncertainties, we propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.

9: Specifically, DOL-RM integrates a learning module, which maintains optimistic estimates of the reward-to-cost ratio, with a decision module, which uses the Robbins-Monro method to implicitly learn the task arrival distribution while making scheduling decisions.

10: Theoretically, DOL-RM achieves

11: % fast learning with a $O(T^{-1/4})$ convergence gap and no regret learning with

12: a sub-linear regret of $O(T^{3/4})$, which is the first such result for online task scheduling under an uncertain task arrival distribution and unknown reward and cost.

13: Numerical results on a synthetic experiment and a real-world application show that DOL-RM achieves the best cumulative reward-to-cost ratio among the state-of-the-art baselines compared.

14: \end{abstract}

15: