1: \begin{abstract}
2: Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing.
3: Optimal scheduling can enhance system performance, typically measured by the reward-to-cost ratio, under some task arrival distribution.
4: On one hand, both reward and cost are dependent on task context (e.g., evaluation metric) and remain black-box in practice.
5: This makes reward and cost hard to model and thus unknown before decisions are made.
6: On the other hand, task arrival behavior is sensitive to factors such as unpredictable system fluctuations, so a prior estimate or a conventional assumption on the arrival distribution (e.g., Poisson) may fail.
7: This implies another practical yet often neglected challenge, i.e., uncertain task arrival distribution.
8: Towards effective scheduling in a stationary environment with these uncertainties, we propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.
9: Specifically, DOL-RM integrates a learning module that maintains optimistic estimates of the reward-to-cost ratio and a decision module that uses the Robbins-Monro method to implicitly learn the task arrival distribution while making scheduling decisions.
10: Theoretically, DOL-RM achieves
11: % fast learning with a $O(T^{-1/4})$ convergence gap and no regret learning with
12: a sub-linear regret of $O(T^{3/4})$, which is the first result for online task scheduling under uncertain task arrival distribution and unknown reward and cost.
13: Our numerical results on a synthetic experiment and a real-world application demonstrate the effectiveness of DOL-RM, which achieves the best cumulative reward-to-cost ratio among the state-of-the-art baselines compared.
14: \end{abstract}
15: