abstract:5a5364bc81f5317c.tex

1: \begin{abstract}

2: Online learning is a powerful tool for analyzing iterative algorithms. However, the classic adversarial setup can fail to capture the regularity that online problems exhibit in practice.

3: %

4: Motivated by this, we establish a new setup, called Continuous Online Learning (COL), where the gradient of the online loss function changes continuously across rounds with respect to the learner's decisions.

5: %

6: We show that COL covers and more appropriately describes many interesting applications, from general equilibrium problems (EPs) to optimization in episodic MDPs.

7: %

8: %

9: Using this new setup, we revisit the difficulty of achieving sublinear dynamic regret.

10: %

11: We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain EPs, and we present a reduction from dynamic regret to both static regret and the convergence rate of the associated EP.

12: %

13: Finally, we specialize these new insights to online imitation learning and obtain an improved understanding of its learning stability.

14: %optimization in episodic MDP, \red{including fitter Q-iteration and online imitation learning.}

15: \end{abstract}