abstract:d3f8efcfcfa71049.tex

1: \begin{abstract}

2: Consider a predictor, a learner, whose input is a stream of discrete

3: items.  The predictor's task, at every time point, is {\em

4:   probabilistic multiclass prediction}, \ie to predict

5: % what item may

6: which item may

7: % come

8: occur next by outputting zero or more candidate items, each with a

9: probability, after which the actual item is revealed and the predictor

10: learns from this observation.  To output probabilities, the

11: predictor keeps track of the proportions of the items it has seen.

12: The predictor has constant (limited) space, and we seek efficient

13: prediction and update techniques: the stream is unbounded, the set of

14: items is unknown to the predictor, and the number of distinct items can also grow

15: without bound.  Moreover, there is {\em non-stationarity}: the underlying

16: frequencies of items may change substantially from time to time. For

17: instance, new items may start appearing and a few currently frequent

18: items may cease to occur. The predictor, being space-bounded,

19: need only provide probabilities for those items with (currently) {\em

20:   sufficiently high} frequency, \ie the {\em salient} items. This

21: problem is motivated in the setting of {\em prediction games}, a

22: self-supervised learning regime where concepts serve as {\em both the

23:   predictors and the predictands}, and the set of concepts grows over

24: time, resulting in non-stationarities as new concepts are generated

25: and used.

26: %

27: We develop moving average techniques

28: % , based on moving averages,

29: % We develop two techniques, based on moving averages,

30: designed to respond to such non-stationarities in a timely manner, and

31: explore their properties. One is a simple technique based on queuing of

32: count snapshots, and

33: %count-based estimators, and

34: the other

35: % that

36: combines queuing with an extended version of

37: sparse exponential moving average (EMA). The latter combination supports {\em predictand-specific

38:   dynamic learning rates}. We find that this flexibility allows for

39: more accurate and timely convergence.

40: \end{abstract}

41: