\begin{abstract}
Consider a predictor, a learner, whose input is a stream of discrete
items.  The predictor's task, at every time point, is {\em
  probabilistic multiclass prediction}, \ie to predict
% what item may
which item may
% come
occur next by outputting zero or more candidate items, each with a
probability, after which the actual item is revealed and the predictor
learns from this observation.  To output probabilities, the
predictor keeps track of the proportions of the items it has seen.
The predictor has constant (limited) space, and we seek efficient
prediction and update techniques: the stream is unbounded, the set of
items is unknown to the predictor, and the number of distinct items can
also grow without bound.  Moreover, there is {\em non-stationarity}: the
underlying frequencies of items may change substantially from time to
time. For instance, new items may start appearing and a few currently
frequent items may cease to occur. The predictor, being space-bounded,
need only provide probabilities for those items with (currently) {\em
  sufficiently high} frequency, \ie the {\em salient} items. This
problem is motivated by the setting of {\em prediction games}, a
self-supervised learning regime where concepts serve as {\em both the
  predictors and the predictands}, and the set of concepts grows over
time, resulting in non-stationarities as new concepts are generated
and used.
%
We develop moving average techniques
% , based on moving averages,
% We develop two techniques, based on moving averages,
designed to respond to such non-stationarities in a timely manner, and
explore their properties. One is a simple technique based on queuing
count snapshots, and
%count-based estimators, and
the other
% that
combines queuing with an extended version of
sparse exponential moving average (EMA). This combination supports
{\em predictand-specific
  dynamic learning rates}. We find that this flexibility allows for
more accurate and timely convergence.
\end{abstract}