1: \documentclass[epsf]{article}
2: \setlength{\parskip}{5pt}
3: \setlength{\textwidth}{6true in}
4: \setlength{\hoffset}{-1true in}
5: \newcommand{\p}{^\prime}
6: \input epsf
7: \def\iitem#1{\noindent#1\vglue-\baselineskip\vglue-\parskip
8: \hangindent=\parindent\hangafter=1}
9:
10: \pagestyle{plain}
11: \setcounter{page}{1}
12:
13:
14: \begin{document}
15:
16:
17: \title{MOO: A Methodology for Online Optimization through
18: Mining the Offline Optimum}
19:
20: \author{\sl Jason W.H. Lee \qquad Y.C. Tay \qquad Anthony K.H. Tung \\
21: \rm National University of Singapore, \\
22: Kent Ridge 117543, REPUBLIC OF SINGAPORE \\
23: \tt tay@acm.org}
24:
25: \date{Department of Mathematics Research Report No. 743\\
26: (June 98; revised: January 99)}
27: \maketitle
28:
29:
30:
31:
32: \centerline{\sl Abstract}
33: Ports, warehouses and courier services have to decide online how an arriving
34: task is to be served in order that cost is minimized (or profit maximized).
35: These operators have a wealth of historical data on task assignments;
36: can these data be mined for knowledge or rules that can help
37: the decision-making?
38:
39: MOO is a novel application of data mining to online optimization.
40: The idea is to mine (logged) expert decisions or the offline optimum
41: for rules that can be used for online decisions.
42: It requires little knowledge about the task distribution and cost structure,
43: and is applicable to a wide range of problems.
44:
45: This paper presents a feasibility study of the methodology
46: for the well-known $k$-server problem.
47: Experiments with synthetic data show that optimization can be recast as
48: classification of the optimum decisions;
49: the resulting heuristic can achieve the optimum for strong request patterns,
50: consistently outperforms other heuristics for weak patterns,
51: and is robust despite changes in cost model.
52:
53:
54:
55: \section{Introduction}
56:
57: In online optimization, a stream of tasks arrives at a system for service.
58: Each task must be served --- before the next arrival ---
59: at a cost that depends on the system's state,
60: which may be changed by the task.
61: The objective is to minimize the cost of servicing the entire task stream.
62:
63: The introduction of competitive analysis [ST, KMRS]
64: inspired a large body of work on online optimization in the last ten years
65: [BoE].
66: This form of analysis uses a {\it competitive ratio}
67: to compare the online heuristic's cost to the offline optimum
68: (obtained with the task stream known in advance).
69: In other words, the objective of the online decision algorithm
70: is to match the offline optimum, and this often means imitating the latter.
71:
72: This objective is the basis of our proposal on a new methodology
73: for online optimization.
Suppose there are patterns in the task arrivals
(i.e. task generation is constrained by a distribution).
These patterns and the cost structure in turn combine to induce
77: patterns in the offline optimum solution,
78: and the online decision algorithm can exploit these patterns to get close
79: to the optimum. Hence, the idea is:
80:
81: \setlength{\parindent}{35pt}
82:
83: \iitem{{\bf Step 1}}
84: Take a task stream (the {\it training stream})
85: that was previously generated by the distribution.
86:
87: \iitem{{\bf Step 2}}
88: Obtain the offline optimum solution
89: (i.e. the sequence of decisions for servicing the tasks).
90:
91: \iitem{{\bf Step 3}}
92: Transform the optimum solution into a database of records.
93:
94: \iitem{{\bf Step 4}}
95: Apply data mining to this database to extract patterns.
96:
97: \iitem{{\bf Step 5}}
98: Use the patterns to formulate online decision rules for servicing a task
99: stream (the {\it test stream}) generated by the same distribution.
100:
101:
102: \setlength{\parindent}{25pt}
103:
104: We call this methodology for online optimization {\it MOO},
105: whose essential feature is mining the offline optimum (Step 4).
106: This feature distinguishes MOO from the vast literature in machine learning
107: and database mining;
108: it is also different from applying algorithms for online learning
109: to online optimization [BB],
110: from using data collected online to make decisions [KMMO, FM], and
111: from mining database access histories for buffer management [FLTT].
112: MOO's strengths are:
113: (1)
114: It is a methodology that is applicable to a wide range of problems in
115: online optimization (e.g. taxi assignment [FRR],
116: packet routing [AAFPW], web caching [Y]).
117: (2)
118: It requires minimal knowledge about the task distribution and cost structure
119: (and the mining in Step 4 makes no effort to discover them).
120: (3)
121: The sort of information to be mined
122: (classification, association, clustering, etc.)
123: may vary to suit the context.
124: (4)
125: The technique for mining (item-set sampling, neural networks, etc.)
126: can be appropriately chosen.
127:
128:
129: On the other hand, MOO's weaknesses are:
130: (1)
131: An optimum solution for the training stream must be available.
132: This is an issue if no tractable algorithm is known for generating
133: the optimum.
134: MOO, however, only requires the availability of the optimum and does not
135: assume its tractability; it thus treats the optimum solution like an oracle.
136: This oracle may, in fact, be human,
137: in which case the methodology's objective is to approximate the expert's
138: performance (for this, MOO is milking the oracle offline).
139: Incidentally, the oracle may yield the optimum solution without providing
140: information about the costs.
141: (2)
142: The task distribution must be stationary [KMMO],
143: so that the information mined with the training stream remains relevant
144: for the test stream.
145: (3)
146: MOO may need a significant amount of memory to
147: store the rules for making online decisions.
148:
149:
150: To demonstrate MOO, we apply it to the {\it $k$-server problem}.
151: We chose this problem because
152: it is the prototypical and most intensively studied online problem [BoE].
153: It is also close to a container yard management problem
154: that the Port of Singapore Authority is interested in.
155:
156:
157: The decision is cast as a classification problem,
158: and we use Quinlan's C4.5 to mine the optimum,
159: as well as for online classification.
160: This software [Q] was written for machine learning,
161: but suffices for our purpose since the data set is not large
162: and both the offline mining and online classification are fast.
163: However, we envisage that other applications of MOO
164: (e.g. using techniques other than classification,
165: or approximating an expert through mining historical data)
may require software that is specifically equipped
167: with data mining technology [A+, H+].
168:
169: We present here an experimental study of how classification
170: can be used for the $k$-server problem.
171: The objectives are:
172: to establish the viability of the methodology;
173: to explore how MOO's effectiveness is influenced by the strength of patterns,
174: the cost structure, the stream lengths, etc.;
175: and
176: to prepare a case for access to commercial data.
177:
178:
179: As is implicit in that third objective,
180: our experiments use synthetic data;
181: this is because
182: a systematic exploration of MOO's effectiveness requires controlled
183: experiments in which various factors can be tuned individually;
184: whereas
185: real data are affected by constraints and noise (that affect optimality),
186: and these get in the way of a feasibility study
187: that tries to build up an understanding of the methodology.
188: Moreover,
189: gaining access to commercial data is difficult without first making a case
190: with synthetic data.
(As far as we know, no real data for the $k$-server problem are available
in the research community.)
193:
194:
195: The work reported here is significant in the following ways:
196: (1)
197: The experiments on synthetic data show that the methodology is feasible
198: --- MOO fits into the gap between the offline optimum and other online
199: heuristics, can come close to the optimum for strong patterns,
200: does well for weak patterns, and is robust with respect to the cost structure.
201: (2)
202: It shows that optimization can be recast as classification.
203: (3)
204: MOO is a novel application of a concept in data engineering to a problem
205: in algorithm theory,
206: thus serving as a bridge between the two:
207: This application poses challenging new problems in the analysis
208: of online optimization (see Section 5.2);
209: conversely, data mining (being an art --- consider Steps 3 to 5)
210: will benefit from the algorithm community's insight
211: into what information to look for and how to do the mining.
212: (For example, the optimum solution for buffer replacement [MS]
213: suggests that
214: association rules $S\rightarrow P$ between a set of pages $S$ and a page
215: reference $P$ should be annotated by a ``distance'' $d$ between $S$ and $P$
216: mined from the reference stream, and $d$ used for buffer management [TTL].)
217: By offering a database perspective on online optimization,
218: MOO has the potential of facilitating a mutually enriching interaction
219: among database management, machine learning and algorithm analysis.
220:
221:
222: We first describe the $k$-server problem in Section 2.
223: The experimental setup is presented in Section 3
224: and the results examined in Section 4.
225: Section 5 then concludes with a summary of our observations
226: and poses some interesting and hard problems for this new application of
227: data mining.
228:
229:
230: \section{The $k$-server problem}
231:
232: The $k$-server problem is defined on a set of points with a distance
233: function $d$.
234: Conceptually, the set may be infinite but, for our experiments,
235: it consists of $n$ {\it nodes}.
236: Unlike most papers on $k$-servers,
237: we do not require that $d$ satisfy the triangular inequality,
238: nor that it be symmetric.
239: We also do not assume that $d$ is known to the online decision algorithm.
240:
241: There are $k$ {\it servers} who are positioned at different nodes.
242: (Some authors allow multiple servers at one node [KP].)
243: A {\it task} is a request that specifies a node $i$,
244: and is served at 0 cost if there is already a server at $i$,
245: or by moving a server from some node $j$ to $i$, at cost $d(j,i)$.
246: (Some authors allow multiple server movements per task [CL].)
247:
248: A task {\it stream} is a sequence of arriving requests $T_1,\ldots,T_s$;
249: an {\it online solution} uses only $T_1,\ldots, T_{m-1}$ to determine
250: how $T_m$ is served,
251: while an {\it offline solution} uses $T_1,\ldots,T_s$ to determine how
252: each request is served.
253: A {\it configuration} is a set of $k$ nodes that specifies the location
254: of the servers before the arrival of a request.
255:
256: Most algorithms in the literature for the $k$-server problem are for
257: special cases.
For example, Fiat et al.'s marking algorithm is for paging,
and Coppersmith et al.'s RWALK is for resistive metric spaces [FKLMSY, CDRS].
260: The work function algorithm [KP] is, in theory, applicable to any $k$-server
261: problem, but it is computationally intensive and (as far as we know)
262: implemented only for special cases.
263: In our experiments, we compare MOO to three algorithms.
264: If an arriving request is for node $i$ and there is no server at $i$,
265: these algorithms respond as follows:
266:
267: \setlength{\parindent}{50pt}
268:
269: \iitem{Greedy:}
270: Choose a server at node $j$ for which $d(j,i)$ is minimum.
271:
272: \iitem{Balance:}
273: Let $b_j=c_j+d(j,i)$ where $c_j$ is the cost incurred so far by the server
274: at node $j$; choose a server with minimum $b_j$ [MMS].
275:
276: \iitem{Harmonic:}
277: Let $h_j=1/d(j,i)$ for each node $j$ with a server;
278: choose the server at $j$ with probability $h_j/\sum_r h_r$ [RS].
279:
280:
281: \noindent
282: Note that, unlike MOO, these three heuristics require knowledge of $d$.
283:
284: \setlength{\parindent}{25pt}
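The three responses above can be sketched in Python as follows.
This is only an illustrative sketch and not necessarily the code used in the experiments:
the function names, the shared argument list {\tt (servers, i, d, cost)},
the dictionary {\tt cost} that tracks $c_j$ by node,
and the lowest-node tie-breaking are all assumptions made for the example.

\begin{verbatim}
```python
import random

def greedy(servers, i, d, cost=None):
    """Greedy: the occupied node j with minimum d(j, i)."""
    return min(sorted(servers), key=lambda j: d(j, i))

def balance(servers, i, d, cost):
    """Balance: the occupied node j with minimum b_j = c_j + d(j, i)."""
    return min(sorted(servers), key=lambda j: cost[j] + d(j, i))

def harmonic(servers, i, d, cost=None, rng=random):
    """Harmonic: node j with probability h_j / sum_r h_r, h_j = 1/d(j, i)."""
    nodes = sorted(servers)
    weights = [1.0 / d(j, i) for j in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

def run(stream, start, d, choose):
    """Serve a stream online and return the total movement cost."""
    servers = set(start)
    cost = {j: 0 for j in servers}    # c_j, indexed by the server's node
    total = 0
    for i in stream:
        if i in servers:              # a server is already at i: zero cost
            continue
        j = choose(servers, i, d, cost)
        step = d(j, i)
        servers.remove(j)
        servers.add(i)
        cost[i] = cost.pop(j) + step  # the server carries its cost to node i
        total += step
    return total
```
\end{verbatim}

For instance, {\tt run(stream, start, d, greedy)} gives Greedy's total cost for a stream,
which can then be divided by the optimum cost to obtain a competitive ratio.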
285:
286:
287:
288: \section{Experimental setup}
289:
290: \subsection{Classification}
291:
292: In {\it classification}, a decision tree is built from a set of {\it cases},
293: where each case is a tuple of {\it attribute} values.
294: Each attribute may be {\it discrete} (i.e. its values come from a finite set)
295: or {\it continuous} (i.e. the possible values form the real line).
296: Each case can be assigned a {\it class},
297: which may also be discrete (e.g. good, bad) or continuous (e.g. temperature).
298:
299: Each leaf in the decision tree is a class,
300: and each internal node branches out based on the outcome of a test on an
301: attribute's value.
302: The tree is built from cases with known classification,
303: and a test case can then be classified by traversing the tree from root to
304: leaf, along a path determined by the test outcomes.
305:
306: For the $k$-server problem, the request distribution and distance function
307: induce patterns in the optimum decisions,
308: and MOO tries to extract these patterns for use in online assignment.
309: Specifically, we look for patterns that relate an assignment to the arriving
310: request and the configuration it sees.
311: Hence, the class specifies which node to move the server from,
312: and the classification is based on $n+1$ attributes in a case,
313: where one attribute specifies the arriving request and the other $n$ attributes
314: specify whether a node has a server;
315: the class and attributes are considered discrete.
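
As an illustration of this encoding, the following Python sketch builds one case
from a request and a configuration.
The function name {\tt make\_case}, the {\tt yes}/{\tt no} attribute values
and the class label {\tt none} for a request that needs no server movement
are assumptions made for the example; the text above does not prescribe them.

\begin{verbatim}
```python
def make_case(request, config, n, moved_from=None):
    """Return the n+1 attribute values followed by the class label."""
    # Attribute 0 is the requested node; attributes 1..n say whether
    # node r currently holds a server.
    occupancy = ["yes" if r in config else "no" for r in range(1, n + 1)]
    # The class is the node a server is moved from ("none" is an assumed label).
    label = "none" if moved_from is None else str(moved_from)
    return [str(request)] + occupancy + [label]
```
\end{verbatim}

With $n=9$ and servers at nodes 1, 5 and 9, a request for node 4 that the optimum
serves from node 5 becomes a case with 10 attribute values and class 5.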
316:
317: (A possible alternative is to name the $k$ servers,
318: have the class specify the server, and
319: use $k$ attributes to specify the location of the servers.
320: With this $(k+1)$-tuple formulation of a case, however,
321: the classifier considers
322: ``server $A$ at node 1 and server $B$ at node 2''
323: to be different from
324: ``server $A$ at node 2 and server $B$ at node 1''.
325: This differentiation of servers is not appropriate for the $k$-server problem,
326: unless the cost model is changed to, say, let servers charge different costs
327: for movement.
328: It is also not appropriate to declare the class and attributes as continuous,
329: unless we are considering nodes on a line with a linear distance function.)
330:
331: In our application of MOO, Step 2 uses network flow to solve for the
332: offline optimum [CKPV];
333: in Step 3, this optimum is scanned to produce a file of cases,
334: one for each request;
335: Step 4 then uses C4.5 to build a decision tree with these training cases.
336: For a test stream, this tree is used to classify each arriving request.
337: This classification may be invalid, in that the tree may decide
338: to move a server from a node that has no server;
in this case, MOO falls back on a greedy strategy
and chooses the server at $j$ with minimum $d(j,i)$.
341: (If $d$ is unknown, MOO can choose a random server, say.)
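
The online use of the tree, including the fallback for an invalid classification,
might look as follows.
Here {\tt classify} is a hypothetical callable standing in for the decision tree
(it takes the $n+1$ attribute values and returns a node);
it and the other names are assumptions for this sketch, not part of C4.5's interface.

\begin{verbatim}
```python
import random

def moo_choose(classify, request, config, n, d=None, rng=random):
    """Pick the node to move a server from when `request` has no server."""
    attributes = [request] + [r in config for r in range(1, n + 1)]
    j = classify(attributes)
    if j in config:
        return j                      # valid assignment from the tree
    if d is not None:                 # invalid: resolve greedily if d is known
        return min(sorted(config), key=lambda r: d(r, request))
    return rng.choice(sorted(config))  # d unknown: pick a random server
```
\end{verbatim}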
342:
343:
344: \subsection{Distance function}
345:
346: We choose the distance functions to test MOO's applicability for different
347: neighborhood structures and distance properties.
348: We start with $1,2,\ldots,n$ as nodes
349: and $d(x,x^\prime)$ given by $|x-x^\prime|$, $(x-x^\prime)^2$
350: and $|x-x^\prime|x^\prime$ ---
351: only $|x-x\p|$ satisfies the triangular inequality,
352: and $|x-x\p|x\p$ is not symmetric.
353: We also consider $n$ nodes on a square grid with integer coordinates,
354: with $d((x,y),(x^\prime,y^\prime))$ given by
355: $|x-x^\prime|+|y-y^\prime|$ and
356: $|x-x^\prime|x^\prime+|y-y^\prime|y^\prime$.
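
These properties can be checked by brute force over the nodes $1,\ldots,n$.
The sketch below does so for the three distance functions on a line;
the function names are chosen only for illustration.

\begin{verbatim}
```python
from itertools import product

def d_abs(x, y):
    """|x - x'|"""
    return abs(x - y)

def d_sq(x, y):
    """(x - x')^2"""
    return (x - y) ** 2

def d_asym(x, y):
    """|x - x'| x', which depends on the destination x'."""
    return abs(x - y) * y

def is_symmetric(d, nodes):
    """True if d(a, b) == d(b, a) for every pair of nodes."""
    return all(d(a, b) == d(b, a) for a, b in product(nodes, repeat=2))

def satisfies_triangle(d, nodes):
    """True if d(a, c) <= d(a, b) + d(b, c) for every triple of nodes."""
    return all(d(a, c) <= d(a, b) + d(b, c)
               for a, b, c in product(nodes, repeat=3))
```
\end{verbatim}

For example, $(x-x\p)^2$ fails the test because moving from node 1 to node 3 costs 4,
whereas moving via node 2 costs $1+1=2$.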
357:
358:
359: \subsection{Request generation}
360:
361: The training and test streams are generated with transition matrices
362: in which an entry $p_{ij}$ is the probability that a request is for node $j$
363: given that the previous request was for node $i$.
364: The fraction of nonzero entries is 10--20\% for a {\it sparse} matrix and
365: 80--90\% for a {\it dense} matrix.
366: We use these matrices to generate a stream in two ways:
367:
368: \setlength{\parindent}{10pt}
369:
370: \iitem{$\bullet$}
371: A {\it 1-matrix} stream is generated with a single matrix.
This is similar to Karlin et al.'s Markov paging,
or a random walk on Borodin et al.'s access graph [KPR, BIRS].
374:
375: \iitem{$\bullet$}
376: A {\it 2-matrix} stream is generated alternately with two matrices:
377: $L$ requests are generated with one matrix,
378: followed by $L$ requests from the other matrix;
379: at the switchover, if the last request from one matrix is $i$,
380: then $p_{ij}$ from the other matrix is used to generate the next request.
This gives a nonhomogeneous Markov chain that is a random walk on two graphs,
in contrast to the simultaneous walks used by Fiat et al. [FK, FM].
383: In this paper, we arbitrarily fix $L$ to be 10.
384: The purpose of using a 2-matrix stream is to see how MOO reacts to a mixture
385: of request patterns.
386:
387: \setlength{\parindent}{25pt}
388:
389: \noindent
An example of a matrix, together with a stream that it generates,
is given in the Appendix.
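
The two kinds of streams can be sketched in Python as follows.
The text does not say how the nonzero entries are placed, how their values are drawn,
or how the first request is chosen, so the choices below
(a fixed number of nonzero entries per row, random weights, a caller-supplied first request)
are assumptions, as are all the function names.

\begin{verbatim}
```python
import random

def random_matrix(n, density, rng):
    """Row-stochastic n x n matrix with about density*n nonzeros per row."""
    matrix = []
    for _ in range(n):
        support = rng.sample(range(n), max(1, round(density * n)))
        weights = {j: rng.random() + 0.01 for j in support}
        total = sum(weights.values())
        matrix.append([weights.get(j, 0.0) / total for j in range(n)])
    return matrix

def next_request(matrix, i, rng):
    """Sample node j (numbered from 1) with probability p_ij."""
    nodes = range(1, len(matrix) + 1)
    return rng.choices(nodes, weights=matrix[i - 1], k=1)[0]

def one_matrix_stream(matrix, length, first, rng):
    """1-matrix stream: every transition uses the same matrix."""
    stream = [first]
    while len(stream) < length:
        stream.append(next_request(matrix, stream[-1], rng))
    return stream

def two_matrix_stream(m1, m2, length, first, rng, L=10):
    """2-matrix stream: blocks of L requests alternate between m1 and m2."""
    stream = [first]
    while len(stream) < length:
        matrix = (m1, m2)[(len(stream) // L) % 2]
        # At a switchover the previous request i comes from the old matrix,
        # but row i of the new matrix generates the next request.
        stream.append(next_request(matrix, stream[-1], rng))
    return stream
```
\end{verbatim}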
392:
393: \begin{figure}[tbp]
394:
395: \noindent
396: $k=5$ servers, $n=9$ nodes on a line,
397: distance function $(x-x^\prime)^2$ \hfill\break
398: 1-matrix (sparse) stream, training length 2000, test length 2000
399:
400: \vbox{
401: \def\tablerule{\noalign{\hrule}}
402: %\hrule
403: \halign
404: {&\vrule#& \strut\ #
405: &\vrule#& \ #\
406: &\vrule#& \ #\
407: &\vrule#& \ #\
408: &\vrule#& \ #\
409: &\vrule#& \ #\
410: &\vrule#& \ #\
411: &\vrule#\cr\tablerule
412: & && optimum && \multispan7 $\underline{\rm \phantom{XXXXXXXXXXXX}
413: competitive\ ratio \phantom{XXXXXXXXXXX}}$ && invalid &\cr
414: %& && && \multispan7\leaders\hrule\hfil && &\cr
415: & && cost && MOO && Greedy && Balance && Harmonic && assignment &\cr\tablerule
416: & $S_1$ && 402/408/\underbar{381} && 1.00/1.01/\underbar{1.00}
417: && 1.36/1.75/\underbar{2.13} && 2.29/2.31/\underbar{2.30}
418: && 5.58/5.22/\underbar{4.93} && 0/0/\underbar{0} &\cr\tablerule
419: & $S_2$ && 90/113/104 && 1.09/3.04/1.30 && 4.93/4.76/1.62
420: && 3.00/1.98/1.92 && 5.21/5.83/5.40 && 13/101/2 &\cr
421: }
422: \hrule}
423: \noindent
424: $S_1$ and $S_2$ are different matrices.
425: A triple $x/y/z$ for row $S_i$ gives the results from three task streams
426: generated with $S_i$.
427: For MOO, the first stream is used as the training stream,
428: and all three are used as test streams;
429: $x$ is the result for the training stream used as test stream
430: (this is why we have the same length for training and test streams).\hfill\break
431: The underlined numbers are results for one run (i.e. one task stream) of $S_1$.
The competitive ratio is the cost incurred by an algorithm for a run divided
by the optimum cost for that run.
434: The last column reports the number of times the MOO classifier makes an
435: invalid server assignment.
436:
437: \vglue 5pt
438: \centerline{{\bf Table 1}\quad
439: For strong patterns, MOO can be close to the optimum.}
440: \vglue 5pt
441: \hrule
442: \vglue -5 pt
443: \end{figure}
444:
445:
446: \section{Experimental results}
447:
448: There are several variables in our experimental setup:
449: $k$, $n$, line/grid, distance, sparse/dense, pattern mixture,
450: starting configuration and stream length.
451: The stream length $s$ is the most crucial because the
452: offline optimum has complexity $O(ks^2)$ ---
453: on a 167MHz UltraSPARC, it can take 7 minutes for $s=2000$
454: and 1 hour for $s=2500$.
455: The time complexity is compounded by the large memory required
456: to store the network for finding the optimum
457: --- we have only one machine with sufficient main memory.
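
For small instances, the offline optimum can also be found without network flow,
by dynamic programming over configurations.
The sketch below is such a brute-force substitute, not the network flow method of [CKPV]:
it keeps the cheapest way to reach each configuration,
so its work per request grows with ${n\choose k}$,
and it assumes (as in Section 2) at most one server movement per request.
The name {\tt offline\_optimum} and its return format are assumptions.

\begin{verbatim}
```python
def offline_optimum(stream, start, d):
    """Return (cost, moves) of a cheapest way to serve the whole stream.

    moves[t] is the node a server is moved from for request t,
    or None if a server is already at the requested node.
    """
    best = {frozenset(start): (0, [])}   # configuration -> (cost, moves)
    for i in stream:
        nxt = {}
        for config, (cost, moves) in best.items():
            if i in config:
                options = [(config, cost, None)]
            else:
                options = [((config - {j}) | {i}, cost + d(j, i), j)
                           for j in sorted(config)]
            for new_config, new_cost, j in options:
                if new_config not in nxt or new_cost < nxt[new_config][0]:
                    nxt[new_config] = (new_cost, moves + [j])
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])
```
\end{verbatim}

The returned sequence of moves is what Step 3 would scan to produce one case per request.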
458:
459: If we choose $s$ large enough for the optimum and heuristics
460: to all reach steady state, the time commitment would be overwhelming.
461: Instead, in most cases, we set $s$ just large enough
462: that conclusions can already be drawn,
463: despite significant statistical variations for any particular solution.
464: (This is similar to analysis of variance in statistics,
465: where one can separate the means of two variables if the variation of each
466: is ``smaller'' than the separation.)
467:
468: With the bottleneck of one workstation generating the results,
469: we have chosen a small number of experiments that cut through the myriad
470: possible combinations of variables.
471: We concede that the data may be insufficient to support some of our conclusions,
472: so these should be regarded as tentative insight
473: rather than authoritative conclusions.
474:
475:
476: \subsection{Nodes on a line}
477:
478: Table 1 presents an experiment with a strong pattern in the stream of requests
479: coming to 5 servers for 9 nodes on a line,
480: with a $d$ that violates triangular inequality.
481: After 2000 requests, the fluctuations are small enough for us to draw
482: some conclusions.
483:
484: First, the average optimum cost per request is less than 1,
485: and this is because most requests are for a node that already has a server.
486: Second, the competitive ratios for a fixed request distribution can be
487: significantly smaller than the $k$-server bound [MMS];
488: this is similar to previous observations [BaE, FR].
489: Third, MOO can achieve the optimum ---
490: the sparse matrix induces a strong pattern in the offline optimum solution,
491: and this pattern is captured in the decision tree used by MOO.
492:
The starting configurations used in the three runs are the same for $S_1$,
494: but different for $S_2$.
495: The results for $S_2$ show that the configuration can have a strong effect
496: --- the heuristics' performance ordering and competitive ratios
497: both become erratic.
In contrast, the ordering for the three runs of $S_1$ is the same,
499: and the ratios are reasonably stable except for Greedy,
500: which is sensitive to the stream instance.
501: To factor in the effect of the starting configuration,
502: this configuration is henceforth changed from run to run,
503: unless otherwise stated.
504:
505: Despite the erratic results for $S_2$ and the fact that MOO uses a greedy
506: strategy whenever the classifier makes an invalid assignment,
MOO has a significantly smaller ratio than Greedy,
508: thus showing the contribution from data mining.
509: A check shows that the trees are small but unintuitive
510: --- an example is given in the Appendix ---
511: since they imitate the offline optimum (which ``sees'' future requests).
512:
513:
514: \begin{figure}[tbp]
515: \noindent
516: $k=5$ servers, $n=9$ nodes on a line,
517: distance function $(x-x^\prime)^2$ \hfill\break
518: 1-matrix (dense) stream, training length 2000, test length 2000
519:
520: \vbox{
521: \def\tablerule{\noalign{\hrule}}
522: %\hrule
523: \halign
524: {&\vrule#& \strut\ #
525: &\vrule#& \ #\
526: &\vrule#& \ #\
527: &\vrule#& \ #\
528: &\vrule#& \ #\
529: &\vrule#& \ #\
530: &\vrule#& \ #\
531: &\vrule#\cr\tablerule
532: & && optimum && \multispan7 $\underline{\rm \phantom{XXXXXXXXXXXX}
533: competitive\ ratio \phantom{XXXXXXXXXXX}}$ && invalid &\cr
534: %& && && \multispan7\leaders\hrule\hfil && &\cr
535: & && cost && MOO && Greedy && Balance && Harmonic && assignment &\cr\tablerule
536: & $D_1$ && 715/687/728 && 1.16/1.21/1.20 && 1.28/1.27/2.09
537: && 1.72/1.85/1.66 && 4.24/4.40/4.26 && 1/1/0 &\cr\tablerule
538: & $D_2$ && 684/692/732 && 1.19/1.22/1.18 && 1.94/1.44/1.29
539: && 1.72/1.88/1.87 && 3.71/4.70/4.37 && 1/10/0 &\cr
540: }
541: \hrule}
542:
543: \vglue 5pt
544: \centerline{{\bf Table 2}\quad For weak patterns, MOO is best.}
545: \vglue 5pt
546: \hrule
547: \vglue -5pt
548: \end{figure}
549:
550: In Table 1, MOO can get close to the optimum because the patterns are strong.
551: For a dense matrix, the pattern is much weaker.
552: Nonetheless, Table 2 shows that MOO has the smallest ratio,
553: and the invalid assignments are surprisingly few.
554: Further,
555: the difference in starting configurations between the training and test
556: streams does not have a big effect on MOO's results,
557: in contrast to the results for a strong pattern
558: (recall: the starting configurations in Table 1
559: are the same for 1.00/1.01/1.00 and different for 1.09/3.04/1.30).
560:
561: The number of potential cases for the classifier is $n {n\choose k}$,
562: which is 1134 and comparable to the training length (2000) for Table 2.
563: Even so, the performance ordering and ratios are reasonably stable,
564: except for Greedy;
565: when we tested the heuristics again with the runs using the same
566: starting configuration,
567: fluctuation in Greedy's ratios narrowed down considerably,
568: thus indicating that Greedy remains sensitive to the starting configuration
569: for weak patterns.
570: The decision trees, though bigger than the two for Table 1, remain small:
571: the tree for $D_1$ is 3Kbytes
572: and has only 27 decision nodes.
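
As a check on the count of potential cases quoted above for Table 2,
with $n=9$ and $k=5$:
$$ n{n\choose k} \;=\; 9\cdot\frac{9!}{5!\,4!} \;=\; 9\cdot 126 \;=\; 1134 . $$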
573:
574:
575:
576: \begin{figure}[tbp]
577: \noindent
578: \vbox{\tabskip=1em
579: \halign{& #\hfil\cr
580: \epsfbox{figure1.eps} & \epsfbox{figure2.eps} \cr
581: \qquad $n=9$ nodes, distance $|x-x^\prime|$ &
582: \qquad $k=5$ servers, distance $|x-x^\prime|x^\prime$ \cr
583: \qquad 2-matrix (sparse-dense) stream &
584: \qquad 2-matrix (dense-dense) stream \cr
585: \qquad stream length 2000 &
586: \qquad stream length varies with $n$ \cr
587: \qquad H is for Harmonic, B for Balance, &
588: \qquad at $n=6$, H is 9.6 and G is 10.6\cr
589: \qquad G for Greedy, M for MOO &
590: \qquad \hglue 1 true cm \cr
591: {\bf Figure 1}\quad MOO fits into the gap &
592: {\bf Figure 2}\quad MOO stays close to optimum \cr
593: \hglue 1.8 true cm between Greedy and optimum. &
594: \hglue 1.9 true cm for all $n$. \cr
595: }}
596: \vskip 5pt
597: \hrule
598: \vskip -5pt
599: \end{figure}
600:
601:
602: All heuristics are trivially optimum if $k=1$,
603: but the gap between existing heuristics and the optimum should open up
604: as $k$ increases;
605: to prove its worth, MOO must fit into this gap.
606:
607: In Figure 1 (and the following graphs),
608: each data point is the average of 6 runs.
609: It shows that, for a 2-matrix stream and distance $|x-x^\prime|$,
610: the gap between Greedy and optimum opens up at $k=5$ for $n=9$,
611: and MOO does fit into the gap.
612: At $k=5$ for $|x-x^\prime|$, the difference between MOO and Greedy is
613: negligible (if we consider the average ratio over 6 runs;
614: Greedy's ratio is smaller in some runs and MOO's smaller in others).
615: In contrast, Tables 1 and 2 show that MOO's ratios are noticeably smaller
616: than Greedy's at $k=5$ for $(x-x^\prime)^2$,
617: which penalizes large movements.
618: The gaps among the heuristics open further at $k=5$ and $n=9$ for $|x-x\p|x\p$
619: in Figure 2.
620:
621: The alternation between strong and weak patterns does not affect
622: MOO's ability to outperform the other heuristics in Figure 1,
623: and Figure 2 shows this remains so for alternating between two weak patterns.
624: In fact, unlike Harmonic and Balance,
625: MOO stays close to the optimum as $n$ scales up,
626: thus demonstrating again its ability to learn from the optimum solution.
627:
628: For an asymmetrical and punitive $|x-x\p|x\p$,
629: the ``right'' server placement is important to being close to optimum
630: for small $n$,
631: so Greedy's simplistic strategy does poorly there.
632: For large $n$, even the optimum has its servers spread out,
633: and the violation of the triangular inequality favors incremental
634: server movements,
635: thus making it possible for Greedy to get close to the optimum.
636:
637:
638: \begin{figure}[tbp]
639: \noindent
640: \vbox{\tabskip=1em
641: \halign{& #\hfil\cr
642: \epsfbox{figure3.eps} & \epsfbox{figure4.eps} \cr
643: \qquad $n=9$, distance $|x-x^\prime|+|y-y\p|$ &
644: \qquad $k=5$, distance $|x-x^\prime|x^\prime+|y-y\p|y\p$ \cr
645: \qquad same stream and starting configuration &
646: \qquad same stream and starting configuration \cr
647: \qquad as Figure 1 &
648: \qquad as Figure 2 \cr
649: {\bf Figure 3}\quad For a grid, &
650: {\bf Figure 4}\quad For a grid, \cr
651: \hglue 1.8 true cm MOO still fits in the gap. &
652: \hglue 1.9 true cm MOO still stays close to optimum. \cr
653: }}
654: \vskip 5pt
655: \hrule
656: \vskip -5pt
657: \end{figure}
658:
659:
660: \subsection{Nodes on a grid}
661:
662: Intuitively, a heuristic should incur lower costs if nodes have more neighbors,
663: but its ratio can increase because
664: the optimum may make better use of the neighbors in reducing its cost.
665:
666: Figure 3 shows the results of repeating the runs for Figure 1
667: --- same starting configurations and request streams ---
668: on a grid (instead of a line).
669: Harmonic does perform better,
670: but the effect on the ratios for Balance and Greedy is mixed.
671: A check (of the detailed data) shows that, contrary to our intuition,
672: their costs are sometimes higher for the grid.
673: It appears that the increase in the number of neighbors
674: also leads Balance and Greedy to make short-sighted moves that
675: raise costs eventually.
676: In any case, MOO remains in the gap between Greedy and optimum
677: when $k$ increases.
678:
679: Similar results hold when $n$ is varied.
680: Comparing Figures 2 and 4,
681: we see that the ratios for a grid are noticeably smaller for Harmonic
682: but larger for Greedy.
683: A check shows that costs are lower (often by an order of magnitude),
684: so all solutions benefit from having more neighbors
685: when $d$ is $|x-x\p|x\p+|y-y\p|y\p$.
686: However, the spreading-out effect that allows Greedy to get close to
687: the optimum in Figure 2 is less for a grid,
688: so Greedy is further from the optimum in Figure 4.
689: Again, we see the gap among the heuristics opening up at $k=5$ and $n=9$
690: when $d$ changes from $|x-x\p|+|y-y\p|$ to $|x-x\p|x\p+|y-y\p|y\p$.
691:
MOO, on the other hand, stays close to optimum, as in Figure 2.
693: The detailed data show that there are at most 2 invalid assignments
694: (that are resolved greedily) at $n=9$ and less than $12\%$ such assignments
695: at $n=25$; hence, MOO relies mostly on the decision tree,
696: which has successfully captured the optimum solution
697: even though the requests are a mixture of two weak patterns.
698:
699:
700:
701: \section{Conclusion}
702:
703: \subsection{Summary}
704:
705: We now summarize our observations:
706:
707: \iitem{$\bullet$}
708: MOO fits into the gap between the offline optimum and other online heuristics
709: (Figures 1--4).
710: For a strong pattern, MOO can be close to optimum,
711: but may lose to other heuristics because of sensitivity to the starting
712: configuration (Table 1).
713: MOO does well even if the requests have a weak pattern (Table 2)
714: or alternate between patterns (Figures 1--4).
715:
716: \iitem{$\bullet$}
717: MOO outperforms the other heuristics even if the distances are asymmetric
718: (Figures 2 and 4) or violate the triangular inequality (Tables 1 and 2).
719: Increasing the number of neighbors can increase costs,
720: but MOO's ratios remain stable (Figures 1 and 3, 2 and 4).
721:
722: \iitem{$\bullet$}
723: MOO stays close to the optimum as $n$ varies (Figures 2 and 4).
724:
725: \iitem{$\bullet$}
The classifier can get an effective decision tree even for
relatively short stream lengths; the trees are small, and the mining (Step 4)
is fast (sub-second).
729:
730:
731:
732:
733:
734: \subsection{Challenging issues}
735:
736: MOO poses some challenging problems for
737: this new application of data mining:
738:
739:
740: \iitem{$\bullet$}
How can the competitive ratios produced with data mining be analyzed?
742:
743: \iitem{$\bullet$}
744: For the $k$-server problem,
745: why does MOO perform well for weak patterns and short training
746: streams?
747: (For the buffer replacement problem, mining can produce good results
748: even if the requests are a mixture of 100 patterns [TTL].)
749:
750: \iitem{$\bullet$}
751: What sort of data mining would be appropriate for
752: web caching, video-on-demand, etc.?
753:
754:
755:
756:
757: \vskip25pt\noindent
758: {\bf Acknowledgment}
759:
760: \noindent
761: Many thanks to C.P. Teo for his help with network flow
762: and Hongjun Lu for his comments.
763:
764:
765: \subsection{References}
766:
767: %\setlength{\parskip}{4pt}
768: %\parskip=4pt
769: %\baselineskip=11pt
770: \setlength{\parindent}{2.0 cm}
771:
772: \iitem{[A+]}
773: R. Agrawal, M. Mehta, J. Shafer, R. Srikant, A. Arning and T. Bollinger,
774: {\it The Quest data mining system},
775: Proc. KDD, Portland, OR (Aug. 1996), 244--249.
776:
777: \iitem{[AAFPW]}
778: J. Aspnes, Y. Azar, A. Fiat, S. Plotkin and O. Waarts,
779: {\it On-line load balancing with applications to machine scheduling and
780: virtual circuit routing},
781: Proc. STOC, San Diego, CA (May 1993), 623--630.
782:
783: \iitem{[BB]}
784: A. Blum and C. Burch,
785: {\it On-line learning and the metrical task system problem},
786: Proc. COLT, Nashville, TN (July 1997), 45--53.
787:
788: \iitem{[BaE]}
789: R. Bachrach and R. El-Yaniv,
790: {\it Online list accessing algorithms and their applications: recent
791: empirical evidence},
Proc. SODA, New Orleans, LA (Jan. 1997), 53--62.
793:
794: \iitem{[BoE]}
795: A. Borodin and R. El-Yaniv,
796: {\sl Online Computation and Competitive Analysis},
797: Cambridge University Press, Cambridge, UK (1998).
798:
799: \iitem{[BIRS]}
800: A. Borodin, S. Irani, P. Raghavan and B. Schieber,
801: {\it Competitive paging with locality of reference},
802: Proc. STOC, New Orleans, LA (May 1991), 249--259.
803:
804: \iitem{[CDRS]}
805: D. Coppersmith, P. Doyle, P. Raghavan and M. Snir,
806: {\it Random walks on weighted graphs and applications to on-line algorithms},
807: J. ACM 40, 3 (July 1993), 421--453.
808:
809: \iitem{[CKPV]}
810: M. Chrobak, H. Karloff, T. Payne and S. Vishwanathan,
811: {\it New results on server problems},
SIAM J. Disc. Math. 4, 2 (May 1991), 172--181.
813:
814: \iitem{[CL]}
815: M. Chrobak and L.L. Larmore,
816: {\it An optimal on-line algorithm for $k$-servers on trees},
SIAM J. Computing 20, 1 (1991), 144--148.
818:
819: \iitem{[FK]}
820: A. Fiat and A.R. Karlin,
821: {\it Randomized and multipointer paging with locality of reference},
822: Proc. STOC, Las Vegas, NV (May 1995), 626--634.
823:
824: \iitem{[FKLMSY]}
A. Fiat, R.M. Karp, M. Luby, L.A. McGeoch, D.D. Sleator and N.E. Young,
{\it Competitive paging algorithms},
J. Algorithms 12, 4 (Dec. 1991), 685--699.
828:
829: \iitem{[FLTT]}
830: L. Feng, H. Lu, Y.C. Tay and K.H. Tung,
831: {\it Buffer management in distributed database systems:
832: A data mining approach},
833: Proc. EDBT, Valencia, Spain (Apr. 1998), 246--260.
834:
835: \iitem{[FM]}
836: A. Fiat and M. Mendel,
837: {\it Truly online paging with locality of reference},
838: Proc. FOCS, Miami Beach, FL (Oct. 1997), 326--335.
839:
840: \iitem{[FR]}
841: A. Fiat and Z. Rosen,
842: {\it Experimental studies of access graph based heuristics:
843: beating the LRU standard?},
844: Proc. SODA, New Orleans, LA (Jan. 1997), 63--72.
845:
846: \iitem{[FRR]}
847: A. Fiat, Y. Rabani and Y. Ravid,
848: {\it Competitive $k$-server algorithms},
849: Proc. FOCS, St. Louis, MO (Oct. 1990), 454--463.
850:
851: \iitem{[H+]}
852: J. Han, Y. Fu, W. Wang, J. Chiang, W. Gong, K. Koperski, D. Li, Y. Lu,
853: A. Rajan, N. Stefanovic, B. Xia and O.R. Zaiane,
854: {\it DBMiner: A system for mining knowledge in large relational databases},
855: Proc. KDD, Portland, OR (Aug. 1996), 250--255.
856:
857: \iitem{[KMMO]}
858: A.R. Karlin, M.S. Manasse, L.A. McGeoch and S. Owicki,
859: {\it Competitive randomized algorithms for non-uniform problems},
860: Proc. SODA, San Francisco, CA (Jan. 1990), 301--309.
861:
862: \iitem{[KMRS]}
863: A.R. Karlin, M.S. Manasse, L. Rudolph and D.D. Sleator,
864: {\it Competitive snoopy caching},
Algorithmica 3, 1 (1988), 79--119.
866:
867: \iitem{[KP]}
868: E. Koutsoupias and C. Papadimitriou,
869: {\it On the $k$-server conjecture},
870: Proc. STOC, Montreal, Canada (May 1994), 507--511.
871:
872: \iitem{[KPR]}
873: A.R. Karlin, S.J. Phillips and P. Raghavan,
874: {\it Markov paging},
875: Proc. FOCS, Pittsburgh, PA (Oct. 1992), 208--217.
876:
877: \iitem{[MMS]}
878: M.S. Manasse, L.A. McGeoch and D.D. Sleator,
879: {\it Competitive algorithms for on-line problems},
880: Proc. STOC, Chicago, IL (May 1988), 322--333.
881:
882: \iitem{[MS]}
883: L.A. McGeoch and D.D. Sleator,
884: {\it A strongly competitive randomized paging algorithm},
Algorithmica 6, 6 (1991), 816--825.
886:
887: \iitem{[Q]}
888: J.R. Quinlan,
889: {\sl C4.5: Programs for Machine Learning},
Morgan Kaufmann, San Mateo, CA (1993).
891:
892: \iitem{[RS]}
893: P. Raghavan and M. Snir,
894: {\it Memory versus randomization in on-line algorithms},
895: Proc. ICALP, Stresa, Italy (July 1989), 687--703.
896:
897: \iitem{[ST]}
898: D.D. Sleator and R.E. Tarjan,
899: {\it Amortized efficiency of list update and paging rules},
C. ACM 28, 2 (Feb. 1985), 202--208.
901:
902: \iitem{[T]}
903: K.H. Tung,
904: {\it Parking in a Marina},
905: Honors Year Project Report, DISCS, National University of Singapore (1997).
906:
907: \iitem{[TTL]}
908: K.H. Tung, Y.C. Tay and H. Lu,
909: {\it BROOM: Buffer replacement using online optimization by mining},
910: Proc. CIKM, Bethesda, MD (Nov. 1998), 185--192.
911:
912: \iitem{[Y]}
913: N. Young,
914: {\it On-line file caching},
915: Proc. SODA, San Francisco, CA (Jan. 1998), 82--86.
916:
917: \newpage
918: \section{Appendix}
919:
920: $$\bordermatrix{
921: & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \cr
922: 0 & 0.00 & 0.45 & 0.00 & 0.55 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr
923: 1 & 0.00 & 0.00 & 0.00 & 0.58 & 0.00 & 0.00 & 0.00 & 0.00 & 0.42 \cr
924: 2 & 0.31 & 0.69 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr
925: 3 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00 & 0.00 \cr
926: 4 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr
927: 5 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.02 & 0.98 \cr
928: 6 & 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr
929: 7 & 0.00 & 0.00 & 0.35 & 0.00 & 0.62 & 0.03 & 0.00 & 0.00 & 0.00 \cr
930: 8 & 0.00 & 0.47 & 0.00 & 0.00 & 0.00 & 0.00 & 0.53 & 0.00 & 0.00 \cr}$$
931:
932: \centerline{{\bf Figure A.1}\quad Sparse matrix $S_1$ of Table 1.}
933:
934: \vglue 10pt
935:
936: \noindent
937: 1 8 6 0 1 3 5 8 6 0 3 5 8 1 3 5 8 6 0 1 3 5 8 1 3 5 7 2 1 3 5 8 1 3 5 8 6 0 1 8 6 0 1 8 1 3 5 8 1 3 5 8 1
938:
939: \vglue 5pt
940: \centerline{{\bf Figure A.2}\quad $S_1$ generates a strong pattern.}
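
\vglue 5pt\noindent
One way to read Figures A.1 and A.2 together is that row $i$ of $S_1$
gives the distribution of the request that follows a request for node $i$.
This reading is an assumption, but it is consistent with every consecutive
pair of requests in Figure A.2.
Writing $r_t$ for the $t$-th request
(notation introduced only for this illustration), it says
$$\Pr[\,r_{t+1}=j \mid r_t=i\,] = (S_1)_{ij}, \qquad
\sum_{j=0}^{8}(S_1)_{ij}=1 \hbox{ for every row } i.$$
For example, under this reading a request for node 3 is followed by
requests for nodes 5 and then 8 with probability
$$(S_1)_{35}\,(S_1)_{58} = 1.00\times 0.98 = 0.98,$$
which would explain why the run 3 5 8 recurs so often in the sequence.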
941:
942: \vglue 10pt
943:
944: \vbox{\tabskip=1em
945: \halign{& #\hfil\cr
946: Request from = 2: 3 &\cr
947: Request from = 4: 5 &\cr
948: Request from = 7: 8 &\cr
949: Request from = 0: &\cr
950: $|$\quad Node 0 status = 0: 1 &
951: {\tt // this tree has depth 1 only }\cr
952: $|$\quad Node 0 status = 1: 0 &
953: {\tt // weaker patterns induce deeper trees }\cr
954: Request from = 1: &\cr
955: $|$\quad Node 0 status = 0: 1 &\cr
956: $|$\quad Node 0 status = 1: 0 &
957: {\tt // how to read C4.5's decision tree:}\cr
958: Request from = 3: &
959: {\tt // if the request is for node 3 }\cr
960: $|$\quad Node 2 status = 0: 3 &
961: {\tt // then (a) if no server is at 2, then use server at 3}\cr
962: $|$\quad Node 2 status = 1: 2 &
963: {\tt // \qquad \ (b) else move the server from 2 }\cr
964: Request from = 5: &
965: {\tt // note: the tree is used only if no server }\cr
966: $|$\quad Node 5 status = 0: 4 &
967: {\tt //\qquad\qquad is at the requested node}\cr
968: $|$\quad Node 5 status = 1: 5 &
969: {\tt //\qquad\qquad so (a) is an invalid assignment }\cr
970: Request from = 6: &
971: {\tt //\qquad\qquad and (b) will not put two servers at 3 }\cr
972: $|$\quad Node 6 status = 0: 5 &\cr
973: $|$\quad Node 6 status = 1: 6 &\cr
974: Request from = 8: &
975: {\tt // this tree always assigns a server from a neighboring node }\cr
976: $|$\quad Node 8 status = 0: 7 &
977: {\tt // in agreement with $d$ in Table 1}\cr
978: $|$\quad Node 8 status = 1: 8 &
979: {\tt // which favors incremental movements }\cr
980: }}
981:
982: \vglue 5pt\noindent
983: Note that C4.5 (appropriately) selects the request to be the root.
984: However, the rest of the tree is unintuitive,
985: since the tree is mined from an offline optimum that ``sees'' future requests.
986:
987: \vglue 5pt
988: \centerline{{\bf Figure A.3}\quad
989: Decision tree from an optimum solution for a sequence generated with $S_1$.}
990:
991: \end{document}
992: