
1: \documentclass[epsf]{article}

2: \setlength{\parskip}{5pt}

3: \setlength{\textwidth}{6true in}

4: \setlength{\hoffset}{-1true in}

5: \newcommand{\p}{^\prime}

6: \input epsf

7: \def\iitem#1{\noindent#1\vglue-\baselineskip\vglue-\parskip

8:              \hangindent=\parindent\hangafter=1}

9:

10: \pagestyle{plain}

11: \setcounter{page}{1}

12:

13:

14: \begin{document}

15:

16:

17: \title{MOO: A Methodology for Online Optimization through

18: Mining the Offline Optimum}

19:

20: \author{\sl Jason W.H. Lee \qquad Y.C. Tay \qquad Anthony K.H. Tung \\

21: \rm National University of Singapore, \\

22: Kent Ridge 117543, REPUBLIC OF SINGAPORE \\

23: \tt tay@acm.org}

24:

25: \date{Department of Mathematics Research Report No. 743\\

26: (June 98; revised: January 99)}

27: \maketitle

28:

29:

30:

31:

32: \centerline{\sl Abstract}

33: Ports, warehouses and courier services have to decide online how an arriving

34: task is to be served in order that cost is minimized (or profit maximized).

35: These operators have a wealth of historical data on task assignments;

36: can these data be mined for knowledge or rules that can help

37: the decision-making?

38:

39: MOO is a novel application of data mining to online optimization.

40: The idea is to mine (logged) expert decisions or the offline optimum

41: for rules that can be used for online decisions.

42: It requires little knowledge about the task distribution and cost structure,

43: and is applicable to a wide range of problems.

44:

45: This paper presents a feasibility study of the methodology

46: for the well-known $k$-server problem.

47: Experiments with synthetic data show that optimization can be recast as

48: classification of the optimum decisions;

49: the resulting heuristic can achieve the optimum for strong request patterns,

50: consistently outperforms other heuristics for weak patterns,

51: and is robust despite changes in cost model.

52:

53:

54:

55: \section{Introduction}

56:

57: In online optimization, a stream of tasks arrives at a system for service.

58: Each task must be served --- before the next arrival ---

59: at a cost that depends on the system's state,

60: which may be changed by the task.

61: The objective is to minimize the cost of servicing the entire task stream.

62:

63: The introduction of competitive analysis [ST, KMRS]

64: inspired a large body of work on online optimization in the last ten years

65: [BoE].

66: This form of analysis uses a {\it competitive ratio}

67: to compare the online heuristic's cost to the offline optimum

68: (obtained with the task stream known in advance).

69: In other words, the objective of the online decision algorithm

70: is to match the offline optimum, and this often means imitating the latter.

71:

72: This objective is the basis of our proposal of a new methodology

73: for online optimization.

74: Suppose there are patterns in the task arrivals

75: --- i.e. task generation is constrained by a distribution;

76: these patterns and the cost structure in turn combine to induce

77: patterns in the offline optimum solution,

78: and the online decision algorithm can exploit these patterns to get close

79: to the optimum.  Hence, the idea is:

80:

81: \setlength{\parindent}{35pt}

82:

83: \iitem{{\bf Step 1}}

84: Take a task stream (the {\it training stream})

85: that was previously generated by the distribution.

86:

87: \iitem{{\bf Step 2}}

88: Obtain the offline optimum solution

89: (i.e. the sequence of decisions for servicing the tasks).

90:

91: \iitem{{\bf Step 3}}

92: Transform the optimum solution into a database of records.

93:

94: \iitem{{\bf Step 4}}

95: Apply data mining to this database to extract patterns.

96:

97: \iitem{{\bf Step 5}}

98: Use the patterns to formulate online decision rules for servicing a task

99: stream (the {\it test stream}) generated by the same distribution.

100:

101:

102: \setlength{\parindent}{25pt}
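As a toy illustration of Steps 3 to 5, the sketch below replaces the mining step by a plain majority vote over logged optimum decisions. This is a stand-in chosen for brevity and is not the classifier used in this paper; the names {\tt mine\_rules} and {\tt online\_rule}, and the triple format (request, configuration, node the server was moved from), are assumptions made for the example.

```python
from collections import Counter, defaultdict

def mine_rules(decisions):
    """Steps 3-4 (toy version): tabulate logged decisions, keep the majority.

    `decisions` is an iterable of (request, config, source) triples, where
    `source` is the node from which the optimum moved a server.
    """
    votes = defaultdict(Counter)
    for request, config, source in decisions:
        votes[(request, frozenset(config))][source] += 1
    # For each observed situation, keep the most frequent optimum decision.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

def online_rule(rules, request, config):
    """Step 5: look up the mined decision; None if the situation is unseen."""
    return rules.get((request, frozenset(config)))
```

A lookup table of this kind cannot generalize to unseen situations, which is one reason a classifier such as a decision tree is the more natural choice for Step 4.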

103:

104: We call this methodology for online optimization {\it MOO},

105: whose essential feature is mining the offline optimum (Step 4).

106: This feature distinguishes MOO from the vast literature in machine learning

107: and database mining;

108: it is also different from applying algorithms for online learning

109: to online optimization [BB],

110: from using data collected online to make decisions [KMMO, FM], and

111: from mining database access histories for buffer management [FLTT].

112: MOO's strengths are:

113: (1)

114: It is a methodology that is applicable to a wide range of problems in

115: online optimization (e.g. taxi assignment [FRR],

116: packet routing [AAFPW], web caching [Y]).

117: (2)

118: It requires minimal knowledge about the task distribution and cost structure

119: (and the mining in Step 4 makes no effort to discover them).

120: (3)

121: The sort of information to be mined

122: (classification, association, clustering, etc.)

123: may vary to suit the context.

124: (4)

125: The technique for mining (item-set sampling, neural networks, etc.)

126: can be appropriately chosen.

127:

128:

129: On the other hand, MOO's weaknesses are:

130: (1)

131: An optimum solution for the training stream must be available.

132: This is an issue if no tractable algorithm is known for generating

133: the optimum.

134: MOO, however, only requires the availability of the optimum and does not

135: assume its tractability; it thus treats the optimum solution like an oracle.

136: This oracle may, in fact, be human,

137: in which case the methodology's objective is to approximate the expert's

138: performance (for this, MOO is milking the oracle offline).

139: Incidentally, the oracle may yield the optimum solution without providing

140: information about the costs.

141: (2)

142: The task distribution must be stationary [KMMO],

143: so that the information mined with the training stream remains relevant

144: for the test stream.

145: (3)

146: MOO may need a significant amount of memory to

147: store the rules for making online decisions.

148:

149:

150: To demonstrate MOO, we apply it to the {\it $k$-server problem}.

151: We chose this problem because

152: it is the prototypical and most intensively studied online problem [BoE].

153: It is also close to a container yard management problem

154: that the Port of Singapore Authority is interested in.

155:

156:

157: The decision is cast as a classification problem,

158: and we use Quinlan's C4.5 to mine the optimum,

159: as well as for online classification.

160: This software [Q] was written for machine learning,

161: but suffices for our purpose since the data set is not large

162: and both the offline mining and online classification are fast.

163: However, we envisage that other applications of MOO

164: (e.g. using techniques other than classification,

165: or approximating an expert through mining historical data)

166: may require software that is specifically equipped

167: with data mining technology [A+, H+].

168:

169: We present here an experimental study of how classification

170: can be used for the $k$-server problem.

171: The objectives are:

172: to establish the viability of the methodology;

173: to explore how MOO's effectiveness is influenced by the strength of patterns,

174: the cost structure, the stream lengths, etc.;

175: and

176: to prepare a case for access to commercial data.

177:

178:

179: As is implicit in that third objective,

180: our experiments use synthetic data;

181: this is because

182: a systematic exploration of MOO's effectiveness requires controlled

183: experiments in which various factors can be tuned individually;

184: whereas

185: real data are affected by constraints and noise (that affect optimality),

186: and these get in the way of a feasibility study

187: that tries to build up an understanding of the methodology.

188: Moreover,

189: gaining access to commercial data is difficult without first making a case

190: with synthetic data.

191: (As far as we know, no real data for the $k$-server problem is available

192: in the research community.)

193:

194:

195: The work reported here is significant in the following ways:

196: (1)

197: The experiments on synthetic data show that the methodology is feasible

198: --- MOO fits into the gap between the offline optimum and other online

199: heuristics, can come close to the optimum for strong patterns,

200: does well for weak patterns, and is robust with respect to the cost structure.

201: (2)

202: It shows that optimization can be recast as classification.

203: (3)

204: MOO is a novel application of a concept in data engineering to a problem

205: in algorithm theory,

206: thus serving as a bridge between the two:

207: This application poses challenging new problems in the analysis

208: of online optimization (see Section 5.2);

209: conversely, data mining (being an art --- consider Steps 3 to 5)

210: will benefit from the algorithm community's insight

211: into what information to look for and how to do the mining.

212: (For example, the optimum solution for buffer replacement [MS]

213: suggests that

214: association rules $S\rightarrow P$ between a set of pages $S$ and a page

215: reference $P$ should be annotated by a ``distance'' $d$ between $S$ and $P$

216: mined from the reference stream, and $d$ used for buffer management [TTL].)

217: By offering a database perspective on online optimization,

218: MOO has the potential of facilitating a mutually enriching interaction

219: among database management, machine learning and algorithm analysis.

220:

221:

222: We first describe the $k$-server problem in Section 2.

223: The experimental setup is presented in Section 3

224: and the results examined in Section 4.

225: Section 5 then concludes with a summary of our observations

226: and poses some interesting and hard problems for this new application of

227: data mining.

228:

229:

230: \section{The $k$-server problem}

231:

232: The $k$-server problem is defined on a set of points with a distance

233: function $d$.

234: Conceptually, the set may be infinite but, for our experiments,

235: it consists of $n$ {\it nodes}.

236: Unlike most papers on $k$-servers,

237: we do not require that $d$ satisfy the triangular inequality,

238: nor that it be symmetric.

239: We also do not assume that $d$ is known to the online decision algorithm.

240:

241: There are $k$ {\it servers} who are positioned at different nodes.

242: (Some authors allow multiple servers at one node [KP].)

243: A {\it task} is a request that specifies a node $i$,

244: and is served at 0 cost if there is already a server at $i$,

245: or by moving a server from some node $j$ to $i$, at cost $d(j,i)$.

246: (Some authors allow multiple server movements per task [CL].)

247:

248: A task {\it stream} is a sequence of arriving requests $T_1,\ldots,T_s$;

249: an {\it online solution} uses only $T_1,\ldots, T_{m-1}$ to determine

250: how $T_m$ is served,

251: while an {\it offline solution} uses $T_1,\ldots,T_s$ to determine how

252: each request is served.

253: A {\it configuration} is a set of $k$ nodes that specifies the location

254: of the servers before the arrival of a request.
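For very small instances, the offline solution defined above can be found by exhaustive dynamic programming over configurations. The sketch below is illustrative only: it is not the network-flow method used later in this paper, its running time grows with the number of configurations, and it assumes the model stated above (a request at an occupied node costs 0 and moves nothing, otherwise exactly one server moves). The name {\tt offline\_optimum} and the calling convention for {\tt d} are hypothetical.

```python
def offline_optimum(requests, start, d):
    """Minimum total cost of serving `requests` starting from `start`.

    `start` is a set of k distinct nodes; d(j, i) is the cost of moving a
    server from node j to node i.
    """
    best = {frozenset(start): 0}   # configuration -> cheapest cost to reach it
    for i in requests:
        nxt = {}
        for config, cost in best.items():
            if i in config:
                # A server is already at i: served at zero cost.
                moves = [(config, 0)]
            else:
                # Try moving each server j to i.
                moves = [((config - {j}) | {i}, d(j, i)) for j in config]
            for new_config, step in moves:
                total = cost + step
                if total < nxt.get(new_config, float("inf")):
                    nxt[new_config] = total
        best = nxt
    return min(best.values())
```

For example, with nodes on a line, $d(j,i)=|j-i|$, servers starting at nodes 1 and 3, and the stream 2, 1, 2, 3, the function returns 2: the server at 3 moves to 2 and later back to 3.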

255:

256: Most algorithms in the literature for the $k$-server problem are for

257: special cases.

258: For example, Fiat et al.'s marking algorithm is for paging,

259: and Coppersmith et al.'s RWALK is for resistive metric spaces [FKLMSY, CDRS].

260: The work function algorithm [KP] is, in theory, applicable to any $k$-server

261: problem, but it is computationally intensive and (as far as we know)

262: implemented only for special cases.

263: In our experiments, we compare MOO to three algorithms.

264: If an arriving request is for node $i$ and there is no server at $i$,

265: these algorithms respond as follows:

266:

267: \setlength{\parindent}{50pt}

268:

269: \iitem{Greedy:}

270: Choose a server at node $j$ for which $d(j,i)$ is minimum.

271:

272: \iitem{Balance:}

273: Let $b_j=c_j+d(j,i)$ where $c_j$ is the cost incurred so far by the server

274: at node $j$; choose a server with minimum $b_j$ [MMS].

275:

276: \iitem{Harmonic:}

277: Let $h_j=1/d(j,i)$ for each node $j$ with a server;

278: choose the server at $j$ with probability $h_j/\sum_r h_r$ [RS].

279:

280:

281: \noindent

282: Note that, unlike MOO, these three heuristics require knowledge of $d$.
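The three rules above can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names, the dictionary {\tt spent} (cost incurred so far, keyed by the node where each server currently sits) and the tie-breaking by smallest node number are assumptions. Harmonic additionally assumes $d(j,i)>0$ for every server node $j$.

```python
import random

def greedy(config, i, d):
    """Return the server node j with minimum d(j, i)."""
    return min(sorted(config), key=lambda j: d(j, i))

def balance(config, i, d, spent):
    """Return the server node j minimising b_j = spent[j] + d(j, i)."""
    return min(sorted(config), key=lambda j: spent[j] + d(j, i))

def harmonic(config, i, d, rng=random):
    """Draw a server node j with probability proportional to 1/d(j, i)."""
    nodes = sorted(config)
    weights = [1.0 / d(j, i) for j in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

After Balance moves the server at $j$ to $i$, the caller would carry that server's running cost along, for instance with {\tt spent[i] = spent.pop(j) + d(j, i)}.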

283:

284: \setlength{\parindent}{25pt}

285:

286:

287:

288: \section{Experimental setup}

289:

290: \subsection{Classification}

291:

292: In {\it classification}, a decision tree is built from a set of {\it cases},

293: where each case is a tuple of {\it attribute} values.

294: Each attribute may be {\it discrete} (i.e. its values come from a finite set)

295: or {\it continuous} (i.e. the possible values form the real line).

296: Each case can be assigned a {\it class},

297: which may also be discrete (e.g. good, bad) or continuous (e.g. temperature).

298:

299: Each leaf in the decision tree is a class,

300: and each internal node branches out based on the outcome of a test on an

301: attribute's value.

302: The tree is built from cases with known classification,

303: and a test case can then be classified by traversing the tree from root to

304: leaf, along a path determined by the test outcomes.

305:

306: For the $k$-server problem, the request distribution and distance function

307: induce patterns in the optimum decisions,

308: and MOO tries to extract these patterns for use in online assignment.

309: Specifically, we look for patterns that relate an assignment to the arriving

310: request and the configuration it sees.

311: Hence, the class specifies which node to move the server from,

312: and the classification is based on $n+1$ attributes in a case,

313: where one attribute specifies the arriving request and the other $n$ attributes

314: specify whether a node has a server;

315: the class and attributes are considered discrete.
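A case in this $(n+1)$-attribute formulation could be encoded as below. The sketch only illustrates the record layout; the function name {\tt make\_case}, the yes/no strings and the use of ``?'' for an unknown class are assumptions, not the exact file format used in the experiments.

```python
def make_case(request, config, n, source=None):
    """Encode one request as n+1 discrete attributes plus a class label.

    Attribute 0 is the requested node; attributes 1..n say whether each node
    holds a server. The class is the node the server is moved from, or "?"
    when it is not yet known (a test case).
    """
    occupied = ["yes" if node in config else "no" for node in range(1, n + 1)]
    label = "?" if source is None else str(source)
    return [str(request)] + occupied + [label]
```

Because the occupancy attributes describe a set of nodes, swapping two servers between the same pair of nodes produces the identical case, which is the property wanted for the $k$-server problem.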

316:

317: (A possible alternative is to name the $k$ servers,

318: have the class specify the server, and

319: use $k$ attributes to specify the location of the servers.

320: With this $(k+1)$-tuple formulation of a case, however,

321: the classifier considers

322: ``server $A$ at node 1 and server $B$ at node 2''

323: to be different from

324: ``server $A$ at node 2 and server $B$ at node 1''.

325: This differentiation of servers is not appropriate for the $k$-server problem,

326: unless the cost model is changed to, say, let servers charge different costs

327: for movement.

328: It is also not appropriate to declare the class and attributes as continuous,

329: unless we are considering nodes on a line with a linear distance function.)

330:

331: In our application of MOO, Step 2 uses network flow to solve for the

332: offline optimum [CKPV];

333: in Step 3, this optimum is scanned to produce a file of cases,

334: one for each request;

335: Step 4 then uses C4.5 to build a decision tree with these training cases.

336: For a test stream, this tree is used to classify each arriving request.

337: This classification may be invalid, in that the tree may decide

338: to move a server from a node that has no server;

339: in this case, the server at $j$ with minimum $d(j,i)$ is chosen,

340: i.e. use a greedy strategy.

341: (If $d$ is unknown, MOO can choose a random server, say.)
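The handling of an invalid classification described above can be sketched as a thin wrapper around the classifier's prediction. The name {\tt moo\_assign} and its signature are hypothetical; {\tt predicted} stands for whatever node the decision tree returns.

```python
import random

def moo_assign(predicted, config, i, d=None, rng=random):
    """Return the node whose server will be moved to the requested node i."""
    if predicted in config:
        # Valid classification: the tree named a node that holds a server.
        return predicted
    if d is not None:
        # Invalid classification, distances known: fall back to greedy.
        return min(sorted(config), key=lambda j: d(j, i))
    # Invalid classification, distances unknown: pick a random server.
    return rng.choice(sorted(config))
```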

342:

343:

344: \subsection{Distance function}

345:

346: We choose the distance functions to test MOO's applicability for different

347: neighborhood structures and distance properties.

348: We start with $1,2,\ldots,n$ as nodes

349: and $d(x,x^\prime)$ given by $|x-x^\prime|$, $(x-x^\prime)^2$

350: and $|x-x^\prime|x^\prime$ ---

351: only $|x-x\p|$ satisfies the triangular inequality,

352: and $|x-x\p|x\p$ is not symmetric.

353: We also consider $n$ nodes on a square grid with integer coordinates,

354: with $d((x,y),(x^\prime,y^\prime))$ given by

355: $|x-x^\prime|+|y-y^\prime|$ and

356: $|x-x^\prime|x^\prime+|y-y^\prime|y^\prime$.
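The stated properties of the three distance functions on a line can be checked by brute force for small $n$. The sketch below does so for nodes $1,\ldots,9$; the function names are chosen for the example only.

```python
from itertools import product

def abs_d(x, y):
    return abs(x - y)

def sq_d(x, y):
    return (x - y) ** 2

def asym_d(x, y):
    return abs(x - y) * y

def satisfies_triangle(d, nodes):
    """True if d(a, c) <= d(a, b) + d(b, c) for all triples of nodes."""
    return all(d(a, c) <= d(a, b) + d(b, c)
               for a, b, c in product(nodes, repeat=3))

def is_symmetric(d, nodes):
    """True if d(a, b) == d(b, a) for all pairs of nodes."""
    return all(d(a, b) == d(b, a) for a, b in product(nodes, repeat=2))
```

For instance, $(x-x\p)^2$ fails the inequality on the triple 1, 2, 3, since $4 > 1+1$, and $|x-x\p|x\p$ gives cost 2 from node 1 to node 2 but cost 1 in the other direction.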

357:

358:

359: \subsection{Request generation}

360:

361: The training and test streams are generated with transition matrices

362: in which an entry $p_{ij}$ is the probability that a request is for node $j$

363: given that the previous request was for node $i$.

364: The fraction of nonzero entries is 10--20\% for a {\it sparse} matrix and

365: 80--90\% for a {\it dense} matrix.

366: We use these matrices to generate a stream in two ways:

367:

368: \setlength{\parindent}{10pt}

369:

370: \iitem{$\bullet$}

371: A {\it 1-matrix} stream is generated with a single matrix.

372: This is similar to Karlin et al.'s Markov paging,

373: or a random walk on Borodin et al's access graph [KPR, BIRS].

374:

375: \iitem{$\bullet$}

376: A {\it 2-matrix} stream is generated alternately with two matrices:

377: $L$ requests are generated with one matrix,

378: followed by $L$ requests from the other matrix;

379: at the switchover, if the last request from one matrix is $i$,

380: then $p_{ij}$ from the other matrix is used to generate the next request.

381: This gives a nonhomogeneous Markov chain that is a random walk on two graphs,

382: in contrast to the simultaneous walks used by Fiat et al. [FK, FM].

383: In this paper, we arbitrarily fix $L$ to be 10.

384: The purpose of using a 2-matrix stream is to see how MOO reacts to a mixture

385: of request patterns.

386:

387: \setlength{\parindent}{25pt}

388:

389: \noindent

390: An example of a matrix and a stream that it generates

391: are given in the Appendix.
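Both kinds of stream can be produced by one short routine, sketched below under some assumptions: each matrix is a list of rows that sum to 1, nodes are numbered from 1, and {\tt start} is a hypothetical node taken as the request preceding the first one. The name {\tt generate\_stream} is chosen for the example.

```python
import random

def generate_stream(matrices, length, start, L=10, seed=None):
    """Generate `length` requests, switching matrix every L requests.

    With one matrix this is a 1-matrix stream; with two it is a 2-matrix
    stream. At a switchover the row of the last request is read from the
    other matrix.
    """
    rng = random.Random(seed)
    nodes = list(range(1, len(matrices[0]) + 1))
    stream, current = [], start
    for t in range(length):
        P = matrices[(t // L) % len(matrices)]   # matrix active at step t
        current = rng.choices(nodes, weights=P[current - 1], k=1)[0]
        stream.append(current)
    return stream
```

Calling it with a single matrix in the list gives a 1-matrix stream, and with two matrices and the default $L=10$ it gives the alternating stream described above.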

392:

393: \begin{figure}[tbp]

394:

395: \noindent

396: $k=5$ servers, $n=9$ nodes on a line,

397: distance function $(x-x^\prime)^2$ \hfill\break

398: 1-matrix (sparse) stream, training length 2000, test length 2000

399:

400: \vbox{

401: \def\tablerule{\noalign{\hrule}}

402: %\hrule

403: \halign

404: {&\vrule#& \strut\ #

405:  &\vrule#& \ #\

406:  &\vrule#& \ #\

407:  &\vrule#& \ #\

408:  &\vrule#& \ #\

409:  &\vrule#& \ #\

410:  &\vrule#& \ #\

411:  &\vrule#\cr\tablerule

412: & && optimum	&& \multispan7 $\underline{\rm \phantom{XXXXXXXXXXXX}

413: 		competitive\ ratio \phantom{XXXXXXXXXXX}}$ && invalid &\cr

414: %& &&	        && \multispan7\leaders\hrule\hfil &&        &\cr

415: & && cost  && MOO  && Greedy && Balance && Harmonic && assignment &\cr\tablerule

416: & $S_1$ && 402/408/\underbar{381}   && 1.00/1.01/\underbar{1.00}

417: 	&& 1.36/1.75/\underbar{2.13} && 2.29/2.31/\underbar{2.30}

418: 	&& 5.58/5.22/\underbar{4.93} && 0/0/\underbar{0} &\cr\tablerule

419: & $S_2$ && 90/113/104  && 1.09/3.04/1.30	&& 4.93/4.76/1.62

420: 	&& 3.00/1.98/1.92	&& 5.21/5.83/5.40 && 13/101/2 &\cr

421: }

422: \hrule}

423: \noindent

424: $S_1$ and $S_2$ are different matrices.

425: A triple $x/y/z$ for row $S_i$ gives the results from three task streams

426: generated with $S_i$.

427: For MOO, the first stream is used as the training stream,

428: and all three are used as test streams;

429: $x$ is the result for the training stream used as test stream

430: (this is why we have the same length for training and test streams).\hfill\break

431: The underlined numbers are results for one run (i.e. one task stream) of $S_1$.

432: The competitive ratio is cost incurred by an algorithm for a run divided

433: by the optimum cost for that run.

434: The last column reports the number of times the MOO classifier makes an

435: invalid server assignment.

436:

437: \vglue 5pt

438: \centerline{{\bf Table 1}\quad

439: 		For strong patterns, MOO can be close to the optimum.}

440: \vglue 5pt

441: \hrule

442: \vglue -5 pt

443: \end{figure}

444:

445:

446: \section{Experimental results}

447:

448: There are several variables in our experimental setup:

449: $k$, $n$, line/grid, distance, sparse/dense, pattern mixture,

450: starting configuration and stream length.

451: The stream length $s$ is the most crucial because the

452: offline optimum has complexity $O(ks^2)$ ---

453: on a 167MHz UltraSPARC, it can take 7 minutes for $s=2000$

454: and 1 hour for $s=2500$.

455: The time complexity is compounded by the large memory required

456: to store the network for finding the optimum

457: --- we have only one machine with sufficient main memory.

458:

459: If we choose $s$ large enough for the optimum and heuristics

460: to all reach steady state, the time commitment would be overwhelming.

461: Instead, in most cases, we set $s$ just large enough

462: that conclusions can already be drawn,

463: despite significant statistical variations for any particular solution.

464: (This is similar to analysis of variance in statistics,

465: where one can separate the means of two variables if the variation of each

466: is ``smaller'' than the separation.)

467:

468: With the bottleneck of one workstation generating the results,

469: we have chosen a small number of experiments that cut through the myriad

470: possible combinations of variables.

471: We concede that the data may be insufficient to support some of our conclusions,

472: so these should be regarded as tentative insight

473: rather than authoritative conclusions.

474:

475:

476: \subsection{Nodes on a line}

477:

478: Table 1 presents an experiment with a strong pattern in the stream of requests

479: coming to 5 servers for 9 nodes on a line,

480: with a $d$ that violates the triangular inequality.

481: After 2000 requests, the fluctuations are small enough for us to draw

482: some conclusions.

483:

484: First, the average optimum cost per request is less than 1,

485: and this is because most requests are for a node that already has a server.

486: Second, the competitive ratios for a fixed request distribution can be

487: significantly smaller than the $k$-server bound [MMS];

488: this is similar to previous observations [BaE, FR].

489: Third, MOO can achieve the optimum ---

490: the sparse matrix induces a strong pattern in the offline optimum solution,

491: and this pattern is captured in the decision tree used by MOO.

492:

493: The starting configurations used in the three runs are the same for $S_1$,

494: but different for $S_2$.

495: The results for $S_2$ show that the configuration can have a strong effect

496: --- the heuristics' performance ordering and competitive ratios

497: both become erratic.

498: In contrast, the ordering for the three runs of $S_1$ is the same,

499: and the ratios are reasonably stable except for Greedy,

500: which is sensitive to the stream instance.

501: To factor in the effect of the starting configuration,

502: this configuration is henceforth changed from run to run,

503: unless otherwise stated.

504:

505: Despite the erratic results for $S_2$ and the fact that MOO uses a greedy

506: strategy whenever the classifier makes an invalid assignment,

507: MOO has a significantly smaller ratio than Greedy,

508: thus showing the contribution from data mining.

509: A check shows that the trees are small but unintuitive

510: --- an example is given in the Appendix ---

511: since they imitate the offline optimum (which ``sees'' future requests).

512:

513:

514: \begin{figure}[tbp]

515: \noindent

516: $k=5$ servers, $n=9$ nodes on a line,

517: distance function $(x-x^\prime)^2$ \hfill\break

518: 1-matrix (dense) stream, training length 2000, test length 2000

519:

520: \vbox{

521: \def\tablerule{\noalign{\hrule}}

522: %\hrule

523: \halign

524: {&\vrule#& \strut\ #

525:  &\vrule#& \ #\

526:  &\vrule#& \ #\

527:  &\vrule#& \ #\

528:  &\vrule#& \ #\

529:  &\vrule#& \ #\

530:  &\vrule#& \ #\

531:  &\vrule#\cr\tablerule

532: & && optimum	&& \multispan7 $\underline{\rm \phantom{XXXXXXXXXXXX}

533: 		competitive\ ratio \phantom{XXXXXXXXXXX}}$ && invalid &\cr

534: %& &&	        && \multispan7\leaders\hrule\hfil &&        &\cr

535: & && cost  && MOO  && Greedy && Balance && Harmonic && assignment &\cr\tablerule

536: & $D_1$ && 715/687/728	&& 1.16/1.21/1.20	&& 1.28/1.27/2.09

537: 	&& 1.72/1.85/1.66	&& 4.24/4.40/4.26 && 1/1/0 &\cr\tablerule

538: & $D_2$ && 684/692/732  && 1.19/1.22/1.18	&& 1.94/1.44/1.29

539: 	&& 1.72/1.88/1.87	&& 3.71/4.70/4.37 && 1/10/0 &\cr

540: }

541: \hrule}

542:

543: \vglue 5pt

544: \centerline{{\bf Table 2}\quad  For weak patterns, MOO is best.}

545: \vglue 5pt

546: \hrule

547: \vglue -5pt

548: \end{figure}

549:

550: In Table 1, MOO can get close to the optimum because the patterns are strong.

551: For a dense matrix, the pattern is much weaker.

552: Nonetheless, Table 2 shows that MOO has the smallest ratio,

553: and the invalid assignments are surprisingly few.

554: Further,

555: the difference in starting configurations between the training and test

556: streams does not have a big effect on MOO's results,

557: in contrast to the results for a strong pattern

558: (recall: the starting configurations in Table 1

559: are the same for 1.00/1.01/1.00 and different for 1.09/3.04/1.30).

560:

561: The number of potential cases for the classifier is $n {n\choose k}$,

562: which is 1134 and comparable to the training length (2000) for Table 2.

563: Even so, the performance ordering and ratios are reasonably stable,

564: except for Greedy;

565: when we tested the heuristics again with the runs using the same

566: starting configuration,

567: fluctuation in Greedy's ratios narrowed down considerably,

568: thus indicating that Greedy remains sensitive to the starting configuration

569: for weak patterns.

570: The decision trees, though bigger than the two for Table 1, remain small:

571: the tree for $D_1$ is 3Kbytes

572: and has only 27 decision nodes.
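The count of potential cases quoted above is easy to reproduce: there are $n$ possible requests and ${n\choose k}$ possible configurations. The helper name {\tt potential\_cases} below is chosen for the example.

```python
from math import comb

def potential_cases(n, k):
    """Number of (request, configuration) pairs: n requests times C(n, k)."""
    return n * comb(n, k)
```

For $n=9$ and $k=5$ this is $9 \times 126 = 1134$, the figure used in the comparison with the training length.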

573:

574:

575:

576: \begin{figure}[tbp]

577: \noindent

578: \vbox{\tabskip=1em

579: \halign{& #\hfil\cr

580: \epsfbox{figure1.eps} & \epsfbox{figure2.eps} \cr

581: \qquad $n=9$ nodes, distance $|x-x^\prime|$ &

582: \qquad $k=5$ servers, distance $|x-x^\prime|x^\prime$ \cr

583: \qquad 2-matrix (sparse-dense) stream &

584: \qquad 2-matrix (dense-dense) stream \cr

585: \qquad stream length 2000 &

586: \qquad stream length varies with $n$ \cr

587: \qquad H is for Harmonic, B for Balance, &

588: \qquad at $n=6$, H is 9.6 and G is 10.6\cr

589: \qquad G for Greedy, M for MOO &

590: \qquad \hglue 1 true cm  \cr

591: {\bf Figure 1}\quad MOO fits into the gap &

592:             {\bf Figure 2}\quad MOO stays close to optimum \cr

593: \hglue 1.8 true cm   between Greedy and optimum. &

594: \hglue 1.9 true cm   for all $n$. \cr

595: }}

596: \vskip 5pt

597: \hrule

598: \vskip -5pt

599: \end{figure}

600:

601:

602: All heuristics are trivially optimum if $k=1$,

603: but the gap between existing heuristics and the optimum should open up

604: as $k$ increases;

605: to prove its worth, MOO must fit into this gap.

606:

607: In Figure 1 (and the following graphs),

608: each data point is the average of 6 runs.

609: It shows that, for a 2-matrix stream and distance $|x-x^\prime|$,

610: the gap between Greedy and optimum opens up at $k=5$ for $n=9$,

611: and MOO does fit into the gap.

612: At $k=5$ for $|x-x^\prime|$, the difference between MOO and Greedy is

613: negligible (if we consider the average ratio over 6 runs;

614: Greedy's ratio is smaller in some runs and MOO's smaller in others).

615: In contrast, Tables 1 and 2 show that MOO's ratios are noticeably smaller

616: than Greedy's at $k=5$ for $(x-x^\prime)^2$,

617: which penalizes large movements.

618: The gaps among the heuristics open further at $k=5$ and $n=9$ for $|x-x\p|x\p$

619: in Figure 2.

620:

621: The alternation between strong and weak patterns does not affect

622: MOO's ability to outperform the other heuristics in Figure 1,

623: and Figure 2 shows this remains so for alternating between two weak patterns.

624: In fact, unlike Harmonic and Balance,

625: MOO stays close to the optimum as $n$ scales up,

626: thus demonstrating again its ability to learn from the optimum solution.

627:

628: For an asymmetrical and punitive $|x-x\p|x\p$,

629: the ``right'' server placement is important to being close to optimum

630: for small $n$,

631: so Greedy's simplistic strategy does poorly there.

632: For large $n$, even the optimum has its servers spread out,

633: and the violation of the triangular inequality favors incremental

634: server movements,

635: thus making it possible for Greedy to get close to the optimum.

636:

637:

638: \begin{figure}[tbp]

639: \noindent

640: \vbox{\tabskip=1em

641: \halign{& #\hfil\cr

642: \epsfbox{figure3.eps} & \epsfbox{figure4.eps} \cr

643: \qquad $n=9$, distance $|x-x^\prime|+|y-y\p|$ &

644: \qquad $k=5$, distance $|x-x^\prime|x^\prime+|y-y\p|y\p$ \cr

645: \qquad same stream and starting configuration &

646: \qquad same stream and starting configuration \cr

647: \qquad as Figure 1 &

648: \qquad as Figure 2 \cr

649: {\bf Figure 3}\quad For a grid, &

650:             {\bf Figure 4}\quad For a grid, \cr

651: \hglue 1.8 true cm   MOO still fits in the gap. &

652: \hglue 1.9 true cm   MOO still stays close to optimum. \cr

653: }}

654: \vskip 5pt

655: \hrule

656: \vskip -5pt

657: \end{figure}

658:

659:

660: \subsection{Nodes on a grid}

661:

662: Intuitively, a heuristic should incur lower costs if nodes have more neighbors,

663: but its ratio can increase because

664: the optimum may make better use of the neighbors in reducing its cost.

665:

666: Figure 3 shows the results of repeating the runs for Figure 1

667: --- same starting configurations and request streams ---

668: on a grid (instead of a line).

669: Harmonic does perform better,

670: but the effect on the ratios for Balance and Greedy is mixed.

671: A check (of the detailed data) shows that, contrary to our intuition,

672: their costs are sometimes higher for the grid.

673: It appears that the increase in the number of neighbors

674: also leads Balance and Greedy to make short-sighted moves that

675: raise costs eventually.

676: In any case, MOO remains in the gap between Greedy and optimum

677: when $k$ increases.

678:

679: Similar results hold when $n$ is varied.

680: Comparing Figures 2 and 4,

681: we see that the ratios for a grid are noticeably smaller for Harmonic

682: but larger for Greedy.

683: A check shows that costs are lower (often by an order of magnitude),

684: so all solutions benefit from having more neighbors

685: when $d$ is $|x-x\p|x\p+|y-y\p|y\p$.

686: However, the spreading-out effect that allows Greedy to get close to

687: the optimum in Figure 2 is less for a grid,

688: so Greedy is further from the optimum in Figure 4.

689: Again, we see the gap among the heuristics opening up at $k=5$ and $n=9$

690: when $d$ changes from $|x-x\p|+|y-y\p|$ to $|x-x\p|x\p+|y-y\p|y\p$.

691:

692: MOO, on the other hand, stays close to optimum, as in Figure 2.

693: The detailed data show that there are at most 2 invalid assignments

694: (that are resolved greedily) at $n=9$ and less than $12\%$ such assignments

695: at $n=25$; hence, MOO relies mostly on the decision tree,

696: which has successfully captured the optimum solution

697: even though the requests are a mixture of two weak patterns.

698:

699:

700:

701: \section{Conclusion}

702:

703: \subsection{Summary}

704:

705: We now summarize our observations:

706:

707: \iitem{$\bullet$}

708: MOO fits into the gap between the offline optimum and other online heuristics

709: (Figures 1--4).

710: For a strong pattern, MOO can be close to optimum,

711: but may lose to other heuristics because of sensitivity to the starting

712: configuration (Table 1).

713: MOO does well even if the requests have a weak pattern (Table 2)

714: or alternate between patterns (Figures 1--4).

715:

716: \iitem{$\bullet$}

717: MOO outperforms the other heuristics even if the distances are asymmetric

718: (Figures 2 and 4) or violate the triangle inequality (Tables 1 and 2).

719: Increasing the number of neighbors can increase costs,

720: but MOO's ratios remain stable (Figures 1 and 3, 2 and 4).

721:

722: \iitem{$\bullet$}

723: MOO stays close to the optimum as $n$ varies (Figures 2 and 4).

724:

725: \iitem{$\bullet$}

726: The classifier can produce an effective decision tree even for

727: relatively short stream lengths; the trees are small, and the mining (Step 4)

728: is fast (sub-second).

729:

730:

731:

732:

733:

734: \subsection{Challenging issues}

735:

736: MOO poses some challenging problems for

737: this new application of data mining:

738:

739:

740: \iitem{$\bullet$}

741: How can the competitive ratios produced with data mining be analyzed?

742:

743: \iitem{$\bullet$}

744: For the $k$-server problem,

745: why does MOO perform well for weak patterns and short training

746: streams?

747: (For the buffer replacement problem, mining can produce good results

748: even if the requests are a mixture of 100 patterns [TTL].)

749:

750: \iitem{$\bullet$}

751: What sort of data mining would be appropriate for

752: web caching, video-on-demand, etc.?

753:

754:

755:

756:

757: \vskip25pt\noindent

758: {\bf Acknowledgment}

759:

760: \noindent

761: Many thanks to C.P. Teo for his help with network flow

762: and Hongjun Lu for his comments.

763:

764:

765: \subsection{References}

766:

767: %\setlength{\parskip}{4pt}

768: %\parskip=4pt

769: %\baselineskip=11pt

770: \setlength{\parindent}{2.0 cm}

771:

772: \iitem{[A+]}

773: R. Agrawal, M. Mehta, J. Shafer, R. Srikant, A. Arning and T. Bollinger,

774: {\it The Quest data mining system},

775: Proc. KDD, Portland, OR (Aug. 1996), 244--249.

776:

777: \iitem{[AAFPW]}

778: J. Aspnes, Y. Azar, A. Fiat, S. Plotkin and O. Waarts,

779: {\it On-line load balancing with applications to machine scheduling and

780: virtual circuit routing},

781: Proc. STOC, San Diego, CA (May 1993), 623--630.

782:

783: \iitem{[BB]}

784: A. Blum and C. Burch,

785: {\it On-line learning and the metrical task system problem},

786: Proc. COLT, Nashville, TN (July 1997), 45--53.

787:

788: \iitem{[BaE]}

789: R. Bachrach and R. El-Yaniv,

790: {\it Online list accessing algorithms and their applications: recent

791: empirical evidence},

792: Proc. SODA, New Orleans, LA (Jan. 1997), 53--62.

793:

794: \iitem{[BoE]}

795: A. Borodin and R. El-Yaniv,

796: {\sl Online Computation and Competitive Analysis},

797: Cambridge University Press, Cambridge, UK (1998).

798:

799: \iitem{[BIRS]}

800: A. Borodin, S. Irani, P. Raghavan and B. Schieber,

801: {\it Competitive paging with locality of reference},

802: Proc. STOC, New Orleans, LA (May 1991), 249--259.

803:

804: \iitem{[CDRS]}

805: D. Coppersmith, P. Doyle, P. Raghavan and M. Snir,

806: {\it Random walks on weighted graphs and applications to on-line algorithms},

807: J. ACM 40, 3 (July 1993), 421--453.

808:

809: \iitem{[CKPV]}

810: M. Chrobak, H. Karloff, T. Payne and S. Vishwanathan,

811: {\it New results on server problems},

812: SIAM J. Disc. Math. 4, 2 (May 1991), 172--181.

813:

814: \iitem{[CL]}

815: M. Chrobak and L.L. Larmore,

816: {\it An optimal on-line algorithm for $k$-servers on trees},

817: SIAM J. Computing 20, 1 (1991), 144--148.

818:

819: \iitem{[FK]}

820: A. Fiat and A.R. Karlin,

821: {\it Randomized and multipointer paging with locality of reference},

822: Proc. STOC, Las Vegas, NV (May 1995), 626--634.

823:

824: \iitem{[FKLMSY]}

825: A. Fiat, R.M. Karp, M. Luby, L.A. McGeoch, D.D. Sleator and N.E. Young,

826: {\it Competitive paging algorithms},

827: J. Algorithms 12, 4 (Dec. 1991), 685--699.

828:

829: \iitem{[FLTT]}

830: L. Feng, H. Lu, Y.C. Tay and K.H. Tung,

831: {\it Buffer management in distributed database systems:

832: A data mining approach},

833: Proc. EDBT, Valencia, Spain (Apr. 1998), 246--260.

834:

835: \iitem{[FM]}

836: A. Fiat and M. Mendel,

837: {\it Truly online paging with locality of reference},

838: Proc. FOCS, Miami Beach, FL (Oct. 1997), 326--335.

839:

840: \iitem{[FR]}

841: A. Fiat and Z. Rosen,

842: {\it Experimental studies of access graph based heuristics:

843: beating the LRU standard?},

844: Proc. SODA, New Orleans, LA (Jan. 1997), 63--72.

845:

846: \iitem{[FRR]}

847: A. Fiat, Y. Rabani and Y. Ravid,

848: {\it Competitive $k$-server algorithms},

849: Proc. FOCS, St. Louis, MO (Oct. 1990), 454--463.

850:

851: \iitem{[H+]}

852: J. Han, Y. Fu, W. Wang, J. Chiang, W. Gong, K. Koperski, D. Li, Y. Lu,

853: A. Rajan, N. Stefanovic, B. Xia and O.R. Zaiane,

854: {\it DBMiner: A system for mining knowledge in large relational databases},

855: Proc. KDD, Portland, OR (Aug. 1996), 250--255.

856:

857: \iitem{[KMMO]}

858: A.R. Karlin, M.S. Manasse, L.A. McGeoch and S. Owicki,

859: {\it Competitive randomized algorithms for non-uniform problems},

860: Proc. SODA, San Francisco, CA (Jan. 1990), 301--309.

861:

862: \iitem{[KMRS]}

863: A.R. Karlin, M.S. Manasse, L. Rudolph and D.D. Sleator,

864: {\it Competitive snoopy caching},

865: Algorithmica 3, 1 (1988), 79--119.

866:

867: \iitem{[KP]}

868: E. Koutsoupias and C. Papadimitriou,

869: {\it On the $k$-server conjecture},

870: Proc. STOC, Montreal, Canada (May 1994), 507--511.

871:

872: \iitem{[KPR]}

873: A.R. Karlin, S.J. Phillips and P. Raghavan,

874: {\it Markov paging},

875: Proc. FOCS, Pittsburgh, PA (Oct. 1992), 208--217.

876:

877: \iitem{[MMS]}

878: M.S. Manasse, L.A. McGeoch and D.D. Sleator,

879: {\it Competitive algorithms for on-line problems},

880: Proc. STOC, Chicago, IL (May 1988), 322--333.

881:

882: \iitem{[MS]}

883: L.A. McGeoch and D.D. Sleator,

884: {\it A strongly competitive randomized paging algorithm},

885: Algorithmica 6, 6 (1991), 816--825.

886:

887: \iitem{[Q]}

888: J.R. Quinlan,

889: {\sl C4.5: Programs for Machine Learning},

890: Morgan Kaufmann, San Mateo, CA (1993).

891:

892: \iitem{[RS]}

893: P. Raghavan and M. Snir,

894: {\it Memory versus randomization in on-line algorithms},

895: Proc. ICALP, Stresa, Italy (July 1989), 687--703.

896:

897: \iitem{[ST]}

898: D.D. Sleator and R.E. Tarjan,

899: {\it Amortized efficiency of list update and paging rules},

900: C. ACM 28, 2 (Feb. 1985), 202--208.

901:

902: \iitem{[T]}

903: K.H. Tung,

904: {\it Parking in a Marina},

905: Honors Year Project Report, DISCS, National University of Singapore (1997).

906:

907: \iitem{[TTL]}

908: K.H. Tung, Y.C. Tay and H. Lu,

909: {\it BROOM: Buffer replacement using online optimization by mining},

910: Proc. CIKM, Bethesda, MD (Nov. 1998), 185--192.

911:

912: \iitem{[Y]}

913: N. Young,

914: {\it On-line file caching},

915: Proc. SODA, San Francisco, CA (Jan. 1998), 82--86.

916:

917: \newpage

918: \section{Appendix}

919:

920: $$\bordermatrix{

921:   &   0   &  1    & 2    & 3    & 4    & 5    & 6    & 7    & 8    \cr

922: 0 &  0.00 &  0.45 & 0.00 & 0.55 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr

923: 1 &  0.00 &  0.00 & 0.00 & 0.58 & 0.00 & 0.00 & 0.00 & 0.00 & 0.42 \cr

924: 2 &  0.31 &  0.69 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr

925: 3 &  0.00 &  0.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00 & 0.00 & 0.00 \cr

926: 4 &  1.00 &  0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr

927: 5 &  0.00 &  0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.02 & 0.98 \cr

928: 6 &  1.00 &  0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \cr

929: 7 &  0.00 &  0.00 & 0.35 & 0.00 & 0.62 & 0.03 & 0.00 & 0.00 & 0.00 \cr

930: 8 &  0.00 &  0.47 & 0.00 & 0.00 & 0.00 & 0.00 & 0.53 & 0.00 & 0.00 \cr}$$

931:

932: \centerline{{\bf Figure A.1}\quad Sparse matrix $S_1$ of Table 1.}

933:

934: \vglue 10pt

935:

936: \noindent

937: 1 8 6 0 1 3 5 8 6 0 3 5 8 1 3 5 8 6 0 1 3 5 8 1 3 5 7 2 1 3 5 8 1 3 5 8 6 0 1 8 6 0 1 8 1 3 5 8 1 3 5 8 1

938:

939: \vglue 5pt

940: \centerline{{\bf Figure A.2}\quad $S_1$ generates a strong pattern.}
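
\vglue 5pt\noindent
One way to see why this pattern is strong is to multiply transition probabilities
along a path. This is a sketch only, assuming that row $i$ of $S_1$ gives the
distribution of the next request when the current request is for node $i$
(each row of Figure A.1 does sum to 1); the shorthand $S_1(i,j)$ for the entry in
row $i$ and column $j$ is introduced for this example.
Starting from a request for node 1, the fragment $1\to3\to5\to8$ that recurs in
Figure A.2 has probability
$$ S_1(1,3)\,S_1(3,5)\,S_1(5,8) = 0.58\times1.00\times0.98 \approx 0.57, $$
whereas the opening fragment $1\to8\to6\to0\to1$ has probability
$$ S_1(1,8)\,S_1(8,6)\,S_1(6,0)\,S_1(0,1) = 0.42\times0.53\times1.00\times0.45 \approx 0.10. $$
Most of the probability mass is thus concentrated on a few short cycles,
which is what a sparse matrix with entries near 1 would be expected to produce.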

941:

942: \vglue 10pt

943:

944: \vbox{\tabskip=1em

945: \halign{& #\hfil\cr

946: Request from = 2: 3 	  &\cr

947: Request from = 4: 5 	  &\cr

948: Request from = 7: 8 	  &\cr

949: Request from = 0:         &\cr

950: $|$\quad   Node 0 status = 0: 1  &

951: 	{\tt // this tree has depth 1 only }\cr

952: $|$\quad   Node 0 status = 1: 0  &

953: 	{\tt // weaker patterns induce deeper trees }\cr

954: Request from = 1:         &\cr

955: $|$\quad   Node 0 status = 0: 1  &\cr

956: $|$\quad   Node 0 status = 1: 0  &

957: 	{\tt // how to read C4.5's decision tree:}\cr

958: Request from = 3:         &

959: 	{\tt // if the request is for node 3 }\cr

960: $|$\quad   Node 2 status = 0: 3  &

961: 	{\tt // then (a) if no server is at 2, then use server at 3}\cr

962: $|$\quad   Node 2 status = 1: 2  &

963: 	{\tt // \qquad \ (b) else move the server from 2 }\cr

964: Request from = 5:         &

965: 	{\tt // note: the tree is used only if no server }\cr

966: $|$\quad   Node 5 status = 0: 4  &

967: 	{\tt //\qquad\qquad is at the requested node}\cr

968: $|$\quad   Node 5 status = 1: 5  &

969: 	{\tt //\qquad\qquad so (a) is an invalid assignment }\cr

970: Request from = 6:         &

971: 	{\tt //\qquad\qquad and (b) will not put two servers at 3 }\cr

972: $|$\quad   Node 6 status = 0: 5  &\cr

973: $|$\quad   Node 6 status = 1: 6  &\cr

974: Request from = 8:         &

975: 	{\tt // this tree always assigns a server from a neighboring node }\cr

976: $|$\quad   Node 8 status = 0: 7  &

977: 	{\tt // in agreement with $d$ in Table 1}\cr

978: $|$\quad   Node 8 status = 1: 8  &

979: 	{\tt // which favors incremental movements }\cr

980: }}

981:

982: \vglue 5pt\noindent

983: Note that C4.5 (appropriately) selects the request to be the root.

984: However, the rest of the tree is unintuitive,

985: since the tree is mined from an offline optimum that ``sees'' future requests.

986:

987: \vglue 5pt

988: \centerline{{\bf Figure A.3}\quad

989: Decision tree from an optimum solution for a sequence generated with $S_1$.}

990:

991: \end{document}

992: