0202:cs0202008/cup.tex

1: \documentclass[10pt,twocolumn]{article}

2: \usepackage[dvips]{graphicx}

3: \usepackage{myusenix}

4:

5: \usepackage{amsmath}

6: \begin{document}

7:

8: \renewcommand\dbltopfraction{.95}

9: \renewcommand\textfraction{.05}

10:

11:

12: \title{CUP: Controlled Update Propagation in Peer-to-Peer Networks}

13:

14:

15: \author{

16: Mema Roussopoulos \hspace{0.75cm} Mary Baker\\

17: {\em Department of Computer Science} \\

18: {\em Stanford University}\\

19: {\em Stanford, California, 94305}\\

20: \\

21: \normalsize \{mema, mgbaker\}@cs.stanford.edu \\

22:  http://mosquitonet.stanford.edu/} % end author

23:

24: \date{February 1, 2002}

25:

26:

27: \maketitle

28:

29: \begin{abstract}

30:

31: \small Recently the problem of indexing and locating content in

32: peer-to-peer networks has received much attention.  Previous work

33: suggests caching index entries at intermediate nodes that lie on the

34: paths taken by search queries, but until now there has been little

35: focus on how to maintain these intermediate caches.  This paper

36: proposes CUP, a new comprehensive architecture for Controlled Update

37: Propagation in peer-to-peer networks.  CUP asynchronously builds

38: caches of index entries while answering search queries.  It then

39: propagates updates of index entries to maintain these caches.  Under

40: unfavorable conditions, when compared with standard caching based on

41: expiration times, CUP reduces the average miss latency by as much as a

42: factor of three.  Under favorable conditions, CUP can reduce the

43: average miss latency by more than a factor of ten.

44:

45: CUP refreshes intermediate caches, reduces query latency, and reduces

46: network load by coalescing bursts of queries for the same item.  CUP

47: controls and confines propagation to updates whose cost is likely to

48: be recovered by subsequent queries.  CUP gives peer-to-peer nodes the

49: flexibility to use their own incentive-based policies to determine

50: when to receive and when to propagate updates.  Finally, the small

51: propagation overhead incurred by CUP is more than compensated for by

52: its savings in cache misses.

53:

54: \end{abstract}

55:

56:

57:

58: % ------------------------------------------------------------------------

59: \section{Introduction}

60: % ------------------------------------------------------------------------

61:

62: Peer-to-peer systems are self-organizing distributed systems where

63: participating nodes both provide and receive services from each other

64: in a cooperative effort to prevent any one node or set of nodes from

65: being overloaded.  Peer-to-peer systems have recently gained much

66: attention, primarily because of the great number of features they

67: offer applications that are built on top of them.  These features

68: include: scalability, availability, fault tolerance, decentralized

69: administration, and anonymity.

70:

71: Along with these features has come an array of technical challenges.

72: In particular, over the past year, there has been much focus on the

73: fundamental indexing and routing problem inherent in all peer-to-peer

74: systems: Given the name of an object of interest, how do you locate

75: the object within the peer-to-peer network in a well-defined,

76: structured manner that avoids flooding the network \cite{ratnasamy01a,

77: rowstron01b, stoica01, zhao01}?

78:

79: As a performance enhancement, the designers of these systems suggest

80: caching index entries with expiration times at intermediate nodes that

81: lie on the path taken by a search query.  Intermediate caches are

82: desirable because they balance query load for an item across multiple

83: nodes, reduce latency, and alleviate hot spots. However, little

84: attention has been given to how to maintain these intermediate caches.

85: This problem is interesting because the peer-to-peer model assumes the

86: index will change constantly.  This constant change stems from several

87: factors: peer nodes continuously join and leave the network, content

88: is continuously added to and deleted from the network, and replicas of

89: existing content are continuously added to alleviate bandwidth

90: congestion at nodes holding the content.

91:

92: In this paper we propose a new comprehensive architecture for

93: Controlled Update Propagation (CUP) in peer-to-peer networks that

94: asynchronously builds caches of index entries while answering search

95: queries.  It then propagates updates of index entries to maintain

96: these caches.  The basic idea is that every node in the peer-to-peer

97: network maintains two logical channels per neighbor: a query channel

98: and an update channel.  The query channel is used to forward search

99: queries for items of interest to the neighbor that is closest to the

100: authority node for those items.  The update channel is used to forward

101: query responses (first-time updates) asynchronously to a neighbor and

102: to update index entries that are cached at the neighbor.

103:

104:

105: Queries for an item travel ``up'' the query channels of nodes along

106: the path toward the authority node for that item.  Updates travel

107: ``down'' the update channels along the reverse path taken by a query.

108: Figure~\ref{fig:logicalChannels} shows this process.

109:

110: \begin{figure}[tb]

111: \centerline{\includegraphics[height=7cm]{logChannel.eps}}

112: \caption[]{\small This figure shows how CUP works.  \(A_1\) and

113: \(A_2\) are authority nodes.  \(Q_{A_1}\), \(Q_{A_2}\), \(U_{A_1}\),

114: and \(U_{A_2}\) are the four logical channels between nodes \(N_1\)

115: and \(N_2\).  A query  arriving at node \(N_2\) for an item for which

116: \(A_1\) is the authority is pushed onto query channel \(Q_{A_1}\) to

117: \(N_1\).  If \(N_1\) has a cached entry for the item, it returns it through

118: \(U_{A_1}\).  Otherwise, it forwards the query  towards \(A_1\).

119: Any update originating from \(A_1\) flows

120: downstream to \(N_1\) which may forward it onto \(N_2\) through

121: \(U_{A_1}\).  The analogous process holds for queries

122: at \(N_1\) for items for which \(A_2\) is the authority.}

123: \label{fig:logicalChannels}

124: \end{figure}

125:

126:

127: The advantages of the query channel are twofold.  First, if a node

128: receives two or more queries for an item for which it does not have a

129: fresh response, the node pushes only one instance of the query for

130: that item up its query channel.

131: This approach can have

132: significant savings in traffic, because bursts of requests for an item

133: are coalesced into a single request.  Second, using a single query

134: channel solves the ``open connection'' problem suffered by some

135: peer-to-peer systems.  Each time a query arrives at a node which does

136: not have a cached response, the node opens one or more connections to

137: neighboring nodes and must maintain those connections open until the

138: response returns through them.  The asynchronous nature of the query

139: channel relieves nodes from having to maintain many open connections

140: since all responses return through the update channel.  Through simple

141: bookkeeping (setting an interest bit) the node registers the interest

142: of its neighbors so it knows which of its neighbors to push the query

143: response to when the answer arrives.

144:

145:

146: The cascaded propagation of updates from authority nodes down the

147: reverse paths of search queries has many advantages.  First, updates

148: extend the lifetime of cached entries allowing intermediate nodes to

149: continue serving queries from their caches without having to push

150: queries up their channels explicitly.  It has been shown that up to

151: fifty percent of content hits at caches are instances where the

152: content is valid but stale and therefore cannot be used to serve

153: queries without first being re-validated \cite{cohen01b}.  These occurrences are

154: called \emph{freshness misses}. Second, a node that

155: proactively pushes updates to interested neighbors reduces its load of

156: queries generated by those neighbors.  The cost of pushing the update

157: down is recovered  by the first query for the same item

158: following the update.

159: Third, the further down an update gets pushed, the shorter the

160: distance subsequent queries need to travel to reach a fresh cached

161: answer.  As a result, query response latency is reduced.  Finally,

162: updates can help prevent errors.  For example, an update to invalidate

163: an index entry prevents a node from answering queries using the entry

164: before it expires.

165:

166:

167: In CUP, nodes decide individually when to receive updates. A node only

168: receives updates for an item if the node has registered interest in

169: that item.  Furthermore, each node uses its own incentive-based policy to

170: determine when to cut off its incoming supply of updates for an

171: item.  This way the propagation of updates is controlled and does not

172: flood the network.

173:

174: Similarly, nodes decide individually when to propagate updates to

175: interested neighbors.  This is useful because a node may not always be

176: able or willing to forward updates to interested neighbors. In fact, a

177: node's ability or willingness to propagate updates may vary with its

178: workload.  CUP addresses this by introducing an adaptive mechanism

179: each node uses to regulate the rate of updates it propagates

180: downstream.  A salient feature of CUP is that even if a node's

181: capacity to push updates becomes zero, nodes dependent on the

182: node for updates fall back with no overhead to the case

183: of standard caching with expiration.

184:

185:

186: When compared with standard caching, under unfavorable conditions, CUP

187: reduces the average miss latency by as much as a factor of three.

188: Under favorable conditions, CUP reduces the average miss latency by

189: more than a factor of ten.  CUP overhead is more than compensated for

190: by its savings in cache misses.  In fact, the ``investment'' return

191: per update pushed in saved misses grows substantially with

192: increasing network size and query rates.  The cost of saved misses can

193: be one to two orders of magnitude higher than the cost of updates

194: pushed.

195:

196:

197: We demonstrate that the performance of CUP depends highly on the

198: policy a node uses to cut off its incoming updates. We find that the

199: cut-off policy should adapt to the node's query workload and we

200: present probabilistic and log based methods for doing so.  Finally, we

201: show that CUP continues to outperform standard caching even when

202: update propagation is reduced by either node capacity or network

203: conditions.

204:

205: The rest of the paper is organized as follows:

206: Section~\ref{Architecture} describes in detail the design of the CUP

207: architecture.  Section~\ref{Evaluation} describes the cost model we use to

208: evaluate CUP and presents experimental evidence of the benefits of CUP.

209: Section~\ref{RelatedWork} discusses related work and

210: Section~\ref{Conclusions} concludes the paper.

211:

212:

213:

214:

215: \section{CUP Architecture Design}

216: \label{Architecture}

217:

218: First, we provide some background terminology we use throughout the

219: paper and very briefly describe how peer-to-peer networks for which

220: CUP is appropriate perform their indexing and lookup operations.

221: Then we describe the components of the CUP protocol.

222:

223: \subsection{Background}

224:

225: The following terms will be useful for the remainder of the paper:

226:

227: \emph{Node}: This is a node in the peer-to-peer network.  Each node

228: periodically exchanges ``keep-alive'' messages with its neighbors to

229: confirm their existence and to trigger recovery mechanisms should one

230: of the neighbors fail.

231: Every node also maintains two logical channels (connections) for each neighbor:

232: the query channel and the update channel.  The query channel is used

233: by the node to push queries to its neighbor.  The update channel is

234: used by the node to push updates that are of interest to the neighbor.

235:

236:

237: \emph{Global Index}: The most important operation in a peer-to-peer

238: network is that of locating content.  As in \cite{ratnasamy01a} we

239: assume a hashing scheme that maps keys (names of content files or

240: keywords) onto a virtual coordinate space using a uniform hash

241: function that evenly distributes the keys to the space.  The

242: coordinate space serves as a global index that stores index entries

243: which are \emph{(key, value)} pairs.  The value in an index entry is a

244: pointer (typically an IP address) to the location of a replica that

245: stores the content file associated with the entry's key.  There can be

246: several index entries for the same key, one for each replica of the

247: content.

248:

249:

250: \emph{Authority Node}: Each node N in the peer-to-peer system is

251: dynamically allocated a subspace of the coordinate space (i.e., a

252: partition of the global index) and all index entries mapped into its

253: subspace are owned by N. We refer to N as the authority node of these

254: entries.  \emph{Replicas} of content whose key corresponds to an

255: authority node N send birth messages to N to announce they are willing

256: to serve the content.  Depending on the application supported,

257: replicas might periodically send refresh messages to indicate they are

258: still serving a piece of content.  They might also send deletion

259: messages that explicitly indicate they are no longer serving the

260: content.  These deletion messages trigger the authority node to delete

261: the corresponding index entry from its local index directory.

262:

263: \emph{Search Query}: A search query posted at a node N is a request to

264: locate a replica for key K.  The response to such a search query is a

265: set of index entries that point to replicas that serve the content

266: associated with K.

267:

268: \emph{Query Path for Key K}: This is the path a search query for key

269: \emph{K} takes.  Each hop on the query path is in the direction of

270: the authority node that owns \emph{K}.  If an intermediate node on this

271: path has fresh entries cached, the path ends at the intermediate

272: node; otherwise the path ends at the authority node.

273:

274: \emph{Reverse Query Path for Key K}: This path is the reverse of the

275: query path defined above.

276:

277:

278: \emph{Local index directory}: This is the subset of  global index

279: entries owned by a node.

280:

281: \emph{Cached index entries}: This is the set of index entries cached

282: by a node N in the process of passing up queries and propagating down

283: updates for keys for which N is not the authority.  The set of

284: cached index entries and the local index directory are disjoint sets.

285:

286: \emph{Lifetime of index entries}: We assume that each index entry

287: cached at a node has associated with it a lifetime and a timestamp

288: indicating the time at which the lifetime was set.  When the

289: difference between the current time and the timestamp is greater than

290: the lifetime field, the entry has expired and cannot be used to answer

291: queries.  An index entry is considered fresh until it expires.

292:

293:

294: \subsection{How Routing Works}

295:

296: We assume that anytime a node issues a query for key \emph{K}, the

297: query will be routed along a well-defined structured path with a

298: bounded number of hops from the querying node to the authority node

299: for \emph{K}.  The routing mechanism is designed so that each node on

300: the path hashes \emph{K} using the same hash function to deterministically choose

301: which of its neighbors will serve as the next hop.

302: Examples of peer-to-peer systems that provide this type of structured

303: location and routing mechanism include content-addressable networks

304: (CANs) \cite{ratnasamy01a}, Chord \cite{stoica01}, Pastry

305: \cite{rowstron01b} and Tapestry \cite{zhao01}.  CUP can be used in the

306: context of any of these systems.

307:

308:

309: \subsection{Node Bookkeeping}

310:

311: At each node, index entries are grouped together by key.  For each key

312: K, the node stores a flag that indicates whether the node is waiting

313: to receive an update for K for the first time and an interest bit

314: vector.  Each bit in the vector corresponds to a neighbor and is set

315: or clear depending on whether that neighbor is or is not interested in

316: receiving updates for K.

317:

318:

319: Each node tracks the popularity or request frequency of each non-local

320: key K for which it receives queries.  The popularity measure for a key

321: K can be the number of queries for K a node receives between arrivals

322: of consecutive updates for K or a rate of queries of a larger moving

323: window.  On an update arrival for K, a node uses its popularity

324: measure to re-evaluate whether it is beneficial to continue caching and

325: receiving updates for K.  We elaborate on this cut-off decision in

326: Section~\ref{CutOffPolicies}.

327:

328: Node bookkeeping in CUP involves no network overhead. With increasing

329: CPU speeds and memory sizes, this bookkeeping is negligible when we

330: consider the reduction in query latency achieved.

331:

332:

333: \subsection{Update Types}

334:

335: We classify updates into four categories: first-time updates, deletes,

336: refreshes, and appends.  Deletes, refreshes, and

337: appends originate from the replicas of a piece of content and are

338: directed toward the authority node that owns the index entries for

339: that content.

340:

341: First-time updates are query responses that travel down the reverse

342: query path.

343:

344: Deletes are directives to remove a cached index entry.  Deletes can be

345: triggered by two events:

346: 1) a replica sends a message indicating it no

347: longer serves a piece of content to the authority node that owns the

348: index entry pointing to that replica.

349: 2) the authority node notices a

350: replica has stopped sending ``keep-alive'' messages and assumes the

351: replica has failed.  In either case, the authority node deletes the

352: corresponding index entry from its local index directory and

353: propagates the delete to interested neighbors.

354:

355: Refreshes are keep-alive messages that extend the lifetimes of index

356: entries.

357: Refreshes that arrive at a cache do not result in errors as deletes

358: do, but help prevent freshness misses.

359:

360:

361:

362: Finally, appends are directives to add index entries for new replicas

363: of content.  These updates help alleviate the demand for content from the

364: existing set of replicas since they add to the number of replicas from

365: which clients can download content.

366:

367:

368:

369: % ------------------------------------------------------------------------

370: \subsection{Handling Queries}

371: \label{HowQueryingWorks}

372: % ------------------------------------------------------------------------

373:

374: Upon receipt of a query for a key \emph{K}, there are three basic

375: cases to consider.  In each of the cases, the node updates its

376: popularity measure for \emph{K}.  The node also sets the appropriate

377: bit in the interest bit vector for \emph{K} if the query originates

378: from a neighbor.  Otherwise, if the query is from a local client, the

379: node maintains the connection until it can return a fresh answer to the

380: client.

381: To simplify the protocol description we use the phrase

382: ``push the query'' to indicate that a node pushes a query upstream

383: toward the authority node.  We use the phrase ``push the update'' to

384: indicate that a node pushes an update downstream in the direction of

385: the reverse query path.

386:

387: \textbf{Case 1: Fresh Entries for key K are cached.}

388: The node uses its cached entries for \emph{K} to push the response

389: as a first-time update to the querying neighbor or local client.

390:

391: \textbf{Case 2: Key K is not in cache.}  The node adds \emph{K} to its

392: cache and marks it with a \emph{Pending-First-Update} flag.  The

393: purpose of the \emph{Pending-First-Update} flag is to coalesce bursts

394: of queries for the same key into one query.  A subsequent query for

395: \emph{K} from a neighbor or a local client will save the node from

396: pushing another instance of the query for \emph{K}.

397:

398:

399: \textbf{Case 3: All cached entries for key K have expired.}  The node

400: must obtain the fresh index entries for \emph{K}.  If the

401: \emph{Pending-First-Update} flag is set, the node does not need to

402: push the query; otherwise, the node sets the flag and pushes the

403: query.

404:

405: % ------------------------------------------------------------------------

406: \subsection{Handling Updates}

407: \label{UpdateArrivals}

408: % ------------------------------------------------------------------------

409:

410: A key feature of CUP is that a node does not forward an update for

411: \emph{K} to its neighbors unless those neighbors have registered

412: interest in \emph{K}.  Therefore, with some light bookkeeping, we

413: prevent unwanted updates from wasting network bandwidth.

414:

415: Upon receipt of an update for key \emph{K} there are three cases to

416: consider.

417:

418: \textbf{Case 1: Pending-First-Update flag is set.}  This means that

419: the update is a first-time update carrying a set of index entries in

420: response to a query.  The node stores the index entries in its cache,

421: clears the \emph{Pending-First-Update} flag, and pushes the update to

422: neighbors whose interest bits are set and to local client connections

423: open at the node.

424:

425:

426: \textbf{Case 2: Pending-First-Update flag is clear.}  If all the

427: interest bits for \emph{K} are clear, the node decides whether it

428: wants to continue receiving updates for \emph{K}.  The node bases its

429: decision on \emph{K}'s popularity measure.  Each node uses its own

430: policy for deciding whether the popularity of a key is high enough to

431: warrant receiving further updates for it.  If the node decides

432: \emph{K}'s popularity is too low, it pushes a \emph{Clear-Bit} control

433: message to the neighbor from whom it received the update.  The

434: \emph{Clear-Bit} message indicates that the neighbor's interest bit

435: for this node should be cleared.  Otherwise, if the popularity is high

436: or some interest bits are set, the node applies the update to its

437: cache and pushes the update to the neighbors whose bits are set.

438:

439: Note that a greedy or selfish node can choose not to push updates for

440: a key K to interested neighbors.  This forces downstream nodes to fall

441: back to standard caching for K.  However, by choosing to cut off

442: downstream propagation, a node runs the risk of receiving subsequent

443: queries from its neighbors.  Handling each of these queries is twice

444: the cost of propagating an update downward because the node has to

445: receive the query from the downstream neighbor and then push the

446: response as an update.  Therefore, although each node is free to stop

447: pushing updates at any time it is in its best interest to push updates

448: for which there are interested neighbors.

449:

450: \textbf{Case 3: Incoming update has expired.}  This could occur when

451: the network path has long delays and the update does not arrive in

452: time.  The node does not apply the update and does not push it downstream.

453:

454: \subsection{Handling Clear-Bit Messages}

455: \label{Clear-Bit-Messages}

456:

457: A \emph{Clear-Bit} control message is pushed by a node to indicate to

458: its neighbor that it is no longer interested in updates for a

459: particular key from that neighbor.

460:

461: When a node receives a \emph{Clear-Bit} message for key K, it clears

462: the interest bit for the neighbor from which the message was sent.  If

463: the node's popularity measure for K is low and all of its interest

464: bits are clear, the node also pushes a \emph{Clear-Bit} message for K.

465: This propagation of \emph{Clear-Bit} messages toward the authority

466: node for K continues until a node is reached where the popularity of K

467: is high or where at least one interest bit is set.

468:

469: \emph{Clear-Bit} messages can be piggy-backed onto queries or updates

470: intended for the neighbor, or if there are no pending queries or

471: updates, they can be pushed separately.

472:

473: \subsection{Adaptive Control of Update Push}

474: \label{ControllingUpdateArrivals}

475:

476: Ideally every node would propagate all updates to interested neighbors

477: to save itself from having to handle future downstream misses.

478: However, from time to time, nodes are likely to be limited in their

479: capacity to push updates downstream.  Therefore, we introduce an

480: adaptive control mechanism that a node uses to regulate the rate of

481: pushed updates.

482:

483: We assume each node N has a capacity U for pushing updates that varies

484: with N's workload, network bandwidth, and/or network connectivity.  N

485: divides U among its outgoing update channels such that each channel

486: gets a share that is proportional to the length of its queue.  This

487: allocation maintains the queues roughly equally sized.  The queues are

488: guaranteed to be bounded by the expiration times of the entries in the

489: queues.  So even if a node has its update channels completely shut

490: down for a long period, all entries will expire and be removed from

491: the queues.

492:

493:

494: Under a limited capacity and while updates are waiting in the queues,

495: each node can re-order the updates in its

496: outgoing update channels by pushing ahead updates that are likely to

497: have greater impact on query latency reduction, on query accuracy, or

498: on the load balancing of content demand across replicas.  During the

499: re-ordering any expired updates are eliminated.

500:

501: The strategy for re-ordering depends on the application.  For example,

502: in an application where query latency and accuracy are of the most

503: importance, one can push updates in the following order: first-time

504: updates, deletes, refreshes, and appends. In an application subject to

505: flash crowds that query for a particular item, appends might be given

506: higher priority over the other updates.  This would help distribute

507: the load faster across the entire set of replicas.

508:

509:

510: A node can also re-order refreshes and appends so that entries that

511: are closer to expiring are given higher priority.  Such entries are

512: more likely to cause freshness misses which in turn trigger a new query search.

513: So it is advantageous to try to catch this in time by pushing these first.

514:

515: \subsection{Node Arrivals and Departures}

516:

517: The peer-to-peer model assumes that participating nodes will

518: continuously join and leave the network.  CUP must be able to handle

519: both node arrivals and departures seamlessly.

520:

521: \textbf{Arrivals.}

522: When a new node N enters the peer-to-peer

523: network, it becomes the authority node for a portion of the index

524: entries owned by an existing node M.  N, M, and all surrounding

525: affected nodes (old neighbors of M) update the bookkeeping structures

526: they maintain for indexing and routing purposes.  To support CUP, the

527: issues at hand are updating the interest bit vectors of the affected

528: nodes and deciding what to do with the index entries stored at M.

529:

530: Depending on the indexing mechanism used, the cardinality of the bit

531: vectors of the affected nodes may change.  That is, bit vectors may

532: expand or contract as some nodes may now have more or fewer neighbors

533: than before N's arrival.  Since all nodes already need to track who

534: their neighbors are as part of the routing mechanism, updating the

535: cardinality of the interest bit vectors to reflect N's arrival is

536: straightforward.  For example, nodes that now have both N and M as

537: neighbors have to increase their bit vectors by one element to include

538: N.  The affected nodes also need to modify the mappings from bit ID to

539: neighbor IP address in their bit vectors.  For example, if a node that

540: previously had M as its neighbor now has N as its neighbor, the node

541: must make the bit ID that pointed to M now point to N.

542:

543: To deal with its stored index entries, M could simply not hand over

544: any of its entries to N.  This would cause entries at some of M's

545: previous neighbors to expire and subsequent queries from those nodes

546: will restart update propagations from N.  Alternatively, M could give

547: a copy of its stored index entries to N.  Both N and M would then go

548: through each entry and patch its bit vector.  This way nodes that

549: previously depended on M for updates of particular keys could continue

550: to receive updates from either M or N but not both.

551:

552: \textbf{Departures.}  Node departures can be either graceful (planned)

553: or ungraceful (due to sudden failure of a node).  In either case the

554: index mechanism in place dictates that a neighboring node M take over

555: the departing node N's portion of the global index.  To support CUP,

556: the interest bit vectors of all affected nodes must be patched to

557: reflect N's departure.

558:

559: If N leaves gracefully, N can choose not to hand over to M its index

560: entries.  Any entries at surrounding nodes that were dependent on N to

561: be updated will simply expire and subsequent queries will restart

562: update propagations.  Again, alternatively N may give M its set of

563: entries.  M must then merge its own set of index entries with N's, by

564: eliminating duplicate entries and patching the interest bit vectors as

565: necessary.  If N's departure is not planned, there can be no hand over

566: of entries and all entries in the affected neighboring nodes will

567: expire as in standard caching.

568:

569: Note that the transfer of entries can be coincided with the transfer

570: of information that is already occurring as part of the routing mechanism

571: in the peer-to-peer network, and therefore does not add extra network

572: overhead.  Also the bit vector patching is a local operation that affects

573: only each individual node.  Therefore even in cases where a node's

574: neighborhood changes often, the effect on the overall performance of

575: CUP is limited to that node's neighborhood (see section 3.7).

576:

577:

578:

579: \subsection{CUP Query/Update Trees}

580:

581: Figure~\ref{fig:CUPTrees} shows a snapshot of CUP in progress in a

582: network of seven nodes.  The left hand side of each node shows the set

583: of keys for which the node is the authority.  The right hand side

584: shows the set of keys for which the node has cached index entries as a

585: result of handling queries.  For example, node A owns

586: K3 and has cached entries for K1 and K5.

587:

588: For each key, the authority node that owns the key is the root of a

589: CUP tree.  The branches of the CUP tree are formed by the paths

590: traveled by queries from other nodes in the network.  For example, one

591: path in the tree rooted at A is \{F, D, C, A\}.

592:

593: Updates originate at the root (authority node) of a CUP tree and

594: travel downstream to interested nodes.  Queries travel upstream toward

595: the root.

596:

597: \begin{figure}[tb]

598: \centerline{\includegraphics[width=7cm, height=7cm]{7nodesA}}

599: \caption[]{\small CUP Trees}

600: \label{fig:CUPTrees}

601: \end{figure}

602:

603:

604: % ------------------------------------------------------------------------

605: \section{Evaluation}

606: \label{Evaluation}

607: % ------------------------------------------------------------------------

608:

609: The goal of CUP is to extend the benefits of standard caching based on

610: expiration times.  There are two key performance questions to address.

611: First, by how much does CUP reduce the average cost per query?

612: Second, how much overhead does CUP incur in providing this reduction?

613:

614: We first present the cost model based on economic incentive used by

615: each node to determine when to cut off the propagation of updates for a

616: particular key.  We give a simple analysis of how the cost per query

617: is reduced (or eliminated) through CUP.  We then describe our

618: experimental results comparing the performance of CUP with that of

619: standard caching.

620:

621: \subsection{Cost Model}

622: \label{CostModel}

623:

624: Consider an authority node A that owns key K and consider the tree

625: generated by issuing a query for K from every node in the peer-to-peer

626: network.  The resulting tree, rooted at A, is the \emph{Virtual Query

627: Spanning Tree} for K, V(A,K), and contains all possible query paths

628: for K.  The \emph{Real Query Tree} for K, R(A,K) is a subtree of

629: V(A,K) also rooted at A and contains all paths generated by real

630: queries.  The exact structure of R(A,K) depends on the actual workload

631: of queries for K.  The entire workload of queries for all keys results

632: in a collection of criss-crossing Real Query Trees with overlapping

633: branches.

634:

635: We first consider the case of standard caching at the intermediate

636: nodes along the query path for key K.  Consider a node N that is at

637: distance D from A in V(A,K).  We define the cost per query for K at N

638: as the number of hops in the peer-to-peer network that must be

639: traversed to return an answer to N.  When a query for K is posted at N

640: for the first time, it travels toward A looking for the response.  If

641: none of the nodes between N and A have a fresh response cached, the

642: cost of the query at N is \(2 D\): D hops to reach A and D hops for

643: the response to travel down the reverse query path as a first-time

644: update.  If there is a node on the query path with a fresh answer

645: cached, the cost is less than \(2 D \).  Subsequent queries for K at N

646: that occur within the lifetime of the entries now cached at N have a

647: cost of zero.  As a result, caching at intermediate nodes has the

648: benefits of balancing query load for K across multiple nodes and

649: lowering average latency per query.

650:

651: We can gauge the performance of CUP by calculating the percentage of

652: updates CUP propagates that are ``justified''.  We precisely define

653: what a justified update is below, but simply put, an update is

654: justified if it recovers the overhead it incurs, i.e., if its cost is

655: recovered by a subsequent query.  An unjustified update is therefore

656: overhead that is not recovered (wasted).  Updates for popular keys are

657: likely to be justified more often than updates for less popular keys.

658:

659: A refresh update is justified if a query arrives sometime

660: between the previous expiration of the cached entry and the new

661: expiration time supplied by the refresh update.  An append

662: update is justified if at least one query arrives between

663: the time the append is performed and the time of its expiration.

664:

665: A first-time update is always justified because it carries a query's

666: response toward the node that originally issues the query.  A deletion

667: update is considered justified if at least one query arrives between

668: the time the deletion is performed and the expiration time of the

669: entry to be deleted.

670:

671: For each update, let \(T\) be the critical time interval described

672: above during which a query must arrive in order for the update to be

673: justified.  (For first-time updates \(T = \infty\)).  Consider a node

674: N at distance D from A  in R(A,K).  An update propagated down to N is justified

675: if at least one query Q is posted within \(T\) time units at any of the

676: nodes of the virtual subtree V(N,K).  Note that an update  is justified

677: if Q arrives at the virtual tree V(N,K), \emph{not} the real query

678: tree R(N,K) because Q can be posted anywhere in V(N,K).

679:

680: Given the distribution of query arrivals at each node in the tree

681: V(N,K), we can find the probability that the update at N is justified

682: by calculating the probability that a query will arrive at some node

683: in V(N,K).  Assume that queries for K arrive at each node \(N_i\) in

684: V(N,K) according to a Poisson process with parameter \(\lambda_i\).

685: Then it follows that queries for K arrive at V(N,K) according to a

686: Poisson process with parameter \(\Lambda\) equal to the sum of all

687: \(\lambda's\).  Therefore, the probability that a query for K will

688: arrive within \(T\) time units is \(1 - e^{-\Lambda T}\) and  equals

689: the probability that the update pushed to N is justified.

690: The closer to the authority N is, the higher the \(\Lambda\)

691: and thus the higher the probability for an update pushed to N to be justified.

692: For \(\Lambda=1\) query arrival per second and  \(T=6\)

693: seconds, the probability that an update arriving at N is justified

694: is 99 percent.

695:

696: The benefit of a justified update goes beyond recovery of its cost.

697: For each hop an update is pushed down, exactly one hop is saved since

698: without the propagation, a subsequent query arriving within \(T\) time

699: units would have to travel one hop up and one hop down.  This halves

700: the number of hops traveled which reduces query response latency, and

701: at the same time provides enough benefit margin for more

702: aggressive CUP strategies.

703: For example, a more aggressive strategy would be to push some updates

704: even if they are not justified.  As long as the number of justified

705: updates is at least fifty percent the total number of updates pushed,

706: the overall update overhead is completely recovered.  If the

707: percentage of justified updates is less than fifty percent, then the

708: overhead will not be fully recovered but query latency will be further

709: reduced.  Therefore, if network load is not the prime concern, an

710: ``all-out'' push strategy achieves minimum latency.

711:

712:

713: \subsection{Experiments}

714: \label{Experiments}

715:

716: One of the challenges in evaluating this work is the unavailability of

717: real data traces of completely decentralized peer-to-peer networks

718: such as those assumed by CUP.  The reason for this is that such

719: systems \cite{ratnasamy01a, rowstron01b, stoica01, zhao01} are not yet

720: in widespread use to make collecting traces feasible.

721: Therefore, in the evaluation of CUP we choose simulation parameters

722: that range from unfavorable to favorable conditions for CUP in order

723: to show the spectrum of performance and how it compares to standard

724: caching under the same conditions.  For example, low query rates do

725: not favor CUP because updates are less likely to be justified since

726: there may not be enough subsequent queries to recover the cost of the

727: updates.  On the other hand, queries for keys that become suddenly hot

728: not only justify the propagation overhead, but also enjoy a significant

729: reduction in latency.

730:

731: For our experiments, we simulated a two-dimensional ``bare-bones''

732: content-addressable network (CAN) \cite{ratnasamy01a} using the

733: Stanford Narses simulator \cite{maniatis01}.  The simulation takes as

734: input the number of nodes in the overlay peer-to-peer network, the

735: number of keys owned per node, the distribution of queries for keys,

736: the distribution of query inter-arrival times, the number of replicas

737: per key, and the lifetime of replicas in the system.

738: We ran experiments for n = \(2^k\) nodes where k ranged from 3 to 12.

739: Simulation time was 22000 seconds, with 3000 seconds of querying time.

740: We present results for experiments with replica lifetime of 300

741: seconds to reflect the dynamic nature of peer-to-peer networks where

742: nodes might only serve content for short periods of time.  For all

743: experiments, refreshes of index entries occur at expiration.  Query

744: arrivals were generated according to a Poisson process.

745: Nodes were randomly selected to post the queries.

746:

747: We present five experiments.  First we compare the performance and

748: overhead of CUP against standard caching where CUP propagates updates

749: without any concern for whether the updates are justified.  In this

750: experiment, we vary the level in the CUP tree to which updates are

751: propagated.  We use this experiment to establish the level that

752: provides the maximum benefit and then use the performance results at

753: this level as a benchmark for comparison in later experiments.

754: Second, we compare the effect on CUP performance of different

755: incentive-based cut-off policies and compare the performance of these

756: policies to that of the benchmark.  Third, using the best cut-off

757: policy of the previous experiment, we study how CUP performs as we

758: vary the size of the network.  Fourth, we study the effect on

759: performance of increasing the number of replicas corresponding to a

760: key.  Finally, we study the effect of limiting the outgoing update

761: capacities of nodes.

762:

763: \subsection{Varying the CUP Push Level}

764:

765: In this set of experiments we compare standard caching with a version

766: of CUP that propagates updates down the Real Query Tree of a key

767: regardless of whether or not the updates are justified.  We use this

768: information to determine a maximal performance baseline.  We determine

769: the reduction in misses achieved by CUP and the overhead CUP incurs to

770: achieve this reduction.  We define \emph{miss cost} as the total

771: number of hops incurred by all misses, i.e.  freshness and first-time

772: misses. We define the CUP overhead as the total number of hops

773: traveled by all updates sent downstream plus the total number of hops

774: traveled by all clear-bit messages upstream. (We assume clear-bit

775: messages are not piggybacked onto updates.  This somewhat inflates the

776: overhead measure.)  We define \emph{total cost} as the sum of the

777: \emph{miss cost} and any overhead hops incurred.  Note that in

778: standard caching, the \emph{total cost} is equal to the \emph{miss

779: cost}.

780:

781:

782: Figures~\ref{fig:PushLevQ1,10} and \ref{fig:PushLevQ100,1000} plot

783: CUP's total cost and miss cost versus the push level for a network of

784: \(2^{10}\) nodes.  A push level of \(p\) means that updates are

785: propagated to all nodes that have queried for the key and that are at

786: most \(p\) hops from the authority node. A push level of \(0\)

787: corresponds to the case of standard caching, since all updates from

788: the authority node (the root of the CUP tree) are immediately

789: squelched and not forwarded on.  For this set of experiments, query

790: arrivals were generated according to a Poisson process with average

791: rate \(\lambda\) of 1, 10, 100, and 1000 queries per second at the

792: network.

793:

794:

795: The figures show that as the push level increases, CUP significantly

796: reduces the miss cost when compared with standard caching and does

797: so with little overhead as shown by the displacement of each pair

798: of curves.

799:

800: In Figure~\ref{fig:PushLevQ1,10}, for $\lambda = 1$ query per second,

801: the total cost incurred by CUP decreases and reaches a minimum at

802: around push level 20, after which it slightly increases. This

803: turning point is the level beyond which the overhead cost of updates is

804: not recoverable.

805: For $\lambda = 10$ queries per second, a similar turning

806: point occurs at around push level 25.  In

807: Figure~\ref{fig:PushLevQ100,1000} the minimum total cost occurs at

808: push level 25 and tapers off for both $\lambda =100$ and $\lambda

809: =1000$ queries per second. For low query arrival rates, the turning

810: point occurs at lower push levels.  For example, for $\lambda = 0.01$

811: queries per second, the turning point occurs at push level 15.  These

812: results show that there is no specific optimal push level at which CUP

813: achieves the minimum total cost across all workloads.  If there were,

814: then the simplest strategy for CUP would be to have updates be

815: propagated to that optimal push level.  In fact, we have found that in

816: addition to the query workload, the optimal push level is affected by

817: the number of nodes in the network and the rate at which updates are

818: generated, both of which change dynamically.

819:

820: In the absence of an optimal push level, each node needs a policy for

821: determining when to stop receiving updates.  We next examine some

822: cut-off policies.

823:

824: \begin{figure}[tb]

825: \centerline{\includegraphics[width=9cm]{Pop4TotalCostVersusPushLevQ1,10.eps}}

826: \caption{\small Total cost and miss cost versus push level. }

827: \label{fig:PushLevQ1,10}

828: \end{figure}

829:

830: \begin{figure}[tb]

831: \centerline{\includegraphics[width=9cm]{Pop4TotalCostVersusPushLevQ100,1000.eps}}

832: \caption{\small Total cost and miss cost versus push level. The

833: y-axis is log scale.}

834: \label{fig:PushLevQ100,1000}

835: \end{figure}

836:

837:

838:

839: \subsection{Varying the Cut-Off Policies}

840: \label{CutOffPolicies}

841:

842: On receiving an update for a key, each node determines

843: whether or not there is incentive to continue receiving updates

844: or to cut off updates by pushing up a clear-bit message.  We base the

845: incentive on the \emph{popularity} of the key at the node.

846: The more popular a key is, the more incentive there is to receive updates

847: for that key.  For a key K, the popularity is the number of queries a

848: node has received for K since the last update for K arrived at the

849: node.

850:

851: We examine two types of thresholds against which to test a key's

852: popularity when making the cut-off decision: probability-based and

853: log-based.

854:

855: A probability-based threshold uses the distance of a node N from the

856: authority node A to approximate the probability that an update pushed

857: to N is justified.  Per our cost model of section 3.2, the further N

858: is from A, the less likely an update at N will be justified.

859: We examine two such thresholds, a linear one and a logarithmic one.  With a

860: linear threshold, if an update for key K arrives at a node at distance

861: $D$ and the node has received at least $\alpha D$ queries for K since

862: the last update, then K is considered popular and the node continues

863: to receive updates for K.  Otherwise, the node cuts off its intake of

864: updates for K by pushing up a clear-bit message.  The logarithmic

865: popularity threshold is similar.  A key K is popular if the node has

866: received $\alpha \lg(D)$ queries since the last update.  The

867: logarithmic threshold is more lenient than the linear  in that it

868: increases at a slower rate as we move away from the root.

869:

870: A log-based threshold is one that is based on the recent history of

871: the last \emph{n} update arrivals at the node.

872: If within \emph{n} updates, the node has not received

873: any queries, then the key is not popular and the node pushes up a clear-bit message.  A specific example of a

874: log-based policy is the second-chance policy.  In this policy, $n=3$.

875: When an update arrives, if no queries have arrived since the last

876: update, the policy gives the key a ``second chance'' and does not push

877: a clear-bit message immediately.  If at the next update arrival the

878: node has still not received any queries for K, then it pushes a

879: clear-bit message.  The philosophy behind this policy is that pushing

880: these two updates down from the parent node costs two hops.  If a

881: query arrives in the interval between these two updates,

882: then it will recover the cost of

883: pushing them down, since a query miss would incur

884: two hops, one up and one down.

885:

886: \begin{table*}

887: \caption[]{\small Total cost for varying cut-off policies.}

888: \label{tab:policies2}

889: \begin{center}

890: \begin{tabular}{|l|r|r|r|r|} \hline

891: \textsf{\textbf{Policy}} & \textsf{\textbf{1 q/s Total Cost}}

892: & \textsf{\textbf{10 q/s Total Cost}} & \textsf{\textbf{100 q/s Total Cost}}

893:  & \textsf{\textbf{1000 q/s Total Cost}} \\\hline

894: Standard Caching & 55905 (1.00) & 125288 (1.00) & 399669 (1.00) & 1967452 (1.00) \\ \hline

895: %Linear, $\alpha = 0.5$ & 69964 (1.25) & 127198 (1.02)& 69555 (0.17)& 191859 (0.10)\\ \hline

896: Linear, $\alpha = 0.25$ & 61778 (1.11)& 65668 (0.52)& 61521 (0.15)& 188772 (0.10)\\ \hline

897: Linear, $\alpha = 0.10$ & 39707 (0.71)& 44259 (0.35)& 56170 (0.14)& 184907 (0.09)\\ \hline

898: Linear, $\alpha = 0.01$ & 28234 (0.51)& 37396 (0.30)& 53613 (0.13)& 182198 (0.09)\\ \hline

899: Linear, $\alpha = 0.001$ & 28234 (0.51)& 37396 (0.30)& 53613 (0.13)& 182198 (0.09)\\ \hline

900: Logarithmic, $\alpha = 0.5$ & 30415 (0.54)& 39325 (0.31) & 54816 (0.14)& 183476 (0.09)\\ \hline

901: Logarithmic, $\alpha = 0.25$ & 28165 (0.50)& 37392 (0.30) & 53613 (0.13) & 182183 (0.09)\\ \hline

902: Logarithmic, $\alpha = 0.10$ & 28165 (0.50)& 37392 (0.30) & 53613 (0.13) & 182183 (0.09)\\ \hline

903: Logarithmic, $\alpha = 0.01$ & 28165 (0.50)& 37392 (0.30) & 53613 (0.13) & 182183 (0.09)\\ \hline

904: Second-chance & 15183 (0.27) & 24886 (0.20) & 46385 (0.12) & 177755 (0.09)\\ \hline

905: Optimal push level & 13916 (0.25) & 21805 (0.17) & 44062 (0.11) & 175914 (0.09) \\ \hline

906: \end{tabular}

907: \end{center}

908: \end{table*}

909:

910: Table~\ref{tab:policies2} compares the total cost of standard caching

911: with that of the linear and logarithmic policies for various $\alpha$

912: values, the second chance policy, and that of the optimal push level.

913: The experiment is for a network of \(2^{10}\) nodes and \(\lambda\)

914: rates of 1, 10, 100 and 1000 queries per second.  In each table entry,

915: the first number is the total cost and the number in the parentheses

916: is the total cost normalized by the total cost for standard caching.

917:

918: For the lower query rates, the performance of the linear and the

919: logarithmic policies is greatly affected by the choice of parameter

920: $\alpha$.  In some cases, the total cost of the linear policy exceeds

921: that of standard caching.  For both the linear and logarithmic

922: policies, the total cost decreases with $\alpha$ until it reaches a

923: minimum that cannot be reduced any further. For both $\lambda = 1$ and

924: $\lambda =10$, this minimum is 1.5 to almost two times higher than that of

925: the second chance policy.  For higher query rates, the performance of

926: the linear and logarithmic policies is less affected by the choice of

927: $\alpha$.

928: These results show that choosing a priori an $\alpha$ value for the

929: linear and logarithmic policies that will perform well across all

930: workloads is difficult.

931:

932: The log-based second-chance policy consistently outperforms both

933: probability-based policies and achieves a total cost very near the

934: minimum total cost.  This is because, unlike the probability-based

935: policies that depend on a function of the node's network distance from

936: the root node, the second-chance policy adapts to the timing of the

937: queries within the workload and thus accounts for shifts in key

938: popularity, which is independent of the distance of the node.  In all

939: remaining experiments, we use second-chance as the cut-off policy.

940:

941:

942: \subsection{Varying the Network Size}

943: \label{NetworkSize}

944:

945: In this section we show that CUP performs well as we vary the size

946: of the network.

947:

948: \begin{table*}

949: \caption[]{\small Comparison of CUP with standard caching for varying

950: numbers of nodes n = $2^k$ for k between 3 and 12.}

951: \label{tab:comparison2}

952: \begin{center}

953: \begin{tabular}{|l|r|r|r|r|r|r|r|r|r|r|r|} \hline

954: \textsf{\textbf{Number of Nodes}} & \textsf{\textbf{8}} & \textsf{\textbf{16}}

955:   & \textsf{\textbf{32}} & \textsf{\textbf{64}}  & \textsf{\textbf{128}}

956:   & \textsf{\textbf{256}} & \textsf{\textbf{512}}  & \textsf{\textbf{1024}}

957:   & \textsf{\textbf{2048}} & \textsf{\textbf{4096}} \\\hline

958: CUP / STD Caching Miss Cost & 0.47  & 0.41  & 0.36  & 0.20 & 0.19 & 0.23 & 0.17 & 0.15 & 0.15 & 0.15   \\\hline

959: CUP miss latency &2.3 &2.7 &3.0 &3.0 &3.2 &4.0 & 3.9 & 3.9  & 5.5 & 7.3 \\\hline

960: STD Caching miss latency &2.8 &3.0 &3.5 &4.4 &5.1 & 5.4 & 7.7 & 9.4 & 13.0 & 19.1 \\\hline

961: Saved miss hops per CUP overhead hop & 0.77 & 1.05  & 1.47  & 2.70 & 3.44 & 3.0 & 5.57 & 7.05 & 9.13 & 13.2 \\\hline

962: \end{tabular}

963: \end{center}

964: \end{table*}

965:

966:

967:

968: Table~\ref{tab:comparison2} compares CUP and standard caching for

969: varying numbers of nodes using three metrics.  We use a $\lambda$ rate

970: of one query per second.  The first row shows the CUP miss cost as a

971: fraction of the standard caching miss cost.  The second and third rows

972: show the query latency measured by average number of hops needed to

973: handle a miss for CUP and standard caching respectively.  As can be

974: observed, CUP reduces latency respectively by 5.5, 7.5, and 11.8 hops

975: per miss for the 1024, 2048, and 4096 node networks.  This is a

976: substantial reduction (as much as a factor of three) in query response

977: time in peer-to-peer networks.

978:

979: The fourth row in Table~\ref{tab:comparison2} shows the ``investment''

980: return per update push performed by CUP.  This is computed as the

981: overall ratio of saved miss cost to overhead incurred by CUP:

982:

983: \small

984: \[

985: \label{eq:MissCostSaved}

986: \begin{split}

987: {S}&{avedMissOverheadRatio} \\ % \qquad \qquad \qquad \qquad \qquad \\

988: & = \frac {MissCost_{StandardCaching} - MissCost_{CUP}} {OverheadCost_{CUP}}

989: \end{split}

990: \] %\end{equation}

991:

992: \normalsize For the last three network sizes, the return is

993: respectively 7.05, 9.13, and 13.2 which are remarkable values for a

994: fully recoverable overhead investment.  The return increases with the

995: network size, thus CUP is more amenable to larger networks.

996:

997: Note that this table was generated using a very low query arrival

998: rate.  As a point of comparison, for a network of 1024 nodes and

999: \(\lambda = 1000\) queries per second, the CUP miss cost is 0.09 that

1000: of standard caching, the average CUP miss latency, 2.4 hops, is over

1001: ten times less than that for standard caching (25.1 hops), and the

1002: ratio of saved miss cost to update cost is 168 to 1.  This

1003: demonstrates that CUP is indeed amenable to higher query arrival

1004: rates.

1005:

1006: \subsection{Multiple Replicas per Key}

1007:

1008: In this section we examine the effect on CUP performance of having

1009: multiple replicas per key and propagating updates from each of these

1010: replicas.

1011:

1012: A node that is receiving updates for key K will likely have

1013: multiple index entries cached for K.  At first glance it seems

1014: that the more replicas there are in the system, the fewer the

1015: freshness misses for K the node should experience.

1016:

1017: It turns out this will occur only if the cut-off policy is independent

1018: of the number of replicas in the network.  A naive implementation of a

1019: cut-off policy applies its decision at \emph{every} update arrival for

1020: K.  Since the rate of update arrivals for K increases with the number

1021: of replicas in the system, the chances that the node will receive

1022: enough queries to pass its cut-off test are lower, and the naive

1023: implementation mistakenly concludes that there is not enough incentive

1024: to continue receiving updates.  It therefore pushes up a clear-bit

1025: message earlier than necessary.

1026:

1027: Table~\ref{tab:replicas} illustrates this problem for a network of 1024

1028: nodes using the second-chance policy and a \(\lambda\) rate of 1 query

1029: per second.  Column 2 in the table shows the miss cost and, in

1030: parentheses, the absolute number of misses incurred by the naive

1031: implementation.  In this case, adding more replicas has the opposite

1032: effect of what we expected.  More replicas can mean more misses.  In

1033: fact, the single replica run exhibits the smallest miss cost and

1034: absolute number of misses.

1035:

1036: A solution to this problem requires that the cut-off decision be

1037: independent of the number of replicas in the network.  One way to

1038: achieve this is to trigger the decision only when updates for a

1039: particular replica arrive, and to reset the popularity measure only at

1040: those times.  This ensures that the popularity measure remains the same

1041: across updates for other replicas for the same key.  Column 3 of

1042: Table~\ref{tab:replicas} shows that with this fix, the miss cost and

1043: number of misses do decrease as the number of replicas increases.

1044:

1045:

1046: \begin{table*}

1047: \caption[]{\small Miss cost, absolute number of misses, and total cost

1048: for varying number of replicas }

1049: \label{tab:replicas}

1050: \begin{center}

1051: \begin{tabular}{|l|r|r|r|} \hline

1052: \textsf{\textbf{ }} & \textsf{\textbf{Naive Cut-Off }} & \textsf{\textbf{Replica-independent Cut-Off}} & \textsf{\textbf{Replica-independent Cut-Off}} \\

1053: \textsf{\textbf{Replicas}} & \textsf{\textbf{Miss Cost \& (Misses)}}

1054: & \textsf{\textbf{Miss Cost \& (Misses)}} & \textsf{\textbf{Total Cost}} \\\hline

1055: %500 & 62249 (4171) & 7562 (502) & 3033595 \\ \hline

1056: 100 & 58068 (4493) & 7562 (502) & 608998 \\ \hline

1057: 50 & 59261  (4522) & 7562 (502) & 310301 \\ \hline

1058: 10 & 44079  (4296) & 7565 (504) & 69086 \\ \hline

1059: 5 & 24406   (2936) & 7565 (504) & 39615 \\ \hline

1060: 2 & 11614   (1366) & 7607 (522) & 21050 \\ \hline

1061: 1 & 8460    (1206) & 8460 (1206) & 15183  \\ \hline

1062: \end{tabular}

1063: \end{center}

1064: \end{table*}

1065:

1066: The last column of Table~\ref{tab:replicas} shows the total cost when

1067: each replica refresh is sent as a separate update.  When compared to

1068: the 55905 hops of total cost for standard caching from

1069: Table~\ref{tab:policies2}, we observe that the total cost of CUP will

1070: eventually overtake that of standard caching as we increase the number

1071: of replicas.  In fact this occurs at eight replicas where the total

1072: cost is 57430.  While these results may seem to imply that a handful

1073: of replicas is enough for good CUP performance, for some applications,

1074: having many more replicas in the network is necessary even if they run

1075: the risk of unrecoverable additional CUP overhead.  For example,

1076: having multiple replicas of content helps to balance the demand for

1077: that content across many nodes and reduces latency.

1078:

1079: One may view pushing updates for multiple replicas as an

1080: example of an aggressive CUP policy we refer to in

1081: Section~\ref{CostModel}. At 100 replicas, the total cost is about 10

1082: times that of standard caching.  CUP pays the price of extra overhead

1083: but achieves a miss cost that is about 13.5 percent that of standard

1084: caching.  Therefore, at the cost of extra network load, both query

1085: latency is reduced and the demand for content is balanced across a

1086: greater number of nodes.

1087:

1088: If however network load is a concern, there are a couple of techniques an

1089: authority node can use to reduce the overhead of CUP when there are

1090: many replicas in the network.

1091: First, rather than push all replica refreshes,

1092: the authority node can selectively choose to propagate a subset of the

1093: replica refreshes and suppress others.  This allows the authority node

1094: to reduce update overhead as well as balance demand for content across

1095: the replicas.  Another alternative would be to aggregate replica

1096: refreshes.  When a refresh arrives for one replica, the authority node

1097: waits a threshold amount of time for other updates for the same key to

1098: arrive.  It then batches all updates that arrive within that time and

1099: propagates them together as one update.  This threshold would be a

1100: function of the lifetime of a replica and could be dynamically

1101: adjusted with the number of replicas in the system.  We are

1102: experimenting with different kinds of threshold functions.

1103:

1104:

1105: \subsection{Varying Outgoing Update Capacity}

1106: Our experiments thus far show that CUP clearly outperforms standard

1107: caching under conditions where all nodes have full outgoing capacity.

1108: A node with full outgoing capacity is a node that can and does

1109: propagate all updates for which there are interested neighbors.  In

1110: reality, an individual node's outgoing capacity will vary with its

1111: workload, network connectivity, and willingness to propagate updates.

1112: In this section we study the effect on CUP performance of reducing the

1113: outgoing update capacity of nodes.

1114:

1115:

1116: We present two experiments run on a network of 1024 nodes.  In the

1117: first experiment, called "Up-And-Down", after a five minute warm up

1118: period, we randomly select twenty percent of the nodes to reduce their

1119: capacity to a fraction of their full capacity.  These nodes operate at

1120: reduced capacity for ten minutes after which they return to full

1121: capacity.  After another five minutes for stabilization, we randomly

1122: select another set of nodes and reduce their capacity for ten minutes.

1123: We proceed this way for the entire 3000 seconds during which queries

1124: are posted, so capacity loss occurs three times during the simulation.

1125: In the second experiment, called "Once-Down-Always-Down", after the

1126: initial five minute warmup period, the randomly selected nodes reduce

1127: and remain at reduced capacity for the remainder of the experiment.

1128:

1129: Figure~\ref{fig:capacityQ1} shows the total cost incurred by CUP

1130: versus reduced capacity $c$ for both Up-And-Down and

1131: Once-Down-Always-Down configurations.  A reduced capacity $c = .25$

1132: means a node is only pushing out one-fourth the updates it receives.

1133: The figure also shows the total cost for standard caching as a

1134: horizontal line for comparison.  The $\lambda$ rate is 1 query per

1135: second.  Figure \ref{fig:capacityQ1000} shows the same for $\lambda$=

1136: 1000 which is especially interesting because CUP has bigger wins with

1137: higher query rates since more updates are justified than with lower

1138: query rates.  Therefore, with high query rates CUP has more to lose if

1139: updates do not get propagated.

1140:

1141: Note that even when the capacity of one fifth of the nodes

1142: is reduced to zero percent  and these nodes do not propagate updates,

1143: CUP outperforms standard

1144: caching for both query rates.  The total cost incurred by CUP is about

1145: half that of standard caching for one  query per second for both

1146: configurations.  For 1000 queries per second, the total cost of CUP is

1147: 0.56 and 0.77 that of standard caching for "Up-And-Down" and

1148: "Once-Down-Always-Down" respectively.

1149:

1150: A key observation from these experiments is that CUP's performance

1151: degrades gracefully as $c$ decreases.  This is because the reduction

1152: in propagation saves any

1153: overhead that would have occurred

1154: otherwise.  The important point here is that even if nodes can only

1155: push out a fraction of updates to interested neighbors, CUP still

1156: extends the benefits of standard caching.  Clearly though, CUP

1157: achieves its full potential when all nodes have maximum propagation

1158: capacity.

1159:

1160: \begin{figure}[tb]

1161: \centerline{\includegraphics[width=9cm]{UpAndDown,OnceDownAlwaysDownTotalCost,1.eps}}

1162: \caption{\small Total cost versus reduced capacity.}

1163: \label{fig:capacityQ1}

1164: \end{figure}

1165:

1166: \begin{figure}[tb]

1167: \centerline{\includegraphics[width=9cm]{UpAndDown,OnceDownAlwaysDownTotalCost,1000.eps}}

1168: \caption{\small Total cost versus reduced capacity. The y-axis is

1169: log scale.}

1170: \label{fig:capacityQ1000}

1171: \end{figure}

1172:

1173: % ------------------------------------------------------------------------

1174: \section{Related Work}

1175: \label{RelatedWork}

1176: % ------------------------------------------------------------------------

1177:

1178: Some peer-to-peer systems suffer from what

1179: we call the ``open-connection'' problem.  Every time a peer node

1180: receives a query for which it does not have an answer cached, it asks

1181: one (e.g., Freenet \cite{clarke00}) or more (e.g., Gnutella \cite{gnutella})

1182: neighbors the same query by

1183: opening a connection and forwarding the query through that connection.

1184: The node keeps the connection open until the answer is returned through it.

1185: For every query on every item for which the node does not have a cached

1186: answer, the connection is maintained until the answer comes back.

1187: This results in excessive overhead for the node because it must

1188: maintain the state of many open connections.  CUP avoids this overhead

1189: by asynchronously pushing responses as first-time updates and by

1190: coalescing queries for the same item into one query.

1191:

1192: Chord \cite{stoica01} and CFS \cite{dabek01} suggest alternatives to

1193: making the query response travel down the reverse query path back to

1194: the query issuer.  Chord suggests iterative searches where the query

1195: issuer contacts each node on the query path one-by-one for the item of

1196: interest until one of the nodes is found to have the item.  CFS

1197: suggests that the query be forwarded from node to node until a node is

1198: found to have the item.  This node then directly sends the query

1199: response back to the issuer.  Both of these approaches help avoid some

1200: of the long latencies that may occur as the query response traverses

1201: the reverse query path.

1202: CUP is advantageous regardless of whether the

1203: response is delivered directly to the issuer or through the reverse

1204: query path.  However, to make this work for direct response delivery,

1205: CUP must not coalesce queries for the same item at a node into one

1206: query since each query would need to explicitly carry the return

1207: address information of the query issuer.

1208:

1209: All of the above systems (Gnutella, Freenet, Chord, and CFS) enable

1210: caching at the nodes along the query path.  They do not focus on how

1211: to maintain entries once they have been cached.  Cached items are

1212: removed when they expire and refetched on subsequent queries.  For

1213: very popular items this can lead to higher average response time since

1214: subsequent bursts of queries must wait for the response to travel up

1215: and (possibly) down the query path.  CUP can avoid this problem by

1216: refreshing or updating cached items for which there is interest before

1217: they expire.

1218:

1219: Consistent hashing work by Karger et al. \cite{karger97} looks at

1220: relieving hot spots at origin web servers by caching at intermediate

1221: caches between client caches and origin servers.  Requests for items

1222: originate at the leaf clients of a conceptual tree and travel up

1223: through intermediate caches toward the origin server at the root of the

1224: tree.  This work uses a model slightly different from the

1225: peer-to-peer model.  Their model and analysis assume

1226: requests are made only at leaf clients and that intermediate caches do

1227: not store an item until it has been requested some threshold number of

1228: times.  Also, this work does not focus on maintaining cache freshness.

1229:

1230: Update propagations in CUP form trees very similar to the application-level

1231: multicast trees built by Scribe \cite{rowstron01a}. Scribe

1232: is a publish-subscribe infrastructure built on top

1233: of Pastry \cite{rowstron01b}.  Scribe creates a multicast tree rooted

1234: at the rendez-vous point of each multicast group.

1235: Publishers send a message to the rendez-vous point which then

1236: transmits the message to the entire group by sending it down the

1237: multicast tree.

1238: The multicast tree is formed by joining the Pastry routes from each

1239: subscriber node to the rendez-vous point.  Scribe could apply the

1240: ideas CUP introduces to provide update propagation for cache

1241: maintenance in Pastry.

1242:

1243: Cohen and Kaplan study the effect that aging through cascaded caches

1244: has on the miss rates of web client caches \cite{cohen01a}. For each

1245: object an intermediate cache refreshes its copy of the object when its

1246: age exceeds a fraction \emph{v} of the lifetime duration.  The

1247: intermediate cache does not push this refresh to the client; instead,

1248: the client waits until its own copy has expired at which point it

1249: fetches the intermediate cache's copy with the remaining lifetime.

1250: For some sequences of requests at the client cache and some

1251: \emph{v}'s, the client cache can suffer from a higher miss rate than

1252: if the intermediate cache only refreshed on expiration.  Their model

1253: assumes zero communication delay. A CUP tree could be viewed as a

1254: series of cascaded caches in that each node depends on the previous

1255: node in the tree for updates to an index entry.  The key difference is

1256: that in CUP, refreshes are pushed down the entire tree of interested

1257: nodes.  Therefore, barring communication delays, whenever a parent

1258: cache gets a refresh so does the interested child node.  In such

1259: situations, the miss rate at the child node actually improves.

1260:

1261: % ------------------------------------------------------------------------

1262: \section{Conclusions}

1263: \label{Conclusions}

1264: % ------------------------------------------------------------------------

1265:

1266: In this paper we propose CUP: Controlled Update Propagation for cache

1267: maintenance.  CUP query channels coalesce bursts of queries for the

1268: same item into a single query.  CUP update channels refresh

1269: intermediate caches and reduce the average query latency by over a

1270: factor of ten in favorable conditions, and as much as a factor of

1271: three in unfavorable conditions.  Through light book-keeping, CUP

1272: controls and confines propagations so that only updates that are

1273: likely to be justified are propagated.  In fact, when only half the

1274: number of updates propagated are justified, CUP's overhead is

1275: completely recovered.  Finally, even when a large percentage of nodes

1276: cannot propagate updates (due to limited capacity), CUP continues to

1277: outperform standard caching with expiration.

1278:

1279: % ------------------------------------------------------------------------

1280: \section{Acknowledgements}

1281: % ------------------------------------------------------------------------

1282: This research is supported by the Stanford Networking Reseach Center,

1283: and by DARPA (contract N66001-00-C-8015).

1284:

1285: The work presented here has benefited greatly from discussions with

1286: Petros Maniatis, Armando Fox, Nick McKeown, and Rajeev Motwani. We

1287: thank them for their invaluable feedback.

1288:

1289: \small

1290: %\begin{singlespace}

1291: %\bibliographystyle{alpha} \bibliography{cup}

1292: %\end{singlespace}

1293:

1294: \newcommand{\etalchar}[1]{$^{#1}$}

1295: \begin{thebibliography}{DKK{\etalchar{+}}01}

1296:

1297: \bibitem[CK01a]{cohen01a}

1298: Edith Cohen and Haim Kaplan.

1299: \newblock {Aging Through Cascaded Caches: Performance Issues in the

1300:   Distribution of Web Content}.

1301: \newblock In {\em SIGCOMM}, 2001.

1302:

1303: \bibitem[CK01b]{cohen01b}

1304: Edith Cohen and Haim Kaplan.

1305: \newblock {Refreshment Policies for Web Content Caches}.

1306: \newblock In {\em INFOCOM}, 2001.

1307:

1308: \bibitem[CSWH00]{clarke00}

1309: Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore~W. Hong.

1310: \newblock {Freenet: A Distributed Anonymous Information Storage and Retrieval

1311:   System}.

1312: \newblock In {\em DIAU}, July 2000.

1313:

1314: \bibitem[DKK{\etalchar{+}}01]{dabek01}

1315: Frank Dabek, M.~Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica.

1316: \newblock {Wide-area Cooperative Storage with CFS}.

1317: \newblock In {\em SOSP}, 2001.

1318:

1319: \bibitem[gnu]{gnutella}

1320: {The Gnutella Protocol Specification v0.4}.

1321: \newblock http://gnutella.wego.com/.

1322:

1323: \bibitem[KLL{\etalchar{+}}97]{karger97}

1324: David Karger, Eric Lehman, Tom Leighton, Mathew Levine, Daniel Lewin, and Rina

1325:   Panigrahy.

1326: \newblock {Consistent Hashing and Random Trees: Distributed Caching Protocols

1327:   for Relieving Hot Spots on the World Wide Web}.

1328: \newblock In {\em STOC}, 1997.

1329:

1330: \bibitem[MGB01]{maniatis01}

1331: Petros Maniatis, T.J. Giuli, and Mary Baker.

1332: \newblock {Enabling the Long-Term Archival of Signed Documents through Time

1333:   Stamping}.

1334: \newblock Technical Report cs.DC/0106058, Stanford University, 2001.

1335: \newblock http://www.arxiv.org/abs/cs.DC/0106058.

1336:

1337: \bibitem[RD01]{rowstron01b}

1338: Antony Rowstron and Peter Druschel.

1339: \newblock {Pastry: Scalable, distributed object location and routing for

1340:   large-scale peer-to-peer systems}.

1341: \newblock In {\em MiddleWare}, November 2001.

1342:

1343: \bibitem[RFH{\etalchar{+}}01]{ratnasamy01a}

1344: Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker.

1345: \newblock {A Scalable Content-Addressable Network}.

1346: \newblock In {\em SIGCOMM}, 2001.

1347:

1348: \bibitem[RKCD01]{rowstron01a}

1349: Antony Rowstron, Anne-Marie Kermarrec, Miguel Castro, and Peter Druschel.

1350: \newblock {SCRIBE: The design of a large-scale event notification

1351:   infrastructure}.

1352: \newblock In {\em NGC}, 2001.

1353:

1354: \bibitem[SMK{\etalchar{+}}01]{stoica01}

1355: Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan.

1356: \newblock {Chord: A Scalable Peer-to-peer Lookup Service for Internet

1357:   Applications}.

1358: \newblock In {\em SIGCOMM}, 2001.

1359:

1360: \bibitem[ZKJ01]{zhao01}

1361: Ben~Y. Zhao, John~D. Kubiatowicz, and Anthony~D. Joseph.

1362: \newblock {Tapestry: An Infrastructure for Fault-tolerant Wide-area Location

1363:   and Routing}.

1364: \newblock Technical Report UCB/CSD-01-1141, U. C. Berkeley, April 2001.

1365:

1366: \end{thebibliography}

1367:

1368:

1369:

1370:

1371: % \begin{tiny}

1372: % \begin{verbatim}

1373: % $Id: cup.tex,v 1.12 2002/02/02 05:04:16 mema Exp mema $

1374: % \end{verbatim}

1375: % \end{tiny}

1376:

1377: \end{document}

1378:

1379: