0411:cs0411013/main.tex

1: \documentclass[10pt,twocolumn]{article}

2:

3: \usepackage{fullpage}             % apply uniform 1-inch margins

4: \usepackage{graphicx}             % import, scale, and rotate graphics

5: \usepackage{subfigure}            % group figures

6: \usepackage{url}                  % facilitate linebreaking of URLs

7: \usepackage[latin1]{inputenc}     % use characters with accents in the source

8: \usepackage{nicefrac}             % write fractions in the text

9: \usepackage[super,negative]{nth}  % write 1st, 2nd, 3rd, 4th, etc. in superscript

10: \usepackage{indentfirst}          % indent the first paragraph

11: \usepackage{algorithm}            % float algorithms

12: \usepackage{algpseudocode}        % describe algorithms

13: \usepackage{amssymb}              % use math symbols

14:

15: \newcommand{\dfn}[1]{\textit{#1}}            %introducing new terms

16:

17: \hyphenation{trace-route trace-routes trace-rout-ing}  % prevent tracer-oute

18:

19: \title{Efficient Algorithms for Large-Scale Topology Discovery}

20:

21: \author{Benoit Donnet, Philippe Raoult, Timur Friedman, Mark

22: Crovella\thanks{Mr. Donnet, Mr. Raoult, and Mr. Friedman are with

23: the Laboratoire LiP6-CNRS of the Universit\'e Pierre et Marie Curie,

24: Paris.  Mr. Crovella is with the Computer Science Department,

25: Boston University. The authors are participants in the

26: traceroute@home project. This work was supported by: the RNRT

27: project Metropolis, NSF grants ANI-9986397 and CCR-0325701, a

28: SATIN European Doctoral Research Foundation grant, the e-Next

29: European Network of Excellence, and LiP6 2004 project funds.  This

30: work was performed while Mr. Crovella was at LiP6, with support

31: from the CNRS and Sprint Labs.}}

32:

33: \date{}   % suppress the date

34:

35: \begin{document}

36:

37: \maketitle

38:

39: \begin{abstract}

40: There is a growing interest in discovery of internet topology at

41: the interface level.  A new generation of highly distributed

42: measurement systems is currently being deployed. Unfortunately,

43: the research community has not examined the problem of how to

44: perform such measurements efficiently and in a network-friendly

45: manner.  In this paper we make two contributions toward that end.

46: First, we show that standard topology discovery methods (e.g.,

47: skitter) are quite inefficient, repeatedly probing the same

48: interfaces. This is a concern, because when scaled up, such

49: methods will generate so much traffic that they will begin to

50: resemble DDoS attacks. We measure two kinds of redundancy in

51: probing (intra- and inter-monitor) and show that both kinds are

52: important.  We show that straightforward approaches to addressing

53: these two kinds of redundancy must take opposite tacks, and are

54: thus fundamentally in conflict.  Our second contribution is to

55: propose and evaluate Doubletree, an algorithm that reduces both

56: types of redundancy simultaneously on routers and end systems. The

57: key ideas are to exploit the tree-like structure of routes to and

58: from a single point in order to guide when to stop probing, and to

59: probe each path by starting near its midpoint. Our results show

60: that Doubletree can reduce both types of measurement load on the

61: network dramatically, while permitting discovery of nearly the

62: same set of nodes and links.  We then show how to enable efficient

63: communication between monitors through the use of Bloom

64: filters.

65: \end{abstract}

66:

67: \section*{Introduction}\label{introduction}

68:

69: Systems for active measurements in the internet are undergoing a

70: radical shift.  Whereas the present generation of systems operates

71: on largely dedicated hosts, numbering between 20 and 200, a new

72: generation of easily downloadable measurement software means that

73: infrastructures based on thousands of hosts could spring up

74: literally overnight.  Unless carefully controlled, these new

75: systems have the potential to impose a heavy load on parts of the

76: network that are being measured.  They also have the potential to

77: raise alarms, as their traffic can easily resemble a distributed

78: denial of service (DDoS) attack.  This paper examines the problem,

79: and proposes and evaluates an algorithm for controlling one of the

80: most common forms of active measurement:

81: \dfn{traceroute}~\cite{traceroute}.

82:

83: There are a number of systems active today that aim to elicit the

84: internet topology at the IP interface level. The most extensive

85: tracing system, \textsc{Caida}'s \dfn{skitter}~\cite{skitter},

86: uses 24 monitors, each targeting on the order of one million

87: destinations. Some other well known systems, such as the

88: \textsc{Ripe} NCC's \dfn{TTM service}~\cite{ripeNccTtm} and the

89: \textsc{NLanr} \dfn{AMP}~\cite{nlanrAmp}, have larger numbers of

90: monitors (between one and two hundred), and conduct traces in a

91: full mesh, but avoid tracing to outside destinations.

92:

93: The uses of the raw data from these traces are numerous.  From a

94: scientific point of view, the results underlie efforts to model

95: the network~\cite{agarwal, connectivity, faloutsos, relationship,

96: asanalysis, assize}. From an  engineering standpoint, the results

97: inform a wide variety of protocol development choices, such as

98: multicast and overlay construction \cite{discovering}.

99:

100: However, recent studies have shown that reliance upon a relatively

101: small number of monitors can introduce unwanted biases.  For

102: instance, work by Faloutsos et al.~\cite{faloutsos} found that the

103: distribution of router degrees follows a power law. That work was

104: based upon an internet topology collected from just twelve

105: traceroute hosts by Pansiot and Grad~\cite{onRoutes}.  However,

106: Lakhina et al.~\cite{sampling} showed that, in simulations of a

107: network in which the degree distribution does not at all follow a

108: power law, traceroutes conducted from a small number of monitors

109: can tend to induce a subgraph in which the node degree

110: distribution does follow a power law. Clauset and

111: Moore~\cite{tracerouteSampling} have since demonstrated

112: analytically that such a phenomenon is to be expected for the

113: specific case of Erd\H{o}s-R\'enyi random

114: graphs~\cite{erdosRenyi}.

115:

116: Removing potential bias is not the only reason to employ

117: measurement systems that use a larger number of monitors. With

118: more monitors to probe the same space, each one can take a smaller

119: portion and probe it more frequently. Network dynamics that might

120: be missed by smaller systems can more readily be captured by the

121: larger ones.

122:

123: The idea of releasing easily deployable measurement software is

124: not new.  To the best of our knowledge, the idea of incorporating

125: a traceroute monitor into a screen saver was first discussed in a

126: paper by Cheswick et al.~\cite{mapping} from the year 2000 (they

127: attribute the suggestion to J\"org Nonnenmacher).  Since that

128: time, a number of measurement tools have been released to the

129: public in the form of screen savers or daemons.

130: \dfn{Grenouille}~\cite{grenouille}, which is used for measuring

131: available bandwidth in DSL connections, was perhaps the first, and

132: appears to be the most widely adopted.  More recently, we have

133: seen the introduction of \dfn{NETI@home}~\cite{neti}, a passive

134: measurement tool inspired by the distributed signal analysis tool,

135: \dfn{SETI@home}~\cite{seti}.  In the summer of 2004, the first

136: tracerouting tool was made available: \dfn{DIMES}~\cite{dimes}

137: conducts traceroutes and pings from, at the time of this writing,

138: 323 sites in 43 countries.

139:

140: Given that much large-scale network mapping is on the way,
141: the design of such measurement systems demands attention to
142: efficiency, in order to avoid generating undesirable network load.
143: Unfortunately, this issue has not yet been successfully tackled by

144: the research community.  As Cheswick, Burch and Branigan note,

145: such a system ``would have to be engineered very carefully to

146: avoid abuse''~\cite[Sec.~7]{mapping}. Traceroutes emanating from a

147: large number of monitors and converging on selected targets can

148: easily appear to be a DDoS attack. Whether or not it triggers

149: alarms, it clearly is not desirable for a measurement system to

150: consume undue network resources.  A \dfn{traceroute@home} system,

151: as we label this class of applications, must work hard to avoid

152: sampling router interfaces and traversing links multiple times,

153: and to avoid multiple pings of end systems.

154:

155: This lack of attention to efficiency stands in contrast to the

156: number of papers on efficient monitoring of networks that are in a

157: single administrative domain (see, for instance, Bejerano and

158: Rastogi's work~\cite{robust}). However, the two problems are

159: quite different.  An administrator knows their entire network

160: topology in advance, and can freely choose where to place their

161: monitors. Neither of these assumptions holds for monitoring the

162: internet with highly distributed software.  Since the existing

163: literature is based upon these assumptions, we need to look

164: elsewhere for solutions.

165:

166: In this paper, we first evaluate the extent to which classical

167: topology discovery systems involve duplicated effort. By classical

168: topology discovery systems, we mean those that trace routes from a small number

169: of monitors to a large set of common destinations, such as

170: skitter.  Duplicated effort in such systems takes two forms:

171: measurements made by an individual monitor that replicate its own

172: work, and measurements made by multiple monitors that replicate

173: each other's work. We term the first \dfn{intra-monitor

174: redundancy} and the second \dfn{inter-monitor redundancy}.

175:

176: Using skitter data from August 2004, we quantify both kinds of

177: redundancy.  We show that intra-monitor redundancy is high close

178: to each monitor.  This fact is not surprising given the tree-like

179: structure (or \dfn{cone} \cite{connectivity}) of routes emanating

180: from a single monitor.  However, the degree of such redundancy is

181: quite serious: some interfaces are visited once for each

182: destination probed (which could be hundreds of thousands of times

183: per day in a large-scale system).  Further, with respect to

184: inter-monitor redundancy, we find that most interfaces are visited

185: by all monitors, especially when close to destinations.  This

186: latter form of redundancy is also potentially quite serious, since

187: it would be expected to grow in proportion to the number of

188: monitors in future large-scale measurement systems.

189:

190: Our analysis of the nature of redundant probing suggests more

191: efficient algorithms for topology discovery.  In particular, our

192: second contribution is to propose and evaluate an algorithm called

193: Doubletree.  We show that Doubletree can dramatically reduce the

194: impact on routers and final destinations by reducing redundant

195: probing, while maintaining high coverage in terms of interface and

196: link discovery.  Doubletree is particularly effective at removing

197: the worst cases of highly redundant probing that would be expected

198: to raise alarms.

199:

200: Doubletree takes advantage of the tree-like structure of

201: single-source or single-destination routing to avoid duplication

202: of effort. Unfortunately, general strategies for reducing these

203: two kinds of redundancy are in conflict. On the one hand,

204: intra-monitor redundancy is reduced by starting probing far from

205: the monitor, and working backward along the tree-like structure

206: that is rooted at that monitor.  Once an interface is encountered

207: that has already been discovered by the monitor, probing stops. On

208: the other hand, inter-monitor redundancy is reduced by probing

209: forwards towards a destination until encountering a

210: previously-seen interface.  In this case, the tree-like structure

211: is based on the probes of multiple monitors towards the same

212: destination.

213:

214: We show how to balance these conflicting strategies in Doubletree.

215: In Doubletree, probing starts at a distance that is intermediate

216: between monitor and destination.  We demonstrate methods for

217: choosing this distance, and we then evaluate the resulting

218: performance of Doubletree. Despite the challenge inherent in

219: reducing both forms of redundancy simultaneously, we show that

220: probing via Doubletree can reduce measurement load by

221: approximately 70\% while maintaining interface and link coverage

222: above 90\%.

223:

224: The Doubletree algorithm requires communication between monitors

225: in order to reduce inter-monitor redundancy.  Information

226: regarding interfaces seen when tracing towards each destination

227: must be shared.  However, this can lead to considerable overhead

228: as the number of known interfaces grows. In this paper, we also

229: propose to reduce this cost through the use of Bloom filters for

230: lossy encoding of the interface set. Surprisingly, we find that

231: using Bloom filters can increase node and link coverage without a

232: large increase in redundancy.

233:

234: The remainder of this paper is organized as follows:

235: Section~\ref{redundancy} evaluates the extent of redundancy in

236: classical topology tracing systems. Section~\ref{algo} describes

237: and evaluates the Doubletree algorithm.  Section~\ref{bf} shows

238: how Bloom filters can help to reduce the communication cost

239: required by our algorithm. Finally, Section~\ref{conclusion}

240: concludes this paper and discusses directions for future work.

241:

242: \section{Redundancy}\label{redundancy}

243:

244: In this section, we quantify and analyze the extensive measurement

245: redundancy that can be found in a classical topology discovery

246: system.

247:

248: \subsection{Methodology}\label{redundancy.methodology}

249:

250: Our study is based on skitter data from August \nth{1} through

251: \nth{3}, 2004. This data set was generated by 24 monitors located

252: in the United States, Canada, the United Kingdom, France, Sweden,

253: the Netherlands, Japan, and New Zealand. The monitors share a

254: common destination set of nearly one million IPv4 addresses. Each

255: monitor cycles through the destination set at its own rate, taking

256: typically three days to complete a cycle. For the purpose of our

257: studies, in order to reduce computing time to a manageable level,

258: we worked from a limited destination set of 50,000, randomly

259: chosen from the original set.

260:

261: Visits to host and router interfaces are the metric by which we

262: evaluate redundancy.  We consider an interface to have been

263: visited if its IP address appears at one of the hops in a

264: traceroute.  Though it would be of interest to calculate the load

265: at the host and router level, rather than at the individual

266: interface level, we make no attempt to disambiguate interfaces in

267: order to obtain a router-level graph.  The alias resolution

268: techniques described by Pansiot and Grad~\cite{onRoutes}, by

269: Govindan and Tangmunarunkit~\cite{heuristics}, for

270: \emph{Mercator}, and applied in the \emph{iffinder} tool from

271: \textsc{Caida}~\cite{iffinder}, would require active probing

272: beyond the skitter data, preferably at the same time that the

273: skitter data is collected. The methods used by Spring et

274: al.~\cite{rocketfuel}, in \emph{Rocketfuel}, and by Teixeira et

275: al.~\cite{pathDiversity}, apply to routers in the network core,

276: and are untested in stub networks.  Despite these limitations, we

277: believe that the load on individual interfaces is a useful

278: measure. As Broido and claffy note~\cite{connectivity},

279: ``interfaces are individual devices, with their own individual

280: processors, memory, buses, and failure modes. It is reasonable to

281: view them as nodes with their own connections.''

282:

283: What does it mean for an IP address to appear at a given hop

284: distance from a monitor?  Skitter, like many standard traceroute

285: implementations, sends three probe packets for each hop count. Our

286: accounting assumes a baseline probing method which, instead, tries

287: up to three times to get a response at each hop. After the first

288: successful response, the probe moves to the next hop.  Thus, the

289: first successfully reached address at each hop is the one used. If

290: none of the three probes are returned, the hop is recorded as

291: non-responding.  In terms of redundancy, this method in fact

292: revisits interfaces less often than the current version of

293: skitter, but is more consistent with the goal of minimizing

294: measurement load, and its behavior can be easily simulated from

295: skitter traces.
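
As an illustration, the baseline accounting for a single hop can be sketched in pseudocode.  The name \textsc{BaselineHop} and the representation of a skitter record as an ordered list $r_1, r_2, r_3$ of responses, with $\bot$ marking a probe that was not answered, are assumptions made for this sketch and are not part of skitter's trace format.

\begin{algorithmic}
  \Function{BaselineHop}{$r_1, r_2, r_3$}
    \For{$k \gets 1$ \textbf{to} $3$}
      \If{$r_k \neq \bot$}
        \State \Return $r_k$ \Comment{first response}
      \EndIf
    \EndFor
    \State \Return $\bot$ \Comment{non-responding hop}
  \EndFunction
\end{algorithmic}

Under this rule, a hop whose first probe is answered counts as one visit rather than three, which is why the baseline revisits interfaces less often than skitter itself.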

296:

297: Even if an IP address is returned for a given hop count, it might

298: not be valid.  Due to the presence of poorly configured routers

299: along traceroute paths, skitter occasionally records anomalies

300: such as private IP addresses that are not globally routable.  We

301: account for invalid hops as if they were non-responding hops.  The

302: addresses that we consider as invalid are a subset of the

303: special-use IPv4 addresses described in RFC~3330~\cite{rfc3330}.

304: Specifically, we eliminate visits to the private IP address blocks

305: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.  We also remove the

306: loopback address block 127.0.0.0/8.  In our data set, we find

307: 4,435 distinct special addresses: 4,434 are

308: private addresses and only one is a loopback address.  Special

309: addresses account for around 3\% of the entire set of addresses considered.

310: Though there were no visits in the data to the following address

311: blocks, they too would be considered invalid: the ``this network''

312: block 0.0.0.0/8, the 6to4 relay anycast address block

313: 192.88.99.0/24, the benchmark testing block 198.18.0.0/15, the

314: multicast address block 224.0.0.0/4, and the reserved address

315: block formerly known as the Class E addresses, 240.0.0.0/4, which

316: includes the \textsc{lan} broadcast address, 255.255.255.255.
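
The filtering rule described above amounts to a membership test, sketched below.  \textsc{IsInvalid} is a hypothetical name, and the set $B$ simply collects the address blocks listed in this section; a hop for which the test succeeds is accounted for as non-responding.

\begin{algorithmic}
  \Function{IsInvalid}{$a$}
    \State $B \gets \{$\texttt{0.0.0.0/8}, \texttt{10.0.0.0/8}, \texttt{127.0.0.0/8}, \texttt{172.16.0.0/12}, \texttt{192.88.99.0/24}, \texttt{192.168.0.0/16}, \texttt{198.18.0.0/15}, \texttt{224.0.0.0/4}, \texttt{240.0.0.0/4}$\}$
    \For{each block $b \in B$}
      \If{$a$ lies within $b$}
        \State \Return \textbf{true}
      \EndIf
    \EndFor
    \State \Return \textbf{false}
  \EndFunction
\end{algorithmic}

For example, 172.20.1.1 lies within 172.16.0.0/12, which spans 172.16.0.0 through 172.31.255.255, and is therefore treated as invalid, whereas 172.32.0.1 falls in none of the blocks and is kept.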

317:

318: We evaluate the redundancy at two levels. One is the microscopic

319: level of a single monitor, considered in isolation from the rest

320: of the system.  This intra-monitor redundancy is measured by the

321: number of times the same monitor visits an interface.  The other,

322: macroscopic, level considers the system as an ensemble of

323: monitors. This inter-monitor redundancy is measured by the number

324: of monitors that visit a given interface, counting only once each

325: monitor that has non-zero intra-monitor redundancy for that

326: interface. By separating the two levels, we separate the problem

327: of redundancy into two problems that can be treated somewhat

328: separately.  Each monitor can act on its own to reduce its

329: intra-monitor redundancy,  but cooperation between monitors is

330: required to reduce inter-monitor redundancy.
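
These two measures can be written compactly.  The notation is introduced here for illustration only: $M$ is the set of monitors, $D$ the destination set, and $T_m(d)$ the set of interfaces that appear in the traceroute from monitor $m$ to destination $d$.  Assuming that an interface appears at most once in any single traceroute,
\[
  r_m(i) = \left| \{\, d \in D : i \in T_m(d) \,\} \right|
\]
is the intra-monitor redundancy of interface $i$ at monitor $m$, and
\[
  R(i) = \left| \{\, m \in M : r_m(i) > 0 \,\} \right|
\]
is its inter-monitor redundancy.  For example, with three monitors and $r_1(i) = 40$, $r_2(i) = 0$, $r_3(i) = 1$, the inter-monitor redundancy is $R(i) = 2$.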

331:

332: \subsection{Description of the Plots}\label{redundancy.description}

333:

334: Since the redundancy distributions are generally skewed, quantile

335: plots give us a better sense of the data than would plots of the

336: mean and variance. There are several possible ways to calculate

337: quantiles.  We calculate them in the manner described by

338: Jain~\cite[p.~194]{jainArtCSPerfAnalysis}, which is: rounding to

339: the nearest integer value to obtain the index of the element in

340: question, and using the lower integer if the quantile falls

341: exactly halfway between two integers.
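
To make the rule concrete, suppose that the $n$ redundancy values at a given hop count are sorted in increasing order as $x_1 \le \cdots \le x_n$.  A common statement of this rounding rule, taken here as an assumption because the index expression itself is not reproduced above, places the $q$-quantile ($0 \le q \le 1$) at
\[
  Q(q) = x_{[p]}, \qquad p = (n-1)\,q + 1,
\]
where $[p]$ denotes $p$ rounded to the nearest integer, with exact halves rounded down.  For instance, with $n = 10$ values, the median ($q = 0.5$) gives $p = 5.5$, so the \nth{5} smallest value is used, while the \nth{90} percentile ($q = 0.9$) gives $p = 9.1$, so the \nth{9} smallest value is used.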

342:

343: Fig.~\ref{quantiles.key} provides a key to reading the quantile

344: plots found in Figs.~\ref{redundancy.intra.fig} and

345: \ref{redundancy.inter.global} and figures found later in the

346: paper.

347:

348: \begin{figure}[tbp]

349:   \begin{center}

350:     \includegraphics[height=2.5cm]{Pictures/quantiles.eps}

351:   \end{center}

352:   \caption{Quantiles key}

353:   \label{quantiles.key}

354: \end{figure}

355:

356: A dot marks the median (the \nth{2} quartile, or \nth{50}

357: percentile). The vertical line below the dot delineates the range

358: from the minimum to the \nth{1} quartile, and leaves a space from

359: the \nth{1} to the \nth{2} quartile. The space above the dot runs

360: from the \nth{2} to the \nth{3} quartile, and the line above that

361: extends from the \nth{3} quartile to the maximum.  Small tick

362: marks to either side of the lines mark some additional

363: percentiles: marks to the left for the \nth{10} and \nth{90}, and

364: marks to the right for the \nth{5} and \nth{95}.

365:

366: In the case of highly skewed distributions, or distributions drawn

367: from small amounts of data, the vertical lines or the spaces

368: between them might not appear. For instance, if there are tick

369: marks but no vertical line above the dot, this means that the

370: \nth{3} quartile is identical to the maximum value. In the

371: figures, each quantile plot sits directly above an accompanying

372: bar chart that indicates the quantity of data upon which the

373: quantiles were based. For each hop count, the bar chart displays

374: the number of interfaces at that distance.  For these bar charts,

375: a log scale is used on the vertical axis. This allows us to

376: identify quantiles that are based upon very few interfaces (fewer

377: than twenty, for instance), and so for which the values risk being

378: somewhat arbitrary.

379:

380: \subsection{Intra-monitor Redundancy}\label{redundancy.intra}

381:

382: Intra-monitor redundancy occurs in the context of the tree-like

383: graph that is generated when all traceroutes originate at a single

384: point. Since there are fewer interfaces closer to the monitor,

385: those interfaces will tend to be visited more frequently. In the

386: extreme case, if there is a single gateway router between the

387: monitor and the rest of the internet, a single IP address

388: belonging to that router should show up in every one of the

389: traceroutes.

390:

391: We measure intra-monitor redundancy by considering all traceroutes

392: from the monitor to the common destinations, whether there be

393: problems with a traceroute, as described in

394: Sec.~\ref{redundancy.methodology}, or not.

395:

396: Having calculated the intra-monitor redundancy for each interface,

397: we organize the results by the distance of the interfaces from the

398: monitor.  We measure distance by hop count.  Since the same

399: interface can appear at a number of different hop counts from a

400: monitor, for instance if routes change between traceroutes, we

401: arbitrarily attribute to each interface the hop count at which it

402: was first visited.  This process yields, for each hop count, a set

403: of interfaces that we sort by number of visits.  We then plot, hop

404: by hop, the redundancy distribution for interfaces at that hop

405: count.
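
The bookkeeping described above can be sketched as follows.  The arrays $\mathit{hop}$ and $\mathit{visits}$ are names chosen for this illustration, and the sketch assumes that ``first visited'' refers to the order in which the monitor issued its traceroutes.

\begin{algorithmic}
  \State $\mathit{hop}[\cdot] \gets$ undefined; $\mathit{visits}[\cdot] \gets 0$
  \For{each traceroute $t$, in probing order}
    \For{each valid hop $(h, i)$ of $t$}
      \If{$\mathit{hop}[i]$ is undefined}
        \State $\mathit{hop}[i] \gets h$
      \EndIf
      \State $\mathit{visits}[i] \gets \mathit{visits}[i] + 1$
    \EndFor
  \EndFor
\end{algorithmic}

Each interface $i$ is then placed in the set for hop count $\mathit{hop}[i]$.  For example, an interface first seen at hop 7 and later, after a route change, at hop 8 is attributed to hop 7, and both visits count towards its redundancy.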

406:

407:   \subsubsection{Results}\label{redundancy.intra.results}

408:

409:   \begin{figure}[!t]

410:     \begin{center}

411:       \subfigure[\texttt{arin}]{\label{redundancy.intra.arin}

412:         \includegraphics[width=5.5cm]{Pictures/Redundancy/arin.intra.Quantile.eps}}

413:       \subfigure[\texttt{champagne}]{\label{redundancy.intra.champ}

414:         \includegraphics[width=5.5cm]{Pictures/Redundancy/champagne.intra.Quantile.eps}}

415:     \end{center}

416:     \caption{Skitter intra-monitor redundancy}

417:     \label{redundancy.intra.fig}

418:   \end{figure}

419:

420: Fig.~\ref{redundancy.intra.fig} shows intra-monitor redundancy quantile

421: distributions for two representative skitter monitors: \url{arin} and

422: \url{champagne}.

423:

424: Looking first at the histograms for interface counts (lower half

425: of each plot), we see that these data are consistent with

426: distributions typically seen in such cases.  Plotted on a linear

427: scale (not shown here) these distributions display the familiar

428: bell-shaped curve typical of internet interface distance

429: distributions.  The distribution for \url{champagne} is fairly

430: typical of all monitors. It represents the 92,355 unique IP

431: addresses discovered by that monitor.  This value is shown as a

432: separate bar to the right of the histogram, labeled ``all''.  We

433: see that the interface distances are distributed with a mode at 18

434: hops corresponding to a peak of 9,135 interfaces that are visited

435: at that distance.

436:

437: The quantile plots show the nature of the intra-monitor redundancy

438: problem. Looking first to the bar at the right hand of each chart,

439: showing the quantiles for all of the interfaces taken together, we

440: can see that the distributions are highly skewed. The median

441: interface has a redundancy of one.  Even the \nth{75} percentile is

442: one, as evidenced by the lack of a gap between the dot and the

443: line representing the top quarter of values. However, for a very

444: small portion of the interfaces there is a very high redundancy.

445: The maximum redundancy in each case is 50,000---equal to the

446: number of destinations.

447:

448: Looking at how the redundancy varies by distance, we see that the

449: problem is worse the closer one is to the monitor.  This is what

450: we expect given the tree-like structure of routing from a monitor,

451: but here we see how serious the phenomenon is from a quantitative

452: standpoint. For the first three hops from each monitor, the median

453: redundancy is 50,000. A look at the histograms shows that there

454: are very few interfaces at these distances.  Just one interface

455: for \url{arin}, and the same for \url{champagne}, save for the

456: presence of a second interface at the third hop. This second

457: interface is only visited once, as represented by the presence of

458: the \nth{5} and \nth{10} percentile marks (since there are only

459: two data points, the lower valued point is represented by the

460: entire lower quarter of values on the plot).

461:

462: Beyond three hops, the median redundancy drops rapidly.  By the

463: sixth hop, in both cases, the median is below ten.  By the twelfth

464: hop, the median is one.  However, the distributions remain highly

465: skewed.  Even fifteen hops out, some interfaces experience a

466: redundancy on the order of several hundred visits.  With small

467: variations, these patterns are repeated for each of the monitors.

468:

469: From the point of view of planning a measurement system, the

470: extreme values are the most worrisome.  It is clear that there is

471: significant duplicated effort, but it is especially concentrated

472: in selected areas.  The problem is most severe on the first few

473: interfaces, but even interfaces many hops out receive hundreds or

474: thousands of repeat visits.  Beyond the danger of triggering

475: alarms, there is a simple question of measurement efficiency.

476: Resources devoted to reprobing the same interfaces would be better

477: saved, or reallocated to more fruitful probing tasks.

478:

479:   \begin{table}[!t]

480:     \begin{center}

481:       \begin{tabular}{lr@{.}l}

482:         \multicolumn{3}{c}{Destinations}\\

483:         Responding     & 59 & 7\%\\

484:         Not responding & 40 & 3\%\\

485:         \multicolumn{3}{c}{Probes}\\

486:         Interface discovery   & 10 & 4\%\\

487:         Invalid addresses     &  1 & 5\%\\

488:         No response           &  0 & 5\%\\

489:         Redundant             & 87 & 6\%\\

490:       \end{tabular}

491:     \end{center}

492:     \caption{Probe statistics for \texttt{champagne}}

493:     \label{redundancy.intra.tab}

494:   \end{table}

495:

496: Table~\ref{redundancy.intra.tab} presents additional statistics for

497: \url{champagne}.  The first part of the table indicates the portion of

498: destinations that respond and the portion that do not respond. Fully 40.3\%

499: of the traceroutes do not terminate with a destination response.  The second

500: part of the table describes redundancy in terms of probes sent, rather than

501: from an interface's perspective. Only 10.4\% of probes serve to discover a

502: new interface.  (Note: in the intra-monitor context, an interface is

503: considered to be new if that particular monitor has not previously visited

504: it.)  An additional 2.0\% of probes hit invalid addresses, as defined in

505: Sec.~\ref{redundancy.methodology}, or do not result in a response.  This

506: leaves 87.6\% of the probes that are redundant in the sense that they visit

507: interfaces that the monitor has already discovered.  The statistics in this

508: table are typical of the statistics for every one of the 24 monitors.
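
As a consistency check on Table~\ref{redundancy.intra.tab}, the four probe categories partition the probes sent:
\[
  10.4\% + 1.5\% + 0.5\% + 87.6\% = 100.0\%,
\]
and the 2.0\% figure quoted above is the sum $1.5\% + 0.5\%$ of the invalid-address and no-response rows.  Put differently, $100 / 10.4 \approx 9.6$ probes are sent, on average, for each interface that \url{champagne} discovers.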

509:

510: \subsection{Inter-monitor Redundancy}\label{redundancy.inter}

511:

512: Inter-monitor redundancy occurs when multiple monitors visit the

513: same interface. The degree of such redundancy is of keen interest

514: to us, since future systems may increase the number of monitors by

515: several orders of magnitude.

516:

517: We calculate the inter-monitor redundancy for each interface by

518: counting the number of monitors that have visited it.  A monitor

519: can be counted at most once towards an interface's inter-monitor

520: redundancy, even if it has visited that interface multiple times.

521: For a given interface, the redundancy is calculated just once with

522: respect to the entirety of the monitors: it does not vary from

523: monitor to monitor as does intra-monitor redundancy. However, what

524: does vary depending upon the monitor is whether the particular

525: interface is seen, and at what distance.  In order to attribute a

526: single distance to an interface, a distance that does not depend

527: upon the perspective of a single monitor but that nonetheless has

528: meaning when examining the effects of distance on redundancy, we

529: attribute the minimum distance at which an interface has been seen

530: among all the monitors.
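
In symbols, with notation introduced only for this illustration, let $M_i$ denote the set of monitors that have visited interface $i$, and let $h_m(i)$ be the hop count attributed to $i$ from the perspective of monitor $m \in M_i$.  The distance used in the inter-monitor plots is then
\[
  h(i) = \min_{m \in M_i} h_m(i).
\]
For example, an interface seen by three monitors at distances 12, 9, and 15 has an inter-monitor redundancy of 3 and is plotted at hop count 9.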

531:

532:   \subsubsection{Results}\label{redundancy.inter.results}

533:   %%%%%%%%%%%%%%%%%%%%

534:   \begin{figure}[tbp]

535:     \begin{center}

536:       \includegraphics[width=5cm]{Pictures/Redundancy/GlobalInter.eps}

537:     \end{center}

538:     \caption{Skitter inter-monitor redundancy}

539:     \label{redundancy.inter.global}

540:   \end{figure}

541:

542: Fig.~\ref{redundancy.inter.global} shows inter-monitor redundancy

543: for the skitter data.

544:

545: The distribution of interfaces by hop count differs from the

546: intra-monitor case due to the difference in how we account for

547: distances.  The mode is closer to the traceroute source (10 hops),

548: corresponding to the peak of 21,222 interfaces that are visited at

549: that distance.

550:

551: The redundancy distribution also has a very different aspect.

552: Considering, first of all, the redundancy over all of the

553: interfaces (at the far right of the plot), we see that the median

554: interface is visited by all 24 monitors, which is a subject of

555: great concern. The distribution is also skewed, though the effect

556: is less dramatic since the vertical axis is a linear scale, with

557: only 24 possible values.

558:

559: We also see a very different distribution by distance.  Interfaces

560: that are very close to a monitor, at one or two hops, have a

561: median inter-monitor redundancy of one.  The same is true of

562: interfaces that are far from all monitors, at distances over 20,

563: though there are very few of these.  (The presence of an interface

564: at hop 27 that is seen by all monitors serves to raise the median

565: at that distance to 24.)  What is especially notable is that

566: interfaces at intermediate distances (6 to 15) tend to be visited

567: by all, or almost all, of the monitors.  Though their distances

568: are in the middle of the distribution, this does not mean that the

569: interfaces themselves are in the middle of the network.  Many of

570: these interfaces are in fact destinations.  Recall that every

571: destination is targeted by every host.

572:

573: \section{Algorithm}\label{algo}

574:

575: In this section, we present the Doubletree algorithm, our method

576: for probing the network in a friendly manner while discovering

577: nearly all the interfaces and links that a classical tracerouting

578: approach would discover.

579:

580: Sec.~\ref{algo.description} describes how Doubletree works.

581: Sec.~\ref{algo.tension} discusses the results of varying the

582: single parameter of this algorithm. Finally,

583: Sec.~\ref{algo.redundancy} shows the extent of intra- and

584: inter-monitor redundancy reduction when using the algorithm.

585:

586: \subsection{Description}\label{algo.description}

587:

588: Doubletree takes advantage of the tree-like structure of routes in

589: the internet. Routes lead out from a monitor towards multiple

590: destinations in a tree-like way, as shown in

591: Fig.~\ref{treeStructureFig.intra}, and the manner in which routes

592: converge towards a destination from multiple monitors is similarly

593: tree-like, as shown in Fig.~\ref{treeStructureFig.inter}.  The

594: tree is an idealisation of the structure encountered in practice.

595: Paths separate and reconverge.  Loops can arise.  But a tree may

596: be a good enough first approximation on which to base a probing

597: algorithm.

598:

599:   \begin{figure}[!tbp]

600:     \begin{center}

601:       \subfigure[Monitor-rooted]{\label{treeStructureFig.intra}

602:         \includegraphics[width=5cm]{Pictures/TreeIntra.eps}}

603:       \subfigure[Destination-rooted]{\label{treeStructureFig.inter}

604:         \includegraphics[width=5cm]{Pictures/TreeInter.eps}}

605:     \end{center}

606:     \caption{Tree-like routing structures}

607:     \label{treeStructureFig}

608:   \end{figure}

609:

610: A probing algorithm can reduce its redundancy by tracking its

611: progress through a tree, as it probes from the direction of the

612: leaves towards the root.  So long as it is probing in a previously

613: unknown part of the tree, it continues to probe.  But once it

614: encounters a node that is already known to belong to the tree, it

615: stops.  The idea is that the remainder of the path to the root

616: must already be known.  In reality, there is only a likelihood and

617: not a certainty that the remainder of the path is known.  The

618: redundancy saved by not reprobing the same paths to the root may

619: nonetheless be worth the loss in coverage that results from not

620: probing the occasional different path.

621:

622: Doubletree uses both the monitor-rooted and the destination-rooted

623: trees. When probing backwards from the destinations towards the

624: monitor, it applies a stopping rule based upon the monitor-rooted

625: tree. The goal in this case is to reduce intra-monitor redundancy.

626: When probing forwards, the stopping rule is based upon the

627: destination-rooted tree, with the goal being to reduce

628: inter-monitor redundancy. There is an inherent tension between the

629: two goals.

630:

631: Suppose the algorithm were to start probing only far from each

632: monitor. Probing would necessarily be backwards.  In this case,

633: the destination-based trees cannot be used to reduce redundancy. A

634: monitor might discover, with destination $d$ at hop $h$, an

635: interface that another monitor also discovered when probing with

636: destination $d$.  However, this does not inform the monitor as to

637: whether the interface at hop $h-1$ is likely to have been

638: discovered as well.  So it is not clear how to reduce

639: inter-monitor redundancy when conducting backwards probing.

640:

641: Similarly, when conducting forwards probing (of the classic

642: traceroute sort), it is not clear how intra-monitor redundancy can

643: be avoided.  Paths close to the monitor will tend to be probed and

644: reprobed, for lack of knowledge of where the path to a given

645: destination might diverge from the paths already seen.

646:

647: In order to reduce both inter- and intra-monitor redundancy,

648: Doubletree starts probing at what is hoped to be an intermediate

649: point.  For each monitor, there is an initial hop count $h$.

650: Probing proceeds forwards from $h$, to $h+1$, $h+2$, and so forth,

651: applying the stopping rule based on the destination-rooted tree.

652: Then it probes backwards from $h$, to $h-1$, $h-2$, etc., using

653: the monitor-based tree stopping rule.  In the special case where

654: there is no response at distance $h$, the distance is halved, and

655: halved again until there is a reply, and probing continues

656: forwards and backwards from that point.

657:
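As a worked illustration of the halving rule, assume that the halved
value is rounded down to an integer and that halving stops once the
distance reaches $1$ (both are stated here as assumptions). The $j$th
fallback distance is then
\[
  h_j = \max\left(1, \left\lfloor h / 2^{j} \right\rfloor\right),
  \qquad j = 1, 2, \ldots
\]
so that a monitor with $h = 10$ that receives no replies would try
distances $10$, $5$, $2$, and $1$ in turn. At most
$\lfloor \log_2 h \rfloor$ halvings are needed before the distance
reaches $1$.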

658: Rather than maintaining detailed information on tree structures,

659: it is sufficient for the stopping rules to make use of sets of

660: interfaces.  Each monitor tracks the interfaces that it has

661: discovered.  These form a stop set $B$, called the \dfn{backwards

662: tracing stop set}, or more concisely, the \dfn{local stop set}, to

663: be used in a monitor's own backwards probing. When probing

664: backwards from a destination $d$, encountering an interface in $B$

665: causes the monitor to stop and move on to the next destination.

666: Each monitor also receives another stop set, $F$, called the

667: \dfn{forwards tracing stop set}, or more concisely, the

668: \dfn{global stop set}, that contains

669: $(\mathrm{interface},\mathrm{destination})$ pairs. When probing

670: forwards towards destination $d$ and encountering an interface

671: $i$, forwards probing stops if $(i,d) \in F$. Communication

672: between monitors is needed in order to share this second stop set.

673:

674: Only one aspect of Doubletree has been suggested in prior

675: literature. Govindan and Tangmunarunkit~\cite{heuristics} employ

676: backwards probing with a stopping rule in the Mercator system, in

677: order to reduce intra-monitor redundancy.  However, no results

678: have been published regarding the efficacy of this approach.  Nor

679: have the effects on inter-monitor redundancy been considered, or

680: the tension between reducing the two types of redundancy (for

681: instance, Mercator tries to start probing at the destination, or

682: as close to it as possible). Nor has any prior work suggested a

683: manner in which to exploit the tree-like structure of routes that

684: converge on a destination.  Finally, no prior work has suggested

685: cooperation among monitors.

686:

687: Algorithm~\ref{algo.formal} is a formal definition of the

688: Doubletree algorithm. It assumes that the following two functions

689: are defined. The \textit{response}() procedure returns true if an

690: interface replies to at least one of the probes that were sent.

691: \textit{halt}() is a primitive that checks whether probing must be

692: stopped for either of two reasons: a loop is detected, or a gap (five

693: successive non-responding nodes) is discovered.

694:

695:   \begin{algorithm}[!t]

696:     \caption{Doubletree}

697:     \label{algo.formal}

698:     \begin{algorithmic}[1]

699:       \Require $F$, the global stop set received by this monitor.

700:       \Ensure $F$ updated with all (interface,destination) pairs discovered by this monitor.

701:       \Statex

702:       \Procedure{Doubletree}{$h$, $D$}

703:           \State $B \leftarrow \emptyset$\Comment{Local stop set}

704:           \ForAll{$d \in D$}\Comment{Destinations}

705:               \State $h \leftarrow$

706:               \textsc{AdaptHValue}($h$)\Comment{Initial hop}

707:               \State \textsc{TraceForwards}($h$, $d$)

708:               \State \textsc{TraceBackwards}($h-1$, $d$)

709:           \EndFor

710:       \EndProcedure

711:       \Statex

712:       \Procedure{AdaptHValue}{$h$}

713:           \While{$\neg \mathrm{response}(v_h) \wedge h \neq

714:           1$}\Comment{$v_h$ the interface discovered at $h$ hops}

715:               \State $h \leftarrow \lfloor h/2 \rfloor$\Comment{$h$ an integer}

716:           \EndWhile

717:           \State \textbf{return} $h$

718:       \EndProcedure

719:       \Statex

720:       \Procedure{TraceForwards}{$i$, $d$}

721:           \While{$v_i \neq d \wedge (v_i, d) \notin F \wedge \neg\mathrm{halt}()$}

722:               \State $F \leftarrow F \cup \{(v_i, d)\}$

723:               \State $i \leftarrow i + 1$

724:           \EndWhile

725:       \EndProcedure

726:       \Statex

727:       \Procedure{TraceBackwards}{$i$, $d$}

728:           \While{$i \geqslant 1 \wedge v_i \notin B \wedge \neg\mathrm{halt}()$}

729:               \State $B \leftarrow B \cup \{v_i\}$

730:               \State $F \leftarrow F \cup \{(v_i, d)\}$

731:               \State $i \leftarrow i - 1$

732:           \EndWhile

733:       \EndProcedure

734:     \end{algorithmic}

735:   \end{algorithm}

736:

737: This algorithm has only one tunable parameter: the initial hop

738: count $h$.  In the remainder of this section, we explain how to

739: set this parameter in terms of another parameter that we call $p$.

740:

741: We wish for each monitor to be able to determine a reasonable

742: value for $h$: one that is far enough from the monitor to avoid

743: excess intra-monitor redundancy, yet not so far as to generate too

744: much inter-monitor redundancy. Since each monitor will be

745: positioned differently with respect to the internet, what is

746: reasonable for one monitor might not be reasonable for another. We

747: thus base our rule for choosing $h$ on the distribution of path

748: lengths as seen from the perspective of a given monitor.  The

749: general idea is to start probing at a distance that is rich in

750: interfaces, but that is not so far as to exacerbate inter-monitor

751: redundancy.

752:

753: Based upon our intra-monitor redundancy studies, discussed above,

754: we would expect an initial hop distance of five or more from the

755: typical monitor to be fairly rich in interfaces.  However, we also

756: know that this is the distance at which inter-monitor redundancy

757: can become a problem.  We are especially concerned about

758: inter-monitor redundancy at destinations, because this is what is

759: most likely to look like a DDoS attack.

760:

761:   \begin{figure}[!tbp]

762:     \begin{center}

763:       \includegraphics[width=5cm]{Pictures/Algorithm/apan-jp.cdfClassical.eps}

764:     \end{center}

765:     \caption{Lengths of paths from monitor \texttt{apan-jp}}

766:     \label{algo.cdf}

767:   \end{figure}

768:

769: One parameter that a monitor can estimate without much effort is

770: its probability of hitting a responding destination at any

771: particular hop count $h$.  For instance, Fig.~\ref{algo.cdf} shows

772: the cumulative mass plot of path lengths from monitor

773: \url{apan-jp}. If \url{apan-jp} chooses $h=10$, that implies a

774: $0.1$ probability of hitting a responding destination on the first

775: probe.  The shape of this curve is very similar for each of the 24

776: skitter monitors, but the horizontal position of the curve can

777: vary by a number of hops from monitor to monitor.  So if we are to

778: fix the probability, $p$, of hitting a responding destination on

779: the first probe, there will be a different value of $h$ for each

780: monitor, but each value will correspond to a similar level of

781: incursion into the network across the board.

782:
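One way to formalise this choice is sketched below. It uses notation
introduced here for illustration and is not taken from the simulator.
Let $L_m$ be the length of the path from monitor $m$ to a responding
destination drawn at random from the destination set, and let
$C_m(h) = \Pr[L_m \leq h]$ be the cumulative distribution of the kind
plotted in Fig.~\ref{algo.cdf}. For a target probability $p$, the
monitor could take
\[
  h_m(p) = \min\{\, h \geq 1 : C_m(h) \geq p \,\}.
\]
Under this reading, $p=0$ gives $h_m(0)=1$, which amounts to forwards
probing only, and $p=1$ gives the longest path length observed by $m$.
For \url{apan-jp}, where $C_m(10)$ is about $0.1$, a target of $p=0.1$
would give an initial hop count of roughly $10$.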

783: We have chosen $p$ to be the single independent parameter that

784: must be tuned to guide Doubletree. In the following section, we

785: study the effect of varying $p$ on the tension between inter- and

786: intra-monitor redundancy, and the overall interface and link

787: coverage that Doubletree obtains.

788:

789: \subsection{Tuning the Parameter p}\label{algo.tension}

790:

791: This section discusses the effect of varying $p$.

792: Sec.~\ref{algo.tension.methodology} describes our experimental

793: methodology, and Sec.~\ref{algo.tension.results} presents the

794: results.

795:

796: \subsubsection{Methodology}\label{algo.tension.methodology}

797:

798: In order to test the effects of the parameter $p$ on both

799: redundancy and coverage, we implement Doubletree in a simulator.

800: We examine the following values for $p$: between 0 (i.e., forwards

801: probing only) and 0.2, we increment $p$ in steps of 0.01. From 0.2

802: to $1$ (i.e., backwards probing in all cases when the destination

803: replies to the first probe), we increment $p$ in steps of 0.1. As

804: will be shown, the concentration of values close to 0 allows us to

805: trace the greater variation of behavior in this area.

806:

807: To validate our results, we run the simulator using the same

808: skitter data set we considered in Sec.~\ref{redundancy}.  We

809: assume that Doubletree is running on the skitter monitors, during

810: the same period of time that the skitter data represents, and

811: implementing the same baseline probing technique described in

812: Sec.~\ref{redundancy.methodology}, of probing up to three times at

813: a given hop distance.  The difference lies in the order in which

814: Doubletree probes the hops, and the application of Doubletree's

815: stopping rules.

816:

817: Doubletree requires communication of the global stop set from one

818: monitor to another.  We therefore choose a random order for the

819: monitors and simulate the running of Doubletree on each one in

820: turn.  The global stop set is added to and passed on to each

821: monitor in turn.  This is a simplified scenario compared to the

822: way in which a fully operational cooperative topology discovery

823: protocol might function, which is to say with all of the monitors

824: probing and communicating in parallel. However, we feel that the

825: scenario allows greater realism in the study of intra-monitor

826: redundancy. The typical monitor in a large, highly distributed

827: infrastructure will begin its probing in a situation in which much

828: of the topology has already been discovered by other monitors. The

829: closest we can get to simulating the experience of such a monitor

830: is by studying what happens to the last in our random sequence of

831: monitors.  All Doubletree intra-monitor redundancy results are for

832: the last monitor in the sequence.  (Inter-monitor redundancy, on

833: the other hand, is monitor independent.)

834:

835: \subsubsection{Results}\label{algo.tension.results}

836:

837:   \begin{figure}[!tbp]

838:     \begin{center}

839:       \includegraphics[width=8cm]{Pictures/Redundancy/TensionRedundancy.eps}

840:     \end{center}

841:     \caption{Doubletree redundancy, \nth{95} percentile.  Inter-monitor redundancy on destinations,

842:       gross redundancy on router interfaces.}

843:     \label{algo.tension.redundancy}

844:   \end{figure}

845:

846: Since the value $p$ has a direct effect on the redundancy of

847: destination interfaces, we initially look at the effect of $p$

848: separately on destination redundancy and on router interface

849: redundancy.  We are most concerned about destination redundancy

850: because of its tendency to appear like a DDoS attack, and we are

851: concerned in particular with the inter-monitor redundancy on these

852: destinations, because a variety of sources is a prime indicator of

853: such an attack.  The right-side vertical axis of

854: Fig.~\ref{algo.tension.redundancy} displays destination

855: redundancy.  With regard to router interface redundancy, we are

856: concerned with overall load, and so we consider a combined intra-

857: and inter-monitor redundancy measure that we call \dfn{gross

858: redundancy}, which counts the total number of visits to an

859: interface.  For both destinations and router interfaces, we are

860: concerned with the extreme values, so we consider the \nth{95}

861: percentile.

862:
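In symbols, and using notation introduced here only for illustration:
if $r_m(i)$ denotes the number of visits that monitor $m$ makes to
interface $i$, and $M$ is the set of monitors, then
\[
  R_{\mathrm{gross}}(i) = \sum_{m \in M} r_m(i).
\]
Since inter-monitor redundancy counts each monitor at most once per
interface, it equals the number of non-zero terms in this sum and is
therefore never larger than $R_{\mathrm{gross}}(i)$.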

863: As expected, the \nth{95} percentile inter-monitor redundancy on

864: destinations increases with $p$. Values increase until $p=0.5$, at

865: which point they plateau at 24.  The point $p=0.5$ is the point at

866: which, in 50\% of the cases, the probe sent to a distance $h$ hits

867: a destination.  Doubletree allows a reduction in \nth{95}

868: percentile inter-monitor redundancy when compared to classical

869: probing for lower values of $p$.  The reduction is 84\% when

870: $p=0$.

871:

872: As opposed to destination redundancy, the \nth{95} percentile

873: gross router interface redundancy decreases with $p$.  The

874: \nth{95} percentile for the internal interface gross redundancy

875: using the classical approach is 449. Doubletree thus allows a

876: reduction between 59\% ($p=0$) and 72\% ($p=1$).

877:

878: This preliminary analysis suggests that Doubletree should employ a

879: low value for $p$, certainly below 0.5, in order to reduce

880: inter-monitor redundancy on destinations.  This is a very

881: different approach than that taken by Mercator, which attempts to

882: hit a destination every time.  On the other hand, too low a value

883: will have a negative impact on router interfaces.  We now examine

884: other evidence that will bear on our choice of $p$.

885:

886:   \begin{figure}[!tbp]

887:     \begin{center}

888:       \includegraphics[width=8cm]{Pictures/Redundancy/PercentageSeen.eps}

889:     \end{center}

890:     \caption{Link and node coverage in comparison to classic probing}

891:     \label{algo.tension.coverage}

892:   \end{figure}

893:

894: Fig.~\ref{algo.tension.coverage} illustrates the effects of $p$ on

895: the node and link coverage percentage in comparison to classic

896: probing. As we can see, the coverage increases with $p$ but a

897: small decrease is noticed for values of $p$ greater than $0.7$.

898: Maximum coverage is reached when $p=0.7$: Doubletree

899: discovers 95.49\% of links and 98.4\% of nodes.  Minimum

900: coverage occurs when $p=0$: 77\% of links and 89\% of nodes.

901: However, link coverage grows rapidly for $p$ values between $0$

902: and $0.4$. After that point, a kind of plateau is reached, before

903: a small decrease.

904:

905: Fig.~\ref{algo.tension.coverage} shows that our algorithm discovers

906: a satisfactory share of the topology (i.e., links and nodes),

907: especially for non-zero values of $p$.

908:

909:   \begin{figure}[!tbp]

910:     \begin{center}

911:       \includegraphics[width=8cm]{Pictures/Redundancy/NbVisit.eps}

912:     \end{center}

913:     \caption{Number of probes sent}

914:     \label{algo.tension.visit}

915:   \end{figure}

916:

917: Fig.~\ref{algo.tension.visit} shows the effects of $p$ on the

918: number of probes sent.  The horizontal axis indicates the value

919: for $p$.  The vertical axis represents the number of probes sent.

920: If we consider an ideal system in which each probe sent visits a

921: new interface (i.e., there is no redundancy), the number of probes

922: sent to discover all the interfaces will be 131,078 (i.e., the

923: number of different interfaces in the data set).  The lowest

924: reference line in the plot marks this value. Furthermore, if the ideal

925: system is also able to discover links without any redundancy, then

926: it should send 279,798 probes. The second reference line marks

927: that value.  On the other hand, if our system works

928: like skitter, it has to send 19,280,551 probes.  This is

929: represented by the highest reference line.  In order to plot all

930: these lines on the same figure, the vertical axis has been plotted

931: on a log scale.  With Doubletree, the number of probes needed

932: varies between 11,684,439 (i.e., a reduction of 40\% in comparison

933: to the classical approach) and 5,330,098 (i.e., a reduction of

934: 73\% in comparison to the classical approach).   This minimum is

935: reached when $p=0.12$.

936:
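As a check on the orders of magnitude, the reductions can be recomputed
directly from the probe counts quoted above, taking the skitter-like
count of 19,280,551 probes as the baseline:
\begin{eqnarray*}
  1 - \frac{11{,}684{,}439}{19{,}280{,}551} & \approx & 0.39, \\
  1 - \frac{5{,}330{,}098}{19{,}280{,}551}  & \approx & 0.72.
\end{eqnarray*}
These agree with the quoted reductions of 40\% and 73\% to within a
percentage point. Even at its minimum, Doubletree still sends about
$5{,}330{,}098 / 279{,}798 \approx 19$ times as many probes as the ideal
redundancy-free system that discovers every link.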

937: In the skitter data set that we consider, not all nodes (internal

938: interfaces and destinations) necessarily reply to probes. An

939: internal interface might not respond to probes because the router

940: does not send ICMP messages, or is too busy. Usually, destinations

941: do not reply because of security policy. In our study, we consider

942: two kinds of nodes that do not reply to probes: the

943: \dfn{non-responding nodes} and the \dfn{unidentifiable nodes}.

944: Non-responding nodes appear when a node does not respond in the

945: path, but there are other interfaces (either router or

946: destination) that respond at a more distant hop count.  On the

947: other hand, unidentifiable nodes appear at the end of the path,

948: when skitter does not complete a path. As we do not know if these

949: nodes are destinations or not, we consider them to be

950: unidentifiable.

951:

952: Table~\ref{knownUnknown} compares Doubletree with classic probing

953: with respect to non-responding and unidentifiable nodes.  The table

954: shows that Doubletree reduces the load on non-responding nodes for all values of $p$, and strongly reduces the load on unidentifiable nodes for low values of $p$.

955: We note that when $p$ is at its maximum, the load on

956: unidentifiable nodes is identical to that of the classical approach.

957:

958:   \begin{table}[!tbp]

959:     \begin{center}

960:       \begin{tabular}{l|cc}

961:         & Non-responding & Unidentifiable\\

962:         \hline

963:         classic  & 126,168 & 512,764\\

964:         \hline

965:         $p=0$    & 34,867 & 62,009\\

966:         $p=0.05$ & 26,136 & 127,917\\

967:         $p=0.10$ & 28,857 & 162,845\\

968:         $p=0.15$ & 31,792 & 196,220\\

969:         $p=0.20$ & 34,624 & 232,215\\

970:         $p=0.50$ & 47,362 & 383,000\\

971:         $p=1$    & 52,422 & 512,764\\

972:       \end{tabular}

973:     \end{center}

974:     \caption{Load on anonymous interfaces}

975:     \label{knownUnknown}

976:   \end{table}

977:

978: Out of concern that our solution might be too closely fitted

979: to our data set, we perform the same experiment on another data

980: set of 50,000 destinations, randomly chosen from the whole set.

981: There is no overlap between the two destination subsets (i.e.,

982: they are totally disjoint).  We find that the results obtained

983: with the second data set are consistent with the first one.

984:

985: The results presented in this section are important in the case

986: of a highly distributed measurement tool.  They demonstrate that

987: it is possible to probe in a network friendly manner while

988: maintaining a very high level of topological information gathered

989: by monitors.

990:

991: In this section, we discussed the effects of different $p$ values.

992: The results allow us to identify a range of values that offers a

993: good compromise between redundancy reduction and a high level of

994: coverage.  Hitting a destination with the very

995: first probe in 20\% of the cases seems to us to be a reasonable

996: maximum.  In terms of coverage, a probability $p$ of

997: $0.05$ seems to be a reasonable minimum.

998:

999: From the perspective of a real system implementing our algorithm,

1000: the value $p$ (and the corresponding $h$) cannot be chosen a

1001: priori, as we did in our experiments.  However, it can be

1002: easily computed on the fly by using an iterative process, as the

1003: monitor's knowledge about paths and topology improves.

1004:

1005: \subsection{Redundancy Reduction}\label{algo.redundancy}

1006:

1007: In this section, we study the effects of Doubletree on the intra-

1008: and inter-monitor redundancy for some values of $p$.

1009:

1010: Sec.~\ref{algo.redundancy.methodology} describes our methodology.

1011: Sec.~\ref{algo.redundancy.intra} presents the intra- and

1012: Sec.~\ref{algo.redundancy.inter} the inter-monitor redundancy

1013: reduction.

1014:

1015: \subsubsection{Methodology}\label{algo.redundancy.methodology}

1016:

1017: We use the simulator to study the effects of Doubletree on intra-

1018: and inter-monitor redundancy.  Again, for the sake of comparison, we

1019: use the same data set as in Sec.~\ref{redundancy}.

1020:

1021: The plots are presented in the same way as in

1022: Sec.~\ref{redundancy}. However, the lower part of the graphs, the

1023: histograms, contains additional information.  The bars are now

1024: enveloped by a curve. This curve indicates, for each hop, the

1025: quantity of nodes discovered while using the classical method.

1026: The bars themselves describe the number of nodes discovered by

1027: Doubletree. Therefore, the space between the bars and the curve

1028: represents the quantity of nodes Doubletree misses.

1029:

1030: In Sec.~\ref{algo.tension}, we identify the range of $p$ values for which

1031: redundancy is sufficiently low and coverage high enough. We run simulations

1032: for $p=0.05$, $p=0.1$, $p=0.15$ and $p=0.2$ and study the effects on inter-

1033: and intra-monitor redundancy reduction. However, we note that the

1034: differences between the results for each $p$ value are small. Therefore, we

1035: choose to present in the following sections only the results for $p=0.05$.

1036:

1037: \subsubsection{Intra-monitor}\label{algo.redundancy.intra}

1038:

1039:   \begin{figure}[!tbp]

1040:     \begin{center}

1041:       \includegraphics[width=5cm]{Pictures/Algorithm/EfficientIntraSequence.champagne14.eps}

1042:     \end{center}

1043:     \caption{Intra-monitor redundancy for the \texttt{champagne}

1044:       monitor with $p=0.05$.}

1045:     \label{algo.redundancy.intra.fig}

1046:   \end{figure}

1047:

1048: Fig.~\ref{algo.redundancy.intra.fig} shows intra-monitor redundancy when

1049: using Doubletree with $p=0.05$ for a representative monitor: \url{champagne}.

1050:

1051: First of all, we note that, using Doubletree,

1052: \url{champagne} is able to elicit 97\% of the interfaces in

1053: comparison to the classical method.

1054:

1055: Looking to the right part of the plot first, we note that the

1056: median has a redundancy of 3.  It is a little bit higher than for

1057: the classical method.  As in the classical approach, for a very

1058: small number of interfaces there is a high redundancy.

1059: Nevertheless, the maximum is 15,029. Compared to the 50,000 in the

1060: classical approach, there is a reduction of 70\%.

1061:

1062: Looking now at how the redundancy varies by distance, we note a

1063: strong reduction for the median values close to the monitor.

1064: However, for further hops, the median values drop lower than in

1065: the classical approach (see, for comparison,

1066: Fig.~\ref{redundancy.intra.champ}). Finally, we note that high

1067: quantiles for hops far from the source have higher values than for

1068: the classical method.

1069:

1070: \subsubsection{Inter-monitor}\label{algo.redundancy.inter}

1071:

1072:   \begin{figure}[!t]

1073:     \begin{center}

1074:       \includegraphics[width=5cm]{Pictures/Algorithm/EfficientInterSequence1.eps}

1075:     \end{center}

1076:     \caption{Inter-monitor redundancy with $p=0.05$.}

1077:     \label{algo.redundancy.inter.fig}

1078:   \end{figure}

1079:

1080: Fig.~\ref{algo.redundancy.inter.fig} shows inter-monitor

1081: redundancy when using Doubletree with $p=0.05$.

1082:

1083: We first analyse the lower part of the graph.  The distribution of

1084: hop counts for interfaces shows that most of the undiscovered

1085: interfaces are far from the source.  As most of these nodes are

1086: only visited by a single monitor (see

1087: Fig.~\ref{redundancy.inter.global}), due to the nature of the

1088: global stop set and the stop rule, the risk of missing them is

1089: very high. With a higher value for $p$, we would probably have

1090: discovered them. However, this solution would raise the redundancy

1091: for destinations, as explained in Sec.~\ref{algo.tension.results}.

1092: Those undiscovered nodes are, in a certain sense, the price to pay

1093: to reduce the redundancy. Again, this fact represents the inherent

1094: tension in the topology discovery problem. Some nodes are

1095: sacrificed in order to reduce the redundancy.

1096:

1097: If we compare Fig.~\ref{algo.redundancy.inter.fig} with

1098: Fig.~\ref{redundancy.inter.global}, we can see that the redundancy

1099: is strongly reduced.  The highest value for the median is only 6.

1100: For the classical method, it is equal to the maximum, i.e., 24.

1101: Furthermore, the highest quantiles between hops 4 and 13 are more

1102: dispersed.

1103:

1104: Finally, the right part of the graph, called \textit{all},

1105: indicates that the median value is 2.  If we compare with

1106: Fig.~\ref{redundancy.inter.global}, where the median equals 24, we

1107: note that Doubletree allows a very strong reduction in

1108: inter-monitor redundancy.

1109:

1110: \section{Bloom Filters}\label{bf}

1111:

1112: The algorithm presented in Sec.~\ref{algo.description} requires

1113: that monitors exchange a set of

1114: $(\mathrm{interface},\mathrm{destination})$ pairs.  The maximum

1115: size of the global stop set will be the maximum number of

1116: $(\mathrm{interface},\mathrm{destination})$ pairs in the data set

1117: considered. Our study considers only 50,000 common destinations,

1118: but skitter monitors probe towards a common set of roughly

1119: a million destinations.  Sharing a stop set based on this number

1120: of destinations or even more could lead to a severe communication

1121: overhead.  This should be avoided, or at least strongly limited,

1122: in the case of a highly distributed measurement tool.

1123:

1124: In order to reduce communication bandwidth cost, we propose to use

1125: Bloom filters~\cite{bloom}, a technique that employs hash

1126: functions to conduct lossy compression, and that has already seen

1127: a number of networking applications, as described by Broder

1128: and Mitzenmacher in a 2002 survey~\cite{survey}.  A feature of

1129: Bloom filters is that the tradeoffs between the degree of

1130: compression they offer and the degree of error that their

1131: lossiness introduces are well understood.

1132:

1133: Bloom filters are used for verifying set membership.  The elements

1134: of a set of data items, in this case the (interface,destination)

1135: pairs of Doubletree's global stop set, are each hashed multiple

1136: times to a vector, the filter. Subsequently, set membership can be

1137: tested by examining the hash values that correspond to an

1138: $(\mathrm{interface},\mathrm{destination})$ pair. If a pair is in

1139: the set, the filter will always return true. However, there is a

1140: finite, well defined, probability of a false positive for a pair

1141: that is not in the set.

1142:
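The two operations can be sketched as in Algorithm~\ref{bf.sketch}.
This is the textbook construction rather than a description of our
simulator. The bit vector $V$ of $m$ bits, the $k$ hash functions
$g_1, \ldots, g_k$, and the procedure names are notation introduced here
for illustration.

  \begin{algorithm}[!t]
    \caption{Bloom filter operations (illustrative sketch)}
    \label{bf.sketch}
    \begin{algorithmic}[1]
      \Require A bit vector $V[0 \ldots m-1]$, initially all zero, and hash functions $g_1, \ldots, g_k$ with values in $\{0, \ldots, m-1\}$.
      \Statex
      \Procedure{Insert}{$x$}\Comment{$x$ an (interface,destination) pair}
          \For{$j \leftarrow 1$ \textbf{to} $k$}
              \State $V[g_j(x)] \leftarrow 1$
          \EndFor
      \EndProcedure
      \Statex
      \Procedure{Member}{$x$}
          \For{$j \leftarrow 1$ \textbf{to} $k$}
              \If{$V[g_j(x)] = 0$}
                  \State \textbf{return} false\Comment{$x$ was never inserted}
              \EndIf
          \EndFor
          \State \textbf{return} true\Comment{$x$ inserted, or a false positive}
      \EndProcedure
    \end{algorithmic}
  \end{algorithm}

\textsc{Member} can return true for a pair that was never inserted if
other insertions happen to have set all $k$ of its bits, which is the
source of false positives. It cannot return false for an inserted pair,
since bits are never cleared.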

1143: As we have described Doubletree in prior sections, each monitor

1144: has full knowledge of what was discovered by the other monitors.

1145: Each application of the stop set rule is thus made with the

1146: highest level of certainty. Now, with the risk of false positives

1147: from Bloom filters, some forwards probing along the tree-like

1148: structure rooted at a destination will stop sooner than would

1149: otherwise be the case.  The rate of false positives can be

1150: fine-tuned by adjusting such parameters as the size of the vector employed

1151: and the number of hash functions. For a given number of elements

1152: and a given vector size, for instance, an optimal number of hash

1153: functions can be chosen to minimize the probability of false

1154: positives.
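
For reference, the standard analysis (see the survey~\cite{survey}) can be stated as follows. The symbols $n$, $m$, and $k$ are notation introduced here for the number of elements, the vector size in bits, and the number of hash functions, and the hash functions are assumed to be independent and uniform. The false positive probability is then
\[
  P_{\mathrm{fp}} = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
  \approx \left(1 - e^{-kn/m}\right)^{k},
\]
and, for fixed $m$ and $n$, the approximation is minimized at
\[
  k = \frac{m}{n}\ln 2 ,
\]
where it takes the value $(1/2)^{k} \approx (0.6185)^{m/n}$. In practice $k$ must be an integer, so a nearby integer value is used.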

1155:

1156: We perform the same experiments as in Sec.~\ref{algo.tension} in

1157: order to obtain a preliminary sense of the effect of Bloom filters

1158: on Doubletree.  The methodology followed is the same as in

1159: Sec.~\ref{algo.tension}, except for the stop set implementation.

1160: We experiment with a low false positive rate. Choosing a vector

1161: that contains ten times the number of bits as there are

1162: $(\mathrm{interface},\mathrm{destination})$ pairs in the global

1163: stop set, and using five hash functions, close to the analytical optimum of $10\ln 2 \approx 6.9$

1164: (Fan et al.~\cite[Sec.~4.3]{bloomMath} make the same choices),

1165: gives a false positive rate of about 0.9\%.
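
The quoted rate can be checked numerically. The sketch below assumes the standard approximation $(1-e^{-kn/m})^{k}$ for a filter of $m$ bits holding $n$ elements with $k$ hash functions. The function name \texttt{false\_positive\_rate} is ours and is used for illustration only.

```python
import math


def false_positive_rate(bits_per_element, num_hashes):
    """Approximate Bloom filter false positive rate, (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes


# Ten bits per (interface, destination) pair and five hash functions,
# the parameters used in the text.
rate = false_positive_rate(bits_per_element=10, num_hashes=5)
print(f"{rate:.4f}")  # prints 0.0094
```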

1166:

1167: Since a pair of IPv4 addresses consists of 64 bits, the Bloom

1168: filter provides a 6.4:1 compression ratio.  This, of course, is a

1169: first approximation, because the pair information could be

1170: compressed using standard lossless compression techniques.

1171: Likewise, Mitzenmacher~\cite{compressedBloom} has described

1172: effective techniques for compression of Bloom filters, techniques

1173: that have the effect of lowering the false positive rate. We have

1174: yet to evaluate what the compression ratio would be if we were to

1175: compare compressed pair lists with compressed Bloom filters.
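
The 6.4:1 figure above is simply the ratio of the two per-pair sizes, taking 32 bits per IPv4 address and the ten filter bits per pair chosen earlier:
\[
  \frac{2 \times 32~\mathrm{bits}}{10~\mathrm{bits}} = 6.4 .
\]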

1176:

1177:   \begin{figure}[!tbp]

1178:     \begin{center}

1179:       \includegraphics[width=8cm]{Pictures/Algorithm/PercentageSeenWithBloom.eps}

1180:     \end{center}

1181:     \caption{Link and node coverage when using Bloom filters}

1182:     \label{bf.coverage}

1183:   \end{figure}

1184:

1185: Fig.~\ref{bf.coverage} presents the coverage in terms of links and

1186: nodes when using a Bloom filter with the parameters just

1187: described.  The level of coverage runs from 99\% ($p=0$) to

1188: 99.5\% for nodes and from 97\% ($p=0$) to 98.6\% for links. The

1189: coverage values are more uniform across values of $p$ than they

1190: were for the standard global stop set.  This coverage is better

1191: than the results we describe in

1192: Sec.~\ref{algo.tension.methodology}, which do not use Bloom

1193: filters.

1194:

1195: This result is counterintuitive because we would expect that false

1196: positives would have the effect of constraining exploration rather

1197: than promoting it. It seems all the more surprising that coverage

1198: should increase at $p=1$, when the global stop set, or its Bloom

1199: filter replacement, is not even applied for backwards probing. To

1200: address this second question first, recall that when an interface

1201: at hop distance $h$ does not respond to probes, $h$ is halved and

1202: halved again until a responding interface is found. Both forwards

1203: and backwards probing then take place from this new $h$.  In the

1204: data set that we consider, approximately 40\% of destinations do

1205: not respond. Consequently, when $p=1$, in fully 40\% of the cases

1206: Doubletree performs forwards probing, and can use the global stop

1207: set implemented as a Bloom filter.
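
The halving rule recalled above can be sketched as follows. This is an illustration only: the function name \texttt{first\_responding\_hop}, the \texttt{responds} callback, the use of integer division, and the fallback to hop 1 are assumptions, not details taken from the Doubletree specification.

```python
def first_responding_hop(h, responds):
    """Halve h until responds(h) is true; return the hop to start from.

    `responds` is a hypothetical callback that reports whether the
    interface at hop distance h answers probes.
    """
    while h > 1 and not responds(h):
        h //= 2  # assumption: integer halving
    return h


# Toy path in which only hops 1 to 3 answer probes.
start = first_responding_hop(12, lambda hop: hop <= 3)
print(start)  # prints 3 (12 -> 6 -> 3)
```

Forwards and backwards probing would then both start from the returned hop distance.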

1208:

1209: %coverage increasing is a surprise.

1210: Regarding the increase in coverage overall, we can only speculate

1211: as to the reasons.  It might be that simply

1212: introducing a degree of randomness into the exploration process

1213: enhances discovery.  Perhaps explorations that are blocked by a

1214: false positive leave the way open for further explorations because

1215: fewer pairs enter the stop set.  An assessment of the effect of

1216: introducing randomness into the stopping rule (both false

1217: positives and false negatives, independently of the use of Bloom

1218: filters for the stop set) is a subject for future work.

1219:

1220: These initial results are encouraging from the point of view of

1221: node and link coverage.  However, the price comes in the form of

1222: additional probe traffic. The reduction in the number of probes

1223: compared to the classical approach is only 8\% when $p=0$.

1224: Nevertheless, the number of probes sent decreases as $p$

1225: increases, with the reduction ranging between 8\% and 59\%.  The gross

1226: redundancy on internal interfaces is also higher when $p=0$ (398)

1227: but it also decreases when $p$ increases. One concern is that the

1228: gains are reversed regarding inter-monitor redundancy on

1229: destinations.  The \nth{95} percentile redundancy is at the

1230: maximum (i.e., 24) for each value of $p$.  A full exploration of

1231: the trade-offs involved in the use of Bloom filters is a subject

1232: for future work.

1233:

1234: \section{Conclusion}\label{conclusion}

1235:

1236: In this paper, we quantify the amount of redundancy in classical

1237: internet topology discovery approaches by taking into account the

1238: perspective of a single monitor (intra-monitor) and that of

1239: the entire system (inter-monitor).  In the intra-monitor case, we

1240: find that interfaces close to the monitor suffer from a high

1241: number of repeat visits. Concerning inter-monitor redundancy, we

1242: see that a large portion of interfaces are visited by all

1243: monitors.

1244:

1245: In order to scale up classical approaches, we have proposed

1246: Doubletree, an algorithm that significantly reduces the

1247: duplication of effort while discovering nearly the same set of

1248: nodes and links. Doubletree simultaneously meets the conflicting

1249: demands of reducing intra- and inter-monitor redundancy. We

1250: describe how to tune a single parameter for Doubletree in order to

1251: obtain an acceptable trade-off between redundancy and coverage.

1252:

1253: Doubletree introduces communication between monitors.  To address

1254: the problem of bandwidth consumption, we propose to encode this

1255: communication through the use of Bloom filters. Surprisingly, we

1256: find that this encoding technique, though it generates false

1257: positives that might seem to constrain exploration, can actually

1258: increase the coverage of nodes and links.

1259:

1260: For future work, we plan to study in detail the trade-offs

1261: involved in the use of Bloom filters. How does the choice of vector

1262: size and number of hash functions affect levels of redundancy and

1263: coverage?  We will also evaluate the relevance of introducing a

1264: certain level of false negatives to the stop set.

1265:

1266: A probing technique that starts probing at a hop $h$ far from the

1267: monitor has a nonzero probability $p$ of hitting a destination

1268: with its first probe. This has serious consequences when scaling

1269: up the number of monitors.  Indeed, the average impact on

1270: destinations will grow linearly as a function of $m$, the number

1271: of monitors.  As $m$ increases, the risk that probing will appear

1272: to be a DDoS attack will grow.
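
To make the linear growth explicit, suppose, as a simplifying assumption, that each of the $m$ monitors sends one first probe towards a given destination and hits it with the same probability $p$. By linearity of expectation, the expected number of first probes that reach the destination is
\[
  \sum_{i=1}^{m} p = mp ,
\]
which grows linearly with $m$ for fixed $p$.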

1273:

1274: In order to permit greater scaling, we have started to investigate

1275: techniques for dividing up the monitor set and the destination set

1276: into subsets that we call \dfn{clusters}. By placing an upper

1277: bound on the number of monitors in a cluster, we hope to place a

1278: definitive upper bound on inter-monitor redundancy for destination

1279: interfaces.  Clustering will have effects on redundancy and

1280: coverage, and we are investigating these trade-offs.

1281:

1282: \section*{Acknowledgments}

1283:

1284: Without the data provided by k claffy and her team at

1285: \textsc{Caida}, this research would not have been possible.  They

1286: have also provided many helpful comments; for this, we thank Andre

1287: Broido in particular.  Pierre Lafon and his team at the Centre de

1288: Calcul Formel M\'edicis, Laboratoire Stix, Ecole Polytechnique,

1289: kindly gave us access to their computing cluster, allowing faster

1290: and easier simulations.  We thank our partners in the

1291: traceroute@home project, notably Matthieu Latapy, Alessandro

1292: Vespignani, and Alain Barrat, for their support and feedback.

1293:

1294: \bibliographystyle{IEEE}

1295: \bibliography{Bibliography}

1296:

1297: \clearpage

1298:

1299: \appendix

1300:

1301: \onecolumn

1302:

1303: \section{Intra-monitor redundancy plots}\label{appendix.redundancy}

1304:

1305: This appendix presents the intra-monitor redundancy plots for all

1306: 24 skitter monitors that form the basis for the study in this

1307: paper.  For each monitor, we show the redundancy of a classic

1308: topology discovery system.  For 18 of the monitors, we also show

1309: the result of a system applying the Doubletree algorithm with

1310: parameter $p=0.05$.  In each of these cases, the monitor is the

1311: last of the 24 to conduct its probing, using the global stop set

1312: that has been passed to it by the other monitors.

1313:

1314: \begin{figure*}[htbp]

1315:   \begin{center}

1316:     \subfigure[\texttt{apan-jp} classic]{\label{appendix.redundancy.intra.apan.skitter}

1317:         \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/apan-jp.intra.Quantile.eps}}

1318:     \subfigure[\texttt{cam} classic]{\label{appendix.redundancy.intra.cam.classic}

1319:         \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/cam.intra.Quantile.eps}}

1320:     \subfigure[\texttt{h-root} classic]{\label{appendix.redundancy.intra.hroot.classic}

1321:         \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/h-root.intra.Quantile.eps}}

1322:     \subfigure[\texttt{i-root} classic]{\label{appendix.redundancy.intra.iroot.classic}

1323:         \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/i-root.intra.Quantile.eps}}

1324:     \subfigure[\texttt{k-root} classic]{\label{appendix.redundancy.intra.kroot.classic}

1325:         \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/k-root.intra.Quantile.eps}}

1326:     \subfigure[\texttt{uoregon} classic]{\label{appendix.redundancy.intra.uoregon.classic}

1327:         \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/uoregon.intra.Quantile.eps}}

1328:   \end{center}

1329:   \caption{\texttt{apan-jp}, \texttt{cam}, \texttt{h-root}, \texttt{i-root}, \texttt{k-root}, and \texttt{uoregon}}

1330: \end{figure*}

1331:

1332: \clearpage

1333:

1334: \begin{figure*}[htbp]

1335:   \begin{center}

1336:     \mbox{

1337:         \subfigure[\texttt{arin} classic]{\label{appendix.redundancy.intra.arin.classic}

1338:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/arin.intra.Quantile.eps}}

1339:             \qquad

1340:         \subfigure[\texttt{arin} Doubletree ($p=0.05$)]{\label{appendix.redundancy.intra.arin.doubletree.p005}

1341:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.arin9.eps}}

1342:     }

1343:     \mbox{

1344:         \subfigure[\texttt{b-root} classic]{\label{appendix.redundancy.intra.broot.classic}

1345:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/b-root.intra.Quantile.eps}}

1346:             \qquad

1347:         \subfigure[\texttt{b-root} Doubletree ($p=0.05$)]{\label{appendix.redundancy.intra.broot.doubletree.p005}

1348:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.b-root7.eps}}

1349:     }

1350:     \mbox{

1351:         \subfigure[\texttt{cdg-rssac} classic]{\label{appendix.redundancy.intra.cdg.classic}

1352:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/cdg-rssac.intra.Quantile.eps}}

1353:             \qquad

1354:         \subfigure[\texttt{cdg-rssac} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.cdg.doubletree.p005}

1355:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.cdg-rssac11.eps}}

1356:     }

1357:   \end{center}

1358:   \caption{\texttt{arin}, \texttt{b-root}, and \texttt{cdg-rssac}}

1359: \end{figure*}

1360:

1361: \clearpage

1362:

1363: \begin{figure*}[htbp]

1364:   \begin{center}

1365:     \mbox{

1366:         \subfigure[\texttt{champagne} classic]{\label{appendix.redundancy.intra.champ.classic}

1367:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/champagne.intra.Quantile.eps}}

1368:             \qquad

1369:         \subfigure[\texttt{champagne} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.champ.doubletree.p005}

1370:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.champagne14.eps}}

1371:     }

1372:     \mbox{

1373:         \subfigure[\texttt{d-root} classic]{\label{appendix.redundancy.intra.droot.classic}

1374:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/d-root.intra.Quantile.eps}}

1375:             \qquad

1376:         \subfigure[\texttt{d-root} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.droot.doubletree.p005}

1377:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.d-root31.eps}}

1378:     }

1379:     \mbox{

1380:         \subfigure[\texttt{e-root} classic]{\label{appendix.redundancy.intra.eroot.classic}

1381:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/e-root.intra.Quantile.eps}}

1382:             \qquad

1383:         \subfigure[\texttt{e-root} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.eroot.doubletree.p005}

1384:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.e-root10.eps}}

1385:     }

1386:   \end{center}

1387:   \caption{\texttt{champagne}, \texttt{d-root}, and \texttt{e-root}}

1388: \end{figure*}

1389:

1390: \clearpage

1391:

1392: \begin{figure*}[htbp]

1393:   \begin{center}

1394:     \mbox{

1395:         \subfigure[\texttt{f-root} classic]{\label{appendix.redundancy.intra.froot.classic}

1396:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/f-root.intra.Quantile.eps}}

1397:             \qquad

1398:         \subfigure[\texttt{f-root} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.froot.classic.doubletree.p005}

1399:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.f-root18.eps}}

1400:     }

1401:     \mbox{

1402:         \subfigure[\texttt{g-root} classic]{\label{appendix.redundancy.intra.groot.classic}

1403:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/g-root.intra.Quantile.eps}}

1404:             \qquad

1405:         \subfigure[\texttt{g-root} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.groot.doubletree.p005}

1406:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.g-root5.eps}}

1407:     }

1408:     \mbox{

1409:         \subfigure[\texttt{iad} classic]{\label{appendix.redundancy.intra.iad.classic}

1410:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/iad.intra.Quantile.eps}}

1411:             \qquad

1412:         \subfigure[\texttt{iad} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.iad.doubletree.p005}

1413:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.iad33.eps}}

1414:     }

1415:   \end{center}

1416:   \caption{\texttt{f-root}, \texttt{g-root}, and \texttt{iad}}

1417: \end{figure*}

1418:

1419: \clearpage

1420:

1421: \begin{figure*}[htbp]

1422:   \begin{center}

1423:     \mbox{

1424:     \subfigure[\texttt{ihug} classic]{\label{appendix.redundancy.intra.ihug.classic}

1425:       \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/ihug.intra.Quantile.eps}}

1426:       \qquad

1427:     \subfigure[\texttt{ihug} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.ihug.doubletree.p005}

1428:       \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.ihug16.eps}}

1429:     }

1430:     \mbox{

1431:     \subfigure[\texttt{k-peer} classic]{\label{appendix.redundancy.intra.kpeer.classic}

1432:       \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/k-peer.intra.Quantile.eps}}

1433:       \qquad

1434:     \subfigure[\texttt{k-peer} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.kpeer.doubletree.p005}

1435:       \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.k-peer1.eps}}

1436:     }

1437:     \mbox{

1438:     \subfigure[\texttt{lhr} classic]{\label{appendix.redundancy.intra.lhr.classic}

1439:       \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/lhr.intra.Quantile.eps}}

1440:       \qquad

1441:     \subfigure[\texttt{lhr} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.lhr.doubletree.p005}

1442:       \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.lhr6.eps}}

1443:     }

1444:   \end{center}

1445:   \caption{\texttt{ihug}, \texttt{k-peer}, and \texttt{lhr}}

1446: \end{figure*}

1447:

1448: \clearpage

1449:

1450: \begin{figure*}[htbp]

1451:   \begin{center}

1452:     \mbox{

1453:         \subfigure[\texttt{m-root} classic]{\label{appendix.redundancy.intra.mroot.classic}

1454:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/m-root.intra.Quantile.eps}}

1455:         \qquad

1456:         \subfigure[\texttt{m-root} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.mroot.doubletree.p005}

1457:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.m-root21.eps}}

1458:     }

1459:     \mbox{

1460:         \subfigure[\texttt{mwest} classic]{\label{appendix.redundancy.intra.mwest.classic}

1461:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/mwest.intra.Quantile.eps}}

1462:         \qquad

1463:         \subfigure[\texttt{mwest} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.mwest.doubletree.p005}

1464:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.mwest3.eps}}

1465:     }

1466:     \mbox {

1467:         \subfigure[\texttt{nrt} classic]{\label{appendix.redundancy.intra.nrt.classic}

1468:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/nrt.intra.Quantile.eps}}

1469:         \qquad

1470:         \subfigure[\texttt{nrt} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.nrt.doubletree.p005}

1471:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.nrt4.eps}}

1472:     }

1473:   \end{center}

1474:   \caption{\texttt{m-root}, \texttt{mwest}, and \texttt{nrt}}

1475: \end{figure*}

1476:

1477: \clearpage

1478:

1479: \begin{figure*}[htbp]

1480:   \begin{center}

1481:     \mbox{

1482:         \subfigure[\texttt{riesling} classic]{\label{appendix.redundancy.intra.riesling.classic}

1483:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/riesling.intra.Quantile.eps}}

1484:             \qquad

1485:         \subfigure[\texttt{riesling} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.riesling.doubletree.p005}

1486:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.riesling17.eps}}

1487:     }

1488:     \mbox{

1489:         \subfigure[\texttt{sjc} classic]{\label{appendix.redundancy.intra.sjc.classic}

1490:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/sjc.intra.Quantile.eps}}

1491:             \qquad

1492:         \subfigure[\texttt{sjc} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.sjc.doubletree.p005}

1493:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.sjc2.eps}}

1494:     }

1495:     \mbox{

1496:         \subfigure[\texttt{yto} classic]{\label{appendix.redundancy.intra.yto.classic}

1497:             \includegraphics[width=5cm]{Pictures/Appendix/Redundancy/yto.intra.Quantile.eps}}

1498:             \qquad

1499:         \subfigure[\texttt{yto} Doubletree ($p=0.05$)]{\label{appendix.algo.intra.yto.doubletree.p005}

1500:             \includegraphics[width=5cm]{Pictures/Appendix/Algorithm/EfficientIntraSequence.yto26.eps}}

1501:     }

1502:   \end{center}

1503:   \caption{\texttt{riesling}, \texttt{sjc}, and \texttt{yto}}

1504: \end{figure*}

1505:

1506: \end{document}

1507: