
\section{Introduction}



Next-generation network technologies, such as sensor networks, peer-to-peer
networks, ad-hoc wireless networks, and overlay networks, pose challenges not
previously encountered in the Internet infrastructure. These networks operate on
large topologies that are highly dynamic in both cost and connectivity. In these
settings, single-path routing protocols, the mainstay of current networks, suffer
from route flapping or temporary loss of connectivity when the primary path
fails. Moreover, these protocols do not exploit the graph connectivity between a
sender and a receiver to improve performance. Meeting these requirements calls
for routing protocols that address a number of novel performance metrics.


Historically, routing algorithms evolved from networks in which the only
parameters available for making routing decisions were the source and
destination addresses.\footnote{Source routing was used in the early Internet
and is still prevalent in some data center networks, but it does not scale to
graphs of arbitrary diameter.} These parameters alone are not discriminative
enough to avoid loops. Hence, optimality criteria were added to the routing
formulation to eliminate loops, which led to single-path routing; single-path
routing, however, no longer meets the needs of next-generation network
technologies.


To address the needs of these emerging network domains, in this paper we attempt
to build a routing protocol with the following characteristics. First, the
protocol should converge to a solution even in highly dynamic environments using
only local information, i.e., without requiring global knowledge of the
topology. Second, to maximize the bandwidth (and connectivity) between any pair
of nodes, the protocol should route along multiple paths between them. Third,
the protocol should route as efficiently as possible by selecting routes in
inverse proportion to their expected path cost. Fourth, the protocol should
avoid loops as much as possible and guarantee that packets never get stuck in
loops; the emphasis is on loop avoidance rather than loop elimination. Finally,
to be of maximum practical value, the protocol should work within the confines of
the Internet Protocol (IP) specification, whose header fields identify only a
source and a destination. As noted earlier, source routing has been tried in IP
networks, but it was abandoned because of security concerns, insufficient space
in the IP header to carry full source routes, and poor scalability in large
networks.
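
As a minimal sketch of the third requirement, assume each candidate route $i$
has a known expected cost $c_i > 0$ and read ``inverse proportion'' as
$p_i \propto 1/c_i$. The helper name \texttt{inverse\_cost\_split} is
hypothetical and is not part of the protocol described in this paper.

```python
from typing import List, Sequence


def inverse_cost_split(costs: Sequence[float]) -> List[float]:
    """Return routing probabilities proportional to 1/cost (hypothetical helper)."""
    if not costs or any(c <= 0 for c in costs):
        raise ValueError("costs must be a non-empty sequence of positive numbers")
    weights = [1.0 / c for c in costs]   # cheaper routes receive larger weights
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the probabilities sum to one


if __name__ == "__main__":
    # Two routes, the second three times as costly as the first.
    print([round(p, 3) for p in inverse_cost_split([1.0, 3.0])])  # [0.75, 0.25]
```

With this rule, a route three times as costly as another carries one third as
much traffic, yet it is never abandoned entirely.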

Note that these requirements place conflicting demands on routing protocol
design, and different algorithms make different trade-offs in this
multi-constraint space. For instance, distance vector and link state algorithms
achieve loop elimination but are restricted to optimality-based single-path
routing. MOSPF achieves loop-free multi-path routing, but only for paths of
identical cost. Hot potato routing achieves true multi-path routing but pays no
attention to either loops or the `quality' of its paths. The
MPATH~\cite{mpath} algorithm and several of its variants achieve cost-sensitive,
loop-free multi-path routing at the expense of routing table storage
proportional to the number of paths (which can be combinatorial). The
theoretically best, although practically naive, solution would be all-sources,
all-paths routing. This achieves correctness; however, building and maintaining
a complete and correct table of the entire network would be impractical for
networks of any non-trivial size.


Although the source and destination are still the only parameters available for
routing in the Internet infrastructure, there is a degree of freedom that routing
algorithms have so far left unexplored. Single-path {\em deterministic} routing
algorithms must achieve loop elimination at any cost, because a routing loop is
disastrous for such algorithms: every packet that enters the loop keeps
following it until the routing tables change or the packet's time-to-live
expires. A {\em probabilistic} routing algorithm need not be subject to the same
constraint, since a packet caught in a loop can leave it whenever a router on the
loop forwards it along a different interface. Therefore, if we relax the
requirement for loop elimination and instead seek loop avoidance, guaranteeing
that packets exit loops once they have entered them, we gain greater flexibility
in laying out optimization constraints. For this reason, we take a probabilistic
approach and sacrifice loop elimination in favor of loop avoidance.
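
The loop-avoidance argument can be made concrete with a simple model, which is
our illustrative assumption rather than part of the protocol: a packet is
circulating in a loop, and exactly one router on the loop can forward it off the
loop, doing so with probability $1-q$. The number of completed traversals is then
geometrically distributed, so the packet is still looping after $k$ traversals
with probability $q^k$, which tends to zero whenever $q < 1$; deterministic
forwarding corresponds to $q = 1$ and traps the packet indefinitely. All function
names below are hypothetical.

```python
import random


def prob_still_looping(q: float, k: int) -> float:
    """P(packet completes k more traversals), assuming the loop's only exit
    router sends it around the loop again with probability q."""
    return q ** k


def expected_traversals(q: float) -> float:
    """Mean number of completed traversals before exiting (geometric law)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1) for the packet to escape")
    return q / (1.0 - q)


def simulate_traversals(q: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of expected_traversals(q)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1); q = 1 never terminates")
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        laps = 0
        while rng.random() < q:  # the exit router re-enters the loop
            laps += 1
        total += laps
    return total / trials


if __name__ == "__main__":
    q = 0.5
    print(prob_still_looping(q, 10))        # 0.0009765625
    print(expected_traversals(q))           # 1.0
    print(simulate_traversals(q, 100_000))  # close to 1.0
```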

The undue emphasis on optimality has produced algorithms that aggressively
eliminate loops and, in turn, implementations that are intolerant of them.
Conversely, the ability to tolerate loops opens up new exploration strategies
for true cost-sensitive multi-path routing that work under the constraints
presented above. We therefore start from the extreme perspective of {\em
reachability routing}, where the goal is merely to reach a destination. Hot
potato routing can be viewed as the degenerate case of reachability routing, but
we clearly want to do better. This perspective places us in a unique position to
explore the trade-off between eliminating loops and selecting paths efficiently.

Our formulation of reachability routing is probabilistic, multi-path, and
cost-sensitive: it distributes traffic efficiently among all paths leading to a
destination. This type of routing can be viewed as solving an optimization
problem that maximizes the number of paths between two nodes by discovering all
of them, and then derives the probability of routing along a given path from the
costs of the paths leading to the destination.

In particular, we study reachability routing through the lens of reinforcement
learning (RL), which provides a mathematical framework for describing and
solving sequential decision problems modeled as Markov decision processes
(MDPs). The states are the nodes, the actions are the choices of outgoing links,
and the rewards correspond to the path costs associated with the state
transitions. Imposing a value function on the MDP (e.g., the discounted sum of
rewards along a path) yields an optimization problem whose solution is a routing
policy. Intrinsically, this is what all routing algorithms based on dynamic
programming do. However, single-path routing algorithms learn the best {\em
deterministic} policy for the MDP, whereas the routing algorithm in this paper
learns {\em stochastic} policies that achieve cost-sensitive multi-path routing.
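
The contrast between deterministic and stochastic policies on the same MDP can
be sketched on a toy four-node topology of our own choosing. Costs play the role
of (negative) rewards, value iteration in Bellman-Ford form computes each node's
cost-to-go $V$ to the destination, and the deterministic policy takes the link
minimizing link cost plus $V$. The stochastic rule used here, with link
probabilities inversely proportional to link cost plus $V$, is purely an
illustrative assumption; it is not the ant-based update used in this paper. The
topology and all function names are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical topology: bidirectional links with costs; the destination is "D".
LINKS: Dict[Tuple[str, str], float] = {
    ("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 1.0,
    ("B", "D"): 1.0, ("C", "D"): 1.0,
}


def neighbors(links: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Adjacency map: node -> {neighbor: link cost}."""
    adj: Dict[str, Dict[str, float]] = {}
    for (u, v), c in links.items():
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    return adj


def cost_to_go(adj: Dict[str, Dict[str, float]], dest: str) -> Dict[str, float]:
    """Value iteration: V(s) = min over links (s, n) of cost(s, n) + V(n)."""
    value = {n: math.inf for n in adj}
    value[dest] = 0.0
    for _ in range(len(adj) - 1):  # |N| - 1 sweeps suffice, as in Bellman-Ford
        for s in adj:
            if s != dest:
                value[s] = min(c + value[n] for n, c in adj[s].items())
    return value


def deterministic_policy(adj, value, s: str) -> str:
    """Single-path routing: always take the link minimizing cost + cost-to-go."""
    return min(adj[s], key=lambda n: adj[s][n] + value[n])


def stochastic_policy(adj, value, s: str) -> Dict[str, float]:
    """Illustrative multi-path policy (our assumption): P(link) is inversely
    proportional to the link cost plus the neighbor's cost-to-go."""
    weights = {n: 1.0 / (c + value[n]) for n, c in adj[s].items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


if __name__ == "__main__":
    adj = neighbors(LINKS)
    value = cost_to_go(adj, "D")
    print(value)                                   # A: 2, B: 1, C: 1, D: 0
    print(deterministic_policy(adj, value, "A"))   # B
    print(stochastic_policy(adj, value, "A"))      # B: 0.6, C: 0.4 (up to rounding)
```

Note that the stochastic policy at node B sends roughly 2/11 of its traffic back
toward A, so loops can be entered; this is precisely why such a policy must rely
on loop avoidance rather than loop elimination.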


Our previous work~\cite{srinidhi03} indicates that such an approach achieves
true multi-path routing, with traffic distributed among the multiple paths in
inverse proportion to their costs. In addition, for our reachability routing
protocol to be of practical use, we guide our design decisions by the
requirement that the protocol work within the confines of the currently deployed
Internet Protocol (IP) architecture.

While multi-path routing is not new, we believe that our notion of reachability
routing represents a promising new direction in the field. Applied in this way,
reinforcement learning enables reachability routing to optimize overall network
throughput while providing built-in fault tolerance and path redundancy. Further
applications of reinforcement learning in this domain could refine routing
behavior by adaptively tuning the performance parameters of the algorithm in
response to changes in the network topology.

The remainder of this paper is organized as follows. Section~II provides an
overview of reinforcement learning, its applicability to network routing, and
significant previous work on the topic. Section~III introduces a new model-based
routing algorithm based on RL, and Section~IV describes its implementation.
Section~V presents evaluation results, and Section~VI concludes with a summary of
our contributions and directions for future research.

\section{Ants and Reinforcement Learning}
Reinforcement learning~\cite{littman96,sutton98} is the process by which an agent
learns to behave optimally over time through trial-and-error interaction with a
dynamic environment. Reinforcement learning problems are organized into discrete
episodes; for packet routing, an episode consists of a packet finding its way
from its originating source to its intended destination. Routing table
probabilities are initialized to small random values, so routers can begin
routing immediately, although most of their early routing decisions will be
neither optimal nor even desirable. To improve the quality of its routing
decisions, a router can `try out' different links to see whether they produce
good routes, a mode of operation called {\em exploration}. Using the information
learned during exploration to drive future routing decisions is called {\em
exploitation}. Both exploration and exploitation are necessary for effective
routing.

Our RL routing algorithm is a form of ant-colony optimization~\cite{dorigo99}, in
which messages called {\em ants} explore the network and provide reinforcements
for future packet routing. Ants transiting the network give intermediate routers
a sense of the reachability and relative cost of reaching the node from which
each ant originated. To overcome the problems of selective path reinforcement,
which deterministically converges to shortest paths, our model separates the
data collection aspects of the algorithm from the packet routing functionality,
as proposed by Subramanian~et~al.~\cite{subramanian97}. In our model, ants only
gather information about the network; this information is then used to guide
packet routing decisions.

Three parameters must be considered when applying ants in a routing framework:
the rate at which ants are generated, the choice of their destinations, and the
routing policy used for ants. RL algorithms perform iterative stochastic
approximations of an optimal solution, so the rate of ant generation directly
affects their convergence properties, as shown by Di~Caro~et~al. in
AntNet~\cite{dicaro98}. From a practical perspective in multi-path routing, we
would like to choose ant destinations that provide the most useful reinforcement
updates; choosing destinations from a uniform distribution ensures good
exploration. Finally, the policy used to route ants determines which paths the
RL algorithm selectively reinforces. Because our goal is to discover all possible
paths, the policy used to route ants should be independent of the one used for
data traffic. If the two policies are not separated, we end up with the same
problem of selective reinforcement found in the Q-routing~\cite{subramanian97}
algorithm.

In the context of reinforcement learning using ants, effective credit assignment
strategies rely on the expressiveness of the information carried by the ants.
The central idea behind credit assignment is to determine the relative quality
of a route and to apportion blame. In routing, credit assignment creates a
push-pull effect: since the link probabilities must sum to one, positively
reinforcing a link (push) results in negative reinforcement (pull) of the other
links.

In the simplest form of credit assignment, called backward learning, ants carry
information about the ingress router and the path cost as determined by the
network's cost metrics. At the destination, this information can be used to
derive reinforcement for the link along which the ant
arrived~\cite{subramanian97}. Another strategy, known as forward learning,
reinforces the link in the forward direction by sending an ant to a destination
and bouncing it back to the source~\cite{dicaro98}. Subramanian~et~al.~\cite{subramanian97}
adopt the former approach. Ants proceed from randomly chosen sources to
destinations independently of the data traffic. Each ant contains the source
where it was released, its intended destination, and the cost $c$ experienced
thus far. Upon receiving an ant, a router updates its probability for the ant's
source (not its destination) along the interface by which the ant arrived. This
is a form of backward learning and a trick to minimize ant traffic: a single
one-way trip updates the entry for the source at every router the ant visits, so
no return trip is needed.

Specifically, when an ant from source $s$ to destination $d$ arrives along
interface $i_k$ at router $r$, $r$ first updates $c$ (the cost accumulated by the
ant thus far) to include the cost of traversing interface $i_k$ in reverse.
Router $r$ then updates its entry for $s$ by slightly nudging up the probability
for interface $i_k$ (and correspondingly decreasing the probabilities for the
other interfaces). The size of the nudge is a function of the cost $c$
accumulated by the ant. Finally, $r$ routes the ant toward its intended
destination $d$. In particular, the probability $p_k$ for interface $i_k$ is
updated as
\[
p_k \leftarrow \frac{p_k + \Delta p}{1 + \Delta p}, \qquad
p_j \leftarrow \frac{p_j}{1 + \Delta p}, \quad 1 \le j \le n,\ j \ne k,
\]
where $n$ is the number of interfaces at $r$, $\Delta p = \lambda / f(c)$ with
$\lambda > 0$, and $f(c)$ is a non-decreasing function of $c$.
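
The arrival-handling steps above can be sketched as follows. This is a minimal
sketch under our own assumptions: the names \texttt{Ant}, \texttt{Router}, and
\texttt{on\_ant\_arrival} are hypothetical, $f(c) = c$ is used as one admissible
non-decreasing function, each interface has a fixed reverse cost, and a new table
entry starts uniform rather than random so that the example is reproducible.
Summing the updated entries gives
$(p_k + \Delta p + \sum_{j \ne k} p_j)/(1 + \Delta p) = 1$, so the update
preserves normalization, which is exactly the push-pull effect described
earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ant:
    source: str         # router that released the ant
    dest: str           # intended destination
    cost: float = 0.0   # cost c accumulated so far


@dataclass
class Router:
    name: str
    reverse_cost: List[float]   # cost of traversing each interface in reverse
    table: Dict[str, List[float]] = field(default_factory=dict)  # node -> per-interface probabilities
    lam: float = 0.1                            # lambda > 0
    f: Callable[[float], float] = lambda c: c   # assumed non-decreasing f(c)

    def on_ant_arrival(self, ant: Ant, k: int) -> None:
        """Backward learning: reinforce interface k in the entry for the ant's source."""
        ant.cost += self.reverse_cost[k]   # 1. include the link just crossed, in reverse
        n = len(self.reverse_cost)
        probs = self.table.setdefault(ant.source, [1.0 / n] * n)  # uniform start (assumption)
        delta = self.lam / self.f(ant.cost)  # 2. the nudge shrinks as the cost grows
        for j in range(n):
            if j == k:
                probs[j] = (probs[j] + delta) / (1.0 + delta)  # push
            else:
                probs[j] = probs[j] / (1.0 + delta)            # pull
        # 3. Forwarding the ant on toward ant.dest is outside this sketch.


if __name__ == "__main__":
    r = Router("r", reverse_cost=[1.0, 3.0])
    r.on_ant_arrival(Ant(source="s", dest="d"), k=0)
    print(r.table["s"])  # roughly [0.545, 0.455]
```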

Two types of ants, {\em regular ants} and {\em uniform ants}, are supported to
handle the routing aspect of the algorithm. Regular ants are forwarded
probabilistically according to the routing tables, which causes the routing
tables to converge deterministically to the shortest paths in the network.
Regular ants treat the probabilities in the routing tables merely as an
intermediate stage towards learning a deterministic routing table. They are good
exploiters and are beneficial for convergence in static environments. Uniform
ants, in contrast, are forwarded according to a uniform distribution, in which
all links have an equal probability of being chosen. This ensures continued
exploration and helps keep track of dynamic environments. In this case, the
routing tables do not converge to a deterministic answer; rather, the
probabilities are partitioned according to the costs. The constant exploration
maintained by uniform ants ensures a true multi-path forwarding capability.
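
The only difference between the two ant types lies in how the next interface is
chosen, which can be sketched as below. The helper \texttt{next\_interface} and
its string arguments are hypothetical; the sketch uses only Python's standard
\texttt{random.Random.choices} and \texttt{random.Random.choice}.

```python
import random
from collections import Counter
from typing import Sequence


def next_interface(kind: str, probs: Sequence[float], rng: random.Random) -> int:
    """Pick the outgoing interface for an ant (hypothetical helper).

    regular: sample according to the routing-table probabilities (exploitation);
    uniform: every interface is equally likely, regardless of the table (exploration).
    """
    interfaces = range(len(probs))
    if kind == "regular":
        return rng.choices(interfaces, weights=probs, k=1)[0]
    if kind == "uniform":
        return rng.choice(interfaces)
    raise ValueError(f"unknown ant kind: {kind!r}")


if __name__ == "__main__":
    rng = random.Random(42)
    table = [0.9, 0.1, 0.0]  # a table that has nearly converged on interface 0
    for kind in ("regular", "uniform"):
        counts = Counter(next_interface(kind, table, rng) for _ in range(3000))
        print(kind, sorted(counts.items()))
```

Regular ants never probe interface 2 once its probability reaches zero, whereas
uniform ants keep visiting it, which is what lets the table track a link that
later becomes attractive.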

\section{Motivation}
Our primary design objective is to achieve cost-sensitive multi-path forwarding
while avoiding entry into loops as much as possible. We have made a series of
improvements to the uniform ants algorithm proposed by
Subramanian~et~al.~\cite{subramanian97}, culminating in a novel model-based
routing algorithm.

\begin{figure*}
\centering
\includegraphics[scale=0.6]{velcro1}
\vline
\includegraphics[scale=0.6]{velcro2}
\vline
\includegraphics[scale=0.6]{velcro4}
\caption{Velcro topologies with different cost ratios.}
\label{velcro_topo}
\end{figure*}

Let us begin by observing that uniform ants are natural multi-path routers:
according to Proposition~2 of Subramanian~et~al.~\cite{subramanian97}, the
probabilities of choosing the interfaces are in inverse proportion to their cost
ratios. The reader might be tempted to conclude that uniform ants inherently
support reachability routing; however, consider the three velcro topologies of
Figure~\ref{velcro_topo}. These topologies have the same underlying graph
structure but differ in the costs associated with the main branch paths (the
direct path from node 0 to node 19, and the path through nodes 1, 7, and 13).
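
A small numerical sketch illustrates this inverse-cost behavior under
simplifying assumptions of our own: router $r$ has two interfaces toward source
$s$, ants from $s$ arrive along them alternately (equal arrival rates), the
accumulated costs are fixed at $c_0 = 1$ and $c_1 = 3$, and $f(c) = c$.
Repeatedly applying the probability update rule from the previous section (one
arrival on each interface per cycle) drives $p_0$ to the fixed point
$p_0^* = \Delta_0 / (\Delta_0 + \Delta_1 + \Delta_0 \Delta_1)$, where
$\Delta_i = \lambda / f(c_i)$; as $\lambda \to 0$ this approaches
$c_1/(c_0 + c_1) = 0.75$, i.e., inverse proportion to cost. The function names
are hypothetical.

```python
from typing import Callable, List, Sequence


def nudge(probs: Sequence[float], k: int, delta: float) -> List[float]:
    """The update rule: push interface k up by delta, pull the others down."""
    return [(p + delta) / (1 + delta) if j == k else p / (1 + delta)
            for j, p in enumerate(probs)]


def alternate_arrivals(costs: Sequence[float], lam: float = 0.01, cycles: int = 5000,
                       f: Callable[[float], float] = lambda c: c) -> List[float]:
    """Ants arrive along each interface in turn (equal rates; our assumption)."""
    probs = [1.0 / len(costs)] * len(costs)
    for _ in range(cycles):
        for k, c in enumerate(costs):
            probs = nudge(probs, k, lam / f(c))
    return probs


if __name__ == "__main__":
    lam, costs = 0.01, [1.0, 3.0]
    d0, d1 = lam / costs[0], lam / costs[1]
    print(alternate_arrivals(costs, lam))  # p_0 close to 0.75
    print(d0 / (d0 + d1 + d0 * d1))        # analytic fixed point for p_0
```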

Uniform ants explore all available interfaces with equal probability; while this
makes them naturally suited to multi-path routing, it also creates a tendency to
reinforce paths that involve the fewest routing decisions. To see why, recall
that the goodness of an interface is inversely proportional to a non-decreasing
function of the cost of the path along that interface. This cost is not simply
the cost of the shortest path along the interface; it is itself assessed by the
ants during their exploration. Hence, the routing probability for a particular
interface depends implicitly on the number of ways in which a costly path can be
encountered along that interface. The presence of loops along an interface gives
the ants more opportunities to encounter costly paths (causing the interface to
be reinforced negatively) or to loop back to their source (causing their
absorption and, again, no positive reinforcement along the interface).

The basic problem can be summarized by saying that ``interfaces that provide an
inordinate number of options involving loops will not be reinforced, even if
there exist high-quality loop-free sub-paths along those interfaces.''
Mathematically, this is a race between the negative reinforcements due to many
loops (and hence absorptions) and the positive reinforcements due to one (or a
few) short or cheap paths. As a result, the interface with fewer possibilities
for decision making wins, irrespective of the path cost. Hence, in the
topologies shown in Figure~\ref{velcro_topo}, uniform ants will reinforce the
costliest path (left), one of the many cheapest paths (center), and the
cheapest path (right). Notice that using regular ants to prevent this incessant
multiplication of probabilities is not acceptable, as doing so would sacrifice
the multi-path forwarding capability of uniform ants.

513:

514: Ideally, we want our ants to have selective amnesia, behaving as uniform ants

515: when it is important to have multipath forwarding and morphing into regular ants

516: when we do not want loops overshadowing the existence of a cheap, loop-free

517: path. We present a model-based approach that achieves this effect by maintaining

518: a statistics table independent of the routing table. The basic idea is to make

519: routers recognize that they constitute the fulcrum of a loop with respect to a

520: larger path context.

521:

For instance, in Figure~\ref{velcro_topo}, nodes 1, 7, and 13 form fulcrums of
loops, which should play no role in multi-path forwarding from, say, node 0 to
node 19. The statistics table maintains, for each router (node) and
destination, the number of ants the router generated and the number of those
that returned to it without reaching their intended destination. Using these
statistics, node 1 can infer, for instance, that all ants for destination 19
that it sent along the interface leading to node 2 returned to it. This
information can be used to reduce the scope of multi-path forwarding on a
per-destination basis. The statistics table serves as a discriminant function
for the choices indicated by the routing table, while the routing table
reflects the reinforcement provided by the uniform ants.

533:

534: \section{Protocol Model}

535: \subsection{Ant Structure}

Ants are small packets used to explore and gather information about the
network. Periodically, each source node $s$ generates, for every other
destination $d$, ants of the form $[s, d, c, o_i]$, where $c$ is the cost
associated with the ant and $o_i$ is the outgoing interface at the source
router. When an ant is created, its cost $c$ is initialized to $0$. Each router
that receives the ant increments $c$ by the cost in the reverse direction (when
a message traverses a link from node $a$ to node $b$, $c$ is incremented by the
cost of the link from $b$ to $a$). Hence, when the ant reaches the destination
$d$, $c$ is the end-to-end cost of sending a message from $d$ back to $s$ along
the reverse of the path the ant took; this is the cost that $d$ uses when it
updates its routing table entry for $s$. Note that the intermediate nodes along
the path do not update $o_i$.
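The ant format and the reverse-cost accumulation can be illustrated with the
following minimal C sketch (C being the language of our simulator). The type
\texttt{Ant}, its field names, the helpers \texttt{make\_ant} and
\texttt{traverse\_link}, and the use of $-1$ to mark an undefined interface are
illustrative assumptions, not the simulator's actual code.

\begin{verbatim}
/* Hypothetical C view of an ant [s, d, c, o_i]. */
typedef struct {
    int    src;    /* s: node that generated the ant  */
    int    dst;    /* d: intended destination         */
    double cost;   /* c: cost accumulated in reverse  */
    int    out_if; /* o_i: interface chosen at s;
                      -1 means "undefined" (assumed)  */
} Ant;

/* GenerateAnt: cost starts at 0, interface undefined. */
Ant make_ant(int s, int d)
{
    Ant a = { s, d, 0.0, -1 };
    return a;
}

/* The ant has just crossed the link a -> b; cost_ba is
   the cost of the reverse link b -> a. Only c changes;
   o_i is not touched after the ant leaves the source. */
void traverse_link(Ant *ant, double cost_ba)
{
    ant->cost += cost_ba;
}
\end{verbatim}

For an ant that crosses two links whose reverse costs are 2.0 and 3.5,
\texttt{cost} ends at 5.5, the cost of the reverse path back to its source.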

547:

548: \subsection{Routing Table Structure}

549: The routing table at each node is a two-dimensional array of the probabilities

550: of using various interfaces to reach destinations.  $RoutingTable_i[j][k]$, maintained at

551: node $i$, is the probability with which the interface $k$ of node $i$ is chosen

to reach destination $j$. Initially, the probabilities for all destinations are
distributed equally across all the interfaces. This is in line with the
destructive property of RL routing algorithms, in which all interfaces are
``innocent until proven guilty.''

556:

557: \subsection{Statistics Table Structure}

Like the routing table, the statistics table is a two-dimensional structure,
except that each node has two statistics tables. $SentStatTable_i[j][k]$

560: corresponds to the number of ants sent along interface $k$ to destination $j$

561: originating from node $i$, and $ReturnedStatTable_i[j][k]$ is the number of ants

562: sent along the interface $k$ to destination $j$ which returned to their source

563: $i$.

564:

The ant statistics are maintained only at the source node, and not at the
intermediate nodes, to keep the algorithm scalable. If every intermediate node
$n$ along the path of an ant from source $i$ to destination $j$ incremented its
statistics table entry $SentStatTable_n[j][m]$ when forwarding the ant along
interface $m$, the ant would have to record the outgoing interface chosen at
each node along its path, so that each node could recognize the ant if it
looped back. Accommodating such a record in large topologies would lead to
unbounded growth of the ant's size. Moreover, ants are not forwarded once they
reach their destination or return to their source. Consequently, when the
statistics table is updated only at the source node, the source can safely
assume that an ant that does not loop back has reached its destination
(assuming a fully reliable network in which no packets are dropped), whereas
an intermediate node would have no way of determining whether the ant reached
the destination or looped back to its source.

580:

581: \begin{table}

582: \caption{Model-based Ant Routing Algorithm}

583: \label{code}

584: \centering

585: \begin{verbatim}

586: procedure Main

587:   begin:

588:     Uncontrolled Exploration

589:     Controlled Exploration

590:   end.

591:

592: procedure Exploration (Uncontrolled | Controlled)

593:   begin:

594:     for every node in the topology

595:     begin:

596:       GenerateAnt; /* Periodically Generate Ant */

597:       SelectInterface (Uncontrolled | Controlled);

598:       UpdateModel;

599:       ForwardAnt;

600:     end.

601:   end. /* End of exploration procedure */

602:

603: procedure ReceiveAnt

604:   begin:

605:     if the receiving node is the source of the ant

606:     begin:

607:       UpdateModel;

608:       DestroyAnt;

609:     end.

610:     if the receiving node is

611:       neither the source nor the destination

612:     begin:

613:       UpdateRouteTable;

      SelectInterface(Uncontrolled | Controlled);

615:       ForwardAnt;

616:     end.

617:     if the receiving node is the

618:       intended destination of the ant

619:     begin:

620:       UpdateRouteTable;

621:       DestroyAnt;

622:     end.

623:   end. /* End of receive ant procedure */

624: \end{verbatim}

625: \end{table}

626:

627: \subsection{Description of the Algorithm}

An overview of the algorithm is given in Table~\ref{code}. The algorithm

629: consists of two stages: Uncontrolled Exploration and Controlled Exploration. In

630: both forms of exploration, each node periodically generates ants destined for

631: every other node in the topology. The algorithm uses uncontrolled exploration to

632: collect information about the topology and uses that information to build a

633: model to control future exploration at the nodes. The information collected

634: during the controlled exploration is used to update the model as well.  The two

635: forms of exploration work almost identically except for the SelectInterface

636: method. The following is a brief description of the various methods used in the

637: algorithm above.

638:

639: \subsubsection{GenerateAnt}

This method generates an ant of the form $[s, d, 0, \mathit{undefined}]$, where
$s$ is the source node generating the ant and $d$ is the intended destination.
The initial cost $c$ associated with the ant is set to $0$. Because the output
interface is determined afterwards by the SelectInterface method, it is
undefined immediately after the ant is created.

645:

646: \subsubsection{SelectInterface}

Due to the probabilistic nature of the routing algorithm, it is essential that
each node choose the destination of each ant it generates uniformly at random,
so that the numbers of ants generated for the various destinations are nearly
equal. The SelectInterface method is where the two forms of exploration
differ. Both forms choose the output interface uniformly at random, but
Controlled Exploration first restricts the set of interfaces that are eligible
for selection.

654:

655: \begin{itemize}

\item{\bf Uncontrolled Exploration: } Here the choice of the outgoing interface
at each node along the path from the source to the destination is unbiased,
i.e., every interface at that node has an equal probability of being chosen as
the outgoing interface. The node generating the ant chooses one of its
interfaces and forwards the ant along it. If an intermediate node (not the
intended destination) receives an ant along interface $A$ and has interfaces
other than $A$, it forwards the ant on one of those other interfaces, chosen
with equal probability. If it has no other interface, it sends the ant back
along interface $A$.

665:

666: \item{\bf Controlled Exploration: }Here the choice of outgoing interface is

667: controlled by a variable called the threshold factor ($\tau$) ranging from $0$ to

668: $1$. The threshold factor not only affects the multipath capabilities of the routing

669: algorithm, but also its loop-free capabilities and its correctness with respect to

670: the routing of packets (measured by the percentage of packets successfully reaching

671: their intended destinations).

672: \end{itemize}
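A minimal C sketch of the uncontrolled selection rule follows. It assumes that
interfaces are numbered $0$ to $n-1$ and that the source passes $-1$ as the
incoming interface, and it uses \texttt{rand() \% n} for brevity (which is
slightly biased). The function name \texttt{select\_uncontrolled} is
hypothetical.

\begin{verbatim}
#include <stdlib.h>

/* Uncontrolled exploration: uniform choice among all
   interfaces except the incoming one (in_if).
   in_if == -1 at the source (assumed convention).    */
int select_uncontrolled(int n_if, int in_if)
{
    if (in_if < 0)              /* source: any interface */
        return rand() % n_if;
    if (n_if == 1)              /* leaf: send it back    */
        return in_if;
    /* Draw from the n_if - 1 other interfaces and skip
       over in_if, so each of them is equally likely.    */
    int k = rand() % (n_if - 1);
    return (k >= in_if) ? k + 1 : k;
}
\end{verbatim}

Skipping over \texttt{in\_if} avoids rejection sampling: every other interface
maps to exactly one value of \texttt{k}.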

673:

Formally, the threshold factor works in the following manner. When a node $i$
(source or intermediate) needs to forward an ant intended for destination $j$,
it computes the ratio of $ReturnedStatTable_i[j][k]$ to $SentStatTable_i[j][k]$
for each of its interfaces $k \in \{k_1, \ldots, k_n\}$. All interfaces whose
ratios are less than the threshold $\tau$ are eligible for selection as the
forwarding interface, and the selection policy chooses among the eligible
interfaces with equal probability. Three special cases must be handled in
controlled exploration:

682: \begin{itemize}

\item{\bf Case 1}
When an ant arrives at a leaf node, i.e., a node with no interface other than
the incoming one, and that node is not the intended destination, the node sends
the ant back along the same interface.
\item{\bf Case 2}
When all the interfaces at an intermediate node are ineligible, i.e., their
statistics table ratios are at or above the threshold, the node sends the ant
back along the interface on which it received the ant.
\item{\bf Case 3}
When all the interfaces at the source node are ineligible, the source node
uses the uncontrolled exploration selection policy to break the deadlock. This
case occurs very rarely, and only when $\tau$ is set to a very low value.

695: \end{itemize}
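The eligibility test and the three special cases can be combined into a single
selection routine. The C sketch below is one possible reading of the rules
above, under stated assumptions: an interface on which no ant has been sent yet
is treated as having ratio $0$, the incoming interface is never eligible (it is
passed as $-1$ at the source), and at most 64 interfaces are supported. The name
\texttt{select\_controlled} is hypothetical.

\begin{verbatim}
#include <stdlib.h>

/* Controlled exploration at node i for destination j.
   sent[k], ret[k]: SentStatTable_i[j][k] and
   ReturnedStatTable_i[j][k]; tau: threshold factor.
   Assumes n_if <= 64; in_if == -1 at the source.     */
int select_controlled(int n_if, int in_if,
                      const long *sent, const long *ret,
                      double tau)
{
    int eligible[64];
    int m = 0;
    for (int k = 0; k < n_if; k++) {
        if (k == in_if)
            continue;             /* never go straight back */
        double r = sent[k] ? (double)ret[k] / sent[k]
                           : 0.0; /* unused: ratio 0 (assumed) */
        if (r < tau)
            eligible[m++] = k;
    }
    if (m > 0)                    /* uniform among eligible */
        return eligible[rand() % m];
    if (in_if >= 0)               /* Cases 1 and 2: send back */
        return in_if;
    return rand() % n_if;         /* Case 3: uncontrolled */
}
\end{verbatim}

With $\tau = 1$, an interface on which 9 of 10 ants looped back (ratio 0.9)
remains eligible, while one on which all 10 looped back does not, which matches
the behavior described for $\tau = 1$ in the qualitative discussion of the
algorithm.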

696:

697: Once the outgoing interface is selected the next step is to forward the ant

698: along the chosen interface (ForwardAnt).  In the case of source node, before

699: calling the ForwardAnt, UpdateModel is called to update the statistics table.

700:

701: \subsubsection{UpdateModel}

702: This method updates the statistics tables when an ant is generated or loops back

703: to its source.  The correctness and currency of the statistics tables are vital

704: to the performance of the router.  When the node generates the ant $[i, j, c, k]$,

705: it increments its statistic table entry $SentStatTable_i[j][k]$ by $1$ to indicate

706: that interface $k$ was chosen by $i$ to forward the ant intended for destination $j$.

707: Also, when an ant $[i, j, c, k]$ loops back to the source node, the statistic table

708: entry $ReturnedStatTable_i[j][k]$ is incremented by $1$ to indicate that the

choice of interface $k$ to route the ant intended for destination $j$ resulted in

710: a loop.  This can be considered a negative reinforcement in the behavior of the

711: router.
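The bookkeeping performed by UpdateModel amounts to two counters per
(destination, interface) pair. In the C sketch below, the bounds
\texttt{MAX\_NODES} and \texttt{MAX\_IFS}, the struct \texttt{StatTables}, and
the function names are assumptions made for illustration.
\texttt{return\_ratio} computes the quantity that controlled exploration
compares with $\tau$, treating an unused interface as having ratio $0$.

\begin{verbatim}
#define MAX_NODES 256   /* illustrative bounds */
#define MAX_IFS   16

typedef struct {
    long sent[MAX_NODES][MAX_IFS];     /* SentStatTable_i     */
    long returned[MAX_NODES][MAX_IFS]; /* ReturnedStatTable_i */
} StatTables;

/* Node i generated an ant for destination j on k. */
void model_on_send(StatTables *t, int j, int k)
{
    t->sent[j][k]++;
}

/* An ant [i, j, c, k] looped back to its source i:
   negative reinforcement for interface k.          */
void model_on_return(StatTables *t, int j, int k)
{
    t->returned[j][k]++;
}

/* Fraction of ants for j sent on k that came back. */
double return_ratio(const StatTables *t, int j, int k)
{
    long s = t->sent[j][k];
    return s ? (double)t->returned[j][k] / s : 0.0;
}
\end{verbatim}

Because only the source updates these counters, no per-hop state has to travel
with the ant.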

712:

713: \subsubsection{ForwardAnt}

714: This method is used to forward the ants from the current node to the next node

715: along the interface chosen by the SelectInterface method.

716:

717: \subsubsection{DestroyAnt}

When the ant reaches its intended destination or loops back to its source, it
is not forwarded further and the node absorbs it.

720:

721: \subsubsection{UpdateRouteTable}

When any node $t$ other than the source node (i.e., an intermediate node or the
intended destination) receives an ant $[i, j, c, k]$ on interface $l$ from node
$y$, it updates the cost $c$ by adding the cost of traversing interface $l$ in
reverse, and then updates its routing table entries for destination $i$ as
follows:
\[
rt[i][l] = \frac{rt[i][l] + \Delta p}{1 + \Delta p}, \qquad
rt[i][m] = \frac{rt[i][m]}{1 + \Delta p}, \quad 1 \le m \le n,\ m \ne l,
\]
where $rt$ denotes $RoutingTable_t$, $n$ is the number of interfaces of node
$t$, $\Delta p = \lambda / f(c)$ with $\lambda > 0$, and $f(c)$ is a
non-decreasing function of $c$.
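The update can be sketched in C as follows. The choice $f(c) = c$ (with
$c > 0$), the row layout, and the names \texttt{init\_row} and
\texttt{update\_route} are assumptions for illustration. \texttt{init\_row} also
shows the equal initial distribution of probabilities across interfaces.

\begin{verbatim}
/* Initial routing table row: every interface equally
   likely ("innocent until proven guilty").           */
void init_row(double *row, int n)
{
    for (int k = 0; k < n; k++)
        row[k] = 1.0 / n;
}

/* Reinforce interface l of row RoutingTable_t[i][.]
   after an ant with cost c arrives on it. Assumes
   f(c) = c, so Delta p = lambda / c (c > 0).         */
void update_route(double *row, int n, int l,
                  double c, double lambda)
{
    double dp = lambda / c;
    for (int m = 0; m < n; m++) {
        if (m == l)
            row[m] = (row[m] + dp) / (1.0 + dp);
        else
            row[m] = row[m] / (1.0 + dp);
    }
}
\end{verbatim}

If a row sums to $1$ before the update, the new numerators sum to
$1 + \Delta p$, so the row still sums to $1$ afterwards. Cheaper paths
(smaller $c$) receive larger increments.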

735:

736: \subsection{Qualitative Characteristics}

737: The model-based routing algorithm presented above discards all {\em useless

loops}, in which all traffic exiting the loop must exit at the same point at which
it entered, such as the fulcrum points in the velcro topologies shown in

740: Figure~\ref{velcro_topo}.  For instance, in these velcro topologies,

741: when node 1 sends out a packet intended for a destination other than those nodes

742: in the loop pivoted at 1, either on the interface leading to node 2 or node 6,

743: the result will be the packet returning to node 1. From the statistics table,

744: node 1 will learn that those interfaces are useless for forwarding packets to

745: certain destinations and hence avoid them in the future. By discarding all the

746: useless loops, this algorithm overcomes the problem of the uniform ants

747: algorithm wherein only the path with the least decision-making is reinforced.

748:

749: The threshold factor $\tau$ influences the reinforcement of the various paths of

750: a topology. At very high values of $\tau$, the algorithm tends towards behaving

like uniform ants while continuing to avoid all the useless loops. For instance,

752: a $\tau$ value of 1 means that an interface where all but one packet sent on it

753: looped back may still be selected as an outgoing interface. At the same time

754: this setting still avoids all the interfaces that lead to useless loops, as all

755: packets sent along them must have come back to the sender.

756:

At high $\tau$ values, some packets may encounter one or more loops along their
path that they cannot avoid. At very low values of $\tau$, the nodes have a
limited selection of interfaces to choose from due to the stringent
loop-avoidance criteria; this works against our goal of multi-path routing but
greatly decreases the probability of encountering a loop. The choice of $\tau$
thus determines the multipath, correctness, and loop-avoidance capabilities of
our algorithm. The threshold factor can either be set to a fixed value (for the
whole network, per router, or per router/destination pair) or be adaptively
refined to optimize model-based routing for various criteria.

767:

768: \section{Evaluation}

769: \subsection{Experimental Setup}

To measure the performance of our cost-sensitive reachability routing algorithm,
we wrote a discrete event simulator in C. The simulated network is modeled as a
set of nodes interconnected over point-to-point links, each with an associated
cost. The simulator was derived from work done in~\cite{srinidhi00}, and has
been used in several networking courses to model routing algorithms.

776:

The simulator runs at a resolution of 1~$\mu$s, and an integer value set when
the simulation is initialized determines its duration. In our case, the
duration was set to \texttt{INT\_MAX} (2147483647, as defined in
\texttt{<limits.h>}). As in any discrete event simulator, every action takes
place upon the expiration of a timer. The simulator was programmed to run in
uncontrolled exploration mode for the first one-eighth of the time and in
controlled exploration mode for the remainder. Each node generated an ant every
10000~$\mu$s. For the purpose of this paper, we configured the link layer of
the simulator to be reliable, i.e., it neither introduces errors nor drops
packets.
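As a back-of-the-envelope check of these parameters, the following C sketch
computes the phase boundary and the number of ants each node generates. It
assumes a 32-bit \texttt{int} (so that \texttt{INT\_MAX} is 2147483647) and
that a node's first ant is generated when its first 10000~$\mu$s timer expires.
The helper names are hypothetical.

\begin{verbatim}
#include <limits.h>

#define ANT_PERIOD_US 10000L  /* one ant per node */

/* End of uncontrolled exploration: first 1/8 of run. */
long uncontrolled_end_us(void)
{
    return (long)INT_MAX / 8;
}

/* Ants generated by one node up to time t_us, with
   timers expiring at 10000, 20000, ... microseconds. */
long ants_by(long t_us)
{
    return t_us / ANT_PERIOD_US;
}
\end{verbatim}

Under these assumptions, the run covers about 2147~s (roughly 36 minutes) of
simulated time, controlled exploration starts at 268435455~$\mu$s, and each node
generates 26843 ants during uncontrolled exploration and 214748 ants in total.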

787:

788: \subsection{Topologies}

789: A utility provided along with the simulator~\cite{srinidhi00}, when given the

790: number of nodes in the network and number of interfaces per node, is able to

791: generate four different interconnected topologies for the network, namely: tree,

792: clique (fully connected mesh), arbitrary graph, and loop topologies. The automated

793: topology generating utility was used to generate the tree and arbitrary graph

794: topologies used in the simulations.

795:

796: Using the manual topology generator provided along with the simulator, complex

797: topologies such as the velcro and dumbbell topologies were created. These

798: topologies have some intrinsic characteristics helpful in demonstrating the

799: range and effectiveness of our algorithm.

800:

We also wrote a clique topology generator in C which, given the number of rows
and columns, generates what we call a perfect clique topology: a rectangular
grid in which all the interior nodes have degree 4 and all the boundary nodes
have degree 2 or 3.
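Since interior nodes have degree 4 and boundary nodes have degree 2 or 3, the
generated topologies are rectangular grids. A possible C sketch of such a
generator follows. Numbering node $(r, c)$ as $r \cdot cols + c$ and printing
one link per line as \texttt{u v} are assumed conventions, not the format
actually used by our simulator.

\begin{verbatim}
#include <stdio.h>

/* Emit the links of a rows x cols grid, one "u v" pair
   per line, and return their number. Node (r, c) has
   id r * cols + c. Pass out = NULL to only count.     */
int grid_links(int rows, int cols, FILE *out)
{
    int e = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int u = r * cols + c;
            if (c + 1 < cols) {       /* right neighbour */
                if (out)
                    fprintf(out, "%d %d\n", u, u + 1);
                e++;
            }
            if (r + 1 < rows) {       /* lower neighbour */
                if (out)
                    fprintf(out, "%d %d\n", u, u + cols);
                e++;
            }
        }
    }
    return e;
}

/* Degree of node (r, c): 2 at a corner, 3 on a side,
   4 in the interior.                                 */
int grid_degree(int rows, int cols, int r, int c)
{
    return (r > 0) + (r + 1 < rows)
         + (c > 0) + (c + 1 < cols);
}
\end{verbatim}

For an $8\times5$ grid this yields 40 nodes and $8\cdot4 + 5\cdot7 = 67$ links.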

805:

Finally, BRITE, the Boston University Representative Internet Topology
gEnerator~\cite{brite}, was used to generate large Internet-scale topologies. It
provides a wide variety of generation models, as well as the ability to extend
this set by combining existing models or adding new ones. We used BRITE's flat
router-level Waxman model (Router Waxman), in which the probability of a link
between two nodes decays exponentially with the distance between them, to
generate the topologies. We wrote a program in C to convert the topology format
generated by BRITE to the format used by our simulator. Topologies with sizes
ranging from 20 to 200 nodes were generated using BRITE.

814:

Our model-based routing algorithm was first validated in~\cite{srinidhi03} by
examining its performance when routing on synthetic worst-case topologies, such
as velcro topologies. That work also presented a subtle modification to the
algorithm, avoiding sub-path reinforcement, which results in better performance
on certain types of topologies.

820:

821: % Second, we quantify the convergence

822: % of our routing algorithm by measuring the correlation of path costs and hop

823: % counts between all packets sent to and originating from the nodes under

824: % consideration. In our case, the nodes under consideration were those with the

825: % maximum and minimum degree.

826:

827: The primary contribution of this work is to study data traffic across the network

828: based on converged routing tables and introduce a new factor called the

829: reachability factor ($\phi$) that controls the choice of the outgoing

830: interfaces. We investigate the effect of the threshold factor ($\tau$) and the

831: reachability factor on various topologies with the help of an operating curve

832: aimed at helping network administrators in choosing the ideal threshold and

833: reachability factors for their networks. We also show that by making the nodes

834: always choose the interface with the highest probability for the intended

835: destination, our model-based routing algorithm behaves in the same way as any

other single-path deterministic routing algorithm, i.e., it provides loop-free
shortest paths with guaranteed delivery for all packets.

838:

839: Additionally, we show that even though the goal of every multi-path routing

840: algorithm is to avoid loops, our model-based routing algorithm does not

guarantee a complete elimination of loops. Nevertheless, our algorithm

842: guarantees that a packet will eventually exit the loop and reach its intended

843: destination. We study the distribution of loops encountered by packets and show

844: that a vast majority of packets encounter only a small number of loops, or none

845: at all.

846:

847: \subsection{Packet Routing Using Model Based Routing}

848: In this set of experiments, a new application was written on top of the

849: simulator to route packets based on the routing table learned by ants exploring

850: the network. Initially, we ran the model-based routing algorithm on the given

851: topology to obtain a stabilized routing table. Next, we ran the application with

852: the routing table and the reachability factor as parameters and collected

853: various statistics. Below we will discuss in detail the application, the

854: reachability factor, the statistics collected, and analysis of the statistics

855: obtained from both model-based and uniform ants routing.

856:

857: The functioning of the application is similar to the one described earlier,

858: except that there is no update of the routing table.  The routing table is

859: pre-initialized to that obtained from the model-based routing simulation and

860: remains constant throughout. By not updating the routing table based on the

861: packets arriving at every node we are just exploiting the model and not

862: exploring the network further.  It should be noted that a real world router

863: would constantly explore the network with ants, and use the resulting routing

864: table to route packets simultaneously.  However, to determine the effectiveness

865: of the underlying algorithm, it is simpler to analyze its performance in a

866: static network environment.

867:

The reachability factor $\phi$ controls the degree of freedom each node has in
choosing the outgoing interface. At each node, the outgoing interfaces for every
destination are sorted in descending order of their probabilities. When a node
$n$ needs to route a packet intended for destination $d$, it picks the top
$\phi$ interfaces for that destination and selects the outgoing interface among
them using their probabilities rescaled to sum to one. For example, suppose a
node $M$ has four interfaces $A$, $B$, $C$, and $D$ with associated
probabilities $0.5$, $0.25$, $0.15$, and $0.1$ for destination $N$. A $\phi$
value of $2$ then allows node $M$ to choose between interfaces $A$ and $B$ with
probabilities $0.5/(0.5 + 0.25)$ and $0.25/(0.5 + 0.25)$, respectively, i.e.,
node $M$ will choose interface $A$ 66.67\% of the time and interface $B$
33.33\% of the time to route packets intended for destination $N$.
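The reachability factor rule can be sketched in C as follows. The functions
\texttt{top\_phi} and \texttt{pick\_interface} are hypothetical. The sketch also
assumes a limit of 64 interfaces, tie-breaking by lower interface index, and
positive probabilities for the kept interfaces. With the illustrative
probabilities $0.5$, $0.25$, $0.15$, and $0.1$ and $\phi = 2$, \texttt{top\_phi}
keeps the first two interfaces with rescaled probabilities $2/3$ and $1/3$.
With $\phi = 1$, it reduces to always picking the most probable interface.

\begin{verbatim}
#include <stdlib.h>

/* Keep the phi most probable interfaces (n_if <= 64)
   and rescale their probabilities to sum to 1;
   scaled[k] is 0 for every discarded interface.     */
void top_phi(const double *p, int n_if, int phi,
             double *scaled)
{
    int kept[64] = {0};
    double total = 0.0;
    if (phi > n_if)
        phi = n_if;
    for (int t = 0; t < phi; t++) { /* t-th largest */
        int best = -1;
        for (int k = 0; k < n_if; k++)
            if (!kept[k] && (best < 0 || p[k] > p[best]))
                best = k;           /* ties: lower index */
        kept[best] = 1;
        total += p[best];
    }
    for (int k = 0; k < n_if; k++)
        scaled[k] = kept[k] ? p[k] / total : 0.0;
}

/* Sample an interface from the rescaled distribution. */
int pick_interface(const double *scaled, int n_if)
{
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    double acc = 0.0;
    int last = 0;
    for (int k = 0; k < n_if; k++) {
        if (scaled[k] <= 0.0)
            continue;
        acc += scaled[k];
        last = k;
        if (u < acc)
            return k;
    }
    return last;                /* guards against rounding */
}
\end{verbatim}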

881:

882: The statistics collected include the number of loops encountered by the packets

883: along their paths, the number of packets encountering loops, the multipath

884: capability of the packets, and the percentage of packets successfully reaching

their intended destination. To determine the number of loops encountered by
each packet, every packet carries a stack of node {\em id}s. Before forwarding a
packet, every node checks whether its {\em id} is already on the stack. If it
is, the node increments the packet's loop counter by $1$ and pops the stack down
to its own {\em id}; otherwise, it pushes its {\em id} onto the stack. It then
forwards the packet. At the end of the simulation we have

891: statistics on the number of packets encountering loops (loop percentage) and the

892: total number of loops encountered by all the packets. Every packet also has a

893: multipath flag associated with it that is set if any node along the path taken

894: by the packet has more than one outgoing interface to choose from. This is used

895: to determine the percentage of packets that could have potentially taken more

896: than one path to reach their intended destination (multipath percentage).

897: Finally, we determine the success percentage as the percentage of packets

898: successfully reaching their intended destination.
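The loop accounting described above can be sketched in C as follows. We read
``pops the contents of the stack up to its {\em id}'' as popping everything
above the node's own {\em id}, which stays on top. This reading, the
\texttt{Packet} struct, and the helper \texttt{visit} are illustrative
assumptions. The stack bound mirrors the TTL of 255 used in our experiments.

\begin{verbatim}
#define MAX_HOPS 255   /* matches the packet TTL */

typedef struct {
    int stack[MAX_HOPS + 1]; /* ids of nodes on path */
    int top;                 /* number of ids stored  */
    int loops;               /* loops encountered     */
} Packet;

/* Called by node `id` before it forwards packet p. */
void visit(Packet *p, int id)
{
    for (int s = p->top - 1; s >= 0; s--) {
        if (p->stack[s] == id) {
            p->loops++;        /* revisited: a loop    */
            p->top = s + 1;    /* pop down to own id   */
            return;
        }
    }
    if (p->top < MAX_HOPS + 1) /* first visit: push    */
        p->stack[p->top++] = id;
}
\end{verbatim}

For the path 0, 1, 2, 1, 3, the packet records one loop and ends with the stack
0, 1, 3.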

899:

900: \subsubsection{Reachability factor $\phi = 1$}

In our first set of experiments, $\phi$ was set to $1$ so that the nodes always
choose the best outgoing interface (the interface with the highest probability)
for each packet. Because each packet deterministically takes the best interface
at every node, the multipath percentage is zero. A $\phi$ value of $1$ also
results in the avoidance of loops and a success percentage of 100\%, as all the
packets reach their intended destinations. According to Proposition~2 of
Subramanian~et~al.~\cite{subramanian97}, the probability of choosing an
interface is inversely proportional to the cost ratios (under the assumption of
loop-free paths). This proposition also applies to our model-based algorithm,
since all the avoidable loops are avoided; moreover, we have shown
in~\cite{srinidhi03} that the probabilities are inversely proportional to the
path costs. By choosing, at every node, the interface with the highest
probability, i.e., the interface that advertised the lowest-cost path to that
destination, we achieve deterministic shortest-path routing while still using
the underlying probabilistic routing table.

916:

The following set of simulations was run on 20 to 100 node BRITE topologies
with uniform link costs, so that with $\phi = 1$ the path taken by every packet
corresponds to the shortest path not only in terms of cost but also in terms of
the number of hops. By sending packets across the network and keeping track of
their hop counts, we ascertained the shortest path length between every
source-destination pair. At the end of the simulations, the average shortest
path length for each topology was calculated and compared with the theoretical
shortest path length. We then fit this empirical data to parametrized formulas.

926:

Below we discuss the derivation of average shortest-path lengths for graphs with
exponentially distributed degrees, based on~\cite{newman01}. The Router Waxman

929: model of BRITE uses an exponentially distributed generation function to create

930: the topologies.  According to~\cite{newman01}, the generating function $G_0(x)$

931: should be normalized such that $G_0(1) = 1$.

932:

933: We use the following generating function for our derivation:

934: \[

935: G_0(x) = \frac{1 - e^{-1/\kappa}}{1 - xe^{-1/\kappa}}

936: \]

937: According to~\cite{newman01}, the average shortest path length is given

938: by:

939: \[

l = \frac{\ln(N / z_1)}{\ln(z_2 / z_1)} + 1

941: \]

942: for $N \gg z_1$ and $z_2 \gg z_1$, where $N$ corresponds to the

943: number of nodes in the topology, and $z_m$ corresponds to the

944: average number of $m$th-nearest neighbors with $z_1 = G'_0(1)$

945: and $z_2 = G''_0(1)$. We derived $l$ to be:

946: \[

l = 1 + \frac{\ln N + \ln(e^{1/\kappa}-1)}{\ln 2 - \ln(e^{1/\kappa} - 1)}

948: \]

949:

Solving this equation for $\kappa$ gives
\[
\kappa = \frac{1}{\ln\left(1 + \frac{2^{(l-1)/l}}{N^{1/l}}\right)}
\]
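For reference, the intermediate steps follow directly from $G_0(x)$. Writing
$q = e^{-1/\kappa}$, we have $G'_0(x) = q(1-q)/(1-qx)^2$ and
$G''_0(x) = 2q^2(1-q)/(1-qx)^3$, so
\[
z_1 = \frac{q}{1-q} = \frac{1}{e^{1/\kappa}-1}, \qquad
z_2 = \frac{2q^2}{(1-q)^2} = 2z_1^2 .
\]
Hence $N/z_1 = N(e^{1/\kappa}-1)$ and $z_2/z_1 = 2z_1 = 2/(e^{1/\kappa}-1)$,
which gives the expression for $l$. Setting $a = \ln(e^{1/\kappa}-1)$, that
expression reads $(l-1)(\ln 2 - a) = \ln N + a$, so
\[
a = \frac{(l-1)\ln 2 - \ln N}{l}, \qquad
e^{1/\kappa} = 1 + e^{a} = 1 + \frac{2^{(l-1)/l}}{N^{1/l}} .
\]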

954:

Based on the above derivations, we conducted a least-squares fit on the
simulation results, which returns both $\kappa$ and the squared correlation
coefficient, whose value ranges from $0$ (poor fit) to $1$ (good fit). In our
case, the fit returned a value of $0.986551$, which indicates that the fitted
curve summarizes the data very well, as shown in Figure~\ref{shortest_path}.

961:

962: \begin{figure}

963: \vspace{0.14in}

964: \centering

965: \includegraphics[scale=0.35]{shortest_path}

966: \vspace{0.14in}

\caption{Least-squares fit between the theoretical and actual shortest path
lengths.}

969: \label{shortest_path}

970: \end{figure}

971:

972: \subsubsection{Reachability factor $\phi =$ maximum degree}

By setting the reachability factor to the maximum degree of the topology, each
node is allowed to choose any of its interfaces as the outgoing interface
(based on the probability associated with it for the intended destination). The
simulations were run on the following topologies: 20 to 200 node BRITE
topologies, $10\times4$ and $8\times5$ clique topologies, and the velcro
topologies shown in Figure~\ref{velcro_topo}. {\em Operating curves} of the
percentage of packets encountering loops against the percentage of packets with
multipath capabilities were plotted for various topologies at different values
of the threshold factor. These operating curves are shown in
Figures~\ref{curve_brite},~\ref{curve_velcro},~and~\ref{curve_clique}.
Visualizing the performance of the routing algorithm in this way enables us to
compare the effects of the inherent topology and of the parameter settings, and
the interactions between the two.

993: along the path.

994:

995: With this modification all the packets successfully reached

996: their intended destinations but with a linear increase in the percentage of

997: loops (to account for all those packets that were earlier absorbed by their

998: source). All packets had a TTL of 255 but none of them were dropped due to

999: reaching the TTL limit. Below we present the operating curves for various

1000: topologies under both the cases: 1) absorption of packets at their source and 2)

1001: no absorption of packets.

1002:

1003: \subsection{Operating Curve Observations}

1004:

1005: \begin{figure*}

1006: \vspace{0.14in}

1007: \centerline{\subfigure[With source absorption, each point is labeled with its

1008: threshold value and success percentage]{\includegraphics[scale=0.35]{curve_40_abs}

1009: \label{curve_brite_abs}}

1010: \hfill

1011: \subfigure[Without source absorption, each point is labeled with its

1012: threshold value]{\includegraphics[scale=0.35]{curve_40_noabs}

1013: \label{curve_brite_noabs}}}

1014: \caption{Operating curve for a 40 node BRITE topology}

1015: \vspace{0.14in}

1016: \label{curve_brite}

1017: \end{figure*}

1018:

1019: \begin{figure*}

1020: \centerline{\subfigure[With source absorption, each point is labeled with its

1021: threshold value and success percentage]{\includegraphics[scale=0.35]{curve_velcro_abs}

1022: \label{curve_velcro_abs}}

1023: \hfill

1024: \subfigure[Without source absorption, each point is labeled with its

1025: threshold value]{\includegraphics[scale=0.35]{curve_velcro_noabs}

1026: \label{curve_velcro_noabs}}}

1027: \caption{Operating curve for the velcro topology shown in

1028: Figure~\ref{velcro_topo} (right)}

1029: \vspace{0.14in}

1030: \label{curve_velcro}

1031: \end{figure*}

1032:

1033: \begin{figure*}

1034: \vspace{0.14in}

1035: \centerline{\subfigure[With source absorption, each point is labeled with its

1036: threshold value and success percentage]{\includegraphics[scale=0.35]{curve_8x5_abs}

1037: \label{curve_clique_abs}}

1038: \hfill

1039: \subfigure[Without source absorption, each point is labeled with its

1040: threshold value]{\includegraphics[scale=0.35]{curve_8x5_noabs}

1041: \label{curve_clique_noabs}}}

1042: \caption{Operating curve for an $8 \times 5$ clique topology}

1043: \label{curve_clique}

1044: \end{figure*}

1045:

1046: Consider first the operating curve for a random 40-node BRITE topology, shown in

1047: Figure~\ref{curve_brite_abs}. As the threshold factor $\tau$

1048: increases, performance moves from a region with no loops and 45\% multipath to

1049: one with 7\% loops and 100\% multipath. Encouragingly, the curve first

1050: grows in the direction of accommodating multipath before it introduces loops,

1051: rather than the other way around.

1052:

1053: Second, note that different portions of each curve are drawn differently. Each

1054: operating curve consists of a solid line and a dotted line, which denote,

1055: respectively, the region where the model is completely in force and the region

1056: where it is not. As discussed earlier, at very low threshold factor values,

1057: all the interfaces at an intermediate node may be ineligible, i.e., their

1058: statistic table ratios are above the threshold; the node then sends the ant

1059: back along the interface on which it originally received it, which

1060: increases the percentage of packets entering loops. Similarly, at very low

1061: values of $\tau$, when all the interfaces at the source node are ineligible,

1062: the source node uses the uncontrolled exploration selection policy to break

1063: the deadlock. As Figure~\ref{curve_brite_abs} shows, the model comes into force

1064: at a threshold value of around $0.4$, in the sense that all routing decisions are

1065: then based on learning rather than on defaults.
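The decision procedure described above, including both fallbacks, might be sketched in Python as follows. The sketch rests on assumptions that the text does not fix: each interface has a statistic-table ratio for the destination, an interface is eligible when its ratio does not exceed $\tau$, an eligible interface is chosen uniformly at random, and ``uncontrolled exploration'' is modeled as a uniform choice over all interfaces. The function \texttt{select\_interface} and its arguments are hypothetical. The returned flag marks decisions made by a default rather than by the learned model, which is the distinction drawn by the solid and dotted portions of the operating curves.

```python
import random
from typing import Dict, Optional, Tuple


def select_interface(
    ratios: Dict[str, float],
    tau: float,
    incoming: Optional[str],
    rng: random.Random,
) -> Tuple[str, bool]:
    """Pick an outgoing interface for one destination (illustrative sketch).

    ratios   -- hypothetical per-interface statistic-table ratios
    tau      -- threshold factor; an interface is eligible iff ratio <= tau
    incoming -- interface the ant arrived on, or None at the source node
    Returns (interface, used_default); used_default is True when a fallback
    rather than the learned model made the decision.
    """
    eligible = sorted(i for i, r in ratios.items() if r <= tau)
    if eligible:
        # Learned decision: uniform choice among eligible interfaces.
        return rng.choice(eligible), False
    if incoming is not None:
        # Intermediate node with no eligible interface: send the ant back.
        return incoming, True
    # Source node with no eligible interface: uncontrolled exploration,
    # modeled here as a uniform choice over all interfaces.
    return rng.choice(sorted(ratios)), True
```

Assuming the ratios lie in $[0, 1]$, setting $\tau = 1$ makes every interface eligible, so the choice reduces to a uniform selection over all interfaces.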

1066:

1067: Comparing Figure~\ref{curve_brite_abs}

1068: with Figure~\ref{curve_brite_noabs} (the latter without source

1069: absorption), we see that the difference in the percentage of

1070: packets reaching their destination with and without source absorption is

1071: mirrored by the difference in the percentage of packets encountering loops with and

1072: without source absorption. Removing source absorption from the simulation

1073: yields a 100\% success rate, but increases the percentage of packets

1074: encountering loops, which is an expected consequence. For a

1075: router using the ant-derived statistic tables to make routing decisions, however, it is

1076: vital for data to transit the network with the highest possible success rate, even at the

1077: expense of an increased likelihood of entering a routing loop.

1078:

1079: The operating curves for the 40-node BRITE topology shown in

1080: Figures~\ref{curve_brite_abs} and~\ref{curve_brite_noabs}, compared with the

1081: operating curves of BRITE topologies with different numbers of nodes (not shown

1082: here; see~\cite{kumar04}), also exhibit another

1083: interesting behavior: as the number of nodes in the topology increases, the

1084: minimum multipath percentage also increases. This is because at

1085: very low threshold values, the model-based routing algorithm routes a large

1086: fraction of packets deterministically in smaller topologies. More generally, the shape of the

1087: operating curve depends strongly on the intrinsic graph-theoretic properties of the

1088: topology. As the figures above show, each topology

1089: class (BRITE, clique, and velcro) yields its own characteristic operating

1090: curve.

1091:

1092: % Figure 4.28 and Figure 4.29 have their

1093: % operating curve very similar to the operating curve generated

1094: % by the mesh topologies as the topology in Figure 4.6 can be

1095: % viewed as a triangulated mesh topology.

1096:

1097: Note also that all the operating curves at $\tau = 1$

1098: exhibit the behavior of the uniform ants algorithm~\cite{subramanian97}. This

1099: is because all the interfaces at each node are then eligible to be

1100: selected as the outgoing interface for the intended destination, which matches

1101: the selection policy of the uniform ants algorithm.

1102:

1103: When the various topology classes are combined in the same network, the number

1104: of distinct operating curves is effectively unlimited. Because each operating curve

1105: has its own threshold value that gives the network optimal performance, in

1106: terms of loop avoidance and multipath routing, that threshold value must be

1107: learned and refined adaptively for an arbitrary dynamic

1108: network. This is an area of future research that must be addressed before our

1109: multipath routing algorithm can be deployed on actual networks.

1110:

1111: \subsection{Distribution of loop frequency}

1112: Although loops cannot be avoided entirely, we now show that the

1113: number of packets that encounter $k$ loops along their paths to their

1114: respective destinations decays exponentially with increasing $k$, so that the

1115: majority of packets encounter between 0 and 2 loops.

1116: Figure~\ref{loop_freq} plots packet frequency against the number of loops

1117: encountered for a 40-node BRITE topology. Note that, owing to the

1118: cyclic nature of clique topologies, some packets in those topologies encounter

1119: as many as 20 loops before they reach their intended destinations.

1120:

1121: \begin{figure}

1122: \vspace{0.14in}

1123: \centering

1124: \includegraphics[scale=0.35]{loop_freq}

1125: \vspace{0.14in}

1126: \caption{Distribution of loops encountered by packets in a random 40-node BRITE

1127: topology. Note the logarithmic scale of the y-axis.}

1128: \label{loop_freq}

1129: \end{figure}
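A frequency plot like Figure~\ref{loop_freq} can be produced from the per-packet loop counts recorded during a simulation. The Python sketch below is illustrative only, and the function names are hypothetical: it builds the histogram of packets by number of loops $k$ and estimates an exponential decay rate by a least-squares fit of $\ln(\mathrm{frequency})$ against $k$, which corresponds to fitting a straight line on a logarithmic y-axis.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def loop_histogram(loop_counts: Iterable[int]) -> Dict[int, int]:
    """Map k -> number of packets that encountered exactly k loops."""
    return dict(sorted(Counter(loop_counts).items()))


def exp_decay_rate(hist: Dict[int, int]) -> float:
    """Least-squares slope of ln(frequency) against k.

    If frequency ~ C * exp(-lam * k), the slope is about -lam; such data
    fall near a straight line on a plot with a logarithmic y-axis.
    """
    pts: List[Tuple[int, float]] = [(k, math.log(n)) for k, n in hist.items() if n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins")
    mean_k = sum(k for k, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    num = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    den = sum((k - mean_k) ** 2 for k, _ in pts)
    return num / den
```

A strongly negative slope, with most of the mass in the bins $k = 0, 1, 2$, is the pattern described in the text.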

1130:

1131: \subsection{Verification of cost-sensitive routing in BRITE topologies}

1132: The goal of this experiment was to perform a large-scale validation of the

1133: cost-sensitivity properties of our reachability routing algorithm. First, all

1134: the paths taken by packets at $\phi =$ maximum degree were enumerated.

1135: To achieve this, every packet carried a stack that recorded

1136: the nodes it visited en route to its destination. At the destination, the

1137: paths taken by the packets from each source were ranked in increasing order of

1138: cost. Each destination node recorded only the unique paths from each

1139: source, together with the frequency associated with each path.
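The bookkeeping just described can be sketched as below, assuming (for illustration only) that each directed link has a known additive cost and that the cost of a path is the sum of its link costs. The class \texttt{DestinationLog} and its methods are hypothetical names: each arriving packet's node stack is recorded as a path, duplicate paths are merged into a frequency count, and the unique paths from a source are ranked in increasing order of cost.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Path = Tuple[int, ...]


class DestinationLog:
    """Per-destination record of unique paths from each source (hypothetical)."""

    def __init__(self, link_cost: Dict[Tuple[int, int], float]) -> None:
        self.link_cost = link_cost  # assumed cost of each directed link (u, v)
        # source -> {path -> number of packets that arrived along it}
        self.freq: DefaultDict[int, Dict[Path, int]] = defaultdict(dict)

    def record(self, stack: List[int]) -> None:
        """Record the node stack carried by an arriving packet."""
        path = tuple(stack)
        src_paths = self.freq[path[0]]
        src_paths[path] = src_paths.get(path, 0) + 1

    def cost(self, path: Path) -> float:
        # Additive path cost: sum of the costs of consecutive links.
        return sum(self.link_cost[(u, v)] for u, v in zip(path, path[1:]))

    def ranked(self, src: int) -> List[Tuple[Path, float, int]]:
        """Unique paths from src as (path, cost, frequency), cheapest first."""
        return sorted(
            ((p, self.cost(p), n) for p, n in self.freq[src].items()),
            key=lambda t: t[1],
        )
```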

1140:

1141: The purpose of this instrumentation was to verify that the distribution of traffic

1142: over the pursued paths reflected their costs, with cheaper paths carrying more traffic.

1143: At the end of the simulation, for every source-destination pair, the path

1144: frequencies were summed over the top $[x - 9, x]$ percent of the cost-ranked paths, for

1145: $x \in \{10, 20, \ldots, 100\}$, i.e., over each cost decile. Figures~\ref{traffic_dist_subr}

1146: and~\ref{traffic_dist_nosubr} demonstrate the cost-sensitive routing of our

1147: model-based algorithm and also show that subpath reinforcement has no effect on

1148: BRITE topologies. (The experiments were performed on 60-node BRITE topologies.)
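One way to compute the per-decile sums is sketched below in Python. It is an illustration under assumptions, not our instrumentation code: the input is the list of path frequencies for one source-destination pair, already sorted by increasing cost, and when a pair has fewer than ten unique paths each path is assigned to the decile containing its rank percentile, so some deciles may be empty. The names \texttt{decile\_traffic} and \texttt{decile\_shares} are hypothetical.

```python
from typing import Dict, List, Sequence

DECILES = range(10, 101, 10)  # x = 10, 20, ..., 100


def decile_traffic(freqs_by_cost: Sequence[int]) -> Dict[int, int]:
    """Sum path frequencies per cost decile for one source-destination pair.

    `freqs_by_cost` holds the frequency of each unique path, cheapest first.
    Assumption: the path at 1-based rank r of n belongs to decile
    x = 10 * ceil(10 * r / n), computed here with integer arithmetic.
    """
    n = len(freqs_by_cost)
    totals = {x: 0 for x in DECILES}
    for r, f in enumerate(freqs_by_cost, start=1):
        x = 10 * ((10 * r + n - 1) // n)  # integer ceiling of 10r/n
        totals[x] += f
    return totals


def decile_shares(per_pair: List[Dict[int, int]]) -> Dict[int, float]:
    """Combine all pairs and return each decile's fraction of total traffic."""
    summed = {x: sum(d[x] for d in per_pair) for x in DECILES}
    total = sum(summed.values())
    return {x: (v / total if total else 0.0) for x, v in summed.items()}
```

Under cost-sensitive routing, the aggregated shares are expected to decrease from the cheapest decile toward the most expensive one.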

1149:

1150: \begin{figure*}

1151: \vspace{0.14in}

1152: \centerline{\subfigure[With subpath reinforcement]{\includegraphics[scale=0.35]{dist_subr_noabs}

1153: \label{traffic_dist_subr}}

1154: \hfill

1155: \subfigure[Without subpath reinforcement]{\includegraphics[scale=0.35]{dist_nosubr_noabs}

1156: \label{traffic_dist_nosubr}}}

1157: \caption{Traffic distribution in a random 60-node BRITE topology without source

1158: absorption. Note the logarithmic scale of the y-axis.}

1159: \vspace{0.14in}

1160: \label{traffic_dist}

1161: \end{figure*}

1162:

1163: \section{Conclusion and Future Work}

1164: In this paper we have presented a new model-based reinforcement learning

1165: algorithm that achieves true cost-sensitive reachability routing, even in

1166: network topologies that pose problems for both deterministic routing and

1167: classical RL formulations. The algorithm efficiently distributes traffic among

1168: all paths leading to a destination. The evaluation results indicate that our

1169: approach achieves true multipath routing, with traffic distributed among the

1170: multiple paths in inverse proportion to their costs. By preserving the

1171: incremental spirit of current backbone routing algorithms, this approach has the

1172: potential to form the basis of the next generation of routing protocols,

1173: enabling a fluid and robust backbone routing framework. The reader is referred

1174: to~\cite{kumar04} and~\cite{srinidhi03} for background and further experimental

1175: results.
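As a small worked illustration of what distributing traffic ``in inverse proportion to their costs'' means (the numbers are hypothetical and not taken from our experiments): three paths of cost 1, 2, and 4 would receive traffic in the ratio $1 : \tfrac{1}{2} : \tfrac{1}{4}$, i.e., $\tfrac{4}{7}$, $\tfrac{2}{7}$, and $\tfrac{1}{7}$ of the total. The hypothetical helper below computes these shares exactly with rational arithmetic.

```python
from fractions import Fraction
from typing import List


def inverse_cost_shares(costs: List[int]) -> List[Fraction]:
    """Share of traffic per path when shares are inversely proportional to cost."""
    weights = [Fraction(1, c) for c in costs]  # weight of each path = 1 / cost
    total = sum(weights)
    return [w / total for w in weights]
```

Calling \texttt{inverse\_cost\_shares([1, 2, 4])} returns the fractions $\tfrac{4}{7}$, $\tfrac{2}{7}$, and $\tfrac{1}{7}$.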

1176:

1177: We now present four possible directions for future work.

1178:

1179: \begin{itemize}

1180: \item{\bf Adaptive configuration of the threshold factor ($\tau$)}

1181:

1182: The threshold factor is currently set to a fixed value for all the nodes in the

1183: topology. From the operating curve, the network administrator determines the

1184: optimal value of $\tau$, at which routing yields high success and multipath

1185: percentages while keeping the percentage of packets entering loops low. In

1186: future work, the $\tau$ value could be determined dynamically from the

1187: available information and adjusted periodically to meet the desired

1188: routing objectives. The $\tau$ value could be adapted on a per-node

1189: basis, or on a per-source/destination-pair basis at every node.

1190:

1191: \item{\bf Instructive feedback}

1192:

1193: Our RL algorithm works primarily with evaluative feedback from neighboring

1194: routers. It would be interesting to extend the framework to accommodate

1195: instructive feedback. To provide instructive feedback, however, a router must have

1196: sufficient discriminating capability to perform credit assignment. Typically,

1197: any resulting instruction will be of the negative

1198: kind, i.e., ``for destination $X$, do not use interface $i_y$.'' How such

1199: negative instructions can coexist with positive reinforcements is an important

1200: research issue, not only for our application domain, but also for the larger field

1201: of reinforcement learning.

1202:

1203: \item{\bf Modeling topologies with hierarchical addressing}

1204:

1205: Currently the algorithm assumes that all topologies are flat, i.e., all nodes in

1206: the topology are numbered from $1$ to $n$. By supporting hierarchical addressing

1207: of the nodes, the model at every node could be built on a per-subnetwork basis

1208: instead of a per-node basis; that is, a node could collect statistics for a

1209: group of nodes as a single entity and build its model accordingly. Such an

1210: approach would encourage problem decomposition and enable scaling up to large

1211: network sizes.

1212:

1213: \item{\bf Reverse engineering routing protocols}

1214:

1215: The model-based reinforcement learning algorithm presented here promises to

1216: serve as an abstraction of reachability routing algorithms in general. One idea

1217: for further research is to mine the model automatically by analyzing the behavior of implemented

1218: routing algorithms, rather than incrementally learning it from

1219: scratch, as we have done here. In other words, we can seek to imitate the

1220: functioning of another algorithm by suitably configuring our model. This

1221: problem has its roots in inverse reinforcement learning, which aims to

1222: recover an algorithm from observed (optimal) behavior. The first steps toward

1223: such reverse engineering have recently been taken~\cite{shiraev03}.

1224:

1225: \end{itemize}

1226:

1227: % if have a single appendix:

1228: %\appendix[Proof of the Zonklar Equations]

1229: % or

1230: %\appendix  % for no appendix heading

1231: % do not use \section anymore after \appendix, only \section*

1232: % is possibly needed

1233:

1234: % use appendices with more than one appendix

1235: % then use \section to start each appendix

1236: % you must declare a \section before using any

1237: % \subsection or using \label (\appendices by itself

1238: % starts a section numbered zero.)

1239: %

1240: % Use this command to get the appendices' numbers in "A", "B" instead of the

1241: % default capitalized Roman numerals ("I", "II", etc.).

1242: % However, the capital letter form may result in awkward subsection numbers

1243: % (such as "A-A"). Capitalized Roman numerals are the default.

1244: %\useRomanappendicesfalse

1245: %

1246: %\appendices

1247: %\section{Proof of the First Zonklar Equation}

1248: %Appendix one text goes here.

1249:

1250: % you can choose not to have a title for an appendix

1251: % if you want by leaving the argument blank

1252: %\section{}

1253: %Appendix two text goes here.

1254:

1255: % use section* for acknowledgement

1256: %\section*{Acknowledgment}

1257: % optional entry into table of contents (if used)

1258: %\addcontentsline{toc}{section}{Acknowledgment}

1259: %The authors would like to thank...

1260:

1261: % trigger a \newpage just before the given reference

1262: % number - used to balance the columns on the last page

1263: % adjust value as needed - may need to be readjusted if

1264: % the document is modified later

1265: %\IEEEtriggeratref{8}

1266: % The "triggered" command can be changed if desired:

1267: %\IEEEtriggercmd{\enlargethispage{-5in}}

1268:

1269: % references section

1270: % NOTE: BibTeX documentation can be easily obtained at:

1271: % http://www.ctan.org/tex-archive/biblio/bibtex/contrib/doc/

1272:

1273: % can use a bibliography generated by BibTeX as a .bbl file

1274: % standard IEEE bibliography style from:

1275: % http://www.ctan.org/tex-archive/macros/latex/contrib/supported/IEEEtran/bibtex

1276: %\bibliographystyle{IEEEtran.bst}

1277: % argument is your BibTeX string definitions and bibliography database(s)

1278: %\bibliography{IEEEabrv,../bib/paper}

1279: %

1280: % <OR> manually copy in the resultant .bbl file

1281: % set second argument of \begin to the number of references

1282: % (used to reserve space for the reference number labels box)

1283: \begin{thebibliography}{1}

1284:

1285: % \bibitem{boyan94}

1286: % J.~Boyan and M.~Littman. Packet Routing in Dynamically Changing

1287: % Networks: A Reinforcement Learning Approach. In \emph{Advances in Neural

1288: % Information Processing Systems 6 (NIPS6)}, pages 671-678. Morgan

1289: % Kaufmann, San Francisco, CA, 1994.

1290:

1291: \bibitem{chen98}

1292: J.~Chen, P.~Druschel, and D.~Subramanian. A Simple, Practical Distributed

1293: Multi-Path Routing Algorithm. TR98-320. Department of Computer

1294: Science, Rice University. July 1998.

1295:

1296: \bibitem{chen99}

1297: J.~Chen, P.~Druschel, and D.~Subramanian. A New Approach to Routing

1298: with Dynamic Metrics. In \emph{Proceedings of the IEEE INFOCOM Conference on

1299: Computer Communications}, pages 661-670. IEEE Press, New York, March 1999.

1300:

1301: \bibitem{dicaro98}

1302: G.~Di~Caro and M.~Dorigo. AntNet: Distributed Stigmergetic Control for

1303: Communications Networks. Journal of Artificial Intelligence Research,

1304: Vol. 9, pages 317-365, 1998.

1305:

1306: \bibitem{dorigo99}

1307: M.~Dorigo, G.~Di~Caro, and L.~M.~Gambardella. Ant Algorithms for Discrete

1308: Optimization. Artificial Life, Vol. 5, No. 2, pages 137-172, 1999.

1309:

1310: \bibitem{guestrin02}

1311: C.~Guestrin, M.~Lagoudakis, and R.~Parr. Coordinated Reinforcement

1312: Learning. In \emph{Machine Learning: Proceedings of the Nineteenth International

1313: Conference (ICML 2002)}, pages 227-234. University of New South

1314: Wales, Sydney, Australia, July 2002.

1315:

1316: \bibitem{littman96}

1317: L.~P.~Kaelbling, M.~L.~Littman, and A.~W.~Moore. Reinforcement Learning:

1318: A Survey. Journal of Artificial Intelligence Research, Vol. 4,

1319: pages 237-285, 1996.

1320:

1321: \bibitem{brite}

1322: A.~Medina, A.~Lakhina, I.~Matta, and J.~Byers. BRITE: Universal Topology

1323: Generation from a User's Perspective. Technical Report BUCS-TR2001-003,

1324: Boston University, 2001.

1325:

1326: \bibitem{newman01}

1327: M.~E.~J. Newman, S.~H.~Strogatz, and D.~J.~Watts. Random graphs with

1328: arbitrary degree distributions and their applications. Physical Review E, Vol. 64,

1329: 026118, pages 1-16, 2001.

1330:

1331: \bibitem{shiraev03}

1332: D.~Shiraev. Inverse Reinforcement Learning and Routing Metric Discovery.

1333: M.S. Thesis, Department of Computer Science, Virginia Tech, August

1334: 2003.

1335:

1336: \bibitem{subramanian97}

1337: D.~Subramanian, P.~Druschel, and J.~Chen. Ants and Reinforcement

1338: Learning: A Case Study in Routing in Dynamic Networks. In \emph{Proceedings

1339: of the Fifteenth International Joint Conference on Artificial Intelligence

1340: (IJCAI'97)}, pages 832-839. Morgan Kaufmann, San Francisco, CA, 1997.

1341:

1342: \bibitem{sutton98}

1343: R.~S.~Sutton and A.~G.~Barto. Reinforcement Learning: An Introduction. MIT Press,

1344: Cambridge, MA, 1998.

1345:

1346: \bibitem{kumar04}

1347: M.~Thirunavukkarasu. Reinforcing Reachable Routes. M.S. Thesis,

1348: Department of Computer Science, Virginia Tech, May 2004.

1349:

1350: \bibitem{srinidhi03}

1351: S.~Varadarajan, N.~Ramakrishnan, and M.~Thirunavukkarasu. Reinforcing

1352: Reachable Routes. Computer Networks, Vol. 43, No. 3, pages 389-416,

1353: Oct 2003.

1354:

1355: \bibitem{srinidhi00}

1356: S.~Varadarajan. Ethereal: A Fault Tolerant Host-Transparent Mechanism

1357: for Bandwidth Guarantees over Switched Ethernet Networks. PhD thesis,

1358: Department of Computer Science, State University of New York, Stony

1359: Brook, 2000.

1360:

1361: \bibitem{mpath}

1362: S.~Vutukury and J.~J.~Garcia-Luna-Aceves. MPATH: A Loop-free Multipath Routing

1363: Algorithm.  Microprocessors and Microsystems Journal (Elsevier), Vol. 24, pages

1364: 319-327, 2001.

1365:

1366: \end{thebibliography}

1367:

1368: