cs0511053/body.tex
1: 
2: \section{Introduction}
85: 
86: %\PARstart{T}{he} effectiveness of a routing protocol directly impacts both the
87: %end-to-end throughput and end-to-end delay experienced by a network.  Current
88: %network routing protocols are primarily concerned with deriving shortest-cost
89: %routes between a source and destination, i.e. they are tailored towards
90: %single-path routing.  Recently, there has been an increased emphasis on
91: %multi-path routing~\cite{chen98}~\cite{chen99}, in which routers maintain 
92: %multiple distinct paths of arbitrary costs between a source and destination. 
93: %Considering the increasing use of ad-hoc and sensor networks, the need for 
94: %the ability to distribute data traffic across multiple paths and quickly 
95: %adapt to dynamic network conditions shows that there are many potential 
96: %applications of multi-path routing.
97: 
98: %Multi-path routing presents several advantages over single-path routing. First,
99: %a multi-path routing protocol is capable of meeting multiple performance
100: %objectives: maximizing throughput, minimizing delay, bounding delay variation,
101: %and minimizing packet loss. Second, from a scalability perspective, it makes
102: %effective use of the graph structure of a network (as opposed to single-path
103: %routing which superimposes a logical routing tree upon the network topology).
104: %Third, multi-path routing protocols are more tolerant of network failures, as
105: %they are able to quickly sense failures and immediately correct for the failure
106: %by routing traffic only along the several other functioning paths which the protocol
107: %maintains.  Finally, multi-path routing algorithms are less susceptible to route
108: %oscillations, which enables the use of high-variance cost metrics that are
109: %better congestion indicators than current single-path routing algorithms, which
110: %face route oscillations due to switching routes as a step function.
111: 
112: %While multi-path routing is a desirable goal, the current Internet routing
113: %framework cannot be easily extended to support it. One solution is to develop a
114: %new multi-path routing framework, which necessitates changes to the Internet's
115: %networking protocol (IP). While it would allow us far more freedom to design a
116: %multi-path routing protocol with a clean-slate, the Internet Protocol is a globally deployed
117: %fixture on the internet, and it would be entirely impractical to require a
118: %global deployment for a new routing protocol to be functional. Therefore, the 
119: %approach presented in this paper is to study multipath routing within the 
120: %confines imposed by the current Internet Protocol.  This restriction leads to unique
121: %design decisions, while still providing optimal performance compared to the
122: %current single-path routing protocols in use on the internet.
123: 
124: %OLD--
125: % Routing protocols construct tables at each node that specify the next hop to use
126: % for data packet forwarding for each destination.  A minimal
127: % requirement is that the computed routing tables be free of loops when the
128: % network is stable. In dynamic environments, a more stringent requirement is that
129: % the routing tables be loop-free not only when the network is stable but at every
130: % instant, since loops, even temporary ones, can rapidly degrade performance.
131: %--OLD
132: 
133: %Multi-path routing can be qualified by the state maintained at each router.  
134: %For instance, a routing algorithm can maintain
135: %multiple, distinct, shortest-cost routing tables, where each routing table is
136: %based on a different cost metric.  This is referred to as a multi-metric
137: %multi-path routing approach.  Alternatively, another approach is to allow
138: %multiple network paths between a source and destination based on a single cost
139: %metric.  This means that routers may use sub-optimal paths, but the routing
140: %sends data on multiple paths to maximize network throughput.  This approach is
141: %referred to as single-metric multi-path routing.
142: 
143: %Multi-path routing algorithms can also be distinguished by their routing
144: %granularity.  Coarse-grained, connection-oriented, approaches adopt a
145: %path-per-connection view wherein all packets belonging to the connection follow
146: %the same path.  However, different connections between the source and
147: %destination hosts may follow different paths.  In contrast, fine-grained,
148: %connectionless approaches have no mechanism to associate packets to any
149: %higher-level notion of connection.  For true multi-path forwarding, the routing
150: %algorithm should forward packets between a source-destination pair along
151: %multiple paths, some of which may not necessarily be the shortest-cost paths.
152: %However, in the sense of maximizing overall network performance, fine-grained
153: %multi-path routing algorithms within a single-metric domain offer the most
154: %promise and will be the focus of the remainder of this paper.
155: 
156: %Our method of achieving multi-path routing is to extend single-path routing
157: %protocols. This extension is non-trivial for two reasons. First, we need
158: %mechanisms to incorporate state corresponding to multiple paths into the routing
159: %table. More importantly, we need new loop avoidance algorithms: current
160: %shortest-path routing algorithms use their optimality metric to implicitly
161: %eliminate loops. This assumption is untenable for multi-path routing in a
162: %single-metric domain. Resolving these issues typically requires routers to
163: %maintain routing state proportional to the number of paths in the network, which
164: %is impractical.
165: 
166: %We approach multi-path routing from the terminal perspective of {\em reachability
167: %routing}~\cite{srinidhi03}, with the goal being to determine all paths between a
168: %sender and a receiver, without the above mentioned state or consistency
169: %maintenance overhead.  While basic reachability routing is primarily concerned
170: %with determining multiple paths through the network, practical implementations
171: %are also interested in determining the relative quality of these paths, a form
172: %we call cost-dependent reachability routing. 
173: 
174: %In this paper, we propose that reachability routing can be achieved by
175: %exploiting the underlying semantics of probabilistic routing algorithms and
176: %present the case for reinforcement learning (RL) as the framework of choice to
177: %realize reachability routing. In particular, by employing the probabilistic
178: %nature of RL algorithms, we can guarantee that the likelihood of a packet
179: %getting trapped in a loop is zero, although there is a non-zero probability of
180: %entering a loop.  To completely model routing as an RL problem, we need
181: %strategies for gathering information about the environment, deriving routing
182: %tables by credit assignment, and building models of relevant aspects of the
183: %environment to enable routers to progressively improve their routing decisions.
184: 
186: 
187: % \PARstart{T}{he} 
188: Next-generation network technologies such as sensor networks,
189: peer-to-peer networks, ad-hoc wireless networks, and overlay networks
190: present challenges not previously witnessed in the Internet
191: infrastructure.  These networks operate on large topologies whose costs
192: and connectivity change frequently. In these contexts, single-path
193: routing protocols, the mainstay of current networks, suffer from either
194: route flap or temporary loss of connectivity when the primary path
195: fails. In addition, these protocols do not exploit the graph
196: connectivity between a sender and a receiver to improve performance.
197: Addressing these requirements therefore demands routing protocols
198: that can optimize a number of novel performance metrics.
199: %<need segway line> How did we get here?
200: 
201: % Some historical perspective is in order. 
202: %(as an aside, pure source routing was tried in the early Internet and, while
203: %still prevalent in some data center networks, is not scalable for graphs of
204: %arbitrary diameter). 
205: Historically, routing algorithms evolved from networks where the only
206: parameters available for making routing decisions were the source and
207: destination addresses.\footnote{Source routing was used in the early
208: Internet and is still prevalent in some data center networks, but it
209: does not scale to graphs of arbitrary diameter.}
210: These parameters by themselves do not
211: have sufficient discriminative power to avoid loops. Hence optimality
212: criteria were added to the routing formulation to eliminate loops, leading to
213: single-path routing, which no longer meets the needs of
214: the next generation of network technologies.
215: 
216: %Instead of loop elimination, if 
217: %loop avoidance can be achieved (i.e., we will get out loops eventually) it gives us 
218: %
219: %greater flexibility in laying out optimization constraints. For this reason, we 
220: %have chosen to sacrifice loop elimination in favor of loop avoidance.
221: %
222: %New optimization criteria include all-paths and cost-sensitive routing.
223: %\PARstart{N}{ew} directions in network routing research have not kept pace
224: %with the latest developments in network architecture, such as sensor networks, ad-hoc
225: %wireless networks, and overlay networks.  These new paradigms present us
226: %with unique performance requirements and demand routing protocols that can
227: %address a number of novel performance metrics.  
228: %
229: %The purpose of a routing protocol is to optimize a set of performance 
230: %criteria through the solving of dynamic programming formulations in a distributed
231: %environment.  In short, when routing traffic from a source node to a destination
232: %node, the routing protocol at an intermediate node routes the traffic along the 
233: %'best' path from the source to the destination  However, how the 'best' path is
234: %determined is left up to the routing protocol implementation.  
235: %
236: %A common characteristic among all of these new network technologies is the presence
237: %of highly dynamic network topologies, much more so than traditional networks.
238: %For example, in ad-hoc wireless networks where typical users serve as routers, a
239: %user turning his computer off has the same effect as shutting down a router on a 
240: %network. If the network were to use single-path routing, the disconnection of
241: %the router would partition the network.  This drives the need for routing protocols 
242: %that can cope with this dynamism, or even take advantage of it.  
243: %
244: %A trivial solution to this problem would be 'hot potato' routing where traffic
245: %is shifted to another randomly chosen router, with no guarantees on performance,
246: %which is obviously unacceptable. Another solution would be 
247: 
248: To address the needs of emerging network domains, in this paper we set out to build a
249: routing protocol with the following characteristics.  First, the routing
250: protocol should converge to a solution even in highly dynamic
251: environments using only local information, i.e., without requiring
252: any global knowledge of the topology.  Second, to maximize the bandwidth (and
253: connectivity) between any pair of nodes, the protocol should route along
254: multiple paths between them.  Third, the protocol should route as
255: efficiently as possible by selecting routes in inverse proportion to their
256: expected path cost.  Fourth, the protocol should avoid loops as much as
257: possible and guarantee that packets never remain trapped in loops: the emphasis is on loop
258: avoidance rather than loop elimination.  Finally, to be
259: of maximum practical value, the protocol should work within the confines imposed
260: by the Internet Protocol (IP) specification, whose header carries only a
261: source and a destination address.  As noted earlier, source routing has been tried in IP
262: networks but was abandoned because of security concerns, insufficient space in the IP
263: header to carry full source routes for all nodes, and its poor
264: scalability in large networks.
265: 
266: Note that these requirements place conflicting demands on routing protocol
267: design.  Different algorithms make differing trade-offs in this multi-constraint
268: space.  For instance, distance vector and link state algorithms achieve loop
269: elimination but are restricted to optimality-based single path routing.  MOSPF
270: achieves loop-free multi-path routing only in the restricted case of paths with
271: identical costs. Hot potato routing achieves true multi-path routing but pays no 
272: attention to either loops or the `quality' of its paths. The
273: MPATH~\cite{mpath} algorithm and several of its variants achieve cost-sensitive
274: loop-free multi-path routing, at the expense of routing table storage overhead
275: proportional to the number of paths (which can be combinatorial). The
276: theoretically best, although practically naive, solution would be 
277: all-sources, all-paths routing.  This achieves the goal of correctness; however, 
278: building and maintaining a complete and correct table of the entire network 
279: would be impractical for networks of any non-trivial size.  
280: 
281: % (because the Internet infrastructure is too set in its ways). 
282: While it is still true that source and destination are the only parameters
283: available for routing on the Internet infrastructure, there is a degree of
284: freedom thus far unexplored by routing algorithms.  Single-path {\em
285: deterministic} routing algorithms are driven by a need to achieve loop
286: elimination at any cost due to the disastrous effects of routing loops in such
287: algorithms.  However, for a {\em probabilistic} routing algorithm, this does not
288: necessarily have to be the case.  Therefore, if we relax the requirement for
289: loop elimination and instead seek to achieve loop avoidance by guaranteeing to
290: exit loops once they are entered, we gain greater flexibility in laying out
291: optimization constraints. For this reason, we have chosen to take a
292: probabilistic approach and to sacrifice loop elimination in favor of loop
293: avoidance.
294: 
295: The undue emphasis on optimality has produced algorithms that aggressively
296: eliminate loops, which in turn has led to implementations that are intolerant of loops.
297: In contrast, the ability to tolerate loops opens up new exploration strategies
298: for true cost-sensitive multi-path routing that work under the constraints
299: presented above. We therefore begin with the terminal perspective of {\em reachability
300: routing}, where the goal is merely to reach a destination. Hot potato routing can be
301: viewed as a limiting example of reachability routing, but we clearly want to do
302: better. From this perspective, we are in the unique position of being able to explore 
303: the trade-off between eliminating loops and improving the efficiency of path selection.
304: 
305: Our specific formulation of reachability routing is probabilistic, multi-path, and
306: cost-sensitive: it distributes traffic efficiently among all paths leading to a
307: destination.  This type of routing can be viewed as solving an optimization
308: problem that maximizes the number of paths between two nodes by discovering all
309: of them, and then derives the probability of routing along a given path by
310: assessing the costs of the paths leading to the destination.  
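To make ``in inverse proportion to path cost'' concrete, one simple normalization
assigns each of $m$ paths with costs $c_1,\dots,c_m > 0$ the probability shown below.
It is given only as an illustration and is not claimed to be the exact distribution
our algorithm converges to.
\[
p_i = \frac{1/c_i}{\sum_{j=1}^{m} 1/c_j}, \qquad i = 1,\dots,m,
\]
This gives $\sum_i p_i = 1$ and $p_i/p_j = c_j/c_i$. For example, two paths with costs $1$ and $2$
yield $p_1 = 2/3$ and $p_2 = 1/3$.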
311: 
312: In particular, we study reachability routing through the lens of reinforcement
313: learning, which provides a mathematical framework for describing and solving
314: sequential decision problems formulated as Markov decision processes (MDPs). The states are the nodes, the
315: actions are the choice of outgoing links, and rewards correspond to
316: path costs associated with the state transitions. A value function defined on
317: the MDP (e.g., the discounted sum of rewards along a path) turns routing into
318: an optimization problem whose solution is a routing policy. In essence,
319: this is what every routing algorithm based on dynamic programming does. However,
320: single-path routing algorithms learn the best deterministic policy for
321: the MDP. In this paper, the routing algorithm learns stochastic policies 
322: that achieve cost-sensitive multi-path routing.
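As a minimal sketch of this MDP view, the Python fragment below runs Bellman (dynamic programming) iterations over a hypothetical four-node graph to compute the cost-to-go to a destination. It then contrasts the deterministic policy of single-path routing with a stochastic one. The graph, the helper names, and the choice of an inverse-cost stochastic policy are illustrative assumptions rather than our algorithm. Costs are minimized instead of rewards maximized, and no discounting is applied.

```python
import math

# Hypothetical example: COSTS[s][n] is the cost of the link s -> n; node 3 is the destination.
COSTS = {
    0: {1: 1.0, 2: 2.0},
    1: {3: 1.0},
    2: {3: 1.0},
    3: {},
}
DEST = 3

def cost_to_go(costs, dest, sweeps=10):
    """Bellman iteration: V(s) = min over links (s, n) of cost(s, n) + V(n)."""
    v = {s: math.inf for s in costs}
    v[dest] = 0.0
    for _ in range(sweeps):
        for s, links in costs.items():
            if s == dest or not links:
                continue
            v[s] = min(c + v[n] for n, c in links.items())
    return v

def q_values(costs, v, s):
    """Expected path cost of each action (outgoing link) available at state s."""
    return {n: c + v[n] for n, c in costs[s].items()}

def deterministic_policy(q):
    """Single-path routing: always take the cheapest link."""
    return min(q, key=q.get)

def stochastic_policy(q):
    """Illustrative multi-path policy: link probability inversely proportional to path cost."""
    inv = {n: 1.0 / cost for n, cost in q.items()}
    total = sum(inv.values())
    return {n: w / total for n, w in inv.items()}

V = cost_to_go(COSTS, DEST)
Q0 = q_values(COSTS, V, 0)
print(V[0], deterministic_policy(Q0), stochastic_policy(Q0))
```

At node 0 the deterministic policy always takes the link to node 1 (expected cost 2 versus 3). The stochastic policy instead splits traffic roughly 60/40 between nodes 1 and 2.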
323: 
324: %%Additionally, the process of routing is not only a sequential decision making
325: %process, but can be considered to be Markovian, as the decisions a router 
326: %makes on a packet are only a factor of the data contained within the packet, 
327: %such as destination and arrival port, rather than the packet's previous history.  
328: %
329: %Also, due
330: %to the highly dynamic nature of routing, routing can be considered Markovian,
331: %because previous state presents an outdated view of the network, and therefore
332: %is not much use to the decision being made.
333: %
334: %In the case of routing protocols, what is 'learned' is a policy that optimizes 
335: %a value function indicating the type of routing protocol being described.  
336: %As such, traditional shortest path routing can be viewed as a solution to one 
337: %type of reinforcement learning problem with a particular value function.  
338: %Alternatively, reachability routing is the result of solving for another value 
339: %function.  
340: 
341: Our previous work~\cite{srinidhi03} indicates that such an approach achieves
342: true multi-path routing, with traffic distributed among the multiple paths in
343: inverse proportion to their costs.  In addition, for our reachability
344: routing protocol to be of practical use, we guide our design decisions by
345: the requirement that the protocol work within the confines imposed by the
346: currently deployed Internet Protocol (IP) architecture.
347: 
348: While multi-path routing is not new, we believe that our notion of reachability
349: routing represents a promising new direction in the field.  Applied in this way,
350: reinforcement learning enables reachability
351: routing to optimize overall network throughput while at the same time
352: providing built-in fault tolerance and path redundancy.  Other
353: applications of reinforcement learning within this domain could
354: improve routing behavior further by adaptively refining the performance
355: parameters of the algorithm in response to changes in the network topology.
356: 
357: The remainder of this paper is organized as follows: Section II provides an overview
358: of reinforcement learning, its applicability to network routing, and significant
359: prior work on the topic. In Section III we introduce a new model-based
360: routing algorithm based on RL and describe its implementation in Section IV.
361: Section V presents evaluation results and Section VI concludes with a summary of
362: our contributions and directions for future research in the area.
363: 
364: \section{Ants and Reinforcement Learning}
365: Reinforcement learning~\cite{littman96,sutton98} is the process by which an agent learns to
366: behave optimally over time through trial-and-error interaction with a dynamic
367: environment. Reinforcement learning problems are organized in terms of
368: discrete episodes, which, for the purposes of packet routing, consist
369: of a packet finding its way from an originating source to its intended destination. 
370: Routing table probabilities are initialized to small random values, which enables
371: routers to begin routing immediately, although most of the initial routing decisions will
372: be neither optimal nor even desirable. To improve the quality of its routing decisions,
373: a router can `try out' different links to see if they produce good routes, a mode of
374: operation called {\em exploration}. Information learned during exploration can
375: then drive future routing decisions, a mode called {\em
376: exploitation}. Both exploration and exploitation are necessary for effective
377: routing.
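The initialization and probabilistic forwarding described above can be sketched as follows. We assume, for illustration only, that the ``small random values'' are random positive weights normalized into one probability distribution over a router's interfaces for each destination. The destination and interface names and the weight range are hypothetical. Because forwarding samples from this distribution, every interface with non-zero probability is occasionally tried, which is one simple source of exploration.

```python
import random

def init_routing_table(destinations, interfaces, rng):
    """Return {destination: {interface: probability}} with random initial values."""
    table = {}
    for dest in destinations:
        # Small random positive weights (the range is an arbitrary assumption).
        weights = [rng.uniform(0.01, 0.1) for _ in interfaces]
        total = sum(weights)
        # Normalize so the probabilities for each destination sum to one.
        table[dest] = {iface: w / total for iface, w in zip(interfaces, weights)}
    return table

def choose_interface(table, dest, rng):
    """Sample an outgoing interface for a packet bound for `dest`."""
    probs = table[dest]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

rng = random.Random(7)
table = init_routing_table(["A", "B"], ["if0", "if1", "if2"], rng)
hop = choose_interface(table, "A", rng)
```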
378: 
379: Our RL routing algorithm is a form of ant-colony optimization~\cite{dorigo99}, in which messages
380: called {\em ants} are used to explore the network and provide reinforcements for
381: future packet routing. The ants transiting the network provide intermediate
382: routers with a sense of the reachability and relative cost of reaching the node
383: from which the ant originated.  To overcome the problems of selective path
384: reinforcement, which causes routing tables to converge deterministically to shortest paths, our model
385: separates the data collection aspects of the algorithm from the packet routing
386: functionality, as proposed by Subramanian~et~al.~\cite{subramanian97}.  In
387: our model the ants only gather information about the network,
388: which is then used to guide packet routing decisions. 
389: 
390: Three parameters must be considered when applying ants in a routing framework:
391: the rate of generation of ants, the choice of their destinations, and the
392: routing policy used for ants.  RL algorithms perform iterative stochastic
393: approximations of an optimal solution, so the rate of ant generation directly
394: affects their convergence properties, as Di~Caro~et~al. showed for AntNet~\cite{dicaro98}.  From a practical
395: perspective in multi-path routing, we would like to choose ant destinations
396: that provide the most useful reinforcement updates; choosing destinations from a
397: uniform distribution ensures good exploration. Finally, the policy used to route
398: ants affects the paths that are selectively reinforced by the RL algorithm. As
399: our goal is to discover all possible paths, the policy used to route ants should
400: be independent of that of the data traffic. If we did not separate the policies,
401: we would end up with the same problem of selective reinforcement as found
402: in the Q-routing algorithm~\cite{subramanian97}.
403: 
404: In the context of reinforcement learning using ants, effective credit assignment
405: strategies rely on the expressiveness of the information carried by the ants.
406: The central idea behind credit assignment is to determine the relative quality
407: of a route and apportion blame accordingly. In the case of routing, credit assignment
408: creates a push-pull effect. Since a router's link probabilities for a given destination must sum to one,
409: positively reinforcing one link (push) results in negative reinforcements (pull)
410: for the other links.
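As a small numerical illustration with arbitrary values, consider a router whose three links toward
some destination have probabilities $0.5$, $0.3$, and $0.2$. Suppose we add $\Delta p = 0.25$ to
the first link and divide every entry by $1 + \Delta p$, which is the renormalization used in the
update rule of Subramanian~et~al. given later in this section. This yields
\[
\frac{0.5 + 0.25}{1.25} = 0.6, \qquad
\frac{0.3}{1.25} = 0.24, \qquad
\frac{0.2}{1.25} = 0.16.
\]
The push of $+0.1$ on the first link is exactly balanced by pulls of $-0.06$ and $-0.04$ on the
other two, so the probabilities still sum to one.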
411: 
412: In the simplest form of credit assignment, called backward learning, ants carry
413: information about the ingress router and the path cost as determined by the
414: network's cost metrics. At the destination, this information can be used to
415: derive reinforcement for the link along which the ant
416: arrived~\cite{subramanian97}. Another strategy, known as forward learning, is to
417: reinforce the link in the forward direction by sending an ant to a destination
418: and bouncing it back to the source~\cite{dicaro98}.
419: Subramanian~et~al.~\cite{subramanian97} adopt the former approach. Ants proceed from randomly
420: chosen sources to destinations independently of the data traffic.  Each ant
421: contains the source where it was released, its intended destination, and the
422: cost $c$ experienced thus far. Upon receiving an ant, a router updates the
423: probability of the interface on which the ant arrived in its routing entry for
424: the ant's source (not its destination).  This is a form of backward learning and a device to
425: minimize ant traffic.
426: 
427: Specifically, when an ant from source $s$ to destination $d$ arrives along
428: interface $i_k$ at router $r$, $r$ first updates $c$ (the cost accumulated by the
429: ant thus far) to include the cost of traversing interface $i_k$ in reverse. Router $r$
430: then updates its entry for $s$ by slightly nudging the probability up for
431: interface $i_k$ (and correspondingly decreasing the probabilities for other
432: interfaces). The amount of the nudge is a function of the cost $c$ accumulated
433: by the ant. Finally, $r$ forwards the ant toward its intended destination $d$. In
434: particular, the probabilities in the entry for $s$ are updated as:
435: \[
436: p_k \leftarrow \frac{p_k + \Delta p}{1 + \Delta p}, \qquad
437: p_j \leftarrow \frac{p_j}{1 + \Delta p}, \quad
438: 1 \le j \le n,\ j \ne k,
439: \]
440: where $n$ is the number of interfaces at $r$,
441: $\Delta p = \frac{\lambda}{f(c)}$,
442: $\lambda > 0$, and $f(c)$ is a
443: non-decreasing function of $c$.
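The update rule and the per-ant steps around it can be sketched in Python as follows. Because every entry is divided by the same factor $1 + \Delta p$, the updated probabilities sum to $(1 + \Delta p)/(1 + \Delta p) = 1$. The sketch assumes positive link costs and takes $f(c) = c$ as one admissible non-decreasing function. The names \texttt{Ant}, \texttt{reinforce}, and \texttt{on\_ant\_arrival}, the table layout, and the default $\lambda = 0.1$ are hypothetical choices, not part of the original specification.

```python
from dataclasses import dataclass

@dataclass
class Ant:
    source: str        # s: router that released the ant
    dest: str          # d: intended destination
    cost: float = 0.0  # c: cost accumulated so far

def reinforce(probs, k, delta):
    """Push interface k up by delta, then renormalize every entry by 1 + delta."""
    scale = 1.0 + delta
    return {j: ((p + delta) if j == k else p) / scale for j, p in probs.items()}

def on_ant_arrival(table, reverse_cost, ant, iface, lam=0.1, f=lambda c: c):
    """Backward-learning step at a router when `ant` arrives on `iface`.

    table:        destination -> {interface: probability}
    reverse_cost: interface -> cost of traversing that interface in reverse
    f:            non-decreasing function of the accumulated cost (assumed positive)
    """
    ant.cost += reverse_cost[iface]   # include the reverse cost of the incoming link
    delta = lam / f(ant.cost)         # Delta p = lambda / f(c)
    table[ant.source] = reinforce(table[ant.source], iface, delta)  # entry for s, not d
    return ant                        # the caller then forwards the ant toward ant.dest

table = {"s": {"i1": 0.5, "i2": 0.5}}
ant = on_ant_arrival(table, {"i1": 1.0, "i2": 1.0}, Ant("s", "d", cost=1.0), "i1")
```

Here the ant's cost grows from 1 to 2, so $\Delta p = 0.05$. The probability of \texttt{i1} in the entry for \texttt{s} therefore rises from $0.5$ to $0.55/1.05 \approx 0.524$.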
444: 
445: Two types of ants, {\em regular ants} and {\em uniform ants}, are supported; they differ
446: in how the ants themselves are routed. Regular ants are forwarded
447: probabilistically according to the routing tables, which causes the routing
448: tables to converge deterministically to the shortest paths in the network. Regular
449: ants treat the probabilities in the routing tables as merely an intermediate
450: stage towards learning a deterministic routing table. They are good exploiters
451: and aid convergence in static environments. With uniform ants,
452: the ant forwarding probability follows a uniform distribution, wherein all links
453: have equal probability of being chosen. This sustains
454: exploration and helps track dynamic environments. In this case, the
455: routing tables do not converge to a deterministic answer; rather, the
456: probabilities are partitioned according to the costs. The constant state of
457: exploration maintained by the uniform ants ensures a true multi-path forwarding
458: capability.
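The two ant types differ only in how the next hop is chosen, as the first two functions of the following Python sketch show. The remaining function is a deliberately simplified, deterministic toy and not a network simulation. We assume that ants from one source arrive alternately on two interfaces whose path costs are $c_a$ and $c_b$, and that each arrival applies the backward-learning update with $f(c) = c$. Under these assumptions the entry for the source settles with $p_a / p_b \approx c_b / c_a$ for small $\lambda$, consistent with the cost-based partitioning described above. All names are hypothetical.

```python
import random

def next_hop_regular(probs, rng):
    """Regular ant: forwarded according to the routing-table probabilities."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

def next_hop_uniform(probs, rng):
    """Uniform ant: every interface equally likely, ignoring the table."""
    return rng.choice(list(probs))

def reinforce(probs, k, delta):
    """Backward-learning update: push interface k by delta and renormalize."""
    scale = 1.0 + delta
    return {j: ((p + delta) if j == k else p) / scale for j, p in probs.items()}

def two_path_toy(cost_a, cost_b, lam=0.01, rounds=5000):
    """Toy: ants arrive alternately on interfaces 'a' and 'b' with fixed path costs."""
    probs = {"a": 0.5, "b": 0.5}
    for _ in range(rounds):
        probs = reinforce(probs, "a", lam / cost_a)   # Delta p = lambda / c_a
        probs = reinforce(probs, "b", lam / cost_b)   # Delta p = lambda / c_b
    return probs

rng = random.Random(1)
table_entry = {"a": 0.9, "b": 0.1}
print(next_hop_regular(table_entry, rng), next_hop_uniform(table_entry, rng))
print(two_path_toy(1.0, 2.0))
```

With $c_a = 1$, $c_b = 2$, and $\lambda = 0.01$, $p_a$ stays within about $0.003$ of $2/3$ instead of converging to a deterministic choice.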
459: 
460: \section{Motivation}
461: Our primary design objective is to achieve cost-sensitive multi-path forwarding
462: while avoiding entry into loops as much as possible. We
463: have made a series of improvements to the uniform ants algorithm proposed by
464: Subramanian~et~al.~\cite{subramanian97}, culminating in a novel model-based
465: routing algorithm.
466: 
467: \begin{figure*}
468: \centering
469: \includegraphics[scale=0.6]{velcro1}
470: \vline
471: \includegraphics[scale=0.6]{velcro2}
472: \vline
473: \includegraphics[scale=0.6]{velcro4}
474: \caption{Velcro topologies with different cost ratios.}
475: \label{velcro_topo}
476: \end{figure*}
477: 
478: Let us begin by observing that uniform ants are natural multi-path routers;
479: according to Proposition 2 in Subramanian~et~al.~\cite{subramanian97}, the
480: ratio of the probabilities of choosing two interfaces is the inverse of the ratio of their costs.
481: The reader might be tempted to conclude that uniform ants inherently support
482: reachability routing; however, consider the three velcro topologies of
483: Figure~\ref{velcro_topo}.  These topologies have the same underlying graph
484: structure but differ in the costs associated with the main branch paths (the
485: direct path from 0 to 19, and the path through nodes 1, 7, and 13).  
486: 
487: Uniform ants explore all available interfaces with equal probability; while this
488: makes them naturally suitable for multi-path routing, it also creates a tendency
489: to reinforce paths that involve the fewest routing decisions. To see why,
490: recall that the goodness of an interface is inversely proportional to a
491: non-decreasing function of the cost of the path along that interface. This cost
492: is not simply the cost of the shortest path along the interface, but is itself
493: assessed by the ants during their exploration; hence the routing probability for
494: choosing a particular interface implicitly depends on the number of ways in
495: which a costly path can be encountered along that interface.  The
496: presence of loops along an interface means that there are more opportunities
497: for costly paths to be encountered (causing the interface to be reinforced
498: negatively) or for the ants to loop back to their source (causing their
499: absorption and, again, no positive reinforcement along the interface). 
500: 
501: The basic problem can be summarized as follows: ``interfaces that provide an
502: inordinate number of options involving loops will not be reinforced, even if
503: there exist high-quality loop-free sub-paths along those interfaces.''
504: Mathematically, this is a race between the negative reinforcements due to many
505: loops (and hence absorptions) and the positive reinforcements due to one (or a few)
506: short or cheap paths. As a result, the interface with fewer possibilities
507: for decision making wins, irrespective of the path cost. Hence in the topologies
508: shown in Figure~\ref{velcro_topo}, uniform ants will reinforce
509: the costliest path (left), one of the many cheapest paths (center), and the
510: cheapest path (right).  Notice that using regular ants to prevent this incessant
511: multiplication of probabilities is not acceptable, as we would be giving up the
512: multi-path forwarding capability of uniform ants.
513: 
514: Ideally, we want our ants to have selective amnesia, behaving as uniform ants
515: when it is important to have multipath forwarding and morphing into regular ants
516: when we do not want loops overshadowing the existence of a cheap, loop-free
517: path. We present a model-based approach that achieves this effect by maintaining
518: a statistics table independent of the routing table. The basic idea is to make
519: routers recognize that they constitute the fulcrum of a loop with respect to a
520: larger path context. 
521: 
522: For instance, in Figure~\ref{velcro_topo}, nodes 1, 7, and
523: 13 form fulcrums of loops, which should not play a role in multi-path forwarding
524: from, say, node 0 to node 19. The statistics table maintains, for each router
525: (node) and destination, the number of ants generated by it and the number that returned
526: without reaching its intended destination. Using these statistics, for instance,
527: node 1 can reason that all ants meant for destination 19 returned to it, when
528: sent along the interface leading to node 2. This information can be used to
529: reduce the scope of multi-path forwarding, on a per-destination basis.  The
530: statistics table serves as a discriminant function for the choices indicated by
531: the routing table, while the routing table reflects the reinforcement provided
532: by the uniform ants.
533: 
534: \section{Protocol Model}
535: \subsection{Ant Structure}
536: Ants are small packets used to explore and gather information about
537: the network. Periodically each source node $s$ generates, to every other destination
538: $d$, ants of the form $[s, d, c, o_i]$, where $c$ is the cost associated with
539: the ant and $o_i$ is the outgoing interface from the source router.  When the
540: ants are created the cost $c$ is initialized to $0$. All the intermediate
541: routers along the path from the source to destination increment the cost $c$ to
542: reflect the cost in reverse (when a message traverses a link from node $a$ to
543: node $b$, $c$ is incremented by the cost of the link from $b$ to $a$). When the
544: ant reaches the destination $d$, the cost $c$ is the end-to-end cost of sending
545: a message from source $s$ to destination $d$. Note the intermediate nodes along
546: the path do not update $o_i$.
547: 
548: \subsection{Routing Table Structure}
549: The routing table at each node is a two-dimensional array of the probabilities
550: of using various interfaces to reach destinations.  $RoutingTable_i[j][k]$, maintained at
551: node $i$, is the probability with which the interface $k$ of node $i$ is chosen
552: to reach destination $j$. Initially the probabilities for all destinations are
553: distributed equally across all the interfaces. This is in-line with the
554: destructive property of RL routing algorithms in which all interfaces are
555: ``innocent until proven guilty.''
556: 
557: \subsection{Statistics Table Structure}
558: The statistics table is also a two dimensional structure like the routing table,
559: except each node has two statistics tables.  $SentStatTable_i[j][k]$
560: corresponds to the number of ants sent along interface $k$ to destination $j$
561: originating from node $i$, and $ReturnedStatTable_i[j][k]$ is the number of ants
562: sent along the interface $k$ to destination $j$ which returned to their source
563: $i$.
564: 
565: The ant statistics are maintained only at the source node, and not at the
566: intermediate nodes, to allow for scalability of the algorithm.  If every intermediate 
567: node $n$ along the path of an
568: ant from source $i$ to destination $j$ increments its statistics table
569: $SentStatTable_n[j][m]$ when it forwards the ant along the interface $m$, it
570: would necessitate the ant to have a provision to save the outgoing interface for
571: each node along its path, so that the node will be able to identify if the ant
572: loops back to itself. Accommodating such a structure in large topologies would
573: result in unbounded growth of the ant's size.  Additionally, the ants are not
574: forwarded when they reach the destination or the source. By updating the
575: statistics table only at the source nodes, if the ant doesn't loop back to
576: itself, the source node can safely assume that it has reached the destination
577: (Under 100\% reliability conditions that no packets are dropped); whereas the
578: intermediate nodes would have no way of determining whether the ant reached the
579: destination successfully, or whether it looped back to the source node itself.
580: 
581: \begin{table}
582: \caption{Model-based Ant Routing Algorithm}
583: \label{code}
584: \centering
585: \begin{verbatim}
586: procedure Main
587:   begin:
588:     Uncontrolled Exploration
589:     Controlled Exploration
590:   end.
591: 
592: procedure Exploration (Uncontrolled | Controlled)
593:   begin:
594:     for every node in the topology
595:     begin:
596:       GenerateAnt; /* Periodically Generate Ant */
597:       SelectInterface (Uncontrolled | Controlled);
598:       UpdateModel;
599:       ForwardAnt;
600:     end.
601:   end. /* End of exploration procedure */
602: 
603: procedure ReceiveAnt
604:   begin:
605:     if the receiving node is the source of the ant
606:     begin:
607:       UpdateModel;
608:       DestroyAnt;
609:     end.
610:     if the receiving node is
611:       neither the source nor the destination
612:     begin:
      SelectInterface(Uncontrolled | Controlled);
614:       SelectInterface(Uncontrolled | Controlled)
615:       ForwardAnt;
616:     end.
617:     if the receiving node is the
618:       intended destination of the ant
619:     begin:
620:       UpdateRouteTable;
621:       DestroyAnt;
622:     end.
623:   end. /* End of receive ant procedure */
624: \end{verbatim}
625: \end{table}
626: 
627: \subsection{Description of the Algorithm}
628: An overview of the algorithm is given in Table \ref{code}.  The algorithm
629: consists of two stages: Uncontrolled Exploration and Controlled Exploration. In
630: both forms of exploration, each node periodically generates ants destined for
631: every other node in the topology. The algorithm uses uncontrolled exploration to
632: collect information about the topology and uses that information to build a
633: model to control future exploration at the nodes. The information collected
634: during the controlled exploration is used to update the model as well.  The two
635: forms of exploration work almost identically except for the SelectInterface
636: method. The following is a brief description of the various methods used in the
637: algorithm above.
638: 
639: \subsubsection{GenerateAnt}
640: This method generates an ant of
641: the form $[s, d, 0, undefined]$, where $s$ is the source node generating the ant
642: and $d$ is the intended destination. The initial cost $c$ associated with the
643: ant is set to $0$. The SelectInterface method determines the output interface,
644: so at this point, the output interface is undefined immediately after the ant is created.
645: 
646: \subsubsection{SelectInterface}
647: Due to the probabilistic nature of the routing algorithm, it is essential to 
648: ensure that the choice of the destination node for each ant at each node is 
649: uniformly distributed, so that the number of ants generated to the various 
650: destinations is nearly equal. This method differentiates between the two forms 
651: of exploration mentioned above, however both forms choose the output interface
652: uniformly, although the valid interfaces for Controlled Exploration are slightly 
653: constrained for optimization.
654: 
655: \begin{itemize} 
656: \item{\bf Uncontrolled Exploration: } Here the choice of the outgoing interface
657: at each node along the path from the source to destination is unbiased, i.e.
658: every interface at that node has equal probability of being chosen as the
659: outgoing interface.  The node generating the ant chooses one interface from its
660: interfaces and forwards the ant along that interface. If an intermediate node
661: (not the intended destination node) receives an ant along interface $A$ and
662: has interfaces other than $A$, it forwards the ant on some interface other than
663: $A$.  If it does not have any other interface then it sends-back along the
664: interface $A$ itself.
665: 
666: \item{\bf Controlled Exploration: }Here the choice of outgoing interface is
667: controlled by a variable called the threshold factor ($\tau$) ranging from $0$ to
668: $1$. The threshold factor not only affects the multipath capabilities of the routing
669: algorithm, but also its loop-free capabilities and its correctness with respect to 
670: the routing of packets (measured by the percentage of packets successfully reaching 
671: their intended destinations). 
672: \end{itemize}
673: 
674: Formally, the threshold factor works in the following manner: When a node $i$ 
675: (source or intermediate) needs to forward an ant intended for destination $j$, 
676: finds the ratio of $ReturnedStatTable_i[j][k]$ to
677: $SentStatTable_i[j][k]$ for each of its interfaces $k_1\cdots k_n$. All those
678: interfaces whose ratios are less than the threshold $\tau$ are eligible for
679: selection as a forwarding interface.  Then the selection policy is to choose among
680: the eligible interfaces with equal probability.  Three special cases must be
681: handled in the case of controlled exploration:
682: \begin{itemize}
\item{\bf Case 1}
When an ant arrives at a leaf node, i.e., a node with no interface other than
the incoming one, and that node is not the intended destination, the node sends
the ant back along the same interface.
\item{\bf Case 2}
When all the interfaces at an intermediate node are ineligible, i.e., their
statistics table ratios are at or above the threshold, the node sends the ant
back along the interface on which it received the ant.
\item{\bf Case 3}
When all the interfaces at the source node are ineligible, the source node uses
the uncontrolled exploration selection policy to break the deadlock. This case
is very rare and occurs only when $\tau$ is set to a very low value.
694: case is a very rare occurrence and occurs only when  is set to a very low value.
695: \end{itemize}
696: 
697: Once the outgoing interface is selected the next step is to forward the ant
698: along the chosen interface (ForwardAnt).  In the case of source node, before
699: calling the ForwardAnt, UpdateModel is called to update the statistics table.
700: 
701: \subsubsection{UpdateModel}
702: This method updates the statistics tables when an ant is generated or loops back
703: to its source.  The correctness and currency of the statistics tables are vital
704: to the performance of the router.  When the node generates the ant $[i, j, c, k]$, 
705: it increments its statistic table entry $SentStatTable_i[j][k]$ by $1$ to indicate 
706: that interface $k$ was chosen by $i$ to forward the ant intended for destination $j$.
707: Also, when an ant $[i, j, c, k]$ loops back to the source node, the statistic table
708: entry $ReturnedStatTable_i[j][k]$ is incremented by $1$ to indicate that the
709: choice of interface $k$ to route the ant intended to destination $j$ resulted in
710: a loop.  This can be considered a negative reinforcement in the behavior of the
711: router.
712: 
713: \subsubsection{ForwardAnt}
714: This method is used to forward the ants from the current node to the next node 
715: along the interface chosen by the SelectInterface method.
716: 
717: \subsubsection{DestroyAnt}
718: When the ant reaches the intended destination or loops back to its source
719: itself, the ant is not forwarded further and the node absorbs the ant.
720: 
721: \subsubsection{UpdateRouteTable}
722: When any node $t$ (intermediate or the intended destination) other than the
723: source node, receives an ant $[i, j, c, k]$ on interface $l$ from node $y$, it
724: updates the cost $c$ by adding the cost of traversing the interface $l$ in
725: reverse, and then updates its routing table entries for node $i$ as follows:
726: \[
727: rt[i][l] = \frac{rt[i][l] + \Delta p}{1 + \Delta p}, 
728: rt[i][m] = \frac{rt[i][m]}{1 + \Delta p}
729: \]
730: \[
731: 1 \le m \le n, l \ne m
732: \]
733: where $\Delta p = \frac{\lambda}{f(c)}$, $\lambda > 0$ and $f(c)$ is a
734: non-decreasing function of $c$.
735: 
736: \subsection{Qualitative Characteristics}
737: The model-based routing algorithm presented above discards all {\em useless
738: loops}, in which all traffic exiting the loop must exit at the same point which
739: it entered, such as the fulcrum points in the velcro topologies shown in
740: Figure~\ref{velcro_topo}.  For instance, in these velcro topologies,
741: when node 1 sends out a packet intended for a destination other than those nodes
742: in the loop pivoted at 1, either on the interface leading to node 2 or node 6,
743: the result will be the packet returning to node 1. From the statistics table,
744: node 1 will learn that those interfaces are useless for forwarding packets to
745: certain destinations and hence avoid them in the future. By discarding all the
746: useless loops, this algorithm overcomes the problem of the uniform ants
747: algorithm wherein only the path with the least decision-making is reinforced.
748: 
749: The threshold factor $\tau$ influences the reinforcement of the various paths of
750: a topology. At very high values of $\tau$, the algorithm tends towards behaving
751: like uniform ants while continuing to avoid all the useless loops. For instance
752: a $\tau$ value of 1 means that an interface where all but one packet sent on it
753: looped back may still be selected as an outgoing interface. At the same time
754: this setting still avoids all the interfaces that lead to useless loops, as all
755: packets sent along them must have come back to the sender. 
756: 
757: At high $\tau$ values, certain packets may encounter one or more loops along
758: their path that are unavoidable. At very low values of $\tau$, the nodes have a
759: limited selection of interfaces to choose from due to the stringent
760: loop-avoidance criteria, which will affect our goal of multi-path routing, but
761: will greatly decrease the probability of encountering a loop.  The choice of
762: $\tau$ factor determines the multipath, correctness, and loop-avoidance
763: capabilities of our algorithm. The threshold factor can either be set to a fixed
764: value (for the network, or on a per-router or per-router/destination-pair basis)
765: or can be adaptively refined to optimize model-based routing for various
766: criteria.
767: 
768: \section{Evaluation}
769: \subsection{Experimental Setup}
770: To measure the performance of our cost-sensitive reachability routing algorithm,
771: we wrote a discrete event simulator in C to simulate a standard
772: point-to-point topology based network. The simulated network is modeled as a set
773: of nodes interconnected over point-to-point links, each with an associated cost. The
774: discrete event simulator was derived from work done in~\cite{srinidhi00}, and
775: has been used in several networking courses to model routing algorithms.
776: 
777: The simulator runs at a resolution of 1 $\mu s$ and an integer value defined at
778: the initialization of the simulation determines the duration of the simulation.
779: In our case, the simulation runs were set to INTMAX (2147483647 as defined in
780: $<$ limits.h $>$).  As it is a discrete event simulator, every action takes
781: place after the expiration of a timer and the simulator is programmed to run in
782: uncontrolled exploration mode for the first one eighth of the time and in
783: controlled exploration mode for the remaining time. Each node generated an ant
784: every 10000 $\mu s$. For the purpose of this paper, we programmed the link layer
785: of the simulator to be reliable, i.e. it does not introduce any errors or drop
786: packets.
787: 
788: \subsection{Topologies}
789: A utility provided along with the simulator~\cite{srinidhi00}, when given the
790: number of nodes in the network and number of interfaces per node, is able to
791: generate four different interconnected topologies for the network, namely: tree,
792: clique (fully connected mesh), arbitrary graph, and loop topologies. The automated 
793: topology generating utility was used to generate the tree and arbitrary graph 
794: topologies used in the simulations.
795: 
796: Using the manual topology generator provided along with the simulator, complex
797: topologies such as the velcro and dumbbell topologies were created. These
798: topologies have some intrinsic characteristics helpful in demonstrating the
799: range and effectiveness of our algorithm.
800: 
801: A clique topology generator was written in C, which, when given the number of rows and
802: columns in the clique, will generate a perfect clique topology wherein all the
803: interior nodes will be of degree 4 and all the boundary nodes will be of degree
804: 2 or 3.
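The degree pattern described above is that of a rectangular grid. A minimal
Python sketch of such a generator (not the authors' C tool; the node numbering
and function names are our assumptions) builds the edge list and tallies the
node degrees.

\begin{verbatim}
from collections import Counter

def grid_topology(rows, cols):
    """Edge list of a rows x cols grid; node (r, c) gets
    the id r * cols + c (hypothetical numbering)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:          # link to the right
                edges.append((node, node + 1))
            if r + 1 < rows:          # link to the row below
                edges.append((node, node + cols))
    return edges

def degrees(edges):
    """Degree of every node that appears in an edge."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
\end{verbatim}

For the $8 \times 5$ case this gives 4 corner nodes of degree 2, 18 boundary
nodes of degree 3, and 18 interior nodes of degree 4, i.e., 40 nodes and 67
links.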
805: 
806: Finally, BRITE, the Boston university Representative Internet Topology
807: gEnerator~\cite{brite}, was used to generate large Internet scale topologies. It
808: provides a wide variety of generation models, as well as the ability to extend 
809: such a set by combining existing models or adding new ones. We used the
810: Router Waxman Flat Router-level model, which is governed by a power law, to
811: generate the topologies. A program in C was written to convert the topology
812: format generated by BRITE to the format used by our simulator.
813: Topologies with sizes ranging from 20 to 200 nodes were generated using BRITE.
814: 
815: Our model-based routing algorithm was first validated in~\cite{srinidhi03}, by
816: examining its performance when applied to routing on synthetic worst-case
817: scenario topologies, such as velcro topologies.  This previous work also
818: presented a subtle modification to the algorithm, avoiding sub path reinforcement, 
819: which results in better performance on certain types of topologies. 
820: 
821: % Second, we quantify the convergence
822: % of our routing algorithm by measuring the correlation of path costs and hop
823: % counts between all packets sent to and originating from the nodes under
824: % consideration. In our case, the nodes under consideration were those with the
825: % maximum and minimum degree. 
826: 
827: The primary contribution of this work is to study data traffic across the network
828: based on converged routing tables and introduce a new factor called the
829: reachability factor ($\phi$) that controls the choice of the outgoing
830: interfaces. We investigate the effect of the threshold factor ($\tau$) and the
831: reachability factor on various topologies with the help of an operating curve
832: aimed at helping network administrators in choosing the ideal threshold and
833: reachability factors for their networks. We also show that by making the nodes
834: always choose the interface with the highest probability for the intended
835: destination, our model-based routing algorithm behaves in the same way as any
836: other single-path deterministic routing algorithm i.e., it provides loop-free
837: shortest-paths with guaranteed delivery for all packets. 
838: 
839: Additionally, we show that even though the goal of every multi-path routing
840: algorithm is to avoid loops, our model-based routing algorithm does not
841: guarantee a complete elimination of loops.  Nevertheless our algorithm
842: guarantees that a packet will eventually exit the loop and reach its intended
843: destination. We study the distribution of loops encountered by packets and show
844: that a vast majority of packets encounter only a small number of loops, or none
845: at all.
846: 
847: \subsection{Packet Routing Using Model Based Routing}
848: In this set of experiments, a new application was written on top of the
849: simulator to route packets based on the routing table learned by ants exploring
850: the network. Initially, we ran the model-based routing algorithm on the given
851: topology to obtain a stabilized routing table. Next, we ran the application with
852: the routing table and the reachability factor as parameters and collected
853: various statistics. Below we will discuss in detail the application, the
854: reachability factor, the statistics collected, and analysis of the statistics
855: obtained from both model-based and uniform ants routing.
856: 
857: The functioning of the application is similar to the one described earlier,
858: except that there is no update of the routing table.  The routing table is
859: pre-initialized to that obtained from the model-based routing simulation and
860: remains constant throughout. By not updating the routing table based on the
861: packets arriving at every node we are just exploiting the model and not
862: exploring the network further.  It should be noted that a real world router
863: would constantly explore the network with ants, and use the resulting routing
864: table to route packets simultaneously.  However, to determine the effectiveness
865: of the underlying algorithm, it is simpler to analyze its performance in a
866: static network environment.
867: 
868: The reachability factor $\phi$  controls the degree of freedom each node has in
869: choosing the outgoing interface. At each node the outgoing interfaces are
870: ordered in descending order of their probabilities for every destination. When a
871: node $n$ needs to route a packet intended for destination $d$, it picks the top
872: $\phi$ interfaces for that destination and uses their scaled up probabilities
873: for selecting the outgoing interface. For a better understanding of the
874: reachability factor, consider the following example. Say a node $M$ has 4
875: interfaces $A$, $B$, $C$, and $D$ with associated probabilities $0.4$, $0.2$,
876: $0.15$, $0.15$ for destination $N$; then a $\phi$ value of $2$ will allow the
877: node $M$ to choose from interfaces $A$ and $B$ with probabilities $(0.4)/(0.4 +
878: 0.2)$ and $(0.2)/(0.4+0.2)$ respectively i.e. node $M$ will choose interface $A$
879: 66.67\% of the time and interface $B$ 33.33\% of the time to route the packet
880: intended for destination $N$.
881: 
882: The statistics collected include the number of loops encountered by the packets
883: along their paths, the number of packets encountering loops, the multipath
884: capability of the packets, and the percentage of packets successfully reaching
885: their intended destination. To determine the number of loops encountered by the
886: packets, each packet has a stack associated with it. Every node, before
887: forwarding a packet, checks to see if its {\em id} already exists in the stack. If its
888: {\em id} is present in the stack, it increments the loop counter of the packet by $1$
889: and pops the contents of the stack up to its {\em id} else pushes its {\em id} onto the
890: stack and then forwards the packet. At the end of the simulation we have
891: statistics on the number of packets encountering loops (loop percentage) and the
892: total number of loops encountered by all the packets. Every packet also has a
893: multipath flag associated with it that is set if any node along the path taken
894: by the packet has more than one outgoing interface to choose from. This is used
895: to determine the percentage of packets that could have potentially taken more
896: than one path to reach their intended destination (multipath percentage).
897: Finally, we determine the success percentage as the percentage of packets
898: successfully reaching their intended destination.
899: 
900: \subsubsection{Reachability factor $\phi = 1$}
901: In our first set of experiments $\phi$ was set to $1$ so that the nodes always
902: choose the best outgoing interface (interface with the highest probability) for
903: each packet. As each packet deterministically chooses the best interface at
904: every node, the multipath percentage is zero. A $\phi$ value of $1$ also results
905: in the avoidance of loops and a one hundred percent success percentage as all
906: the packets reach their intended destination. According to proposition 2 of
907: Subramanian~et~al.~\cite{subramanian97} the probability of choosing an interface
908: is inversely proportional to the cost ratios (under the assumption of loop free
909: paths).  Keep in mind that this proposition applies even for our modified
910: model-based algorithm as all the avoidable loops are avoided and also we have
911: shown in~\cite{srinidhi03} that the probabilities are inversely proportional to
912: the path costs. By choosing the interface with the highest probability, i.e. the
913: interface that advertised a lower cost path to that destination, at every node
914: we have achieved deterministic shortest path routing while still using the
915: underlying probabilistic routing table. 
916: 
917: The following set of simulations were done on 20 to 100 node BRITE topologies
918: with uniform cost distribution so that with $\phi = 1$ the path taken by all the
919: packets will not only correspond to the shortest path in terms of cost but also
920: in terms of the number of hops. By sending packets across the network and
921: keeping track of their hop count, we ascertained the shortest path length
922: between every source-destination pair. At the end of the simulations, the
923: average shortest path length for the topologies were calculated and compared
924: with the theoretical shortest path lengths. We then attempt to fit this
925: empirical data onto parametrized formulas.
926: 
927: Below we discuss the derivation of average shortest-path lengths for
928: exponentially distributed graphs based on~\cite{newman01}.  The Router Waxman
929: model of BRITE uses an exponentially distributed generation function to create
930: the topologies.  According to~\cite{newman01}, the generating function $G_0(x)$
931: should be normalized such that $G_0(1) = 1$.
932: 
933: We use the following generating function for our derivation:
934: \[
935: G_0(x) = \frac{1 - e^{-1/\kappa}}{1 - xe^{-1/\kappa}}
936: \]
937: According to~\cite{newman01}, the average shortest path length is given
938: by:
939: \[
940: l = \frac{\ln{N / z_1}}{\ln{z_2 / z_1}} + 1
941: \]
942: for $N \gg z_1$ and $z_2 \gg z_1$, where $N$ corresponds to the
943: number of nodes in the topology, and $z_m$ corresponds to the
944: average number of $m$th-nearest neighbors with $z_1 = G'_0(1)$
945: and $z_2 = G''_0(1)$. We derived $l$ to be:
946: \[
947: l = 1 + \frac{\ln{N} + \ln{e^{1/\kappa}-1}}{\ln{2} - \ln{e^{1/\kappa} - 1}}
948: \]
949: 
950: From this equation we derived the value of $\kappa$ to be
951: \[
952: \kappa = \frac{1}{\ln{\frac{2^{\frac{l-1}{l}}}{N^{\frac{1}{l}}}}}
953: \]
954: 
955: Based on the above derivations, a least square fit was conducted on the
956: simulation results, which returns both $\kappa$ and the square of the
957: correlation coefficient with values ranging of $0$ and $1$, indicating bad or
958: good fit respectively. In our case, the fit returned a value of $0.986551$,
959: which indicates that the best fit line summarizes the data very well as shown in
960: Figure~\ref{shortest_path}.
961: 
962: \begin{figure}
963: \vspace{0.14in}
964: \centering
965: \includegraphics[scale=0.35]{shortest_path}
966: \vspace{0.14in}
967: \caption{Least square fit between the theoretical and actual shortest path
968: lengths.}
969: \label{shortest_path}
970: \end{figure}
971: 
972: \subsubsection{Reachability factor $\phi =$ maximum degree}
973: By setting the reachability factor to the maximum degree of the topology, each 
974: node will be allowed to choose among all its interfaces to be the outgoing
975: interfaces (based on the probability associated with it for the intended
976: destination). The simulations were run on the following topologies: 20 to 200
977: node BRITE topologies, 10x4 \& 8x5 clique topologies and the velcro topologies
978: described in Figure~\ref{velcro_topo}. {\em Operating curves} of the percentage 
979: of packets encountering loops were plotted against the percentage of those with 
980: multipath capabilities for various topologies at different values of the threshold 
981: factor.  These operating curves are shown in 
982: Figures~\ref{curve_brite},~\ref{curve_velcro},~and~\ref{curve_clique}.  
983: Visualizing the performance of the routing algorithm in this way enables us to
984: compare the effect of the inherent topology and performance parameter settings,
985: and the interactions between the two.
986: 
987: As opposed to $\phi = 1$, $\phi =$ maximum degree results in multi-path
988: forwarding of the packets and also some portion of packets entering into
989: loops. All the packets reached their intended destinations except for those that
990: looped back to their source resulting in a high success percentage. To overcome
991: the drop in success percentage, the packets were forwarded even when they looped
992: back to the source and counting this episode as just another loop encountered
993: along the path.  
994: 
995: With this modification all the packets successfully reached
996: their intended destinations but with a linear increase in the percentage of
997: loops (to account for all those packets that were earlier absorbed by their
998: source). All packets had a TTL of 255 but none of them were dropped due to
999: reaching the TTL limit. Below we present the operating curves for various
1000: topologies under both the cases: 1) absorption of packets at their source and 2)
1001: no absorption of packets. 
1002: 
1003: \subsection{Operating Curve Observations}
1004: 
1005: \begin{figure*}
1006: \vspace{0.14in}
1007: \centerline{\subfigure[With source absorption, each point is labeled with its
1008: threshold value and success percentage]{\includegraphics[scale=0.35]{curve_40_abs}
1009: \label{curve_brite_abs}}
1010: \hfill
1011: \subfigure[Without source absorption, each point is labeled with its
1012: threshold value]{\includegraphics[scale=0.35]{curve_40_noabs}
1013: \label{curve_brite_noabs}}}
1014: \caption{Operating curve for a 40 node BRITE topology}
1015: \vspace{0.14in}
1016: \label{curve_brite}
1017: \end{figure*}
1018: 
1019: \begin{figure*}
1020: \centerline{\subfigure[With source absorption, each point is labeled with its
1021: threshold value and success percentage]{\includegraphics[scale=0.35]{curve_velcro_abs}
1022: \label{curve_velcro_abs}}
1023: \hfill
1024: \subfigure[Without source absorption, each point is labeled with its
1025: threshold value]{\includegraphics[scale=0.35]{curve_velcro_noabs}
1026: \label{curve_velcro_noabs}}}
1027: \caption{Operating curve for the velcro topology shown in
1028: Figure~\ref{velcro_topo} right}
1029: \vspace{0.14in}
1030: \label{curve_velcro}
1031: \end{figure*}
1032: 
1033: \begin{figure*}
1034: \vspace{0.14in}
1035: \centerline{\subfigure[With source absorption, each point is labeled with its
1036: threshold value and success percentage]{\includegraphics[scale=0.35]{curve_8x5_abs}
1037: \label{curve_clique_abs}}
1038: \hfill
1039: \subfigure[Without source absorption, each point is labeled with its
1040: threshold value]{\includegraphics[scale=0.35]{curve_8x5_noabs}
1041: \label{curve_clique_noabs}}}
\caption{Operating curve for an $8 \times 5$ clique topology}
1043: \label{curve_clique}
1044: \end{figure*}
1045: 
First, consider the operating curve for a random 40-node BRITE topology shown in
Figure~\ref{curve_brite_abs}. As the threshold factor increases, performance
moves from a region with no loops and 45\% multipath to one with 7\% loops and
100\% multipath. Encouragingly, the curve first moves toward accommodating
multipath before it introduces loops, rather than the other way around.
1052: 
Second, notice that different portions of each curve are drawn differently. Each
operating curve consists of a solid line and a dotted line, which denote,
respectively, the region where the model is completely in force and the region
where it is not. As discussed earlier, at very low threshold factor values, when
all the interfaces at an intermediate node are ineligible, i.e., their statistic
table ratios are above the threshold, the node sends the ant back along the
interface on which it originally received the ant, which increases the
percentage of packets entering loops. Similarly, at very low values of $\tau$,
when all the interfaces at the source node are ineligible, the source node uses
the uncontrolled exploration selection policy to break the deadlock. As
Figure~\ref{curve_brite_abs} shows, at a threshold value of approximately
$0.4$ the model comes into force, in that all routing decisions are based on
learning rather than on defaults.
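The interface-selection behavior described above can be sketched as follows. This is a minimal, hypothetical Python sketch rather than the implementation used in our experiments: it assumes that each interface carries a statistic-table ratio in $[0,1]$, that an interface is eligible when its ratio does not exceed $\tau$, that the node chooses uniformly among eligible interfaces, and that the source's uncontrolled exploration policy is a uniform choice over all interfaces. The function name \texttt{select\_interface} is illustrative.

```python
import random
from typing import Dict, Optional


def select_interface(
    ratios: Dict[str, float],
    tau: float,
    incoming: Optional[str],
    rng: random.Random,
) -> str:
    """Pick an outgoing interface for one destination (hypothetical sketch).

    ratios   -- assumed statistic-table ratio per interface, in [0, 1]
    tau      -- threshold factor; an interface is eligible if ratio <= tau
    incoming -- interface the ant arrived on, or None at the source node
    """
    # Sort so that results are reproducible for a seeded generator.
    eligible = sorted(i for i, r in ratios.items() if r <= tau)
    if eligible:
        # Model in force: choose among the eligible (learned) interfaces.
        return rng.choice(eligible)
    if incoming is not None:
        # Intermediate node with no eligible interface: send the ant back.
        return incoming
    # Source node with no eligible interface: uncontrolled exploration,
    # assumed here to be a uniform choice over all interfaces.
    return rng.choice(sorted(ratios))
```

Under these assumptions, setting $\tau = 1$ makes every interface eligible, so the choice becomes uniform over all interfaces, consistent with the uniform ants behavior discussed later in this section.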
1066: 
Comparing Figure~\ref{curve_brite_abs} with Figure~\ref{curve_brite_noabs}
(the latter without source absorption), we notice that the difference in the
percentage of packets that reach their destination with and without source
absorption is reflected in the difference in the percentage of packets
encountering loops. Removing source absorption from the simulation results in
a 100\% success rate, but at the cost of an increase in the percentage of
packets encountering loops, which is an understandable consequence. This
trade-off is acceptable: for a router using the ant-derived statistic tables to
make routing decisions, it is vital for data to transit the network with the
highest success rate, even at the expense of an increased likelihood of entering
a routing loop.
1078: 
Comparing the operating curves for the 40-node BRITE topology in
Figures~\ref{curve_brite_abs} and \ref{curve_brite_noabs} with those of BRITE
topologies of other sizes (not shown here; see~\cite{kumar04}) reveals another
interesting behavior: as the number of nodes in the topology increases, the
minimum multipath percentage also increases. This is because, at very low
threshold values, the model-based routing algorithm routes a large number of
packets deterministically in smaller topologies. The shape of the operating
curve depends strongly on the intrinsic graph-theoretic properties of the
topology: as Figures~\ref{curve_brite}, \ref{curve_clique}, and
\ref{curve_velcro} show, each topology class (BRITE, clique, and velcro)
generates an operating curve of its own characteristic shape.
1091: 
1092: % Figure 4.28 and Figure 4.29 have their
1093: % operating curve very similar to the operating curve generated
1094: % by the mesh topologies as the topology in Figure 4.6 can be
1095: % viewed as a triangulated mesh topology.
1096: 
Note also that at $\tau = 1$ all the operating curves exhibit the behavior of
the uniform ants algorithm~\cite{subramanian97}. This is because at $\tau = 1$
every interface at each node is eligible to be selected as the outgoing
interface for the intended destination, which matches the selection policy of
the uniform ants algorithm.
1102: 
When the various topology classes are combined in the same network, the number
of distinct operating curves is unbounded. Because each operating curve has its
own threshold value that gives the network optimal performance, in terms of loop
avoidance and multipath routing, that threshold value must be adaptively learned
and refined for an arbitrary dynamic network. This is an area of future research
that must be addressed before our multipath routing algorithm can be deployed on
actual networks.
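Selecting a threshold from a measured operating curve can be illustrated with a small helper. The sketch below is hypothetical: the acceptance criteria (an upper bound on the loop percentage and a lower bound on the success percentage, followed by maximal multipath percentage) are our assumptions for illustration, and the names \texttt{CurvePoint} and \texttt{pick\_threshold} do not come from the algorithm itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CurvePoint:
    """One measured point on an operating curve (values in percent)."""
    tau: float
    success: float
    multipath: float
    loops: float


def pick_threshold(
    points: Iterable[CurvePoint],
    max_loops: float,
    min_success: float,
) -> Optional[float]:
    """Return the tau with the most multipath among points whose loop
    percentage is at most max_loops and whose success percentage is at
    least min_success, or None if no point qualifies.

    Ties are broken by fewer loops, then by the smaller tau. These
    criteria are illustrative assumptions, not a definition of optimality.
    """
    feasible = [p for p in points
                if p.loops <= max_loops and p.success >= min_success]
    if not feasible:
        return None
    best = max(feasible, key=lambda p: (p.multipath, -p.loops, -p.tau))
    return best.tau
```

For example, given measured points for several values of $\tau$, \texttt{pick\_threshold(points, max\_loops=5, min\_success=95)} returns the feasible $\tau$ with the highest multipath percentage. An adaptive scheme could re-run such a selection on fresh measurements, per node or per source/destination pair.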
1110: 
1111: \subsection{Distribution of loop frequency}
Finally, we show that even though the presence of loops is unavoidable, the
number of packets that encounter $k$ loops on their paths to their respective
destinations decays exponentially as $k$ increases, i.e., the majority of the
packets encounter between 0 and 2 loops. Figure~\ref{loop_freq} plots the
number of packets against the number of loops encountered for a 40-node BRITE
topology. Note that, due to the cyclic nature of clique topologies, certain
packets in those topologies encounter as many as 20 loops before they reach
their intended destination.
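One way to quantify such a distribution from simulation traces is sketched below in Python. It assumes (our assumption, not a definition taken from the simulator) that every arrival at an already-visited node counts as one loop; it builds the histogram of packets per loop count $k$ and fits $\ln N_k = a - \lambda k$ by least squares. A positive $\lambda$ with a good fit indicates exponential decay, which appears as a straight line on a logarithmic y-axis; a power law would instead appear as a straight line on log-log axes. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def count_loops(path: Sequence[int]) -> int:
    """Count loops in a packet's recorded node sequence, assuming each
    arrival at an already-visited node closes one loop."""
    seen = set()
    loops = 0
    for node in path:
        if node in seen:
            loops += 1
        seen.add(node)
    return loops


def loop_histogram(paths: List[Sequence[int]]) -> Dict[int, int]:
    """Map k -> number of packets that encountered exactly k loops."""
    return dict(Counter(count_loops(p) for p in paths))


def fit_decay(hist: Dict[int, int]) -> Tuple[float, float]:
    """Least-squares fit of ln N_k = a - lam * k over bins with N_k > 0.

    Returns (lam, a); lam > 0 indicates decay with increasing k.
    """
    pts = [(k, math.log(n)) for k, n in sorted(hist.items()) if n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins to fit")
    m = len(pts)
    mean_k = sum(k for k, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    sxx = sum((k - mean_k) ** 2 for k, _ in pts)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_k
```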
1120: 
1121: \begin{figure}
1122: \vspace{0.14in}
1123: \centering
1124: \includegraphics[scale=0.35]{loop_freq}
1125: \vspace{0.14in}
\caption{Distribution of loops encountered by packets in a random 40-node BRITE
topology; note the logarithmic scale of the y-axis.}
1128: \label{loop_freq}
1129: \end{figure}
1130: 
1131: \subsection{Verification of cost-sensitive routing in BRITE topologies}
The goal of this experiment was to perform a large-scale validation of the
cost-sensitivity properties of our reachability routing algorithm. First, all
the paths taken by the packets at $\phi =$ maximum degree were enumerated. To
achieve this, each packet carried a stack that recorded the nodes it visited en
route to its destination. At the destination, the paths taken by the packets
from each source were ranked in increasing order of cost. The destination nodes
kept track of only the unique paths from each source, along with the frequency
associated with each path.
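The destination-side bookkeeping described above can be sketched as follows. This hypothetical Python sketch assumes that the stack carried by a packet lists the visited nodes with the source first and the destination last, and that a path's cost is the sum of its directed link costs; \texttt{DestinationLog} and its methods are illustrative names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Path = Tuple[int, ...]


class DestinationLog:
    """Per-destination record of the unique paths from each source and the
    number of packets that followed each one (illustrative sketch)."""

    def __init__(self, link_cost: Dict[Tuple[int, int], float]) -> None:
        # link_cost[(u, v)] is the assumed cost of the directed link u -> v.
        self.link_cost = link_cost
        # source -> Counter mapping each unique path to its frequency.
        self.paths: Dict[int, Counter] = defaultdict(Counter)

    def record(self, stack: Sequence[int]) -> None:
        """Log the node stack carried by an arriving packet (source first)."""
        path = tuple(stack)
        self.paths[path[0]][path] += 1

    def cost(self, path: Path) -> float:
        """Sum of link costs along the path."""
        return sum(self.link_cost[(u, v)] for u, v in zip(path, path[1:]))

    def ranked(self, source: int) -> List[Tuple[Path, float, int]]:
        """Unique paths from source as (path, cost, frequency), cheapest first."""
        return sorted(
            ((p, self.cost(p), n) for p, n in self.paths[source].items()),
            key=lambda item: (item[1], item[0]),
        )
```

Calling \texttt{record} for each arriving packet and then \texttt{ranked(s)} yields the unique paths from source \texttt{s}, cheapest first, together with their packet counts.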
1140: 
The purpose of this instrumentation was to verify that the distribution of
traffic along the pursued paths mirrored the costs of those paths. At the end of
the simulation, the sum of the frequencies over the top $[(x-9)\%, x\%]$ of the
cost-ranked paths for every source-destination pair was determined, for
$x \in \{10, 20, \ldots, 100\}$. Figures~\ref{traffic_dist_subr}
and~\ref{traffic_dist_nosubr} demonstrate the cost-sensitive routing of our
model-based algorithm, and also show that sub-path reinforcement has no effect
on BRITE topologies. (The experiments were performed on 60-node BRITE
topologies.)
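The decile summation can be sketched as follows. In this hypothetical Python sketch, the paths for one source-destination pair are assumed to be already sorted by increasing cost, and a path at 0-based rank $i$ among $n$ paths is assigned to bucket $x$ when its cumulative percentile $100(i+1)/n$ lies in $(x-10, x]$; the per-pair buckets are then summed over all pairs. The function names are illustrative.

```python
from typing import Dict, Iterable, List


def decile_traffic(freqs: List[int]) -> Dict[int, int]:
    """Bucket traffic for one source-destination pair by cost decile.

    freqs lists path frequencies in increasing order of path cost. The path
    at 0-based rank i goes to bucket x = 10 * ceil(10 * (i + 1) / n), i.e.
    the bucket whose range (x - 10, x] contains its percentile 100*(i+1)/n.
    """
    n = len(freqs)
    buckets = {x: 0 for x in range(10, 101, 10)}
    for i, f in enumerate(freqs):
        decile = (10 * (i + 1) + n - 1) // n  # integer ceil(10*(i+1)/n)
        buckets[10 * decile] += f
    return buckets


def aggregate(pairs: Iterable[List[int]]) -> Dict[int, int]:
    """Sum the per-pair decile buckets over all source-destination pairs."""
    total = {x: 0 for x in range(10, 101, 10)}
    for freqs in pairs:
        for x, f in decile_traffic(freqs).items():
            total[x] += f
    return total
```

Under cost-sensitive routing, one would expect the low-$x$ buckets to carry most of the traffic.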
1149: 
1150: \begin{figure*}
1151: \vspace{0.14in}
1152: \centerline{\subfigure[With subpath reinforcement]{\includegraphics[scale=0.35]{dist_subr_noabs}
1153: \label{traffic_dist_subr}}
1154: \hfill
1155: \subfigure[Without subpath reinforcement]{\includegraphics[scale=0.35]{dist_nosubr_noabs}
1156: \label{traffic_dist_nosubr}}}
\caption{Traffic distribution in a random 60-node BRITE topology without source
absorption; note the logarithmic scale of the y-axis.}
1159: \vspace{0.14in}
1160: \label{traffic_dist}
1161: \end{figure*}
1162: 
1163: \section{Conclusion and Future Work}
1164: In this paper we have presented a new model-based reinforcement learning
algorithm that achieves true cost-sensitive reachability routing, even in
network topologies that pose problems for both deterministic routing and
classical RL formulations. This algorithm efficiently distributes traffic among
1168: all paths leading to a destination. The evaluation results indicate that our
1169: approach achieves true multi-path routing, with traffic distributed among the
1170: multiple paths in inverse proportion to their costs. By helping maintain the
1171: incremental spirit of current backbone routing algorithms, this approach has the
1172: potential to form the basis of the next generation of routing protocols,
1173: enabling a fluid and robust backbone routing framework. The reader is referred
1174: to~\cite{kumar04} and~\cite{srinidhi03} for background and further experimental
1175: results.
1176: 
1177: We now present four possible directions for future work.
1178: 
1179: \begin{itemize}
1180: \item{\bf Adaptive configuration of the threshold factor ($\tau$)}
1181: 
The threshold factor is currently set to a fixed value for all the nodes in the
topology. From the operating curve, the network administrator determines the
optimal value of $\tau$ at which routing yields high success and multipath
percentages while keeping the percentage of packets entering loops low. As
part of future work, we can determine the $\tau$ value dynamically based on
available information and adjust it periodically to best meet the routing
requirements. The $\tau$ value could be adapted dynamically on a per-node
basis, or on a per-source/destination-pair basis at every node.
1190: 
1191: \item{\bf Instructive feedback}
1192: 
1193: Our RL algorithm works primarily using evaluative feedback from neighboring
1194: routers. It would be interesting to extend the framework to accommodate
instructive feedback. To provide instructive feedback, however, a router must
have sufficient discriminating capability to perform credit assignment. It is
typically the case that any resulting instruction will be of the negative
kind, i.e., ``for destination $X$, do not use interface $i_y$.'' How such
negative instructions can co-exist with positive reinforcements is an important
research issue, not only for our application domain, but also for the larger
field of reinforcement learning.
1202: 
1203: \item{\bf Modeling topologies with hierarchical addressing}
1204: 
Currently the algorithm assumes all topologies to be flat, such that all nodes in
the topology are numbered from $1$ to $n$. By supporting hierarchical addressing
of the nodes, the model built at every node could be on a per-subnetwork basis
instead of a per-node basis, i.e., a node could collect statistics for a
1209: group of nodes as a single entity and build its model accordingly. Such an
1210: approach encourages problem decomposition and enables scaling up to large
1211: network sizes.
1212: 
1213: \item{\bf Reverse engineering routing protocols}
1214: 
1215: The model-based reinforcement learning algorithm presented here promises to
1216: serve as an abstraction of reachability routing algorithms in general. One idea
for further research is to automatically mine the model by analyzing the behavior
of implemented routing algorithms, rather than incrementally learning it from
1219: scratch, as we have done here. In other words, we can seek to imitate the
1220: functioning of another algorithm by suitably configuring our model.  This
1221: problem has its roots in inverse reinforcement learning, where we are aiming to
1222: recover an algorithm from observed (optimal) behavior.  The first steps toward
1223: such reverse engineering have been recently taken~\cite{shiraev03}.
1224: 
1225: \end{itemize}
1226: 
1283: \begin{thebibliography}{1}
1284: 
1285: % \bibitem{boyan94}
1286: % J.~Boyan and M.~Littman. Packet Routing in Dynamically Changing
1287: % Networks: A Reinforcement Learning Approach. In \emph{Advances in Neural
1288: % Information Processing Systems 6 (NIPS6)}, pages 671-678. Morgan
1289: % Kaufmann, San Francisco, CA, 1994.
1290: 
1291: \bibitem{chen98} 
J.~Chen, P.~Druschel, and D.~Subramanian. A Simple, Practical Distributed
Multi-Path Routing Algorithm. Technical Report TR98-320, Department of Computer
Science, Rice University, July 1998.
1295: 
1296: \bibitem{chen99} 
1297: J.~Chen, P.~Druschel, and D.~Subramanian. A New Approach to Routing
1298: with Dynamic Metrics. In \emph{Proceedings of the IEEE INFOCOM Conference on
Computer Communications}, pages 661--670. IEEE Press, New York, March 1999.
1300: 
1301: \bibitem{dicaro98}
1302: G.~Di~Caro and M.~Dorigo. AntNet: Distributed Stigmergetic Control for
Communications Networks. \emph{Journal of Artificial Intelligence Research},
Vol.~9, pages 317--365, 1998.
1305: 
1306: \bibitem{dorigo99}
1307: M.~Dorigo, G.~Di~Caro, and L.~M.~Gambardella. Ant Algorithms for Discrete
Optimization. \emph{Artificial Life}, Vol.~5, No.~2, pages 137--172, 1999.
1309: 
1310: \bibitem{guestrin02}
1311: C.~Guestrin, M.~Lagoudakis, and R.~Parr. Coordinated Reinforcement
Learning. In \emph{Machine Learning: Proceedings of the Nineteenth International
Conference (ICML 2002)}, pages 227--234. University of New South
1314: Wales, Sydney, Australia, July 2002.
1315: 
1316: \bibitem{littman96}
1317: L.~P.~Kaelbling, M.~L.~Littman, and A.~W.~Moore. Reinforcement Learning:
A Survey. \emph{Journal of Artificial Intelligence Research}, Vol.~4,
pages 237--285, 1996.
1320: 
1321: \bibitem{brite}
A.~Medina, A.~Lakhina, I.~Matta, and J.~Byers. BRITE: Universal Topology
Generation from a User's Perspective. Technical Report BUCS-TR2001-003,
Boston University, 2001.
1325: 
1326: \bibitem{newman01}
1327: M.~E.~J. Newman, S.~H.~Strogatz, and D.~J.~Watts. Random graphs with
arbitrary degree distributions and their applications. \emph{Physical Review
E}, Vol.~64, 026118, pages 1--16, 2001.
1330: 
1331: \bibitem{shiraev03}
D.~Shiraev. Inverse Reinforcement Learning and Routing Metric Discovery.
M.S. Thesis, Department of Computer Science, Virginia Tech, August 2003.
1335: 
1336: \bibitem{subramanian97}
D.~Subramanian, P.~Druschel, and J.~Chen. Ants and Reinforcement
Learning: A Case Study in Routing in Dynamic Networks. In \emph{Proceedings
of the Fifteenth International Joint Conference on Artificial Intelligence
(IJCAI'97)}, pages 832--839. Morgan Kaufmann, San Francisco, CA, 1997.
1341: 
1342: \bibitem{sutton98}
R.~S.~Sutton and A.~G.~Barto. \emph{Reinforcement Learning: An Introduction}.
MIT Press, Cambridge, MA, 1998.
1345: 
1346: \bibitem{kumar04}
1347: M.~Thirunavukkarasu. Reinforcing Reachable Routes. M.S. Thesis,
1348: Department of Computer Science, Virginia Tech, May 2004.
1349: 
1350: \bibitem{srinidhi03}
S.~Varadarajan, N.~Ramakrishnan, and M.~Thirunavukkarasu. Reinforcing
Reachable Routes. \emph{Computer Networks}, Vol.~43, No.~3, pages 389--416,
October 2003.
1354: 
1355: \bibitem{srinidhi00}
1356: S.~Varadarajan. Ethereal: A Fault Tolerant Host-Transparent Mechanism
1357: for Bandwidth Guarantees over Switched Ethernet Networks. PhD thesis,
1358: Department of Computer Science, State University of New York, Stony
1359: Brook, 2000.
1360: 
1361: \bibitem{mpath}
S.~Vutukury and J.~J.~Garcia-Luna-Aceves. MPATH: A Loop-free Multipath Routing
Algorithm. \emph{Microprocessors and Microsystems} (Elsevier), Vol.~24,
pages 319--327, 2001.
1365: 
1366: \end{thebibliography}
1367: 
1368: