1: %% LyX 1.3 created this file. For more info, see http://www.lyx.org/.
2: %% Do not edit unless you really know what you are doing.
3: \documentclass{IEEEtran}
4: \usepackage[T1]{fontenc}
5: \usepackage{float}
6: \usepackage{graphicx}
7:
8: \makeatletter
9:
10: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
11: \newcommand{\noun}[1]{\textsc{#1}}
12: %% Bold symbol macro for standard LaTeX users
13: \providecommand{\boldsymbol}[1]{\mbox{\boldmath $#1$}}
14:
15:
16: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
17: \newcommand{\lyxaddress}[1]{
18: \par {\raggedright #1
19: \vspace{1.4em}
20: \noindent\par}
21: }
22:
23: \makeatother
24: \begin{document}
25:
26: \title{Analysis of Inter-Domain Traffic Correlations: Random Matrix Theory
27: Approach}
28:
29:
30: \author{Viktoria Rojkova, Mehmed Kantardzic}
31:
32: \maketitle
33:
34: \lyxaddress{Department of Computer Engineering and Computer Science, University
35: of Louisville, Louisville, KY 40292 email: \{vbrozh01, mmkant01\}@gwise.louisville.edu }
36:
37: \begin{abstract}
The traffic behavior of the University of Louisville network, with
its interconnected backbone routers and a number of Virtual Local Area
Network (VLAN) subnets, is investigated using the Random Matrix Theory
(RMT) approach. We employ a system of equal-interval time series
of traffic counts at all router-to-router and router-to-subnet connections
as a representation of the inter-VLAN traffic. The cross-correlation
matrix $C$ of the traffic rate changes between different traffic
time series is calculated and tested against the null hypothesis of random
interactions.
47:
48: The majority of the eigenvalues $\lambda_{i}$ of matrix $C$ fall
49: within the bounds predicted by the RMT for the eigenvalues of random
50: correlation matrices. The distribution of eigenvalues and eigenvectors
51: outside of the RMT bounds displays prominent and systematic deviations
52: from the RMT predictions. Moreover, these deviations are stable in
53: time.
54:
The method we use provides a unique opportunity to accomplish three
concurrent tasks of traffic analysis. It verifies the uncongested
state of the network by establishing the profile of random interactions.
It recognizes the system-specific large-scale interactions by establishing
the profile of time-stable nonrandom interactions. Finally, by
examining the eigenstatistics we are able to detect and locate
anomalies of network traffic interactions.
62: \end{abstract}
63:
64: \section*{Categories and Subject Descriptors}
65:
66: C.2.3 {[}\textbf{Computer-Communication Networks}{]}: Network Operations
67:
68:
69: \section*{General Terms}
70:
71: Measurement, Experimentation
72:
73: \begin{keywords}
74: Network-Wide Traffic Analysis, Random Matrix Theory, Large-Scale Correlations
75: \end{keywords}
76:
77: \section{introduction}
78:
The infrastructure, applications and protocols of the system of communicating
computers and networks are constantly evolving. The traffic, which
is the essence of the communication, is at present a voluminous body of data
generated on a minute-by-minute basis within a multi-layered structure
by different applications and according to different protocols. As
a consequence, there are two general approaches to the analysis of the
traffic and to the modeling of its healthy behavior. In the first approach,
the traffic analysis considers the protocols, applications, traffic
matrix and routing matrix estimates, independence of ingress and egress
points and much more. The second approach treats the infrastructure
between the points from which the traffic is obtained as a {}``black
box'' \cite{Lau,Allen}.
91:
Measuring interactions between logically and architecturally equivalent
substructures of the system is a natural extension of the {}``black
box'' approach. A certain amount of work in this direction has already
been done. Studies of statistical traffic flow properties revealed
the {}``congested'', {}``fluid'' and {}``transitional'' regimes
of the flow at a large scale \cite{Fukuda,Ohira}. The observed collective
behavior suggests the existence of large-scale network-wide correlations
between the network subparts. Indeed, the work of \cite{Barthelemy}
showed large-scale cross-correlations between different connections
of the Renater scientific network. Moreover, the analysis of correlations
across all simultaneous network-wide traffic has been used in the
detection of distributed network attacks \cite{LCD}.
104:
The distributions and stability of the established interaction statistics
represent characteristic features of the system and may be exploited
in creating a profile of healthy network traffic, which is an essential
part of network anomaly detection. As successfully demonstrated
in \cite{Crovella}, all tested traffic anomalies change the distribution
of the traffic features.
111:
Among the numerous types of traffic monitoring variables, time series
of traffic counts are free of application {}``semantics'' and thus
preferable for {}``black box'' analysis. To extract meaningful
information about the underlying interactions contained in time series,
the empirical correlation matrix is the usual tool at hand. In addition,
various classes of statistical tools, such as principal
component analysis, singular value decomposition, and factor analysis,
extract the meaningful part of the time series and in turn rely strongly
on the validity of the correlation matrix. Thus, it is important
to understand quantitatively the effect of noise, i.e. to separate
the noisy, random interactions from the meaningful ones. It is also
crucial to consider the finiteness of the time series in the
determination of the empirical correlation, since the finite length
of the time series available to estimate cross-correlations introduces
{}``measurement noise'' \cite{Guhr1}. Statistically, it is also
advisable to develop null-hypothesis tests in order to check the degree
of statistical validity of the results obtained against cases of purely
random interactions.
130:
The methodology of random matrix theory (RMT) was developed for studying
the complex energy levels of heavy nuclei and is described in detail
in \cite{Wigner1,Dyson1,Dyson2,Mehta,Brody,Guhr3}. For our
purposes this methodology amounts to a series of statistical tests
run on the eigenvalues and eigenvectors of a {}``system matrix'',
which in our case is the traffic time series cross-correlation matrix
$C$ (and is the Hamiltonian matrix in the case of nuclei and other RMT systems
\cite{Wigner1,Dyson1,Dyson2,Mehta,Brody,Guhr3}).
139:
In our study, we propose to investigate the network traffic as a complex
system with a certain degree of mutual interaction among its constituents,
i.e. single-link traffic time series, using the RMT approach. We concentrate
on the large-scale correlations between the time series generated
by Simple Network Management Protocol (SNMP) traffic counters at every
router-router and router-VLAN subnet connection of the University of Louisville
backbone router system.
147:
148: The contributions of this study are as follows:
149:
150: \begin{itemize}
\item We propose a methodology for analyzing network-wide traffic time
series interactions that is free of application constraints. In this particular
study, we know in advance that VLANs represent separate broadcast
domains, that VLAN-router incoming traffic is traffic intended for other
VLANs, and that VLAN-router outgoing traffic is routed traffic from other
VLANs. Nevertheless, this information is irrelevant for our analysis
and is used only in the interpretation of the analysis results.
\item Using the RMT, we are able to separate the random interactions from
system-specific interactions. The vast majority of traffic time series
interact in a random fashion. The time-stable random interactions signify
healthy, congestion-free traffic. The proposed analysis
of the eigenvector distribution allows us to verify the time series content
of uncongested traffic.
\item The time-stable nonrandom interactions provide us with information
about large-scale system-specific interactions.
\item Finally, the temporal changes in random and nonrandom interactions
can be detected and located with the eigenvalue and eigenvector statistics
of the interactions.
169: \end{itemize}
The organization of this paper is as follows. Section II presents
a survey of related work. We describe the RMT methodology in Section
III. Section IV describes the data analyzed. In
Section V we test the eigenvalue distribution of the inter-VLAN traffic
time series cross-correlation matrix $C$ against the RMT predictions.
In Section VI we analyze the content of inter-VLAN traffic interactions
by means of the eigenvalues and eigenvectors that deviate from RMT. Section
VII discusses the characteristic traffic interaction parameters of
the system, such as the time stability of the deviating eigenvalues and
eigenvectors, the inverse participation ratio (IPR) of the eigenvalue spectra,
localization points in the IPR plot, and overlap matrices of the deviating
eigenvectors. In Section VIII, with a series of different experiments, we demonstrate
how traffic interaction anomalies can be detected and located in
time and space using various visualization techniques on eigenvalue
and eigenvector statistics. We present our conclusions
and prospective research steps in Section IX.
186:
187:
188: \section{related work}
189:
Few works investigate the interactions of traffic time series regardless
of the underlying architecture of the traffic system. As stated
in the Introduction, the study of \cite{Barthelemy} showed large-scale
cross-correlations between different connections of the French scientific
network Renater, with 26 interconnected routers and 650 connection
links. Random interactions between traffic time series of a complex
traffic system without routing protocol information were established
by Krbalek and Seba in \cite{Seba} for the transportation system in Cuernavaca
(Mexico).
199:
The urgent need for a network-wide, scalable approach to the problem
of creating a healthy network traffic profile is expressed in
\cite{Crovella,Crovella2,Min,McNutt,Roughan,Huang}. Several
studies with promising results demonstrate that
anomalous traffic events cause temporal changes in the statistical
properties of traffic features. Lakhina, Crovella and Diot presented
a characterization of the network-wide anomalies of traffic
flows. The authors studied three different types of traffic flows
and fused the information from flow measurements taken throughout
the entire network. They obtained and classified a different set of
anomalies for different traffic types using the subspace method \cite{Crovella2}.
211:
The same group of researchers extended their work in \cite{Crovella}.
Under the new assumption that any network anomaly induces changes
in distributional aspects of packet header fields, they detected and
identified a large set of anomalies using the entropy measurement tool.
216:
A hidden Markov model has been proposed to model the distribution of
network-wide traffic in \cite{Min}. An observation window is used
to distinguish a denial of service (DoS) flooding attack mixed with
the normal background traffic.
221:
222: Roughan et al. combined the entire network routing and traffic data
223: to detect the IP forwarding anomalies \cite{Roughan}.
224:
Huang et al. \cite{Huang} used a distributed version of the Principal
Component Analysis (PCA) method for centralized network-wide volume
anomaly detection. A key ingredient of their framework is an analytical
method based on stochastic matrix perturbation theory that balances
the accuracy of the approximate network anomaly detection against
the amount of data communication over the network.

The authors of \cite{McNutt} found a high temporal correlation
(frequently $>0.99$) between flow counts on quiescent ports (TCP/IP
ports which are not in regular use) during the vertical scan, one of
the known pre-attack, so-called \emph{reconnaissance}, anomalous behaviors.
236:
237:
238: \section{rmt methodology}
239:
The RMT has been employed in financial studies of stock correlations
\cite{Sharifi,Guhr1}, the communication theory of wireless systems \cite{Tulino},
array signal processing \cite{Tse}, and bioinformatics studies of protein
folding \cite{Zee}. We are not aware of any work, except for \cite{Barthelemy},
in which RMT techniques were applied to an Internet traffic system.
245:
We adopt the methodology used in works on financial time series correlations
(see \cite{Sharifi,Guhr1} and references therein) and later in \cite{Barthelemy},
which discusses cross-correlations in Internet traffic. In particular,
we quantify correlations between $N$ traffic count time series of
$L$ time points by calculating the traffic rate change of every
time series $T_{i}$, $i=1,\dots,N$, over a time scale $\Delta t$,
\begin{equation}
G_{i}\left(t\right)\equiv\ln T_{i}\left(t+\Delta t\right)-\ln T_{i}\left(t\right),\label{eq1}
\end{equation}
where $T_{i}\left(t\right)$ denotes the traffic rate of time series
$i$. This measure is independent of the volume of the traffic exchange
and allows capturing subtle changes in the traffic rate \cite{Barthelemy}.
The normalized traffic rate change is
257:
258: \begin{equation}
259: g_{i}\left(t\right)\equiv\frac{G_{i}\left(t\right)-\left\langle G_{i}\left(t\right)\right\rangle }{\sigma_{i}}\label{eq2}\end{equation}
where $\sigma_{i}\equiv\sqrt{\left\langle G_{i}^{2}\right\rangle -\left\langle G_{i}\right\rangle ^{2}}$
is the standard deviation of $G_{i}$ and $\left\langle \cdots\right\rangle $
denotes a time average over the period studied. The equal-time cross-correlation
matrix $C$ is computed as
\begin{equation}
C_{ij}\equiv\left\langle g_{i}\left(t\right)g_{j}\left(t\right)\right\rangle .\label{eq3}
\end{equation}
The properties of the traffic interaction matrix $C$ have to be
compared with those of a random cross-correlation matrix \cite{Laloux}.
In matrix notation, the interaction matrix $C$ can be expressed as
\begin{equation}
C=\frac{1}{L}GG^{T},\label{eq4}
\end{equation}
where $G$ is an $N\times L$ matrix with elements $\left\{ g_{i\, m}\equiv g_{i}\left(m\Delta t\right);\; i=1,\dots,N;\; m=0,\dots,L-1\right\} $,
and $G^{T}$ denotes the transpose of $G$. As in \cite{Guhr1}, we consider
a random correlation matrix
\begin{equation}
R=\frac{1}{L}AA^{T},\label{eq5}
\end{equation}
where $A$ is an $N\times L$ matrix containing $N$ time series of $L$
random elements $a_{i\, m}$ with zero mean and unit variance, which
are mutually uncorrelated; this serves as the null hypothesis.
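
As a minimal illustration of Eqs. (\ref{eq2}) to (\ref{eq5}), the following
Python sketch normalizes a set of series and forms the matrix $\frac{1}{L}GG^{T}$.
It is not the code used in this study: the function names, the toy sizes
$N=4$, $L=500$ and the random seed are assumptions made for the example
only, and the random series are normalized with their sample mean and
variance so that the diagonal of $R$ equals one, as it does for $C$.

```python
import math
import random


def normalize(series):
    """Eq. (2): subtract the time average and divide by the standard deviation."""
    n = len(series)
    mean = sum(series) / n
    variance = sum(x * x for x in series) / n - mean * mean
    sigma = math.sqrt(variance)
    return [(x - mean) / sigma for x in series]


def cross_correlation(rows):
    """Eq. (4): C = (1/L) G G^T for N normalized series of equal length L."""
    length = len(rows[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / length for rj in rows]
            for ri in rows]


# Null hypothesis of Eq. (5): mutually uncorrelated Gaussian series.
random.seed(0)            # arbitrary seed, for reproducibility only
N, L = 4, 500             # toy sizes, not the N = 497, L = 2015 of the study
A = [normalize([random.gauss(0.0, 1.0) for _ in range(L)]) for _ in range(N)]
R = cross_correlation(A)  # diagonal is 1; off-diagonal entries are small noise
```

The same two functions applied to the rate changes $G_{i}\left(t\right)$
would give the empirical matrix $C$.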
276:
Statistical properties of the random matrices $R$ have long been known
in the physics literature \cite{Wigner1,Brody,Dyson1,Dyson2,Mehta,Guhr3}.
In particular, it was shown analytically \cite{Sengupta} that, in
the limit $N\rightarrow\infty$, $L\rightarrow\infty$ with
$Q\equiv L/N>1$ fixed, the probability
density function $P_{rm}\left(\lambda\right)$ of eigenvalues $\lambda$
of the random matrix $R$ is given by
284:
285: \begin{equation}
286: P_{rm}\left(\lambda\right)=\frac{Q}{2\pi}\frac{\sqrt{\left(\lambda_{+}-\lambda\right)\left(\lambda-\lambda_{-}\right)}}{\lambda}\label{eq6}\end{equation}
where $\lambda_{+}$ and $\lambda_{-}$ are the maximum and minimum eigenvalues
of $R$, respectively, and $\lambda_{-}\leq\lambda_{i}\leq\lambda_{+}$.
The bounds $\lambda_{+}$ and $\lambda_{-}$ are given analytically by
290:
291: \begin{equation}
292: \lambda_{\pm}=1+\frac{1}{Q}\pm2\sqrt{\frac{1}{Q}}.\label{eq7}\end{equation}
Random matrices display \emph{universal} functional forms for eigenvalue
correlations, which depend only on the general symmetries of the matrix.
The first step in testing the data for such universal properties
is to find a transformation called {}``unfolding'', which maps the
eigenvalues $\lambda_{i}$ to new variables, the {}``unfolded eigenvalues''
$\xi_{i}$, whose distribution is uniform \cite{Mehta,Brody,Guhr3}.
Unfolding ensures that the distances between eigenvalues are expressed
in units of the \emph{local} mean eigenvalue spacing \cite{Mehta}, and
thus facilitates the comparison with analytical results.
302:
303: We define the cumulative distribution function of eigenvalues, which
304: counts the number of eigenvalues in the interval $\lambda_{i}\leq\lambda,$
305:
306: \begin{equation}
307: F\left(\lambda\right)=N\int_{-\infty}^{\lambda}P\left(x\right)dx,\label{eq8}\end{equation}
where $P\left(x\right)$ denotes the probability density of eigenvalues
and $N$ is the total number of eigenvalues. The function $F\left(\lambda\right)$
can be decomposed into an average and a fluctuating part,
\begin{equation}
F\left(\lambda\right)=F_{av}\left(\lambda\right)+F_{fluc}\left(\lambda\right).\label{eq9}
\end{equation}
Since $P_{fluc}\equiv dF_{fluc}\left(\lambda\right)/d\lambda=0$ on
average,
\begin{equation}
P_{rm}\left(\lambda\right)\equiv\frac{dF_{av}\left(\lambda\right)}{d\lambda}\label{eq10}
\end{equation}
is the averaged eigenvalue density. The dimensionless, unfolded eigenvalues
are then given by
\begin{equation}
\xi_{i}\equiv F_{av}\left(\lambda_{i}\right).\label{eq11}
\end{equation}
318:
319:
Three known universal properties of matrices from the Gaussian orthogonal
ensemble (GOE), i.e. real symmetric matrices whose elements
are distributed according to a Gaussian probability measure, are:
(i) the distribution of nearest-neighbor eigenvalue spacings $P_{GOE}\left(s\right)$,
\begin{equation}
P_{GOE}\left(s\right)=\frac{\pi s}{2}\exp\left(-\frac{\pi}{4}s^{2}\right),\label{eq12}
\end{equation}
(ii) the distribution of next-nearest-neighbor eigenvalue spacings,
which, according to a theorem due to \cite{Dyson2}, is identical
to the distribution of nearest-neighbor spacings of the Gaussian symplectic
ensemble (GSE),
\begin{equation}
P_{GSE}\left(s\right)=\frac{2^{18}}{3^{6}\pi^{3}}s^{4}\exp\left(-\frac{64}{9\pi}s^{2}\right),\label{eq13}
\end{equation}
and finally (iii) the {}``number variance'' statistic $\Sigma^{2}$,
defined as the variance of the number of unfolded eigenvalues in
intervals of length $l$ around each $\xi_{i}$ \cite{Mehta,Guhr3,Brody},
\begin{equation}
\Sigma^{2}\left(l\right)=\left\langle \left[n\left(\xi,l\right)-l\right]^{2}\right\rangle _{\xi},\label{eq14}
\end{equation}
where $n\left(\xi,l\right)$ is the number of unfolded eigenvalues
in the interval $\left[\xi-\frac{l}{2},\xi+\frac{l}{2}\right]$. The
number variance is expressed as
\begin{equation}
\Sigma^{2}\left(l\right)=l-2\int_{0}^{l}\left(l-x\right)Y\left(x\right)dx,\label{eq15}
\end{equation}
where $Y\left(x\right)$ for the GOE case is given by \cite{Mehta}
\begin{equation}
Y\left(x\right)=s^{2}\left(x\right)+\frac{ds}{dx}\int_{x}^{\infty}s\left(x'\right)dx',\label{eq16}
\end{equation}
and
\begin{equation}
s\left(x\right)=\frac{\sin\left(\pi x\right)}{\pi x}.\label{eq17}
\end{equation}
As stressed in \cite{Guhr1,Sharifi,Stockman}, the overall
time of observation is crucial for explaining the empirical cross-correlation
coefficients. On the one hand, the longer we observe the traffic, the more
information about the correlations we obtain and the less {}``noise''
we introduce. On the other hand, the correlations are not stationary,
i.e. they can change with time. To differentiate the {}``random''
contribution to the empirical correlation coefficients from the {}``genuine''
contribution, the eigenvalue statistics of $C$ are contrasted with
the eigenvalue statistics of a correlation matrix taken from the
so-called {}``chiral'' Gaussian Orthogonal Ensemble \cite{Guhr1}.
Such an ensemble is one of the ensembles of RMT \cite{Stockman,Bouchaud},
briefly discussed in Appendix A. A \emph{random} cross-correlation
matrix, which is a matrix filled with uncorrelated Gaussian random
numbers, is supposed to represent transient network activity that is
uncorrelated in time, that is, a completely noisy environment. If the cross-correlation
matrix $C$ obeys the same eigenstatistical properties as the RMT matrix,
the network traffic is equilibrated and deemed universal in the sense
that every single connection interacts with the rest in a completely
chaotic manner. It also means a complete absence of congestion and
anomalies. Meanwhile, any time-stable deviations from the \emph{universal}
predictions of RMT signify system-specific, nonrandom properties of
the system, providing clues about the nature of the underlying
interactions. This allows us to establish the profile of system-specific
correlations.
372:
373:
374: \section{data}
375:
In this paper, we study the averaged traffic count data collected
from all router-router and router-VLAN subnet connections of the University
of Louisville backbone router system. The system consists of nine
interconnected multi-gigabit backbone routers, over $200$ Ethernet
segments and over $300$ VLAN subnets. We collected the traffic count
data for $3$ months, from September $21$, $2006$
to December $20$, $2006$, from $7$ routers, since two routers
are reserved for server farms. The overall data amounted to approximately
$18$ GB.
385:
The traffic count data is provided by the Multi Router Traffic Grapher
(MRTG) tool, which reads the SNMP traffic counters. An MRTG log file never
grows in size owing to its data consolidation algorithm: it contains
records of the average incoming, outgoing, maximum and minimum transfer rates in
bytes per second at time intervals of $300$ seconds, $30$
minutes, $1$ day and $1$ month. We extracted the $300$-second interval
data for seven days. Then, we separated the incoming and outgoing
traffic count time series and considered them as independent. For
$352$ connections we formed $N=704$ time series of $L=2015$ records each,
with a $300$-second interval.
396:
We are interested in the changes in the traffic rate; thus, we excluded from
consideration the connections where the channel is open but no traffic
is established, or where there is only constant-rate, equally low-volume
test traffic. Another reason for excluding the {}``empty'' traffic
time series is that they make the time series cross-correlation matrix
unnecessarily sparse. The exclusion does not influence the analysis
and results. After the exclusions the number of traffic time series
became $N=497$.
405:
To calculate the traffic rate change $G_{i}\left(t\right)$ we used
the logarithm of the ratio of two successive counts. As stated
earlier, the $\log$-transformation makes the ratio independent of the
traffic volume and allows capturing subtle changes in the traffic
rate. We added 1 byte to all data points to avoid evaluating
$\log\left(0\right)$ in cases where the traffic count is equal to zero
bytes. This measure did not affect the changes in the traffic rate.
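
The rate-change computation with the 1-byte offset can be sketched in
Python as follows. This is an illustration under stated assumptions, not
the processing script of the study: the function name \texttt{rate\_changes}
and the four sample values are hypothetical.

```python
import math


def rate_changes(counts, offset=1.0):
    """Eq. (1): G(t) = ln T(t + dt) - ln T(t) for one traffic series.

    `offset` (1 byte, as in the text) is added to every data point so
    that a zero count never leads to log(0).
    """
    logs = [math.log(c + offset) for c in counts]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical 300-second transfer rates for one connection (illustrative only).
counts = [1200.0, 0.0, 2400.0, 2400.0]
G = rate_changes(counts)   # one value fewer than the number of counts
```

A series of $n$ counts yields $n-1$ rate changes, and a constant rate
gives a rate change of exactly zero.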
413:
414:
415: \section{eigenvalue distribution of cross-correlation matrix, comparison with
416: rmt}
417:
We constructed the inter-VLAN traffic cross-correlation matrix $C$ with
number of time series $N=497$ and number of observations per series
$L=2015$ ($Q=4.0625$), so that $\lambda_{+}=2.23843$ and $\lambda_{-}=0.253876$.
Our first goal is to compare the eigenvalue distribution $P\left(\lambda\right)$
of $C$ with $P_{rm}\left(\lambda\right)$ \cite{Laloux}. To compute
the eigenvalues of $C$ we used a standard \emph{MATLAB} function. The empirical
probability distribution $P\left(\lambda\right)$ is then given by
the corresponding histogram. We display the resulting distribution
$P\left(\lambda\right)$ in Figure 1 and compare it to the probability
distribution $P_{rm}\left(\lambda\right)$ of Eq. (\ref{eq6})
calculated for the same value of the traffic time series parameter ($Q=4.0625$).
The solid curve shows $P_{rm}\left(\lambda\right)$ of Eq. (\ref{eq6}).
The inset to Figure 1 zooms in on the deviations from the RMT predictions;
the largest eigenvalue shown in the inset has the value $\lambda_{497}=8.99$.%
433: \begin{figure}[H]
434: \begin{center}\includegraphics{EigenvaluePDF.eps}\end{center}
435:
436:
437: \caption{\label{1} Empirical probability distribution function $P\left(\lambda\right)$
438: for the inter-VLAN traffic cross-correlations matrix $C$ (histogram). }
439: \end{figure}
We note the presence of {}``bulk'' (RMT-like) eigenvalues, which
fall within the bounds $\left[\lambda_{-},\lambda_{+}\right]$ of $P_{rm}\left(\lambda\right)$,
and the presence of eigenvalues which lie outside of the {}``bulk'',
representing deviations from the RMT predictions. In particular, the largest
eigenvalue $\lambda_{497}=8.99$ for the seven-day period is approximately
four times larger than the RMT upper bound $\lambda_{+}$.
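
The quoted bounds can be reproduced directly from Eq. (\ref{eq7}). The
short Python sketch below does so for the value $Q=4.0625$ given in the
text and also evaluates the density of Eq. (\ref{eq6}); the function names
are illustrative assumptions and the snippet is not part of the original
analysis.

```python
import math


def mp_bounds(Q):
    """Eq. (7): edges of the random-matrix eigenvalue density for Q = L/N > 1."""
    lam_minus = 1.0 + 1.0 / Q - 2.0 * math.sqrt(1.0 / Q)
    lam_plus = 1.0 + 1.0 / Q + 2.0 * math.sqrt(1.0 / Q)
    return lam_minus, lam_plus


def mp_density(lam, Q):
    """Eq. (6): P_rm(lambda), taken as zero outside [lambda_-, lambda_+]."""
    lam_minus, lam_plus = mp_bounds(Q)
    if lam <= lam_minus or lam >= lam_plus:
        return 0.0
    return (Q / (2.0 * math.pi)
            * math.sqrt((lam_plus - lam) * (lam - lam_minus)) / lam)


lam_minus, lam_plus = mp_bounds(4.0625)   # Q as quoted in the text
ratio = 8.99 / lam_plus                   # largest eigenvalue over the upper edge
```

With these inputs the bounds agree with the values stated above to the
quoted precision, and \texttt{ratio} is close to four.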
446:
The histogram for the well-defined bulk agrees with $P_{rm}\left(\lambda\right)$,
suggesting that the cross-correlations of matrix $C$ are mostly random,
i.e. that inter-VLAN traffic time series interact mostly in
a random fashion.

Nevertheless, the agreement of the empirical probability distribution
$P\left(\lambda\right)$ of the bulk with $P_{rm}\left(\lambda\right)$
is not sufficient to claim that the bulk of the eigenvalue spectrum is
random. Therefore, further RMT tests are needed \cite{Guhr1}.
456:
To this end, we obtained the unfolded eigenvalues $\xi_{i}$ by following
the phenomenological procedure referred to as Gaussian broadening
\cite{Bruus} (see also \cite{Bruus2,Guhr1,Sharifi}). The empirical
cumulative distribution function of eigenvalues $F\left(\lambda\right)$
agrees well with $F_{av}\left(\lambda\right)$ (see Figure 2),
where the $\xi_{i}$ were obtained with the Gaussian broadening procedure with
the broadening parameter $a=8$.%
464: \begin{figure}[h]
465: \begin{center}\includegraphics{EmpiricalCDFvsTheoretical.eps}\end{center}
466:
467:
468:
469:
\caption{\label{2}The empirical cumulative distribution of $\lambda_{i}$
and unfolded eigenvalues $\xi_{i}\equiv F_{av}\left(\lambda_{i}\right)$. }
472: \end{figure}
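
The Gaussian broadening procedure is not spelled out here, so the following
Python sketch rests on an assumption: it uses the form commonly given in
the literature on this procedure, in which each eigenvalue $\lambda_{k}$
is replaced by a Gaussian of width $\sigma_{k}=\left(\lambda_{k+a}-\lambda_{k-a}\right)/2$
and $F_{av}$ is the sum of the resulting Gaussian cumulative distributions.
The clipping of indices at the spectrum edges, the function name and the
evenly spaced test spectrum are further assumptions of the example.

```python
import math


def unfold_gaussian(eigs, a):
    """Unfolded eigenvalues xi_i = F_av(lambda_i) by Gaussian broadening.

    Assumed form: sigma_k = (lambda_{k+a} - lambda_{k-a}) / 2, with the
    indices clipped at both ends of the spectrum. Requires a >= 1 and
    distinct eigenvalues so that every sigma_k is positive.
    """
    lam = sorted(eigs)
    n = len(lam)
    sigma = []
    for k in range(n):
        low, high = max(k - a, 0), min(k + a, n - 1)
        sigma.append((lam[high] - lam[low]) / 2.0)

    def f_av(x):
        # Sum of Gaussian cumulative distributions centered at each eigenvalue.
        return sum(0.5 * (1.0 + math.erf((x - lk) / (sk * math.sqrt(2.0))))
                   for lk, sk in zip(lam, sigma))

    return [f_av(x) for x in lam]


# Hypothetical evenly spaced spectrum; a real run would pass the eigenvalues of C.
eigs = [0.25 + 0.02 * k for k in range(100)]
xi = unfold_gaussian(eigs, a=8)
```

Away from the edges of this test spectrum the unfolded spacings are close
to one, which is the property that unfolding is meant to provide.
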
The first independent RMT test is the comparison of the distribution
of the nearest-neighbor unfolded eigenvalue spacings $P_{nn}\left(s\right)$,
where $s\equiv\xi_{k+1}-\xi_{k}$, with $P_{GOE}\left(s\right)$ \cite{Mehta,Brody,Guhr3}.
The empirical probability distribution of nearest-neighbor unfolded
eigenvalue spacings $P_{nn}\left(s\right)$ and $P_{GOE}\left(s\right)$
are presented in Figure 3. The Gaussian decay of $P_{GOE}\left(s\right)$
for large $s$ suggests that $P_{GOE}\left(s\right)$ {}``probes''
scales only of the order of one eigenvalue spacing. The solid line
represents $P_{GOE}\left(s\right)$.%
482: \begin{figure}[h]
483: \begin{center}\includegraphics{EmpiricalNNSvsTheoretical.eps}\end{center}
484:
485:
486: \caption{\label{3} Nearest-neighbor spacing distribution $P_{nn}\left(s\right)$
487: of unfolded eigenvalues $\xi_{i}$ of cross-correlation matrix $C$. }
488: \end{figure}
The agreement between the empirical probability distribution $P_{nn}\left(s\right)$
and the distribution of nearest-neighbor eigenvalue spacings of
GOE matrices $P_{GOE}\left(s\right)$ indicates that the positions
of two adjacent empirical unfolded eigenvalues at a distance $s$
are correlated just as the eigenvalues of GOE matrices are.
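
A comparison of this kind can be sketched in Python as below. The code
evaluates the reference densities of Eqs. (\ref{eq12}) and (\ref{eq13}),
extracts nearest-neighbor spacings from a list of unfolded eigenvalues and
bins them into an empirical density. All function names, the bin layout
and the midpoint-rule check are assumptions made for illustration; this
is not the procedure used to produce Figure 3.

```python
import math


def p_goe(s):
    """Eq. (12): nearest-neighbor spacing density of GOE matrices."""
    return 0.5 * math.pi * s * math.exp(-0.25 * math.pi * s * s)


def p_gse(s):
    """Eq. (13): nearest-neighbor spacing density of the GSE."""
    prefactor = 2.0 ** 18 / (3.0 ** 6 * math.pi ** 3)
    return prefactor * s ** 4 * math.exp(-64.0 / (9.0 * math.pi) * s * s)


def nearest_neighbor_spacings(xi):
    """Spacings s_k = xi_{k+1} - xi_k of the sorted unfolded eigenvalues."""
    ordered = sorted(xi)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def spacing_histogram(spacings, bin_width, n_bins):
    """Empirical density P_nn(s) on n_bins bins of width bin_width."""
    counts = [0] * n_bins
    for s in spacings:
        k = int(s / bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [c / (len(spacings) * bin_width) for c in counts]


def integrate(f, lower, upper, steps=20000):
    """Midpoint rule, used here only to sanity-check the two densities."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (k + 0.5) * h) for k in range(steps))


norm_goe = integrate(p_goe, 0.0, 10.0)
mean_goe = integrate(lambda s: s * p_goe(s), 0.0, 10.0)
norm_gse = integrate(p_gse, 0.0, 10.0)
mean_gse = integrate(lambda s: s * p_gse(s), 0.0, 10.0)
```

Both reference densities integrate to one and have unit mean spacing,
which is why the empirical spacings must be taken from unfolded
eigenvalues before the comparison is made.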
494:
Next, we examined the distribution $P_{nnn}\left(s'\right)$ of next-nearest-neighbor
spacings $s'\equiv\xi_{k+2}-\xi_{k}$ between the unfolded eigenvalues.
According to \cite{Dyson2}, this distribution should fit the distribution
of nearest-neighbor spacings of the GSE. We demonstrate this correspondence
in Figure 4. The solid line shows $P_{GSE}\left(s\right)$.%
500: \begin{figure}[h]
501: \begin{center}\includegraphics{EmpiricalNNNSvsTheoretical.eps}\end{center}
502:
503:
504: \caption{\label{4} Next-nearest-neighbor eigenvalue spacing distribution
505: $P_{nnn}\left(s'\right).$ }
506: \end{figure}
Finally, the long-range two-point eigenvalue correlations were tested.
It is known \cite{Mehta,Brody,Guhr3} that if the eigenvalues are uncorrelated,
the number variance scales with $l$, $\Sigma^{2}\sim l$.
Meanwhile, when the unfolded eigenvalues of $C$ are correlated, $\Sigma^{2}$
approaches a constant value, revealing {}``spectral rigidity'' \cite{Mehta,Brody,Guhr3}.
In Figure 5, we contrast the Poissonian number variance with the one
we observed, and conclude that the eigenvalues belonging
to the {}``bulk'' clearly exhibit universal RMT properties. The
broadening parameter $a=8$ was used in the Gaussian broadening procedure
to unfold the eigenvalues $\lambda_{i}$ \cite{Bruus,Bruus2,Guhr1,Sharifi}.
The dashed line corresponds to the case of uncorrelated eigenvalues.%
518: \begin{figure}[h]
519: \begin{center}\includegraphics{NumberVariance.eps}\end{center}
520:
521:
522: \caption{\label{5} Number variance $\Sigma^{2}\left(l\right)$ calculated
523: from the unfolded eigenvalues $\xi_{i}$ of $C$. }
524: \end{figure}
These findings show that the system of inter-VLAN traffic has a \emph{universal}
part of eigenvalue spectral correlations, shared by a broad class of
systems, including chaotic and disordered systems, nuclei, atoms and
molecules. Thus it can be concluded that the bulk eigenvalue statistics
of the inter-VLAN traffic cross-correlation matrix $C$ are consistent
with those of a real symmetric random matrix $R$, given by Eq. (\ref{eq5})
\cite{Sengupta}. Meanwhile, the deviations from the RMT contain
information about the system-specific correlations. The next section
is entirely devoted to the analysis of the eigenvalues and eigenvectors
deviating from the RMT, which signify the meaningful inter-VLAN
traffic interactions.
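
The number variance of Eq. (\ref{eq14}) used in the last test can be
estimated with a few lines of Python. The sketch below is illustrative
only: it places window centers evenly across the unfolded spectrum, which
is an assumption of the example, and the two synthetic spectra, the seed
and the function name are hypothetical. It contrasts a perfectly rigid
spectrum with an uncorrelated one.

```python
import bisect
import random


def number_variance(xi, l, n_windows=200):
    """Eq. (14): mean of [n(xi, l) - l]^2 over windows of length l.

    Window centers are spread evenly over the unfolded spectrum, which is
    an assumption of this sketch; windows are half-open on the right.
    """
    ordered = sorted(xi)
    first = ordered[0] + l / 2.0
    last = ordered[-1] - l / 2.0
    total = 0.0
    for k in range(n_windows):
        center = first + (last - first) * k / (n_windows - 1)
        left = bisect.bisect_left(ordered, center - l / 2.0)
        right = bisect.bisect_left(ordered, center + l / 2.0)
        total += ((right - left) - l) ** 2
    return total / n_windows


# Two synthetic reference spectra with unit mean spacing (illustrative only).
rigid = [k + 0.5 for k in range(2000)]                        # perfectly rigid
random.seed(1)                                                # arbitrary seed
poisson = [random.uniform(0.0, 2000.0) for _ in range(2000)]  # uncorrelated

sigma2_rigid = number_variance(rigid, 2.5)      # stays far below l
sigma2_poisson = number_variance(poisson, 5.0)  # of the order of l
```

For the rigid spectrum the estimate stays well below $l$, while for the
uncorrelated spectrum it is of the order of $l$, in line with the
$\Sigma^{2}\sim l$ behavior quoted above for uncorrelated eigenvalues.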
536:
537:
538: \section{inter-vlan traffic interactions analysis}
539:
We review the points of interest in the eigenvectors of the inter-VLAN traffic
cross-correlation matrix $C$, which are determined according to $Cu^{k}=\lambda_{k}u^{k}$,
where $\lambda_{k}$ is the $k$-th eigenvalue. A particularly important
characteristic of eigenvectors, proven to be useful in the physics of
disordered conductors, is the inverse participation ratio (IPR) (see,
for example, Ref. \cite{Guhr3}). In such systems, the IPR, being a
function of an eigenstate (eigenvector), allows one to judge
whether the corresponding eigenstate, and therefore the electron, is extended
or localized.
549:
550:
551: \subsection{Inverse participation ratio of eigenvectors components}
552:
For our purposes, it is sufficient to know that the IPR quantifies the
reciprocal of the number of significant components of an eigenvector.
For the eigenvector $u^{k}$ it is defined as\begin{equation}
I^{k}\equiv\sum_{l=1}^{N}\left[u_{l}^{k}\right]^{4},\label{eq18}\end{equation}
where $u_{l}^{k}$, $l=1,\dots,N$, with $N=497$, are the components of the eigenvector
$u^{k}$. In particular, a vector with a single nonzero component
has $I^{k}=1$, while a vector with identical components $u_{l}^{k}=1/\sqrt{N}$
has $I^{k}=1/N$.%
561: \begin{figure}[h]
562: \begin{center}\includegraphics{IPR1.eps}\end{center}
563:
564:
565: \caption{\label{6} Inverse participation ratio as a function of eigenvalue
566: $\lambda$.}
567: \end{figure}
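As a worked illustration of Eq. (\ref{eq18}), consider a hypothetical unit-length
vector with exactly $m$ nonzero components of equal magnitude (the
symbol $m$ is introduced here only for this example). Then
\[
\left|u_{l}^{k}\right|=\frac{1}{\sqrt{m}}\quad\left(m\textrm{ components}\right),\qquad I^{k}=m\left(\frac{1}{\sqrt{m}}\right)^{4}=\frac{1}{m},\qquad\frac{1}{I^{k}}=m,
\]
so that $m=1$ and $m=N$ reproduce the two limiting cases above. For
such an idealized vector, $I^{k}=0.05$ corresponds to $m=20$ and
$I^{k}\approx0.045$ to $m\approx22$; for real eigenvectors with unequal
components, $1/I^{k}$ should be read only as an effective count.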
Consequently, the inverse of the IPR gives the number of significant
participants of the eigenvector. In Figure 6 we plot the IPR of the eigenvectors of the cross-correlation
matrix $C$ as a function of the eigenvalue $\lambda$. The control plot
is the IPR of the eigenvectors of the random cross-correlation matrix $R$ of
Eq. \ref{eq5}. As we can see, the eigenvectors corresponding to eigenvalues
from $0.25$ to $3.5$, which lie within the RMT boundaries, have an IPR
close to $0$. This means that almost all components of the eigenvectors
in the bulk interact in a random fashion. The number of significant
components of the eigenvectors deviating from the RMT is around twenty, which is typically twenty
times smaller than that of the eigenvectors within the RMT boundaries.
For instance, the IPR of the eigenvector $u^{492}$, which
corresponds to the eigenvalue $5.9$ in Figure 6, is $0.05$, i.e.
twenty time series contribute significantly to $u^{492}$. Another
observation from Figure 6 is that the number of significant
participants of the eigenvectors is considerably smaller at both edges of
the eigenvalue spectrum. These findings resemble the results of \cite{Guhr1},
where the eigenvectors with few participating components were referred
to as \emph{localized} vectors. The theory of \emph{localization}
is explained in the context of random band matrices, whose elements
are independently drawn from different probability distributions \cite{Guhr1}.
Despite their randomness, these matrices still contain probabilistic
information. The \emph{localization} in inter-VLAN traffic can be explained
as follows. The separated broadcast domains, i.e. VLANs, forward traffic
from one to another only through the router, which reduces the routing
for broadcast containment. Although the optimal VLAN deployment
keeps as much traffic as possible from traversing the router,
a bottleneck at a large number of VLANs is unavoidable.
595:
596:
\subsection{Distribution of eigenvector components}
598:
Another quantity of interest is the distribution of the components $\left\{ u_{l}^{k};\, l=1,\dots,N\right\} $
of the eigenvector $u^{k}$ of the interaction matrix $C$. To calculate
the vectors $u^{k}$ we again used the \emph{MATLAB} routine and obtained
the distribution $p\left(u\right)$ of the eigenvector components.
We then contrasted it with the RMT prediction for the eigenvector component
distribution $p_{rm}\left(u\right)$ of the random correlation matrix
$R$. According to \cite{Guhr3}, $p_{rm}\left(u\right)$ is a Gaussian
distribution with zero mean and unit variance, i.e.\begin{equation}
p_{rm}\left(u\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2}\right).\label{eq19}\end{equation}
The weights of randomly interacting traffic count time series, which
are represented by the eigenvector components, therefore have to be distributed
normally. The results are presented in Figure 7. One can see (from
Figures 7a and 7b) that $p\left(u\right)$ for two $u^{k}$ taken
from the bulk is in accord with $p_{rm}\left(u\right)$. The distribution
$p\left(u\right)$ corresponding to an eigenvalue $\lambda_{i}$
that exceeds the RMT upper bound ($\lambda_{i}>\lambda_{+}$) is
shown in Figure 7c for $u^{496}$, while Figure 7d shows $p\left(u\right)$
for $u^{497}$, which corresponds to the largest eigenvalue. The solid
line shows $p_{rm}\left(u\right)$ from Eq. \ref{eq19}.%
619: \begin{figure}[H]
620: \begin{center}\includegraphics{EigenvectorsDistribution.eps}\end{center}
621:
622:
\caption{\label{7} Distribution of components $p\left(u\right)$ of eigenvectors
corresponding to eigenvalues (a) from the middle of the bulk, i.e. $\lambda_{-}<\lambda<\lambda_{+}$,
(b) from the bulk close to $\lambda_{+}$, (c) $\lambda_{496}$, and (d)
$\lambda_{497}$.}
627: \end{figure}
628:
629:
630:
\subsection{Deviating eigenvalues and significant inter-VLAN traffic series contributing
to the deviating eigenvectors}
633:
The distribution of the components of $u^{497}$, the eigenvector corresponding to the
largest eigenvalue $\lambda_{497}$, deviates significantly from the
Gaussian (as follows from Figure 7d). While the Gaussian kurtosis has
the value 3, the kurtosis of $p\left(u^{497}\right)$ comes out to
$23.22$. The small number of significant components of the eigenvector
also contributes to the difference between the Gaussian distribution and the empirical
distribution of the eigenvector components. More than half of the $u^{497}$ components
have the same sign, thus slightly shifting $p\left(u\right)$
to one side. This result suggests the existence of common VLAN
traffic intended for inter-VLAN communication that affects all of
the significant participants of the eigenvector $u^{497}$ with the
same bias. The number of significant components of $u^{497}$
is twenty-two, since the IPR of $u^{497}$ is $0.045$. Hence, the content of the eigenvector
of the largest eigenvalue reveals 22 traffic time series that are affected
by the same event. We obtain the time series that affects these 22 traffic
time series by the following procedure. First, we calculate the
projection $G^{497}\left(t\right)$ of the time series $G_{i}\left(t\right)$
on the eigenvector $u^{497}$,\begin{equation}
G^{497}\left(t\right)\equiv\sum_{i=1}^{497}u_{i}^{497}G_{i}\left(t\right).\label{eq20}\end{equation}
Next, we compare $G^{497}\left(t\right)$ with each $G_{i}\left(t\right)$
by finding the correlation coefficient $\left\langle \frac{G^{497}\left(t\right)}{\sigma^{497}}\frac{G_{i}\left(t\right)}{\sigma_{i}}\right\rangle $.
The Fiber Distributed Data Interface (FDDI)-VLAN internet switch at
one of the routers demonstrates the largest correlation coefficient
of $0.89$ (see Figure 8).%
658: \begin{figure}[h]
659: \begin{center}\includegraphics{Gprojection_1.eps}\end{center}
660:
661:
662: \caption{\label{8} (a) FDDI-VLAN internet switch time series regressed against
663: the projection $G^{497}\left(t\right)$ from Eq. \ref{eq20}. (b)
664: Time series defined by the eigenvector corresponding to eigenvalue
665: within RMT bounds shows no linear dependence on $G^{497}\left(t\right).$}
666: \end{figure}
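The correlation coefficient used above can be written in closed form under
assumptions that are not stated explicitly here: that the $G_{i}\left(t\right)$
have zero mean and unit variance ($\sigma_{i}=1$), that $C_{ij}=\left\langle G_{i}\left(t\right)G_{j}\left(t\right)\right\rangle $,
and that $u^{497}$ has unit length. A short sketch then gives
\[
\left\langle G^{497}G_{i}\right\rangle =\sum_{j=1}^{497}u_{j}^{497}C_{ji}=\lambda_{497}u_{i}^{497},\qquad\left(\sigma^{497}\right)^{2}=\sum_{i,j=1}^{497}u_{i}^{497}C_{ij}u_{j}^{497}=\lambda_{497},
\]
\[
\left\langle \frac{G^{497}\left(t\right)}{\sigma^{497}}\frac{G_{i}\left(t\right)}{\sigma_{i}}\right\rangle =\sqrt{\lambda_{497}}\, u_{i}^{497}.
\]
Under these assumptions, the time series with the largest correlation
coefficient is simply the one with the largest component $u_{i}^{497}$.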
The eigenvector $u^{497}$ has the following content. The seven most significant
participants are seven FDDI-VLAN switches at the seven routers. The
presence of the FDDI-VLAN switches provides us with information about the VLAN
membership definition. FDDI is a layer 2 protocol, which means that
at least one of the two layer 2 membership types is used: port group and/or
MAC address membership. The next group of significant participants
comprises VLAN traffic intended for routing and already routed
traffic from different VLANs. The final group of significant participants
consists of open switches, which pick up any {}``leaking'' traffic
on the router. Usually, the {}``leaking'' traffic is network
management traffic, very low-level traffic that spikes when queried
by the management systems.
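
As a cross-check between the kurtosis and the IPR of $u^{497}$ quoted
earlier in this subsection, note that for a unit-length eigenvector
whose components have negligible mean (an assumption that holds only
approximately here, since more than half of the components share a
sign) the kurtosis $\kappa$ of the components is
\[
\kappa\approx\frac{\frac{1}{N}\sum_{l=1}^{N}\left[u_{l}^{k}\right]^{4}}{\left(\frac{1}{N}\sum_{l=1}^{N}\left[u_{l}^{k}\right]^{2}\right)^{2}}=N\, I^{k}.
\]
With $N=497$ and $I^{497}=0.045$ this gives $\kappa\approx22.4$,
close to the value $23.22$ reported above; exact agreement is not
expected because of the nonzero mean and the rounding of $I^{497}$.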
679:
If every deviating eigenvalue signals a particular sub-model of non-random
interactions in the network, then every corresponding eigenvector
gives the number of significant dimensions of that sub-model. Thus,
we can think of every deviating eigenvector as a representative network-wide
{}``snapshot'' of interactions within certain dimensions.
685:
The analysis of the significant participants of the deviating eigenvectors
revealed three types of inter-VLAN traffic time series groupings.
One group contains time series that are interlinked on the router.
We recognize them as router1-VLAN\_1000 traffic, router1-firewall
traffic and VLAN\_1000-router1 traffic. The time series
listed as router1-VLAN\_2000, router2-VLAN\_2000, router3-VLAN\_2000,
etc., are reserved for the same service VLAN on every router and form
another group. The content of these groups suggests that the VLAN implementation
is a mixture of the infrastructural approach, where functional groups
(departments, schools, etc.) are considered, and the service approach,
where a VLAN provides a particular service (network management, firewall,
etc.).
698:
699:
\section{Stability of Inter-VLAN Traffic Interactions in Time}
701:
We expect to observe the stability of inter-VLAN traffic interactions
over the period of time used to compute the traffic cross-correlation matrix
$C$. The eigenvalue distribution at different time periods provides
information about the system stabilization, i.e. about the time
after which the fluctuations of the eigenvalues are no longer significant. Time
periods of $1$ hour, $3$ hours and $6$ hours are not sufficient
to characterize the system, as demonstrated in Figure
9a. Figure 9b shows that the system stabilizes after a $1$ day period. To observe
the time stability of the meaningful inter-VLAN interactions we computed
the {}``overlap matrix'' of the deviating eigenvectors for the time
period $t$ and the deviating eigenvectors for the time period $t+\tau$,
where $t=60h,\tau=\left\{ 0h,3h,12h,24h,36h,48h\right\} $.%
714: \begin{figure}[h]
715: \begin{center}\includegraphics{EigenvaluesvsTime.eps}\end{center}
716:
717:
\caption{\label{9} (a) Eigenvalue distributions of the traffic stream correlation
matrix $C$ for $1$ hour, $3$ hours and $6$ hours time intervals.
(b) Eigenvalue distributions for $24$ hours, $48$ hours and $72$
hours.}
722: \end{figure}
723:
724:
First, we obtained the matrix $D$ from the $p=57$ eigenvectors that correspond
to the $p$ eigenvalues above the RMT upper bound $\lambda_{+}$.
Then we computed the {}``overlap matrix'' $O\left(t,\tau\right)$
as $D_{A}D_{B}^{T}$, where $O_{ij}$ is the scalar product of the
eigenvector $u^{i}$ of period $A$ (starting at time $t$) with
$u^{j}$ of period $B$ (starting at time $t+\tau$),

\begin{equation}
O_{ij}\left(t,\tau\right)\equiv\sum_{k=1}^{N}D_{ik}\left(t\right)D_{jk}\left(t+\tau\right).\label{eq21}\end{equation}
The diagonal elements $O_{ii}\left(t,\tau\right)$ of the matrix $O$
are equal to $1$ if the matrix $D\left(t+\tau\right)$
is identical to the matrix $D\left(t\right)$. Hence the diagonal
of the {}``overlap matrix'' $O$ can serve as an indicator of the time
stability of the $p$ eigenvectors above the RMT upper bound $\lambda_{+}$.
The grayscale colormap of the {}``overlap matrices'' $O\left(t=60h,\tau=\left\{ 0h,3h,12h,24h,36h,48h\right\} \right)$
is presented in Figure 10. Black represents $O_{ij}=1$ and
white represents $O_{ij}=0$. The most stable eigenvector is the one
corresponding to $\lambda_{492}$.%
743: \begin{figure}[h]
744: \begin{center}\includegraphics{EigenvectorsStability.eps}\end{center}
745:
746:
\caption{\label{10} Grayscale map of the overlap matrix $O\left(t,\tau\right)$
at $t=60h$ and $\tau=\left\{ 0h,3h,12h,24h,36h,48h\right\} $. }
749: \end{figure}
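As a consistency check, read $O_{ij}$ as the scalar product of $u^{i}$
at time $t$ with $u^{j}$ at time $t+\tau$, as defined above, and
assume that the rows of $D$ are orthonormal (as holds for unit-length
eigenvectors of a real symmetric matrix with distinct eigenvalues).
At zero lag the overlap matrix then reduces to the identity,
\[
O_{ij}\left(t,0\right)=\sum_{k=1}^{N}D_{ik}\left(t\right)D_{jk}\left(t\right)=u^{i}\cdot u^{j}=\delta_{ij},
\]
while for $\tau>0$ the Cauchy-Schwarz inequality gives $\left|O_{ij}\left(t,\tau\right)\right|\leq1$.
Since an eigenvector is defined only up to an overall sign, it may
be safer in practice to inspect $\left|O_{ij}\right|$; the sign convention
behind Figure 10 is not specified here, so this is only a suggestion.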
Among the nonzero lags, $\tau=3$ hours shows the highest
degree of stability of the inter-VLAN interactions. For larger lags the overall stability decays.
As the analysis of the content of the deviating eigenvectors showed, the highly
interacting traffic time series are time series of service-based VLANs
intended for routing. Particular network services are invoked at the
same time and are active for the same period of time, which explains the
stability and subsequent decay of the deviating eigenvectors of traffic
interactions.
758:
759:
\section{Detecting Anomalies of Traffic Interactions}
761:
We assume that the health of inter-VLAN traffic is expressed by the stability
of its interactions in time. Temporary critical events
or anomalies will then cause temporary instabilities. The {}``deviating''
eigenvalues and eigenvectors provide us with time-stable snapshots
of interactions representative of the entire network. Therefore, these
eigenvectors, judged on the basis of their IPR, can serve as monitoring
parameters of the system stability.
769:
Among the essential anomalous events of a VLAN infrastructure we can
list violations in VLAN membership assignment, in the address resolution
protocol and in the VLAN trunking protocol, as well as router misconfiguration. The
violation of membership assignment and router misconfiguration will
change the picture of random and non-random interactions
of inter-VLAN traffic. To shed more light on the possibilities of
anomaly detection we conducted experiments to establish the spatial-temporal
traces of instabilities caused by an artificial and temporary increase
of the correlation in normal non-congested inter-VLAN traffic. We
explored the possibility of distinguishing different types of temporarily
increased correlations. Finally, we observed the consequences of breaking
the interactions between time series by injecting traffic counts
drawn from a random distribution.
783:
784: \textbf{\emph{Experiment 1}}
785:
We selected the traffic count time series representing the components
of an eigenvector that lies within the RMT bounds and temporarily
increased the correlation between these series for a three-hour period.
The proposed monitoring parameters show the dependence of the system stability
on the number of temporarily correlated time series (see Figure 11).
Presented in Figure 11, left to right, are (a) the eigenvalue distribution
of interactions with two temporarily correlated time series, (b) the IPR
of the eigenvectors of interactions with two temporarily correlated time
series, and (c) the overlap matrix of deviating eigenvectors with two
temporarily correlated time series. From top to bottom, the layout shows
these monitoring parameters when the correlation is temporarily increased
between 10 connections (d, e and f) and between 20 connections (g, h
and i). %
799: \begin{figure}[h]
800: \begin{center}\includegraphics{Experiment1.eps}\end{center}
801:
802:
\caption{\label{11} Eigenvalue distribution, IPR and overlap matrix of deviating
eigenvectors. }
805: \end{figure}
One can conclude that an increased temporary correlation between two time
series or between ten time series does not affect the system stability.
However, when the number of temporarily correlated time series approaches
the number of significant participants of $u^{497}$, which is calculated
as the inverse of $I^{497}$ and is equal to twenty-two, the system becomes
visibly unstable. At twenty temporarily correlated time series, the largest
eigenvalue changes from $10$ in the stable
condition to $12$, the tail of the inverse participation ratio plot is
extended and the diagonal of the {}``overlap matrix'' disappears.
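A rough way to see why the number of correlated series matters is
the following idealized sketch, which is an assumption made for illustration
and not the procedure used in the experiment. Suppose $m$ unit-variance
series share a common pairwise correlation $\rho$ (both symbols are
hypothetical) and are uncorrelated with everything else. Their $m\times m$
block of the correlation matrix is
\[
B=\left(1-\rho\right)\mathbf{1}_{m}+\rho\, ee^{T},\qquad e=\left(1,\dots,1\right)^{T},
\]
where $\mathbf{1}_{m}$ is the $m\times m$ identity matrix, with eigenvalues
\[
\lambda_{\max}=1+\left(m-1\right)\rho\quad\textrm{(once)},\qquad\lambda=1-\rho\quad\left(m-1\textrm{ times}\right).
\]
Since $\rho\leq1$, such a block contributes an eigenvalue of at most
$m$: at most $2$ for $m=2$ and at most $10$ for $m=10$, which does
not exceed the largest eigenvalue of about $10$ quoted above, whereas
for $m=20$ it can reach $20$.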
815:
In Figure 12 (a, b, c and d), the temporary correlation between ten
time series is traced with the matrix of the eigenvectors deviating from the RMT,
with their components sorted in decreasing order.%
819: \begin{figure}[h]
820: \begin{center}\includegraphics{Figure12.eps}\end{center}
821:
822:
823: \caption{\label{12} Sorted deviating eigenvectors with injected correlation
824: among ten traffic time series.}
825: \end{figure}
The deviating eigenvectors for $60h$ of
uninterrupted traffic, sorted in decreasing order, are presented in Figure 12a. After three further
hours of uninterrupted traffic, the weights of the eigenvector components
that had zero value start changing. This is captured in Figure 12b.
The same process for traffic with an induced three-hour correlation is captured
in Figure 12c. The difference between the results in Figures 12b and 12c
is presented in Figure 12d. The procedure used for this visualization produces
a high rate of false positive alarms.
834:
In addition, we visualize in Figure 13 the system instability during
a temporary increase of correlation between twenty time series with a spatial-temporal
representation of eigenvector $u^{497}$. %
838: \begin{figure}[h]
839: \begin{center}\includegraphics{Figure5_1.eps}\end{center}
840:
841:
\caption{\label{13} (a) The weights of the components of $u^{497}$ plotted for
the time period from $36$ to $84$ hours of uninterrupted traffic with a
6-hour interval. (b) The weights of the components of $u^{497}$ plotted
for the same time period, with an induced three-hour correlation.
(c) The weights of the components of $u^{496}$ plotted for
the same time period, with an induced three-hour correlation.}
848: \end{figure}
We used the weights of the components of eigenvector $u^{497}$, defined
for the IPR computation, and plotted them with respect to time $t+\tau$,
where $t=36$ hours and $\tau=6n,$ where $n\in\left\{ 0,1,\dots,7\right\} $.
In Figure 13a the spatial-temporal pattern of $u^{497}$ captures
the precise locations of the system-specific interactions of uninterrupted
traffic for $84$ hours of observation. The abrupt change of this
pattern in Figure 13b indicates the starting point of the induced correlation
between twenty traffic time series that usually interact in a random
fashion. It turns out that the {}``normal'' stable pattern of eigenvector
$u^{497}$ moves to eigenvector $u^{496}$ when the interruption
ends. Thus, we are able to observe the end point of the induced correlations
in Figure 13c, which represents the weights of the components of eigenvector
$u^{496}$ plotted with respect to the same time intervals. With this
setup we are able to locate the anomaly in time and space. Translated
to a network topological representation, the behavior of eigenvectors
$u^{497}$ and $u^{496}$ during our manipulations with inter-VLAN
traffic may be monitored with the following graphs (see Figure 14).%
866: \begin{figure}[h]
867: \begin{center}\includegraphics{Figure14.eps}\end{center}
868:
869:
\caption{\label{14} Left column: behavior of $u^{497}$ during the time period
from $48$h to $60$h with a $6$h time window; the induced correlation
starts at $54$h and lasts for $3$h. Right column: behavior of $u^{496}$
under the same conditions.}
874: \end{figure}
875:
876:
877: \textbf{\emph{Experiment 2}}
878:
In the previous experiment we injected just one type of increased
correlation among the time series. Now we show that two and three different
types of induced correlations produce different spatial-temporal patterns
in the components of eigenvector $u^{497}$ (see Figure 15). The time series
for the temporary increase of correlation are selected in the same way
as in Experiment 1. We temporarily increased the correlation between the
series by injecting elements generated from a sine function and
a quadratic function, respectively, for three hours.%
887: \begin{figure}[h]
888: \begin{center}\includegraphics{Figure6_1.eps}\end{center}
889:
890:
\caption{\label{15} (a) The weights of the components of $u^{497}$ plotted for
the time period from $36$ to $84$ hours with a $6$-hour interval and two
different types of induced correlations. (b) The weights of the components
of $u^{497}$ plotted for the same time period with three
different types of induced correlations. }
896: \end{figure}
In Figure 15a, one type of three-hour correlation is induced among
ten traffic time series and another type of correlation among another
ten time series. Three different types of three-hour correlations
are induced among twenty traffic time series in Figure 15b. The content
of the significant components, sorted in decreasing order, shows that the time
series tend to group according to the type of correlation they are
involved in.
904:
905: \textbf{\emph{Experiment 3}}
906:
Next we turn our attention to the disruption of the normal picture of inter-VLAN
traffic interactions. This can be done by injecting traffic drawn from a
random distribution into non-randomly interacting time series for three
hours. We demonstrate the effect by examining the eigenvalue distribution,
the IPR and the overlap matrix of the deviating eigenvectors plotted in Figure
16.%
913: \begin{figure}[h]
914: \begin{center}\includegraphics{Experiment3.eps}\end{center}
915:
916:
\caption{\label{16} Eigenvalue distribution, IPR and overlap matrix of deviating
eigenvectors of the inter-VLAN traffic cross-correlation matrix $C$. }
919: \end{figure}
After $60$ hours of uninterrupted traffic, we injected elements drawn from a
random distribution into the significant participants of $u^{497}$ for
three hours. The largest eigenvalue increases from $10$ to $12$.
The extended IPR tail shows a larger number of \emph{localized} eigenvectors,
and we observe a dramatic break in the stability of the deviating eigenvectors.
925:
926:
\section{Conclusion and Future Work}
928:
The RMT methodology used in this paper enables us to analyze
complex system behavior without considering the system constraints,
type and structure. Our goal was to investigate the characteristics
of the day-to-day temporal dynamics of the system of interconnected routers
with VLAN subnets of the University of Louisville. The type and structure
of the system at hand suggest a natural interpretation of the RMT-like
behavior and of the results deviating from the RMT. The time-stable random interactions
signify healthy, congestion-free traffic. The time-stable
non-random interactions provide us with information about large-scale
network-wide traffic interactions. Changes in the stable picture
of random and non-random interactions signify temporary traffic
anomalies.
941:
In general, the fact that the bulk
of the eigenvalue spectrum of inter-VLAN traffic interactions shares universal properties with random
matrices opens a new avenue in network-wide traffic modeling. As stated
in \cite{Guhr1}, in physical systems it is common to start with a
model of the dynamics of the system. In the same way, one would model the traffic
time series interactions with a family of stochastic differential
equations \cite{Farmer,Cont}, which describe the {}``instantaneous''
traffic counts \begin{equation}
g_{i}\left(t\right)=\left(d/dt\right)\ln T_{i}\left(t\right),\label{eq22}\end{equation}
as a random walk with couplings. Then one would relate the revealed
interactions to the correlated {}``modes'' of the system.
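For sampled data, Eq. (\ref{eq22}) has a natural finite-difference
counterpart. Assuming a sampling interval $\Delta t$ (a symbol introduced
here for illustration),
\[
g_{i}\left(t\right)\approx\frac{\ln T_{i}\left(t+\Delta t\right)-\ln T_{i}\left(t\right)}{\Delta t}=\frac{1}{\Delta t}\ln\frac{T_{i}\left(t+\Delta t\right)}{T_{i}\left(t\right)},
\]
so that for small relative changes $g_{i}\left(t\right)\Delta t\approx\left[T_{i}\left(t+\Delta t\right)-T_{i}\left(t\right)\right]/T_{i}\left(t\right)$,
i.e. the relative change of the traffic count over one sampling interval.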
953:
An additional question that the RMT findings raise in network-wide traffic
analysis is whether the observed eigenvalue spectrum correlations and
\emph{localized} eigenvectors outside of the RMT bulk can help explain
fundamental properties of network traffic, such as self-similarity
\cite{Leland}.
959:
To summarize, we have tested the eigenvalue statistics of the inter-VLAN
traffic cross-correlation matrix $C$ against the null hypothesis
of a random correlation matrix. By separating out the eigenvalue spectrum
correlations of random matrices that are present in this system, the
uncongested state of the network traffic is verified. We analyzed
the time-stable system-specific correlations. The analyzed eigenvalues
and eigenvectors deviating from the RMT revealed the principal groups
of VLAN-router switches, groups of traffic time series interlinked
through the firewalls and groups of same-service VLANs at every router.
With straightforward experiments on the traffic time series, we demonstrated
that the eigenvalue distribution, the IPR of eigenvectors, the overlap matrix
and the spatial-temporal patterns of deviating eigenvectors can monitor
the stability of inter-VLAN traffic interactions and detect and locate,
in time and space, network-wide changes in normal traffic time
series interactions.
975:
In future work, we would like to investigate
the behavior of the delayed traffic time series cross-correlation matrix
$C_{d}$ in RMT terms. The importance of delay in measurement-based
analysis of the Internet is emphasized in \cite{Zhang}. To understand
and quantify the effect of one time series on another at a later time,
one can calculate the delay correlation matrix, whose entries
are the cross-correlations of one time series and another at a time delay
$\tau$ \cite{Mayya}. In addition, we are interested in testing the
fruitfulness of the RMT approach on a larger system of inter-domain
interactions, for instance, on the 5-minute averaged traffic count time
series of the underlying backbone circuits of the Abilene backbone network.
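The delay correlation matrix mentioned above can be written, as a
sketch under the assumption that the series $g_{i}\left(t\right)$
are normalized to zero mean and unit variance and that $C$ is their
equal-time correlation matrix, as
\[
\left[C_{d}\left(\tau\right)\right]_{ij}=\left\langle g_{i}\left(t\right)g_{j}\left(t+\tau\right)\right\rangle ,\qquad C_{d}\left(0\right)=C.
\]
Unlike $C$, the matrix $C_{d}\left(\tau\right)$ is in general not
symmetric for $\tau\neq0$, since $\left[C_{d}\left(\tau\right)\right]_{ji}=\left\langle g_{j}\left(t\right)g_{i}\left(t+\tau\right)\right\rangle $
need not equal $\left[C_{d}\left(\tau\right)\right]_{ij}$, so its
eigenvalues can be complex.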
987:
988:
\section*{Acknowledgment}
990:
991: This research was partially supported by a grant from the US Department
992: of Treasury through a subcontract from the University of Kentucky.
993: The authors thank Igor Rozhkov for consulting on the RMT methodology.
We thank Hans Fiedler, University of Louisville network manager, for the
MRTG data of the UofL router system used in this study and for helpful suggestions
on the network interpretation of our results. We are grateful to Nathan
Johnson, University of Louisville supercomputing administrator, for
providing the computing environment and space.
999:
1000: \begin{thebibliography}{10}
1001: \bibitem{Fukuda}K. Fukuda, PhD Thesis: A study on phase transition phenomena in internet
1002: traffic, Keio University, 1999.
1003: \bibitem{Ohira}T. Ohira, R. Sawatari, Phase transition in a computer network traffic
1004: model, Phys. Rev. E \textbf{58}, July 1998, 193-195.
1005: \bibitem{Barthelemy}M. Barthelemy, B. Gondran and E. Guichard, Large scale cross-correlations
in internet traffic, arXiv:cond-mat/0206185 vol \textbf{2} 3 Dec 2002.
1007: \bibitem{LCD}A. Lakhina, M. Crovella, and C. Diot, Detecting distributed attacks
1008: using network-wide flow traffic, Proceedings of FloCon 2005 Analysis
1009: Workshop, 2005.
1010: \bibitem{Crovella}A. Lakhina, M. Crovella, and C. Diot. Mining Anomalies Using Traffic
1011: Feature Distributions. Technical Report BUCS-TR-2005-002, Boston University,
1012: 2005.
1013: \bibitem{Wigner1}E.P. Wigner, On a class of analytic functions from the quantum theory
1014: of collisions, Ann. Math. \textbf{\noun{53}}, 36 (1951), Proc. Cambridge
1015: Philos. Soc. \textbf{47}, 790 (1951).
1016: \bibitem{Dyson1}F. Dyson, Statistical theory of the energy levels of complex systems,
1017: J. Math. Phys. \textbf{3}, 140 (1962).
1018: \bibitem{Dyson2}F. Dyson and M.L. Mehta, Statistical theory of the energy levels of
1019: complex systems, J. Math. Phys. \textbf{4}, 701, 713 (1963).
1020: \bibitem{Mehta}M.L Mehta, Random matrices (Academic Press, Boston, 1991).
1021: \bibitem{Brody}T.A. Brody, J.Flores, J.B. French, P.A. Mello, A. Pandey, and S.S.M.
1022: Wong, Random-matrix physics: spectrum and strength fluctuations, Rev.
1023: Mod. Phys. \textbf{53}, 385 - 479, issue \textbf{3}, July 1981.
1024: \bibitem{Guhr3}T. Guhr, A. Muller-Groeling, and H.A. Weidenmuller, Random matrix
1025: theories in quantum physics: common concepts, Phys. Rep. \textbf{299},
1026: 190 (1998).
1027: \bibitem{Seba}M. Krbalek and P.Seba, Statistical properties of the city transport
1028: in Cuernavaca (Mexico) and random matrix theory. J. Phys. \textbf{214}
1029: (2000), 1, 91-100.
1030: \bibitem{McNutt}J. McNutt and M. De Shon, Correlation between quiescent ports in network
1031: flows, CERT network situational awareness group report, Carnegie Mellon
1032: University, September 2005.
1033: \bibitem{Crovella2}A. Lakhina, M. Crovella, and C. Diot, Characterization of network-wide
1034: anomalies in traffic flows, Proceedings of the ACM/SIGCOMM Internet
1035: Measurement conference, 2004, 201-206.
1036: \bibitem{Min}L. Min, Y. Shun-Zheng, A network-wide traffic anomaly detection method
1037: based on HSMM, Int. conf. on communications, circuits and system proceedings,
1038: vol \textbf{6}, June 2006, 1636 - 1640.
1039: \bibitem{Roughan}M. Roughan, T. Griffin, M. Mao, A. Greenberg, and B. Freeman, Combining
1040: routing and traffic data for detection of IP forwarding anomalies,
1041: Proceedings of the joint int. conf. on Measurement and modeling of
1042: computer systems, 2004, 416 - 417.
1043: \bibitem{Huang}L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A. Joseph and N. Taft,
1044: Distributed PCA and network anomaly detection, Technical report No.
1045: UCB/EECS-2006-99.
1046: \bibitem{Sharifi}S. Sharifi, M. Crane, A. Shamaie and H. Ruskin, Random matrix portfolio
1047: optimization: a stability approach, Physica A \textbf{335} (2004)
1048: 629-643.
1049: \bibitem{Guhr1}V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. Nunes Amaral, T. Guhr,
1050: and H.E. Stanley, Random matrix theory approach to cross correlations
1051: in financial data, Phys. Rev. E, vol \textbf{65}, 066126, 27 June
1052: 2002.
1053: \bibitem{Tulino}A. Tulino and S. Verdu, Random matrix theory and wireless communications,
1054: Communications and Information theory, vol \textbf{1}, issue \textbf{1},
1055: June 2004, 1 - 182.
1056: \bibitem{Tse}D. Tse, Multiuser receivers, random matrices and free probability,
1057: Proceedings of 37th Ann. Allerton Conf., Monticello, IL, September
1058: 1999.
1059: \bibitem{Zee}A. Zee, Random matrix theory and RNA folding, Acta Physica Polonica
1060: B, vol \textbf{36}, No \textbf{9}, June 2005.
1061: \bibitem{Laloux}L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Noise dressing
1062: of financial correlation matrices, Phys. Rev. Lett. \textbf{83}, August
1063: 1999, 1467-1470.
1064: \bibitem{Sengupta}A.M. Sengupta and P.P. Mitra, Distributions of singular values for
1065: some random matrices, arXiv:cond-mat/9709283 vol \textbf{1} 25 September
1066: 1997.
1067: \bibitem{Stockman}H.-J. Stockman, Quantum Chaos: an introduction, 1999.
1068: \bibitem{Bouchaud}J.-P. Bouchaud, Theory of financial risk and derivative pricing: from
1069: statistical physics to risk management, 1962.
1070: \bibitem{Bruus}H. Bruus and J.-C. Angles d'Auriac, Energy level statistics of two-dimensional
1071: Hubbard model at low filling, arXiv:cond-mat/9610142 vol \textbf{1}
1072: 18 October 1996.
1073: \bibitem{Farmer}J.D. Farmer, Market Force, ecology and evolution, e-print adap-org/9812005,
1074: Int. J. Theo. Appl. fin. \textbf{3}, 425, 2000.
1075: \bibitem{Cont}J.-P. Bouchaud, R. Cont, A Langevin approach to stock market fluctuations
1076: and crashes, European Journal of Physics, B \textbf{6}, 543, 1998.
\bibitem{Leland}W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson, On the self-similar
1078: nature of Ethernet traffic, ACM SIGCOMM, 1993, 183 - 193.
\bibitem{Zhang}B. Zhang, T.S. Eugene Ng, and A. Nandi, Measurement-based analysis,
modeling, and synthesis of the Internet delay space, Proceedings of
the 6th ACM SIGCOMM Conference on Internet Measurement, 2006, 85-98.
1082: \bibitem{Mayya}K.B.K. Mayya and R.E. Amritkar, Analysis of delay correlation matrices,
1083: oai:arXiv.org:cond-mat/0601279 (2006-12-20).
1084: \bibitem{Lau}W.-C. Lau, S.-Q. Li, Traffic analysis in large-scale high-speed integrated
networks: validation of nodal decomposition approach, INFOCOM, 1993,
1086: Proceedings of twelfth annual joint conference of the IEEE Computer
1087: and Communications Societies, vol \textbf{3}, 1320-1329.
1088: \bibitem{Allen}W.H. Allen, G.A. Marin, L.A. Rivera, Automated detection of malicious
1089: reconnaissance to enhance network security, SoutheastCon, 2005, Proceedings
1090: of IEEE, issue 8-10, April 2005, 450-454.
1091: \bibitem{Bruus2}H. Bruus and J.-C. Angles d'Auriac, The spectrum of two-dimensional
1092: Hubbard model at low filling, Europhysics letters, \textbf{35} (5),
1093: 321-326, 1999.
1094: \end{thebibliography}
1095: \appendix
1096:
1097: \section{RMT}
1098:
In this Appendix, we provide a short (and non-rigorous) explanation
of the main concepts and a glossary of terms used in RMT studies. RMT
approaches, which originated in nuclear and condensed matter physics
and later became common in many branches of mathematical physics \cite{Stockman},
have recently penetrated into econophysics, finance \cite{Bouchaud},
and network traffic analysis \cite{Barthelemy}.
1105:
For the statistical description of complex physical systems, such
as an atomic nucleus or an acoustically reverberant structure,
RMT serves as a guiding light when one is interested in the degree
of mutual interaction of the constituents. As it turns out, uncorrelated
energy levels or acoustic eigenfrequencies produce qualitatively
different results from those obeying RMT-like correlations \cite{Stockman}.
Therefore, real (experimentally measured) spectra can help to decide
on the nature of the interactions in the underlying system. To be specific,
an ideally symmetric system is expected to exhibit spectral properties
drastically different from those of a generic one, and if the
spectral properties are those of RMT systems, other ideas of RMT can
be brought to the researcher's aid.
1118:
To describe the {}``awareness'' of the structural constituents about
each other, scientists in different fields use similar constructs.
Physicists use the Hamiltonian matrix, engineers the stiffness matrix, and finance
and network analysts the equal-time cross-correlation matrix. Although
the physical meaning of these operators can differ,
eigenvalue/eigenvector analysis is a universally accepted
tool. The eigenvalues are directly connected to the spectrum of the physical
system, while the eigenvectors can be used to describe excitation/signal/information
propagation inside the system. In physics, RMT approaches come
about whenever the system of interest demonstrates certain qualitative
features in its spectral behavior. For example, if one looks at the
nearest neighbor spacing distribution of the eigenvalues and, instead of the
Poisson law\[
1132: P\left(s\right)=\exp\left(-s\right),\]
discovers the {}``Wigner surmise''\[
P\left(s\right)=\frac{\pi}{2}s\exp\left(-\frac{\pi}{4}s^{2}\right),\]
one concludes (upon running several additional statistical tests)
that the apparatus of RMT can be used for the system at hand, and that the system
matrix can be replaced by a matrix with random entries. For mathematical
convenience, these entries are given Gaussian weight. The only other
ingredient of this rather succinct phenomenological model is recognizing
the physical situation. For example, systems with and without a magnetic
field and/or central symmetry are described by different matrix ensembles
(that is, sets of matrices), each labeled by a symmetry index $\beta$ and
distributed according to\[
P^{\left(\beta\right)}\left(H\right)\propto\exp\left(-\frac{\beta}{4v^{2}}\mathrm{tr}\, H^{2}\right),\]
where the constant $v$ sets the scale (width) of the resulting eigenvalue
spectrum.
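
As a minimal worked illustration (a sketch under the assumption, not
spelled out above, that $H$ is a real symmetric $N\times N$ matrix,
i.e. $\beta=1$), the Gaussian weight can be written out entry by entry.
Since\[
\mathrm{tr}\, H^{2}=\sum_{i}H_{ii}^{2}+2\sum_{i<j}H_{ij}^{2},\]
the weight factorizes into independent Gaussians for the individual
entries,\[
P^{\left(1\right)}\left(H\right)\propto\prod_{i}\exp\left(-\frac{H_{ii}^{2}}{4v^{2}}\right)\prod_{i<j}\exp\left(-\frac{H_{ij}^{2}}{2v^{2}}\right),\]
so that $\left\langle H_{ii}^{2}\right\rangle =2v^{2}$ and $\left\langle H_{ij}^{2}\right\rangle =v^{2}$
for $i\neq j$. The constant $v$ therefore fixes the scale of the matrix
entries and, through them, of the eigenvalues. The qualitative difference
between the two spacing laws is most visible at small $s$: the Poisson
law gives $P\left(s\right)\rightarrow1$ as $s\rightarrow0$, whereas
the Wigner surmise vanishes linearly, $P\left(s\right)\approx\frac{\pi}{2}s$,
which expresses the mutual repulsion of correlated eigenvalues.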
1147:
The very fact that RMT can be helpful in the statistical description of
a broad range of systems suggests that these systems are analyzed
in a certain special \emph{universal} regime, in which physical or
other system-specific laws are overshadowed by equilibrated and ergodic evolution. In
most physical applications, the Hamiltonian matrix is rather sparse,
indicating a lack of interaction between different subparts of the corresponding
object. However, if the universal regime is inferred from the above-mentioned
statistical tests, it is very beneficial to replace this
single matrix with an ensemble of random matrices. One can then
proceed with the statistical analysis, using the matrix ensemble to calculate
statistical averages that are more relevant to the physical problem at
hand than the statistics of eigenvalues. Such averages can be, for example,
the mean or variance of the response to an external or internal excitation.
1161: \end{document}
1162: