
1: %% LyX 1.3 created this file.  For more info, see http://www.lyx.org/.

2: %% Do not edit unless you really know what you are doing.

3: \documentclass{IEEEtran}

4: \usepackage[T1]{fontenc}

5: \usepackage{float}

6: \usepackage{graphicx}

7:

8: \makeatletter

9:

10: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.

11: \newcommand{\noun}[1]{\textsc{#1}}

12: %% Bold symbol macro for standard LaTeX users

13: \providecommand{\boldsymbol}[1]{\mbox{\boldmath $#1$}}

14:

15:

16: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.

17:  \newcommand{\lyxaddress}[1]{

18:    \par {\raggedright #1

19:    \vspace{1.4em}

20:    \noindent\par}

21:  }

22:

23: \makeatother

24: \begin{document}

25:

26: \title{Analysis of Inter-Domain Traffic Correlations: Random Matrix Theory

27: Approach}

28:

29:

30: \author{Viktoria Rojkova, Mehmed Kantardzic}

31:

32: \maketitle

33:

34: \lyxaddress{Department of Computer Engineering and Computer Science, University

35: of Louisville, Louisville, KY 40292 email: \{vbrozh01, mmkant01\}@gwise.louisville.edu }

36:

37: \begin{abstract}

The traffic behavior of the University of Louisville network, with its
interconnected backbone routers and a number of Virtual Local Area
Network (VLAN) subnets, is investigated using the Random Matrix Theory
(RMT) approach. We employ a system of equal-interval time series
of traffic counts at all router-to-router and router-to-subnet connections
as a representation of the inter-VLAN traffic. The cross-correlation
matrix $C$ of the traffic rate changes between different traffic
time series is calculated and tested against the null hypothesis of random
interactions.

47:

48: The majority of the eigenvalues $\lambda_{i}$ of matrix $C$ fall

49: within the bounds predicted by the RMT for the eigenvalues of random

50: correlation matrices. The distribution of eigenvalues and eigenvectors

51: outside of the RMT bounds displays prominent and systematic deviations

52: from the RMT predictions. Moreover, these deviations are stable in

53: time.

54:

The method we use makes it possible to accomplish three concurrent
tasks of traffic analysis. It verifies the uncongested
state of the network by establishing the profile of random interactions.
It recognizes the system-specific large-scale interactions by establishing
the profile of time-stable non-random interactions. Finally, by
looking into the eigenstatistics we are able to detect and locate
anomalies of network traffic interactions.

62: \end{abstract}

63:

64: \section*{Categories and Subject Descriptors}

65:

66: C.2.3 {[}\textbf{Computer-Communication Networks}{]}: Network Operations

67:

68:

69: \section*{General Terms}

70:

71: Measurement, Experimentation

72:

73: \begin{keywords}

74: Network-Wide Traffic Analysis, Random Matrix Theory, Large-Scale Correlations

75: \end{keywords}

76:

77: \section{introduction}

78:

The infrastructure, applications and protocols of the system of communicating
computers and networks are constantly evolving. Traffic, which
is the essence of communication, is now a voluminous body of data
generated on a minute-by-minute basis within a multi-layered structure
by different applications and according to different protocols. As
a consequence, there are two general approaches to analyzing the
traffic and modeling its healthy behavior. In the first approach,
the traffic analysis considers the protocols, applications, traffic
matrix and routing matrix estimates, independence of ingress and egress
points and much more. The second approach treats the infrastructure
between the points from which the traffic is obtained as a ``black
box'' \cite{Lau,Allen}.

91:

Measuring interactions between logically and architecturally equivalent
substructures of the system is a natural extension of the ``black
box'' approach. A certain amount of work in this direction has already
been done. Studies of statistical traffic flow properties revealed
``congested'', ``fluid'' and ``transitional'' regimes
of the flow at a large scale \cite{Fukuda,Ohira}. The observed collective
behavior suggests the existence of large-scale network-wide correlations
between the network subparts. Indeed, the work of \cite{Barthelemy}
showed large-scale cross-correlations between different connections
of the Renater scientific network. Moreover, the analysis of correlations
across all simultaneous network-wide traffic has been used in the
detection of distributed network attacks \cite{LCD}.

104:

The distributions and stability of the established interaction statistics
represent the characteristic features of the system and may be exploited
in creating a healthy network traffic profile, which is an essential
part of network anomaly detection. As demonstrated
in \cite{Crovella}, all tested traffic anomalies change the distribution
of the traffic features.

111:

Among the numerous types of traffic monitoring variables, time series
of traffic counts are free of application ``semantics'' and are thus
preferable for ``black box'' analysis. To extract meaningful
information about the underlying interactions contained in the time series,
the empirical correlation matrix is the usual tool at hand. In addition,
there are various classes of statistical tools, such as principal
component analysis, singular value decomposition, and factor analysis,
which extract the meaningful part of the time series and in turn rely
strongly on the validity of the correlation matrix. Thus, it is important
to understand quantitatively the effect of noise, i.e., to separate
the noisy, random interactions from the meaningful ones. In addition,
it is crucial to account for the finiteness of the time series in the
determination of the empirical correlation, since the finite length
of the time series available to estimate cross-correlations introduces
``measurement noise'' \cite{Guhr1}. Statistically, it is also
advisable to develop null-hypothesis tests in order to check the
statistical validity of the results against the case of purely
random interactions.

130:

The methodology of random matrix theory (RMT) was developed for studying
the complex energy levels of heavy nuclei and is described in detail
in \cite{Wigner1,Dyson1,Dyson2,Mehta,Brody,Guhr3}. For our
purposes this methodology amounts to a series of statistical tests
run on the eigenvalues and eigenvectors of a ``system matrix'',
which in our case is the traffic time series cross-correlation matrix
$C$ (and is the Hamiltonian matrix in the case of nuclei and other RMT systems
\cite{Wigner1,Dyson1,Dyson2,Mehta,Brody,Guhr3}).

139:

In our study, we propose to investigate the network traffic as a complex
system with a certain degree of mutual interaction between its constituents,
i.e., single-link traffic time series, using the RMT approach. We concentrate
on the large-scale correlations between the time series generated
by Simple Network Management Protocol (SNMP) traffic counters at every
router-router and router-VLAN subnet connection of the University of Louisville
backbone router system.

147:

148: The contributions of this study are as follows:

149:

150: \begin{itemize}

\item We propose a methodology for analyzing network-wide
traffic time series interactions that is free of application constraints.
In this particular study we know in advance that VLANs represent separate
broadcast domains, that VLAN-router incoming traffic is traffic intended
for other VLANs, and that VLAN-router outgoing traffic is routed traffic
from other VLANs. Nevertheless, this information is irrelevant for our analysis
and is used only in the interpretation of the analysis results.
\item Using the RMT, we are able to separate the random interactions from
system-specific interactions. The vast majority of traffic time series
interact in a random fashion. The time-stable random interactions signify
healthy, congestion-free traffic. The proposed analysis
of the eigenvector distribution allows us to verify the time series content
of uncongested traffic.
\item The time-stable non-random interactions provide us with information
about large-scale system-specific interactions.
\item Finally, the temporal changes in random and non-random interactions
can be detected and located using the eigenvalue and eigenvector statistics
of the interactions.

169: \end{itemize}

The organization of this paper is as follows. Section II presents
a survey of related work. We describe the RMT methodology in Section
III. Section IV describes the data analyzed. In
Section V we test the eigenvalue distribution of the inter-VLAN traffic
time series cross-correlation matrix $C$ against the RMT predictions.
In Section VI we analyze the content of inter-VLAN traffic interactions
by means of the eigenvalues and eigenvectors that deviate from RMT. Section
VII discusses the characteristic traffic interaction parameters of
the system, such as the time stability of the deviating eigenvalues and
eigenvectors, the inverse participation ratio (IPR) across the eigenvalue
spectrum, localization points in the IPR plot, and overlap matrices of
the deviating eigenvectors. In Section VIII, through a series of experiments,
we demonstrate how traffic interaction anomalies can be detected and located in
time and space using various visualization techniques applied to the eigenvalue
and eigenvector statistics. We present our conclusions
and prospective research steps in Section IX.

186:

187:

188: \section{related work}

189:

Few works investigate the interactions of traffic time series regardless
of the underlying architecture of the traffic system. As stated
in the Introduction, the study of \cite{Barthelemy} showed large-scale
cross-correlations between different connections of the French scientific
network Renater, with 26 interconnected routers and 650 connection
links. Random interactions between the traffic time series of a complex
traffic system, without routing protocol information, were established
by Krbalek and Seba in \cite{Seba} for the transportation system in Cuernavaca
(Mexico).

199:

The urgent need for a network-wide, scalable approach to the problem
of healthy network traffic profile creation is expressed in
\cite{Crovella,Crovella2,Min,McNutt,Roughan,Huang}. Several studies
with promising results demonstrate that
anomalous traffic events cause temporal changes in the statistical
properties of traffic features. Lakhina, Crovella and Diot presented
a characterization of network-wide anomalies in traffic
flows. The authors studied three different types of traffic flows
and fused the information from flow measurements taken throughout
the entire network. They obtained and classified a different set of
anomalies for each traffic type using the subspace method \cite{Crovella2}.

211:

The same group of researchers extended their work in \cite{Crovella}.
Under the new assumption that any network anomaly induces changes
in the distributional aspects of packet header fields, they detected and
identified a large set of anomalies using an entropy measurement tool.

216:

A hidden Markov model has been proposed to model the distribution of
network-wide traffic in \cite{Min}. An observation window is used
to distinguish denial of service (DoS) flooding attacks mixed with
normal background traffic.

221:

222: Roughan et al. combined the entire network routing and traffic data

223: to detect the IP forwarding anomalies \cite{Roughan}.

224:

Huang et al. \cite{Huang} used a distributed version of the Principal
Component Analysis (PCA) method for centralized network-wide volume
anomaly detection. A key ingredient of their framework is an analytical
method based on stochastic matrix perturbation theory that balances
the accuracy of the approximate network anomaly detection
against the amount of data communication over the network.

231:

The authors of \cite{McNutt} found a high temporal correlation
(frequently $>0.99$) between flow counts on quiescent ports (TCP/IP
ports which are not in regular use) during vertical scans, one of the
known types of pre-attack, so-called \emph{reconnaissance}, anomalous behavior.

236:

237:

238: \section{rmt methodology}

239:

RMT has been employed in financial studies of stock correlations
\cite{Sharifi,Guhr1}, the communication theory of wireless systems \cite{Tulino},
array signal processing \cite{Tse}, and bioinformatics studies of protein
folding \cite{Zee}. We are not aware of any work, except for \cite{Barthelemy},
where RMT techniques were applied to the Internet traffic system.

245:

246: We adopt the methodology used in works on financial time series correlations

247: (see \cite{Sharifi,Guhr1} and references therein) and later in \cite{Barthelemy},

248: which discusses cross-correlations in Internet traffic. In particular,

we quantify correlations between $N$ traffic count time series of
$L$ time points by calculating the traffic rate change of every
time series $T_{i}$, $i=1,\dots,N$, over a time scale $\Delta t$,
\begin{equation}
G_{i}\left(t\right)\equiv\ln T_{i}\left(t+\Delta t\right)-\ln T_{i}\left(t\right),\label{eq1}
\end{equation}
where $T_{i}\left(t\right)$ denotes the traffic rate of time series
$i$. This measure is independent of the volume of the traffic exchange
and captures subtle changes in the traffic rate \cite{Barthelemy}.
The normalized traffic rate change is

257:

258: \begin{equation}

259: g_{i}\left(t\right)\equiv\frac{G_{i}\left(t\right)-\left\langle G_{i}\left(t\right)\right\rangle }{\sigma_{i}}\label{eq2}\end{equation}

260: where $\sigma_{i}\equiv\sqrt{\left\langle G_{i}^{2}\right\rangle -\left\langle G_{i}\right\rangle ^{2}}$

261: is the standard deviation of $G_{i}$. The equal-time cross-correlation

262: matrix $C$ can be computed as follows\begin{equation}

263: C_{ij}\equiv\left\langle g_{i}\left(t\right)g_{j}\left(t\right)\right\rangle \label{eq3}\end{equation}
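As an illustration of Eqs. (\ref{eq1})--(\ref{eq3}), the following is a minimal sketch in pure Python, not the code used in this study. The function names and the toy input values are hypothetical, and the time average $\left\langle \cdot\right\rangle $ is taken as a plain mean over the available time points.

```python
import math

def rate_changes(counts):
    """Eq. (1): G(t) = ln T(t + dt) - ln T(t) for successive samples."""
    return [math.log(b) - math.log(a) for a, b in zip(counts, counts[1:])]

def normalize(G):
    """Eq. (2): subtract the time average and divide by the standard deviation."""
    mean = sum(G) / len(G)
    sigma = math.sqrt(sum(x * x for x in G) / len(G) - mean ** 2)
    return [(x - mean) / sigma for x in G]

def cross_correlation(series):
    """Eq. (3): C_ij = <g_i(t) g_j(t)>, averaged over the L time points."""
    g = [normalize(rate_changes(s)) for s in series]
    L = len(g[0])
    return [[sum(a * b for a, b in zip(gi, gj)) / L for gj in g] for gi in g]

# Toy, made-up positive traffic rates (not data from the paper).
toy = [
    [100.0, 120.0, 90.0, 150.0, 130.0, 160.0],
    [200.0, 260.0, 170.0, 310.0, 250.0, 330.0],
    [50.0, 40.0, 65.0, 45.0, 70.0, 30.0],
]
C = cross_correlation(toy)
```

With this normalization the diagonal of $C$ equals one and all entries lie in $[-1,1]$, which is what makes the comparison with a random correlation matrix meaningful.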

264: The properties of the traffic interactions matrix $C$ have to be

265: compared with those of a random cross-correlation matrix \cite{Laloux}.

266: In matrix notation, the interaction matrix $C$ can be expressed as\begin{equation}

267: C=\frac{1}{L}GG^{T},\label{eq4}\end{equation}

where $G$ is an $N\times L$ matrix with elements $\left\{ g_{i\, m}\equiv g_{i}\left(m\Delta t\right);\; i=1,\dots,N;\; m=0,\dots,L-1\right\} $,
and $G^{T}$ denotes the transpose of $G$. As in \cite{Guhr1}, we consider
a random correlation matrix
\begin{equation}
R=\frac{1}{L}AA^{T},\label{eq5}
\end{equation}
where $A$ is an $N\times L$ matrix containing $N$ time series of $L$
random elements $a_{i\, m}$ with zero mean and unit variance, which
are mutually uncorrelated; this serves as the null hypothesis.
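The null-hypothesis matrix of Eq. (\ref{eq5}) can be sketched as follows. This is only an illustration under stated assumptions: the entries of $A$ are drawn from a standard normal distribution, the sizes are kept tiny so that pure Python suffices, and the function name and seed are hypothetical.

```python
import random

def random_correlation_matrix(N, L, seed=0):
    """Eq. (5): R = A A^T / L with i.i.d. zero-mean, unit-variance entries in A."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(N)]
    return [
        [sum(x * y for x, y in zip(A[i], A[j])) / L for j in range(N)]
        for i in range(N)
    ]

# Small illustrative sizes; the study itself uses N = 497 and L = 2015.
R = random_correlation_matrix(N=5, L=2000)
```

For finite $L$ the off-diagonal entries of $R$ are small but nonzero, which is the ``measurement noise'' that the RMT bounds quantify.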

276:

The statistical properties of the random matrices $R$ have been known
for years in the physics literature \cite{Wigner1,Brody,Dyson1,Dyson2,Mehta,Guhr3}.
In particular, it was shown analytically \cite{Sengupta} that, in
the limit $N\rightarrow\infty$, $L\rightarrow\infty$ with
$Q\equiv L/N>1$ fixed, the probability
density function $P_{rm}\left(\lambda\right)$ of the eigenvalues $\lambda$
of the random matrix $R$ is given by

284:

285: \begin{equation}

286: P_{rm}\left(\lambda\right)=\frac{Q}{2\pi}\frac{\sqrt{\left(\lambda_{+}-\lambda\right)\left(\lambda-\lambda_{-}\right)}}{\lambda}\label{eq6}\end{equation}

where $\lambda_{+}$ and $\lambda_{-}$ are the maximum and minimum eigenvalues
of $R$, respectively, so that $\lambda_{-}\leq\lambda_{i}\leq\lambda_{+}$.
They are given analytically by

290:

291: \begin{equation}

292: \lambda_{\pm}=1+\frac{1}{Q}\pm2\sqrt{\frac{1}{Q}}.\label{eq7}\end{equation}
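Equations (\ref{eq6}) and (\ref{eq7}) are straightforward to evaluate numerically. The sketch below is an illustration only; the function names are hypothetical and the normalization check uses a simple midpoint rule. It computes the RMT bounds and the density for the value $Q=4.0625$ quoted later in the paper.

```python
import math

def rmt_bounds(Q):
    """Eq. (7): lambda_minus and lambda_plus for a given Q = L / N."""
    lam_minus = 1.0 + 1.0 / Q - 2.0 * math.sqrt(1.0 / Q)
    lam_plus = 1.0 + 1.0 / Q + 2.0 * math.sqrt(1.0 / Q)
    return lam_minus, lam_plus

def p_rm(lam, Q):
    """Eq. (6): eigenvalue density of R; zero outside [lambda_minus, lambda_plus]."""
    lam_minus, lam_plus = rmt_bounds(Q)
    if lam <= lam_minus or lam >= lam_plus:
        return 0.0
    return Q / (2.0 * math.pi) * math.sqrt((lam_plus - lam) * (lam - lam_minus)) / lam

def integrate_density(Q, steps=20000):
    """Midpoint-rule integral of p_rm over its support (should be close to 1)."""
    lam_minus, lam_plus = rmt_bounds(Q)
    h = (lam_plus - lam_minus) / steps
    return sum(p_rm(lam_minus + (k + 0.5) * h, Q) for k in range(steps)) * h

lam_minus, lam_plus = rmt_bounds(4.0625)
total = integrate_density(4.0625)
```

For $Q=4.0625$ this reproduces $\lambda_{+}\approx2.23843$ and $\lambda_{-}\approx0.253876$, and the density integrates to one over $\left[\lambda_{-},\lambda_{+}\right]$.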

Random matrices display \emph{universal} functional forms for eigenvalue
correlations, which depend only on the general symmetries of the matrix.
The first step in testing the data for such universal properties
is to find a transformation called ``unfolding'', which maps the
eigenvalues $\lambda_{i}$ to new variables, the ``unfolded eigenvalues''
$\xi_{i}$, whose distribution is uniform \cite{Mehta,Brody,Guhr3}.
Unfolding ensures that the distances between eigenvalues are expressed
in units of the \emph{local} mean eigenvalue spacing \cite{Mehta}, and
thus facilitates the comparison with analytical results.

302:

303: We define the cumulative distribution function of eigenvalues, which

304: counts the number of eigenvalues in the interval $\lambda_{i}\leq\lambda,$

305:

306: \begin{equation}

307: F\left(\lambda\right)=N\int_{-\infty}^{\lambda}P\left(x\right)dx,\label{eq8}\end{equation}

308: where $P\left(x\right)$ denotes the probability density of eigenvalues

309: and $N$ is the total number of eigenvalues. The function $F\left(\lambda\right)$

can be decomposed into an average and a fluctuating part,
\begin{equation}
F\left(\lambda\right)=F_{av}\left(\lambda\right)+F_{fluc}\left(\lambda\right).\label{eq9}
\end{equation}
Since $P_{fluc}\equiv dF_{fluc}\left(\lambda\right)/d\lambda=0$ on
average,
\begin{equation}
P_{rm}\left(\lambda\right)\equiv\frac{dF_{av}\left(\lambda\right)}{d\lambda}\label{eq10}
\end{equation}
is the averaged eigenvalue density. The dimensionless, unfolded eigenvalues
are then given by
\begin{equation}
\xi_{i}\equiv F_{av}\left(\lambda_{i}\right).\label{eq11}
\end{equation}

318:

319:

Three known universal properties of GOE matrices (real symmetric matrices
whose elements are distributed according to a Gaussian probability measure) are:
(i) the distribution of nearest-neighbor eigenvalue spacings $P_{GOE}\left(s\right)$,
\begin{equation}
P_{GOE}\left(s\right)=\frac{\pi s}{2}\exp\left(-\frac{\pi}{4}s^{2}\right),\label{eq12}
\end{equation}
(ii) the distribution of next-nearest-neighbor eigenvalue spacings,
which, according to a theorem due to \cite{Dyson2}, is identical
to the distribution of nearest-neighbor spacings of the Gaussian symplectic
ensemble (GSE),
\begin{equation}
P_{GSE}\left(s\right)=\frac{2^{18}}{3^{6}\pi^{3}}s^{4}\exp\left(-\frac{64}{9\pi}s^{2}\right),\label{eq13}
\end{equation}
and finally (iii) the ``number variance'' statistic $\Sigma^{2}$,
defined as the variance of the number of unfolded eigenvalues in
intervals of length $l$ around each $\xi_{i}$ \cite{Mehta,Guhr3,Brody},
\begin{equation}
\Sigma^{2}\left(l\right)=\left\langle \left[n\left(\xi,l\right)-l\right]^{2}\right\rangle _{\xi},\label{eq14}
\end{equation}
where $n\left(\xi,l\right)$ is the number of unfolded eigenvalues
in the interval $\left[\xi-\frac{l}{2},\xi+\frac{l}{2}\right]$. The
number variance is expressed as follows

339:

340: \begin{equation}

341: \Sigma^{2}\left(l\right)=l-2\int_{0}^{l}\left(l-x\right)Y\left(x\right)dx,\label{eq15}\end{equation}

342: where $Y\left(x\right)$ for the GOE case is given by \cite{Mehta}

343:

344: \begin{equation}

345: Y\left(x\right)=s^{2}\left(x\right)+\frac{ds}{dx}\int_{x}^{\infty}s\left(x'\right)dx',\label{eq16}\end{equation}

and
\begin{equation}
s\left(x\right)=\frac{\sin\left(\pi x\right)}{\pi x}.\label{eq17}
\end{equation}
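The two spacing distributions of Eqs. (\ref{eq12}) and (\ref{eq13}) can be checked numerically. The sketch below is an illustration with hypothetical function names; it evaluates both densities and verifies, with a midpoint rule, that each is normalized and has unit mean spacing, as expected for unfolded eigenvalues.

```python
import math

def p_goe(s):
    """Eq. (12): nearest-neighbor spacing distribution for the GOE."""
    return (math.pi * s / 2.0) * math.exp(-math.pi * s * s / 4.0)

def p_gse(s):
    """Eq. (13): nearest-neighbor spacing distribution for the GSE."""
    prefactor = 2.0 ** 18 / (3.0 ** 6 * math.pi ** 3)
    return prefactor * s ** 4 * math.exp(-64.0 * s * s / (9.0 * math.pi))

def moment(density, power, upper=10.0, steps=20000):
    """Midpoint-rule estimate of the integral of s**power * density(s) on [0, upper]."""
    h = upper / steps
    return sum(((k + 0.5) * h) ** power * density((k + 0.5) * h) for k in range(steps)) * h

goe_norm, goe_mean = moment(p_goe, 0), moment(p_goe, 1)
gse_norm, gse_mean = moment(p_gse, 0), moment(p_gse, 1)
```

Because $P_{GSE}\left(s\right)$ has unit mean, next-nearest-neighbor spacings, whose mean is two after unfolding, would presumably need to be rescaled by a factor of $1/2$ before being compared with it; this rescaling is an assumption of the sketch and is not stated in the text.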

As stressed in \cite{Guhr1,Sharifi,Stockman}, the overall
time of observation is crucial for explaining the empirical cross-correlation
coefficients. On the one hand, the longer we observe the traffic, the more
information about the correlations we obtain and the less ``noise''
we introduce. On the other hand, the correlations are not stationary,
i.e., they can change with time. To differentiate the ``random''
contribution to the empirical correlation coefficients from the ``genuine''
contribution, the eigenvalue statistics of $C$ are contrasted with
the eigenvalue statistics of a correlation matrix taken from the
so-called ``chiral'' Gaussian Orthogonal Ensemble \cite{Guhr1}.
Such an ensemble is one of the ensembles of RMT \cite{Stockman,Bouchaud},
briefly discussed in Appendix A. A \emph{random} cross-correlation
matrix, which is a matrix filled with uncorrelated Gaussian random
numbers, is supposed to represent transient network activity that is
uncorrelated in time, that is, a completely noisy environment. If the
cross-correlation matrix $C$ obeys the same eigenstatistical properties
as the RMT matrix, the network traffic is equilibrated and deemed
universal in the sense that every single connection interacts with the
rest in a completely chaotic manner. It also means a complete absence
of congestion and anomalies. Meanwhile, any time-stable deviations from
the \emph{universal} predictions of RMT signify system-specific, nonrandom
properties of the system, providing clues about the nature of the underlying
interactions. That allows us to establish the profile of system-specific
correlations.

372:

373:

374: \section{data}

375:

In this paper, we study the averaged traffic count data collected
from all router-router and router-VLAN subnet connections of the University
of Louisville backbone router system. The system consists of nine
interconnected multi-gigabit backbone routers, over $200$ Ethernet
segments and over $300$ VLAN subnets. We collected the traffic count
data for $3$ months, from September $21$, $2006$
to December $20$, $2006$, from $7$ routers, since two routers
are reserved for server farms. The overall data amounted to approximately
$18$ GB.

385:

The traffic count data is provided by the Multi Router Traffic Grapher
(MRTG) tool, which reads the SNMP traffic counters. The MRTG log file never
grows in size due to the data consolidation algorithm: it contains
records of the average incoming, outgoing, maximum and minimum transfer rate in
bytes per second at time intervals of $300$ seconds, $30$
minutes, $1$ day and $1$ month. We extracted the $300$-second interval
data for seven days. Then, we separated the incoming and outgoing
traffic count time series and considered them as independent. For
$352$ connections we formed $N=704$ time series of $L=2015$ records each,
with a $300$-second interval.

396:

Since we are interested in changes in the traffic rate, we excluded from
consideration the connections where the channel is open but traffic
is not established, or where there is only a constant-rate, low-volume
test traffic. Another reason for excluding the ``empty'' traffic
time series is that they make the time series cross-correlation matrix
unnecessarily sparse. The exclusion does not influence the analysis
and results. After the exclusions the number of traffic time series
became $N=497$.

405:

To calculate the traffic rate change $G_{i}\left(t\right)$ we used
the logarithm of the ratio of two successive counts. As stated
earlier, the $\log$-transformation makes the ratio independent of the
traffic volume and captures subtle changes in the traffic
rate. We added 1 byte to all data points to avoid evaluating
$\log\left(0\right)$ in cases where the traffic count is equal to zero
bytes. This measure did not affect the changes in the traffic rate.
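The preprocessing described in this section can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: an ``empty'' series is approximated here as one whose counts never change, and the function names and sample values are hypothetical.

```python
import math

def is_empty_series(counts):
    """Assumed criterion: a series whose counts never change carries no rate changes."""
    return len(set(counts)) <= 1

def log_rate_changes(counts, offset=1.0):
    """Add 1 byte to every sample, then take the log of the ratio of successive counts."""
    shifted = [c + offset for c in counts]
    return [math.log(b / a) for a, b in zip(shifted, shifted[1:])]

def preprocess(all_series):
    """Drop empty series and convert the rest to traffic rate changes G_i(t)."""
    return [log_rate_changes(s) for s in all_series if not is_empty_series(s)]

# Made-up counts: the first and third series are constant and are dropped.
sample = [[0, 0, 0], [10, 0, 20], [5, 5, 5]]
G = preprocess(sample)
```

Note that a series of $n$ samples yields $n-1$ rate changes, and that the 1-byte offset keeps the logarithm finite when a count is zero.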

413:

414:

415: \section{eigenvalue distribution of cross-correlation matrix, comparison with

416: rmt}

417:

We constructed the inter-VLAN traffic cross-correlation matrix $C$ with
number of time series $N=497$ and number of observations per series
$L=2015$ ($Q=4.0625$), so that $\lambda_{+}=2.23843$ and $\lambda_{-}=0.253876$.
Our first goal is to compare the eigenvalue distribution $P\left(\lambda\right)$
of $C$ with $P_{rm}\left(\lambda\right)$ \cite{Laloux}. To compute
the eigenvalues of $C$ we used a standard \emph{MATLAB} function. The empirical
probability distribution $P\left(\lambda\right)$ is then given by
the corresponding histogram. We display the resulting distribution
$P\left(\lambda\right)$ in Figure 1 and compare it to the probability
distribution $P_{rm}\left(\lambda\right)$ of Eq. (\ref{eq6}),
calculated for the same value of the traffic time series parameter ($Q=4.0625$)
and shown as the solid curve.
The largest eigenvalue, $\lambda_{497}=8.99$, is shown in the inset to
Figure 1, which zooms in on the deviations from the RMT predictions.%

433: \begin{figure}[H]

434: \begin{center}\includegraphics{EigenvaluePDF.eps}\end{center}

435:

436:

437: \caption{\label{1} Empirical probability distribution function $P\left(\lambda\right)$

438: for the inter-VLAN traffic cross-correlations matrix $C$ (histogram). }

439: \end{figure}

We note the presence of ``bulk'' (RMT-like) eigenvalues, which
fall within the bounds {[}$\lambda_{-},$$\lambda_{+}${]} for $P_{rm}\left(\lambda\right)$,
and the presence of eigenvalues that lie outside of the ``bulk'',
representing deviations from the RMT predictions. In particular, the largest
eigenvalue $\lambda_{497}=8.99$ for the seven-day period is approximately
four times larger than the RMT upper bound $\lambda_{+}$.

446:

The histogram for the well-defined bulk agrees with $P_{rm}\left(\lambda\right)$,
suggesting that the cross-correlations of matrix $C$ are mostly random,
i.e., that the inter-VLAN traffic time series interact mostly in
a random fashion.

451:

452: Nevertheless, the agreement of empirical probability distribution

453: $P\left(\lambda\right)$ of the bulk with $P_{rm}\left(\lambda\right)$

454: is not sufficient to claim that the bulk of eigenvalue spectrum is

455: random. Therefore, further RMT tests are needed \cite{Guhr1}.

456:

To this end, we obtained the unfolded eigenvalues $\xi_{i}$ by following
the phenomenological procedure referred to as Gaussian broadening
(see \cite{Bruus,Bruus2,Guhr1,Sharifi}). The empirical
cumulative distribution function of eigenvalues $F\left(\lambda\right)$
agrees well with $F_{av}\left(\lambda\right)$ (see Figure 2),
where the $\xi_{i}$ were obtained by the Gaussian broadening procedure with
the broadening parameter $a=8$.%

464: \begin{figure}[h]

465: \begin{center}\includegraphics{EmpiricalCDFvsTheoretical.eps}\end{center}

466:

467:

468:

469:

470: \caption{\label{2}The empirical cumulative distribution of $\lambda_{i}$

471: and unfolded eigenvalues $\xi_{i}\equiv F_{av}\left(\lambda\right)$. }

472: \end{figure}
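One common form of Gaussian broadening can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of \cite{Bruus}: each eigenvalue $\lambda_{k}$ is replaced by a Gaussian of width $\left(\lambda_{k+a}-\lambda_{k-a}\right)/2$, the window is clamped at the spectrum edges (an assumption of this sketch), and $F_{av}$ is the sum of the Gaussian cumulative distributions. The eigenvalues are assumed to be distinct, and the sample spectrum is made up.

```python
import math

def unfold(eigenvalues, a=8):
    """Return xi_i = F_av(lambda_i), with F_av estimated by Gaussian broadening."""
    lam = sorted(eigenvalues)
    n = len(lam)
    # Width of the Gaussian attached to each eigenvalue; indices are clamped
    # at the edges of the spectrum (assumption of this sketch).
    widths = []
    for k in range(n):
        lo = max(0, k - a)
        hi = min(n - 1, k + a)
        widths.append((lam[hi] - lam[lo]) / 2.0)

    def f_av(x):
        # Sum of Gaussian cumulative distribution functions, one per eigenvalue.
        return sum(
            0.5 * (1.0 + math.erf((x - lk) / (math.sqrt(2.0) * wk)))
            for lk, wk in zip(lam, widths)
        )

    return [f_av(x) for x in lam]

# Made-up, strictly increasing spectrum used only to exercise the function.
example = [0.25 + 0.03 * k + 0.005 * math.sin(k) for k in range(60)]
xi = unfold(example, a=8)
```

The unfolded values increase monotonically with $\lambda$, and away from the edges their mean spacing is close to one, which is the property the spacing tests rely on.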

The first independent RMT test is the comparison of the distribution
of the nearest-neighbor unfolded eigenvalue spacings $P_{nn}\left(s\right)$,
where $s\equiv\xi_{k+1}-\xi_{k}$, with $P_{GOE}\left(s\right)$ \cite{Mehta,Brody,Guhr3}.
The empirical probability distribution of nearest-neighbor unfolded
eigenvalue spacings $P_{nn}\left(s\right)$ and $P_{GOE}\left(s\right)$
are presented in Figure 3. The Gaussian decay of $P_{GOE}\left(s\right)$
for large $s$ suggests that $P_{GOE}\left(s\right)$ ``probes''
scales only of the order of one eigenvalue spacing. The solid line
represents $P_{GOE}\left(s\right)$.%

482: \begin{figure}[h]

483: \begin{center}\includegraphics{EmpiricalNNSvsTheoretical.eps}\end{center}

484:

485:

486: \caption{\label{3} Nearest-neighbor spacing distribution $P_{nn}\left(s\right)$

487: of unfolded eigenvalues $\xi_{i}$ of cross-correlation matrix $C$. }

488: \end{figure}

489: The agreement between empirical probability distribution $P_{nn}\left(s\right)$

490: and the distribution of nearest-neighbor eigenvalues spacing of the

491: GOE matrices $P_{GOE}\left(s\right)$ testifies that the positions

492: of two adjacent empirical unfolded eigenvalues at the distance $s$

493: are correlated just as the eigenvalues of the GOE matrices.

494:

Next, we examined the distribution $P_{nnn}\left(s'\right)$ of next-nearest-neighbor
spacings $s'\equiv\xi_{k+2}-\xi_{k}$ between the unfolded eigenvalues.
According to \cite{Dyson2}, this distribution should fit the distribution
of nearest-neighbor spacings of the GSE. We demonstrate this correspondence
in Figure 4. The solid line shows $P_{GSE}\left(s\right)$.%

500: \begin{figure}[h]

501: \begin{center}\includegraphics{EmpiricalNNNSvsTheoretical.eps}\end{center}

502:

503:

504: \caption{\label{4} Next-nearest-neighbor eigenvalue spacing distribution

505: $P_{nnn}\left(s'\right).$ }

506: \end{figure}

Finally, the long-range two-point eigenvalue correlations were tested.
It is known \cite{Mehta,Brody,Guhr3} that if the eigenvalues are uncorrelated,
the number variance scales with $l$, $\Sigma^{2}\sim l$.
Meanwhile, when the unfolded eigenvalues of $C$ are correlated, $\Sigma^{2}$
approaches a constant value, revealing ``spectral rigidity'' \cite{Mehta,Brody,Guhr3}.
In Figure 5, we contrast the Poissonian number variance with the one
we observed and conclude that the eigenvalues belonging
to the ``bulk'' clearly exhibit universal RMT properties. The
broadening parameter $a=8$ was used in the Gaussian broadening procedure
to unfold the eigenvalues $\lambda_{i}$ \cite{Bruus,Bruus2,Guhr1,Sharifi}.
The dashed line corresponds to the case of uncorrelated eigenvalues.%

518: \begin{figure}[h]

519: \begin{center}\includegraphics{NumberVariance.eps}\end{center}

520:

521:

522: \caption{\label{5} Number variance $\Sigma^{2}\left(l\right)$ calculated

523: from the unfolded eigenvalues $\xi_{i}$ of $C$. }

524: \end{figure}
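An empirical estimate of the number variance of Eq. (\ref{eq14}) can be sketched as below. The function name is hypothetical, and the handling of the spectrum edges (windows that extend beyond the first or last unfolded eigenvalue are skipped) is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right

def number_variance(xi, l):
    """Eq. (14): average of [n(xi, l) - l]^2 over windows centered on each xi_i."""
    xi = sorted(xi)
    half = l / 2.0
    squared = []
    for center in xi:
        # Skip windows that extend beyond the ends of the spectrum.
        if center - half < xi[0] or center + half > xi[-1]:
            continue
        # Count unfolded eigenvalues in the closed interval [center - l/2, center + l/2].
        n = bisect_right(xi, center + half) - bisect_left(xi, center - half)
        squared.append((n - l) ** 2)
    if not squared:
        raise ValueError("window length l is too large for this spectrum")
    return sum(squared) / len(squared)

# A perfectly rigid, equally spaced toy spectrum (not data from the paper).
rigid = [float(k) for k in range(100)]
sigma2 = number_variance(rigid, 1.5)
```

For the rigid toy spectrum every window of length $1.5$ contains exactly one eigenvalue, so the estimate stays at $0.25$, far below the value $\Sigma^{2}\sim l$ expected for uncorrelated eigenvalues.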

These findings show that the system of inter-VLAN traffic has a \emph{universal}
part of eigenvalue spectral correlations, shared by a broad class of
systems, including chaotic and disordered systems, nuclei, atoms and
molecules. Thus it can be concluded that the bulk eigenvalue statistics
of the inter-VLAN traffic cross-correlation matrix $C$ are consistent
with those of a real symmetric random matrix $R$, given by Eq. (\ref{eq5})
\cite{Sengupta}. Meanwhile, the deviations from the RMT contain
information about the system-specific correlations. The next section
is entirely devoted to the analysis of the eigenvalues and eigenvectors
deviating from the RMT, which signify the meaningful inter-VLAN
traffic interactions.

536:

537:

538: \section{inter-vlan traffic interactions analysis}

539:

We overview the points of interest in the eigenvectors of the inter-VLAN traffic
cross-correlation matrix $C$, which are determined according to $Cu^{k}=\lambda_{k}u^{k}$,
where $\lambda_{k}$ is the $k$-th eigenvalue. A particularly important
characteristic of eigenvectors, proven to be useful in the physics of
disordered conductors, is the inverse participation ratio (IPR) (see,
for example, Ref. \cite{Guhr3}). In such systems, the IPR, being a
function of an eigenstate (eigenvector), allows one to judge
whether the corresponding eigenstate, and therefore the electron, is extended
or localized.

549:

550:

\subsection{Inverse participation ratio of eigenvector components}

552:

553: For our purposes, it is sufficient to know that IPR quantifies the

554: reciprocal of the number of significant components of the eigenvector.

555: For the eigenvector $u^{k}$ it is defined as\begin{equation}

556: I^{k}\equiv\sum_{l=1}^{N}\left[u_{l}^{k}\right]^{4},\label{eq18}\end{equation}

where $u_{l}^{k}$, $l=1,\dots,N$, with $N=497$, are the components of the eigenvector
$u^{k}$. In particular, a vector with one significant component
has $I^{k}=1$, while a vector with identical components $u_{l}^{k}=1/\sqrt{N}$
has $I^{k}=1/N$.%

561: \begin{figure}[h]

562: \begin{center}\includegraphics{IPR1.eps}\end{center}

563:

564:

565: \caption{\label{6} Inverse participation ratio as a function of eigenvalue

566: $\lambda$.}

567: \end{figure}

Consequently, the inverse of the IPR gives the number of significant
participants of the eigenvector. In Figure 6 we plot the IPR of the eigenvectors of the cross-correlation
matrix $C$ as a function of the eigenvalue $\lambda$. The control plot
is the IPR of the eigenvectors of the random cross-correlation matrix $R$ of
Eq. \ref{eq5}. As we can see, eigenvectors corresponding to eigenvalues
from $0.25$ to $3.5$, which lie within the RMT boundaries, have an IPR
close to $0$. This means that almost all components of the eigenvectors
in the bulk interact in a random fashion. The number of significant
components of the eigenvectors deviating from the RMT is around twenty, which is typically twenty
times smaller than that of the eigenvectors within the RMT boundaries.
For instance, the IPR of the eigenvector $u^{492}$, which
corresponds to the eigenvalue $5.9$ in Figure 6, is $0.05$, i.e.
twenty time series contribute significantly to $u^{492}$. Another
observation from Figure 6 is that the number of significant
participants is considerably smaller at both edges of
the eigenvalue spectrum. These findings resemble the results of \cite{Guhr1},
where the eigenvectors with few participating components were referred
to as \emph{localized} vectors. The theory of \emph{localization}
is explained in the context of random band matrices, whose elements
are independently drawn from different probability distributions \cite{Guhr1}.
Despite their randomness, these matrices still contain probabilistic
information. The \emph{localization} in inter-VLAN traffic can be explained
as follows. The separated broadcast domains, i.e. VLANs, forward traffic
from one to another only through the router, which reduces the routing
for broadcast containment. Although the optimal VLAN deployment
keeps as much traffic as possible from traversing the router,
a bottleneck is unavoidable when the number of VLANs is large.
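As a minimal sketch, the IPR of Eq. \ref{eq18} and the resulting number of significant participants can be computed as follows. The function names are illustrative, and the re-normalization of the input vector to unit length is a defensive assumption of this sketch rather than a step described in the text.

```python
import math


def inverse_participation_ratio(u):
    """IPR: sum of the fourth powers of the components of a unit-norm vector."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:
        raise ValueError("the zero vector has no IPR")
    # Assumption: rescale to unit length so that 1/N <= IPR <= 1 always holds.
    return sum((x / norm) ** 4 for x in u)


def significant_participants(u):
    """Approximate number of significant components, 1 / IPR."""
    return 1.0 / inverse_participation_ratio(u)


if __name__ == "__main__":
    N = 497
    localized = [1.0] + [0.0] * (N - 1)      # one significant component
    extended = [1.0 / math.sqrt(N)] * N      # identical components
    print(inverse_participation_ratio(localized))
    print(significant_participants(extended))
```

The two toy vectors reproduce the limiting cases quoted above, $I^{k}=1$ and $I^{k}=1/N$.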

595:

596:

\subsection{Distribution of eigenvector components}

598:

Another target of interest is the distribution of the components $\left\{ u_{l}^{k};\, l=1,\dots,N\right\} $
of the eigenvector $u^{k}$ of the interactions matrix $C$. To calculate
the vectors $u$ we again used the \emph{MATLAB} routine and obtained
the distribution $p\left(u\right)$ of the eigenvector components.
Then, we contrasted it with the RMT prediction for the eigenvector
distribution $p_{rm}\left(u\right)$ of the random correlation matrix
$R$. According to \cite{Guhr3}, $p_{rm}\left(u\right)$ is a Gaussian
distribution with zero mean and unit variance, i.e.\begin{equation}
p_{rm}\left(u\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2}\right).\label{eq19}\end{equation}
The weights of randomly interacting traffic count time series, which
are represented by the eigenvector components, have to be distributed
normally. The results are presented in Figure 7. One can see (from
Figures 7a and 7b) that $p\left(u\right)$ for two $u^{k}$ taken
from the bulk is in accord with $p_{rm}\left(u\right)$. The distribution
$p\left(u\right)$ corresponding to an eigenvalue $\lambda_{i}$
that exceeds the RMT upper bound ($\lambda_{i}>\lambda_{+}$) is
shown in Figure 7c for $u^{496}$, and Figure 7d shows $p\left(u\right)$
for $u^{497}$, which corresponds to the largest eigenvalue. In all panels
the solid line shows $p_{rm}\left(u\right)$ from Eq. \ref{eq19}.%

619: \begin{figure}[H]

620: \begin{center}\includegraphics{EigenvectorsDistribution.eps}\end{center}

621:

622:

\caption{\label{7} Distribution $p\left(u\right)$ of the components of eigenvectors
corresponding to eigenvalues (a) from the middle of the bulk, i.e. $\lambda_{-}<\lambda<\lambda_{+}$,
(b) from the bulk close to $\lambda_{+}$, (c) $\lambda_{496}$, and (d)
$\lambda_{497}$.}

627: \end{figure}
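A note on normalization may help when reproducing Figure 7. Assuming the eigenvectors are normalized to unit length, as the limiting values $I^{k}=1$ and $I^{k}=1/N$ of Eq. \ref{eq18} imply, the components have a mean square of $1/N$ rather than one. A comparison with the unit-variance Gaussian of Eq. \ref{eq19} then presupposes a rescaling. One common convention, stated here as an assumption rather than as the procedure actually used, is
\[
\tilde{u}_{l}^{k}=\sqrt{N}\, u_{l}^{k},\qquad\frac{1}{N}\sum_{l=1}^{N}\left[\tilde{u}_{l}^{k}\right]^{2}=1,
\]
under which the IPR of Eq. \ref{eq18} is recovered as
\[
I^{k}=\frac{1}{N^{2}}\sum_{l=1}^{N}\left[\tilde{u}_{l}^{k}\right]^{4}.
\]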

628:

629:

630:

\subsection{Deviating eigenvalues and significant inter-VLAN traffic series contributing
to the deviating eigenvectors}

633:

The distribution of the components of $u^{497}$, the eigenvector corresponding to the
largest eigenvalue $\lambda_{497}$, deviates significantly from the
Gaussian (as follows from Figure 7d). While the kurtosis of a Gaussian is
3, the kurtosis of $p\left(u^{497}\right)$ comes out to
$23.22$. The smaller number of significant components of the eigenvector
also contributes to the difference between the Gaussian distribution and the empirical
distribution of eigenvector components. More than half of the $u^{497}$ components
have the same sign, thus slightly shifting $p\left(u\right)$
to one side. This result suggests the existence of common VLAN
traffic intended for inter-VLAN communication that affects all of
the significant participants of the eigenvector $u^{497}$ with the
same bias. The number of significant components of $u^{497}$
is twenty-two, since the IPR of $u^{497}$ is $0.045$. Hence, the content of the
eigenvector of the largest eigenvalue reveals 22 traffic time series that are affected
by the same event. We identify the time series that affects these 22 traffic
time series by the following procedure. First, we calculate the
projection $G^{497}\left(t\right)$ of the time series $G_{i}\left(t\right)$
on the eigenvector $u^{497}$,\begin{equation}
G^{497}\left(t\right)\equiv\sum_{i=1}^{497}u_{i}^{497}G_{i}\left(t\right).\label{eq20}\end{equation}
Next, we compare $G^{497}\left(t\right)$ with each $G_{i}\left(t\right)$
by finding the correlation coefficient $\left\langle \frac{G^{497}\left(t\right)}{\sigma^{497}}\frac{G_{i}\left(t\right)}{\sigma_{i}}\right\rangle $.
The Fiber Distributed Data Interface (FDDI)-VLAN internet switch at
one of the routers demonstrates the largest correlation coefficient
of $0.89$ (see Figure 8).%

658: \begin{figure}[h]

659: \begin{center}\includegraphics{Gprojection_1.eps}\end{center}

660:

661:

662: \caption{\label{8} (a) FDDI-VLAN internet switch time series regressed against

663: the projection $G^{497}\left(t\right)$ from Eq. \ref{eq20}. (b)

664: Time series defined by the eigenvector corresponding to eigenvalue

665: within RMT bounds shows no linear dependence on $G^{497}\left(t\right).$}

666: \end{figure}
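The projection of Eq. \ref{eq20} and the comparison with the individual series can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of this study: the function names and the toy data are hypothetical, and the correlation coefficient is computed as a Pearson coefficient with the sample means subtracted explicitly.

```python
import math


def project(series, u):
    """Projection G^k(t) = sum_i u_i * G_i(t); series[i][t] holds G_i(t)."""
    n_steps = len(series[0])
    return [sum(u[i] * series[i][t] for i in range(len(series)))
            for t in range(n_steps)]


def correlation(x, y):
    """Pearson correlation of two equal-length series (means subtracted)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def most_correlated(series, u):
    """Index and coefficient of the series most correlated with the projection."""
    g = project(series, u)
    coeffs = [correlation(g, s) for s in series]
    best = max(range(len(series)), key=lambda i: abs(coeffs[i]))
    return best, coeffs[best]


if __name__ == "__main__":
    toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [0.0, 1.0, 0.0, 1.0]]
    print(most_correlated(toy, [1.0, 0.0, 0.0]))
```

In the toy example the eigenvector has a single non-zero component, so the projection coincides with the first series and that series is returned as the most correlated one.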

The eigenvector $u^{497}$ has the following content. The seven most significant
participants are the seven FDDI-VLAN switches at the seven routers. The
presence of the FDDI-VLAN switches provides us with information about the VLAN
membership definition. FDDI is a layer 2 protocol, which means that
at least one of the two layer 2 membership types is used: port group and/or
MAC address membership. The next group of significant participants
comprises VLAN traffic intended for routing and already routed
traffic from different VLANs. The final group of significant participants
consists of open switches, which pick up any {}``leaking'' traffic
on the router. Usually, the {}``leaking'' traffic is network
management traffic, a very low level traffic that spikes when queried
by the management systems.

679:

If every deviating eigenvalue signals a particular sub-model of non-random
interactions of the network, then every corresponding eigenvector
presents the number of significant dimensions of that sub-model. Thus,
we can think of every deviating eigenvector as a representative network-wide
{}``snapshot'' of interactions within certain dimensions.

685:

The analysis of the significant participants of the deviating eigenvectors
revealed three types of inter-VLAN traffic time series groupings.
One group contains time series that are interlinked on the router.
We recognize them as router1-VLAN\_1000 traffic, router1-firewall
traffic and VLAN\_1000-router1 traffic. The time series that are
listed as router1-VLAN\_2000, router2-VLAN\_2000, router3-VLAN\_2000,
etc., are reserved for the same service VLAN on every router and comprise
another group. The content of these groups suggests how the VLANs are implemented:
as a mixture of the infrastructural approach, where functional groups
(departments, schools, etc.) are considered, and the service approach,
where a VLAN provides a particular service (network management, firewall,
etc.).

698:

699:

700: \section{stability of inter-vlan traffic interactions in time}

701:

We expect to observe stability of the inter-VLAN traffic interactions
over the period of time used to compute the traffic cross-correlation matrix
$C$. The eigenvalue distribution at different time periods provides
information about the system stabilization, i.e. about the time
after which the fluctuations of the eigenvalues are no longer significant. Time
periods of $1$ hour, $3$ hours and $6$ hours are not sufficient
to gain knowledge about the system, as demonstrated in Figure
9a. In Figure 9b the system stabilizes after a $1$ day period. To observe
the time stability of the meaningful inter-VLAN interactions we computed
the {}``overlap matrix'' of the deviating eigenvectors for the time
period $t$ and the deviating eigenvectors for the time period $t+\tau$,
where $t=60h$ and $\tau=\left\{ 0h,3h,12h,24h,36h,48h\right\} $.%

714: \begin{figure}[h]

715: \begin{center}\includegraphics{EigenvaluesvsTime.eps}\end{center}

716:

717:

\caption{\label{9} (a) Eigenvalue distributions of the traffic stream correlation
matrix $C$ for $1$ hour, $3$ hours and $6$ hours time intervals.
(b) Eigenvalue distributions for $24$ hours, $48$ hours and $72$
hours.}

722: \end{figure}

723:

724:

First, we obtained the matrix $D$ from the $p=57$ eigenvectors that correspond
to the $p$ eigenvalues above the RMT upper bound $\lambda_{+}$.
Then we computed the {}``overlap matrix'' $O\left(t,\tau\right)$
from $D_{A}D_{B}^{T}$, where $O_{ij}$ is the scalar product of the
eigenvector $u^{i}$ of period $A$ (starting at time $t$) with
$u^{j}$ of period $B$ (starting at time $t+\tau$),

\begin{equation}
O_{ij}\left(t,\tau\right)\equiv\sum_{k=1}^{N}D_{ik}\left(t\right)D_{jk}\left(t+\tau\right).\label{eq21}\end{equation}
The diagonal elements of $O$, i.e. the values of $O_{ij}\left(t,\tau\right)$ at $i=j$,
are equal to $1$ if the matrix $D\left(t+\tau\right)$
is identical to the matrix $D\left(t\right)$. Clearly, the diagonal
of the {}``overlap matrix'' $O$ can serve as an indicator of the time
stability of the $p$ eigenvectors above the RMT upper bound $\lambda_{+}$.
The grayscale colormap of the {}``overlap matrices'' $O\left(t=60h,\tau=\left\{ 0h,3h,12h,24h,36h,48h\right\} \right)$
is presented in Figure 10. Black represents $O_{ij}=1$ and
white represents $O_{ij}=0$. The most stable eigenvector is the one corresponding to
$\lambda_{492}$.%

743: \begin{figure}[h]

744: \begin{center}\includegraphics{EigenvectorsStability.eps}\end{center}

745:

746:

747: \caption{\label{10} The grayscale of overlap matrix $O\left(t,\tau\right)$

748: at $t=60h$ and $\tau=\left\{ 0h,3h,12h,24h,36h,48h\right\} $. }

749: \end{figure}

At lag $\tau=3$ hours the inter-VLAN interactions show the highest
degree of stability. For longer lags the overall stability decays.
As the analysis of the content of the deviating eigenvectors showed, the highly
interacting traffic time series are those of service-based VLANs
intended for routing. Particular network services are invoked at the
same time and are active for the same period of time, which explains the
stability and subsequent decay of the deviating eigenvectors of traffic
interactions.
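The overlap matrix $O=D_{A}D_{B}^{T}$ can be sketched in a few lines. The sketch assumes that the rows of each matrix are the $p$ deviating eigenvectors of one period, normalized to unit length. The function names are illustrative, and taking the absolute value on the diagonal is an addition of this sketch that guards against the arbitrary sign of numerically computed eigenvectors.

```python
def overlap_matrix(d_a, d_b):
    """O = D_A D_B^T: O[i][j] is the scalar product of row i of d_a with row j of d_b."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in d_b]
            for row_a in d_a]


def diagonal_stability(overlap):
    """Mean absolute diagonal overlap; 1 means the eigenvectors did not change."""
    p = len(overlap)
    return sum(abs(overlap[i][i]) for i in range(p)) / p


if __name__ == "__main__":
    d_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # eigenvectors at time t
    d_later = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # same vectors, order swapped
    print(diagonal_stability(overlap_matrix(d_t, d_t)))      # unchanged system
    print(diagonal_stability(overlap_matrix(d_t, d_later)))  # diagonal vanishes
```

When the eigenvectors of the two periods coincide the diagonal equals one, and when they are rearranged the diagonal weight moves off the diagonal, which corresponds to the fading diagonal in the grayscale maps.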

758:

759:

760: \section{detecting anomalies of traffic interactions}

761:

We assume that the health of inter-VLAN traffic is expressed by the stability
of its interactions in time. Temporary critical events
or anomalies, in turn, cause temporary instabilities. The {}``deviating''
eigenvalues and eigenvectors provide us with time-stable snapshots
of interactions representative of the entire network. Therefore, these
eigenvectors, judged on the basis of their IPR, can serve as monitoring
parameters of the system stability.

769:

Among the essential anomalous events of VLAN infrastructure we can
list violations in VLAN membership assignment, in the address resolution
protocol and in the VLAN trunking protocol, as well as router misconfiguration. A
violation of membership assignment or a router misconfiguration will
change the picture of random and non-random interactions
of inter-VLAN traffic. To shed more light on the possibilities of
anomaly detection, we conducted experiments to establish the spatial-temporal
traces of instabilities caused by an artificial and temporary increase
of the correlation in normal non-congested inter-VLAN traffic. We also
explored the possibility of distinguishing different types of increased
temporary correlations. Finally, we observed the consequences of breaking
the interactions between time series by injecting traffic counts
sampled from a random distribution.

783:

784: \textbf{\emph{Experiment 1}}

785:

We selected the traffic count time series representing the components
of an eigenvector that lies within the RMT bounds and temporarily
increased the correlation between these series for a three-hour period.
The proposed monitoring parameters show the dependence of the system stability
on the number of temporarily correlated time series (see Figure 11).
The top row of Figure 11 shows, left to right, (a) the eigenvalue distribution
of interactions with two temporarily correlated time series, (b) the IPR
of eigenvectors of interactions with two temporarily correlated time
series, and (c) the overlap matrix of deviating eigenvectors with two
temporarily correlated time series. The rows below show
the same monitoring parameters when the correlation is temporarily increased
between 10 connections (d, e and f) and between 20 connections (g, h
and i). %

799: \begin{figure}[h]

800: \begin{center}\includegraphics{Experiment1.eps}\end{center}

801:

802:

803: \caption{\label{11} Eigenvalues distribution, IPR and overlap matrix of deviating

804: eigenvectors. }

805: \end{figure}

One can conclude that an increased temporary correlation between two time
series or between ten time series does not affect the system stability.
Meanwhile, when the number of temporarily correlated time series approaches
the number of significant participants of $u^{497}$, which is calculated
as the inverse of $I^{497}$ and is equal to twenty-two, the system becomes
visibly unstable. At twenty temporarily correlated time series, the largest
eigenvalue changes from $10$ in the stable
condition to $12$, the tail of the inverse participation ratio plot is
extended, and the diagonal of the {}``overlap matrix'' disappears.
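The text does not specify how the temporary correlation was injected, so the following is only one hypothetical way to set up such an experiment on synthetic data: a shared random component is added to a chosen subset of series over a window of sample indices. The function names, the \texttt{strength} parameter and the Gaussian form of the common signal are all assumptions of this sketch.

```python
import math
import random


def induce_correlation(series, members, start, length, strength=3.0, seed=0):
    """Return a copy of `series` in which the rows listed in `members` share a
    common random component on the index window [start, start + length)."""
    rng = random.Random(seed)
    common = [rng.gauss(0.0, 1.0) for _ in range(length)]
    out = [list(row) for row in series]  # leave the input untouched
    for i in members:
        for k in range(length):
            out[i][start + k] += strength * common[k]
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(42)
    base = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(5)]
    mod = induce_correlation(base, members=[0, 1], start=50, length=36)
    print(pearson(base[0][50:86], base[1][50:86]))  # independent series
    print(pearson(mod[0][50:86], mod[1][50:86]))    # after the injection
```

Recomputing $C$, its eigenvalues, the IPR and the overlap matrix on the modified series would then show how the monitoring parameters respond to the number of affected series.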

815:

In Figure 12 (a, b, c and d), the temporary correlation between ten
time series is traced with the matrix of eigenvectors deviating from the RMT,
with the components of each eigenvector sorted in decreasing order.%

819: \begin{figure}[h]

820: \begin{center}\includegraphics{Figure12.eps}\end{center}

821:

822:

823: \caption{\label{12} Sorted deviating eigenvectors with injected correlation

824: among ten traffic time series.}

825: \end{figure}

The deviating eigenvectors, sorted in decreasing order, for $60h$ of
uninterrupted traffic are presented in Figure 12a. After three further
hours of uninterrupted traffic, the weights of the eigenvector components
that had zero value start changing. This is captured in Figure 12b.
The same process for traffic with an induced three-hour correlation is captured
in Figure 12c. The difference between the results in Figures 12b and 12c
is presented in Figure 12d. The procedure used for this visualization produces
a high rate of false positive alarms.

834:

In addition, we visualize in Figure 13 the system instability during a
temporary increase of correlation between twenty time series, using a spatial-temporal
representation of the eigenvector $u^{497}$. %

838: \begin{figure}[h]

839: \begin{center}\includegraphics{Figure5_1.eps}\end{center}

840:

841:

842: \caption{\label{13} (a) The weights of components of $u^{497}$ plotted for

843: time period from $36$ to $84$ hours of uninterrupted traffic with

844: 6 hours interval. (b) The weights of components of $u^{497}$ plotted

845: with respect to the same time period, with induced three hours correlation.

846: (c) The weights of components of $u^{496}$ plotted with respect to

847: the same time period, with induced three hours correlation.}

848: \end{figure}

We used the weights of the components of the eigenvector $u^{497}$, as defined
for the IPR computation, and plotted them with respect to time $t+\tau$,
where $t=36$ hours and $\tau=6n$ hours with $n\in\left\{ 0,1,\dots,7\right\} $.
In Figure 13a the spatial-temporal pattern of $u^{497}$ captures the
precise locations of system-specific interactions of uninterrupted
traffic for $84$ hours of observation. The abrupt change of this
pattern in Figure 13b indicates the starting point of the induced correlation
between twenty traffic time series that usually interact in a random
fashion. It turns out that the {}``normal'' stable pattern of the eigenvector
$u^{497}$ moves to the eigenvector $u^{496}$ when the interruption
ends. Thus, we are able to observe the end point of the induced correlations
in Figure 13c, which represents the weights of the components of the eigenvector
$u^{496}$ plotted with respect to the same time intervals. With this
setup we are able to locate the anomaly in time and space. Translated
to a network topological representation, the behavior of the eigenvectors
$u^{497}$ and $u^{496}$ during our manipulations of the inter-VLAN
traffic may be monitored with the graphs shown in Figure 14.%

866: \begin{figure}[h]

867: \begin{center}\includegraphics{Figure14.eps}\end{center}

868:

869:

870: \caption{\label{14} Left column - behavior of $u^{497}$ during time period

871: from $48$h to $60$h with $6$h time window, induced correlation

872: starts at $54$h and lasts for $3$h. Right column - behavior of $u^{496}$

873: in same conditions.}

874: \end{figure}

875:

876:

877: \textbf{\emph{Experiment 2}}

878:

In the previous experiment we injected just one type of increased
correlation among time series. Now we show that two and three different
types of induced correlations produce different spatial-temporal patterns
in the components of the eigenvector $u^{497}$ (see Figure 15). The time series
for the temporary increase of correlation are selected in the same way
as in Experiment 1. We temporarily increased the correlation between the
series by injecting, for three hours, elements drawn from a sine function and
a quadratic function, respectively.%

887: \begin{figure}[h]

888: \begin{center}\includegraphics{Figure6_1.eps}\end{center}

889:

890:

891: \caption{\label{15} (a) The weights of components of $u^{497}$ plotted for

892: time period from $36$ to $84$ hours with $6$ hours interval, two

893: different types of induced correlations. (b) The weights of components

894: of $u^{497}$ plotted with respect to the same time period, three

895: different types of induced correlations. }

896: \end{figure}

In Figure 15a, one type of three-hour correlation is induced among
ten traffic time series and another type of correlation among another
ten time series. Three different types of three-hour correlations
are induced among twenty traffic time series in Figure 15b. The
content of significant components, sorted in decreasing order, shows that the time
series tend to group according to the type of correlation they are
involved in.

904:

905: \textbf{\emph{Experiment 3}}

906:

Next we turn our attention to the disruption of the normal picture of inter-VLAN
traffic interactions. This can be done by injecting traffic drawn from a
random distribution into non-randomly interacting time series for three
hours. We demonstrate the effect by examining the eigenvalue distribution,
the IPR and the overlap matrix of deviating eigenvectors plotted in Figure
16.%

913: \begin{figure}[h]

914: \begin{center}\includegraphics{Experiment3.eps}\end{center}

915:

916:

917: \caption{\label{16} Eigenvalues distribution, IPR and overlap matrix of deviating

918: eigenvectors of inter-VLAN traffic cross-correlation matrix $C$. }

919: \end{figure}

After $60$ hours of uninterrupted traffic, we injected elements from a
random distribution into the significant participants of $u^{497}$ for
three hours. The largest eigenvalue increases from $10$ to $12$.
The extended IPR tail shows a larger number of \emph{localized} eigenvectors,
and we observe a dramatic break in the stability of the deviating eigenvectors.

925:

926:

927: \section{conclusion and future work}

928:

The RMT methodology used in this paper enables us to analyze
complex system behavior without considering the system constraints,
type and structure. Our goal was to investigate the characteristics
of the day-to-day temporal dynamics of the system of interconnected routers
with VLAN subnets of the University of Louisville. The type and structure
of the system at hand suggest a natural interpretation of the RMT-like
behavior and of the results deviating from the RMT. The time-stable random interactions
signify healthy, congestion-free traffic. The time-stable
non-random interactions provide us with information about large-scale
network-wide traffic interactions. Changes in the stable picture
of random and non-random interactions signify temporary traffic
anomalies.

941:

In general, the fact that the bulk
of the eigenvalue spectrum of inter-VLAN traffic interactions shares universal properties with random
matrices opens a new avenue in network-wide traffic modeling. As stated
in \cite{Guhr1}, in physical systems it is common to start with a
model of the dynamics of the system. In this way, one would model the traffic
time series interactions with a family of stochastic differential
equations \cite{Farmer,Cont}, which describe the {}``instantaneous''
traffic counts \begin{equation}
g_{i}\left(t\right)=\left(d/dt\right)\ln T_{i}\left(t\right),\label{eq22}\end{equation}
as a random walk with couplings. Then one would relate the revealed
interactions to the correlated {}``modes'' of the system.
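For sampled traffic counts the derivative in Eq. \ref{eq22} is not available directly. A finite-difference reading, given here as an assumption consistent with treating $g_{i}$ as the logarithmic rate of change of the traffic count $T_{i}$, is
\[
g_{i}\left(t\right)\approx\frac{\ln T_{i}\left(t+\Delta t\right)-\ln T_{i}\left(t\right)}{\Delta t},
\]
where $\Delta t$ denotes the sampling interval. For small relative changes this reduces to
\[
g_{i}\left(t\right)\approx\frac{T_{i}\left(t+\Delta t\right)-T_{i}\left(t\right)}{T_{i}\left(t\right)\Delta t},
\]
i.e. the relative change of the traffic count per unit time.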

953:

An additional question that the RMT findings raise in network-wide traffic
analysis is whether the observed eigenvalue spectrum correlations and
\emph{localized} eigenvectors outside of the RMT bulk can contribute to the explanation
of fundamental properties of network traffic, such as self-similarity
\cite{Leland}.

959:

To summarize, we have tested the eigenvalue statistics of the inter-VLAN
traffic cross-correlation matrix $C$ against the null hypothesis
of a random correlation matrix. By separating out the eigenvalue spectrum
correlations of random matrices that are present in this system, the
uncongested state of the network traffic is verified. We analyzed
the time-stable system-specific correlations. The analyzed eigenvalues
and eigenvectors deviating from the RMT showed the principal groups
of VLAN-router switches, groups of traffic time series interlinked
through the firewalls and groups of same-service VLANs at every router.
With straightforward experiments on the traffic time series, we demonstrated
that the eigenvalue distribution, the IPR of eigenvectors, the overlap matrix
and the spatial-temporal patterns of deviating eigenvectors can monitor
the stability of inter-VLAN traffic interactions and detect and locate
in time and space network-wide changes in normal traffic time
series interactions.

975:

In future work, we would like to investigate
the behavior of the delayed traffic time series cross-correlation matrix
$C_{d}$ in RMT terms. The importance of delay in measurement-based
analysis of the Internet is emphasized in \cite{Zhang}. To understand
and quantify the effect of one time series on another at a later time,
one can calculate the delay correlation matrix, whose entries
are the cross-correlations of one time series with another at a time delay
$\tau$ \cite{Mayya}. In addition, we are interested in testing the
fruitfulness of the RMT approach on a larger system of inter-domain
interactions, for instance, on the 5-minute averaged traffic count time
series of the underlying backbone circuits of the Abilene backbone network.
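To make the object of this future study concrete, one common form of the delay correlation matrix, written in the notation of Eq. \ref{eq20} and given here as an assumption rather than as a definition fixed by this paper, is
\[
\left[C_{d}\right]_{ij}\left(\tau\right)=\frac{\left\langle G_{i}\left(t\right)G_{j}\left(t+\tau\right)\right\rangle -\left\langle G_{i}\left(t\right)\right\rangle \left\langle G_{j}\left(t+\tau\right)\right\rangle }{\sigma_{i}\sigma_{j}}.
\]
At $\tau=0$ this reduces to the equal-time cross-correlation matrix $C$. For $\tau\neq0$ the matrix is in general not symmetric, $\left[C_{d}\right]_{ij}\left(\tau\right)\neq\left[C_{d}\right]_{ji}\left(\tau\right)$, so its eigenvalues need not be real and the comparison with RMT predictions has to account for this.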

987:

988:

989: \section*{acknowledgment}

990:

991: This research was partially supported by a grant from the US Department

992: of Treasury through a subcontract from the University of Kentucky.

993: The authors thank Igor Rozhkov for consulting on the RMT methodology.

994: We thank Hans Fiedler, University of Louisville network manager, for

995: MRTG data of UofL routers system used in this study and helpful suggestions

996: in network interpretations of our results. We are grateful to Nathan

997: Johnson, University of Louisville super computing administrator, for

998: providing the computing environment and space.

999:

1000: \begin{thebibliography}{10}

1001: \bibitem{Fukuda}K. Fukuda, PhD Thesis: A study on phase transition phenomena in internet

1002: traffic, Keio University, 1999.

1003: \bibitem{Ohira}T. Ohira, R. Sawatari, Phase transition in a computer network traffic

1004: model, Phys. Rev. E \textbf{58}, July 1998, 193-195.

\bibitem{Barthelemy}M. Barthelemy, B. Gondran and E. Guichard, Large scale cross-correlations
in internet traffic, arXiv:cond-mat/0206185 vol \textbf{2} 3 Dec 2002.

1007: \bibitem{LCD}A. Lakhina, M. Crovella, and C. Diot, Detecting distributed attacks

1008: using network-wide flow traffic, Proceedings of FloCon 2005 Analysis

1009: Workshop, 2005.

1010: \bibitem{Crovella}A. Lakhina, M. Crovella, and C. Diot. Mining Anomalies Using Traffic

1011: Feature Distributions. Technical Report BUCS-TR-2005-002, Boston University,

1012: 2005.

\bibitem{Wigner1}E.P. Wigner, On a class of analytic functions from the quantum theory
of collisions, Ann. Math. \textbf{53}, 36 (1951), Proc. Cambridge
Philos. Soc. \textbf{47}, 790 (1951).

1016: \bibitem{Dyson1}F. Dyson, Statistical theory of the energy levels of complex systems,

1017: J. Math. Phys. \textbf{3}, 140 (1962).

1018: \bibitem{Dyson2}F. Dyson and M.L. Mehta, Statistical theory of the energy levels of

1019: complex systems, J. Math. Phys. \textbf{4}, 701, 713 (1963).

\bibitem{Mehta}M.L. Mehta, Random matrices (Academic Press, Boston, 1991).
\bibitem{Brody}T.A. Brody, J. Flores, J.B. French, P.A. Mello, A. Pandey, and S.S.M.
Wong, Random-matrix physics: spectrum and strength fluctuations, Rev.
Mod. Phys. \textbf{53}, 385 - 479, issue \textbf{3}, July 1981.

1024: \bibitem{Guhr3}T. Guhr, A. Muller-Groeling, and H.A. Weidenmuller, Random matrix

1025: theories in quantum physics: common concepts, Phys. Rep. \textbf{299},

1026: 190 (1998).

\bibitem{Seba}M. Krbalek and P. Seba, Statistical properties of the city transport
in Cuernavaca (Mexico) and random matrix theory. J. Phys. \textbf{214}
(2000), 1, 91-100.

1030: \bibitem{McNutt}J. McNutt and M. De Shon, Correlation between quiescent ports in network

1031: flows, CERT network situational awareness group report, Carnegie Mellon

1032: University, September 2005.

1033: \bibitem{Crovella2}A. Lakhina, M. Crovella, and C. Diot, Characterization of network-wide

1034: anomalies in traffic flows, Proceedings of the ACM/SIGCOMM Internet

1035: Measurement conference, 2004, 201-206.

1036: \bibitem{Min}L. Min, Y. Shun-Zheng, A network-wide traffic anomaly detection method

1037: based on HSMM, Int. conf. on communications, circuits and system proceedings,

1038: vol \textbf{6}, June 2006, 1636 - 1640.

1039: \bibitem{Roughan}M. Roughan, T. Griffin, M. Mao, A. Greenberg, and B. Freeman, Combining

1040: routing and traffic data for detection of IP forwarding anomalies,

1041: Proceedings of the joint int. conf. on Measurement and modeling of

1042: computer systems, 2004, 416 - 417.

1043: \bibitem{Huang}L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A. Joseph and N. Taft,

1044: Distributed PCA and network anomaly detection, Technical report No.

1045: UCB/EECS-2006-99.

1046: \bibitem{Sharifi}S. Sharifi, M. Crane, A. Shamaie and H. Ruskin, Random matrix portfolio

1047: optimization: a stability approach, Physica A \textbf{335} (2004)

1048: 629-643.

1049: \bibitem{Guhr1}V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. Nunes Amaral, T. Guhr,

1050: and H.E. Stanley, Random matrix theory approach to cross correlations

1051: in financial data, Phys. Rev. E, vol \textbf{65}, 066126, 27 June

1052: 2002.

1053: \bibitem{Tulino}A. Tulino and S. Verdu, Random matrix theory and wireless communications,

1054: Communications and Information theory, vol \textbf{1}, issue \textbf{1},

1055: June 2004, 1 - 182.

1056: \bibitem{Tse}D. Tse, Multiuser receivers, random matrices and free probability,

1057: Proceedings of 37th Ann. Allerton Conf., Monticello, IL, September

1058: 1999.

1059: \bibitem{Zee}A. Zee, Random matrix theory and RNA folding, Acta Physica Polonica

1060: B, vol \textbf{36}, No \textbf{9}, June 2005.

1061: \bibitem{Laloux}L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Noise dressing

1062: of financial correlation matrices, Phys. Rev. Lett. \textbf{83}, August

1063: 1999, 1467-1470.

1064: \bibitem{Sengupta}A.M. Sengupta and P.P. Mitra, Distributions of singular values for

1065: some random matrices, arXiv:cond-mat/9709283 vol \textbf{1} 25 September

1066: 1997.

1067: \bibitem{Stockman}H.-J. St\"ockmann, Quantum Chaos: An Introduction, Cambridge University Press, 1999.

1068: \bibitem{Bouchaud}J.-P. Bouchaud and M. Potters, Theory of financial risk and derivative pricing: from

1069: statistical physics to risk management, Cambridge University Press, 2003.

1070: \bibitem{Bruus}H. Bruus and J.-C. Angles d'Auriac, Energy level statistics of two-dimensional

1071: Hubbard model at low filling, arXiv:cond-mat/9610142 vol \textbf{1}

1072: 18 October 1996.

1073: \bibitem{Farmer}J.D. Farmer, Market force, ecology and evolution, e-print adap-org/9812005,

1074: Int. J. Theo. Appl. Fin. \textbf{3}, 425, 2000.

1075: \bibitem{Cont}J.-P. Bouchaud, R. Cont, A Langevin approach to stock market fluctuations

1076: and crashes, European Physical Journal B \textbf{6}, 543, 1998.

1077: \bibitem{Leland}W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson, On the self-similar

1078: nature of Ethernet traffic, ACM SIGCOMM, 1993, 183 - 193.

1079: \bibitem{Zhang}B. Zhang, T.S. Eugene Ng, and A. Nandi, Measurement-based analysis,

1080: modeling, and synthesis of the Internet delay space, Proceedings of

1081: the 6th ACM SIGCOMM Conference on Internet Measurement, 2006, 85-98.

1082: \bibitem{Mayya}K.B.K. Mayya and R.E. Amritkar, Analysis of delay correlation matrices,

1083: arXiv:cond-mat/0601279, 20 December 2006.

1084: \bibitem{Lau}W.-C. Lau, S.-Q. Li, Traffic analysis in large-scale high-speed integrated

1085: networks: validation of nodal decomposition approach, INFOCOM, 1993,

1086: Proceedings of twelfth annual joint conference of the IEEE Computer

1087: and Communications Societies, vol \textbf{3}, 1320-1329.

1088: \bibitem{Allen}W.H. Allen, G.A. Marin, L.A. Rivera, Automated detection of malicious

1089: reconnaissance to enhance network security, SoutheastCon, 2005, Proceedings

1090: of IEEE, issue 8-10, April 2005, 450-454.

1091: \bibitem{Bruus2}H. Bruus and J.-C. Angles d'Auriac, The spectrum of two-dimensional

1092: Hubbard model at low filling, Europhysics letters, \textbf{35} (5),

1093: 321-326, 1996.

1094: \end{thebibliography}

1095: \appendix

1096:

1097: \section{RMT}

1098:

1099: In this Appendix, we provide a short (and non-rigorous) explanation

1100: of the main concepts and a glossary of terms used in RMT studies. The

1101: RMT approaches, which originated in nuclear and condensed matter physics

1102: and later became common in many branches of mathematical physics \cite{Stockman},

1103: have recently penetrated into econophysics, finance \cite{Bouchaud}

1104: and network traffic analysis \cite{Barthelemy}.

1105:

1106: For the statistical description of complex physical systems, such

1107: as an atomic nucleus or a reverberant acoustic structure,

1108: RMT serves as a guide when one is interested in the degree

1109: of mutual interaction of the constituents. As it turns out, uncorrelated

1110: energy levels or acoustic eigenfrequencies produce qualitatively

1111: different results from those obeying RMT-like correlations \cite{Stockman}.

1112: Therefore, real (experimentally measured) spectra can help to decide

1113: on the nature of interactions in the underlying system. Specifically,

1114: an ideally symmetric system is expected to exhibit spectral properties

1115: drastically different from those of a generic one, and if the

1116: spectral properties are those of RMT systems, other ideas of RMT can

1117: be brought to the researcher's aid.

1118:

1119: To describe the {}``awareness'' that the structural constituents have of

1120: each other, scientists in different fields use similar constructs.

1121: Physicists use the Hamiltonian matrix, engineers the stiffness matrix, and finance

1122: and network analysts the equal-time cross-correlation matrix. Although

1123: the physical meaning of these operators can differ,

1124: eigenvalue/eigenvector analysis seems to be a universally accepted

1125: tool. The eigenvalues are directly connected to the spectrum of a physical

1126: system, while the eigenvectors can be used to describe excitation/signal/information

1127: propagation inside the system. In physics, RMT approaches come

1128: about whenever the system of interest demonstrates certain qualitative

1129: features in its spectral behavior. For example, if one looks at

1130: the nearest-neighbor spacing distribution of eigenvalues and, instead of

1131: the Poisson law\[

1132: P\left(s\right)=\exp\left(-s\right),\]

1133:  discovers the {}``Wigner surmise''\[

1134: P\left(s\right)=\frac{\pi}{2}s\exp\left(-\frac{\pi}{4}s^{2}\right),\]

1135: one concludes (upon running several additional statistical tests)

1136: that the apparatus of RMT can be used for the system at hand, and that the system

1137: matrix can be replaced by a matrix with random entries. For mathematical

1138: convenience, these entries are given Gaussian weight. The only other

1139: ingredient of this rather succinct phenomenological model is recognizing

1140: the physical situation. For example, systems with and without a magnetic

1141: field and/or central symmetry are described by different matrix ensembles

1142: (that is, sets of matrices), each characterized by its own value of the index

1143: $\beta$ and with elements distributed according to\[

1144: P^{\left(\beta\right)}\left(H\right)\propto\exp\left(-\frac{\beta}{4v^{2}}\mathrm{tr}\, H^{2}\right),\]

1145: where the constant $v$ sets the scale (width) of the resulting eigenvalue

1146: spectrum.
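
As an illustrative aside (a sketch under stated assumptions, not part of the original derivation), the Gaussian weight can be made explicit in the simplest case. Assume a real symmetric $N\times N$ matrix $H$ and take $\beta=1$; both choices are assumptions made here for concreteness. The trace then splits into diagonal and off-diagonal parts,
\[
\mathrm{tr}\, H^{2}=\sum_{i}H_{ii}^{2}+2\sum_{i<j}H_{ij}^{2},
\]
so the weight $\exp\left(-\mathrm{tr}\, H^{2}/4v^{2}\right)$ factorizes into independent zero-mean Gaussians with variances
\[
\left\langle H_{ii}^{2}\right\rangle =2v^{2},\qquad\left\langle H_{ij}^{2}\right\rangle =v^{2}\quad\left(i<j\right).
\]
A similar check applies to the two spacing laws, assuming the spacing $s$ is measured in units of the mean level spacing. The Poisson law $e^{-s}$ and the Wigner surmise with exponent $-\pi s^{2}/4$ are both normalized and have unit mean:
\[
\int_{0}^{\infty}e^{-s}\, ds=\int_{0}^{\infty}s\, e^{-s}\, ds=1,\qquad\int_{0}^{\infty}\frac{\pi}{2}s\, e^{-\pi s^{2}/4}\, ds=\int_{0}^{\infty}\frac{\pi}{2}s^{2}e^{-\pi s^{2}/4}\, ds=1.
\]
The two laws differ qualitatively at small spacings: the Poisson law gives $P\left(0\right)=1$, so levels may cluster, whereas the Wigner surmise gives $P\left(s\right)\approx\pi s/2\rightarrow0$ as $s\rightarrow0$, the familiar level repulsion.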

1147:

1148: The very fact that RMT can be helpful in the statistical description of

1149: a broad range of systems suggests that these systems are analyzed

1150: in a certain special \emph{universal} regime, in which physical or

1151: other system-specific laws are masked by equilibrated and ergodic evolution. In

1152: most physical applications, the Hamiltonian matrix is rather sparse,

1153: indicating a lack of interaction between different subparts of the corresponding

1154: object. However, if the universal regime is inferred from the

1155: statistical tests mentioned above, it is very beneficial to replace this

1156: single matrix with an ensemble of random matrices. One can then

1157: proceed with the statistical analysis, using the matrix ensemble to calculate

1158: statistical averages more relevant to the physical problem at

1159: hand than the statistics of eigenvalues. Such averages can be the mean or

1160: variance of the response to external or internal excitation.

1161: \end{document}

1162: