1: \documentclass[12pt]{article}
2: \usepackage{times}
3: \usepackage{a4}
4: \usepackage{latexsym}
5: \usepackage{amsmath}
6: \usepackage{amsfonts}
7: \usepackage{graphicx}
8: \usepackage{fancyhdr,lastpage}
9: \newcommand{\rank}{\mathrm{r}}
10: \newcommand{\sgn}{\mathrm{sgn}}
11: \newcommand{\tr}{\mathrm{tr}}
12: \newcommand{\VaR}{\mathrm{VaR}}
13: \newcommand{\R}{\mathbb{R}}
14: \newcommand{\C}{\mathbb{C}}
15: \newcommand{\N}{\mathbb{N}}
16: \newtheorem{theorem}{Theorem}
17: \newtheorem{corollary}[theorem]{Corollary}
18: \newtheorem{definition}{Definition}
19: \newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}
20: \setlength{\parskip}{1ex plus0.5ex minus0.2ex} \setlength{\parindent}{0mm} \setlength{\jot}{1ex}
21:
22: \begin{document}
23:
24: \title{Random Matrix Theory and \\
25: Robust Covariance Matrix Estimation \\
26: for Financial Data}
27: \author{Gabriel Frahm\thanks{Email: {\tt frahm@ccrl-nece.de}.}\hspace{.2cm} \& Uwe Jaekel\thanks{Email: {\tt jaekel@ccrl-nece.de}.}\\[.5cm]
28: C\&C Research Laboratories, NEC Europe Ltd.\\
29: Rathausallee 10, 53757 Sankt Augustin, Germany}
30: \maketitle
31:
32: \vspace{-1cm}
33: \begin{abstract}
34: The traditional class of elliptical distributions is extended to allow for asymmetries. A completely robust dispersion matrix estimator (the `spectral estimator') for the new class of `generalized elliptical distributions' is presented. It is shown that the spectral estimator corresponds to an M-estimator proposed by Tyler (1983) in the context of elliptical distributions. Both the generalization of elliptical distributions and the development of a robust dispersion matrix estimator are motivated by the stylized facts of empirical finance. Random matrix theory is used for analyzing the linear dependence structure of high-dimensional data. It is shown that the Mar\v{c}enko-Pastur law fails if the sample covariance matrix is considered as a random matrix in the context of elliptically distributed and heavy tailed data. But substituting the sample covariance matrix by the spectral estimator resolves the problem and the Mar\v{c}enko-Pastur law remains valid.
35: \end{abstract}
36:
37: \section{Motivation}
38:
Short-term financial data usually share a set of properties known as `stylized facts', e.g., leptokurtosis,
40: dependence of simultaneous extremes, radial asymmetry, vola\-tility clustering, etc., especially if the
41: log-price changes (called the `log-returns') of stocks, stock indices, and foreign exchange rates are
42: considered. Particularly, high-frequency data usually are non-stationary, have jumps, and are strongly
43: dependent. Cf., e.g., Bouchaud, Cont, and Potters, 1998, Breymann, Dias, and Embrechts, 2003, Eberlein and
44: Keller, 1995, Embrechts, Frey, and McNeil, 2004 (Section 4.1.1), Engle, 1982, Fama, 1965, Junker and May, 2002, Mandelbrot, 1963, and Mikosch, 2003 (Chapter 1).
45:
46: Figure 1 contains QQ-plots of $\text{GARCH}(1,1)$ residuals of daily log-returns of the NASDAQ and the
47: S\&P 500 indices from 1993-01-01 to 2000-06-30. It is clearly indicated that the normal distribution hypothesis is not appropriate for the loss parts of the distributions whereas the Gaussian law seems to be acceptable for the profit parts. Hence the probability of extreme losses is higher than suggested by the normal distribution assumption.
48:
49: \begin{center}
50: %\includegraphics[scale=.35]{NASDAQ_QQ-Plot.eps}\quad
51: %\includegraphics[scale=.35]{SP500_QQ-Plot.eps}\\[.25cm]
52: %% \includegraphics[scale=.34]{NASDAQ_QQ-Plot.png}
53: %% \includegraphics[scale=.34]{SP500_QQ-Plot.png}\\[.25cm]
54: \includegraphics[scale=.34]{NASDAQ_QQ-Plot}
55: \includegraphics[scale=.34]{SP500_QQ-Plot}\\[.25cm]
56: \end{center}
57: {\bf Fig. 1:} QQ-plots of NASDAQ (left hand) and S\&P 500 (right hand) $\text{GARCH}(1,1)$ residuals from
58: 1993-01-01 to 2000-06-30 ($n=1892$).\\[.25cm]
59:
60: The next picture shows the joint distribution of the GARCH residuals considered above.
61:
62: \begin{center}
63: %\includegraphics[scale=.35]{NASDAQ_vs._SP500_-_emp_ohne_Konturen.eps}\\[.25cm]
64: %% \includegraphics[scale=.35]{NASDAQ_vs_SP500_-_emp_ohne_Konturen.png}\\[.25cm]
65: \includegraphics[scale=.35]{NASDAQ_vs_SP500_-_emp_ohne_Konturen}\\[.25cm]
66: \end{center}
67: {\bf Fig. 2:} NASDAQ vs. S\&P 500 $\text{GARCH}(1,1)$ residuals from 1993-01-01 to 2000-06-30 ($n=1892$).\\[.25cm]
68:
69: Except for one element all extremes occur simultaneously. The effect of simultaneous extremes can be observed
70: more precisely in the following picture. It shows the total numbers of S\&P 500 stocks whose absolute values of
daily log-returns exceeded $10\%$ for each trading day during 1980-01-02 to 2003-11-26. On 19 October
1987 (`Black Monday') 239 extremes occurred. This value is omitted from the figure for the sake of readability.
73:
74: \begin{center}
75: %\includegraphics[scale=.35]{outlier_profile.eps}\\[.25cm]
76: %% \includegraphics[scale=.35]{outlier_profile.png}\\[.25cm]
77: \includegraphics[scale=.35]{outlier_profile}\\[.25cm]
78: {\bf Fig. 3:} Number of extremes in the S\&P 500 during 1980-01-02 to 2003-11-26.\\[.25cm]
79: \end{center}
80:
The latter figure shows the concomitance of extremes. If extremes occurred independently, then the number of
extremal events (no matter whether losses or profits) should be small and nearly constant over time. Obviously,
this is not the case. Instead, one can see the October Crash of 1987 and a persistent occurrence of extremes
since the beginning of the bear market in 2000. Hence there is an increasing tendency of
simultaneous losses, which is probably due to globalization effects and relaxed market regulation. The phenomenon of simultaneous extremes is often referred to as `asymptotic dependence' or `tail dependence'.
86:
87: The traditional class of elliptically symmetric distributions (Cambanis, Huang, and Simons, 1981, Fang, Kotz,
88: and Ng, 1990, and Kelker, 1970) is often proposed for the modeling of financial data (cf., e.g., Bingham and
89: Kiesel, 2002). But elliptical distributions suffer from the pro\-perty of radial symmetry. The pictures above
show that financial data are not always symmetrically distributed. For this reason the authors rely on the
assumption of gene\-ralized elliptically distributed (Frahm, 2004) log-returns. This allows for the modeling of tail dependence and radial asymmetry.
92:
93: The quintessence of modern portfolio theory is that the portfolio diversification effect depends essentially on
94: the covariances. But the parameters for portfolio optimization, i.e. the mean vector and the covariance matrix,
95: have to be estimated. Especially for portfolio risk minimization a reliable estimate of the covariance matrix
96: is necessary (Chopra and Ziemba, 1993). For covariance matrix estimation generally one should use as much
97: available data as possible. But since daily log-returns and all the more high-frequency data are not normally
98: distributed, standard estimators like the sample covariance matrix may be highly inefficient leading to
99: erroneous implications (see, e.g., Oja, 2003 and Visuri, 2001). This is because the sample covariance matrix is
100: very sensitive to outliers. The smaller the distribution's tail index (Hult and Lindskog, 2002), i.e. the
101: heavier the tails of the log-return distributions the higher the estimator's variance. So the quality of the
102: parameter estimates depends essentially on the true multivariate distribution of log-returns.
103:
104: In the following it is shown how the linear dependence structure of generalized elliptical random vectors can be
105: estimated robustly. More precisely, it is shown that Tyler's (1987) robust M-estimator for the dispersion matrix
106: $\Sigma$ of elliptically distributed random vectors remains completely robust for generalized elliptically
distributed random vectors. This estimator is disturbed neither by asymmetries nor by outliers, and all the
108: available data points can be used for estimation purposes. Further, the impact of high-dimensional (financial)
109: data on statistical inference will be discussed. This is done by referring to a branch of statistical physics
110: called `Random Matrix Theory' (Hiai and Petz, 2000 and Mehta, 1990). Random matrix theory (RMT) is concerned
with the distribution of eigenvalues of high-dimensional randomly generated matrices. If the components of each sample element are independent and identically distributed, then the distribution of the eigenvalues of the sample covariance matrix converges to a specific law which does not depend on the particular distribution of the sample components. The circumstances under which this result of RMT can be properly applied to generalized elliptically distributed data will be examined.
112:
113: \section{Generalized Elliptical Distributions}
114:
115: It is well known that an elliptically distributed random vector $X$ can be represented stochastically by
116: $X\! =_{\mathrm{d}}\! \mu +\mathcal{R}\Lambda U^{\left( k\right)}$, where $\mu\in\R^{d}$, $\Lambda\in\R^{d\times k}$
117: with $\rank(\Lambda)=k$, $U^{\left( k\right) }$ is a $k$-dimensional random vector uniformly distributed on the
118: unit hypersphere $\mathcal{S}^{k-1}$, and $\mathcal{R}$ is a nonnegative random variable stochastically
119: independent of $U^{\left( k\right) }$. The positive semi-definite matrix $\Sigma := \Lambda\Lambda^{\mathrm{T}}$ characterizes the linear dependence structure of $X$ and is referred to as the `dispersion matrix'.
120:
121: \begin{definition}[Generalized elliptical distribution]
122: The $d$-dimensional random vector $X$ is said to be `generalized elliptically distributed' if and only if
123:
124: \begin{equation*}
X\overset{\mathrm{d}}{=}\mu +\mathcal{R}\Lambda U^{\left( k\right) },
126: \end{equation*}
127:
128: where $U^{\left( k\right) }$ is a $k$-dimensional random vector uniformly distributed on $\mathcal{S}^{k-1}$,
129: $\mathcal{R}$ is a random variable, $\mu \in \R^{d}$, and $\Lambda \in \R^{d\times k}$.
130:
131: \end{definition}
132:
133: Note that the definition of generalized elliptical distributions preserves all the ordinary components of
elliptically symmetric distributions (i.e. $\mu$, $\Sigma$, and $\mathcal{R}$). In contrast, however, the generating
variate $\mathcal{R}$ may be negative and, moreover, it may depend on $U^{\left( k\right) }$. It is worth pointing out that the class of generalized elliptical distributions contains the class of skew-elliptical distributions (Branco and Dey, 2001, and Frahm, 2004, Section 3.2).
136:
137: The next figure shows once again the joint distribution of the GARCH residuals of the NASDAQ and
138: S\&P 500 log-returns from 1993-01-01 to 2000-06-30 from Figure 2. The right hand of Figure 4 contains
139: simulated GARCH residuals on the basis of a generalized $t$-distribution. More precisely, the generating variate
140: $\mathcal{R}$ corres\-ponds to $\sqrt{\nu \cdot \chi _{2}^{2}/\chi _{\nu }^{2}}\,$ but the number of degrees of
freedom $\nu$ depends on $U^{(2)}$, i.e. $\nu = 4 + 996\cdot\left(\delta\left(\Lambda u / \|\Lambda
u\|_{2},v\right)\right)^{3}$ $(\|u\|_{2}=1)$. Here $\delta$ is a function that measures the distance between $\Lambda
143: u / \|\Lambda u\|_{2}$ and the reference vector $v=\left(-\cos \left( \pi /4\right) ,-\sin \left( \pi /4\right)
144: \right)$, $\delta(u,v) := \angle(u,v)/\pi = \arccos(u^{\mathrm{T}} v)/\pi$. Hence, random vectors which are close to the
145: reference vector (i.e. close to the `perfect loss scenario') are supposed to be $t$-distributed with $\nu=4$
146: degrees of freedom whereas random vectors which are opposite are assumed to be nearly Gaussian ($\nu=1000$)
147: distributed. This is consistent with the phenomenon observed in Figure 1. The pseudo-correlation coefficient is
148: set to $0.78$.
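
To illustrate the range of this direction-dependent parameter, the formula for $\nu$ can be evaluated at three distances. The intermediate value $\delta=1/2$ (a direction orthogonal to the reference vector) is chosen here purely for illustration:

\begin{equation*}
\nu(\delta)=4+996\cdot\delta^{3}:\qquad \nu(0)=4,\qquad \nu\!\left(\tfrac{1}{2}\right)=4+\tfrac{996}{8}=128.5,\qquad \nu(1)=1000.
\end{equation*}

Because of the cubic exponent the number of degrees of freedom rises slowly near $\delta=0$, so that a neighborhood of the reference vector $v$ retains pronounced heavy tails, while directions orthogonal to $v$ already have $\nu=128.5$.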
149:
150: \begin{center}
151: %\includegraphics[scale=.35]{emp.eps}\qquad
152: %\includegraphics[scale=.35]{sim.eps}\\[.25cm]
153: %% \includegraphics[scale=.34]{emp.png}
154: %% \includegraphics[scale=.34]{sim.png}\\[.25cm]
155: \includegraphics[scale=.34]{emp}
156: \includegraphics[scale=.34]{sim}\\[.25cm]
157: \end{center}
158: {\bf Fig. 4:} Observed $\text{GARCH}(1,1)$ residuals of NASDAQ and S\&P 500 (left hand) and simulated
159: generalized $t$-distributed random noise ($n=1892$) (right hand).\\[.25cm]
160:
161: \section{Robust Covariance Matrix Estimation}
162:
163: It is well-known that the sample covariance matrix corresponds both to the moment estimator and to the
164: ML-estimator for the dispersion matrix $\Sigma$ of normally distributed data. But given any other elliptical
165: distribution family the dispersion matrix usually does not correspond to the covariance matrix. Generally,
166: robust covariance matrix estimation means to estimate the dispersion matrix, that is the covariance matrix up
167: to a scaling constant. There are many applications like, e.g., principal components analysis, canonical
168: correlation analysis, linear discriminant ana\-lysis, and multivariate regression where only the dispersion
169: matrix is demanded (Oja, 2003). Particularly, by Tobin's two-fund separation theorem (Tobin, 1958) the
170: optimal portfolio of risky assets does not depend on the scale of the covariance matrix. Thus in the following
171: we will loosely speak of `covariance matrix estimation' rather than of estimating the dispersion matrix for the
172: sake of simplicity.
173:
As mentioned before, the true linear dependence structure of elliptically distributed data generally cannot be estimated efficiently by the sample covariance matrix. In particular, if the data stem from a regularly varying random
vector, then the smaller the tail index, i.e. the heavier the tails, the larger the estimator's variance. But in the following it is shown that there exists a completely robust alternative to the sample covariance matrix.
176:
177: Let $X$ be a $d$-dimensional generalized elliptically distributed random vector where $\mu$ is supposed to be
178: known, $\Lambda \in \R^{d\times k}$ with $\rank(\Lambda)=d$, and $P(\mathcal{R}=0)=0$. Further, let the unit
179: random vector generated by $\Lambda$ be defined as
180:
181: \begin{equation*}
182: S := \frac{\Lambda U^{\left( k\right) }}{ {\big |\!|}\Lambda U^{\left( k\right) }{\big |\!|}_{2}}.
183: \end{equation*}
184:
185: Due to the stochastic representation of $X$ the following relations hold,
186:
187: \begin{equation*}
188: \frac{X-\mu}{{\big |\!|}X-\mu{\big |\!|}_{2}}\overset{\mathrm{d}}{=}%
189: \frac{\mathcal{R}\Lambda U^{\left( k\right) }}{
190: {\big |\!|}\mathcal{R}\Lambda U^{\left( k\right) }{\big |\!|}_{2}}\overset{\mathrm{a.s.}}{=}%
191: \pm\frac{\Lambda U^{\left( k\right) }}{ {\big |\!|}\Lambda U^{\left( k\right) }{\big |\!|}_{2}}=\pm S,
192: \end{equation*}
193:
194: where $\pm :=\sgn(\mathcal{R})$. The random vector $\pm S$ does not depend on the absolute value of
195: $\mathcal{R}$. So it is completely robust against extreme outcomes of the generating variate. But the sign of
$\mathcal{R}$ still remains, and it may depend on $U^{\left( k\right) }$. Suppose for the moment that
197: $\pm$ is known for each realization of $\mathcal{R}$. Then the dispersion matrix of $X$ can be estimated
198: robustly via maximum-likelihood estimation using the density function of $S$ which is only a function of
199: $\Lambda$. This is given by the next theorem.
200:
201: \begin{theorem}
202: The spectral density function of the unit random vector generated by $\Lambda \in \R^{d\times k}$ corresponds
203: to
204:
205: \begin{equation*}\label{spectral_density}
206: s\longmapsto \psi \left( s\right) =\frac{\Gamma \left( \frac{d}{2}\right) }{2\pi ^{d/2}}\cdot \sqrt{\det
207: (\Sigma ^{-1})}\cdot \sqrt{s^{\mathrm{T}}\Sigma ^{-1}s}^{\,-d},\qquad \forall \ s\in \mathcal{S}^{d-1},
208: \end{equation*}
209:
210: where $\Sigma :=\Lambda \Lambda ^{\mathrm{T}}$.
211: \end{theorem}
212:
213: \begin{proof}
214: See, e.g., Frahm, 2004, pp. 59-60.\hfill \medskip
215: \end{proof}
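
As a plausibility check of the theorem (a remark added here, not part of the cited proof), consider $\Sigma=I_{d}$. Then $\det(\Sigma^{-1})=1$ and $s^{\mathrm{T}}\Sigma^{-1}s=\|s\|_{2}^{2}=1$ for all $s\in\mathcal{S}^{d-1}$, so that

\begin{equation*}
\psi \left( s\right) =\frac{\Gamma \left( \frac{d}{2}\right) }{2\pi ^{d/2}},
\end{equation*}

which is the reciprocal of the surface area $2\pi^{d/2}/\Gamma(d/2)$ of the unit hypersphere, i.e. the density of the uniform distribution on $\mathcal{S}^{d-1}$. For $d=2$ this gives $\psi(s)=1/(2\pi)$.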
216:
217: Since $\psi$ is a symmetric density function the sign of $\mathcal{R}$ does not matter at all. Hence the
218: ML-estimation approach works even if the data are skew-elliptically distributed, for instance.
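
How the density $\psi$ leads to an estimating equation can be sketched as follows. This is only a heuristic outline, under the assumption that the stationarity condition is obtained by differentiating with respect to $\Sigma^{-1}$ while ignoring the symmetry constraint. For realizations $s_{1},\ldots,s_{n}$ of $S$ the log-likelihood is

\begin{equation*}
\log L(\Sigma)=n\cdot\log\frac{\Gamma \left( \frac{d}{2}\right) }{2\pi ^{d/2}}+\frac{n}{2}\cdot\log\det(\Sigma^{-1})-\frac{d}{2}\cdot\sum_{j=1}^{n}\log\left(s_{j}^{\mathrm{T}}\Sigma^{-1}s_{j}\right).
\end{equation*}

Setting the derivative with respect to $\Sigma^{-1}$ to zero gives

\begin{equation*}
\frac{n}{2}\cdot\Sigma-\frac{d}{2}\cdot\sum_{j=1}^{n}\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\Sigma^{-1}s_{j}}=0
\quad\Longleftrightarrow\quad
\Sigma=\frac{d}{n}\cdot\sum_{j=1}^{n}\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\Sigma^{-1}s_{j}}.
\end{equation*}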
219:
220: The desired `spectral estimator' is given by the fixed-point equation (Frahm, 2004, Section 4.2.2)
221:
222: \begin{equation*}
223: \widehat{\Sigma}_{\mathrm{S}}=\frac{d}{n}\cdot \sum_{j=1}^{n}\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\widehat{\Sigma}_{\mathrm{S}}^{-1}s_{j}},
224: \end{equation*}
225:
226: where $s_{j}:=\left(x_{j}-\mu\right)/\left({\big |\!|}x_{j}-\mu {\big |\!|}_{2}\right)$
for $j=1,\ldots,n$. Since the solution of the fixed-point equation is unique only up to a scaling constant, in the
following it is implicitly required that the upper left element of $\widehat{\Sigma}_{\mathrm{S}}$ equals
$1$.
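
Two aspects of the fixed-point equation can be made explicit. First, the scale indeterminacy: if $\widehat{\Sigma}_{\mathrm{S}}$ solves the equation then so does $c\,\widehat{\Sigma}_{\mathrm{S}}$ for every $c>0$, because

\begin{equation*}
\frac{d}{n}\cdot\sum_{j=1}^{n}\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\big(c\,\widehat{\Sigma}_{\mathrm{S}}\big)^{-1}s_{j}}
=c\cdot\frac{d}{n}\cdot\sum_{j=1}^{n}\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\widehat{\Sigma}_{\mathrm{S}}^{-1}s_{j}}
=c\,\widehat{\Sigma}_{\mathrm{S}}.
\end{equation*}

Second, one possible way to compute a solution numerically is a simple fixed-point iteration. The following scheme is only a sketch. The starting value $I_{d}$, the normalization in each step, and the symbols $\widetilde{\Sigma}^{(i)}$ and $\widehat{\Sigma}^{(i)}$ are assumptions made here for illustration:

\begin{gather*}
\widehat{\Sigma}^{(0)}:=I_{d},\qquad
\widetilde{\Sigma}^{(i+1)}:=\frac{d}{n}\cdot\sum_{j=1}^{n}\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\big(\widehat{\Sigma}^{(i)}\big)^{-1}s_{j}},\\
\widehat{\Sigma}^{(i+1)}:=\widetilde{\Sigma}^{(i+1)}\big/\,\widetilde{\sigma}_{11}^{(i+1)},\qquad i=0,1,2,\ldots,
\end{gather*}

where $\widetilde{\sigma}_{11}^{(i+1)}$ denotes the upper left element of $\widetilde{\Sigma}^{(i+1)}$. The iteration would be stopped once successive iterates differ by less than a chosen tolerance.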
230:
231: The spectral estimator $\widehat{\Sigma}_{\mathrm{S}}$ cor\-responds to Tyler's robust M-estimator (Tyler,
232: 1983 and Tyler, 1987) for elliptical distributions, i.e.
233:
234: \begin{equation*}
235: \widehat{\Sigma}_{\mathrm{S}}=\frac{d}{n}\cdot \sum_{j=1}^{n}\frac{\left( x_{j}-\mu \right) \left(
236: x_{j}-\mu \right) ^{\mathrm{T}}}{\left( x_{j}-\mu \right) ^{\mathrm{T}}\widehat{\Sigma}_{\mathrm{S}}^{-1}\left( x_{j}-\mu \right) }.
237: \end{equation*}
238:
239: Hence Tyler's M-estimator remains completely robust within the class of generalized elliptical distributions.
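
The equivalence of the two representations follows because the norms cancel. With $s_{j}=(x_{j}-\mu)/\|x_{j}-\mu\|_{2}$ each summand satisfies

\begin{equation*}
\frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\widehat{\Sigma}_{\mathrm{S}}^{-1}s_{j}}
=\frac{\left( x_{j}-\mu \right)\left( x_{j}-\mu \right)^{\mathrm{T}}/\|x_{j}-\mu\|_{2}^{2}}{\left( x_{j}-\mu \right)^{\mathrm{T}}\widehat{\Sigma}_{\mathrm{S}}^{-1}\left( x_{j}-\mu \right)/\|x_{j}-\mu\|_{2}^{2}}
=\frac{\left( x_{j}-\mu \right)\left( x_{j}-\mu \right)^{\mathrm{T}}}{\left( x_{j}-\mu \right)^{\mathrm{T}}\widehat{\Sigma}_{\mathrm{S}}^{-1}\left( x_{j}-\mu \right)}.
\end{equation*}

In particular, rescaling $x_{j}-\mu$ by any nonzero factor, positive or negative, leaves its summand unchanged. This is why neither the magnitude nor the sign of the generating variate affects the estimate.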
240:
241: The following figure shows the sample covariance matrix (left hand) of a sample with $n=1000$ observations and
242: $d=500$ dimensions drawn from a multivariate $t$-distribution with $\nu=4$ degrees of freedom. Note
243: that the tail index of the multivariate $t$-distribution corresponds to $\nu$. Each cell of the plots represents
244: a matrix element where the blue colored cells symbolize small numbers and the red colored cells indicate large
numbers. The true dispersion matrix is shown in the middle, whereas the spectral estimate is shown on the
right.
247:
248: \begin{center}
249: \includegraphics[height=4.5cm,width=4.5cm]{nu4momest.eps}\quad
250: \includegraphics[height=4.5cm,width=4.5cm]{true.eps}\quad
251: \includegraphics[height=4.5cm,width=4.5cm]{nu4specest.eps}\\[.25cm]
252: %% \includegraphics[height=4.7cm,width=5.3cm]{nu4momest.png}\hskip-2em
253: %% \includegraphics[height=4.7cm,width=5.3cm]{true.png}\hskip-2em
254: %% \includegraphics[height=4.7cm,width=5.3cm]{nu4specest.png}\\[.25cm]
255: %\includegraphics[height=4.7cm,width=5.3cm]{nu4momest}\hskip-2em
256: %\includegraphics[height=4.7cm,width=5.3cm]{true}\hskip-2em
257: %\includegraphics[height=4.7cm,width=5.3cm]{nu4specest}\\[.25cm]
258: \end{center}
259: {\bf Fig. 5:} Sample covariance matrix (left hand), true covariance matrix (middle), and spectral estimate
260: (right hand) of multivariate $t$-distributed realizations ($n=1000,\,d=500,\,\nu=4$).\\[.25cm]
261:
262: \section{Random Matrix Theory}\label{RMT}
263:
264: RMT is concerned with the distribution of the eigenvalues of high-dimensional randomly gene\-rated matrices. A
265: random matrix is simply a matrix of random variables. We will consider only symmetric random matrices. Thus the
266: corresponding eigenvalues are always real. The empirical distribution function of eigenvalues is defined as
267: follows.
268:
269: \begin{definition}[Empirical distribution function of eigenvalues]
270: Let $\widehat{\Sigma}$ be a $d\times d$ symmetric random matrix with eigenvalues
271: $\widehat{\lambda}_{1},\widehat{\lambda}_{2},\ldots ,\widehat{\lambda}_{d}\,$. Then the function
272: \begin{equation*}
273: \lambda \longmapsto \widehat{W}_{d}\left( \lambda \right) :=\frac{1}{d}\cdot
274: \sum_{i=1}^{d}1\!\!1_{\widehat{\lambda}_{i}\leq \,\lambda }
275: \end{equation*}
276: is called the `empirical distribution function of the eigenvalues' of $\,\widehat{\Sigma}$.
277: \end{definition}
278:
279: Note that each eigenvalue of a random matrix in fact is random but per se not a random variable since there is
280: no single-valued mapping $\widehat{\Sigma}\mapsto\widehat{\lambda}_{i}$ $\left( i\in \left\{ 1,\ldots ,d\right\}
281: \right)$ but rather $\widehat{\Sigma}\mapsto\lambda (\widehat{\Sigma})$ where $\lambda (\widehat{\Sigma})$
282: denotes the set of all eigenvalues of $\widehat{\Sigma}$. This can be simply fixed by assuming that the
283: eigenvalues $\widehat{\lambda}_{1},\widehat{\lambda}_{2},\ldots ,\widehat{\lambda}_{d}$ are sorted either in an
284: increasing or decreasing order.
285:
286: \begin{theorem}[Mar\v{c}enko and Pastur, 1967]\label{MP_law}
287: Let $U_{1}^{\left( d\right) },U_{2}^{\left( d\right) },\ldots ,U_{n}^{\left( d\right) }$ $\left( n=1,2,\ldots
288: \right)$ be sequences of independent random vectors uniformly distributed on the unit hypersphere
289: $\mathcal{S}^{d-1}$ and consider the random matrix
290: \begin{equation*}
291: \widehat{\Sigma}_{\mathrm{MP}}:=\frac{d}{n}\cdot\sum_{j=1}^{n}U_{j}^{\left( d\right) }U_{j}^{\left( d\right)
292: \mathrm{T}},
293: \end{equation*}%
294: where its empirical distribution function of the eigenvalues is denoted by $%
295: \widehat{W}_{d}\,$. Suppose that $n\rightarrow \infty $,$\ d\rightarrow \infty $, $n/d\rightarrow q<\infty $.
296: Then
297: \begin{equation*}
298: \widehat{W}_{d}\overset{\mathrm{p}}{\longrightarrow }F_{\mathrm{MP}}\left(\cdot\,;q\right),
299: \end{equation*}
300: at all points where $F_{\mathrm{MP}}$ is continuous. More precisely, $\lambda \mapsto F_{\mathrm{MP}}\left(
301: \lambda \,;q\right) =F_{\mathrm{MP}}^{\mathrm{Dir}}\left( \lambda \,;q\right)
302: +F_{\mathrm{MP}}^{\mathrm{Leb}}\left( \lambda \,;q\right) $ where the Dirac part is given by
303: \begin{equation*}
304: \lambda \longmapsto F_{\mathrm{MP}}^{\mathrm{Dir}}\left( \lambda \,;q\right) =\left\{
305: \begin{array}{lll}
306: 1-q, & & \lambda \geq 0,\,0\leq q<1, \\
307: \rule{0cm}{0.5cm}0, & & \text{else},%
308: \end{array}%
309: \right.
310: \end{equation*}%
311: and the Lebesgue part $\lambda \mapsto F_{\mathrm{MP}}^{\mathrm{Leb}}\left(
312: \lambda \,;q\right) =\int_{-\infty }^{\lambda }f_{\mathrm{MP}}^{\mathrm{Leb}%
313: }\left( x\,;q\right) dx$ is determined by the density function%
314: \begin{equation*}
315: \lambda \longmapsto f_{\mathrm{MP}}^{\mathrm{Leb}}\left( \lambda \,;q\right) =\left\{
316: \begin{array}{lll}
317: \frac{q}{2\pi}\cdot \frac{\sqrt{\left( \lambda _{\max }-\lambda \right) \left( \lambda -\lambda _{\min }\right)
318: }}{\lambda }, & & \lambda _{\min }< \lambda < \lambda _{\max }, \\
319: \rule{0cm}{0.5cm}0, & & \text{else},%
320: \end{array}%
321: \right.
322: \end{equation*}%
323: where%
324: \begin{equation*}
\lambda _{\min ,\max }:=\left( 1\mp \frac{1}{\sqrt{q}}\right) ^{2}.
326: \end{equation*}
327: \end{theorem}
328:
329: \begin{proof}
330: Mar\v{c}enko and Pastur, 1967.\hfill \medskip
331: \end{proof}
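
For orientation, the support bounds can be evaluated for a concrete ratio. The value $q=2$ is chosen here as an example:

\begin{equation*}
\lambda_{\min}=\left(1-\frac{1}{\sqrt{2}}\right)^{2}\approx 0.086,\qquad
\lambda_{\max}=\left(1+\frac{1}{\sqrt{2}}\right)^{2}\approx 2.914 .
\end{equation*}

Moreover, since $\|U_{j}^{(d)}\|_{2}=1$, the mean eigenvalue of $\widehat{\Sigma}_{\mathrm{MP}}$ equals one for every $n$ and $d$,

\begin{equation*}
\frac{1}{d}\cdot\tr\big(\widehat{\Sigma}_{\mathrm{MP}}\big)=\frac{1}{n}\cdot\sum_{j=1}^{n}U_{j}^{(d)\mathrm{T}}U_{j}^{(d)}=1,
\end{equation*}

which is consistent with a limiting spectrum that is spread around $1$.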
332:
333: In the following $\widehat{\Sigma}_{\mathrm{MP}}$ will be called `Mar\v{c}enko-Pastur operator'. The next
334: corollary states that the Mar\v{c}enko-Pastur law $F_{\mathrm{MP}}$ holds not only for the empirical
335: distribution function of eigenvalues of the Mar\v{c}enko-Pastur operator but also for that obtained by the
336: sample covariance matrix if the data are standard normally distributed and independent.
337:
338: \begin{corollary}
339: Let $X,X_{1},X_{2},\ldots ,X_{n}$ $\left( n=1,2,\ldots \right)$ be sequences of independent and standard normally
340: distributed random vectors with uncorrelated components. Then the empirical distribution function of the eigenvalues of
341: \begin{equation*}
342: \frac{1}{n}\cdot\sum_{j=1}^{n}X_{j}X_{j}^{\mathrm{T}}
343: \end{equation*}
344: converges in probability to the Mar\v{c}enko-Pastur law stated in Theorem \ref{MP_law}.
345: \end{corollary}
346:
347: \begin{proof}
348: Due to the strong law of large numbers $\chi _{d}^{2}/d\overset{\mathrm{a.s.}}{\rightarrow }1$
349: $(d\rightarrow\infty)$ and thus
350: \begin{equation*}
351: \widehat{\Sigma}_{\mathrm{MP}} \sim \frac{d}{n}\cdot \sum_{j=1}^{n}\frac{\chi _{d,j}^{2}}{d}\cdot U_{j}^{\left(
352: d\right) }U_{j}^{\left( d\right) \mathrm{T}} \overset{\mathrm{d}}{=}
353: \frac{1}{n}\cdot\sum_{j=1}^{n}X_{j}X_{j}^{\mathrm{T}}.
354: \end{equation*}
355: \rule{.5cm}{0cm}\hfill\medskip
356: \end{proof}
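
The distributional identity used in the proof rests on the stochastic representation of a standard normally distributed random vector. Spelled out, with $\chi_{d,j}^{2}$ stochastically independent of $U_{j}^{(d)}$, it reads

\begin{align*}
X_{j}\overset{\mathrm{d}}{=}\sqrt{\chi_{d,j}^{2}}\;U_{j}^{(d)}
\quad &\Longrightarrow\quad
X_{j}X_{j}^{\mathrm{T}}\overset{\mathrm{d}}{=}\chi_{d,j}^{2}\cdot U_{j}^{(d)}U_{j}^{(d)\mathrm{T}},\\
\frac{1}{n}\cdot\sum_{j=1}^{n}X_{j}X_{j}^{\mathrm{T}}
&\overset{\mathrm{d}}{=}\frac{d}{n}\cdot\sum_{j=1}^{n}\frac{\chi_{d,j}^{2}}{d}\cdot U_{j}^{(d)}U_{j}^{(d)\mathrm{T}}.
\end{align*}

Heuristically, since each factor $\chi_{d,j}^{2}/d$ tends to $1$ as $d\rightarrow\infty$, the right-hand side behaves like $\widehat{\Sigma}_{\mathrm{MP}}$ in the limit.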
357:
358: Moreover, the Mar\v{c}enko-Pastur law holds even if $X$ is an arbitrary random vector with standardized i.i.d.
359: components provided the second moment is finite (Yin, 1986). More precisely, consider the random vector $X$ with
$E(X)=\mu$ and $\mathrm{Var}(X)=\sigma^2 I_{d}$ where the components of $X$ are supposed to be stochastically
independent. Then the Mar\v{c}enko-Pastur law can be applied to the empirical distribution function of the
362: eigenvalues of
363: \begin{equation*}
364: \frac{1}{n}\cdot\sum_{j=1}^{n}\left(\frac{X_{j}-\widehat{\mu}}{\widehat{\sigma}}\right)
365: \left(\frac{X_{j}-\widehat{\mu}}{\widehat{\sigma}}\right)^{\mathrm{T}}= \widehat{\Sigma}/\widehat{\sigma}^2,
366: \end{equation*}
367: where $\widehat{\Sigma}$ denotes the sample covariance matrix and
368: \begin{equation*}
369: \widehat{\sigma}^2:=\frac{\tr(\widehat{\Sigma})}{d}=\frac{1}{d}\cdot
370: \sum_{i=1}^{d}\widehat{\lambda}_{i}=:\overline{\lambda}.
371: \end{equation*}
372:
Hence, the Mar\v{c}enko-Pastur law can virtually always be applied to the empirical distribution function of
$\widehat{\lambda}_{1}/\overline{\lambda},\ldots,\widehat{\lambda}_{d}/\overline{\lambda}$, where the estimated
eigenvalues are given by the sample covariance matrix, provided the sample elements, i.e. the realized random
vectors, consist of stochastically independent components. But within the class of elliptical distributions this
holds only for uncorrelated normally distributed data. Hence uncorrelatedness and stochastic independence
are not equivalent for genera\-lized elliptically distributed data. This is because even if there is no linear dependence between the components of an elliptically distributed random vector, another sort of nonlinear dependence caused by the gene\-rating variate $\mathcal{R}$ generally remains.
379:
380: For instance, consider the unit random vector
381: $U^{(2)}=(U_{1},U_{2})$. Then
382: \begin{equation*}
383: U_{2}\overset{\mathrm{a.s.}}{=}\pm \sqrt{1-U_{1}^{2}},
384: \end{equation*}%
385: i.e. $U_{2}$ depends strongly on $U_{1}$ though indeed the elements of $U^{(2)}$ are uncorrelated.
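
Both claims can be verified directly. Writing $U^{(2)}=(\cos\Theta,\sin\Theta)$ with $\Theta$ uniformly distributed on $[0,2\pi)$ (a parametrization introduced here for the calculation), one obtains $E(U_{1})=E(U_{2})=0$ and

\begin{equation*}
E(U_{1}U_{2})=\frac{1}{2\pi}\int_{0}^{2\pi}\cos\theta\,\sin\theta\,d\theta=0,
\end{equation*}

so that $U_{1}$ and $U_{2}$ are uncorrelated. The squared components, however, are correlated:

\begin{equation*}
E\big(U_{1}^{2}U_{2}^{2}\big)-E\big(U_{1}^{2}\big)E\big(U_{2}^{2}\big)
=E\big(\cos^{2}\Theta\,\sin^{2}\Theta\big)-\frac{1}{2}\cdot\frac{1}{2}
=\frac{1}{8}-\frac{1}{4}=-\frac{1}{8}\neq 0.
\end{equation*}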
386:
387: Tail dependent random variables cannot be stochastically independent. Especially, if the random components of an elliptically distributed random vector are heavy tailed, i.e. if the generating variate is regularly varying then they possess the property of tail dependence (Schmidt, 2002). In that case the eigenspectrum generated by the sample covariance matrix may lead to erroneous implications.
388:
For instance, consider a sample (with sample size $n=1000$) of $500$-dimensional random vectors whose components are standardized $t$-distributed with $\nu=5$ degrees of freedom and mutually stochastically independent. Here the eigenspectrum obtained by the sample covariance matrix is indeed consistent with the Mar\v{c}enko-Pastur law (upper left part of Figure 6). But if the data stem from a multivariate $t$-distribution with the same parameters and uncorrelated components, then the eigenspectrum obtained by the sample covariance matrix does not correspond to the Mar\v{c}enko-Pastur law (upper right part of Figure 6). In fact, there are $24$ eigenva\-lues exceeding the Mar\v{c}enko-Pastur upper bound $\lambda _{\max}=(1+1/\sqrt{2}\,)^{2}\approx 2.91$ (here $q=n/d=2$), and the largest eigenvalue equals $10.33$. But fortunately
390: the eigenspectra obtained by the spectral estimator are consistent with the Mar\v{c}enko-Pastur law as
391: indicated by the lower part of Figure 6.
392:
393: \begin{center}
394: %\includegraphics[scale=.35]{MP1mom.eps}\quad
395: %\includegraphics[scale=.35]{MP2mom.eps}\\[.25cm]
396: %\includegraphics[scale=.35]{MP1spec.eps}\quad
397: %\includegraphics[scale=.35]{MP2spec.eps}\\[.25cm]
398: %% \includegraphics[scale=.34]{MP1mom.png}
399: %% \includegraphics[scale=.34]{MP2mom.png}\\[.25cm]
400: %% \includegraphics[scale=.34]{MP1spec.png}
401: %% \includegraphics[scale=.34]{MP2spec.png}\\[.25cm]
402: \includegraphics[scale=.34]{MP1mom}
403: \includegraphics[scale=.34]{MP2mom}\\[.25cm]
404: \includegraphics[scale=.34]{MP1spec}
405: \includegraphics[scale=.34]{MP2spec}\\[.25cm]
406: \end{center}
407: {\bf Fig. 6:} Eigenspectra of univariate (left part) and multivariate (right part) uncorrelated $t$-distributed
408: data ($n=1000,\,d=500,\,\nu=5$) obtained by the sample covariance matrix (upper part) and by the spectral
409: estimator (lower part).\\[.25cm]
410:
411: Tyler (1987) shows that the spectral estimator converges strongly to the true dispersion matrix $\Sigma $. That means %
412: \begin{equation*}
413: \frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}%
414: \widehat{\Sigma }^{-1}s_{j}}\longrightarrow \frac{%
415: s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}\Sigma ^{-1}s_{j}},\qquad n\longrightarrow \infty ,\ d\text{ const.,}
416: \end{equation*}%
417: for $j=1,2,\ldots$ and $P$-almost all realizations. Consequently, if $\Sigma =I_{d}$ (up to a scaling constant) then%
418: \begin{equation*}
419: \frac{s_{j}s_{j}^{\mathrm{T}}}{s_{j}^{\mathrm{T}}%
420: \widehat{\Sigma }^{-1}s_{j}}\longrightarrow s_{j}s_{j}^{\mathrm{T}} \equiv u_{j}^{\left(d\right)}u_{j}^{\left(d\right)\mathrm{T}},
421: \end{equation*}%
422: as $n\rightarrow\infty$ and $d$ constant. Hence the spectral estimator and the Mar\v{c}enko-Pastur operator are
423: asymptotically equivalent provided $\Sigma =\sigma^{2}I_{d}$. The authors believe that the strong
424: convergence holds even for $n\rightarrow \infty $, $d\rightarrow \infty $, $n/d\rightarrow q>1$ for $P$-almost
all realizations where the spectral estimate exists. The proof of this conjecture is left to forthcoming work.
Note that for $q\leq 1$ the spectral estimate does not exist at all. Further, Tyler (1987) shows that the
spectral estimate exists (a.s.) if $n>d\left(d-1\right)$, i.e. $q>d-1$. This, however, is only a sufficient condition
for the existence of the spectral estimator. In practice the spectral estimator seems to exist in most cases
as soon as $n$ is slightly larger than $d$.
430:
We conclude that testing high-dimensional data for the null hypothesis $\Sigma =\sigma^{2}I_{d}$ by means of the sample covariance matrix may lead to wrong conclusions if the data are generalized elliptically distributed. In contrast, the spectral estimator seems to be a robust alternative for applying the results of RMT in the context of generalized elliptical distributions.
432:
433:
434: \section{Financial Applications}
435:
436: \subsection{Portfolio Risk Minimization}
437:
438: In this section it is supposed that $n/d\rightarrow \infty$, i.e. from the viewpoint of RMT we
study low-dimensional problems. Let $R=(R_{1},R_{2},\ldots,R_{d})$ be an elliptically distributed random vector of
short-term (e.g. daily) log-returns. If the fourth-order cross moments of the log-returns are
finite then the elements of the sample covariance matrix are asymptotically multivariate normally distributed.
The asymptotic covariance of each pair of elements is given by (see, e.g., Praag and Wesselman, 1989)
443: \begin{equation*}
444: \mathrm{ACov}\left(\hat{\sigma}_{ij},\hat{\sigma}_{kl}\right) =\left( 1+\kappa \right) \cdot \left( \sigma
445: _{ik}\sigma _{jl}+\sigma _{il}\sigma _{jk}\right) +\kappa\cdot\sigma _{ij}\sigma _{kl},
446: \end{equation*}
447: where $\Sigma=[\sigma_{ij}]$ denotes the true covariance matrix of $R$ and
448: \begin{equation*}
449: \kappa :=\frac{1}{3}\cdot \frac{E\left( R_{i}^{4}\right) }{E^{2}\!\left( R_{i}^{2}\right) }-1
450: \end{equation*}
451: is called the `kurtosis parameter'. Note that the kurtosis parameter does not depend on $i\in\{1,...,d\}$. It
452: is well-known that in the case of normality $\kappa =0$. A distribution with positive (or even infinite)
453: $\kappa $ is called `leptokurtic'. Particularly, regularly varying distributions are leptokurtic.
454:
455: It is well-known that the portfolio which minimizes the portfolio return variance (the so called `global minimum
456: variance portfolio') is given by the vector of portfolio weights
457: \begin{equation*}\label{GMVP}
458: w := \frac{\Sigma ^{-1}\text{$\underline{1}$}}{\text{$\underline{1}$}^{\mathrm{T}}\Sigma
459: ^{-1}\text{$\underline{1}$}}.
460: \end{equation*}
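
For completeness, the weight formula can be recovered by a standard Lagrangian argument. The following is only a sketch: the multiplier $\gamma$ is auxiliary notation that does not appear elsewhere in the paper, and $\Sigma$ is assumed to be positive definite. Minimizing $w^{\mathrm{T}}\Sigma w$ subject to $\underline{1}^{\mathrm{T}}w=1$ leads to the first-order condition $\Sigma w=\gamma\,\underline{1}$, so that
\begin{equation*}
w=\gamma\,\Sigma^{-1}\underline{1},\qquad \underline{1}^{\mathrm{T}}w=1\quad\Longrightarrow\quad \gamma=\frac{1}{\underline{1}^{\mathrm{T}}\Sigma^{-1}\underline{1}}.
\end{equation*}
The attained minimum variance is $w^{\mathrm{T}}\Sigma w=1/(\underline{1}^{\mathrm{T}}\Sigma^{-1}\underline{1})$. Replacing $\Sigma$ by $c\,\Sigma$ with $c>0$ leaves $w$ unchanged because the factor $1/c$ cancels in the numerator and the denominator, i.e. the weights depend only on the shape of $\Sigma$ and not on its scale.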
461:
462: Now, suppose for the sake of simplicity that $R$ is spherically distributed, i.e. that $\mu = 0$ and $\Sigma$ is proportional to the identity matrix. Since the weights of the global minimum variance portfolio do not depend on the scale of $\Sigma$ we may assume $\Sigma = I_{d}$ w.l.o.g. Then the asymptotic covariances of the sample covariance matrix elements are simply given by
463: \begin{equation*}
464: \mathrm{ACov}\left(\hat{\sigma}_{ij},\hat{\sigma}_{kl}\right) =\left\{
465: \begin{array}{rcl}
466: 2+3\kappa , & & i=j=k=l, \\ \rule{0cm}{0.5cm}\kappa , & & i=j,\, k=l,\, i\neq k, \\ \rule{0cm}{0.5cm}1+\kappa
467: , & & i=k,\, j=l,\, i\neq j, \\ \rule{0in}{0.5cm}0, & & \text{else}.
468: \end{array}\right.
469: \end{equation*}
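
As a quick check, the cases can be read off the general formula by setting $\sigma_{ij}=\delta_{ij}$, where $\delta_{ij}$ denotes the Kronecker delta (notation introduced here only for this verification):
\begin{equation*}
\mathrm{ACov}\left(\hat{\sigma}_{ij},\hat{\sigma}_{kl}\right) =\left( 1+\kappa \right) \cdot \left( \delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}\right) +\kappa\cdot\delta_{ij}\delta_{kl}.
\end{equation*}
For $i=j=k=l$ all products equal one, which gives $2\,(1+\kappa)+\kappa=2+3\kappa$. For $i=j$, $k=l$, $i\neq k$ the first bracket vanishes and only $\kappa$ remains. For $i=k$, $j=l$, $i\neq j$ only $\delta_{ik}\delta_{jl}=1$ survives, which gives $1+\kappa$. Since $\hat{\sigma}_{ij}=\hat{\sigma}_{ji}$, the same value $1+\kappa$ is obtained for $i=l$, $j=k$, $i\neq j$.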
470:
For instance, suppose that the random vector $R$ is multivariate $t$-distributed with $\nu>4$ degrees of
freedom. Then the kurtosis parameter corresponds to $\kappa =2/(\nu -4)$ (see, e.g., Frahm, 2004, p. 91).
Hence, the smaller $\nu$ the larger the asymptotic variances and covariances, and these quantities tend to
infinity for $\nu \searrow 4$. Further, if $\nu\leq 4$ the sample covariance matrix is no longer
asymptotically multivariate normally distributed.
476:
477: In contrast, the asymptotic covariance of each element of the spectral estimator (Frahm, 2004, p.
478: 76) is given by
479: \begin{equation*}
480: \mathrm{ACov}\left(\hat{\sigma}_{\mathrm{S},ij},\hat{\sigma}_{\mathrm{S},kl}\right) =\left\{
481: \begin{array}{rcl}
482: 4\cdot\frac{d+2}{d} , & & i=j=k=l, \\ \rule{0cm}{0.5cm}2\cdot\frac{d+2}{d} , & & i=j,\, k=l,\, i\neq k, \\
483: \rule{0cm}{0.5cm}\frac{d+2}{d} , & & i=k,\, j=l,\, i\neq j, \\ \rule{0in}{0.5cm}0, & & \text{else}.
484: \end{array}\right.
485: \end{equation*}
486:
487: Note that the same holds even if $R$ is not $t$-distributed but only generalized elliptically distributed since
488: $\widehat{\Sigma}_{\mathrm{S}}$ does not depend on the generating variate of $R$. Particularly, the spectral
489: estimator is not disturbed by the tail index of $R$.
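
To see why the generating variate drops out, it may help to recall the fixed-point equation of Tyler's (1987) M-estimator, to which the spectral estimator corresponds. The form below is a sketch under the assumption of a known location $\mu=0$; the symbols $x_{1},\ldots,x_{n}$ for the sample realizations of $R$ are introduced here for illustration, and the paper's own definition is given in an earlier section:
\begin{equation*}
\widehat{\Sigma}_{\mathrm{S}}=\frac{d}{n}\cdot\sum_{j=1}^{n}\frac{x_{j}x_{j}^{\mathrm{T}}}{x_{j}^{\mathrm{T}}\,\widehat{\Sigma}_{\mathrm{S}}^{-1}\,x_{j}}.
\end{equation*}
Replacing each $x_{j}$ by $a_{j}x_{j}$ with arbitrary $a_{j}\neq 0$ multiplies the numerator and the denominator of the $j$-th summand by $a_{j}^{2}$, so the equation and hence its solution (which is determined only up to scale) do not change. In other words, only the directions $x_{j}/\Vert x_{j}\Vert$ enter the estimate, whereas the radial part of the data, which carries the tail behavior, does not.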
490:
Now one may ask when the sample covariance matrix is dominated (in a component-wise manner) by the spectral
estimator provided the data are multivariate $t$-distributed. Regarding the main diagonal entries of the
covariance matrix estimate this is the case if
494: \begin{equation*}
495: 4\cdot \frac{d+2}{d}<2\cdot \frac{\nu -1}{\nu -4},
496: \end{equation*}
497: i.e. if $\nu <4 + 3d/(d+4)$ the variance of the spectral estimator's main diagonal elements is smaller than
498: the variance of the corresponding main diagonal elements of the sample covariance matrix, asymptotically. Concerning its off diagonal entries we obtain
499: \begin{equation*}
500: \frac{d+2}{d}<\frac{\nu -2}{\nu -4},
501: \end{equation*}
i.e. $\nu < 4+d$. It is worth noting that several empirical studies indicate that the tail indices of daily log-returns generally lie between $4$ and $7$ (see, e.g., Embrechts, Frey, and McNeil, 2004, p. 81 and Junker and May, 2002).
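
The two thresholds follow from elementary algebra. The steps below are a worked check that uses $2+3\kappa=2\,(\nu-1)/(\nu-4)$ and $1+\kappa=(\nu-2)/(\nu-4)$ for $\kappa=2/(\nu-4)$; since $\nu>4$ and $d>0$, multiplying by $d\,(\nu-4)$ preserves the inequalities. For the main diagonal entries,
\begin{equation*}
4\cdot\frac{d+2}{d}<2\cdot\frac{\nu-1}{\nu-4}\;\Longleftrightarrow\; 2\,(d+2)(\nu-4)<d\,(\nu-1)\;\Longleftrightarrow\; (d+4)\,\nu<7d+16\;\Longleftrightarrow\;\nu<4+\frac{3d}{d+4},
\end{equation*}
and for the off diagonal entries,
\begin{equation*}
\frac{d+2}{d}<\frac{\nu-2}{\nu-4}\;\Longleftrightarrow\; 1+\frac{2}{d}<1+\frac{2}{\nu-4}\;\Longleftrightarrow\;\nu<4+d.
\end{equation*}
As an illustration, for $d=285$ the first bound equals $4+855/289\approx 6.96$. It is increasing in $d$ and tends to $7$ for $d\rightarrow\infty$, whereas the second bound grows linearly in $d$.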
503:
In the following the daily log-returns from 1980-01-02 to 2003-10-06 of 285 S\&P 500 stocks are analyzed in order to study the robustness of the spectral estimator vs. the sample covariance matrix. The considered stocks are the `survivors' of the S\&P 500 composite in the last quarter of 2003. The sample size is $n=6000$. The total sample period is partitioned into $10$ sub-periods, each containing $600$ daily log-returns. Further, each sub-period is divided into `even' and `odd' days, i.e. there is a sub-sample containing the 1st, 3rd, \ldots, 599th log-returns and another sub-sample with the 2nd, 4th, \ldots, 600th log-returns. Hence each sub-sample contains $300$ daily log-returns of $285$ stocks. Both the sample covariance matrix and the spectral estimator are used for estimating the relative eigenspectrum of the true covariance matrix, i.e. $\lambda_{1}/\sum_{i=1}^{d}\lambda_{i},\ldots ,\lambda_{d}/\sum_{i=1}^{d}\lambda_{i}$, for each even and odd sub-sample separately. If the covariance matrix estimator is robust against outliers, the estimated eigenspectra of the two sub-samples should be similar: even if the true eigenspectrum changes dynamically over time, this must affect the even and the odd days equally. The eigenspectrum obtained from the even sub-sample can be compared with the eigenspectrum obtained from the odd sub-sample simply by taking the differences of the ordered (relative) eigenvalues.
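
In symbols, and with notation introduced here purely for illustration, let $\hat{\lambda}_{i}^{\mathrm{even}}$ and $\hat{\lambda}_{i}^{\mathrm{odd}}$ denote the $i$-th largest eigenvalue estimate obtained from the even and the odd sub-sample of a given sub-period. The compared quantity can then be written as
\begin{equation*}
\Delta_{i}:=\frac{\hat{\lambda}_{i}^{\mathrm{even}}}{\sum_{l=1}^{d}\hat{\lambda}_{l}^{\mathrm{even}}}-\frac{\hat{\lambda}_{i}^{\mathrm{odd}}}{\sum_{l=1}^{d}\hat{\lambda}_{l}^{\mathrm{odd}}},\qquad i=1,\ldots ,d.
\end{equation*}
The sign convention (even minus odd) is an assumption and the figures may use the opposite one. Since both relative eigenspectra sum to one we have $\sum_{i=1}^{d}\Delta_{i}=0$, so a large positive difference for the largest eigenvalue is necessarily offset by negative differences among the remaining eigenvalues.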
505:
506: \begin{center}
507: %% \includegraphics[scale=.34]{even_oddMnew.png}
508: %% \includegraphics[scale=.34]{even_oddSnew.png}\\[.25cm]
509: %\includegraphics[scale=.28]{even_odd_fullcolor.png}\\[.25cm]
510: \includegraphics[scale=.21]{even_oddMnew}%\hspace{-.5cm}
511: \includegraphics[scale=.21]{even_oddSnew}\\[.25cm]
512: \end{center}
513: {\bf Fig. 7:} Eigenvalue differences for each ordered eigenvalue given by the sample covariance matrix (left hand) and by the spectral estimate (right hand).\\[.25cm]
514:
On the left hand of Figure 7 we see the eigenvalue differences for each of the $10$ sub-periods obtained with the sample covariance matrix. Similarly, the right hand of Figure 7 shows the eigenvalue differences given by the spectral estimate. Figure 7 indicates that the spectral estimator leads to more robust estimates of the eigenspectra of financial data. But note that, concerning the overall eigenspectrum, the sample covariance matrix performs well except for the 4th sub-period. This is the period which contains the famous October Crash of $1987$. In contrast, the spectral estimator is not affected by extreme values.
516:
517: \begin{center}
518: %% \includegraphics[scale=.34]{5even_oddM.png}
519: %% \includegraphics[scale=.34]{5even_oddS.png}\\[.25cm]
520: \includegraphics[scale=.21]{5even_oddM}
521: \includegraphics[scale=.21]{5even_oddS}\\[.25cm]
522: \end{center}
523: {\bf Fig. 8:} Eigenvalue differences for the largest $5$ eigenvalues given by the sample covariance matrix (left hand) and by the spectral estimate (right hand).\\[.25cm]
524:
Figure 8 focuses on the differences of the $5$ largest eigenvalues. It shows that the sample covariance matrix fails particularly in estimating the largest eigenvalue. Once again this phenomenon is caused by Black Monday, which belongs to the even sub-sample of the 4th sub-period. Note that the largest eigenvalue of the even sub-sample exceeds the largest eigenvalue of the odd sub-sample by almost $12$ percentage points. We conclude that although the sample covariance matrix works quite well most of the time it is not appropriate for measuring the linear dependence structure of financial data. This is due to a few but extreme fluctuations on financial markets.
526:
527:
528: \subsection{Principal Components Analysis}
529:
Now, consider a $d$-dimensional vector $R=(R_{1},\ldots,R_{d})$ of long-term (e.g. yearly) i.i.d. log-returns. Due to the central limit theorem each component of $R$ is approximately normally distributed provided the covariance matrix of the short-term (e.g. daily) log-returns exists and is finite. Since the sum of i.i.d. elliptical random vectors is always elliptically distributed, too (see, e.g., Hult and Lindskog, 2002), one may take for granted that the components of $R$ are approximately jointly normally distributed. But this is not true if the number of dimensions $d$ is large relative to the sample size $n$.
531:
532: For instance, consider a $d$-dimensional random vector $X$ which is multivariate $t$-distributed with $\nu>2$ degrees of freedom, location vector $\mu = 0$, and dispersion matrix $\Sigma = (\nu -2)/\nu\cdot I_{d}$. Due to the multivariate central limit theorem one could believe that
533: \begin{equation*}
534: Y := \frac{1}{\sqrt{n}}\cdot\sum_{j=1}^{n} X_{j}\overset{\cdot}{\sim}N_{d}\left( 0,I_{d}\right),
535: \end{equation*}
where $X_{1},\ldots,X_{n}$ are independent copies of $X$. But in fact $Y^{\text{T}}Y \overset{\cdot}{\sim}\chi_{d}^{2}$ holds only if $q:=n/d$ is large; it is not sufficient that $n$ alone is large (cf. Frahm, 2004, Section 6.2). Thus the quantity $q$ can be interpreted as `effective sample size'.
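
One way to make the role of $q$ visible is to compare the variance of $Y^{\text{T}}Y$ with the variance $2d$ of a $\chi_{d}^{2}$-distributed random variable. The following is only a sketch under the additional assumption $\nu>4$ (finite fourth moments) and is not necessarily the argument given in Frahm (2004); the symbols $\xi_{1},\ldots,\xi_{d}$ for the components of $X$ are introduced here for illustration. With the kurtosis parameter $\kappa=2/(\nu-4)$ of the previous section we have $E(\xi_{i}^{4})=3\,(1+\kappa)$ and $E(\xi_{i}^{2}\xi_{l}^{2})=1+\kappa$ for $i\neq l$, so that
\begin{equation*}
\mathrm{Var}\left(X^{\text{T}}X\right)=(1+\kappa)\cdot d\,(d+2)-d^{2}=2d+\kappa\cdot d\,(d+2).
\end{equation*}
The mixed terms $X_{j}^{\text{T}}X_{l}$ $(j\neq l)$ have mean zero and variance $d$, and they are uncorrelated with each other and with the terms $X_{j}^{\text{T}}X_{j}$. Hence
\begin{equation*}
\mathrm{Var}\left(Y^{\text{T}}Y\right)=\frac{1}{n}\cdot\mathrm{Var}\left(X^{\text{T}}X\right)+\frac{2\,(n-1)}{n}\cdot d=2d\cdot\left(1+\kappa\cdot\frac{d+2}{2n}\right).
\end{equation*}
For large $d$ the relative excess over $2d$ is approximately $\kappa/(2q)$, which becomes negligible only if $q$ is large, no matter how large $n$ is. For example, $\nu=5$, $d=500$, and $n=1000$ give $\kappa=2$ and $\mathrm{Var}(Y^{\text{T}}Y)=1502$ instead of $1000$.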
537:
538: In the following it is assumed that $R$ is elliptically distributed with location vector $\mu$ and dispersion matrix $\Sigma$. Let $\Sigma = \mathcal{O}\mathcal{D}\mathcal{O}^{\text{T}}$ be a spectral decomposition of $\Sigma$. Then
539: \begin{equation*}
540: R\overset{\mathrm{d}}{=}\mu +\mathcal{O}\sqrt{\mathcal{D}}\,Y,
541: \end{equation*}
where $Y$ is spherically distributed with $\Sigma = I_{d}$.
543:
We assume that the elements of $\mathcal{D}$, i.e. the eigenvalues of $\Sigma$, are given in descending order
and that the first $k$ eigenvalues are large whereas the residual ones are small. The elements of $Y$ are called `principal components' of $R$. Since $\mathcal{O}$ is orthonormal, the distribution of $\sqrt{\mathcal{D}}\,Y$ is preserved except for a rotation in $\R^{d}$. The direction of each principal component is given by the corresponding column of $\mathcal{O}$.
546:
Hence the first $k$ eigenvalues correspond to the variances (up to a scaling constant) of the `driving risk factors' contained in the first part of $Y$, i.e. $\left( Y_{1},\ldots,Y_{k}\right)$. For the purpose of dimension reduction $k$ should not be too large. Because the $d-k$ residual risk factors contained in $\left( Y_{k+1},\ldots ,Y_{d}\right)$ are supposed to have (relatively) small variances, they can be interpreted as the components of the
idiosyncratic risks of each firm, i.e.
550: \begin{equation*}
551: \varepsilon _{i}:=\sum_{j=k+1}^{d}\sqrt{\lambda_{j}}\,\mathcal{O}_{ij}Y_{j},\qquad i=1,\ldots ,d,
552: \end{equation*}
553: where $\lambda_{j}:=\mathcal{D}_{jj}$.
554:
555: Thus we obtain the following principal components model for long-term log-returns,
556: \begin{equation*}
557: R_{i}\overset{\mathrm{d}}{=}\mu_{i}+\beta _{i1}Y_{1}+\ldots +\beta _{ik}Y_{k}+\varepsilon _{i},\qquad
558: i=1,\ldots ,d,
559: \end{equation*}
where the driving risk factors $Y_{1},\ldots,Y_{k}$ are uncorrelated. Further, each noise term
$\varepsilon_{i}$ $(i=1,\ldots,d)$ is uncorrelated with $Y_{1},\ldots,Y_{k}$, too. But note that $\varepsilon_{1},\ldots ,\varepsilon_{d}$ are generally correlated. The `Betas' are given by $\beta_{ij} = \sqrt{\lambda_{j}}\,\mathcal{O}_{ij}$ for $i=1,\ldots , d$ and $j=1,\ldots ,k$.
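
These statements can be verified directly from the model. The following sketch assumes finite second moments and writes $\mathrm{Cov}(Y)=\tau\cdot I_{d}$ with some $\tau>0$; the symbol $\tau$ is introduced here for the scaling constant mentioned above. Since the components of $Y$ are uncorrelated,
\begin{equation*}
\mathrm{Cov}\left( R_{i},R_{l}\right) =\tau\cdot\sum_{j=1}^{d}\lambda_{j}\,\mathcal{O}_{ij}\mathcal{O}_{lj}=\tau\cdot\sum_{j=1}^{k}\beta_{ij}\beta_{lj}+\mathrm{Cov}\left( \varepsilon_{i},\varepsilon_{l}\right),\qquad
\mathrm{Cov}\left( \varepsilon_{i},\varepsilon_{l}\right) =\tau\cdot\sum_{j=k+1}^{d}\lambda_{j}\,\mathcal{O}_{ij}\mathcal{O}_{lj}.
\end{equation*}
The last sum does not vanish for $i\neq l$ in general, which is why the noise terms are correlated. In contrast, $\mathrm{Cov}(\varepsilon_{i},Y_{j})=0$ for $j\leq k$ because $\varepsilon_{i}$ involves only $Y_{k+1},\ldots ,Y_{d}$.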
562:
The purpose of principal components analysis is to reduce the complexity caused by the number of dimensions.
This can be done successfully only if a small number of principal components indeed accounts for most
of the total variation. Additionally, the covariance matrix estimator which is used for extracting the
principal components should be robust against outliers.
567:
For example, let the daily log-returns be multivariate $t$-distributed with $\nu$ degrees of freedom and suppose
that $d=500$ and $n=1000$. Note that due to the central limit theorem the normality assumption concerning the long-term log-returns makes sense whenever $\nu >2$. The black lines in Figure 9 show the true proportion of the total variation for a set of $500$ eigenvalues. We see that the largest $20\%$ of the eigenvalues account for $80\%$ of the overall variance. This is known in economics as the `80/20 rule' or `Pareto's principle'. The estimated
eigenvalue proportions obtained by the sample covariance matrix are represented by the red lines whereas the
corresponding estimates based on the spectral estimator are given by the green lines. Each line is an average
over $100$ concentration curves drawn from samples of the corresponding multivariate $t$-distribution.
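
For concreteness, a concentration curve can be written as follows. The symbol $C(m)$ is introduced here for illustration, with $\lambda_{1}\geq\ldots\geq\lambda_{d}$ denoting the ordered eigenvalues:
\begin{equation*}
C(m):=\frac{\sum_{i=1}^{m}\lambda_{i}}{\sum_{i=1}^{d}\lambda_{i}},\qquad m=1,\ldots ,d.
\end{equation*}
The 80/20 rule then reads $C(0.2\cdot d)=0.8$, i.e. $C(100)=0.8$ for $d=500$. As a purely hypothetical example, which need not coincide with the eigenspectrum underlying Figure 9, suppose that the $100$ largest eigenvalues are equal to $c$ and the remaining $400$ eigenvalues are equal to $b$. Then $C(100)=100\,c/(100\,c+400\,b)=0.8$ holds if and only if $c=16\,b$.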
574:
If the data have a small tail index, as in the lower right of Figure 9, then the sample covariance matrix
tends to substantially underestimate the number of driving risk factors. This is similar to the phenomenon
observed in Figure 6 where the number of large eigenvalues is overestimated. In contrast, the concentration
curves obtained by the spectral estimator are robust against heavy tails. This holds even if the long-term log-returns are not asymptotically normally distributed.
579:
580: \begin{center}
581: %\includegraphics[scale=.33]{PCA2.eps}\quad
582: %\includegraphics[scale=.33]{PCA3.eps}\\[.25cm]
583: %\includegraphics[scale=.33]{PCA1.eps}\quad
584: %\includegraphics[scale=.33]{PCA4.eps}\\[.25cm]
585: %% \includegraphics[scale=.34]{PCA2.png}
586: %% \includegraphics[scale=.34]{PCA3.png}\\[.25cm]
587: %% \includegraphics[scale=.34]{PCA1.png}
588: %% \includegraphics[scale=.34]{PCA4.png}\\[.25cm]
589: \includegraphics[scale=.34]{PCA2}
590: \includegraphics[scale=.34]{PCA3}\\[.25cm]
591: \includegraphics[scale=.34]{PCA1}
592: \includegraphics[scale=.34]{PCA4}\\[.25cm]
593: \end{center}
594: {\bf Fig. 9:} True proportion of the total variation (black line) and proportions obtained by the sample
595: covariance matrix (red lines) and by the spectral estimator (green lines). The samples are drawn from a
596: multivariate $t$-distribution with $\nu =\infty$ (i.e. the multivariate normal distribution, upper left),
597: $\nu=10$ (upper right), $\nu =5$ (lower left), and $\nu =2$ (lower right).\\[.25cm]
598:
599: In the simulated example of Figure 9 it is assumed that the small eigenvalues are equal. This is equivalent to
600: the assumption that the residual risk factors are spherically distributed, i.e. that they contain no
601: more information about the linear dependence structure of $R$. But even if the true eigenvalues are equal
the corresponding estimates will not share this property because of estimation errors. Yet it is important to know whether the residual risk factors carry structural information or whether the differences between the eigenvalue estimates are only caused by random noise. This is not an easy task, especially if the data are not normally distributed and the number of dimensions is large, which is the topic of the next section.
603:
604:
605: \subsection{Signal-Noise Separation}
606:
607: In the previous section it was mentioned that the central limit theorem fails in the context of high-dimensional data, i.e. if $n/d$ is small. Hence, now we leave the field of classical multivariate analysis and get to the domain of RMT.
608:
609: Let $\Sigma =\mathcal{ODO}^{\mathrm{T}}\in \R^{d\times d}$ be a spectral decomposition where $\mathcal{D}$ shall be a diagonal matrix containing a `bulk' of small and equal eigenvalues and some large (but
610: not necessarily equal) eigenvalues. For the sake of simplicity suppose%
611: \begin{equation*}
612: \mathcal{D}=\left[
613: \begin{array}{cc}
614: cI_{k} & 0 \\
615: \rule{0cm}{.5cm} 0 & bI_{d-k}%
616: \end{array}%
617: \right] \qquad c>b>0,
618: \end{equation*}%
619:
620: where $d-k$ is large. Hence $\Sigma$ has two different characteristic manifolds. The `major' one is determined
621: by the first $k$ column vectors of $\mathcal{O}$ (the `signal part' of $\Sigma$) whereas the `minor' one is
622: given by the $d-k$ residual column vectors of $\mathcal{O}$ (the `noise part' of $\Sigma$). We are interested in
separating signal from noise, that is to say in estimating $k$ properly.
624:
For instance, assume that $n=1000$, $d=500$, and that a sample consists of normally distributed random vectors
with covariance matrix $\Sigma$, where $b=1$, $c=5$, and $k=100$. Using the sample covariance matrix and
normalizing the eigenvalues one obtains, for example, the histogram of eigenvalues given on the left hand of
Figure 10. As might be expected, the Mar\v{c}enko-Pastur law is not valid due to the two different regimes of
eigenvalues. In contrast, when focusing on the smallest $400$ eigenvalues, i.e. on the noise part of
$\widehat{\Sigma}$, the Mar\v{c}enko-Pastur law becomes valid as we see on the right hand of Figure 10.
631:
632: \begin{center}
633: %\includegraphics[scale=.35]{SNS1.eps}\quad
634: %\includegraphics[scale=.35]{SNS2.eps}\\[.25cm]
635: %% \includegraphics[scale=.34]{SNS1.png}
636: %% \includegraphics[scale=.34]{SNS2.png}\\[.25cm]
637: \includegraphics[scale=.34]{SNS1}
638: \includegraphics[scale=.34]{SNS2}\\[.25cm]
639: \end{center}
640: {\bf Fig. 10:} Histogram of all $d=500$ eigenvalues (left hand) and of the noise part (right hand) consisting of
641: the $d-k=400$ smallest eigenvalues. The Mar\v{c}enko-Pastur law is represented by the green lines.\\[.25cm]
642:
Thus separating signal from noise means sorting out the largest eigenvalues successively until the residual
eigenspectrum is consistent with the Mar\v{c}enko-Pastur law. This is the case, e.g., when there are no more
eigenvalues exceeding the Mar\v{c}enko-Pastur upper bound $\lambda _{\max}$. In our case study this happens
for $397$ eigenvalues (see the figure below), i.e. $\widehat{k}=103$.
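
The stopping rule can be sketched as follows. The sketch assumes the Mar\v{c}enko-Pastur law in its standard form with unit variance and assumes that the $m$ remaining eigenvalues are rescaled to have mean one; the symbols $m$ and $q_{m}$ are introduced here, and the normalization actually used for Figures 10 and 11 may differ. With $q_{m}:=n/m$ the support of the law is bounded by
\begin{equation*}
\lambda_{\max}(m)=\left( 1+\frac{1}{\sqrt{q_{m}}}\right)^{2},\qquad \lambda_{\min}(m)=\left( 1-\frac{1}{\sqrt{q_{m}}}\right)^{2}.
\end{equation*}
For $n=1000$ and $m=400$ one obtains $q_{m}=2.5$, $\lambda_{\max}\approx 2.665$, and $\lambda_{\min}\approx 0.135$, whereas for $m=d=500$ the upper bound is $\lambda_{\max}\approx 2.914$. Starting with $m=d$, the procedure removes the largest eigenvalue, sets $m$ to $m-1$, renormalizes the remaining eigenvalues, and repeats these steps as long as the largest remaining eigenvalue exceeds $\lambda_{\max}(m)$. The estimate $\widehat{k}$ is the number of removed eigenvalues.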
647:
648: \begin{center}
649: %\includegraphics[scale=.35]{SNS3.eps}\\[.25cm]
650: %% \includegraphics[scale=.35]{SNS3.png}\\[.25cm]
651: \includegraphics[scale=.35]{SNS3}\\[.25cm]
652: {\bf Fig. 11:} Histogram of the remaining $397$ eigenvalues after signal-noise separation.\\[.25cm]
653: \end{center}
654:
As shown in Section \ref{RMT}, this approach is promising only if the data are not regularly varying. Hence
for financial data the spectral estimator rather than the sample covariance matrix is proposed for a proper signal-noise
separation.
658:
659:
660: \section{Conclusions}
661:
Due to the stylized facts of empirical finance the Gaussian distribution hypothesis is not appropriate for the modeling of financial data. For that reason the authors rely on the broad class of generalized elliptical distributions. This class allows for tail dependence and radial asymmetry. Although the sample covariance matrix works quite well with financial data most of the time, it is not appropriate for measuring their linear dependence structure. This is due to a few but extreme fluctuations on financial markets.
663:
It is shown that there exists a completely robust ML-estimator (the `spectral estimator') for the dispersion matrix of generalized elliptical distributions. This estimator corresponds to Tyler's M-estimator for elliptical distributions. Further, it is shown that the Mar\v{c}enko-Pastur law fails if the sample covariance matrix is considered as a random matrix in the context of elliptically or even generalized elliptically distributed data. This is due to the fact that stochastic independence implies uncorrelatedness, whereas uncorrelated random variables are not necessarily independent. In contrast, the Mar\v{c}enko-Pastur law remains valid if the data are uncorrelated and the spectral estimator is considered as a random matrix.
665:
The robustness property of the spectral estimator can be demonstrated for several financial applications such as portfolio risk minimization, principal components analy\-sis, and signal-noise separation. If the data are heavy tailed, principal components analy\-sis tends to underestimate the number of driving risk factors if the sample covariance matrix is used for extracting the eigenspectrum. This means that the contribution of the largest eigenvalues to the total variation of the data is systemati\-cally overestimated. Consequently, in the context of signal-noise separation the largest eigenvalues are overestimated by the sample covariance matrix. This can be fixed simply by using the spectral estimator instead.
667:
668: \begin{thebibliography}{99}
669:
\bibitem{Bin02} Bingham, N.H. and Kiesel, R. (2002). `Semi-parametric modelling in finance: theo\-retical foundation.' \textit{Quantitative Finance} \textbf{2}: pp. 241-250.
671:
672: \bibitem{Bou98} Bouchaud, J.P., Cont, R., and Potters, M. (1998). `Scaling in stock market data: stable laws and beyond.' In: Dubrulle, B., Graner, F., and Sornette, D. (Eds.), \textit{Scale Invariance and Beyond}, Proceedings of the CNRS Workshop on Scale Invariance, Les Houches, March 1997, Springer.
673:
674: \bibitem{Bra01} Branco, M.D. and Dey, D.K. (2001). `A general class of multivariate skew-elliptical distributions.' \textit{Journal of Multivariate Analysis} \textbf{79}: pp. 99-113.
675:
676: \bibitem{Bre03} Breymann, W., Dias, A., and Embrechts, P. (2003). `Dependence structures for multivariate high-frequency data in finance.' \textit{Quantitative Finance} \textbf{3}: pp. 1-14.
677:
678: \bibitem{Cam81} Cambanis, S., Huang, S., and Simons, G. (1981). `On the theory of elliptically contoured distributions.' \textit{Journal of Multivariate Analysis} \textbf{11}: pp. 368-385.
679:
680: \bibitem{Cho93} Chopra, V.K. and Ziemba, W.T. (1993). `The effect of errors in means, variances, and covariances on optimal portfolio choice.' \textit{The Journal of Portfolio Management}, Winter 1993: pp. 6-11.
681:
682: \bibitem{Ebe95} Eberlein, E. and Keller, U. (1995). `Hyperbolic distributions in finance.' \textit{Bernoulli} \textbf{1}: pp. 281-299.
683:
684: \bibitem{Emb04} Embrechts, P., Frey, R., and McNeil, A.J. (2004). `Quantitative methods for financial risk management.' In progress, but various chapters are retrievable from \texttt{http://www.math.ethz.ch/\symbol{126}mcneil/book.html}.
685:
\bibitem{Eng82} Engle, R.F. (1982). `Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation.' \textit{Econometrica} \textbf{50}: pp. 987-1007.
687:
688: \bibitem{Fam65} Fama, E.F. (1965). `The behavior of stock market prices.' \textit{Journal of Business} \textbf{38}: pp. 34-105.
689:
690: \bibitem{Fan90} Fang, KT., Kotz, S., and Ng, KW. (1990). `Symmetric multivariate and related distributions.' Chapman \& Hall.
691:
692: \bibitem{Fra04} Frahm, G. (2004). `Generalized elliptical distributions: theory and applications.' Ph.D. thesis, University of Cologne, Faculty of Management, Economics, and Social Sciences, Department of Statistics, Germany. Retrievable from \texttt{http://kups.ub.uni-koeln.de/volltexte/2004/1319/}.
693:
694: \bibitem{Hia00} Hiai, F. and Petz, D. (2000). `The semicircle law, free random variables and entropy.' American Mathematical Society.
695:
696: \bibitem{Hul02} Hult, H. and Lindskog, F. (2002). `Multivariate extremes, aggregation and dependence in elliptical distributions.' \textit{Advances in Applied Probability} \textbf{34}: pp. 587-608.
697:
698: \bibitem{Jun02} Junker, M. and May, A. (2002). `Measurement of aggregate risk with copulas.' Working paper, CAESAR, Bonn, Germany. Retrieved 2004-10-14 from \texttt{http://www.caesar.de/uploads/media/cae\_pp\_0021\allowbreak \_junker\_2002-05-09.pdf}.
699:
700: \bibitem{Kel70} Kelker, D. (1970). `Distribution theory of spherical distributions and a location-scale parameter generalization.' \textit{Sankhya A} \textbf{32}: pp. 419-430.
701:
702: \bibitem{Lin00} Lindskog, F. (2000). `Linear correlation estimation.' Working paper, Risklab, Switzerland. Retrieved 2004-10-14 from \texttt{http://www.risklab.ch/\allowbreak Papers.html\#LCELindskog}.
703:
704: \bibitem{Man63} Mandelbrot, B. (1963). `The variation of certain speculative prices.' \textit{Journal of Business} \textbf{36}: pp. 394-419.
705:
706: \bibitem{Meh90} Mehta, M.L. (1990). `Random matrices.' Academic Press, 2nd edition.
707:
708: \bibitem{Mik03} Mikosch, T. (2003). `Modeling dependence and tails of financial time series.' In: Finkenstaedt, B. and Rootz\'{e}n, H. (Eds.), \textit{Extreme Values in Finance, Telecommunications, and the Environment}, Chapman \& Hall.
709:
710: \bibitem{Oja03} Oja, H. (2003). `Multivariate M-estimates of location and shape.' In: H\"{o}glund, R., J\"{a}ntti, M., and Rosenqvist, G. (Eds.), \textit{Statistics, Econometrics and Society. Essays in Honor of Leif Nordberg}, Statistics Finland.
711:
712: \bibitem{Pra89} Praag, B.M.S. van and Wesselman, B.M. (1989). `Elliptical multivariate analysis.' \textit{Journal of Econometrics} \textbf{41}: pp. 189-203.
713:
714: \bibitem{Smi02} Schmidt, R. (2002). `Tail dependence for elliptically contoured distributions.' \textit{Mathematical Methods of Operations Research} \textbf{55}: pp. 301-327.
715:
716: \bibitem{Tob58} Tobin, J. (1958). `Liquidity preference as behavior towards risk.' \textit{Review of Economic Studies} \textbf{25}: pp. 65-86.
717:
718: \bibitem{Tyl83} Tyler, D.E. (1983). `Robustness and efficiency properties of scatter matrices.' \textit{Biometrika} \textbf{70}: pp. 411-420.
719:
720: \bibitem{Tyl87} Tyler, D.E. (1987). `A distribution-free $M$-estimator of multivariate scatter.' \textit{The Annals of Statistics} \textbf{15}: pp. 234-251.
721:
722: \bibitem{Vis01} Visuri, S. (2001). `Array and multichannel signal processing using nonparametric statistics.' Ph.D. thesis, Helsinki University of Technology, Signal Processing Laboratory, Finland.
723:
724: \bibitem{Yin86} Yin, Y.Q. (1986). `Limiting spectral distribution for a class of random matrices.' \textit{Journal of Multivariate Analysis} \textbf{20}: pp. 50-68.
725:
726: \end{thebibliography}
727: \end{document}
728: