1: \documentclass[11pt]{article}
2: \usepackage{psfig,amsmath,amssymb,euscript}
3: %\usepackage{showkeys}
4: \usepackage{lscape}
5: \renewcommand{\baselinestretch}{1.6}
6: \oddsidemargin 0in
7: \evensidemargin 0in
8: %\topmargin -0.3in
9: \topmargin -0.7in
10: \textwidth 6.4in
11: \textheight 9.25in
12: \makeatletter
13: \newcommand{\fsize}{\footnotesize}
14: \newcommand{\bvxi}{\mbox{\boldmath$\xi$}}
15: \input{QYalias}
16: 
17: 
18: \begin{document}
19: \title{\bf Modelling Multivariate Volatilities via Conditionally
20: Uncorrelated Components\thanks{Partially supported by an EPSRC
21: research grant and by NSF grant DMS-0355179.}}
22: \author{
23: Jianqing Fan$^{1,2}$
24: \quad \quad Mingjin Wang$^{2,3}$
25:  \quad \quad Qiwei Yao$^{2,3}$\\[2ex]
$^1$ Bendheim Center for Finance and \\
27: Department of Operations Research and Financial Engineering\\
28: Princeton University, Princeton, NJ 08544, USA\\[1ex]
29: $^2$Department of Statistics, London School of Economics, London, WC2A
30: 2AE, UK\\[1ex]
31: $^3$ Guanghua School of Management, Peking University, Beijing 100871, China}
32: 
33: \date{}
34: 
35: 
36: 
37: \maketitle
38: 
39: \begin{abstract}
We propose to model multivariate volatility processes through
newly defined conditionally uncorrelated components (CUCs). The model
gives a parsimonious representation for matrix-valued processes.
It is flexible in the sense that each CUC may be fitted with any
appropriate univariate volatility model. Computationally, it splits
one high-dimensional optimization problem into several lower-dimensional
subproblems. Consistency of the estimated CUCs is established.
A bootstrap test is proposed for testing the existence
of CUCs. The proposed methodology is illustrated with both simulated and
real data sets.
50: \end{abstract}
51: 
52: \noindent
53: {\sl Key words}:
54: dimension reduction,
55: extended GARCH(1,1),
56: financial returns,
57: multivariate volatility,
58: portfolio volatility,
59: time series.
60: 
61: \newpage
62: 
63: \section{Introduction}
64: 
One of the most prolific areas of research in the financial
econometrics literature in the last two decades has been the modelling of
time-varying volatility of financial returns. Many statistical
models, most designed for univariate data, have been proposed for
this purpose. From a practical point of view, there are at least
two incentives to model several financial returns jointly. First,
time-varying correlations among different securities are important
and useful information for portfolio optimization, asset pricing
and risk management. Secondly, the modelling of a single security may
be improved by incorporating relevant information from other
securities. The quest for modelling multivariate processes, which
are often represented by conditional covariance matrices, has
motivated attempts to extend univariate volatility models to
multivariate cases, aiming for practical and/or statistical
effectiveness. We list some of these endeavors below.
80: 
81: 
82: Let $\{ \bX_t \}$ be a vector-valued (return) time series with
83: \[
84: E(\bX_t | \calF_{t-1} ) = 0, \quad \quad
85: \var( \bX_t | \calF_{t-1} ) = \bSigma_t \equiv \big( \sigma_{t,ij} \big),
86: \]
where $\calF_t$ is the $\sigma$-algebra generated by $\{ \bX_t,
\bX_{t-1}, \cdots \}$, and $\bSigma_t$ is an
$\calF_{t-1}$-measurable $d\times d$ positive semi-definite
matrix. One of the most general multivariate GARCH($p,q$) models is
the BEKK representation (Engle and Kroner~1995)
92: \begin{eqnarray}
93: \label{a3}
94: \bSigma_t = \bC + \sum_{i=1}^p \sum_{j=1}^m \bA_{ij} \bX_{t-i} \bX_{t-i}^\tau
95: \bA_{ij}^\tau
96: + \sum_{i=1}^q \sum_{j=1}^m \bB_{ij} \bSigma_{t-i} \bB_{ij}^\tau,
97: \end{eqnarray}
where $\bC, \bA_{ij}, \bB_{ij}$ are $d\times d$ matrices, and
$\bC$ is positive definite (denoted by $\bC>0$).
Although the form of the above model is quite general, especially when
$m$ is reasonably large (Proposition~2.2 of Engle and Kroner 1995), it
suffers from overparametrization. As with multivariate ARMA models, not all
parameters in model (\ref{a3}) are necessarily
identifiable even when $m=1$.
Overparametrization also leads to a flat likelihood function, making
statistical inference intrinsically difficult and computationally
troublesome. See, for example, Engle and Kroner~(1995), and Jerez, Casals
and Sotoca~(2001).
110: 
111: To overcome the difficulties due to overparametrization, a dynamic
112: conditional correlation (DCC) model (Engle 2002, Engle and Sheppard~2001)
113: has been proposed. It is based on the decomposition
114: \begin{equation} \label{a4}
115: \bSigma_t = \bD_t \bR_t \bD_t,
116: \end{equation}
117:  where $\bD_t = \diag( \sigma_{t,11}^{1/2},
118: \cdots, \sigma_{t,dd}^{1/2} )$, $\sigma_{t,ii}$ is the conditional
119: variance of the $i$-th component of $\bX_t$, and $\bR_t \equiv
120: (\rho_{t,ij})$ is the conditional correlation
matrix. A simple way to implement such a model is to fit each
$\sigma_{t,ii}$ with a univariate volatility model and to estimate the
conditional correlations by
rolling exponential smoothing:
125: \[
126: \rho_{t,ij} =  \sum_{k=1}^{t-1} \la^k \ve_{t-k,i} \ve_{t-k,j}
127: \Big/ \Big\{ \sum_{k=1}^{t-1} \la^k \ve_{t-k,i}^2
128: \; \sum_{k=1}^{t-1} \la^k \ve_{t-k,j}^2 \Big\}^{1/2},
129: \]
130: where $\ve_{ti}= X_{ti}/\sigma_{t,ii}^{1/2}$. Even with such a
131: simple specification, estimation typically involves solving a
132: high-dimensional optimization problem as, for example, the
133: Gaussian likelihood function cannot be factorized into several
134: lower-dimensional functions. To overcome the computational
135: difficulty, Engle~(2002) proposes a two-step estimation procedure
136: as follows: first fit each $\sigma_{t,ii}$ in (\ref{a4}) with a
137: univariate GARCH(1,1) model using the observations on the $i$-th
138: component of $\bX_t$ only, and then  model the conditional
139: correlation matrix $\bR_t$ by a simple GARCH(1,1) form
140: \begin{equation}
141: \label{a5}
{\bR}_t={\bf S}(1-\theta_1-\theta_2)+\theta_1
({\bve}_{t-1}{\bve}_{t-1}^{\tau})+\theta_2 {\bR}_{t-1},
144: \end{equation}
where $\bve_t$
is the $d\times 1$ vector of
standardized residuals obtained from the separate GARCH(1,1) fits to
the $d$ components of $\bX_t$,
and ${\bf S}$ is
the sample correlation matrix of $\bX_t$. Note that there are only two unknown parameters,
$\theta_1$ and $\theta_2$,
in the dynamic correlation model (\ref{a5}), so it can be easily implemented
even for large or very large $d$. However, it
may not provide an adequate fit when the components of $\bX_t$ exhibit
different dynamic correlation structures; see the example with a three-dimensional
data set in Section~4 below. Furthermore, in modelling the volatility of
each component, no attempt is made to extract additional
information from the other components.
159: 
160: 
161: 
162: Alexander (2001) proposes an orthogonal GARCH model which fits
163: each principal component (PC) with a univariate GARCH model
164: separately, and treats all PCs as {\sl conditionally} uncorrelated
165: random variables. Since PCs are only unconditionally uncorrelated,
166: such a misspecification may lead to non-negligible errors in the
167: fitting; see, for example, Figure~5 and related discussions in
168: section~4 below.
169: 
Other multivariate volatility models include, for
example, the vectorized multivariate GARCH model of Bollerslev, Engle
and Wooldridge~(1988), the constant conditional
correlation
multivariate GARCH model of Bollerslev~(1990),
the multivariate stochastic volatility model of Harvey, Ruiz and
Shephard~(1994),
the generalized
orthogonal GARCH model of van der Weide~(2002), and
the easy-to-fit ad hoc
approach of Wang and Yao~(2005); see also the survey of Bauwens, Laurent and
Rombouts~(2003) and the references therein.
182: 
While all the aforementioned models have their own merits,
each of them suffers from one or more of three drawbacks: (i)
overparametrization, (ii) computational complication, and (iii) being too
simple to capture some
important dynamic structures.
188: 
In this paper, we propose a new modelling methodology which
mitigates the above three drawbacks. The basic idea is to assume
that $\bX_t$ is a linear combination of a set of {\sl
conditionally uncorrelated components} (CUCs); see Section~2.1
below. One fundamental difference from the orthogonal GARCH model
is that, instead of PCs, we use CUCs, which are genuinely
conditionally uncorrelated. The advantages of the new approach
include: (i)~the CUC decomposition leads to a parsimonious
representation for multivariate volatility (matrix-valued)
processes, and there are no model identification problems, (ii)~it has
the flexibility to model each CUC with any appropriate univariate
volatility model, (iii)~computationally it splits a
high-dimensional optimization problem into several
lower-dimensional subproblems, and (iv)~it allows the volatility
model for one CUC to depend on the lagged values of the other
CUCs.
205: 
The idea of using CUCs is similar to that of independent
component analysis (Hyv\"arinen, Karhunen and Oja 2001). However,
instead of requiring the component series to be mutually
independent, we only impose the weaker condition that the component
series are conditionally uncorrelated; see (\ref{b1}) below. Of
course, the existence of CUCs is not always guaranteed. We
propose a bootstrap test to assess the feasibility of such an
approach. Our empirical experience shows that for a large number
of practical examples, there is no significant evidence to reject
the hypothesis that the CUCs exist.
216: 
217: Literature on applying independent components  analysis to
218: financial and economic time series includes, for example, Back and
219: Weigend (1997), Kiviluoto and Oja (1998), M${\breve {\rm
220: a}}$l${\breve {\rm a}}$roiu, Kiviluoto and Oja (2000), and van der
Weide (2002). Although our basic idea is somewhat similar to that of van
der Weide~(2002), our approach is completely different.
223: 
224: The rest of the paper is organized as follows. Section~2 contains
225: a detailed description of the proposed new methodology and the
226: associated theoretical results. Simulation results are reported in
227: section~3. Illustrations with real data examples are presented in
section~4. Technical proofs are relegated to the appendices.
229: 
230: 
231: 
232: 
233: 
234: 
235: 
236: \section{Methodology}
237: 
238: \subsection{Basic setting}
239: 
To simplify matters, we assume $\var(\bX_t) =
\bI_d$, the $d\times d$ identity matrix. In practice, this
amounts to replacing $\bX_t$ by $\bS^{-1/2}\bX_t$, where
$\bS$ is the sample covariance matrix of $\bX_t$.
We assume that each component of $\bX_t$ is a linear
combination of $d$ conditionally uncorrelated components (CUCs)
$Z_{t1}, \cdots, Z_{td}$ which satisfy the conditions $E(Z_{ti}|
\calF_{t-1} )=0$, $\var(Z_{ti}) =1$, and
248: \begin{equation} \label{b1}
249: E(Z_{ti}Z_{tj} | \calF_{t-1} ) = 0, \quad \mbox{for all } i\ne j.
250: \end{equation}
251: Put $\bZ_t = (Z_{t1}, \cdots, Z_{td})^\tau$.
252: The above setting implies that
253:  \begin{equation} \label{b2}
254:  \bX_t = \bA \bZ_t, \quad \bZ_t = \bA^\tau \bX_t,
255:  \end{equation}
256: for a constant matrix $\bA$. Furthermore, $ \var(\bZ_t) = \bA^\tau
257: \var(\bX_t) \bA = \bA^\tau \bA =\bI_d$. Hence
258: $\bA$ is a $d\times d$ orthogonal matrix with ${d\over 2}(d-1)$
259: free elements. Put
260: \begin{equation} \label{b3}
261: \var(\bZ_t|\calF_{t-1}) = \diag( \sigma_{t1}^2, \cdots, \sigma_{td}^2),
262: \end{equation}
i.e. $\sigma_{tj}^2 = \var(Z_{tj} | \calF_{t-1})$. It is easy to see
that once we
have specified $\sigma_{tj}^2$, the volatility of
the $j$-th CUC, for $j=1, \cdots, d$,
the volatility of any portfolio can be deduced accordingly. For
example, for any two portfolios $\xi_t = \bb^\tau_1 \bX_t$ and $\eta_t
= \bb^\tau_2 \bX_t$ it holds that
270: \[
271: \var(\xi_t | \calF_{t-1}) = \sum_{j=1}^d b_{j1}^2
272: \, \sigma_{tj}^2, \quad \quad \quad
273: \cov(\xi_t, \eta_t | \calF_{t-1}) = \sum_{j=1}^d b_{j1} b_{j2}
\, \sigma_{tj}^2,
\]
where $(b_{1j}, \cdots, b_{dj}) = \bb_j^\tau \bA$ $(j=1, 2)$.
Hence, the CUC decomposition (\ref{b2}) facilitates parsimonious
modelling of a $d$-dimensional multivariate volatility process via
$d$ univariate volatility models. In this way, we substantially reduce the
number of parameters involved.
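
As an illustrative check of the two portfolio formulas above (a sketch added
for clarity; the symbol $\Lambda_t$ is shorthand introduced only here), write
$\Lambda_t = \diag(\sigma_{t1}^2, \cdots, \sigma_{td}^2)$. Since
$\bX_t = \bA \bZ_t$ with $\bA$ constant, (\ref{b3}) gives
$\var(\bX_t | \calF_{t-1}) = \bA \Lambda_t \bA^\tau$, so that
\[
\cov(\xi_t, \eta_t | \calF_{t-1})
= \bb_1^\tau \bA \, \Lambda_t \, \bA^\tau \bb_2
= \sum_{j=1}^d (\bb_1^\tau \bA)_j \, (\bb_2^\tau \bA)_j \, \sigma_{tj}^2 ,
\]
where $(\cdot)_j$ denotes the $j$-th element of a row vector. Taking
$\bb_2 = \bb_1$ gives the expression for $\var(\xi_t | \calF_{t-1})$.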
281: 
282: 
283: \subsection{Estimation of CUCs}
284: 
285: \subsubsection{Estimation procedure}
286: 
By (\ref{b2}), $Z_{tj} = \ba_j^\tau \bX_t$, and $ \ba_1, \cdots,
\ba_d$ are $d$ orthonormal vectors. The goal is to estimate the
orthogonal matrix $ \bA =( \ba_1, \cdots, \ba_d) $. Note that the
order of $\ba_1, \cdots, \ba_d$ is arbitrary and cannot be
identified. Furthermore, $\ba_j$ can be replaced by $-\ba_j$.
292: 
293: Condition (\ref{b1}) is equivalent to
294: \begin{equation} \label{b5}
295: \max_{B \in \calB_t } \big| E\{ Z_{ti} Z_{tj} I(B) \} \big| = 0
296: \end{equation}
297: for any $\pi$-class $\calB_t \subset \calF_{t-1}$ such that the
298: $\sigma$-algebra generated by $\calB_t$ is equal to $\calF_{t-1}$
299: (Theorem~7.1.1 of Chow and Teicher, 1997).
In practice, we use some simple $\calB_t$ for the sake of
tractability. This leads to choosing an orthogonal matrix $\bA =
( \ba_1, \cdots, \ba_d )$ which minimizes
303: \begin{equation} \label{b6}
304: \Psi_n(\bA) \equiv
305: \sum_{1\le i < j \le d} \; \sup_{B \in \calB,\, 1\le k \le k_0 }\;
306:  {1 \over n-k} \Big|\ba_i^\tau\Big\{ \sum_{t=k+1}^n \bX_t
307: \bX_t^\tau I( \bX_{t-k} \in B ) \Big\} \ba_j \Big|,
308: \end{equation}
where $\calB$ is a collection of subsets of $\RR^d$, and $k_0 \ge 1$ is
a prescribed integer. We denote by $\wh \bA = ( \wh\ba_1, \cdots,
\wh\ba_d )$ the resulting estimator.
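
For computation it may help to rewrite (\ref{b6}) in matrix form. The
following is only a sketch, and the notation $\wh M_{k,B}$ is introduced
here for illustration. With $\ba_i$ the $i$-th column of $\bA$, as implied by
(\ref{b2}), put
\[
\wh M_{k,B} = {1 \over n-k} \sum_{t=k+1}^n \bX_t \bX_t^\tau
I( \bX_{t-k} \in B ), \qquad
\Psi_n(\bA) = \sum_{1\le i < j \le d} \; \sup_{B \in \calB,\, 1\le k \le k_0 }
\big| ( \bA^\tau \wh M_{k,B} \bA )_{ij} \big| ,
\]
where $(\cdot)_{ij}$ denotes the $(i,j)$-th element. The matrices
$\wh M_{k,B}$ do not depend on $\bA$, so they need to be computed only once
when $\Psi_n$ is evaluated at many candidate matrices $\bA$.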
312: 
313: % Note that when  $\calB$ consists of only two sets, empty set and the
314: % whole $d$-dimensional space $\RR^d$, $\Psi_n(\bA)$ is basically the same
315: % as
316: % $$
317: % \sum_{1\le i < j \le d} {1 \over n-1} \Big|\ba_i^\tau\Big\{ \sum_{t=2}^n \bX_t
318: %  \bX_t^\tau  \Big\} \ba_j \Big|.
319: % $$
320: % Hence, $\{\wh \ba_j, j=1, \cdots, d\}$ are the principal components.  In
321: % other words, our model becomes the orthogonal GARCH model in Alexander
322: % (2001).
323: 
324: Since the order of $\ba_1, \cdots, \ba_d$ is arbitrary, we measure the
325: estimation error by
326: \begin{equation} \label{b7}
327: D(\wh \bA, \; \bA) = 1 - {1\over d}
328: \sum_{i=1}^d   \max_{1\le j \le d}  | \ba_i^\tau \wh \ba_j| .
329: \end{equation}
330: Note that for any orthogonal matrices $\bA$ and $ \bB$, $D(\bA,
331: \bB)\ge 0$. Furthermore, if the columns of $\bA$ are obtained from
332: a permutation of the columns of $\bB$ or their reflections, $D(\bA, \bB) = 0$.
333: In fact $\Psi_n(\bA) = \Psi_n(\bB)$ if and only if $D(\bA, \bB) = 0$.
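
To illustrate (\ref{b7}) in the simplest case (an example added here, not
taken from the numerical work of the paper), let $d=2$, $\bA = \bI_2$ and
\[
\wh\bA = \begin{pmatrix} \cos\varphi & -\sin\varphi \\
\sin\varphi & \cos\varphi \end{pmatrix}.
\]
Then $|\ba_1^\tau \wh\ba_1| = |\ba_2^\tau \wh\ba_2| = |\cos\varphi|$ and
$|\ba_1^\tau \wh\ba_2| = |\ba_2^\tau \wh\ba_1| = |\sin\varphi|$, so that
\[
D(\wh\bA, \bA) = 1 - \max( |\cos\varphi|, |\sin\varphi| ).
\]
This vanishes at $\varphi = 0$ and at $\varphi = \pi/2$, where the columns
of $\wh\bA$ are a permutation of those of $\bA$ up to sign, and it attains
its largest value $1 - 1/\sqrt{2} \approx 0.293$ at $\varphi = \pi/4$.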
334: 
335: 
In practice, we may let $\calB$ consist of balls with an appropriately
selected radius (such that each ball contains sufficiently many data
points) centered on a grid in the sample space of $\bX_t$.
For example, we may use as the centres of the balls those observations $\bX_t$ for
which at least one of the components of $\bX_t$ is the 10th, the 20th, $\cdots$,
or the 90th sample percentile of the corresponding component observations.
342: 
To avoid handling the constraint $ \bA^\tau\bA = \bI_d$ directly
when solving the above optimization problem,
we reparametrize $\bA$ through the decomposition
346: \begin{equation} \label{b8}
347: \bA = \prod_{1\le i < j \le d} \bE_{ij}(\varphi_{ij}),
348: \end{equation}
349: where $\bE_{ij}(\varphi_{ij})$ is obtained from the identity matrix
350: $\bI_d$ with the following replacements: both the $(i,i)$-th and the
351: $(j,j)$-th elements are replaced by $\cos \varphi_{ij}$, the $(i,j)$-th
352: and the $(j,i)$-th elements are replaced, respectively, by $\sin
353: \varphi_{ij}$ and $-\sin \varphi_{ij}$ (Vilenkin 1968, van der Weide 2002).
354: Obviously $\bE_{ij}(\varphi_{ij})$ is an orthogonal matrix, so is $\bA$
355: given in (\ref{b8}). Writing $\bA$ in (\ref{b2}) in the form of
356: (\ref{b8}), the constrained minimization of (\ref{b6}) over
357: orthogonal $\bA$ is transformed to an unconstrained minimization
358: problem over a ${d(d-1)\over 2}\times 1$ vector $\bvarphi =
359: (\varphi_{12}, \varphi_{13},\cdots, \varphi_{1d}, \varphi_{23}, \cdots,
360: \varphi_{d-1,d} )^\tau$. This minimization problem is typically
361: solved by iterative algorithms.
362: We stop the iteration when $D(\bA_k, \bA_{k+1})$ is
363: smaller than a prescribed small number, where $\bA_k$ denotes the
364: value of $\bA$ in the $k$-th iteration, and $D$ is defined as in
365: (\ref{b7}).
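
For concreteness, the parametrization (\ref{b8}) can be written out for small
$d$. This is an illustration added here; the order of the factors,
$\bE_{12}\bE_{13}\bE_{23}$, is an assumption, since the order of the product
is not specified above. For $d=2$ there is a single angle and
\[
\bA = \bE_{12}(\varphi_{12}) =
\begin{pmatrix} \cos\varphi_{12} & \sin\varphi_{12} \\
-\sin\varphi_{12} & \cos\varphi_{12} \end{pmatrix}.
\]
For $d=3$ there are $d(d-1)/2 = 3$ angles and, writing
$c_{ij} = \cos\varphi_{ij}$ and $s_{ij} = \sin\varphi_{ij}$,
\[
\bE_{12} = \begin{pmatrix} c_{12} & s_{12} & 0 \\ -s_{12} & c_{12} & 0 \\
0 & 0 & 1 \end{pmatrix}, \quad
\bE_{13} = \begin{pmatrix} c_{13} & 0 & s_{13} \\ 0 & 1 & 0 \\
-s_{13} & 0 & c_{13} \end{pmatrix}, \quad
\bE_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c_{23} & s_{23} \\
0 & -s_{23} & c_{23} \end{pmatrix},
\]
with $\bA = \bE_{12}\bE_{13}\bE_{23}$. Each factor is a rotation in one
coordinate plane, so any real values of the angles yield an orthogonal $\bA$.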
366: 
\noindent {\bf Remark 1}. In practice, we may replace (\ref{b6})
by a weighted version
$$
\Psi_n(\bA) = \sum_{1\le i < j \le d} \; \sup_{B \in \calB,\, 1\le
k \le k_0 }\;
 {1 \over n-k} \Big|\ba_i^\tau\Big\{ { \sum_{t=k+1}^n \bX_t
\bX_t^\tau [I( \bX_{t-k} \in B ) + \varepsilon_0] \over
\sum_{t=k+1}^n [I( \bX_{t-k} \in B )+\varepsilon_0] } \Big\} \ba_j
\Big|,
$$
where $\varepsilon_0$ is a small constant guarding against a zero
denominator. This puts more emphasis on small sets $B$.
Furthermore, the supremum over $k$ in (\ref{b6}) may be replaced
by a summation over~$k$.
381: 
382: \subsubsection{Asymptotic properties}
383: 
We first introduce two concepts: mixing, which measures how fast
the auto-dependence of a time series decays over an increasing time span, and
the Vapnik--\v{C}ervonenkis (or VC) index, which measures
the complexity of a collection of sets.

Let $\calF_{i}^j$ be the $\sigma$-algebra generated by $\{\bX_t, i
\leq t \leq j \}$. The $\beta$-mixing coefficient is defined as
391: $$
392:  \beta(n) = E \left \{ \sup_{ B \in
393:   \calF_n^\infty} | P(B) - P(B|\calF_{-\infty}^0 ) | \right \}.
394: $$
395: (See \S 2.6.1 of Fan and Yao, 2003.)
396: 
397: 
398: For an arbitrary set of $n$ points $\{x_1, \cdots, x_n \}$, there are
$2^n$ possible subsets. We say that $\calB$ picks out a certain subset
of $\{x_1, \cdots, x_n\}$ if this subset can be written in the
form $B \cap \{x_1, \cdots, x_n\}$ for some set $B$ in $\calB$. The
402: collection $\calB$ shatters $\{x_1, \cdots, x_n\}$ if each of its
403: $2^n$ subsets can be picked out by $\calB$.  The VC-index of $\calB$
404: refers
405: to the smallest $n$ for which no set of size $n$ is shattered by
406: $\calB$. A collection of sets $\calB$ is called a VC-class if its
407: VC-index is finite.  The collections of sets of rectangles, balls and
408: their unions are VC-classes. See Chapter 2.6 of van der Vaart and
409: Wellner (1996) for further discussion on VC-classes.
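
As a simple illustration (a standard textbook example, added here and not
specific to this paper), let $\calB$ be the collection of half-lines
$(-\infty, c]$, $c \in \RR$. Any single point $\{x_1\}$ is shattered, since
$\emptyset$ and $\{x_1\}$ are picked out by taking $c < x_1$ and
$c \ge x_1$ respectively. No two-point set $\{x_1, x_2\}$ with $x_1 < x_2$
is shattered, because a half-line containing $x_2$ must also contain $x_1$,
so the subset $\{x_2\}$ cannot be picked out. Hence the VC-index of this
collection is 2.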
410: 
Under the regularity conditions listed below, the estimator $\wh \bA$
is consistent; see Theorem~1. Its proof is relegated to Appendix~A.
413: \begin{quote}
414:  (A1) The collection $\calB$ of sets  in $\RR^d$  is a VC-class.
415: 
 (A2) The process $\{ \bX_t \}$ is strictly stationary with $E||
\bX_t ||^2 < \infty$, where $||\cdot||$ denotes the Euclidean
norm. Furthermore, the $\beta$-mixing coefficients of $\{\bX_t \}$
satisfy $\beta(n) = O(n^{-b})$ for some $b > 0$.
420: 
421: (A3) There exists a $d\times d$ orthogonal matrix $\bA_0$ which
422: minimises
423: \[
424: \Psi(\bA) \equiv \sum_{1\le i < j \le d}  \sup_{1 \leq k \leq k_0,
425: B \in \calB} \big| E \{ \ba_i^\tau \bX_t \bX_t^\tau \ba_j
426: I(\bX_{t-k} \in B) \} \big|.
427: \]
428: Furthermore the minimum value of $\Psi$ is obtained at an orthogonal
429: matrix $\bA$ if and only if $D(\bA, \bA_0) = 0$.
430: 
(A4) $E \| \bX_t \|^{2p} < \infty$ for some $p >2$, and the
$\beta$-mixing condition in (A2) holds with $b > p/(p-2)$.
433: 
434: 
435: (A5) $\Psi(\bA_0) - \Psi(\bA) \le - a D(\bA, \bA_0)$
436: for any orthogonal matrix $\bA$ such that $D(\bA, \bA_0)$ is smaller than a
437: small but fixed constant, where $a > 0$ is a constant.
438: \end{quote}
439: 
\noindent{\bf Remark 2}. Let $\calH$ be the set consisting of all
$d\times d$ orthogonal matrices.
Then $\calH$ may be partitioned into the equivalence classes defined
by the distance $D$ in (\ref{b7}) as follows: the $D$-distance
between any two elements within an equivalence class is 0, and the
$D$-distance between
any two elements from different classes is greater than 0.
Let $\calH_D$ be the quotient space $\calH/D$ consisting of those
equivalence classes in $\calH$, i.e. we treat $\bA$ and $\bB$ as the
same element in $\calH_D$ if and only if $D(\bA, \bB) =0$.
Condition (A3) ensures that $\bA_0$ is the unique minimiser
of $\Psi(\bA)$ on $\calH_D$.
In fact, both $\Psi(\cdot)$ and $\Psi_n(\cdot)$ are
Lipschitz continuous on $\calH_D$ with respect to the $D$-distance; see Lemma~1 in Appendix~A
below.
455: 
456: 
457: 
458: 
459: \askip
460: 
461: \noindent {\bf Theorem 1}. Let $k_0\ge 1$ be a fixed integer.
462: Under conditions (A1)--(A3), $D(\wh \bA, \bA_0) \to 0$ almost
463: surely as $n \to \infty$.  If, in addition, condition (A4) holds,
464: then
465: $$
466: \Psi_n (\bA) - \Psi(\bA) = O_P(n^{-1/2}), \quad \mbox{for any
467: orthogonal $\bA$.}
468: $$
469: Furthermore, $n^{1/2} D(\wh\bA, \bA_0) = O_P(1)$ provided that, in
470: addition, condition (A5) also holds.
471: 
472: 
473: 
474: 
When the CUCs exist, namely $\Psi(\bA_0) = 0$, $\bA_0$ corresponds to the
transform for the CUCs. When CUCs do not exist, Theorem 1
continues to hold. In this case, $\Psi(\bA_0) \not = 0$ and indeed
$\bA_0$ can depend on the $\pi$-class $\calB$.
In practice, we do not know whether CUCs exist
or not. Our aim then naturally becomes to find an
orthogonal transform such that the resulting components are as
little conditionally correlated as possible. Observe that the
conditional correlation criterion can be written as
$$
  \Psi(\bA) = \sum_{1\le i < j \le d}  \sup_{1 \leq k \leq k_0,
   B \in \calB} \big| \mbox{Corr} (\ba_i^\tau \bX_t, \ba_j^\tau \bX_t |
   \bX_{t-k} \in B ) \big | P( \bX_{t-k} \in B).
$$
Thus, a reasonable criterion is to find an orthogonal transform
$\bA$ that minimizes $\Psi(\bA)$. The following theorem shows that
our estimation method possesses some degree of robustness and is
asymptotically at least as good as the principal component transform in terms of
minimizing the conditional correlation criterion $\Psi(\bA)$.
494: 
495: \noindent {\bf Theorem 2}. Let $k_0\ge 1$ be a fixed integer.
496: Under conditions (A1), (A2),  for any other orthogonal transform
497: $\hat{\bB}$, we have
498: $$
499:    \liminf  \{\Psi(\hat{\bA}) - \Psi(\hat{\bB})\} \leq 0.
500: $$
501: %If, in addition, $\bA_0$ is the unique minimizer of $\Psi(\bA)$ on
502: %the quotient space $\calH_D$, then $D(\hat{\bA}, \bA_0) \to 0$
503: %almost surely.
504: 
505: Theorem 2 shows for any other orthogonal transform $\hat{\bB}$,
506: asymptotically, the transformed components have higher conditional
507: correlation, in terms of $\Psi(\cdot)$, than those transformed by
508: $\hat{\bA}$.
509: 
510: 
511: 
512: \subsection{Modelling volatilities for CUCs}
513: 
514: 
Once the CUCs have been identified, we may fit each
$\sigma_{tj}^2$ with any appropriate univariate volatility model,
for example, a GARCH model, a stochastic volatility model, or a
nonparametric or semiparametric volatility model. As a simple
illustration, we establish below an extended GARCH(1,1) model for
each $\sigma_{tj}^2$ given in (\ref{b3}).
521: 
522: \subsubsection{Extended GARCH(1,1) models}
523: 
524: We assume, for the $j$-th CUC, $j=1, \cdots, d$,
525: \begin{equation} \label{b9}
526: Z_{tj} = \sigma_{tj} \ve_{tj}, \quad \quad \sigma_{tj}^2 = \ga_j +
527: \sum_{i=1}^d \alpha_{ji} Z_{t-1, i}^2 + \beta_j \sigma_{t-1,j}^2,
528: \end{equation}
where $ \{\ve_{tj}, \; -\infty < t < \infty\} $ is a sequence of i.i.d.
random variables with mean 0 and
variance 1, $\ve_{tj}$ is independent of $\calF_{t-1}$, $\ga_j
>0$ and $\alpha_{ji}, \beta_j \ge 0$.
Compared with the standard GARCH(1,1) model, this model contains the
$d-1$ extra terms $\sum_{i \not = j}
\alpha_{ji} Z_{t-1, i}^2$,
which capture possible association between the $j$-th CUC
and the other CUCs, while the conditional zero-correlation
condition (\ref{b1}) still holds. When $\alpha_{ji} \not = 0$ for some
$i \ne j$, the $i$-th component is said to be causal in
variance for the $j$-th component (Engle, Ito and Lin~1991).
540: 
541: 
542: In practice, we expect that $\sigma_{tj}^2$ may depend on
543: $Z_{t-1, i}^2$ only for a small number of $i$'s, including $i=j$, i.e. many
544: coefficients $\alpha_{ji}$ (for $i\ne j$) may be 0.
545: Section~2.3.3 below outlines a data-analytic approach for
546: building such a component-dependent model.
547: 
548: 
549: When $\beta_j \in [0, 1)$, (\ref{b9}) implies
550: \begin{equation} \label{b10}
551: \sigma_{tj}^2 = \var(Z_{tj} |\calF_{t-1}) = {\ga_j \over 1
552: -\beta_j} + \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^\infty
553: \beta_j^{k-1} Z_{t-k,\, i}^2.
554: \end{equation}
Put $\bZ_t = (Z_{t1}, \cdots, Z_{td})^\tau$. Theorem~3 below gives
a sufficient condition for the existence of a stationary solution to
model~(\ref{b9}).
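
The step from (\ref{b9}) to (\ref{b10}) is repeated substitution. The
following sketch is added for illustration; the truncation index $m$ is
notation introduced only here. Substituting the second equation of
(\ref{b9}) into itself $m$ times gives
\[
\sigma_{tj}^2 = \ga_j \sum_{k=1}^m \beta_j^{k-1}
+ \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^m \beta_j^{k-1} Z_{t-k,\, i}^2
+ \beta_j^m \, \sigma_{t-m,j}^2 .
\]
For $\beta_j \in [0,1)$ the first sum converges to $1/(1-\beta_j)$ as
$m \to \infty$, and, assuming a stationary solution with finite second
moment, the remainder $\beta_j^m \sigma_{t-m,j}^2$ tends to 0 in mean,
which leads to (\ref{b10}).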
558: 
559: 
560: \askip
561: 
562: \noindent {\bf Theorem 3}. (i) The extended GARCH(1,1) model
563: (\ref{b9}) defines a unique $d$-dimensional strictly stationary
564: process $\{ \bZ_t \}$ with $E || \bZ_{t}||^2 < \infty$ under the
565: condition
566: \begin{equation} \label{b11}
567: r\cdot \max\{\alpha_{j1}, \cdots, \alpha_{jd} \} + \beta_j < 1,
568: \quad \quad 1\le j \le d,
569: \end{equation}
570: where $r = \max_{1\le j \le d} d_j$, and $d_j$ is the number of non-vanishing
571: coefficients among $ \alpha_{j1}, \cdots, \alpha_{jd} $.
572: 
573: (ii) Under condition (\ref{b11}), $E(Z_{tj}^2) = 1 $ for all $1\le
574: j \le d$ if and only if
575: \begin{equation}
576: \ga_j = 1 - \beta_j - \sum_{i=1}^d \alpha_{ji} , \quad \quad \quad
577: 1\le j \le d.  \label{b12}
578: \end{equation}
579: 
580: \askip
581: 
The proof of the above theorem is given in Appendix B. When
$\alpha_{ji}= 0$ for all $i \not = j$, i.e. each $Z_{tj}$ follows
a standard GARCH(1,1) model, (\ref{b11}) reduces to $\alpha_{jj} +
\beta_j < 1$, which is the necessary and sufficient condition for
the existence of a unique strictly stationary solution with finite
second moments for the corresponding GARCH(1,1) model; see Chen
and An (1998). In practice, condition (\ref{b11}) may often be
violated, indicating that a GARCH
specification for $\sigma_{tj}^2$ is likely to be inappropriate. However, if we view the right-hand
side of (\ref{b10}) as an approximation for $\sigma_{tj}^2$,
such an approximating process is strictly stationary under the weaker
condition $\beta_j <1$. For further discussion of the
approximation point of view, we refer to Penzer, Wang and Yao~(2004).
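
The constraint (\ref{b12}) can be motivated by taking expectations in
(\ref{b9}); this is a heuristic sketch added here, not a substitute for the
proof in Appendix~B, and $\mu_j$ is notation introduced only for this
display. Since $\ve_{tj}$ is independent of $\calF_{t-1}$ with unit
variance, $E(Z_{tj}^2) = E(\sigma_{tj}^2) \equiv \mu_j$ under stationarity,
and hence
\[
\mu_j = \ga_j + \sum_{i=1}^d \alpha_{ji} \, \mu_i + \beta_j \, \mu_j ,
\quad \quad 1\le j \le d.
\]
If $\mu_i = 1$ for all $i$, this reads
$1 = \ga_j + \sum_{i=1}^d \alpha_{ji} + \beta_j$, which is (\ref{b12}).
The converse direction requires showing that $\mu_1 = \cdots = \mu_d = 1$ is
the only solution of this linear system under (\ref{b11}).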
595: 
596: 
\subsubsection{Quasi-MLE}
598: 
To construct a likelihood, we assume hypothetically
that $\ve_{tj}$ in (\ref{b9}) has a density $f(\cdot)$,
for example a standard normal, a generalized Gaussian
or a $t$ density. The implied (negative)
log-likelihood function for $\btheta_j \equiv (\alpha_{j1},
\cdots, \alpha_{jd}, \beta_j)^\tau$ is
\begin{equation} \label{b13}
l_j(\btheta_j ) = \sum_{t=\nu+1}^n \big\{
\log \sigma_{tj}(\btheta_j)  - \log f(Z_{tj}/\sigma_{tj}(\btheta_j)) \big\},
\end{equation}
for a given integer $\nu \ge 1$, where $\sigma_{tj}(\btheta_j)^2 =
\var(Z_{tj} | \calF_{t-1})$ is given by (\ref{b9}). By (\ref{b10})
and (\ref{b12}),
\begin{eqnarray}
\sigma_{tj}(\btheta_j)^2 &=&
 \frac{\ga_j}{1 - \beta_j} + \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^\infty
    \beta_j^{k-1} Z_{t-k,i}^2 \nonumber \\
&=& 1 - \frac{ \sum_{i=1}^d \alpha_{ji}}{ 1 -\beta_j}  +
\sum_{i=1}^d \alpha_{ji} \sum_{k=1}^\infty \beta_j^{k-1}
Z_{t-k,i}^2 . \label{b14}
\end{eqnarray}
This form of $\sigma_{tj}(\btheta_j)^2$ ensures
$\var(Z_{tj})=1$; see Theorem~3(ii). The quasi-maximum likelihood
estimator $\wt \btheta_j$ minimizes (\ref{b13}). In practice, we
let $Z_{ti} \equiv 0$ for all $t\le 0$ on the right-hand side of
(\ref{b14}).
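
As a worked special case (added here for illustration; the recursion
variable $S_{t,i}$ is notation introduced only in this paragraph), take $f$
to be the standard normal density. Then
$-\log f(x) = x^2/2 + {1\over 2}\log (2\pi)$ and (\ref{b13}) becomes
\[
l_j(\btheta_j ) = {1\over 2} \sum_{t=\nu+1}^n \Big\{
\log \sigma_{tj}(\btheta_j)^2 + { Z_{tj}^2 \over \sigma_{tj}(\btheta_j)^2 }
\Big\} + {n-\nu \over 2} \log (2\pi) ,
\]
where the last term does not depend on $\btheta_j$. With the convention
$Z_{ti} \equiv 0$ for $t \le 0$, the inner sums in (\ref{b14}) are finite,
$S_{t,i} \equiv \sum_{k=1}^{t-1} \beta_j^{k-1} Z_{t-k,i}^2$, and can be
computed recursively through
\[
S_{1,i} = 0, \quad \quad S_{t,i} = Z_{t-1,i}^2 + \beta_j \, S_{t-1,i},
\quad t \ge 2,
\]
so that one evaluation of $l_j$ requires a number of operations of order $nd$.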
625: 
626: 
\subsubsection{Selection of causal components}
628: 
629: To obtain a parsimonious representation for $\sigma_{tj}^2$, we
630: may select only those significant $Z_{t-1,i}$ on the RHS of the
631: second equation in (\ref{b9}). This is particularly important when
632: the number of components $d$ is large. It may be achieved by using
633: the ideas for variable selection in regression analysis. Below we
634: outline such an algorithm based on a combination of the stepwise
635: addition method and the BIC criterion.
636: 
637: 
638: 
We start with the standard GARCH(1,1) model (i.e. $\alpha_{jj}\ne 0$
and $\alpha_{ji} = 0$ for $i \not = j$). We then add, one at a time, the term $Z_{t-1,i}$
which maximizes the (quasi-)likelihood.
More precisely, suppose that the model already contains the
$(k-1)$ terms $Z_{t-1, j_1}, \cdots, Z_{t-1, j_{k-1}}$.
We choose an additional term $Z_{t-1, \ell}$ among
$\ell\not\in \{j, j_1, \cdots, j_{k-1}\}$ which maximizes the
quasi-likelihood function. Note that this is a two-step
optimization problem. For each given $\ell\not\in \{j, j_1,
\cdots, j_{k-1}\}$, we compute the qMLE $\wt \btheta_j^{(k)}$ for
$\btheta_j^{(k)} \equiv  (\alpha_{jj}, \alpha_{jj_1}, \cdots,
\alpha_{jj_{k-1}}, \alpha_{j\ell}, \beta_j)^\tau$ with the constraints
$\alpha_{ji} = 0$ for $i \not \in \{j, j_1, \cdots, j_{k-1}, \ell\}$. We then choose
the $\ell \not \in \{j, j_1, \cdots, j_{k-1} \}$ which minimizes
$l_j(\wt \btheta_j^{(k)})$, and denote by $l_j(k)$ the minimum
value and by $j_k$ the index of the selected variable. Put
655: \[
656: {\rm BIC}_j(k) = l_j(k) +  (k+2) \log(n-\nu).
657: \]
We choose $r_j$ as the value of $k$ which minimizes BIC$_j(k)$ over $0 \le k \le d-1$.
Note that $k=0$ corresponds to the standard GARCH(1,1) fit for
$Z_{tj}$.
661: 
662: 
663: 
664: \subsubsection{LADE}
665: 
If the CUCs $Z_{tj}$ are known (i.e. the $\ba_j$ are known), the asymptotic properties
of the qMLE may be derived in a manner similar to Hall and
Yao~(2003); see also Mikosch and Straumann~(2004). For example, the
estimator $\wt \btheta_j$ would suffer from
complicated asymptotic distributions and slow convergence rates if $\ve_{tj}$
is heavy-tailed in the sense that $E(|\ve_{tj}|^4) = \infty$.
On the other hand, a least absolute deviations estimator (LADE)
based on a log-transformation
is always asymptotically normal with the standard root-$n$
convergence rate; see Peng and Yao (2003).
676: 
677: To construct the LADE with the constraint $\var(Z_{tj})=1$, we
678: write $\ve_{tj} = v_0 e_{tj}$ in the first equation in (\ref{b9}),
679: where the median of $e_{tj}^2$ is equal to 1 and $v_0 =
680: 1/{\mbox{STD}}(e_{tj})$. With $\sigma_{tj}(\btheta_j)^2$ expressed
681: in (\ref{b14}), parameters $\btheta_j$ and $v_0$ are (jointly)
682: identifiable. Now
683: \[
684: \log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log v_0^2
685: = \log (e_{tj}^2).
686: \]
687: Since the median of $\log (e_{tj}^2) $ is 0, the true values of the
688: parameters minimise
689: \[
690: E \big|\log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log v_0^2\big|.
691: \]
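
This step rests on a standard property of the median, recalled here as a brief sketch. The symbols $Y$, $c$ and $F$ are introduced only for this illustration, and the continuity of $F$ is assumed to keep the argument short. For a random variable $Y$ with $E|Y| < \infty$ and a continuous distribution function $F$,
\[
\frac{d}{dc}\, E|Y - c| = P(Y < c) - P(Y > c) = 2F(c) - 1,
\]
which vanishes when $F(c) = 1/2$, i.e. when $c$ is a median of $Y$. With $Y = \log (e_{tj}^2)$, the median is $\log 1 = 0$, since the logarithm is monotone and the median of $e_{tj}^2$ is 1. Applied conditionally on the past, this suggests why shifting $\log (e_{tj}^2)$ by any nonzero amount, as a wrong parameter value would do, cannot decrease the expected absolute deviation.
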
692: Therefore we may estimate the
693: parameters by minimizing
694: \begin{equation} \label{b15}
695: \sum_{t=\nu+1}^n
\big|\log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log v_0^2\big|,
697: \end{equation}
where $\sigma_{tj}(\btheta_j)^2$ is given in (\ref{b14}), with
$\alpha_{ji} = 0$ for the components that are not causal in variance.
700: So far $\btheta_j$ and $v_0$ are treated as free parameters. The estimators
701: obtained are root-$n$ consistent.
702: 
703: To make an explicit use of the condition that $\var(\ve_{tj})=1$,
704: we may estimate parameters $\btheta_j$ as follows.
With the initial estimate $\hat{\btheta}_j^{(0)}$, let $\hat{v}_0$ be the
reciprocal of the sample standard deviation of the residuals $\{
\wt\ve_{tj} \}$, where $\wt\ve_{tj}
=Z_{tj}/\sigma_{tj}(\hat{\btheta}_j^{(0)})$.
709: With the given $\hat{v}_0$ and $\hat{\btheta}_j^{(0)}$, we can
710: minimize
711: $$
712: \sum_{t=\nu+1}^n  w_t \bigl ( \log Z_{tj}^2 - \log \{
713: \sigma_{tj}(\btheta_j)^2\} - \log \hat{v}_0^2\bigr )^2,
714: $$
where $w_t =  \big|\log Z_{tj}^2 - \log \{
\sigma_{tj}(\hat{\btheta}_j^{(0)})^2\} - \log
\hat{v}_0^2\big|^{-1}$. We may update $\hat{v}_0$ and
718: iterate further until the estimated
719: $\btheta_j$ converges. Note that we have used a weighted $L_2$ loss
720: function to approximate the $L_1$ loss to expedite the computation.
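
The weighted $L_2$ approximation can be read as one pass of iteratively reweighted least squares. The notation $r_t(\btheta_j)$ below is introduced here only for illustration. Write
\[
r_t(\btheta_j) = \log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log \hat{v}_0^2,
\qquad
w_t = \big| r_t(\hat{\btheta}_j^{(0)}) \big|^{-1}.
\]
Then
\[
\sum_{t=\nu+1}^n w_t \, r_t(\btheta_j)^2 \, \Big|_{\btheta_j = \hat{\btheta}_j^{(0)}}
= \sum_{t=\nu+1}^n \big| r_t(\hat{\btheta}_j^{(0)}) \big|,
\]
so the weighted $L_2$ criterion agrees with the $L_1$ criterion at the current estimate, while being quadratic in the residuals and hence easier to minimise. In an implementation one would presumably bound the weights, for example by using $\{\max(|r_t(\hat{\btheta}_j^{(0)})|, \delta)\}^{-1}$ for a small $\delta > 0$, to avoid dividing by residuals close to zero. This safeguard is an assumption added here and is not part of the procedure described above.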
721: 
722: 
723: 
724: \subsection{Inference based on bootstrapping }
725: 
A natural question for the proposed approach is whether the CUCs
$Z_{t1}, \cdots, Z_{td}$ exist, although the minimiser $\{ \wh
\ba_j\}$ of (\ref{b6}) always exists. To address this issue
729: statistically, we may construct a test for the null hypothesis
730: \[
731: H_0: \; \bX_t = \bA \bZ_t \quad \mbox{and} \quad
732: \bZ_t = \diag(\sigma_{t1}, \cdots, \sigma_{td}) \bve_t,
733: \]
734: where $\bA^\tau\bA = \bI_d$, $\bve_t = (\ve_{t1}, \cdots,
735: \ve_{td})^\tau$, $\{ \ve_{t1}\}, \cdots, \{ \ve_{td}\}$ are $d$
independent series, and each of them is a sequence of i.i.d. random variables
737: with mean 0 and variance 1. Note that the null hypothesis above is
738: a sufficient but not necessary condition for the existence of
739: CUCs. The independence condition is required to construct a
740: bootstrap test for this null hypothesis.
741: 
Note that when $Z_{ti}$ and $Z_{tj}$ are not conditionally
uncorrelated, the left hand side of (\ref{b5}) is equal to a
positive constant instead of 0. Therefore, {\sl large} values
745: of $\Psi_n(\wh \bA)$ will indicate that the CUCs do  not exist. We
746: adopt a bootstrap method below to assess how large is large enough
747: to reject~$H_0$.
748: 
If the null hypothesis $H_0$ cannot be rejected, we may also
construct confidence sets for the coefficients $\ba_j$ (i.e. the
columns of $\bA$) of the CUCs, and the parameters $\btheta_j$
based on the same bootstrap scheme. Formally confidence sets for
$\btheta_j$ could be constructed based on asymptotic distributions
of, for example, the LADE $\wh \btheta_j$, which may be derived in
a manner similar to that of Peng and Yao~(2003). However such an
approach is based on the assumption that the CUCs are known (i.e.
the vectors $\ba_j$ are known), and, therefore, fails to take
into account the errors due to the estimation of $\ba_j$.
759: 
760: Let $\wh \bA =(\wh \ba_1, \cdots, \wh\ba_d)$ be the estimator
761: derived from minimizing (\ref{b6}). Let $Z_{tj} = \wh \ba_j^\tau
762: \bX_t$. Let $\wh \btheta_j$ be an estimator
763: for $\btheta_j$, such as the LADE defined in section~2.3.4.
764: 
765: The bootstrap sampling scheme consists of the three steps below.
766: \begin{quote}
767: (i) For $j=1, \cdots, d$, draw $\ve_{tj}^*$, for $-\infty< t \le n$,
768: by sampling randomly with replacement from  the standardized residuals
769: $\{\wh \ve_{\nu+1, j}, \cdots , \wh \ve_{nj}\}$ which are obtained
770: from standardizing the raw residuals
771: \[
772: Z_{tj}/\sigma_{tj}(\wh \btheta_j), \quad \quad t=\nu+1, \cdots, n.
773: \]
774: 
775: (ii) For $j=1, \cdots, d$, draw $Z_{tj}^* = \sigma_{tj}^* \ve_{tj}^*$,
776: for $-\infty< t \le n$, where
777: \[
778: ( \sigma_{tj}^*)^2 =1 -  \wh \beta_j - \sum_{i=1}^d \wh
779: \alpha_{ji}   + \sum_{i=1}^d \wh \alpha_{ji}(Z_{t-1, i}^*)^2 + \wh
780: \beta_j (\sigma_{t-1,j}^*)^2.
781: \]
782: 
783: (iii) Let $\bX_t^* = \wh \bA (Z_{t1}^*, \cdots, Z_{td}^*)^\tau$ for $t=1, \cdots,
784: n$.
785: \end{quote}
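
As a quick consistency check on step (ii), sketched under the assumption that the bootstrap process is stationary with finite second moments, the intercept $1 - \wh \beta_j - \sum_{i} \wh \alpha_{ji}$ keeps the unconditional variance of each bootstrap CUC equal to 1. The symbols $E^*$ (expectation under the bootstrap distribution) and $s_j$ are notation introduced here for illustration. Since the standardized residuals have sample mean 0 and sample variance 1, and $\ve_{tj}^*$ is drawn independently of the past, $E^*\{(Z_{tj}^*)^2\} = E^*\{(\sigma_{tj}^*)^2\} \equiv s_j$. Taking expectations in the recursion of step (ii) gives
\[
s_j = 1 - \wh \beta_j - \sum_{i=1}^d \wh \alpha_{ji} + \sum_{i=1}^d \wh \alpha_{ji}\, s_i + \wh \beta_j\, s_j,
\qquad j = 1, \cdots, d,
\]
and $s_1 = \cdots = s_d = 1$ solves this system. In practice the infinite past $-\infty < t \le n$ is presumably approximated by a long burn-in period, for example started from $(\sigma_{tj}^*)^2 = 1$. This is an assumption about implementation and is not stated in the scheme above.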
786: 
787: \askip
788: 
789: \noindent {\sl A test for the existence of the CUCs}: Let
790: $\Psi_n^*(\bA)$ be defined as in (\ref{b6}) with $\{ \bX_t \}$
791: replaced by $\{ \bX_t^* \}$, and the bootstrap estimator $\bA^*=
792: (\ba_1^*, \cdots,  \ba_d^*)$ be computed in the same manner as
793: $\wh \bA$ with $\{ \bX_t \}$ replaced by $\{ \bX_t^* \}$. Note
794: that the bootstrap sample $\{ \bX_t^* \}$
795:  is drawn from the model with $\wh \ba_j^\tau \bX_t$ as its {\sl genuine}
796: CUCs. Hence the conditional
797: distribution of $\Psi_n^*( \bA^*)$ (given the original sample $\{ \bX_t
798: \}$) may be taken as an approximation for the distribution of $\Psi_n(\wh
\bA)$ under $H_0$.  Thus we reject
$H_0$ if $\Psi_n(\wh \bA)$ is greater than the $[B\alpha]$-th largest
value of $\Psi_n^*( \bA^*)$ among $B$ replications of the above bootstrap
resampling, where $\alpha \in (0, 1)$ is the size of the
test and $B$ is a large integer.
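
The rejection rule can equivalently be phrased through a bootstrap $p$-value. The notation $\Psi^*_{n,b}$, for the value of $\Psi_n^*( \bA^*)$ in the $b$-th replication, and $I(\cdot)$, for the indicator function, is introduced here for illustration, and $[x]$ is read as the integer part of $x$ (an assumption). Define
\[
\hat p = \frac{1}{B} \sum_{b=1}^B I\big\{ \Psi^*_{n,b} \ge \Psi_n(\wh \bA) \big\}.
\]
If $\Psi_n(\wh \bA)$ exceeds the $[B\alpha]$-th largest bootstrap value, then fewer than $[B\alpha]$ bootstrap values are at least as large as $\Psi_n(\wh \bA)$, so that $\hat p < [B\alpha]/B \le \alpha$. Rejecting $H_0$ when $\hat p < \alpha$ is therefore essentially the same rule, up to rounding in $[B\alpha]$ and ties.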
804: 
805: \askip
806: 
\noindent {\sl Confidence sets for ${\bf A}$}: A bootstrap
approximation for a $(1-\alpha)$ confidence set of the
transformation matrix ${\bf A}$ can be constructed  as
810: \begin{equation}\label{b16}
811: \{ {\bf A} \, \big| \, D(\hat{\bf A}; {\bf A}) \le c_\alpha , {\bf A}^{\tau}{\bf A}={\bf I}_d \},
812: \end{equation}
where  $c_\alpha $ is the $[B\alpha]$-th largest value of
$D(\hat{\bf A}; {\bf A}^*)$ among $B$ bootstrap
replications. Note that when $\bA$ is in the
816: confidence set, so is $\bB$ if the columns of $\bB$ form a
817: permutation of the (reflected) columns of $\bA$; see (\ref{b7}).
818: 
819: \askip
820: 
821: \noindent
{\sl Interval estimators for the components of $\btheta_j$}:
823: A bootstrap confidence interval for any component, say, $\beta_j$
824: of $\btheta_j$ may be obtained as follows. Repeat the above
825: bootstrap sampling $B$ times for some large integer $B$, resulting
826: in bootstrap estimates $ \beta^*_{j1}, \cdots,  \beta^*_{jB}$. An
827: approximate $(1-\alpha)$ confidence interval for $\beta_j$ is $(
828: \beta_{j(b_1)}^*, \; \beta_{j(b_2)}^*)$, where $ \beta_{j(i)}^*$
829: denotes the $i$-th smallest value among $ \beta^*_{j1}, \cdots,
830: \beta^*_{jB}$, and $ b_1=[B\alpha/2]$ and $b_2=[B(1-\alpha/2)]$.
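
As a small worked example, with illustrative values, let $B = 500$ and $\alpha = 0.05$, and read $[x]$ as the integer part of $x$ (an assumption, since the bracket notation is not defined explicitly here). Then
\[
b_1 = [500 \times 0.025] = [12.5] = 12, \qquad
b_2 = [500 \times 0.975] = [487.5] = 487,
\]
so the approximate 95\% interval for $\beta_j$ would be $(\beta_{j(12)}^*, \; \beta_{j(487)}^*)$, i.e. the 12th and the 487th smallest of the 500 bootstrap estimates.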
831: 
832: 
833: \section{Simulation}
834: 
835: We conduct a Monte Carlo experiment to illustrate the proposed
836: CUC-approach. In particular we check the accuracy of the
837: estimation for the transformation matrix $\bf A$ in (\ref{b2}).
838: 
839: We consider a
840: CUC-GARCH(1,1) model with $d=3$
841: \begin{equation} \label{ex1}
842: {\bf X}_t  =  {\bf A}{\bf Z}_t, \quad \quad
843:             {\bf Z}_{t}| \calF_{t-1}\; \sim \;  N(0,\;  \diag\{\sigma_{t,1}^2,
844: \sigma_{t,2}^2, \sigma_{t,3}^2\}),
845: \end{equation}
846: where $ \sigma_{t,i}^2  =  \gamma_i+\alpha_i
847: Z_{t-1,i}^2+\beta_i \sigma_{t-1,i}^2 $, and
848: \begin{center}
849: \begin{tabular}{ccc|cccc}
850:      & {\bf A}  &        &   $i$       &   $\gamma_i$   &  $\alpha_i$
851:   &  $\beta_i$  \\[0.5ex]\hline
852:   0  &   0.500  & 0.866  &     1             & 0.02         &   0.08
853:   &      0.90 \\
854:   0  &   0.866  & -0.500 &     2             & 0.10         &   0.10
855:   & 0.80 \\
856:   -1 &   0      & 0      &     3             & 0.28         &   0.12
857:   & 0.60 \\
858: \end{tabular}
859: \end{center}
860: It is easy to see that ${\bf A}^\tau{\bf A}={\bf I}_3$ and
861: $\gamma_i = 1 - \alpha_i - \beta_i$ so that the variances of the
CUCs are 1 [see (\ref{b12})]. Since $\alpha_1 + \beta_1 = 0.98$
is very close to 1, the volatility of the first CUC is highly
persistent. On the contrary, the volatility persistence in the
third component is less  pronounced as $\alpha_3+\beta_3=0.72$
only.
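
For completeness, the two claims can be checked directly from the tabulated values. The columns of ${\bf A}$ are ${\bf a}_1 = (0, 0, -1)^\tau$, ${\bf a}_2 = (0.500, 0.866, 0)^\tau$ and ${\bf a}_3 = (0.866, -0.500, 0)^\tau$, so that
\[
{\bf a}_2^\tau {\bf a}_2 = {\bf a}_3^\tau {\bf a}_3 = 0.500^2 + 0.866^2 \approx 1.000, \qquad
{\bf a}_2^\tau {\bf a}_3 = 0.500 \times 0.866 - 0.866 \times 0.500 = 0,
\]
while ${\bf a}_1$ has unit length and is orthogonal to the other two columns because its only nonzero entry sits where theirs vanish. The orthogonality is exact if $0.866$ is read as the rounded value of $\sqrt{3}/2$, which is presumably what is intended. For the intercepts,
\[
\gamma_1 = 1 - 0.08 - 0.90 = 0.02, \qquad
\gamma_2 = 1 - 0.10 - 0.80 = 0.10, \qquad
\gamma_3 = 1 - 0.12 - 0.60 = 0.28,
\]
and under stationarity $E(\sigma_{t,i}^2) = \gamma_i/(1-\alpha_i-\beta_i) = 1$ for each $i$.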
867: 
868: For each of 200 samples with size $n=500$ and 1000 respectively
869: from the above model, we estimated the transformation matrix $\bA$
870: by minimizing $\Psi_n({\bf A})$ defined in (\ref{b6}), which was
871: solved using the proprietary optimization routines in MATLAB. Note
872: that as far as the estimation of $\bA$ is concerned, two
873: orthogonal matrices are treated as identical if the $D$-distance
874: between them is 0; see (\ref{b7}). The coefficients $\alpha_i,
875: \beta_i$ and $\gamma_i$ were estimated using quasi-MLE based on a
 Gaussian likelihood. The
resulting estimates are summarized in Table~1 and Figure~1.
878: 
879: \begin{table}[htb]
880: \begin{center}
881: \caption[Table 1]{Simulation Results: summary statistics of the
882: errors in estimation}
883: \begin{tabular}{cc | c c c c c c c}\hline
884: && $D(\hat{\bf A}, {\bf A})$ &    $\hat{\alpha}_1$  & $ \hat{\beta}_1$  &  $ \hat{\alpha}_2$  &  $ \hat{\beta}_2$  &  $\hat{\alpha}_3$     &  $\hat{\beta}_3$  \\[0.5ex]\hline
885: &  mean       &   0.0753      &    0.0719            &      0.8701       &
886:      0.0865         &    0.7506          &     0.0997            &
887: 0.6189      \\
888: &  median     &   0.0474      &    0.0705            &      0.8870       &
889:      0.0830         &    0.7801          &     0.0861            &
890: 0.6445      \\
891: $n=500$ & STD   &   0.0714      &    0.0300            &      0.0830       &
892:      0.0469         &    0.1469          &     0.0600            &
893: 0.2017      \\
894: &  bias       &      -        &   -0.0081            &     -0.0299       &
895:     -0.0135         &   -0.0494          &    -0.0203            &
896: 0.0189      \\
897: &  RMSE       &      -        &    0.0303            &      0.0888       &
898:      0.0484         &    0.1546          &     0.0629            &
899: 0.2022      \\ \hline
900: &  mean       &   0.0679      &    0.0722            &      0.8921       &
901:      0.0846         &    0.7751          &     0.0937            &
902: 0.6307      \\
903: &  median     &   0.0434      &    0.0731            &      0.8999       &
904:      0.0833         &    0.7956          &     0.0938            &
905: 0.6517      \\
906: $n=1000$&  STD   &   0.0648      &    0.0224            &      0.0400       &
907:      0.0346         &    0.1065          &     0.0412            &
908: 0.1634      \\
909:  & bias       &      -        &   -0.0078            &     -0.0079       &
910:     -0.0154         &   -0.0249          &    -0.0263            &
911: 0.0307      \\
912:   &RMSE       &      -        &    0.0234            &      0.0403       &
913:      0.0384         &    0.1191          &     0.0487            &
914: 0.1660      \\ \hline
915: \end{tabular}\\[0.5ex]
916: \end{center}
917: \end{table}
918: 
Since both the means and the standard deviations of $D(\hat{\bf A},{\bf
A})$ are small (at most 0.0753 and 0.0714 respectively), the estimation for  ${\bf A}$ is accurate.
The coefficients in each CUC model were also estimated accurately.
The errors in estimation decrease as the sample size increases
from 500 to 1000.
924: 
The biases reported in Table~1 are negative, except for those of
$\hat{\beta}_3$; see also Figure~1. This indicates that most coefficients in the GARCH(1, 1)
models for CUCs were slightly underestimated. Also note that the
928: estimation errors decrease when the volatility persistence
929: (measured by $\alpha_i + \beta_i$) increases; see the upper panel
930: of Figure~1 for the estimation with the sample size 1000. To make
931: a comparison, the estimation errors of the GARCH coefficients when
932: the true ${\bf A}$ is used are plotted in the lower panel.  The
933: differences are small.
934: 
935: 
936: \section{Real data examples}
937: 
938: In this section we illustrate the proposed method
939: with two real data sets.
940: 
The first data set, denoted as SCI, consists of the 2275
daily log returns (in percentages) of the S\&P 500 index, the stock price
of Cisco Systems and the stock price of Intel Corporation from 2
January 1991 to 31 December 1999. This data set has been analyzed
in Tsay~(2001). Figure 2 depicts the time series plots of the
three series.  Descriptive statistics are listed in Table 2.
The unconditional distributions of all these series
exhibit excess kurtosis (the sample kurtosis ranges from 5.47 to 9.05), indicating significant departure from
normal distributions.
950: 
951: The Ljung-Box $Q$ statistics suggest some plausible autocorrelation
952: in these series. But this may be due to the
953: heteroscedasticity. Hence we compute the $p$-values of these $Q$ tests
954: based on a bootstrap procedure: for each of the mean-deleted
955: component return series, we first fit a univariate
956: GARCH(1,1) model
957: \begin{equation}\nonumber
958: Y_t=\sigma_t \epsilon_t, \hspace{1cm} \sigma_t^2=\alpha_0
959: + \alpha_1 Y_{t-1}^2 +\beta_1 \sigma_{t-1}^2,
960: \end{equation}
961: and denote the estimated
962: parameters as $\hat{\alpha}_0, \hat{\alpha}_1, \hat{\beta}_1$,
963: respectively, and the standardized residuals as
964: $\hat{\epsilon}_t$. Draw $ \epsilon_t^{\ast}$
965: randomly with replacement from $\{\hat\epsilon_t,\; t=1,\cdots,n\}$ and
966: draw $Y_t^{\ast}$
967: from
968: \begin{equation}\nonumber
Y_t^{\ast}=\sigma_t^{\ast} \epsilon_t^{\ast}, \hspace{1cm}
\sigma_t^{\ast 2}=\hat{\alpha}_0 + \hat{\alpha}_1 Y^{\ast 2}_{t-1}
+\hat{\beta}_1 \sigma_{t-1}^{\ast 2}.
972: \end{equation}
Let $Q^{\ast}$ be a $Q$-statistic based on $Y_t^{\ast}$.
The $p$-value of $Q$ is now
estimated by the relative frequency of the occurrence of the event
that $Q^{\ast}$ is greater than
$Q$ in 1000 bootstrap replications.
In Table 2,  those $p$-values are listed in parentheses
below the values of the corresponding $Q$ statistics.
Based on those $p$-values, there is no significant evidence
for the existence of autocorrelation in any of the three
component series.
Accordingly there is no need to fit a VAR model for the
conditional mean  for this data set.
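
For reference, the Ljung-Box statistic in its standard form is recalled below. The text does not spell out the exact variant used, so this is an assumption. The symbols $\hat\rho_k$, $Q^{\ast}_b$ and $\hat p$ are notation introduced here for illustration. For a series of length $n$ with lag-$k$ sample autocorrelation $\hat\rho_k$,
\[
Q(m) = n(n+2) \sum_{k=1}^m \frac{\hat\rho_k^2}{n-k}.
\]
With $Q^{\ast}_1, \cdots, Q^{\ast}_{1000}$ denoting the values of $Q^{\ast}$ in the 1000 bootstrap replications, the bootstrap $p$-value described above is
\[
\hat p = \frac{1}{1000} \sum_{b=1}^{1000} I( Q^{\ast}_b > Q ),
\]
where $I(\cdot)$ is the indicator function.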
985: 
986: 
987: Let $\bY_t$ be the mean-deleted returns of SCI.
988: Let $\bSigma =
989: {\bf P}{\bf \Lambda}{\bf P}^{\tau}$ be the sample covariance
990: matrix of $\bY_t$, where ${\bf P P}^\tau = \bI_3$ and $\bLambda$
is diagonal. Let ${\bf X}_t={\bf \Lambda}^{-\frac{1}{2}}{\bf
P}^{\tau} {\bf Y}_t$. Then we may regard the (unconditional)
covariance matrix of $\bX_t$ as $\bI_3$.
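
The standardization can be verified in one line, assuming that the diagonal elements of $\bLambda$ are positive and using ${\bf P}^\tau {\bf P} = \bI_3$, which follows from ${\bf P P}^\tau = \bI_3$ for a square matrix. The sample covariance matrix of $\bX_t$ is
\[
{\bf \Lambda}^{-\frac{1}{2}}{\bf P}^{\tau}\, \bSigma \, {\bf P}{\bf \Lambda}^{-\frac{1}{2}}
= {\bf \Lambda}^{-\frac{1}{2}}{\bf P}^{\tau} {\bf P}{\bf \Lambda}{\bf P}^{\tau} {\bf P}{\bf \Lambda}^{-\frac{1}{2}}
= {\bf \Lambda}^{-\frac{1}{2}} {\bf \Lambda} {\bf \Lambda}^{-\frac{1}{2}}
= \bI_3 .
\]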
994: 
995: 
996: \begin{table}[htb]
997: \begin{center}
998: \caption[Table 2]{Summary Statistics of the Two Real Data Sets }
999: \begin{tabular}{c | c c c |c c c c c }\hline
               &     S$\&$P 500     &    Cisco          &  Intel            &    HS             &       JN         &   SC              &       ST          &    TW              \\[0.5ex]\hline
1001: N              &       2275         &    2275           &  2275             &   1349            &    1349          &  1349             &      1349         &    1349            \\
1002: Mean           &       0.0656       &   0.2567          &  0.1561           &   -0.0198         &   -0.0477        &  0.0178           &     -0.0081       &  -0.0400           \\
1003: Stdev          &      0.8747        &  2.8540           & 2.4644            &   2.1822          &    1.7382        &  1.5401           &      1.8784       &   1.9863           \\
1004: Min            &      -7.1140       &  -22.1000         & -14.5810          &  -14.7346         &   -9.0145        &  -8.7277          &     -9.1535       &  -9.9360           \\
1005: Max            &       4.9900       &  15.5760          &  12.8500          &   20.2083         &    8.8876        &  8.8491           &     19.5559       &   9.7871           \\
1006: Skewness       &     -0.3600        & -0.3963           &  -0.2353          &    0.6419         &    0.1375        &  0.1861           &      0.9114       &   0.1345           \\
1007: Kurtosis       &      9.0469        &  6.7229           &  5.4701           &   14.3999         &    5.0891        &  8.4310           &     15.2063       &   5.4082           \\ \hline
1008: $Q(10)$        &  22.8322           & 25.3861           & 6.8567            &   32.2251         &    8.8471        & 12.9372           &     28.6943       &  16.9723           \\
1009:                &  \fsize (0.2440)   & \fsize (0.0870)   &  \fsize (0.8180)  & \fsize (0.1760)   &  \fsize (0.7540) &  \fsize (0.7770)  &  \fsize (0.2180)  &   \fsize (0.2540)  \\
1010: $Q(20)$        &  44.2898           & 33.9490           & 30.3427           &   46.1651         &    19.1511       & 26.9255           &     40.7220       &  28.4664           \\
1011:                &  \fsize (0.2300)   & \fsize (0.2500)   & \fsize (0.1170)   & \fsize (0.2810)   &  \fsize (0.7200) & \fsize (0.7310)   &    \fsize (0.2870)&  \fsize (0.3290)   \\ \hline
1012: \end{tabular}\\[0.5ex]
1013: \end{center}
1014: \begin{singlespace}
\emph{Note:} {\sl  $Q(k)$ refers to the Ljung-Box portmanteau test  statistic.
Figures in parentheses are the corresponding $p$-values based on 1000 bootstrap
replications.  }
1018: \end{singlespace}
1019: \end{table}
1020: 
Based on data $\bX_t$, an estimator $\wh \bA $ was obtained with
$\Psi_n(\hat{\bf A})= 0.1732$. Subsequently a GARCH(1,1) model was
fitted for each CUC. The estimated coefficients are listed
1024: in Table~3 which shows
1025: that the volatility of the first and third CUCs is highly persistent
1026: as $\hat{\alpha}_1+\hat{\beta}_1=0.9925$ and
1027: $\hat{\alpha}_3+\hat{\beta}_3=0.9611$.
1028: (One may fit the first CUC with an IGARCH model.)
1029: On the other hand, the volatility of the second CUC is less persistent as
1030: $\hat{\alpha}_2+\hat{\beta}_2=0.80$.
1031: 
1032: We applied the bootstrap procedure (with 500 replications) described
1033: in section~2.4 to test the existence of the CUCs. The $p$-value
1034: is 0.60, indicating that there is no strong evidence against the
1035: hypothesis of the existence of CUCs.
1036: The $(1-\alpha)$  bootstrap
1037: confidence set for the transformation
1038: matrix ${\bf A}$  is
1039: $\{ \bA | D(\hat{\bf A}, {\bf A})\le c_{\alpha}, \;
1040: {\bf A}^{\tau}{\bf A}={\bf I}_3 \}$ with $c_{\alpha}= 0.1718$ for
1041: $\alpha = 0.05$, and 0.1368 for $\alpha = 0.1$.
1042: Since $D(\hat{\bf A}, \bI_3) = 0.2593$,
1043: $\bI_3$ is not contained in the confidence sets. This indicates that the
1044: principal components cannot be taken as the CUCs.
The confidence intervals for the parameters of each CUC-GARCH(1,1)
model are listed in Table~3.
The lengths of the confidence intervals increase as
the volatility persistence measured by $\wh \alpha_i + \wh \beta_i$
decreases. This is  consistent with the finding from the simulation
study reported  in section 3.
1051: 
1052: Based on the fitted conditional variances $\wh \sigma_{ti}^2$ for the CUCs,
1053: the conditional variance matrix for the original series $\bY_t$ is equal to
1054: \[
1055: \hat{\bf H}_t={\bf W} \diag\{\wh\sigma^2_{t1}, \wh\sigma^2_{t2},
1056: \wh\sigma^2_{t3}\} {\bf W}^{\tau},
1057: \]
1058: where  ${\bf W}={\bf P}{\bf \Lambda}^{\frac{1}{2}}{\wh\bA}$.
Since the volatility processes of the first and third CUCs are highly
 persistent, they can be modelled with Integrated GARCH models. If so,  the volatility processes for the original series and their covariance processes are effectively
modelled by  mixtures of IGARCH models and mean-reverting GARCH models,
which is similar to the Component GARCH model used in Ding and
Granger (1996) to capture the long memory properties of a univariate
volatility process.
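
The expression for $\hat{\bf H}_t$ follows from unwinding the two linear transformations, as the following short derivation shows. It uses only ${\bf P P}^\tau = \bI_3$ and the orthogonality of $\wh\bA$. The notation $W_{ik}$ and $\bZ_t$ for the vector of estimated CUCs is introduced here for illustration. Since ${\bf X}_t={\bf \Lambda}^{-\frac{1}{2}}{\bf P}^{\tau} {\bf Y}_t$ and $\bZ_t = \wh\bA^\tau \bX_t$,
\[
\bY_t = {\bf P}{\bf \Lambda}^{\frac{1}{2}} \bX_t = {\bf P}{\bf \Lambda}^{\frac{1}{2}} \wh\bA \bZ_t = {\bf W} \bZ_t,
\qquad
\var(\bY_t | \calF_{t-1}) = {\bf W} \var(\bZ_t | \calF_{t-1}) {\bf W}^\tau .
\]
Element-wise, with ${\bf W} = (W_{ik})$, the fitted conditional covariance between the $i$-th and $j$-th series is
\[
\sum_{k=1}^3 W_{ik} W_{jk}\, \wh\sigma_{tk}^2 ,
\]
so each conditional variance and covariance of $\bY_t$ is a fixed linear combination of the three CUC volatility processes, which is the sense in which the original series are modelled by mixtures.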
1065: 
1066: 
1067: 
1068: 
1069: 
1070: \begin{table}[htb]
1071: \begin{center}
\caption[Table 3]{Fitted CUC-GARCH(1,1) model for
SCI}
1074: \begin{tabular}{c | c c c }\hline
1075:                &        Estimate                         &           95\% Confidence Set      &             90\% Confidence Set               \\[0.5ex]\hline
1076:   ${\bf a}_1$  & $(-0.5605,  -0.0018,  -0.8081)^{\tau}$  &                                    &                                                \\
1077:   ${\bf a}_2$  & $(0.5693,   0.7217,   -0.3939)^{\tau}$  &          $c_{0.05}= 0.1718$        &         $c_{0.10}=0.1368$                      \\
1078:   ${\bf a}_3$  & $(0.6015,   -0.6922,  -0.3989)^{\tau}$  &                                    &                                                \\  \hline
1079:   $\gamma_1$   &            0.0074                       &     (0.0042, 0.0592)               &          (0.0048, 0.0449)                      \\
1080:   $\alpha_1$   &            0.0519                       &     (0.0316, 0.0915)               &          (0.0350, 0.0812)                      \\
1081:   $\beta_1$    &            0.9406                       &     (0.8446, 0.9576)               &          (0.8740, 0.9548)                      \\   \hline
1082:   $\gamma_2$   &            0.1997                       &     (0.0460, 0.7138)               &          (0.0673, 0.5705)                      \\
1083:   $\alpha_2$   &            0.0432                       &     (0.0077, 0.1054)               &          (0.0107, 0.0926)                      \\
1084:   $\beta_2$    &            0.7572                       &     (0.2446, 0.9289)               &          (0.3600, 0.9069)                      \\   \hline
1085:   $\gamma_3$   &            0.0389                       &     (0.0200, 0.1042)               &          (0.0239, 0.0870)                      \\
1086:   $\alpha_3$   &            0.0884                       &     (0.0476, 0.1305)               &          (0.0517, 0.1236)                      \\
1087:   $\beta_3$    &            0.8727                       &     (0.7889, 0.9266)               &          (0.8051, 0.9140)                      \\ \hline
1088: \end{tabular}\\[0.5ex]
1089: \end{center}
1090: \end{table}
1091: 
Figure 3 depicts the fitted volatility processes for each return
series and Figure 4 displays the conditional correlations among
the three component series. Note that the volatility of the S$\&$P
500  index has a much smaller scale than those of the two
individual stocks.
Increasing trends can be observed in all the three correlation processes
over the last three years,  which may be  connected  with the
high volatilities in all the return series over the same period. On
the other hand, the high volatility of Cisco prices in the middle period did
not lead to a high correlation with the other two. This suggests a unilateral
impact from the market to the single stock.
1103: 
Figure 5 displays the fitted volatility processes for the three return
series based on the orthogonal GARCH(1,1) model of Alexander (2001) and
Ding and Engle (2001). Note that the orthogonal GARCH model effectively
treats the principal
components as conditionally uncorrelated variables, which may overlook important
conditional dependence structure in the original data.
Note that the time varying patterns in the three processes in Figure~5 are
similar, unlike those from the fitted CUC-GARCH(1,1) model in Figure~3.
In particular, the orthogonal GARCH fitting artificially inflates
the volatility of the S\&P 500 index in the middle period; see the original
time plot of the series in Figure~2.
 The inflation is due to treating the conditionally correlated principal
components as CUCs. As we stated above, the identity matrix is indeed
not included in the confidence set for~$\bA$.
1118: 
1119: 
1120: \askip
1121: 
Our second data set consists of the daily close returns of five Asian
stock  indices, namely,  the Hang Seng index of Hong Kong (HS), the Japan Nikkei
225 index (JN),
the Shanghai Composite index of China (SC), the Straits Times index of Singapore
(ST) and the Taiwan Weighted index (TW), from 1 August 1997 to
30 December 2003.  Adjustments are also made to
account for the differences in the holidays of the five markets.
The five return series are plotted in Figure~6, and the descriptive
statistics are listed in Table~2.
All the sample means of these returns are
negative, except the mean of SC. Different from the three series
in SCI, all
five series are right-skewed over this specific
period. The bootstrap $p$-values for the $Q$ statistics are obtained in the
same way as before, indicating no significant
autocorrelation in any of the five series.
1138: 
We fitted a CUC-extended GARCH(1,1) to the mean-deleted return series.
The lagged values from the other CUCs were selected using BIC together
with a forward search; see section~2.3.3.
1142: The fitted extended GARCH(1,1) models, based on quasi-MLE with Gaussian
1143: likelihood, for the five
1144: CUCs are reported in Table~4. According to the fitted models, the first
1145: CUC is causal in variance to the fifth CUC, the second CUC is causal in
1146: variance to the first
1147: and the third CUCs, and the fifth CUC is causal in variance to the first CUC.
1148: On the other hand, no additional variables were selected in the models
1149: for the second and fourth CUCs.
1150: 
Figure~7 displays the fitted volatility processes for the five
original stock returns. As expected, the most volatile waves are
observed in early 1998 with the onset of the Asian
financial crisis, and are especially prominent in the Hong Kong
and Singapore markets. While the shock is still big, the impact of
the crisis on the Japan and Taiwan markets is less drastic.
Furthermore, the effect on the Shanghai market is on a much smaller
scale. In Figure 8, we present the fitted conditional correlations
between Hong Kong and the other four markets. The most
correlated period coincides with the outbreak of the Asian
financial crisis. After that, the correlations between Hong Kong
and Singapore remain almost at a constant level except for two
downslides in the middle of 1999 and 2002, respectively. Likewise,
the correlations between Hong Kong and Taiwan are almost at a
constant level, although a little smaller than
those with the Singapore market.  An upward trend can be seen in the
correlation between the Hong Kong and Japan markets in the last few years,
which suggests that these
two markets were becoming more closely integrated.  On the contrary,
the correlations between the Hong Kong and Shanghai markets seem to
trend downward towards zero in
the last few years. The implications of these observations for
international diversification deserve further investigation.
1174: 
1175: 
1176: \begin{table}[htb]
1177: \begin{center}
1178: \caption[Table 4]{Extended GARCH(1,1) for CUCs of Asian
1179: Market Data }
1180: \begin{tabular}{c | c | c | l |c  }\hline
1181:      $j$       &   $j_i$     &   $ r$ &        \multicolumn{1}{c|}{                                  $ \sigma_{t,j}  $       }                  &       BIC       \\[0.5ex]\hline
1182:        1       &   5, 2      &    2   &   $\sigma_{t,1}^2=0.0271+0.8609\sigma_{t-1,1}^2+0.0405Z_{t-1,1}^2+0.0637Z_{t-1,5}^2+0.0117Z_{t-1,2}^2 $&    $3622$       \\
1183:        2       &             &    0   &   $\sigma_{t,2}^2=0.0521+0.8004\sigma_{t-1,2}^2+0.1475Z_{t-1,2}^2 $                                    &    $3602$       \\
1184:        3       &    2        &    1   &   $\sigma_{t,3}^2=0.0077+0.9301\sigma_{t-1,3}^2+0.0526Z_{t-1,3}^2+0.0098Z_{t-1,2}^2$                   &    $3731$       \\
1185:        4       &             &    0   &   $\sigma_{t,4}^2=0.0704+0.8539\sigma_{t-1,4}^2+0.0757Z_{t-1,4}^2 $                                    &    $3780$       \\
1186:        5       &    1        &    1   &   $\sigma_{t,5}^2=0.0122+0.8227\sigma_{t-1,5}^2+0.1530Z_{t-1,5}^2+0.0261Z_{t-1,1}^2$                   &    $2534$       \\  \hline
1187:  \end{tabular}\\[0.5ex]
1188: \end{center}
1189: \end{table}
1190: 
Finally we compared the fitting based on our CUC-based GARCH(1,1) with
the orthogonal GARCH(1,1) models and Engle's dynamic conditional correlation
(DCC) model (\ref{a4}) and
(\ref{a5}) in terms of a goodness-of-fit test based on the Ljung-Box statistic
(Tse and Tsui 1999). Note that the DCC-model for each component of $\bY_t$
reduces to the standard univariate GARCH(1,1) fitting.
1197: We define the standardized residual for the
1198: $i$-th series as
1199: $
1200: \hat{u}_{ti}=Y_{ti}/\hat{\sigma}_{t,ii}^{1/2},
1201: $
1202: where $\hat{\sigma}_{t,ii}$ is the $(i,i)$-th element of the fitted
1203: conditional variance of $\bY_{t}$. Define
1204: \[
1205: C_{t,ij}=\left\{ \begin{array} {l r} \hat{u}_{ti}^2-1 & i= j \\
1206: \hat{u}_{ti}\hat{u}_{tj}-\hat{\rho}_{t,ij}  & i\ne j, \end{array} \right.
1207: \]
1208: where
1209: $\hat{\rho}_{t,ij}=\hat{\sigma}_{t,ij}/(\hat{\sigma}_{t,ii}
1210: \hat{\sigma}_{t,jj})^{1/2}$
1211: is the estimated conditional correlation between $Y_{ti}$ and $Y_{tj}$.
1212: If the model
1213: is correctly specified, there is no autocorrelation in $\{ C_{t,ij}, t\ge 1\}$
1214: for any fixed $i, j$.
1215: Put
1216: \[
1217: Q(ij, M)=n \sum_{k=1}^M r_{ij,k}^2,
1218: \]
where $r_{ij,k}$ is the lag $k$ sample autocorrelation of
$C_{t,ij}$. It is intuitively clear that large values of $Q(ij, M)$ indicate
a lack of fit for the conditional correlation between the $i$-th and
$j$-th components of $\bY_t$ for $i\ne j$, and a lack of fit for the
conditional variance of the $i$-th component for $i=j$.  Although the
distribution theory of $Q(ij,M)$ is
unknown, empirical evidence suggests that $\chi^2_M$ provides a
reasonable reference in practice; see Tse and Tsui (1999).
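
As a point of reference when reading the $Q$-values reported below with $M=10$, the usual upper percentage points of the $\chi^2_{10}$-distribution are recalled here. These are standard table values, rounded to two decimals, and the notation $\chi^2_{10}(p)$ for the upper $p$ point is introduced only for this illustration:
\[
\chi^2_{10}(0.10) \approx 15.99, \qquad
\chi^2_{10}(0.05) \approx 18.31, \qquad
\chi^2_{10}(0.01) \approx 23.21 .
\]
Under the $\chi^2_{10}$ approximation, a value of $Q(ij,10)$ above 23.21 would thus be judged significant at the 1\% level, and a value below 15.99 insignificant at the 10\% level.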
1227: 
Table~5 lists the values of the $Q$-statistics with $M=10$. The
significance levels were gauged according to the
$\chi^2_{10}$-distribution. The advantage of using the CUC-GARCH
model over the Orthogonal GARCH model is clear, as the $Q$-values
for the former tend to be smaller, or significantly smaller, than
those for the latter. Furthermore, all the $Q$-values for the
fitted CUC-GARCH models are insignificant at the 10\% level,
while the test rejects some Orthogonal GARCH fittings at the
1\% significance level. For example, the $p$-values for testing
the correlations between S\&P~500 and Cisco stock, and between S\&P~500
and Intel stock, are less than 1\%, indicating significant
autocorrelation. This may explain the incomprehensible jumps in
the volatility for S\&P~500 fitted by the Orthogonal GARCH model in
Figure~5. The same phenomenon may also be observed in the fitting
for the second data set. The Orthogonal GARCH model failed to
provide adequate fittings for the Hang Seng index (HS), the Singapore
Straits Time index (ST) and the Taiwan Weighted index (TW), as
indicated by the large $Q$-values; see Table~5.
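
As a reading aid for Table~5, the upper quantiles of the
$\chi^2_{10}$-distribution, quoted here from standard tables, are
approximately
\[
\chi^2_{10}(0.10) \approx 15.99, \qquad
\chi^2_{10}(0.05) \approx 18.31, \qquad
\chi^2_{10}(0.01) \approx 23.21,
\]
where $\chi^2_{10}(\alpha)$ is notation used only here for the upper
$\alpha$-quantile. Under the $\chi^2_{10}$ reference, a $Q$-value is
flagged at the 10\%, 5\% or 1\% level when it exceeds the respective
threshold. For instance, the value $18.61$ lies between the second and
the third thresholds, and is therefore significant at the 5\% level but
not at the 1\% level.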
1246: 
1247: 
Overall the DCC model performs competitively with
the CUC model for the Asian markets data. This may be due
to a certain degree of homogeneity
among the five Asian market indices.
For the SCI data, consisting of one market index and two stock prices,
the gain from using CUC over DCC is more pronounced. First, the DCC model
seems to fail to catch the dynamic correlation
between the returns of the S\&P 500 index and the Intel stock price
(see the entry for ``1,3'' in Table~5).
Furthermore, although the $Q$-value for the CUC model for S\&P 500 is
marginally larger than that for the DCC model,
the $Q$-values for the CUC models for both the Intel and the Cisco prices
are substantially smaller than those for the DCC models, suggesting
an improvement in modelling the volatility dynamics of the Intel and
the Cisco prices by incorporating the information from the other series.
1262: 
1263: 
1264: 
1265: 
1266: 
The $Q$-tests with different values of $M$ lead to patterns similar
to those in Table~5; they are therefore omitted to save space.
1269: 
1270: 
1271: \begin{landscape}
\begin{table}[tbh]
1273: \begin{center}
1274: \caption[Table 5]{Specification test ---  $Q(10)$ for cross products of standardized residuals }
1275: \begin{tabular}{@{\hspace{0.6cm}}c @{\hspace{0.6cm}} | @{\hspace{0.5cm}} c @{\hspace{0.8cm}} c @{\hspace{0.8cm}} c @{\hspace{0.5cm}} |  c @{\hspace{1cm}} c @{\hspace{1cm}}c @{\hspace{1cm}}c  @{\hspace{0.6cm}} }\hline
1276:                &             \multicolumn{3}{c|} { SCI   data}      &                     \multicolumn{4}{c} {Asian Market Data}                   \\ \hline
1277:    $i,j$       &    O-GARCH       &   DCC          &   CUC-GARCH    &   \hspace{0.5cm} O-GARCH      &    DCC         &    CUC-GARCH    &   CUC-Ex GARCH      \\[0.5ex]\hline
1278:      1         &  $59.9140^{***}$ &   5.9498       &  6.2050        & \hspace{0.5cm}$56.7580^{***}$ &    6.0285      &    11.4480      &   8.6961                  \\
1279:      2         &   10.5100        &   9.0587       &  8.0542        & \hspace{0.5cm}12.3540         &    7.8517      &    8.6713       &   8.7751                  \\
1280:      3         &   2.6192         &   6.4293       &  2.2397        & \hspace{0.5cm} 8.5368         &    9.2749      &    8.5301       &   8.5265                  \\
1281:      4         &                  &                &                & \hspace{0.5cm}$18.6100^{**}$  &    2.6610      &    4.0512       &  3.7954                   \\
1282:      5         &                  &                &                & \hspace{0.5cm}$18.0610^{*}$   &    7.4710      &    11.7960      &  13.5150                  \\
1283:      1,2       &  $51.8060^{***}$ &  10.4887       &  10.9090       & \hspace{0.5cm} 7.1025         &    7.0622      &    4.6433       &   4.2671                  \\
1284:      1,3       &  $77.5140^{***}$ & $20.6745^{**}$ &  10.5170       & \hspace{0.5cm} 3.8940         &    4.6465      &    3.4987       &   3.5564                  \\
1285:      1,4       &                  &                &                & \hspace{0.5cm}$17.2180^{*}$   &    4.7943      &    6.2915       &   5.8084                  \\
1286:      1,5       &                  &                &                & \hspace{0.5cm} 9.2396         &    6.1648      &    5.6669       &   6.3143                  \\
1287:      2,3       &   5.9453         &   7.0617       &  9.6275        & \hspace{0.5cm} 9.6031         &    10.1762     &    9.6444       &   9.5912                  \\
1288:      2,4       &                  &                &                & \hspace{0.5cm} 6.3708         &    7.7241      &    3.4542       &   3.2648                  \\
1289:      2,5       &                  &                &                & \hspace{0.5cm} 6.8629         &    5.8438      &    6.1856       &   6.9089                  \\
1290:      3,4       &                  &                &                & \hspace{0.5cm} 11.9120        &    8.0303      &    7.3119       &   5.8486                  \\
1291:      3,5       &                  &                &                & \hspace{0.5cm} 2.2256         &    2.1565      &    1.5721       &   1.6857                  \\
1292:      4,5       &                  &                &                & \hspace{0.5cm} 5.4389         &    4.7838      &    3.0312       &   3.1083                  \\  \hline
1293:  \end{tabular}\\[0.5ex]
1294: \end{center}
1295: \begin{singlespace}
\emph{Note:} {\sl 1)  ***, **, * indicate that the corresponding
test is significant at the level 0.01, 0.05 and 0.1, respectively.
2)  $i, j$ in the left column refer to the order of the component
series in each data set. For example, ``1,2'' stands for the cross
product of the standardized residuals of S\&P 500 and Cisco for the SCI data,
and of HS and JN for the Asian market data.}
1302: \end{singlespace}
1303: \end{table}
1304: \end{landscape}
1305: 
1306: 
1307: 
1308: 
1309:  \setcounter{equation}{0}
1310:  \renewcommand{\theequation}{5.\arabic{equation}}
1311: 
1312: 
1313: \section*{Appendix A --- Proof of Theorem 1}
1314: 
1315: We introduce some notation first.
1316: Let $$\bC_{n, k}(B) = (n-k)^{-1} \sum_{t=k+1}^n \bX_t \bX_t^\tau I
1317: (\bX_{t-k} \in B), \quad \quad \bC_k (B) = E\{ \bX_t \bX_t^\tau I (\bX_{t-k} \in
1318: B)\}.$$
1319: The lemma below shows that both $\Psi(\cdot)$ and $\Psi_n(\cdot)$ are
1320: Lipschitz continuous on $\calH_D$ with $D$-distance, where $\calH_D$ is
1321: the quotient space; see Remark 2.
1322: 
1323: \askip
1324: 
1325: \noindent
1326: {\bf Lemma 1}. For any $\bU, \bV \in \calH_D$, it holds that
 $$|\Psi(\bU) - \Psi(\bV) | \le c \; \tr E (\bX_t \bX_t^\tau) \, \{ D(\bU, \bV) \}^{1/2},$$ and
$$
|\Psi_n(\bU) - \Psi_n(\bV) | \le c \; \tr (n^{-1} \sum_{t=1}^n
\bX_t \bX_t^\tau ) \, \{ D(\bU, \bV) \}^{1/2}$$ almost surely, where
1331: $c>0$ is a constant and $\tr(\bA)$ is the trace of a matrix $\bA$.
1332: 
1333: \askip
1334: 
1335: \noindent {\bf Proof}. We only prove the lemma for $\Psi(\cdot)$.
1336: The result for $\Psi_n(\cdot)$ may be shown in the same manner.
1337: Let $\bU=(\bu_1, \cdots, \bu_d)^\tau$, $\bV=(\bv_1, \cdots,
1338: \bv_d)^\tau$, $u_{ijk}(B) = E\{ \bu_i^\tau \bC_k(B) \bu_j\}$ and
1339: $v_{ijk}(B) = E\{ \bv_i^\tau \bC_k(B) \bv_j\}$. We assume that the
1340: orders and the directions of $\bu_i$ and $\bv_j$ are arranged such
1341: that $\bu_i ^\tau \bv_i\in [0,1]$ for all $i$, and
1342: \begin{equation} \label{p1}
1343: D(\bU, \bV) = 1 - {1\over d} \sum_{i=1}^d \bu_i ^\tau \bv_i
1344: = {1\over d} \sum_{i=1}^d(1 - \bu_i ^\tau \bv_i).
1345: \end{equation}
1346: See (\ref{b7}).
1347: Put the spectral decomposition for $\bC_k(B)$ as
1348: $$\bC_k(B) = \sum_{\ell=1}^d \mu_{\ell}(B,k) \bgamma_\ell
1349: \bgamma_\ell^\tau,$$ where $\mu_1(B,k) \ge \cdots \ge
1350: \mu_d(B,k)\ge 0$ are the eigenvalues of $\bC_k(B)$, and
1351: $\bgamma_1, \cdots, \bgamma_d$ are their corresponding (orthonormal)
1352: eigenvectors. It is easy to see that $\mu_\ell(B,k)\le \mu_\ell$
1353: for all $k$ and $B$, where $\mu_1 \geq \cdots \geq \mu_d$ are the
eigenvalues of the matrix $E \{ \bX_t \bX_t^\tau\}$.
1355: Consequently, by noticing that
1356: $|\bgamma_\ell^\tau \bu_j | \leq 1$ and $|\bv_i^\tau \bgamma_\ell |
1357: \leq 1$, we have
1358: \begin{eqnarray} \nonumber
1359: && | u_{ijk}(B) - v_{ijk}(B) | \; \le \; \sum_{\ell=1}^d
1360: \mu_\ell | \bu_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j -
1361: \bv_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bv_j|\\ \nonumber
1362: &\le & \sum_{\ell=1}^d
1363: \mu_\ell \{ | \bu_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j -
1364:  \bv_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j|
1365: + | \bv_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j -\bv_i^\tau
1366:    \bgamma_\ell \bgamma^\tau_\ell \bv_j|\}\\ \nonumber
1367: &\le &
1368: \sum_{\ell=1}^d \mu_\ell \{ | ( \bu_i-\bv_i)^\tau \bgamma_\ell| +
1369: |\bgamma^\tau_\ell (\bu_j-\bv_j)|\}
1370: \end{eqnarray}
By the Cauchy--Schwarz inequality, the right-hand side above is
further bounded by
1373: \begin{eqnarray}
1374: & & \sum_{\ell=1}^d \mu_\ell  \{ ||\bu_i-\bv_i|| +
1375: ||\bu_j-\bv_j||\} \nonumber \\ \label{p2} &=& \sqrt{2} \{( 1 -
1376: \bu_i^\tau \bv_i)^{1/2} + (1 - \bu_j^\tau \bv_j)^{1/2}\}
1377: \sum_{\ell=1}^d \mu_\ell.
1378: \end{eqnarray}
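The equality in (\ref{p2}) uses only that $\bu_i$ and $\bv_i$ are unit
vectors, as is the case for the rows of an orthogonal matrix: since
$\bu_i^\tau \bu_i = \bv_i^\tau \bv_i = 1$,
\[
||\bu_i - \bv_i||^2 = \bu_i^\tau \bu_i - 2 \bu_i^\tau \bv_i + \bv_i^\tau \bv_i
= 2 (1 - \bu_i^\tau \bv_i),
\]
and likewise for $\bu_j - \bv_j$.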
1379: 
1380: Note that for $x\ne 0$, it holds that
1381: \begin{equation} \label{p3}
1382: |x+y| - |x| = y\, \sgn(x) + 2(x+y)\{ I(-y<x<0) - I(0<x<-y) \}.
1383: \end{equation}
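A brief case check confirms (\ref{p3}). If $x$ and $x+y$ have the same
sign, or if $x+y=0$, the last term on the right-hand side vanishes and
$|x+y| - |x| = y\, \sgn(x)$. In the two remaining cases the sign changes,
and
\[
-y<x<0: \quad |x+y| - |x| = (x+y) + x = -y + 2(x+y),
\]
\[
0<x<-y: \quad |x+y| - |x| = -(x+y) - x = y - 2(x+y),
\]
which agree with the right-hand side of (\ref{p3}) for $\sgn(x)=-1$ and
$\sgn(x)=1$ respectively.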
1384: Hence,
1385: \begin{eqnarray}
1386: &&
1387: \Psi(\bU) \; = \;
1388: \sum_{1\le i < j \le d}
1389: \sup_{1\le k \le k_0, \, B\in \calB} \big[
1390: |v_{ijk}(B)|+|v_{ijk}(B) + \{u_{ijk}(B)-v_{ijk}(B)\}|
1391: - | v_{ijk}(B)| \big] \nonumber \\
1392: &=& \sum_{1\le i < j \le d} \sup_{1\le k \le k_0, \, B\in \calB}
1393: \big[ | v_{ijk}(B)| + \{ u_{ijk}(B)-v_{ijk}(B)\}\sgn\{
1394: v_{ijk}(B)\}  \nonumber \\
1395: & & + \; 2 u_{ijk}(B) \{I (B_1) - I( B_2) \}\big], \label{p4}
1396: \end{eqnarray}
1397: where
1398: \[
1399: B_{1} = \{ v_{ijk}(B)-u_{ijk}(B) < v_{ijk}(B) <0\}, \quad B_{2} =
1400: \{ 0< v_{ijk}(B)<v_{ijk}(B)-u_{ijk}(B)\} .
1401: \]
1402: On the set $B_1 \cup B_2$,
1403: \[
1404: |u_{ijk}(B)| \le |u_{ijk}(B)-v_{ijk}(B)|+|v_{ijk}(B)|\le 2
1405: |u_{ijk}(B)-v_{ijk}(B)|.
1406: \]
This, combined with (\ref{p2}) and (\ref{p4}), implies that
1408: \begin{eqnarray}\nonumber
1409: & & |\Psi(\bU) - \Psi(\bV) | \\
1410: &  \le & \sum_{1\le i < j \le d} \sup_{1\le k \le k_0, \, B\in
1411: \calB} \big[\sqrt{2} \{ ( 1 - \bu_i^\tau \bv_i)^{1/2}  + (1 -
1412: \bu_j^\tau \bv_j)^{1/2} \} \sum_{\ell=1}^d \mu_\ell
     +  2|u_{ijk}(B)| I (B_1 \cup B_2) \big] \nonumber \\
1414: & \le & \; 5 \sqrt{2} \sum_{1\le i < j \le d} \{( 1 - \bu_i^\tau
1415: \bv_i)^{1/2} + (1 - \bu_j^\tau \bv_j)^{1/2}\} \sum_{\ell=1}^d
1416: \mu_\ell \nonumber \\
1417: & \le &  10 \sqrt{2} d \sum_{\ell=1}^d \mu_\ell \sum_{i=1}^d (1-
1418: \bu_i^\tau \bv_i)^{1/2}. \label{p5}
1419: \end{eqnarray}
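The constants in (\ref{p5}) may be traced as follows. On $B_1 \cup B_2$
the bound $|u_{ijk}(B)| \le 2 |u_{ijk}(B)-v_{ijk}(B)|$ together with
(\ref{p2}) shows that the term $2|u_{ijk}(B)|$ is at most four times the
bound in (\ref{p2}); added to the first term this yields the factor
$5\sqrt{2}$. For the last step, write
$a_i = (1 - \bu_i^\tau \bv_i)^{1/2}$ (notation used only here). Each
index appears in exactly $d-1$ pairs, so
\[
\sum_{1\le i<j\le d} (a_i + a_j) = (d-1) \sum_{i=1}^d a_i
\le 2d \sum_{i=1}^d a_i,
\]
which shows that the constant $10\sqrt{2}\, d$ in (\ref{p5}) is not sharp
but suffices for the argument.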
1420: 
1421: Now the lemma follows from (\ref{p5}) and the inequality
1422: \[
1423: \sum_{i=1}^d (1- \bu_i^\tau \bv_i) ^{1/2} \le d^{1/2} \Big\{
1424: \sum_{i=1}^d (1- \bu_i^\tau \bv_i) \Big\}^{1/2},
1425: \]
1426: see also (\ref{p1}). This completes the proof.
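
As a sketch of how the constant may be made explicit: by (\ref{p1}),
$\sum_{i=1}^d (1 - \bu_i^\tau \bv_i) = d\, D(\bU,\bV)$, and
$\sum_{\ell=1}^d \mu_\ell = \tr E(\bX_t \bX_t^\tau)$, so that (\ref{p5})
and the inequality above give
\[
|\Psi(\bU) - \Psi(\bV)| \le 10\sqrt{2}\, d \; \tr E(\bX_t \bX_t^\tau)
\; d^{1/2} \{ d\, D(\bU,\bV) \}^{1/2}
= 10\sqrt{2}\, d^2 \; \tr E(\bX_t \bX_t^\tau) \, \{ D(\bU,\bV) \}^{1/2}.
\]
That is, the lemma holds with, for example, $c = 10\sqrt{2}\, d^2$.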
1427: 
1428: 
1429: 
1430: 
1431: 
1432: \askip
1433: 
1434: \noindent
1435: {\bf Proof of Theorem 1}.
1436: Since $\bC_{n, k} (B) - \bC_k(B)$ is a real symmetric matrix, it holds
1437: for any unit vectors $\ba$ and $\bb$ that
1438: \[
1439: |\ba^\tau \{ \bC_{n, k} (B) - \bC_k(B)\} \bb| \le || \bC_{n, k} (B) - \bC_k(B)||,
1440: \]
1441: where $|| \bC_{n, k} (B) - \bC_k(B)||$ denotes the sum of the absolute values of
1442: the eigenvalues of $\bC_{n, k} (B) - \bC_k(B)$. This may be obtained by
1443: using the spectral decomposition of $\bC_{n, k} (B) - \bC_k(B)$.
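Explicitly, let $\lambda_1, \cdots, \lambda_d$ be the eigenvalues of
$\bC_{n, k} (B) - \bC_k(B)$, with corresponding orthonormal eigenvectors
denoted, for this display only, again by $\bgamma_1, \cdots, \bgamma_d$. Then
\[
|\ba^\tau \{ \bC_{n, k} (B) - \bC_k(B)\} \bb|
= \Big| \sum_{\ell=1}^d \lambda_\ell \, (\ba^\tau \bgamma_\ell)
(\bgamma_\ell^\tau \bb) \Big|
\le \sum_{\ell=1}^d |\lambda_\ell|,
\]
since $|\ba^\tau \bgamma_\ell| \le 1$ and $|\bgamma_\ell^\tau \bb| \le 1$
for unit vectors.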
1444: Consequently it holds uniformly for any orthogonal matrix $\bA$ that
1445: \begin{eqnarray} \nonumber
1446: |\Psi_n(\bA) - \Psi(\bA)| & \leq & \sum_{1 \leq i < j \leq d}
1447:      \sup_{1 \leq k \leq k_0, B \in \calB} \left | \ba_i ^\tau \{ \bC_{n,
1448:      k} (B) - \bC_k(B) \} \ba_j  \right | \\
1449:      & \leq & {d(d-1)\over 2}
1450:      \sup_{1 \leq k \leq k_0, B \in \calB}  \| \bC_{n, k}
1451:      (B) - \bC_k(B) \| .
1452: \label{p6}
1453: \end{eqnarray}
Note that the $(i,j)$-th element of $\bC_{n, k} (B) - \bC_k(B)$
1455: is $${1 \over n-k}  \sum_{t=k+1}^n X_{ti}X_{tj}
1456:      I(\bX_{t-k} \in B) - E\{ X_{ti}X_{tj} I(\bX_{t-k} \in B)\},$$
1457: where $X_{ti}$ denotes the $i$-th element of $\bX_t$.
1458: Since $E | X_{ti}X_{tj}| < \infty$ and $\calB$ is a VC-class, the
1459: covering number for the set of functions $\{X_{ti}X_{tj}
1460: I(\bX_{t-k} \in B), B \in \calB\}$ has a polynomial rate of growth
1461: for any underlying probability measure (Theorem 2.6.4, van der
1462: Vaart and Wellner 1996).   Hence, it is a Glivenko-Cantelli class.
1463: It follows now from Theorem 3.4 of Yu (1994) that
1464: \[
1465:     \sup_{B \in \calB} \Big | {1 \over n-k} \sum_{t=k+1}^n
1466:     X_{ti}X_{tj} I(\bX_{t-k} \in B) - E\{ X_{ti}X_{tj} I(\bX_{t-k} \in B)\} \Big |
\scon 0.
1468: \]
1469: Consequently, $$\sup_{B \in \calB} |\lambda_{\max}(B, k)| \scon 0,
1470: \quad
1471: \quad
1472: \sup_{B \in \calB}| \lambda_{\min}(B,k)| \scon 0,$$
1473: where $\lambda_{\max}(B, k)$ and $ \lambda_{\min}(B,k)$ denote, respectively,
1474: the maximum and the minimum eigenvalues of $\bC_{n, k} (B) - \bC_k(B)$.
1475: Thus
1476: $$
1477:     \sup_{B \in \calB}  \| \bC_{n, k} (B) - \bC_k(B) \| \scon 0,
1478: $$
1479: for $k=1, \cdots, k_0$. Now it follows from (\ref{p6}) that
1480: $$
1481:   \sup_{\bA \in \calH_D} | \Psi_n(\bA) - \Psi(\bA) | \scon 0.
1482: $$
1483: Combining this with Lemma~1 above and
1484: the continuity of the argmax mapping (Theorem 3.2.2 and Corollary
1485: 3.2.3, van der Vaart and Wellner, 1996),  it
1486: holds that $D(\hat{\bA}, \bA_0)
1487: \scon 0$.  This completes the proof of the first part of Theorem 1.
1488: 
1489: \askip
1490: 
1491: Under the additional condition $ E | X_{ti}X_{tj}|^{2p} < \infty$
1492: and the mixing condition given in Condition (A4),  Theorem 1 of
1493: Arcones and Yu (1994) implies that the set of functions
1494: $\{X_{ti}X_{tj} I(\bX_{t-k} \in B), B \in \calB\}$ is a Donsker
1495: class, and hence the process $ \{\bDelta_{n, k}(B), B \in \calB
1496: \}$ indexed by $B \in \calB$  converges weakly to a Gaussian
1497: process, where $\bDelta_{n, k} (B) = \sqrt{n} \{ \bC_{n,k}(B) -
1498: \bC_k(B) \}$. It follows from (\ref{p3}) that
\begin{eqnarray} \nonumber
\Psi_n(\bA) &=& \sum_{1 \leq i < j \leq d} \sup_{B \in \calB, 1 \leq
k \leq k_0 } \big [ |\ba_i^\tau \bC_k(B) \ba_j| + n^{-1/2}
\sgn\{\ba_i^\tau \bC_k(B) \ba_j\} \ba_i ^\tau \bDelta_{n, k}(B) \ba_j \\
& & + 2 \ba_i^\tau \bC_{n,k}(B) \ba_j \{ I(B_3) - I(B_4)\}  \big ]  \nonumber \\
& = &  \Psi (\bA) + O_P(n^{-1/2}), \label{p7}
\end{eqnarray}
where
\[
B_3 = \{-n^{-1/2} \ba_i ^\tau \bDelta_{n, k}(B) \ba_j  < \ba_i^\tau
\bC_k(B) \ba_j<0\}, \quad
B_4=  \{0< \ba_i^\tau \bC_k(B) \ba_j < -n^{-1/2}
\ba_i ^\tau \bDelta_{n, k}(B) \ba_j\}.
\]
1513: The last equality in (\ref{p7}) follows from the fact that on
1514: $B_3 \cup B_4$,
1515: \[
1516: |\ba_i^\tau \bC_{n,k}(B) \ba_j| \le |\ba_i^\tau \bC_k(B) \ba_j|
1517: + n^{-1/2}|\ba_i ^\tau \bDelta_{n, k}(B) \ba_j|
1518: \le 2 n^{-1/2}|\ba_i ^\tau \bDelta_{n, k}(B) \ba_j|.
1519: \]
1520: It follows from (\ref{p7}) and condition (A5) that
1521: \begin{eqnarray}
1522: \Psi_n (\bA_0) - \Psi_n(\bA)   =  \Psi(\bA_0) -
1523:     \Psi(\bA) + O_P(n^{-1/2})
1524:  \leq  -a D(\bA_0, \bA) + O_P(n^{-1/2}) \label{p8}.
1525: \end{eqnarray}
Now, on replacing $\bA$ by $\hat{\bA}$, the left hand side of
1527: (\ref{p8}) must be non-negative by the definition of~$\hat{\bA}$.
1528: The right hand side of (\ref{p8}) would be negative unless
1529: $$
1530: D(\bA_0, \hat{\bA}) = O_P(n^{-1/2}).
1531: $$
1532: This completes the proof.
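
In more detail, and assuming that the constant $a$ in condition (A5) is
positive and that the $O_P(n^{-1/2})$ term in (\ref{p7}) holds uniformly
in $\bA$, setting $\bA = \hat{\bA}$ in (\ref{p8}) gives
\[
0 \le \Psi_n (\bA_0) - \Psi_n(\hat{\bA})
\le -a\, D(\bA_0, \hat{\bA}) + O_P(n^{-1/2}),
\]
so that $a\, D(\bA_0, \hat{\bA}) \le O_P(n^{-1/2})$, which is the stated
rate.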
1533: 
1534: 
1535: \section*{Appendix B --- Proof of Theorem 2}
1536: 
1537: From the proof of Theorem 1, we have
1538: \begin{equation}
1539:       \sup_{\bA \in \calH} | \Psi_n (\bA) - \Psi(\bA) | \scon 0.
1540:       \label{p9}
1541: \end{equation}
1542: Since $\Psi(\bA)$ is continuous on the compact quotient space
1543: $\calH$, there exists a minimizer $\bA_0$.  It follows that
1544: \begin{eqnarray*}
1545:  \Psi(\hat{\bA}) - \Psi(\hat{\bB}) & = & \Psi(\bA_0) -
1546:  \Psi(\hat{\bB})  + \Psi(\hat{\bA})- \Psi(\bA_0) \\
1547:  & \leq & \Psi(\hat{\bA})- \Psi(\bA_0) \\
1548:  & = & \{\Psi(\hat{\bA}) - \Psi_n( \hat{\bA})\}
1549:         + \{\Psi_n( \hat{\bA}) - \Psi_n( \bA_0 )\}
1550:         + \{ \Psi_n( \bA_0 ) - \Psi( \bA_0)\}.
1551: \end{eqnarray*}
Using the fact that $\Psi_n(\hat{\bA}) - \Psi_n (\bA_0) \leq 0$, we
1553: conclude from (\ref{p9}) that
1554: $$
1555: \liminf \{ \Psi(\hat{\bA}) - \Psi(\hat{\bB})\} \leq 0.
1556: $$
1557: This completes the proof of Theorem 2.
1558: 
1559: 
1560: 
1561: 
1562: \section*{Appendix C --- Proof of Theorem 3}
1563: 
1564: 
1565: For each $j$, there are at most $r$ non-zero $\alpha_{jk}$. Since
1566: $\beta_j < 1$, it holds that
1567: \[
1568: \sigma_{tj}^2 = {\ga_j \over 1- \beta_j} +  \sum_{i=1}^d
1569: \alpha_{ji}\sum_{k=1}^\infty \beta_j^{k-1} Z_{t-k,i}^2.
1570: \]
Now Theorem~3 follows immediately from Lemma~2 below by letting
$Y_{tj} = X_{tj}^2$ and $\rho_{tj} = \sigma_{tj}^2$. Note that
Lemma~2 may be proved in a manner similar to the proof of
Theorem~1 of Giraitis et al.~(2000); see also section~2.7.1 of Fan
and Yao~(2003).
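
The display above is the usual ARCH($\infty$) expansion of a GARCH
recursion. As a sketch, assume that the extended GARCH(1,1) equation
takes the form
$\sigma_{tj}^2 = \ga_j + \sum_{i=1}^d \alpha_{ji} Z_{t-1,i}^2
+ \beta_j \sigma_{t-1,j}^2$; this form is inferred from the display
rather than quoted. Iterating $m$ times gives
\[
\sigma_{tj}^2 = \ga_j \sum_{k=1}^{m} \beta_j^{k-1}
+ \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^{m} \beta_j^{k-1} Z_{t-k,i}^2
+ \beta_j^{m} \sigma_{t-m,j}^2 .
\]
Letting $m\to\infty$ with $0 \le \beta_j<1$, the geometric sum tends to
$1/(1-\beta_j)$ and the remainder term vanishes provided
$\sigma_{t-m,j}^2$ stays bounded in probability. In the notation of
Lemma~2 this corresponds to $c_j = \ga_j/(1-\beta_j)$ and
$b_{jik} = \alpha_{ji} \beta_j^{k-1}$.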
1576: 
1577: \bigskip
1578: 
1579: \noindent {\bf Lemma 2}. Consider a $d$-dimensional ARCH($\infty$)
1580: process $\bY_t = (Y_{t1}, \cdots, Y_{td})^\tau$ defined by
1581: \[
1582: Y_{tj} = \rho_{tj} \zeta_{tj}, \quad \quad \rho_{tj} = c_j +
1583: \sum_{i=1}^d \sum_{k=1}^\infty b_{jik} Y_{t-k, i}
1584: \]
1585: for $j=1, \cdots, d$, where $\{ \zeta_{tj} \}$ is a sequence of
1586: non-negative i.i.d. random variables with $E(\zeta_{tj}) =1$,
1587: $Y_{tj}\ge 0$, $c_j, b_{jik} \ge 0$. Furthermore, for each $j$,
$b_{jik} \ne 0$ for at most $r (\ge 0)$ values of $i$. Then the
1589: above model admits a unique strictly stationary solution $\{ \bY_t
1590: \}$ with the finite mean
1591: \[
1592: E(\bY_t) = (\bI_d - \bB)^{-1} (c_1, \cdots, c_d)^\tau
1593: \]
1594: under the condition $
1595:   \max_{ 1\le j,\, i \le d}  b_{ji\,\cdot} < 1/r,
1596: $ where $b_{ji\,\cdot} = \sum_{k\ge 1} b_{jik}$, and $\bB$ is a
1597: $d\times d$ matrix with $b_{ji\,\cdot}$ as its $(j,i)$-th element.
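
The form of the mean can be checked heuristically, assuming that
$\zeta_{tj}$ is independent of $\{ \bY_{t-k}, k\ge 1\}$ and that the
stationary mean is finite. Taking expectations in the defining
equations and using $E(\zeta_{tj})=1$ gives
\[
E(Y_{tj}) = E(\rho_{tj}) = c_j + \sum_{i=1}^d b_{ji\,\cdot}\, E(Y_{ti}),
\quad j=1, \cdots, d,
\]
that is, $E(\bY_t) = (c_1, \cdots, c_d)^\tau + \bB\, E(\bY_t)$. If, as the
condition suggests, each row of $\bB$ has at most $r$ non-zero entries,
each smaller than $1/r$, then every row sum of $\bB$ is smaller than 1,
so $\bI_d - \bB$ is invertible and the stated formula follows.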
1598: 
1599: 
1600: 
1601: 
1602: 
1603: \section*{References}
1604: \begin{description}
1605: \begin{singlespace}
1606: \item Alexander, C. (2001). Orthogonal GARCH. In {\sl Mastering
1607: Risk}. Financial Times-Prentice Hall: London; {\bf 2}, 21-38.
1608: 
1609: 
1610: \item Arcones, M.A. and Yu, B. (1994).  Central limit theorems for
1611:        empirical processes and U-processes of stationary mixing sequences.
        {\sl Jour. Theor. Probab.}, {\bf 7}, 47--71.
1613: 
1614: 
\item Back, A. and Weigend, A.S. (1997). A first application of
independent component analysis to extracting structure from stock
returns. {\sl International Journal of Neural Systems}, {\bf 8}, 473-484.
1618: \item
1619: Bauwens, L., Laurent, S. and Rombouts, J.V.K. (2003). Multivariate GARCH models:
1620:  a survey. {\sl A preprint}.
1621: \item
1622: Bollerslev, T. (1990).  Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. {\sl Review of Economics and Statistics},
1623:  {\bf 72}, 498-505.
1624: \item
Bollerslev, T.R., Engle, R. and Wooldridge, J. (1988). A capital asset pricing
1626: model with time varying covariances. {\sl Journal of Political Economy}, {\bf 96},
1627: 116-131.
1628: \item Chen, M. and An, H. (1998). A note on the stationarity and the existence of moments
1629:       of the GARCH models. {\sl Statistica Sinica}, {\bf 8}, 505-510.
1630: \item Chow, Y.S. and Teicher, H. (1997). {\sl Probability  Theory} (3rd
1631: edition). Springer, New York.
1632: \item Ding, Z. and Granger, C.W.J. (1996). Modeling volatility persistence of speculative returns:
1633:       A new approach. {\sl Journal of Econometrics}, {\bf 73}, 185-215.
1634: \item Ding, Z. and Engle, R. (2001). Large scale conditional covariance matrix modeling, estimation and testing.
1635:       {\sl  Working Paper}, {\bf FIN-01-029}, NYU Stern School of Business.
\item Engle, R. (2002). Dynamic conditional correlation -- a simple class of multivariate
1637:       GARCH models. {\sl Journal of Business and Economic Statistics}, {\bf 20}, 339-350.
1638: \item
Engle, R.F., Ito, T. and Lin, W.-L. (1990). Meteor showers or
                heat waves? Heteroskedastic intra-daily volatility in
1641:                 the foreign exchange market.
1642:                 {\sl Econometrica}, {\bf 58}, 525-542.
1643: \item Engle, R.F. and Kroner, K.F. (1995). Multivariate simultaneous generalised ARCH.
1644:       {\sl Econometric Theory }, {\bf 11}, 122-150.
1645: \item Engle, R.F., Ng, V.K. and Rothschild, M. (1990). Asset pricing with a factor ARCH covariance structure:
1646:       Empirical estimates for treasury bills. {\sl Journal of
1647: Econometrics}, {\bf 45}, 213-238.
1648: \item Engle, R.F. and Sheppard, K. (2001). Theoretical and empirical properties
1649: of dynamic conditional correlation multivariate GARCH. {\sl A preprint}.
1650: \item Fan, J. and Yao, Q. (2003). {\sl Nonlinear Time Series: Nonparametric and Parametric Methods}.
1651:       Springer, New York.
1652: \item Giraitis, L., Kokoszka, P., and Leipus, R. (2000). Stationary ARCH models: Dependence structure and
1653:       central limit theorem. {\sl Econometric Theory}, {\bf 16}, 3--22.
1654: \item Hall, P. and Yao, Q. (2003). Inference for ARCH and GARCH models. {\sl Econometrica},
1655:       {\bf 71}, 285-317.
1656: \item Harvey, A., Ruiz, E. and Shephard, N. (1994). Multivariate stochastic
1657: variance models. {\sl The Review of Economic Studies}, {\bf 61}, 247-264.
1658: \item Hyv\"arinen, A., Karhunen, J. and Oja, E. (2001). {\sl Independent Component
1659:       Analysis}. Wiley, New York.
1660: \item Jerez, M., Casals, J. and Sotoca, S. (2001). The likelihood of multivariate GARCH models is ill-conditioned. {\sl A preprint}.
1661: \item Kiviluoto, K. and Oja, E. (1998). Independent component analysis for parallel financial time series.
1662:       In {\sl Proc. Int. Conf. on Neural Information Processing (ICONIP'98)}, vol.2, pp.895-989, Tokyo.
1663: \item M${\breve {\rm a}}$l${\breve {\rm a}}$roiu, S., Kiviluoto, K. and
1664: Oja, E. (2000). Time series prediction with independent component analysis. {\sl A
1665:        preprint}.
1666: \item Mikosch, T. and Straumann, D. (2004). Stable limits of martingale transforms with application to the
1667:       estimation of GARCH parameters. {\sl A preprint}.
1668: \item Peng, L. and Yao, Q. (2003). Least absolute deviations estimation for ARCH
1669:       and GARCH models. {\sl Biometrika}, {\bf 90}, 967-975.
1670: \item Penzer, J., Wang, M. and Yao, Q. (2004). Approximating volatilities by asymmetric power GARCH functions. {\sl A preprint}.
1671: \item Tsay, R. (2001). {\sl Analysis of Financial Time Series}. Wiley, New York.
1672: \item Tse, Y. K. and Tsui, A.K.C. (1999). A note on diagnosing multivariate conditional heteroscedasticity models.
1673:      {\sl Journal of Time Series Analysis}, {\bf 20}, 679-691.
1674: \item Vilenkin, N. (1968). Special functions and the theory of group representation, translations of
1675:       mathematical monographs. {\sl American Math. Soc.}, Providence, Rhode Island, 22.
1676: 
\item van der Vaart, A.W. and  Wellner, J.A. (1996).  {\sl Weak Convergence and
         Empirical Processes}.  Springer, New York.
1679: 
1680: \item van der Weide, R. (2002). GO-GARCH: a multivariate generalized orthogonal GARCH model. {\sl Journal of
1681:       Applied Econometrics}, {\bf 17}, 549-564.
1682: 
1683: \item Wang, M. and Yao, Q. (2005). Modelling multivariate volatilities: an ad hoc
1684: approach. To appear in ``{\sl Contemporary Multivariate Analysis and
1685: Experimental Designs}'' J. Fan,  G. Li \& R. Li (edit.) World Scientific,
1686: Singapore.
1687: 
1688: \item Yu, B. (1994).  Rates of convergence for empirical processes
1689:       of stationary mixing sequences.   {\sl Ann. Statist.}, {\bf 22},
1690:      94-116.
1691: 
1692: \end{singlespace}
1693: \end{description}
1694: 
1695: 
1696: 
1697: \begin{figure}[hb]
1698: \centerline{\psfig{figure=CUCGARCH_fig1_7.ps}}
1699: \begin{singlespace}
1700: 
\caption[Fig 1] {\sl Boxplots of the estimation errors for the
CUC-GARCH(1,1) model (\ref{ex1}) with
the estimated $\wh \bA$ (upper panel) and the true $\bA$ (lower panel).
 The sample size is $n=1000$.}
1705: \end{singlespace}
1706: \end{figure}
1707: 
1708: \newpage
1709: 
1710: \begin{figure}
1711: \centerline{\psfig{figure=CUCGARCH_fig5.ps}}
1712: \begin{singlespace}
1713: 
\caption[Fig 2] {\sl Plots of daily log returns of
(a)  $S\&P$ 500 index, (b) Cisco Systems stock  and (c) Intel
Corporation stock.  The time span is from January 2, 1991 to December 31, 1999, with 2275
observations.}
1718: 
1719: \end{singlespace}
1720: \end{figure}
1721: 
1722: 
1723: \newpage
1724: 
1725: \begin{figure}
1726: \centerline{\psfig{figure=CUCGARCH_fig3.ps}}
1727: \begin{singlespace}
1728: 
1729: \caption[Fig 3] {\sl Fitted volatility processes based on  CUC-GARCH(1,1) model for daily log returns of
1730: (a)  $S\&P$ 500 index, (b) Cisco Systems stock  and (c) Intel
1731: Corporation stock. }
1732: 
1733: \end{singlespace}
1734: \end{figure}
1735: 
1736: \newpage
1737: 
1738: \begin{figure}
1739: \centerline{\psfig{figure=CUCGARCH_fig4.ps}}
1740: \begin{singlespace}
1741: 
1742: \caption[Fig 4] {\sl Fitted conditional correlations based on  CUC-GARCH(1,1) model
1743: for  daily log returns between (a) $S\&P$ 500 index and Cisco Systems
1744: stock, (b) $S\&P$ 500 index and Intel Corporation stock,  and (c)
1745: Cisco Systems stock and Intel Corporation stock.  }
1746: \end{singlespace}
1747: \end{figure}
1748: 
1749: \newpage
1750: 
1751: \begin{figure}
1752: \centerline{\psfig{figure=CUCGARCH_fig6.ps}}
1753: \begin{singlespace}
1754: 
1755: \caption[Fig 5] {\sl Fitted volatility processes based on
1756: Orthogonal-GARCH(1,1) model for daily log returns of (a)  $S\&P$
1757: 500 index, (b) Cisco Systems stock  and (c) Intel Corporation
1758: stock. }
1759: 
1760: \end{singlespace}
1761: \end{figure}
1762: 
1763: 
1764: \newpage
1765: 
1766: \begin{figure}
1767: \centerline{\psfig{figure=CUCGARCH_fig7.ps}}
1768: \begin{singlespace}
1769: 
1770: \caption[Fig 6] {\sl Plots of dividend adjusted daily log returns of
1771: (a)  Hang Seng index in Hong Kong, (b) Japan Nikkei 225 index, (c)
1772: Shanghai Composite index in China, (d) Singapore Straits Time index,
 and (e) Taiwan Weighted index. The time span is from August 1, 1997 to December 30, 2003, with 1349 observations.}
1774: 
1775: \end{singlespace}
1776: \end{figure}
1777: 
1778: 
1779: \newpage
1780: 
1781: \begin{figure}
1782: \centerline{\psfig{figure=CUCGARCH_fig8.ps}}
1783: \begin{singlespace}
1784: 
1785: \caption[Fig 7] {\sl Fitted volatility processes based on CUC-Extended GARCH(1,1) model for daily log returns of
1786: (a)  Hang Seng index in Hong Kong, (b) Japan Nikkei 225 index, (c)
1787: Shanghai Composite index in China, (d) Singapore Straits Time index,
1788:  and (e) Taiwan Weighted index. }
1789: 
1790: \end{singlespace}
1791: \end{figure}
1792: 
1793: \newpage
1794: 
1795: \begin{figure}
1796: \centerline{\psfig{figure=CUCGARCH_fig9.ps}}
1797: \begin{singlespace}
1798: 
\caption[Fig 8] {\sl Fitted conditional correlations between daily
1800: log-returns of Hang Seng index (HS) and (a) Japan Nikkei 225 index (JN),
1801: (b) Shanghai
1802: Composite index in China (SC), (c) Singapore Straits Time index (ST),
1803:  (d) Taiwan Weighted index (TW).}
1804: 
1805: \end{singlespace}
1806: \end{figure}
1807: 
1808: 
1809: 
1810: 
1811: 
1812: 
1813: 
1814: 
1815: 
1816: 
1817: \end{document}
1818: