
1: \documentclass[11pt]{article}

2: \usepackage{psfig,amsmath,amssymb,euscript}

3: %\usepackage{showkeys}

4: \usepackage{lscape}

5: \renewcommand{\baselinestretch}{1.6}

6: \oddsidemargin 0in

7: \evensidemargin 0in

8: %\topmargin -0.3in

9: \topmargin -0.7in

10: \textwidth 6.4in

11: \textheight 9.25in

12: \makeatletter

13: \newcommand{\fsize}{\footnotesize}

14: \newcommand{\bvxi}{\mbox{\boldmath$\xi$}}

15: \input{QYalias}

16:

17:

18: \begin{document}

19: \title{\bf Modelling Multivariate Volatilities via Conditionally

20: Uncorrelated Components\thanks{Partially supported by an EPSRC

21: research grant and by NSF grant DMS-0355179.}}

22: \author{

23: Jianqing Fan$^{1,2}$

24: \quad \quad Mingjin Wang$^{2,3}$

25:  \quad \quad Qiwei Yao$^{2,3}$\\[2ex]

26: $^1$ Benheim Center of Finance and \\

27: Department of Operations Research and Financial Engineering\\

28: Princeton University, Princeton, NJ 08544, USA\\[1ex]

29: $^2$Department of Statistics, London School of Economics, London, WC2A

30: 2AE, UK\\[1ex]

31: $^3$ Guanghua School of Management, Peking University, Beijing 100871, China}

32:

33: \date{}

34:

35:

36:

37: \maketitle

38:

39: \begin{abstract}

40: We propose to model multivariate volatility processes based on the

41: newly defined conditionally uncorrelated components (CUCs). This model

42: provides a parsimonious representation for matrix-valued processes.

43: It is flexible in the sense that we may fit each CUC with any

44: appropriate univariate volatility model. Computationally it splits

45: one high-dimensional optimization problem into several lower-dimensional

46: subproblems. Consistency for the estimated CUCs has been established.

47: A bootstrap test is proposed for testing the existence

48: of CUCs. The proposed methodology is illustrated  with both simulated and

49: real data sets.

50: \end{abstract}

51:

52: \noindent

53: {\sl Key words}:

54: dimension reduction,

55: extended GARCH(1,1),

56: financial returns,

57: multivariate volatility,

58: portfolio volatility,

59: time series.

60:

61: \newpage

62:

63: \section{Introduction}

64:

65: One of the most prolific areas of research in the financial

66: econometrics literature in the last two decades is the modelling of

67: time-varying volatility of financial returns. Many statistical

68: models, most designed for univariate data, have been proposed for

69: this purpose. From the practical point of view, there are at least

70: two incentives to model several financial returns jointly. First,

71: time-varying correlations among different securities are important

72: and useful information for portfolio optimization, asset pricing

73: and risk management. Secondly, the modelling of a single security may

74: be improved by incorporating the relevant information in other

75: securities. The quest for modelling multivariate processes, which

76: are often represented by conditional covariance matrices, has

77: motivated attempts to extend univariate volatility models to

78: multivariate cases, aiming for practical and/or statistical

79: effectiveness. We list some of the endeavors below.

80:

81:

82: Let $\{ \bX_t \}$ be a vector-valued (return) time series with

83: \[

84: E(\bX_t | \calF_{t-1} ) = 0, \quad \quad

85: \var( \bX_t | \calF_{t-1} ) = \bSigma_t \equiv \big( \sigma_{t,ij} \big),

86: \]

87: where $\calF_t$ is the $\sigma$-algebra generated by $\{ \bX_t,

88: \bX_{t-1}, \cdots \}$, and $\bSigma_t$ is an

89: $\calF_{t-1}$-measurable $d\times d$ positive semi-definite

90: matrix. One of the most general multivariate GARCH($p,q$) models is

91: the BEKK representation (Engle and Kroner~1995)

92: \begin{eqnarray}

93: \label{a3}

94: \bSigma_t = \bC + \sum_{i=1}^p \sum_{j=1}^m \bA_{ij} \bX_{t-i} \bX_{t-i}^\tau

95: \bA_{ij}^\tau

96: + \sum_{i=1}^q \sum_{j=1}^m \bB_{ij} \bSigma_{t-i} \bB_{ij}^\tau,

97: \end{eqnarray}

98: where $ \bC, \bA_{ij}, \bB_{ij}$ are  $d\times d$ matrices, and

99: $\bC$ is positive definite (denoted as $\bC>0$).

100: Although the form of the above model is quite general especially when

101: $m$ is reasonably large (Proposition~2.2 of Engle and Kroner 1995), it

102: suffers from the problem of

103: overparametrization. Similar to multivariate ARMA models, not all

104: parameters in model (\ref{a3}) are necessarily

105: identifiable even when $m=1$.

106: Overparametrization will also lead to a flat likelihood function, making

107: statistical inference intrinsically difficult and computationally

108: troublesome. See, for example, Engle and Kroner~(1995), and Jerez, Casals

109: and Sotoca~(2001).

110:

111: To overcome the difficulties due to overparametrization, a dynamic

112: conditional correlation (DCC) model (Engle 2002, Engle and Sheppard~2001)

113: has been proposed. It is based on the decomposition

114: \begin{equation} \label{a4}

115: \bSigma_t = \bD_t \bR_t \bD_t,

116: \end{equation}

117:  where $\bD_t = \diag( \sigma_{t,11}^{1/2},

118: \cdots, \sigma_{t,dd}^{1/2} )$, $\sigma_{t,ii}$ is the conditional

119: variance of the $i$-th component of $\bX_t$, and $\bR_t \equiv

120: (\rho_{t,ij})$ is the conditional correlation

121: matrix. A simple way to facilitate such a model is to model each

122: $\sigma_{t,ii}$ with a univariate volatility model and to model

123: conditional correlation using

124: a rolling exponential smoothing as follows

125: \[

126: \rho_{t,ij} =  \sum_{k=1}^{t-1} \la^k \ve_{t-k,i} \ve_{t-k,j}

127: \Big/ \Big\{ \sum_{k=1}^{t-1} \la^k \ve_{t-k,i}^2

128: \; \sum_{k=1}^{t-1} \la^k \ve_{t-k,j}^2 \Big\}^{1/2},

129: \]

130: where $\ve_{ti}= X_{ti}/\sigma_{t,ii}^{1/2}$. Even with such a

131: simple specification, estimation typically involves solving a

132: high-dimensional optimization problem as, for example, the

133: Gaussian likelihood function cannot be factorized into several

134: lower-dimensional functions. To overcome the computational

135: difficulty, Engle~(2002) proposes a two-step estimation procedure

136: as follows: first fit each $\sigma_{t,ii}$ in (\ref{a4}) with a

137: univariate GARCH(1,1) model using the observations on the $i$-th

138: component of $\bX_t$ only, and then  model the conditional

139: correlation matrix $\bR_t$ by a simple GARCH(1,1) form

140: \begin{equation}

141: \label{a5}

142: {\bR}_t={\bf S}(1-\theta_1-\theta_2)+\theta_1

143: ({\bve}_{t-1}{\bve}_{t-1}^{\prime})+\theta_2 {\bR}_{t-1},

144: \end{equation}

145: where $\bve_t$

146: is a $d\times 1$ vector of the

147: standardized residuals obtained in the separate GARCH(1,1) fittings for

148: the $d$ components of $\bX_t$,

149: and ${\bf S}$ is

150: the sample correlation matrix of $\bX_t$. Note there are only two unknown parameters

151: $\theta_1, \theta_2$

152: in the dynamical correlation model (\ref{a5}), so it can be easily implemented

153: even for large or very large $d$. However it

154: may not provide adequate fitting when the components of $\bX_t$ exhibit

155: different dynamic correlation structures; see an example of three-dimensional

156: data set in section~4 below. Furthermore in modelling the volatility for

157: each component, no attempts are made to extract additional

158: information from other components.

159:

160:

161:

162: Alexander (2001) proposes an orthogonal GARCH model which fits

163: each principal component (PC) with a univariate GARCH model

164: separately, and treats all PCs as {\sl conditionally} uncorrelated

165: random variables. Since PCs are only unconditionally uncorrelated,

166: such a misspecification may lead to non-negligible errors in the

167: fitting; see, for example, Figure~5 and related discussions in

168: section~4 below.

169:

170: Other multivariate volatility models include, for

171: example, vectorized multivariate GARCH models of Bollerslev, Engle

172: and Wooldridge~(1988), constant conditional

173:  correlation

174: multivariate GARCH models of Bollerslev~(1990),

175: a multivariate stochastic volatility model of Harvey, Ruiz and

176: Shephard~(1994),

177: the generalized

178: orthogonal GARCH model of van der Weide~(2002), and

179: an easy-to-fit ad hoc

180: approach of Wang and Yao~(2005); see also a survey in Bauwens, Laurent and

181: Rombouts~(2003) and the references within.

182:

183: While all the aforementioned models have their own merits,

184: each of them has one or more of the following three drawbacks: (i)

185: overparametrization, (ii) computational complication, and (iii) being too

186: simple to capture some

187: important dynamic structures.

188:

189: In this paper, we propose a new modelling methodology which

190: mitigates the above three drawbacks. The basic idea is to assume

191: that $\bX_t$ is a linear combination of a set of {\sl

192: conditionally uncorrelated components} (CUCs); see section~2.1

193: below. One fundamental  difference from the orthogonal GARCH model

194: is that we use CUCs, instead of PCs, which are genuinely

195: conditionally uncorrelated. The advantages of the new approach

196: include: (i)~the CUC decomposition leads to a parsimonious

197: representation for multivariate volatility (matrix-valued)

198: processes with no model identification problems, (ii)~it has

199: the flexibility to model each CUC with any appropriate univariate

200: volatility model, (iii)~computationally it splits a

201: high-dimensional optimization problem into several

202: lower-dimensional subproblems, and (iv)~it allows the volatility

203: model for one CUC to depend on the lagged value of the other

204: CUCs.

205:

206: The idea of using CUCs is similar to so-called independent

207: component analysis (Hyv\"arinen, Karhunen and Oja 2001). However,

208: instead of requiring that all the component series be independent of

209: each other, we only impose the weaker condition that the component

210: series are conditionally uncorrelated; see (\ref{b1}) below. Of

211: course the existence of CUCs is also not always guaranteed. We

212: propose a bootstrap test to assess the feasibility of such an

213: approach. Our empirical experience shows that for a large number

214: of practical examples, there is no significant evidence to reject

215: the hypothesis that the CUCs exist.

216:

217: Literature on applying independent components  analysis to

218: financial and economic time series includes, for example, Back and

219: Weigend (1997), Kiviluoto and Oja (1998), M${\breve {\rm

220: a}}$l${\breve {\rm a}}$roiu, Kiviluoto and Oja (2000), and van der

221: Weide (2002). Although our basic idea is somewhat similar to van

222: der Weide~(2002), our approach is completely different.

223:

224: The rest of the paper is organized as follows. Section~2 contains

225: a detailed description of the proposed new methodology and the

226: associated theoretical results. Simulation results are reported in

227: section~3. Illustrations with real data examples are presented in

228: section~4. Technical proofs are relegated to the appendices.

229:

230:

231:

232:

233:

234:

235:

236: \section{Methodology}

237:

238: \subsection{Basic setting}

239:

240: To simplify the matter concerned, we may assume $\var(\bX_t) =

241: \bI_d$ --- the $d\times d$ identity matrix. In practice, this

242: amounts to replacing $\bX_t$ by $\bS^{-1/2}\bX_t$, where

243: $\bS$ is the sample covariance matrix of $\bX_t$.

244: We assume that each component of $\bX_t$ is a linear

245: combination of $d$ conditionally uncorrelated components (CUCs)

246: $Z_{t1}, \cdots, Z_{td}$ which satisfy the conditions $E(Z_{ti}|

247: \calF_{t-1} )=0$, $\var(Z_{ti}) =1$, and

248: \begin{equation} \label{b1}

249: E(Z_{ti}Z_{tj} | \calF_{t-1} ) = 0, \quad \mbox{for all } i\ne j.

250: \end{equation}

251: Put $\bZ_t = (Z_{t1}, \cdots, Z_{td})^\tau$.

252: The above setting implies that

253:  \begin{equation} \label{b2}

254:  \bX_t = \bA \bZ_t, \quad \bZ_t = \bA^\tau \bX_t,

255:  \end{equation}

256: for a constant matrix $\bA$. Furthermore, $ \var(\bZ_t) = \bA^\tau

257: \var(\bX_t) \bA = \bA^\tau \bA =\bI_d$. Hence

258: $\bA$ is a $d\times d$ orthogonal matrix with ${d\over 2}(d-1)$

259: free elements. Put

260: \begin{equation} \label{b3}

261: \var(\bZ_t|\calF_{t-1}) = \diag( \sigma_{t1}^2, \cdots, \sigma_{td}^2),

262: \end{equation}

263: i.e. $\sigma_{tj}^2 = \var(Z_{tj} | \calF_{t-1})$. It is easy to see

264: that once we

265: have specified $\sigma_{tj}^2$ -- the volatility of

266: the $j$-th CUC, for $j=1, \cdots, d$,

267: volatilities for any portfolios can be deduced accordingly. For

268: example, for any portfolios $\xi_t = \bb^\tau_1 \bX_t$ and $\eta_t

269: = \bb^\tau_2 \bX_t$ it holds that

270: \[

271: \var(\xi_t | \calF_{t-1}) = \sum_{j=1}^d b_{j1}^2

272: \, \sigma_{tj}^2, \quad \quad \quad

273: \cov(\xi_t, \eta_t | \calF_{t-1}) = \sum_{j=1}^d b_{j1} b_{j2}

274: \, \sigma_{tj}^2,

275: \]

276: where $(b_{1k}, \cdots, b_{dk}) = \bb_k^\tau \bA$ $(k=1, 2)$.

277: Hence, the CUC decomposition (\ref{b2}) facilitates a parsimonious

278: modelling for $d$-dimensional multivariate volatility process via

279: $d$ univariate volatility models. In this way, we reduce the

280: number of parameters involved substantially.
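
\noindent As a brief check of the two identities above, the following sketch uses only (\ref{b1}), (\ref{b2}) and (\ref{b3}); it is an illustration added for the reader, not part of the formal development. Since $\xi_t = \bb_1^\tau \bA \bZ_t = \sum_{j=1}^d b_{j1} Z_{tj}$ and $\eta_t = \sum_{j=1}^d b_{j2} Z_{tj}$, and $E(Z_{tj}|\calF_{t-1}) = 0$,
\[
\cov(\xi_t, \eta_t | \calF_{t-1})
= \sum_{i=1}^d \sum_{j=1}^d b_{i1} b_{j2}\, E(Z_{ti} Z_{tj} | \calF_{t-1})
= \sum_{j=1}^d b_{j1} b_{j2}\, \sigma_{tj}^2,
\]
because all cross terms with $i \ne j$ vanish by (\ref{b1}) and $E(Z_{tj}^2 | \calF_{t-1}) = \sigma_{tj}^2$. Taking $\bb_2 = \bb_1$ gives the expression for $\var(\xi_t | \calF_{t-1})$.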

281:

282:

283: \subsection{Estimation of CUCs}

284:

285: \subsubsection{Estimation procedure}

286:

287: By (\ref{b2}), $Z_{tj} = \ba_j^\tau \bX_t$, and $ \ba_1, \cdots,

288: \ba_d$ are $d$ orthogonal vectors. The goal is to estimate the

289: orthogonal matrix $ \bA =( \ba_1, \cdots, \ba_d) $. Note the

290: order of $\ba_1, \cdots, \ba_d$ is arbitrary, and cannot be

291: identified. Furthermore, $\ba_j$ can be replaced by $-\ba_j$.

292:

293: Condition (\ref{b1}) is equivalent to

294: \begin{equation} \label{b5}

295: \max_{B \in \calB_t } \big| E\{ Z_{ti} Z_{tj} I(B) \} \big| = 0

296: \end{equation}

297: for any $\pi$-class $\calB_t \subset \calF_{t-1}$ such that the

298: $\sigma$-algebra generated by $\calB_t$ is equal to $\calF_{t-1}$

299: (Theorem~7.1.1 of Chow and Teicher, 1997).

300: In practice, we use some simple $\calB_t$ for the sake of

301: tractability. This leads to choosing  an orthogonal matrix $\bA  =

302: ( \ba_1, \cdots, \ba_d )$ which minimizes

303: \begin{equation} \label{b6}

304: \Psi_n(\bA) \equiv

305: \sum_{1\le i < j \le d} \; \sup_{B \in \calB,\, 1\le k \le k_0 }\;

306:  {1 \over n-k} \Big|\ba_i^\tau\Big\{ \sum_{t=k+1}^n \bX_t

307: \bX_t^\tau I( \bX_{t-k} \in B ) \Big\} \ba_j \Big|,

308: \end{equation}

309: where $\calB$ is a collection of subsets in $\RR^d$, $k_0 \ge 1$ is

310: a prescribed integer. We denote by $\wh \bA = ( \wh\ba_1, \cdots,

311: \wh\ba_d )$ the resulting estimator.

312:

313: % Note that when  $\calB$ consists of only two sets, empty set and the

314: % whole $d$-dimensional space $\RR^d$, $\Psi_n(\bA)$ is basically the same

315: % as

316: % $$

317: % \sum_{1\le i < j \le d} {1 \over n-1} \Big|\ba_i^\tau\Big\{ \sum_{t=2}^n \bX_t

318: %  \bX_t^\tau  \Big\} \ba_j \Big|.

319: % $$

320: % Hence, $\{\wh \ba_j, j=1, \cdots, d\}$ are the principal components.  In

321: % other words, our model becomes the orthogonal GARCH model in Alexander

322: % (2001).

323:

324: Since the order of $\ba_1, \cdots, \ba_d$ is arbitrary, we measure the

325: estimation error by

326: \begin{equation} \label{b7}

327: D(\wh \bA, \; \bA) = 1 - {1\over d}

328: \sum_{i=1}^d   \max_{1\le j \le d}  | \ba_i^\tau \wh \ba_j| .

329: \end{equation}

330: Note that for any orthogonal matrices $\bA$ and $ \bB$, $D(\bA,

331: \bB)\ge 0$. Furthermore, if the columns of $\bA$ are obtained from

332: a permutation of the columns of $\bB$ or their reflections, $D(\bA, \bB) = 0$.

333: In fact $\Psi_n(\bA) = \Psi_n(\bB)$ if and only if $D(\bA, \bB) = 0$.
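
\noindent To illustrate how $D$ in (\ref{b7}) behaves, consider the following small example with $d=2$; the specific matrices are chosen here purely for illustration. Let $\bA = \bI_2$, and let $\bB$ have columns $(0,1)^\tau$ and $(-1,0)^\tau$, i.e. the columns of $\bA$ swapped, with one of them reflected. Then $\max_j |\ba_i^\tau \bb_j| = 1$ for $i=1,2$, so that
\[
D(\bA, \bB) = 1 - {1\over 2}(1 + 1) = 0 .
\]
In contrast, if $\bB$ is the rotation by $\pi/4$, every inner product satisfies $|\ba_i^\tau \bb_j| = 1/\sqrt{2}$, and
\[
D(\bA, \bB) = 1 - {1 \over \sqrt{2}} \approx 0.293 .
\]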

334:

335:

336: In practice, we may let $\calB$ consist of balls with an appropriately

337: selected radius (such that each ball contains sufficiently many data

338: points) centered on a grid in the sample space of $\bX_t$.

339: For example, we may use as the centres of the balls those observations $\bX_t$ such

340: that at least one of the components of $\bX_t$ is the 10th, the 20th, $\cdots$,

341: the 90th sample percentile of the corresponding component observations.

342:

343: To overcome the difficulties in handling the constraint $ \bA^\tau\bA = \bI_d$

344: in solving the above optimization problem,

345: we reparametrize $\bA$ via the decomposition

346: \begin{equation} \label{b8}

347: \bA = \prod_{1\le i < j \le d} \bE_{ij}(\varphi_{ij}),

348: \end{equation}

349: where $\bE_{ij}(\varphi_{ij})$ is obtained from the identity matrix

350: $\bI_d$ with the following replacements: both the $(i,i)$-th and the

351: $(j,j)$-th elements are replaced by $\cos \varphi_{ij}$, the $(i,j)$-th

352: and the $(j,i)$-th elements are replaced, respectively, by $\sin

353: \varphi_{ij}$ and $-\sin \varphi_{ij}$ (Vilenkin 1968, van der Weide 2002).

354: Obviously $\bE_{ij}(\varphi_{ij})$ is an orthogonal matrix, so is $\bA$

355: given in (\ref{b8}). Writing $\bA$ in (\ref{b2}) in the form of

356: (\ref{b8}), the constrained minimization of (\ref{b6}) over

357: orthogonal $\bA$ is transformed to an unconstrained minimization

358: problem over a ${d(d-1)\over 2}\times 1$ vector $\bvarphi =

359: (\varphi_{12}, \varphi_{13},\cdots, \varphi_{1d}, \varphi_{23}, \cdots,

360: \varphi_{d-1,d} )^\tau$. This minimization problem is typically

361: solved by iterative algorithms.

362: We stop the iteration when $D(\bA_k, \bA_{k+1})$ is

363: smaller than a prescribed small number, where $\bA_k$ denotes the

364: value of $\bA$ in the $k$-th iteration, and $D$ is defined as in

365: (\ref{b7}).
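
\noindent For concreteness, the decomposition (\ref{b8}) can be written out for small $d$. The display below is a sketch; the lexicographic order of the factors for $d=3$ is an assumption made here, since (\ref{b8}) does not fix the order explicitly. For $d=2$ there is a single angle, matching ${d(d-1)\over 2} = 1$ free parameter:
\[
\bA = \bE_{12}(\varphi_{12}) =
\begin{pmatrix}
\cos\varphi_{12} & \sin\varphi_{12} \\
-\sin\varphi_{12} & \cos\varphi_{12}
\end{pmatrix}.
\]
For $d=3$ there are three angles, and $\bA = \bE_{12}(\varphi_{12})\, \bE_{13}(\varphi_{13})\, \bE_{23}(\varphi_{23})$ with, for example,
\[
\bE_{13}(\varphi_{13}) =
\begin{pmatrix}
\cos\varphi_{13} & 0 & \sin\varphi_{13} \\
0 & 1 & 0 \\
-\sin\varphi_{13} & 0 & \cos\varphi_{13}
\end{pmatrix}.
\]
Each factor has determinant $\cos^2\varphi_{ij} + \sin^2\varphi_{ij} = 1$, so the product is a rotation. This entails no loss here, because replacing a column $\ba_j$ by $-\ba_j$ leaves $D$ in (\ref{b7}) unchanged.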

366:

367: \noindent {\bf Remark 1}. In practice,  we may replace (\ref{b6})

368: by a weighted version

369: $$

370: \Psi_n(\bA) = \sum_{1\le i < j \le d} \; \sup_{B \in \calB,\, 1\le

371: k \le k_0 }\;

372:  {1 \over n-k} \Big|\ba_i^\tau\Big\{ { \sum_{t=k+1}^n \bX_t

373: \bX_t^\tau [I( \bX_{t-k} \in B ) + \varepsilon_0] \over

374: \sum_{t=k+1}^n [I( \bX_{t-k} \in B )+\varepsilon_0] } \Big\} \ba_j

375: \Big|,

376: $$

377: where $\varepsilon_0$ is a small constant guarding against zero

378: denominator.  This puts more emphasis on small sets $B$.

379: Furthermore, the supremum over $k$ in (\ref{b6}) may be replaced

380: by the summation over~$k$.

381:

382: \subsubsection{Asymptotic properties}

383:

384: We first introduce two concepts:  mixing which measures the decaying speed of

385: the auto-dependence for a time series over an increasing time span, and

386: the Vapnik-\v{C}ervonenkis (or VC) index which measures

387: the complexity of a collection of sets.

388:

389: Let $\calF_{i}^j$ be the $\sigma$-algebra generated by $\{\bX_t, i

390: \leq t \leq j \}$. The  $\beta$-mixing coefficient is defined  as

391: $$

392:  \beta(n) = E \left \{ \sup_{ B \in

393:   \calF_n^\infty} | P(B) - P(B|\calF_{-\infty}^0 ) | \right \}.

394: $$

395: (See \S 2.6.1 of Fan and Yao, 2003.)

396:

397:

398: For an arbitrary set of $n$ points $\{x_1, \cdots, x_n \}$, there are

399: $2^n$ possible subsets.  Say that $\calB$ picks out a certain subset

400: from $\{x_1, \cdots, x_n\}$ if this can be formed as a set of the

401: form $B \cap \{x_1, \cdots, x_n\}$ for a set $B$ in $\calB$. The

402: collection $\calB$ shatters $\{x_1, \cdots, x_n\}$ if each of its

403: $2^n$ subsets can be picked out by $\calB$.  The VC-index of $\calB$

404: refers

405: to the smallest $n$ for which no set of size $n$ is shattered by

406: $\calB$. A collection of sets $\calB$ is called a VC-class if its

407: VC-index is finite.  The collections of sets of rectangles, balls and

408: their unions are VC-classes. See Chapter 2.6 of van der Vaart and

409: Wellner (1996) for further discussion on VC-classes.
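
\noindent As a simple illustration of the definition (a standard textbook example, not one used in the data analysis of this paper), take $d=1$ and let $\calB$ be the collection of half-lines $(-\infty, c]$, $c \in \RR$. Any single point $\{x_1\}$ is shattered: a choice $c < x_1$ picks out the empty set, and $c \ge x_1$ picks out $\{x_1\}$. However, no two-point set $\{x_1, x_2\}$ with $x_1 < x_2$ is shattered, since every half-line containing $x_2$ also contains $x_1$, so the subset $\{x_2\}$ cannot be picked out. Hence the VC-index of this collection is 2.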

410:

411: Under the regularity conditions listed below, the estimator $\wh \bA$

412: is consistent; see Theorem~1. Its proof is relegated to Appendix A.

413: \begin{quote}

414:  (A1) The collection $\calB$ of sets  in $\RR^d$  is a VC-class.

415:

416:  (A2) The process $\{ \bX_t \}$ is strictly stationary with $E||

417: \bX_t ||^2 < \infty$, where $||\cdot||$ denotes the Euclidean

418: norm. Furthermore, the $\beta$-mixing coefficients of $\{\bX_t \}$

419: satisfy $\beta(n) = O(n^{-b})$ for some $b > 0$.

420:

421: (A3) There exists a $d\times d$ orthogonal matrix $\bA_0$ which

422: minimises

423: \[

424: \Psi(\bA) \equiv \sum_{1\le i < j \le d}  \sup_{1 \leq k \leq k_0,

425: B \in \calB} \big| E \{ \ba_i^\tau \bX_t \bX_t^\tau \ba_j

426: I(\bX_{t-k} \in B) \} \big|.

427: \]

428: Furthermore the minimum value of $\Psi$ is obtained at an orthogonal

429: matrix $\bA$ if and only if $D(\bA, \bA_0) = 0$.

430:

431: (A4)  $E \| \bX_t \|^{2p} < \infty$ for some $p >2$ and the

432: $\beta$-mixing condition in (A2) holds with $b > p/(p-2)$.

433:

434:

435: (A5) $\Psi(\bA_0) - \Psi(\bA) \le - a D(\bA, \bA_0)$

436: for any orthogonal matrix $\bA$ such that $D(\bA, \bA_0)$ is smaller than a

437: small but fixed constant, where $a > 0$ is a constant.

438: \end{quote}

439:

440: \noindent{\bf Remark 2}.  Let $\calH$ be the set consisting of all

441: $d\times d$ orthogonal matrices.

442: Then $\calH$ may be partitioned into the equivalence classes defined

443: by the distance $D$ in (\ref{b7}) as follows: the $D$-distance

444: between  any two elements within an equivalence class is 0, and the

445: $D$-distance between

446: any two elements from different classes is greater than 0.

447: Let $\calH_D$ be the quotient space $\calH/D$ consisting of those

448: equivalence classes in $\calH$, i.e.  we treat $\bA$ and $\bB$ as the

449: same element in $\calH_D$ if and only if $D(\bA, \bB) =0$.

450: Condition (A3) ensures $\bA_0$ is the unique minimiser

451: of $\Psi(\bA)$ on $\calH_D$.

452: In fact both $\Psi(\cdot)$ and $\Psi_n(\cdot)$ are

453:  Lipschitz continuous on $\calH_D$ with $D$-distance; see Lemma~1 in Appendix~A

454: below.

455:

456:

457:

458:

459: \askip

460:

461: \noindent {\bf Theorem 1}. Let $k_0\ge 1$ be a fixed integer.

462: Under conditions (A1)--(A3), $D(\wh \bA, \bA_0) \to 0$ almost

463: surely as $n \to \infty$.  If, in addition, condition (A4) holds,

464: then

465: $$

466: \Psi_n (\bA) - \Psi(\bA) = O_P(n^{-1/2}), \quad \mbox{for any

467: orthogonal $\bA$.}

468: $$

469: Furthermore, $n^{1/2} D(\wh\bA, \bA_0) = O_P(1)$ provided that, in

470: addition, condition (A5) also holds.

471:

472:

473:

474:

475: When the CUCs exist, namely $\Psi(\bA_0) = 0$, $\bA_0$ corresponds to the

476: transform for the CUCs.  When CUCs do not exist, Theorem 1

477: continues to hold.  In this case, $\Psi(\bA_0) \not = 0$ and indeed

478: $\bA_0$ can depend on the $\pi$-class $\calB$.

479: In practice, we do not know whether CUCs exist.

480: In that case, our aim naturally becomes to find an

481: orthogonal transform such that the resulting components are as

482: weakly conditionally correlated as possible.  Observe that the

483: conditional correlation criterion can be written as

484: $$

485:   \Psi(\bA) = \sum_{1\le i < j \le d}  \sup_{1 \leq k \leq k_0,

486:    B \in \calB} \big| \mbox{Corr} (\ba_i^\tau \bX_t, \ba_j^\tau \bX_t |

487:    \bX_{t-k} \in B ) \big | P( \bX_{t-k} \in B).

488: $$

489: Thus, a reasonable criterion is to find an orthogonal transform

490: $\bA$ to minimize $\Psi(\bA)$.  The following theorem shows that

491: our estimation method possesses some degrees of robustness and is

492: better than the principal component transform in terms of

493: minimizing the conditional correlation criterion $\Psi(\bA)$.

494:

495: \noindent {\bf Theorem 2}. Let $k_0\ge 1$ be a fixed integer.

496: Under conditions (A1), (A2),  for any other orthogonal transform

497: $\hat{\bB}$, we have

498: $$

499:    \liminf  \{\Psi(\hat{\bA}) - \Psi(\hat{\bB})\} \leq 0.

500: $$

501: %If, in addition, $\bA_0$ is the unique minimizer of $\Psi(\bA)$ on

502: %the quotient space $\calH_D$, then $D(\hat{\bA}, \bA_0) \to 0$

503: %almost surely.

504:

505: Theorem 2 shows that, for any other orthogonal transform $\hat{\bB}$,

506: asymptotically, the transformed components have higher conditional

507: correlation, in terms of $\Psi(\cdot)$, than those transformed by

508: $\hat{\bA}$.

509:

510:

511:

512: \subsection{Modelling volatilities for CUCs}

513:

514:

515: Once the CUCs have been identified, we may fit each

516: $\sigma_{tj}^2$ with any appropriate univariate volatility model,

517: for example, a GARCH model, a stochastic volatility model, or any

518: nonparametric or semiparametric volatility model. As a simple

519: illustration, we establish below an extended GARCH(1,1) model for

520: each of $\sigma_{tj}^2$ given in (\ref{b3}).

521:

522: \subsubsection{Extended GARCH(1,1) models}

523:

524: We assume, for the $j$-th CUC, $j=1, \cdots, d$,

525: \begin{equation} \label{b9}

526: Z_{tj} = \sigma_{tj} \ve_{tj}, \quad \quad \sigma_{tj}^2 = \ga_j +

527: \sum_{i=1}^d \alpha_{ji} Z_{t-1, i}^2 + \beta_j \sigma_{t-1,j}^2,

528: \end{equation}

529: where $ \{\ve_{tj}, \; -\infty < t < \infty\} $ is a sequence of i.i.d.

530: random variables with mean 0 and

531: variance 1, $\ve_{tj}$ is independent of $\calF_{t-1}$, $\ga_j

532: >0$ and $\alpha_{ji}, \beta_j \ge 0$.

533: This model contains $d-1$ extra terms $\sum_{i \not = j}

534: \alpha_{ji} Z_{t-1, i}^2$ compared with the standard GARCH(1,1) model,

535: which incorporate the possible association between the $j$-th CUC

536: and the other CUCs, while the conditional zero-correlation

537: condition (\ref{b1}) still holds. When $\alpha_{ji} \not = 0$,

538: the $i$-th component is said to be causal in

539: variance to the $j$-th component (Engle, Ito and Lin~1991).
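
\noindent To make the structure of (\ref{b9}) concrete, it may help to write the volatility equations out for $d=2$; this is simply (\ref{b9}) specialised, with no additional assumptions:
\begin{eqnarray*}
\sigma_{t1}^2 &=& \ga_1 + \alpha_{11} Z_{t-1,1}^2 + \alpha_{12} Z_{t-1,2}^2 + \beta_1 \sigma_{t-1,1}^2, \\
\sigma_{t2}^2 &=& \ga_2 + \alpha_{21} Z_{t-1,1}^2 + \alpha_{22} Z_{t-1,2}^2 + \beta_2 \sigma_{t-1,2}^2.
\end{eqnarray*}
Here $\alpha_{12} \not= 0$ means that the second CUC is causal in variance to the first, and $\alpha_{21} \not= 0$ means the reverse. Setting $\alpha_{12} = \alpha_{21} = 0$ recovers two separate standard GARCH(1,1) models.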

540:

541:

542: In practice, we expect that $\sigma_{tj}^2$ may depend on

543: $Z_{t-1, i}^2$ only for a small number of $i$'s, including $i=j$, i.e. many

544: coefficients $\alpha_{ji}$ (for $i\ne j$) may be 0.

545: Section~2.3.3 below outlines a data-analytic approach for

546: building such a component-dependent model.

547:

548:

549: When $\beta_j \in [0, 1)$, (\ref{b9}) implies

550: \begin{equation} \label{b10}

551: \sigma_{tj}^2 = \var(Z_{tj} |\calF_{t-1}) = {\ga_j \over 1

552: -\beta_j} + \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^\infty

553: \beta_j^{k-1} Z_{t-k,\, i}^2.

554: \end{equation}
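
\noindent The expansion (\ref{b10}) follows by back-substitution in the second equation of (\ref{b9}). The following is only a sketch of that step, and it assumes that $\beta_j^m \sigma_{t-m,j}^2 \to 0$ as $m \to \infty$ (for instance, when the process is stationary with finite second moments). Substituting the recursion into itself $m$ times gives
\[
\sigma_{tj}^2 = \ga_j \sum_{k=0}^{m-1} \beta_j^k
+ \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^{m} \beta_j^{k-1} Z_{t-k,\, i}^2
+ \beta_j^m \sigma_{t-m,j}^2 .
\]
Letting $m \to \infty$ with $\beta_j \in [0,1)$, the geometric sum converges to $\ga_j/(1-\beta_j)$ and the last term vanishes, which yields (\ref{b10}).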

555: Put $\bZ_t = (Z_{t1}, \cdots, Z_{td})^\tau$. Theorem~3 below gives

556: a sufficient condition for the existence of a stationary solution to

557: model~(\ref{b9}).

558:

559:

560: \askip

561:

562: \noindent {\bf Theorem 3}. (i) The extended GARCH(1,1) model

563: (\ref{b9}) defines a unique $d$-dimensional strictly stationary

564: process $\{ \bZ_t \}$ with $E || \bZ_{t}||^2 < \infty$ under the

565: condition

566: \begin{equation} \label{b11}

567: r\cdot \max\{\alpha_{j1}, \cdots, \alpha_{jd} \} + \beta_j < 1,

568: \quad \quad 1\le j \le d,

569: \end{equation}

570: where $r = \max_{1\le j \le d} d_j$, and $d_j$ is the number of non-vanishing

571: coefficients among $ \alpha_{j1}, \cdots, \alpha_{jd} $.

572:

573: (ii) Under condition (\ref{b11}), $E(Z_{tj}^2) = 1 $ for all $1\le

574: j \le d$ if and only if

575: \begin{equation}

576: \ga_j = 1 - \beta_j - \sum_{i=1}^d \alpha_{ji} , \quad \quad \quad

577: 1\le j \le d.  \label{b12}

578: \end{equation}

579:

580: \askip

581:

582: The proof of the above theorem is in Appendix B. When

583: $\alpha_{ji}= 0$ for all $i \not = j$, i.e. each $Z_{tj}$ follows

584: a standard GARCH(1,1) model, (\ref{b11}) reduces to $\alpha_{jj} +

585: \beta_j < 1$, which is the necessary and sufficient condition for

586: the existence of a unique strictly stationary solution with finite

587: second moments for the corresponding GARCH(1,1) model; see Chen

588: and An (1998).  In practice condition (\ref{b11}) may often be

589: violated, indicating the likely inappropriateness of GARCH

590: specification for $\sigma_{tj}^2$. However if we view the right

591: hand side of (\ref{b10}) as an approximation for $\sigma_{tj}^2$,

592: such an approximation process is strictly stationary under a weaker

593: condition $\beta_j <1$. For further discussion of the

594: approximation point of view, we refer to Penzer, Wang and Yao~(2004).
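
\noindent The constraint (\ref{b12}) can be motivated by a short moment calculation; the following sketches only the ``only if'' direction and is not a substitute for the proof in Appendix~B. Under stationarity with finite second moments, $E(Z_{tj}^2) = E(\sigma_{tj}^2)$ because $\ve_{tj}$ is independent of $\calF_{t-1}$ with unit variance. Taking expectations in the second equation of (\ref{b9}) gives
\[
E(\sigma_{tj}^2) = \ga_j + \sum_{i=1}^d \alpha_{ji} E(Z_{ti}^2) + \beta_j E(\sigma_{tj}^2).
\]
If $E(Z_{ti}^2) = 1$ for all $i$, this reads $1 = \ga_j + \sum_{i=1}^d \alpha_{ji} + \beta_j$, which is (\ref{b12}). As a purely illustrative numerical example (the values are hypothetical), take $d=2$, $\alpha_{11} = 0.08$, $\alpha_{12} = 0.05$ and $\beta_1 = 0.8$. Then $r = 2$, condition (\ref{b11}) for $j=1$ reads $2 \times 0.08 + 0.8 = 0.96 < 1$, and (\ref{b12}) gives $\ga_1 = 1 - 0.8 - 0.13 = 0.07$.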

595:

596:

597: \subsubsection{Quasi-MLE}

598:

599: To facilitate a likelihood, let us assume hypothetically

600: that $\ve_{tj}$ in (\ref{b9}) has a density $f(\cdot)$,

601: which can be, for example, that of the standard normal distribution, a generalized Gaussian

602: distribution or a $t$-distribution.  The implied (negative)

603: log-likelihood function for $\btheta_j \equiv (\alpha_{j1},

604: \cdots, \alpha_{jd}, \beta_j)^\tau$ is

605: \begin{equation} \label{b13}

606: l_j(\btheta_j ) = \sum_{t=\nu+1}^n \big\{

607: \log \sigma_{tj}(\btheta_j)  - \log f(Z_{tj}/\sigma_{tj}(\btheta_j)) \big\},

608: \end{equation}

609: for a given integer $\nu \ge 1$, where $\sigma_{tj}(\btheta_j)^2 =

610: \var(Z_{tj} | \calF_{t-1})$ is given by (\ref{b9}).  By (\ref{b10})

611: and  (\ref{b12}),

612: \begin{eqnarray}

613: \sigma_{tj}(\btheta_j)^2 &=&

614:  \frac{\ga_j}{1 - \beta_j} + \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^\infty

615:     \beta_j^{k-1} Z_{t-k,i}^2 \nonumber \\

616: &=& 1 - \frac{ \sum_{i=1}^d \alpha_{ji}}{ 1 -\beta_j}  +

617: \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^\infty \beta_j^{k-1}

618: Z_{t-k,i}^2 . \label{b14}

619: \end{eqnarray}

620: This form of $\sigma_{tj}(\btheta_j)^2  $ ensures

621: $\var(Z_{tj})=1$; see Theorem~3(ii). The quasi-maximum likelihood

622: estimator $\wt \btheta_j$ minimizes (\ref{b13}). In practice, we

623: let $Z_{ti} \equiv 0$ for all $t\le 0$ on the right hand side of

624: (\ref{b14}).
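
\noindent As an illustration of (\ref{b13}), consider the special case in which $f$ is taken to be the standard normal density; this choice is made here only as an example. Then $-\log f(x) = x^2/2 + {1\over 2}\log(2\pi)$, and up to an additive constant not depending on $\btheta_j$,
\[
l_j(\btheta_j) = \sum_{t=\nu+1}^n \Big\{ \log \sigma_{tj}(\btheta_j)
+ { Z_{tj}^2 \over 2\, \sigma_{tj}(\btheta_j)^2 } \Big\}.
\]
With the convention $Z_{ti} \equiv 0$ for $t \le 0$, the infinite sum in (\ref{b14}) reduces to a finite one, so that the quantity actually computed is
\[
\sigma_{tj}(\btheta_j)^2 = 1 - \frac{ \sum_{i=1}^d \alpha_{ji}}{ 1 -\beta_j}
+ \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^{t-1} \beta_j^{k-1} Z_{t-k,i}^2 .
\]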

625:

626:

627: \subsubsection{Selection of causal components}

628:

629: To obtain a parsimonious representation for $\sigma_{tj}^2$, we

630: may select only those significant $Z_{t-1,i}$ on the RHS of the

631: second equation in (\ref{b9}). This is particularly important when

632: the number of components $d$ is large. It may be achieved by using

633: the ideas for variable selection in regression analysis. Below we

634: outline such an algorithm based on a combination of the stepwise

635: addition method and the BIC criterion.

636:

637:

638:

639: We start with the standard GARCH(1,1) model (i.e. $\alpha_{jj}\ne 0$

640: and $\alpha_{ji} = 0$ for $j \not = i$). We then add one more $Z_{t-1,i}$

641: each time which maximizes the (quasi-)likelihood.

642: More precisely, suppose the model contains

643: $(k-1)$ terms $Z_{t-1, j_1}, \cdots, Z_{t-1, j_{k-1}}$ already.

644:  We choose an additional term $Z_{t-1, \ell}$ among

645: $\ell\not\in \{j, j_1, \cdots, j_{k-1}\}$ which maximizes the

646: quasi-likelihood function. Note that this is a two-step

647: maximization problem:  For each given $\ell\not\in \{j, j_1,

648: \cdots, j_{k-1}\}$, we compute the qMLE $\wt \btheta_j^{(k)}$ for

649: $\btheta_j^{(k)} \equiv  (\alpha_{jj}, \alpha_{jj_1}, \cdots,

650: \alpha_{j\ell}, \beta_j)^\tau$ with the constraints

651: $\alpha_{ji} = 0$, for $i \not \in \{j, j_1, \cdots, j_{k-1}, \ell\}$. We then choose

652: an $\ell \not \in \{j, j_1, \cdots, j_{k-1} \}$ to minimize

653: $l_j(\wt \btheta_j^{(k)})$, and denote by $l_j(k)$ the minimum

654: value and by $j_k$ the index of the selected variable. Put

655: \[

656: {\rm BIC}_j(k) = l_j(k) +  (k+2) \log(n-\nu).

657: \]

658: We choose $r_j$ which minimizes BIC$_j(k) $ over $0 \le k \le d-1$.

659: Note that $k=0$ corresponds to the standard GARCH(1,1) fitting for

660: $Z_{tj}$.

661:

662:

663:

664: \subsubsection{LADE}

665:

666: If CUCs $Z_{tj}$ are known (i.e. $\ba_j$ are known), the asymptotic properties

667: of the qMLE may be derived in a manner similar to that of Hall and

668: Yao~(2003). See also Mikosch and Straumann~(2004). For example, the

669: estimator $\wt \btheta_j$ would suffer from

670: complicated asymptotic distributions and slow convergence rates if $\ve_{tj}$

671: is heavy-tailed in the sense that $E(|\ve_{tj}|^4) = \infty$.

672: On the other hand, a least absolute deviation estimator

673: based on a log-transformation

674: is always asymptotically normal with the standard root-$n$

675: convergence rate; see Peng and Yao (2003).

676:

677: To construct the LADE with the constraint $\var(Z_{tj})=1$, we

678: write $\ve_{tj} = v_0 e_{tj}$ in the first equation in (\ref{b9}),

679: where the median of $e_{tj}^2$ is equal to 1 and $v_0 =

680: 1/{\mbox{STD}}(e_{tj})$. With $\sigma_{tj}(\btheta_j)^2$ expressed

681: in (\ref{b14}), parameters $\btheta_j$ and $v_0$ are (jointly)

682: identifiable. Now

683: \[

684: \log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log v_0^2

685: = \log (e_{tj}^2).

686: \]

687: Since the median of $\log (e_{tj}^2) $ is 0, the true values of the

688: parameters minimise

689: \[

690: E \big|\log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log v_0^2\big|.

691: \]

692: Therefore we may estimate the

693: parameters by minimizing

694: \begin{equation} \label{b15}

695: \sum_{t=\nu+1}^n

696: \big|\log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log v_0^2\big|,

697: \end{equation}

698: where $\sigma_{tj}(\btheta_j)^2$ is given in (\ref{b14}), with the

699: constraint $\alpha_{ji} = 0$ for each component that is non-causal in variance.

700: So far $\btheta_j$ and $v_0$ are treated as free parameters. The estimators

701: obtained are root-$n$ consistent.

702:

703: To make explicit use of the condition that $\var(\ve_{tj})=1$,

704: we may estimate parameters $\btheta_j$ as follows.

705: With the initial estimate $\hat{\btheta}_j^{(0)}$, let $\hat{v}_0$ be the

706: reciprocal of the sample standard deviation of the residuals $\{

707: \wt\ve_{tj} \}$, where $\wt\ve_{tj}

708: =Z_{tj}/\{\sigma_{tj}(\hat{\btheta}_j^{(0)}) \}$.

709: With the given $\hat{v}_0$ and $\hat{\btheta}_j^{(0)}$, we can

710: minimize

711: $$

712: \sum_{t=\nu+1}^n  w_t \bigl ( \log Z_{tj}^2 - \log \{

713: \sigma_{tj}(\btheta_j)^2\} - \log \hat{v}_0^2\bigr )^2,

714: $$

715: where $w_t =  \big|\log Z_{tj}^2 - \log \{

716: \sigma_{tj}(\hat{\btheta}_j^{(0)})^2\} - \log

717: \hat{v}_0^2\big|^{-1}$. We may update $\hat{v}_0$ and

718: iterate further until the estimated

719: $\btheta_j$ converges. Note that we have used a weighted $L_2$ loss

720: function to approximate the $L_1$ loss to expedite the computation.
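
The approximation behind this iteration can be made explicit. Write $r_t(\btheta_j)$ for the residual in the sum above (a notation introduced here for illustration only). Then, for $r_t(\btheta_j) \ne 0$ and $\btheta_j$ close to $\hat{\btheta}_j^{(0)}$,
\[
r_t(\btheta_j) = \log Z_{tj}^2 - \log \{ \sigma_{tj}(\btheta_j)^2\} - \log \hat{v}_0^2,
\qquad
\big| r_t(\btheta_j) \big| = \frac{r_t(\btheta_j)^2}{\big| r_t(\btheta_j) \big|}
\approx w_t \, r_t(\btheta_j)^2 ,
\quad w_t = \big| r_t(\hat{\btheta}_j^{(0)}) \big|^{-1} .
\]
Summing over $t$ shows that the weighted $L_2$ criterion approximates the $L_1$ criterion (\ref{b15}) near the current estimate. Residuals close to zero produce very large weights; a common safeguard, which is our suggestion and not part of the procedure described above, is to use $w_t = 1/\max\{ |r_t(\hat{\btheta}_j^{(0)})|, \delta \}$ for a small $\delta > 0$.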

721:

722:

723:

724: \subsection{Inference based on bootstrapping }

725:

726: A natural question for the proposed approach is whether the CUCs

727: $Z_{t1}, \cdots, Z_{td}$ exist, although the minimiser $\{ \wh

728: \ba_j\}$ of (\ref{b6}) always exists. To address this issue

729: statistically, we may construct a test for the null hypothesis

730: \[

731: H_0: \; \bX_t = \bA \bZ_t \quad \mbox{and} \quad

732: \bZ_t = \diag(\sigma_{t1}, \cdots, \sigma_{td}) \bve_t,

733: \]

734: where $\bA^\tau\bA = \bI_d$, $\bve_t = (\ve_{t1}, \cdots,

735: \ve_{td})^\tau$, $\{ \ve_{t1}\}, \cdots, \{ \ve_{td}\}$ are $d$

736: independent series, and each of them is a sequence of i.i.d. r.v.s

737: with mean 0 and variance 1. Note that the null hypothesis above is

738: a sufficient but not necessary condition for the existence of

739: CUCs. The independence condition is required to construct a

740: bootstrap test for this null hypothesis.

741:

742: Note that when $Z_{ti}$ and $Z_{tj}$ are not conditionally

743: uncorrelated, the left-hand side of (\ref{b5}) is equal to a

744: positive constant instead of 0. Therefore, {\sl large} values

745: of $\Psi_n(\wh \bA)$ will indicate that the CUCs do  not exist. We

746: adopt a bootstrap method below to assess how large is large enough

747: to reject~$H_0$.

748:

749: If the null hypothesis $H_0$ cannot be rejected, we may also

750: construct confidence sets for the coefficients $\ba_j$ (i.e. the

751: columns of $\bA$) of the CUCs, and the parameters $\btheta_j$

752: based on the same bootstrap scheme. Formally confidence sets for

753: $\btheta_j$ could be constructed based on asymptotic distributions

754: of, for example, the LADE $\wh \btheta_j$, which may be derived in

755: a manner similar to Peng and Yao~(2003). However, such an

756: approach is based on the assumption that the CUCs are known (i.e.

757: the vectors $\ba_j$ are known) and therefore fails to take

758: into account the errors due to the estimation of $\ba_j$.

759:

760: Let $\wh \bA =(\wh \ba_1, \cdots, \wh\ba_d)$ be the estimator

761: derived from minimizing (\ref{b6}). Let $Z_{tj} = \wh \ba_j^\tau

762: \bX_t$. Let $\wh \btheta_j$ be an estimator

763: for $\btheta_j$, such as the LADE defined in section~2.3.4.

764:

765: The bootstrap sampling scheme consists of the three steps below.

766: \begin{quote}

767: (i) For $j=1, \cdots, d$, draw $\ve_{tj}^*$, for $-\infty< t \le n$,

768: by sampling randomly with replacement from  the standardized residuals

769: $\{\wh \ve_{\nu+1, j}, \cdots , \wh \ve_{nj}\}$ which are obtained

770: from standardizing the raw residuals

771: \[

772: Z_{tj}/\sigma_{tj}(\wh \btheta_j), \quad \quad t=\nu+1, \cdots, n.

773: \]

774:

775: (ii) For $j=1, \cdots, d$, draw $Z_{tj}^* = \sigma_{tj}^* \ve_{tj}^*$,

776: for $-\infty< t \le n$, where

777: \[

778: ( \sigma_{tj}^*)^2 =1 -  \wh \beta_j - \sum_{i=1}^d \wh

779: \alpha_{ji}   + \sum_{i=1}^d \wh \alpha_{ji}(Z_{t-1, i}^*)^2 + \wh

780: \beta_j (\sigma_{t-1,j}^*)^2.

781: \]

782:

783: (iii) Let $\bX_t^* = \wh \bA (Z_{t1}^*, \cdots, Z_{td}^*)^\tau$ for $t=1, \cdots,

784: n$.

785: \end{quote}
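
As a consistency check on step (ii), the intercept $1 - \wh \beta_j - \sum_i \wh \alpha_{ji}$ is the one that keeps the bootstrap CUCs at unit variance. The following is a sketch only: it assumes that the bootstrap recursion admits a stationary solution with finite second moments, and it writes $E^*$ for expectation under the bootstrap law given the original sample (a notation introduced here). Since $\ve_{tj}^*$ is drawn independently of the past and has mean 0 and variance 1 after standardization, $m_j \equiv E^*\{ (Z_{tj}^*)^2 \} = E^*\{ (\sigma_{tj}^*)^2 \}$, and taking expectations in the recursion gives
\[
m_j = 1 - \wh \beta_j - \sum_{i=1}^d \wh \alpha_{ji}
+ \sum_{i=1}^d \wh \alpha_{ji} \, m_i + \wh \beta_j \, m_j ,
\qquad j = 1, \cdots, d,
\]
which is satisfied by $m_1 = \cdots = m_d = 1$. The bootstrap components therefore respect the constraint $\var(Z_{tj}) = 1$ imposed on the CUCs.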

786:

787: \askip

788:

789: \noindent {\sl A test for the existence of the CUCs}: Let

790: $\Psi_n^*(\bA)$ be defined as in (\ref{b6}) with $\{ \bX_t \}$

791: replaced by $\{ \bX_t^* \}$, and the bootstrap estimator $\bA^*=

792: (\ba_1^*, \cdots,  \ba_d^*)$ be computed in the same manner as

793: $\wh \bA$ with $\{ \bX_t \}$ replaced by $\{ \bX_t^* \}$. Note

794: that the bootstrap sample $\{ \bX_t^* \}$

795:  is drawn from the model with $\wh \ba_j^\tau \bX_t$ as its {\sl genuine}

796: CUCs. Hence the conditional

797: distribution of $\Psi_n^*( \bA^*)$ (given the original sample $\{ \bX_t

798: \}$) may be taken as an approximation for the distribution of $\Psi_n(\wh

799: \bA)$ under $H_0$.  Thus we reject

800: $H_0$ if $\Psi_n(\wh \bA)$ is greater than the $[B\alpha]$-th largest

801: value of $\Psi_n^*( \bA^*)$ among $B$ replications of the above bootstrap

802: resampling, where $\alpha \in (0, 1)$ is the size of the

803: test and $B$ is a large integer.
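
The same decision can be reported through a bootstrap $p$-value. Writing $\Psi_{n,b}^*$ for the value of $\Psi_n^*(\bA^*)$ in the $b$-th replication (the subscript $b$ is introduced here for illustration), a standard estimate is
\[
\hat p = \frac{1}{B} \sum_{b=1}^B I \big\{ \Psi_{n,b}^* > \Psi_n(\wh \bA) \big\},
\]
and $H_0$ is rejected at size $\alpha$ when $\hat p < \alpha$. Up to ties and the rounding in $[B\alpha]$, this agrees with the rule stated above.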

804:

805: \askip

806:

807: \noindent {\sl Confidence sets for ${\bf A}$}: A bootstrap

808: approximation for a $(1-\alpha)$ confidence set of the

809: transformation matrix ${\bf A}$ can be constructed  as

810: \begin{equation}\label{b16}

811: \{ {\bf A} \, \big| \, D(\hat{\bf A}; {\bf A}) \le c_\alpha , {\bf A}^{\tau}{\bf A}={\bf I}_d \},

812: \end{equation}

813: where  $c_\alpha $ is the $[B\alpha]$-th largest value of

814: $D(\hat{\bf A}; {\bf A}^*)$ among $B$ replications of the bootstrap

815: resampling. Note that when $\bA$ is in the

816: confidence set, so is $\bB$ if the columns of $\bB$ form a

817: permutation of the (reflected) columns of $\bA$; see (\ref{b7}).

818:

819: \askip

820:

821: \noindent

822: {\sl Interval estimators for the components of $\wh \btheta_j$}:

823: A bootstrap confidence interval for any component, say, $\beta_j$

824: of $\btheta_j$ may be obtained as follows. Repeat the above

825: bootstrap sampling $B$ times for some large integer $B$, resulting

826: in bootstrap estimates $ \beta^*_{j1}, \cdots,  \beta^*_{jB}$. An

827: approximate $(1-\alpha)$ confidence interval for $\beta_j$ is $(

828: \beta_{j(b_1)}^*, \; \beta_{j(b_2)}^*)$, where $ \beta_{j(i)}^*$

829: denotes the $i$-th smallest value among $ \beta^*_{j1}, \cdots,

830: \beta^*_{jB}$, and $ b_1=[B\alpha/2]$ and $b_2=[B(1-\alpha/2)]$.

831:

832:

833: \section{Simulation}

834:

835: We conduct a Monte Carlo experiment to illustrate the proposed

836: CUC-approach. In particular we check the accuracy of the

837: estimation for the transformation matrix $\bf A$ in (\ref{b2}).

838:

839: We consider a

840: CUC-GARCH(1,1) model with $d=3$

841: \begin{equation} \label{ex1}

842: {\bf X}_t  =  {\bf A}{\bf Z}_t, \quad \quad

843:             {\bf Z}_{t}| \calF_{t-1}\; \sim \;  N(0,\;  \diag\{\sigma_{t,1}^2,

844: \sigma_{t,2}^2, \sigma_{t,3}^2\}),

845: \end{equation}

846: where $ \sigma_{t,i}^2  =  \gamma_i+\alpha_i

847: Z_{t-1,i}^2+\beta_i \sigma_{t-1,i}^2 $, and

848: \begin{center}

849: \begin{tabular}{ccc|cccc}

850:      & {\bf A}  &        &   $i$       &   $\gamma_i$   &  $\alpha_i$

851:   &  $\beta_i$  \\[0.5ex]\hline

852:   0  &   0.500  & 0.866  &     1             & 0.02         &   0.08

853:   &      0.90 \\

854:   0  &   0.866  & -0.500 &     2             & 0.10         &   0.10

855:   & 0.80 \\

856:   -1 &   0      & 0      &     3             & 0.28         &   0.12

857:   & 0.60 \\

858: \end{tabular}

859: \end{center}

860: It is easy to see that ${\bf A}^\tau{\bf A}={\bf I}_3$ and

861: $\gamma_i = 1 - \alpha_i - \beta_i$ so that the variances of the

862: CUCs are 1 [see (\ref{b12})]. Since $\alpha_1 + \beta_1 = 0.98$

863: is very close to 1, the volatility for the first CUC is highly

864: persistent. In contrast, the volatility persistence in the

865: third component is less  pronounced as $\alpha_3+\beta_3=0.72$

866: only.
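
Both claims can be checked directly from the tabulated values. We read the entries 0.500 and 0.866 as three-decimal roundings of $1/2$ and $\sqrt{3}/2$, which is an assumption on our part. With columns ${\bf a}_1 = (0, 0, -1)^\tau$, ${\bf a}_2 = (0.500, 0.866, 0)^\tau$ and ${\bf a}_3 = (0.866, -0.500, 0)^\tau$,
\[
{\bf a}_1^\tau {\bf a}_2 = {\bf a}_1^\tau {\bf a}_3 = 0, \qquad
{\bf a}_2^\tau {\bf a}_3 = 0.433 - 0.433 = 0, \qquad
\| {\bf a}_2 \|^2 = \| {\bf a}_3 \|^2 = 0.250 + 0.866^2 \approx 1.000,
\]
and $\| {\bf a}_1 \|^2 = 1$. For the intercepts,
\[
\gamma_1 = 1 - 0.08 - 0.90 = 0.02, \qquad
\gamma_2 = 1 - 0.10 - 0.80 = 0.10, \qquad
\gamma_3 = 1 - 0.12 - 0.60 = 0.28,
\]
in agreement with the table.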

867:

868: For each of 200 samples with size $n=500$ and 1000 respectively

869: from the above model, we estimated the transformation matrix $\bA$

870: by minimizing $\Psi_n({\bf A})$ defined in (\ref{b6}), which was

871: solved using the proprietary optimization routines in MATLAB. Note

872: that as far as the estimation of $\bA$ is concerned, two

873: orthogonal matrices are treated as identical if the $D$-distance

874: between them is 0; see (\ref{b7}). The coefficients $\alpha_i,

875: \beta_i$ and $\gamma_i$ were estimated using quasi-MLE based on a

876:  Gaussian likelihood. The

877: resulting estimates are summarized in Table~1 and Figure~1.

878:

879: \begin{table}[htb]

880: \begin{center}

881: \caption[Table 1]{Simulation Results: summary statistics of the

882: errors in estimation}

883: \begin{tabular}{cc | c c c c c c c}\hline

884: && $D(\hat{\bf A}, {\bf A})$ &    $\hat{\alpha}_1$  & $ \hat{\beta}_1$  &  $ \hat{\alpha}_2$  &  $ \hat{\beta}_2$  &  $\hat{\alpha}_3$     &  $\hat{\beta}_3$  \\[0.5ex]\hline

885: &  mean       &   0.0753      &    0.0719            &      0.8701       &

886:      0.0865         &    0.7506          &     0.0997            &

887: 0.6189      \\

888: &  median     &   0.0474      &    0.0705            &      0.8870       &

889:      0.0830         &    0.7801          &     0.0861            &

890: 0.6445      \\

891: $n=500$ & STD   &   0.0714      &    0.0300            &      0.0830       &

892:      0.0469         &    0.1469          &     0.0600            &

893: 0.2017      \\

894: &  bias       &      -        &   -0.0081            &     -0.0299       &

895:     -0.0135         &   -0.0494          &    -0.0203            &

896: 0.0189      \\

897: &  RMSE       &      -        &    0.0303            &      0.0888       &

898:      0.0484         &    0.1546          &     0.0629            &

899: 0.2022      \\ \hline

900: &  mean       &   0.0679      &    0.0722            &      0.8921       &

901:      0.0846         &    0.7751          &     0.0937            &

902: 0.6307      \\

903: &  median     &   0.0434      &    0.0731            &      0.8999       &

904:      0.0833         &    0.7956          &     0.0938            &

905: 0.6517      \\

906: $n=1000$&  STD   &   0.0648      &    0.0224            &      0.0400       &

907:      0.0346         &    0.1065          &     0.0412            &

908: 0.1634      \\

909:  & bias       &      -        &   -0.0078            &     -0.0079       &

910:     -0.0154         &   -0.0249          &    -0.0263            &

911: 0.0307      \\

912:   &RMSE       &      -        &    0.0234            &      0.0403       &

913:      0.0384         &    0.1191          &     0.0487            &

914: 0.1660      \\ \hline

915: \end{tabular}\\[0.5ex]

916: \end{center}

917: \end{table}

918:

919: Since both the  means and the standard deviations of $D(\hat{\bf A},{\bf

920: A})$ are very small, the estimation for  ${\bf A}$ is accurate.

921: The coefficients in each CUC model were also estimated accurately.

922: The errors in estimation decrease as the sample size increases

923: from 500 to 1000.

924:

925: The biases reported in Table~1 are negative except those of $\hat{\beta}_3$; see also

926: Figure~1. This indicates that most of the coefficients in the GARCH(1, 1)

927: models for CUCs were slightly underestimated. Also note that the

928: estimation errors decrease when the volatility persistence

929: (measured by $\alpha_i + \beta_i$) increases; see the upper panel

930: of Figure~1 for the estimation with the sample size 1000. To make

931: a comparison, the estimation errors of the GARCH coefficients when

932: the true ${\bf A}$ is used are plotted in the lower panel.  The

933: differences are small.

934:

935:

936: \section{Real data examples}

937:

938: In this section we illustrate the proposed method

939: with two real data sets.

940:

941: The first data set, denoted as SCI, consists of the 2275

942: daily log returns (in percentages) of the S\&P 500 index, the stock price

943: of Cisco Systems and the stock price of Intel Corporation from 2

944: January 1991 to 31 December 1999. This data set has been analyzed

945: in Tsay~(2001). Figure 2 depicts the time series plots of the

946: three series.  Descriptive statistics are listed in Table 2.

947: The unconditional distributions of all of these series

948: exhibit excess kurtosis, indicating a significant departure from

949: normality.

950:

951: The Ljung-Box $Q$ statistics suggest some plausible autocorrelation

952: in these series. But this may be due to the

953: heteroscedasticity. Hence we compute the $p$-values of these $Q$ tests

954: based on a bootstrap procedure: for each of the mean-deleted

955: component return series, we first fit a univariate

956: GARCH(1,1) model

957: \begin{equation}\nonumber

958: Y_t=\sigma_t \epsilon_t, \hspace{1cm} \sigma_t^2=\alpha_0

959: + \alpha_1 Y_{t-1}^2 +\beta_1 \sigma_{t-1}^2,

960: \end{equation}

961: and denote the estimated

962: parameters as $\hat{\alpha}_0, \hat{\alpha}_1, \hat{\beta}_1$,

963: respectively, and the standardized residuals as

964: $\hat{\epsilon}_t$. Draw $ \epsilon_t^{\ast}$

965: randomly with replacement from $\{\hat\epsilon_t,\; t=1,\cdots,n\}$ and

966: draw $Y_t^{\ast}$

967: from

968: \begin{equation}\nonumber

969: Y_t^{\ast}=\sigma_t^{\ast} \epsilon_t^{\ast}, \hspace{1cm}

970: \sigma_t^{\ast 2}=\hat{\alpha}_0 + \hat{\alpha}_1 Y^{\ast 2}_{t-1}

971: +\hat{\beta}_1 \sigma_{t-1}^{\ast 2}.

972: \end{equation}

973: Let $Q^{\ast}$ be a $Q$-statistic based on $Y_t^{\ast}$.

974: The $p$-value of $Q$ is now

975: estimated by the relative frequency of the occurrence of the event

976: that $Q^{\ast}$ is greater than

977: $Q$ in 1000 bootstrap replications.

978: In Table 2,  those $p$-values are listed in parentheses

979: below the values of the corresponding $Q$ statistics.

980: Based on those $p$-values, there is no significant evidence

981: for the existence of autocorrelation in any of the three

982: component series.

983: Accordingly there is no need to fit a VAR model for the

984: conditional mean  for this data set.

985:

986:

987: Let $\bY_t$ be the mean-deleted returns of SCI.

988: Let $\bSigma =

989: {\bf P}{\bf \Lambda}{\bf P}^{\tau}$ be the sample covariance

990: matrix of $\bY_t$, where ${\bf P P}^\tau = \bI_3$ and $\bLambda$

991: is diagonal. Let ${\bf X}_t={\bf \Lambda}^{-\frac{1}{2}}{\bf

992: P}^{\tau} {\bf Y}_t$. Then the sample (unconditional)

993: covariance matrix of $\bX_t$ is $\bI_3$.
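
This can be verified in one line. Since $\bY_t$ has sample covariance matrix $\bSigma = {\bf P}{\bf \Lambda}{\bf P}^{\tau}$, the linear transform ${\bf X}_t={\bf \Lambda}^{-\frac{1}{2}}{\bf P}^{\tau} {\bf Y}_t$ has sample covariance matrix
\[
{\bf \Lambda}^{-\frac{1}{2}}{\bf P}^{\tau} \, \bSigma \, {\bf P}{\bf \Lambda}^{-\frac{1}{2}}
= {\bf \Lambda}^{-\frac{1}{2}} ({\bf P}^{\tau}{\bf P}) {\bf \Lambda} ({\bf P}^{\tau}{\bf P}) {\bf \Lambda}^{-\frac{1}{2}}
= {\bf \Lambda}^{-\frac{1}{2}} {\bf \Lambda} {\bf \Lambda}^{-\frac{1}{2}}
= \bI_3 ,
\]
where we use ${\bf P}^{\tau}{\bf P} = \bI_3$ (equivalent to ${\bf P P}^\tau = \bI_3$ for a square matrix) and assume that all diagonal elements of $\bLambda$ are positive, so that ${\bf \Lambda}^{-\frac{1}{2}}$ is well defined.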

994:

995:

996: \begin{table}[htb]

997: \begin{center}

998: \caption[Table 2]{Summary Statistics of the Two Real Data Sets }

999: \begin{tabular}{c | c c c |c c c c c }\hline

1000:                &     S$\&$P 500     &    Cisco          &  Intel            &    HS             &       JN         &   SH              &       ST          &    TW              \\[0.5ex]\hline

1001: N              &       2275         &    2275           &  2275             &   1349            &    1349          &  1349             &      1349         &    1349            \\

1002: Mean           &       0.0656       &   0.2567          &  0.1561           &   -0.0198         &   -0.0477        &  0.0178           &     -0.0081       &  -0.0400           \\

1003: Stdev          &      0.8747        &  2.8540           & 2.4644            &   2.1822          &    1.7382        &  1.5401           &      1.8784       &   1.9863           \\

1004: Min            &      -7.1140       &  -22.1000         & -14.5810          &  -14.7346         &   -9.0145        &  -8.7277          &     -9.1535       &  -9.9360           \\

1005: Max            &       4.9900       &  15.5760          &  12.8500          &   20.2083         &    8.8876        &  8.8491           &     19.5559       &   9.7871           \\

1006: Skewness       &     -0.3600        & -0.3963           &  -0.2353          &    0.6419         &    0.1375        &  0.1861           &      0.9114       &   0.1345           \\

1007: Kurtosis       &      9.0469        &  6.7229           &  5.4701           &   14.3999         &    5.0891        &  8.4310           &     15.2063       &   5.4082           \\ \hline

1008: $Q(10)$        &  22.8322           & 25.3861           & 6.8567            &   32.2251         &    8.8471        & 12.9372           &     28.6943       &  16.9723           \\

1009:                &  \fsize (0.2440)   & \fsize (0.0870)   &  \fsize (0.8180)  & \fsize (0.1760)   &  \fsize (0.7540) &  \fsize (0.7770)  &  \fsize (0.2180)  &   \fsize (0.2540)  \\

1010: $Q(20)$        &  44.2898           & 33.9490           & 30.3427           &   46.1651         &    19.1511       & 26.9255           &     40.7220       &  28.4664           \\

1011:                &  \fsize (0.2300)   & \fsize (0.2500)   & \fsize (0.1170)   & \fsize (0.2810)   &  \fsize (0.7200) & \fsize (0.7310)   &    \fsize (0.2870)&  \fsize (0.3290)   \\ \hline

1012: \end{tabular}\\[0.5ex]

1013: \end{center}

1014: \begin{singlespace}

1015: \emph{Note:} {\sl  $Q(k)$ denotes the Ljung-Box portmanteau test statistic.

1016: Figures in parentheses are their corresponding p-values based on 1000 bootstrap

1017: replications.  }

1018: \end{singlespace}

1019: \end{table}

1020:

1021: Based on data $\bX_t$, an estimator $\wh \bA $ was obtained with

1022: $\Psi_n(\hat{\bf A})= 0.1732$. Subsequently, a GARCH(1,1) model was

1023: fitted for each CUC. The estimated coefficients are listed

1024: in Table~3 which shows

1025: that the volatility of the first and third CUCs is highly persistent

1026: as $\hat{\alpha}_1+\hat{\beta}_1=0.9925$ and

1027: $\hat{\alpha}_3+\hat{\beta}_3=0.9611$.

1028: (One may fit the first CUC with an IGARCH model.)

1029: On the other hand, the volatility of the second CUC is less persistent as

1030: $\hat{\alpha}_2+\hat{\beta}_2=0.80$.
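
These persistence measures follow from the estimates in Table~3:
\[
\hat{\alpha}_1+\hat{\beta}_1 = 0.0519 + 0.9406 = 0.9925, \quad
\hat{\alpha}_2+\hat{\beta}_2 = 0.0432 + 0.7572 = 0.8004, \quad
\hat{\alpha}_3+\hat{\beta}_3 = 0.0884 + 0.8727 = 0.9611 .
\]
The fitted intercepts $0.0074$, $0.1997$ and $0.0389$ agree with $1 - \hat{\alpha}_i - \hat{\beta}_i$ up to rounding, as the unit-variance constraint requires. To give a rough sense of scale (an illustration of ours, using the standard GARCH(1,1) property that deviations of the conditional variance forecast from its unconditional level decay geometrically at rate $\alpha_i + \beta_i$), the implied half-life of a volatility shock is
\[
h_i = \frac{\log (1/2)}{\log (\hat{\alpha}_i+\hat{\beta}_i)} , \qquad
h_1 \approx 92, \quad h_2 \approx 3.1, \quad h_3 \approx 17.5
\]
trading days.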

1031:

1032: We applied the bootstrap procedure (with 500 replications) described

1033: in section~2.4 to test the existence of the CUCs. The $p$-value

1034: is 0.60, indicating that there is no strong evidence against the

1035: hypothesis of the existence of CUCs.

1036: The $(1-\alpha)$  bootstrap

1037: confidence set for the transformation

1038: matrix ${\bf A}$  is

1039: $\{ \bA | D(\hat{\bf A}, {\bf A})\le c_{\alpha}, \;

1040: {\bf A}^{\tau}{\bf A}={\bf I}_3 \}$ with $c_{\alpha}= 0.1718$ for

1041: $\alpha = 0.05$, and 0.1368 for $\alpha = 0.1$.

1042: Since $D(\hat{\bf A}, \bI_3) = 0.2593$,

1043: $\bI_3$ is not contained in the confidence sets. This indicates that the

1044: principal components cannot be taken as the CUCs.

1045: The confidence intervals for the parameters for each CUC-GARCH(1,1)

1046: models are listed in Table~3.

1047: The lengths of the confidence intervals increase as

1048: the volatility persistence, measured by $\wh \alpha_i + \wh \beta_i$,

1049: decreases. This is  consistent with the finding from the simulation

1050: study reported  in section 3.

1051:

1052: Based on the fitted conditional variances $\wh \sigma_{ti}^2$ for the CUCs,

1053: the conditional variance matrix for the original series $\bY_t$ is equal to

1054: \[

1055: \hat{\bf H}_t={\bf W} \diag\{\wh\sigma^2_{t1}, \wh\sigma^2_{t2},

1056: \wh\sigma^2_{t3}\} {\bf W}^{\tau},

1057: \]

1058: where  ${\bf W}={\bf P}{\bf \Lambda}^{\frac{1}{2}}{\wh\bA}$.

1059: Since the volatility processes of the first and third CUCs are highly

1060:  persistent, they can be modelled with integrated GARCH (IGARCH) models. If so,  the volatility processes for the original series and their covariance processes are effectively

1061: modelled by  mixtures of IGARCH models and mean-reverting GARCH models,

1062: which is similar to the Component GARCH model used in Ding and

1063: Granger (1996) to capture the long memory properties for a univariate

1064: volatility process.
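
The expression for $\hat{\bf H}_t$ above follows by inverting the standardization. Since ${\bf P}$ and $\wh\bA$ are orthogonal, ${\bf X}_t={\bf \Lambda}^{-\frac{1}{2}}{\bf P}^{\tau} {\bf Y}_t$ and $\bZ_t = \wh\bA^\tau \bX_t$ give
\[
\bY_t = {\bf P}{\bf \Lambda}^{\frac{1}{2}} \bX_t
= {\bf P}{\bf \Lambda}^{\frac{1}{2}} \wh\bA \, \bZ_t = {\bf W} \bZ_t ,
\qquad
\var(\bY_t | \calF_{t-1}) = {\bf W} \, \var(\bZ_t | \calF_{t-1}) \, {\bf W}^{\tau}
= {\bf W} \diag\{\sigma^2_{t1}, \sigma^2_{t2}, \sigma^2_{t3}\} {\bf W}^{\tau} ,
\]
where the last equality uses the conditional uncorrelatedness of the CUCs. As a sanity check, replacing the diagonal matrix by $\bI_3$ (the unconditional variances of the CUCs) gives ${\bf W}{\bf W}^{\tau} = {\bf P}{\bf \Lambda}^{\frac{1}{2}} \wh\bA \wh\bA^{\tau} {\bf \Lambda}^{\frac{1}{2}} {\bf P}^{\tau} = {\bf P}{\bf \Lambda}{\bf P}^{\tau} = \bSigma$, the sample covariance matrix of $\bY_t$.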

1065:

1066:

1067:

1068:

1069:

1070: \begin{table}[htb]

1071: \begin{center}

1072: \caption[Table 3]{ Fitted CUC-GARCH(1,1) model for

1073: SCI }

1074: \begin{tabular}{c | c c c }\hline

1075:                &        Estimate                         &           95\% Confidence Set      &             90\% Confidence Set               \\[0.5ex]\hline

1076:   ${\bf a}_1$  & $(-0.5605,  -0.0018,  -0.8081)^{\tau}$  &                                    &                                                \\

1077:   ${\bf a}_2$  & $(0.5693,   0.7217,   -0.3939)^{\tau}$  &          $c_{0.05}= 0.1718$        &         $c_{0.10}=0.1368$                      \\

1078:   ${\bf a}_3$  & $(0.6015,   -0.6922,  -0.3989)^{\tau}$  &                                    &                                                \\  \hline

1079:   $\gamma_1$   &            0.0074                       &     (0.0042, 0.0592)               &          (0.0048, 0.0449)                      \\

1080:   $\alpha_1$   &            0.0519                       &     (0.0316, 0.0915)               &          (0.0350, 0.0812)                      \\

1081:   $\beta_1$    &            0.9406                       &     (0.8446, 0.9576)               &          (0.8740, 0.9548)                      \\   \hline

1082:   $\gamma_2$   &            0.1997                       &     (0.0460, 0.7138)               &          (0.0673, 0.5705)                      \\

1083:   $\alpha_2$   &            0.0432                       &     (0.0077, 0.1054)               &          (0.0107, 0.0926)                      \\

1084:   $\beta_2$    &            0.7572                       &     (0.2446, 0.9289)               &          (0.3600, 0.9069)                      \\   \hline

1085:   $\gamma_3$   &            0.0389                       &     (0.0200, 0.1042)               &          (0.0239, 0.0870)                      \\

1086:   $\alpha_3$   &            0.0884                       &     (0.0476, 0.1305)               &          (0.0517, 0.1236)                      \\

1087:   $\beta_3$    &            0.8727                       &     (0.7889, 0.9266)               &          (0.8051, 0.9140)                      \\ \hline

1088: \end{tabular}\\[0.5ex]

1089: \end{center}

1090: \end{table}

1091:

1092: Figure 3 depicts the fitted volatility processes for each return

1093: series and Figure 4 displays the conditional correlations among

1094: the three component series. Note that the volatility of the S$\&$P

1095: 500  index has a much smaller scale than those of the two

1096: individual stocks.

1097: Increasing trends can be observed in all the three correlation processes

1098: over the last three years,  which may be  connected  with the

1099: high volatilities in all the return series over the same period. But on

1100: the other hand, the high volatility of Cisco prices in the middle period did

1101: not lead to a high correlation with the other two. This suggests a unilateral

1102: impact from the market to the single stock.

1103:

1104: Figure 5 displays the fitted volatility processes for the three return

1105: series based on the orthogonal GARCH(1,1) model of Alexander (2001) and

1106: Ding and Engle (2001). Note that the orthogonal GARCH model effectively

1107: treats the principal

1108: components as conditionally uncorrelated variables, which may overlook important

1109: conditional dependence structure in the original data.

1110: Note that the time-varying patterns in the three processes in Figure~5 are

1111: similar to one another, unlike those in Figure~3 from the CUC-GARCH(1,1) fit.

1112: In particular, the orthogonal GARCH fitting artificially inflates

1113: the volatility of the S\&P 500 index in the middle period; see the original

1114: time plot of the series in Figure~2.

1115:  The inflation is due to treating the conditionally correlated principal

1116: components as CUCs. As we stated above, the identity matrix is indeed

1117: not included in the confidence set for~$\bA$.

1118:

1119:

1120: \askip

1121:

1122: Our second data set consists of the daily close returns of five Asian

1123: stock  indices, namely,  Hang Seng index of Hong Kong (HS), Japan Nikkei

1124: 225 index (JN),

1125: Shanghai Composite index of China (SC), Straits Times index of Singapore

1126: (ST) and Taiwan Weighted index (TW) in the period of 1 August 1997 ---

1127: 30 December 2003.  Adjustments are also made to

1128: account for the differences in the holidays of the five markets.

1129: The five return series are plotted in Figure~6, and the descriptive

1130: statistics are listed in Table~2.

1131: Most of the sample means of these returns are

1132: negative, except the mean of SC. Unlike the three series

1133: in SCI, all

1134: five series are right-skewed over this specific

1135: period. The bootstrap $p$-values for the $Q$ statistics are obtained in the

1136: same way as before, and indicate no significant

1137: autocorrelation in any of the five series.

1138:

1139: We fitted a CUC-extended GARCH(1,1) to the mean-deleted return series.

1140: The lagged values from the other CUCs were selected using BIC together

1141: with a forward search; see section~2.3.3.

1142: The fitted extended GARCH(1,1) models, based on quasi-MLE with Gaussian

1143: likelihood, for the five

1144: CUCs are reported in Table~4. According to the fitted models, the first

1145: CUC is causal in variance to the fifth CUC, the second CUC is causal in

1146: variance to the first

1147: and the third CUCs, and the fifth CUC is causal in variance to the first CUC.

1148: On the other hand, no additional variables were selected in the models

1149: for the second and fourth CUCs.

1150:

1151: Figure~7 displays the fitted volatility processes for the five

1152: original stock returns. As expected, the most volatile waves are

1153: observed in early 1998 with the onset of the Asian

1154: financial crisis, and are especially pronounced in the Hong Kong

1155: and Singapore markets. While the shock is still large, the impact of

1156: the crisis on the Japan and Taiwan markets is less drastic.

1157: Furthermore, the effect on the Shanghai market is on a much smaller

1158: scale. In Figure 8, we present the fitted conditional correlations

1159: between Hong Kong and the other four markets. The most

1160: correlated period coincides with the outbreak of the Asian

1161: financial crisis. After that, the correlations between Hong Kong

1162: and Singapore remain at an almost constant level except for two

1163: dips in the middle of 1999 and 2002, respectively. Likewise,

1164: the correlations between Hong Kong and Taiwan are almost at a

1165: constant level, although a little smaller than

1166:  those with the Singapore market.  An upward trend can be seen in the

1167: correlation between the Hong Kong and Japan markets in the last few years,

1168: which suggests that these

1169:   two markets  were  becoming more closely integrated.  In contrast,

1170: the correlations between the Hong Kong and Shanghai markets seem to

1171: trend downward towards zero in

1172:   the last few years. The implications of these observations  for

1173: international diversification deserve further investigation.

1174:

1175:

1176: \begin{table}[htb]

1177: \begin{center}

1178: \caption[Table 4]{Extended GARCH(1,1) for CUCs of Asian

1179: Market Data }

1180: \begin{tabular}{c | c | c | l |c  }\hline

1181:      $j$       &   $j_i$     &   $ r$ &        \multicolumn{1}{c|}{                                  $ \sigma_{t,j}  $       }                  &       BIC       \\[0.5ex]\hline

1182:        1       &   5, 2      &    2   &   $\sigma_{t,1}^2=0.0271+0.8609\sigma_{t-1,1}^2+0.0405Z_{t-1,1}^2+0.0637Z_{t-1,5}^2+0.0117Z_{t-1,2}^2 $&    $3622$       \\

1183:        2       &             &    0   &   $\sigma_{t,2}^2=0.0521+0.8004\sigma_{t-1,2}^2+0.1475Z_{t-1,2}^2 $                                    &    $3602$       \\

1184:        3       &    2        &    1   &   $\sigma_{t,3}^2=0.0077+0.9301\sigma_{t-1,3}^2+0.0526Z_{t-1,3}^2+0.0098Z_{t-1,2}^2$                   &    $3731$       \\

1185:        4       &             &    0   &   $\sigma_{t,4}^2=0.0704+0.8539\sigma_{t-1,4}^2+0.0757Z_{t-1,4}^2 $                                    &    $3780$       \\

1186:        5       &    1        &    1   &   $\sigma_{t,5}^2=0.0122+0.8227\sigma_{t-1,5}^2+0.1530Z_{t-1,5}^2+0.0261Z_{t-1,1}^2$                   &    $2534$       \\  \hline

1187:  \end{tabular}\\[0.5ex]

1188: \end{center}

1189: \end{table}

1190:

1191: Finally we compared the fitting based on our CUC-based GARCH(1,1) with

1192: the orthogonal GARCH(1,1) models and Engle's dynamic conditional correlation

1193: (DCC) model (\ref{a4}) and

1194: (\ref{a5}) in terms of a goodness-of-fit test based on the Ljung-Box statistic

1195: (Tse and Tsui 1999). Note that the DCC-model for each component of $\bY_t$

1196: reduces to the standard univariate GARCH(1,1) fitting.

1197: We define the standardized residual for the

1198: $i$-th series as

1199: $

1200: \hat{u}_{ti}=Y_{ti}/\hat{\sigma}_{t,ii}^{1/2},

1201: $

1202: where $\hat{\sigma}_{t,ii}$ is the $(i,i)$-th element of the fitted

1203: conditional variance of $\bY_{t}$. Define

1204: \[

1205: C_{t,ij}=\left\{ \begin{array} {l r} \hat{u}_{ti}^2-1 & i= j \\

1206: \hat{u}_{ti}\hat{u}_{tj}-\hat{\rho}_{t,ij}  & i\ne j, \end{array} \right.

1207: \]

1208: where

1209: $\hat{\rho}_{t,ij}=\hat{\sigma}_{t,ij}/(\hat{\sigma}_{t,ii}

1210: \hat{\sigma}_{t,jj})^{1/2}$

1211: is the estimated conditional correlation between $Y_{ti}$ and $Y_{tj}$.

1212: If the model

1213: is correctly specified, there is no autocorrelation in $\{ C_{t,ij}, t\ge 1\}$

1214: for any fixed $i, j$.

1215: Put

1216: \[

1217: Q(ij, M)=n \sum_{k=1}^M r_{ij,k}^2,

1218: \]

1219: where $r_{ij,k}$ is the lag $k$ sample autocorrelation of

1220: $C_{t,ij}$. It is intuitively clear that large values of $Q(ij, M)$ indicate

1221: a lack of fit for the conditional correlation between the $i$-th and

1222: $j$-th components of $\bY_t$ for $i\ne j$, and a lack of fit for the

1223: conditional variance of the $i$-th component for $i=j$.  Although the

1224: distribution theory of $Q(ij,M)$ is

1225: unknown, empirical evidence suggests that $\chi^2_M$ provides a

1226: reasonable reference in practice; see Tse and Tsui (1999).
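
For concreteness, one common convention for the sample autocorrelation is sketched here. The text does not spell out the normalization, so the form below, and the symbol $\bar{C}_{ij}$ for the sample mean of $C_{t,ij}$, are assumptions rather than the definition used in the computations:
\[
r_{ij,k} = \frac{\sum_{t=k+1}^n (C_{t,ij} - \bar{C}_{ij})(C_{t-k,ij} - \bar{C}_{ij})}
{\sum_{t=1}^n (C_{t,ij} - \bar{C}_{ij})^2},
\qquad
\bar{C}_{ij} = n^{-1} \sum_{t=1}^n C_{t,ij}.
\]
Under this convention, $Q(ij,M)$ differs from the finite-sample weighted form
$n(n+2)\sum_{k=1}^M r_{ij,k}^2/(n-k)$ only through the factors $(n+2)/(n-k)$,
which are close to 1 when $n$ is large relative to $M$.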

1227:

Table~5 lists the values of the $Q$-statistics with $M=10$. The
significance levels were gauged according to the
$\chi^2_{10}$-distribution. The advantage of using the CUC-GARCH
model over the orthogonal GARCH model is clear: the $Q$-values
for the former tend to be smaller, in some cases much smaller, than
those for the latter. Furthermore, all the $Q$-values for the
fitted CUC-GARCH models are insignificant at the 10\% level,
while the test rejects some orthogonal GARCH fittings at the
1\% significance level. For example, the $p$-values for testing
the correlations between S\&P~500 and Cisco stock, and between S\&P~500
and Intel stock, are less than 1\%, indicating significant
autocorrelation. This may explain the puzzling jumps in
the fitted volatility for S\&P 500 by the orthogonal GARCH model in
Figure~5. The same phenomenon may also be observed in the fitting
for the second data set. The orthogonal GARCH model failed to
provide adequate fits for the Hang Seng index (HS), the Singapore
Straits Times index (ST) and the Taiwan Weighted index (TW), as
indicated by the large $Q$-values; see Table~5.
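
As a reading aid for Table~5, the reference values implied by the
$\chi^2_{10}$-distribution are recorded here. These are the standard upper
quantiles rounded to two decimals; the subscript notation for the upper tail
probability is introduced only for this remark:
\[
\chi^2_{10,\,0.10} \approx 15.99, \qquad
\chi^2_{10,\,0.05} \approx 18.31, \qquad
\chi^2_{10,\,0.01} \approx 23.21.
\]
For instance, the O-GARCH value $18.6100$ in row $i=4$ of the Asian market data
exceeds $18.31$ but not $23.21$, which is why it carries two stars, whereas the
largest CUC-based entry in the table, $13.5150$, lies below $15.99$.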

1246:

1247:

Overall the DCC model provides a performance competitive with that of
the CUC model for the Asian market data. This may be due
to a certain degree of homogeneity
among the five Asian market indices.
For the SCI data, consisting of one market index and two stock prices,
the gain of using CUC over DCC is more pronounced. First, the DCC model
seems to fail to capture the dynamic correlation
between the returns of the S\&P 500 index and the Cisco stock price.
Furthermore, although the $Q$-value for the CUC model for S\&P 500 is
marginally larger than that for the DCC model,
the $Q$-values for the CUC models for both Intel and Cisco prices
are substantially smaller than those for the DCC models, suggesting
an improvement in modelling the volatility dynamics of the Intel or
the Cisco price by incorporating the information from other series.

1262:

1263:

1264:

1265:

1266:

The $Q$-tests with other values of $M$ lead to patterns similar to
those in Table~5 and are therefore omitted to save space.

1269:

1270:

1271: \begin{landscape}

\begin{table}[tbh]

1273: \begin{center}

1274: \caption[Table 5]{Specification test ---  $Q(10)$ for cross products of standardized residuals }

1275: \begin{tabular}{@{\hspace{0.6cm}}c @{\hspace{0.6cm}} | @{\hspace{0.5cm}} c @{\hspace{0.8cm}} c @{\hspace{0.8cm}} c @{\hspace{0.5cm}} |  c @{\hspace{1cm}} c @{\hspace{1cm}}c @{\hspace{1cm}}c  @{\hspace{0.6cm}} }\hline

1276:                &             \multicolumn{3}{c|} { SCI   data}      &                     \multicolumn{4}{c} {Asian Market Data}                   \\ \hline

1277:    $i,j$       &    O-GARCH       &   DCC          &   CUC-GARCH    &   \hspace{0.5cm} O-GARCH      &    DCC         &    CUC-GARCH    &   CUC-Ex GARCH      \\[0.5ex]\hline

1278:      1         &  $59.9140^{***}$ &   5.9498       &  6.2050        & \hspace{0.5cm}$56.7580^{***}$ &    6.0285      &    11.4480      &   8.6961                  \\

1279:      2         &   10.5100        &   9.0587       &  8.0542        & \hspace{0.5cm}12.3540         &    7.8517      &    8.6713       &   8.7751                  \\

1280:      3         &   2.6192         &   6.4293       &  2.2397        & \hspace{0.5cm} 8.5368         &    9.2749      &    8.5301       &   8.5265                  \\

1281:      4         &                  &                &                & \hspace{0.5cm}$18.6100^{**}$  &    2.6610      &    4.0512       &  3.7954                   \\

1282:      5         &                  &                &                & \hspace{0.5cm}$18.0610^{*}$   &    7.4710      &    11.7960      &  13.5150                  \\

1283:      1,2       &  $51.8060^{***}$ &  10.4887       &  10.9090       & \hspace{0.5cm} 7.1025         &    7.0622      &    4.6433       &   4.2671                  \\

1284:      1,3       &  $77.5140^{***}$ & $20.6745^{**}$ &  10.5170       & \hspace{0.5cm} 3.8940         &    4.6465      &    3.4987       &   3.5564                  \\

1285:      1,4       &                  &                &                & \hspace{0.5cm}$17.2180^{*}$   &    4.7943      &    6.2915       &   5.8084                  \\

1286:      1,5       &                  &                &                & \hspace{0.5cm} 9.2396         &    6.1648      &    5.6669       &   6.3143                  \\

1287:      2,3       &   5.9453         &   7.0617       &  9.6275        & \hspace{0.5cm} 9.6031         &    10.1762     &    9.6444       &   9.5912                  \\

1288:      2,4       &                  &                &                & \hspace{0.5cm} 6.3708         &    7.7241      &    3.4542       &   3.2648                  \\

1289:      2,5       &                  &                &                & \hspace{0.5cm} 6.8629         &    5.8438      &    6.1856       &   6.9089                  \\

1290:      3,4       &                  &                &                & \hspace{0.5cm} 11.9120        &    8.0303      &    7.3119       &   5.8486                  \\

1291:      3,5       &                  &                &                & \hspace{0.5cm} 2.2256         &    2.1565      &    1.5721       &   1.6857                  \\

1292:      4,5       &                  &                &                & \hspace{0.5cm} 5.4389         &    4.7838      &    3.0312       &   3.1083                  \\  \hline

1293:  \end{tabular}\\[0.5ex]

1294: \end{center}

1295: \begin{singlespace}

\emph{Note:} {\sl 1)  ***, **, * indicate that the corresponding
test is significant at the level 0.01, 0.05 and 0.1, respectively.
2)  $i, j$ in the left column correspond to the order of the component
series in each data set. For example, ``1,2'' stands for the cross
product of the standardized residuals of S\&P 500 and Cisco for the SCI data,
and of HS and JN for the Asian market data set.}

1302: \end{singlespace}

1303: \end{table}

1304: \end{landscape}

1305:

1306:

1307:

1308:

1309:  \setcounter{equation}{0}

1310:  \renewcommand{\theequation}{5.\arabic{equation}}

1311:

1312:

1313: \section*{Appendix A --- Proof of Theorem 1}

1314:

1315: We introduce some notation first.

1316: Let $$\bC_{n, k}(B) = (n-k)^{-1} \sum_{t=k+1}^n \bX_t \bX_t^\tau I

1317: (\bX_{t-k} \in B), \quad \quad \bC_k (B) = E\{ \bX_t \bX_t^\tau I (\bX_{t-k} \in

1318: B)\}.$$

1319: The lemma below shows that both $\Psi(\cdot)$ and $\Psi_n(\cdot)$ are

1320: Lipschitz continuous on $\calH_D$ with $D$-distance, where $\calH_D$ is

1321: the quotient space; see Remark 2.

1322:

1323: \askip

1324:

1325: \noindent

{\bf Lemma 1}. For any $\bU, \bV \in \calH_D$, it holds that
 $$|\Psi(\bU) - \Psi(\bV) | \le c \; \tr E (\bX_t \bX_t^\tau) \, \{ D(\bU, \bV) \}^{1/2},$$ and
$$
|\Psi_n(\bU) - \Psi_n(\bV) | \le c \; \tr \Big(n^{-1} \sum_{t=1}^n
\bX_t \bX_t^\tau \Big) \, \{ D(\bU, \bV) \}^{1/2}$$ almost surely, where
$c>0$ is a constant and $\tr(\bA)$ is the trace of a matrix $\bA$.

1332:

1333: \askip

1334:

1335: \noindent {\bf Proof}. We only prove the lemma for $\Psi(\cdot)$.

1336: The result for $\Psi_n(\cdot)$ may be shown in the same manner.

1337: Let $\bU=(\bu_1, \cdots, \bu_d)^\tau$, $\bV=(\bv_1, \cdots,

1338: \bv_d)^\tau$, $u_{ijk}(B) = E\{ \bu_i^\tau \bC_k(B) \bu_j\}$ and

1339: $v_{ijk}(B) = E\{ \bv_i^\tau \bC_k(B) \bv_j\}$. We assume that the

1340: orders and the directions of $\bu_i$ and $\bv_j$ are arranged such

1341: that $\bu_i ^\tau \bv_i\in [0,1]$ for all $i$, and

1342: \begin{equation} \label{p1}

1343: D(\bU, \bV) = 1 - {1\over d} \sum_{i=1}^d \bu_i ^\tau \bv_i

1344: = {1\over d} \sum_{i=1}^d(1 - \bu_i ^\tau \bv_i).

1345: \end{equation}

1346: See (\ref{b7}).

1347: Put the spectral decomposition for $\bC_k(B)$ as

1348: $$\bC_k(B) = \sum_{\ell=1}^d \mu_{\ell}(B,k) \bgamma_\ell

1349: \bgamma_\ell^\tau,$$ where $\mu_1(B,k) \ge \cdots \ge

1350: \mu_d(B,k)\ge 0$ are the eigenvalues of $\bC_k(B)$, and

1351: $\bgamma_1, \cdots, \bgamma_d$ are their corresponding (orthonormal)

1352: eigenvectors. It is easy to see that $\mu_\ell(B,k)\le \mu_\ell$

1353: for all $k$ and $B$, where $\mu_1 \geq \cdots \geq \mu_d$ are the

eigenvalues of the matrix $E \{ \bX_t \bX_t^\tau\}$.

1355: Consequently, by noticing that

1356: $|\bgamma_\ell^\tau \bu_j | \leq 1$ and $|\bv_i^\tau \bgamma_\ell |

1357: \leq 1$, we have

1358: \begin{eqnarray} \nonumber

1359: && | u_{ijk}(B) - v_{ijk}(B) | \; \le \; \sum_{\ell=1}^d

1360: \mu_\ell | \bu_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j -

1361: \bv_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bv_j|\\ \nonumber

1362: &\le & \sum_{\ell=1}^d

1363: \mu_\ell \{ | \bu_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j -

1364:  \bv_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j|

1365: + | \bv_i^\tau \bgamma_\ell \bgamma^\tau_\ell \bu_j -\bv_i^\tau

1366:    \bgamma_\ell \bgamma^\tau_\ell \bv_j|\}\\ \nonumber

1367: &\le &

1368: \sum_{\ell=1}^d \mu_\ell \{ | ( \bu_i-\bv_i)^\tau \bgamma_\ell| +

1369: |\bgamma^\tau_\ell (\bu_j-\bv_j)|\}

1370: \end{eqnarray}

By the Cauchy-Schwarz inequality, the last expression is
further bounded by

1373: \begin{eqnarray}

1374: & & \sum_{\ell=1}^d \mu_\ell  \{ ||\bu_i-\bv_i|| +

1375: ||\bu_j-\bv_j||\} \nonumber \\ \label{p2} &=& \sqrt{2} \{( 1 -

1376: \bu_i^\tau \bv_i)^{1/2} + (1 - \bu_j^\tau \bv_j)^{1/2}\}

1377: \sum_{\ell=1}^d \mu_\ell.

1378: \end{eqnarray}
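The equality in (\ref{p2}) uses only that $\bu_i$ and $\bv_i$ are unit
vectors, which is our reading of the orthogonality of $\bU$ and $\bV$.
Spelled out,
\[
||\bu_i - \bv_i||^2 = \bu_i^\tau \bu_i - 2 \bu_i^\tau \bv_i + \bv_i^\tau \bv_i
= 2 (1 - \bu_i^\tau \bv_i),
\]
so that $||\bu_i - \bv_i|| = \sqrt{2}\, (1 - \bu_i^\tau \bv_i)^{1/2}$, and
similarly with $i$ replaced by $j$.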

1379:

1380: Note that for $x\ne 0$, it holds that

1381: \begin{equation} \label{p3}

1382: |x+y| - |x| = y\, \sgn(x) + 2(x+y)\{ I(-y<x<0) - I(0<x<-y) \}.

1383: \end{equation}
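Identity (\ref{p3}) can be checked case by case; the following is a brief
sketch. If $x+y$ is zero or has the same sign as $x$, both indicators vanish
and $|x+y| - |x| = y\, \sgn(x)$. If $-y<x<0$, then $x<0<x+y$ and the two sides
agree:
\[
|x+y| - |x| = (x+y) + x = 2x + y, \qquad
y\, \sgn(x) + 2(x+y) = -y + 2x + 2y = 2x + y.
\]
The case $0<x<-y$ is symmetric. As a numerical instance, $x=-1$ and $y=3$ give
$|2| - |-1| = 1$ on the left and $-3 + 2\cdot 2 = 1$ on the right.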

1384: Hence,

1385: \begin{eqnarray}

1386: &&

1387: \Psi(\bU) \; = \;

1388: \sum_{1\le i < j \le d}

1389: \sup_{1\le k \le k_0, \, B\in \calB} \big[

1390: |v_{ijk}(B)|+|v_{ijk}(B) + \{u_{ijk}(B)-v_{ijk}(B)\}|

1391: - | v_{ijk}(B)| \big] \nonumber \\

1392: &=& \sum_{1\le i < j \le d} \sup_{1\le k \le k_0, \, B\in \calB}

1393: \big[ | v_{ijk}(B)| + \{ u_{ijk}(B)-v_{ijk}(B)\}\sgn\{

1394: v_{ijk}(B)\}  \nonumber \\

1395: & & + \; 2 u_{ijk}(B) \{I (B_1) - I( B_2) \}\big], \label{p4}

1396: \end{eqnarray}

1397: where

1398: \[

1399: B_{1} = \{ v_{ijk}(B)-u_{ijk}(B) < v_{ijk}(B) <0\}, \quad B_{2} =

1400: \{ 0< v_{ijk}(B)<v_{ijk}(B)-u_{ijk}(B)\} .

1401: \]

1402: On the set $B_1 \cup B_2$,

1403: \[

1404: |u_{ijk}(B)| \le |u_{ijk}(B)-v_{ijk}(B)|+|v_{ijk}(B)|\le 2

1405: |u_{ijk}(B)-v_{ijk}(B)|.

1406: \]

1407: This, combining with (\ref{p2}) and (\ref{p4}), implies that

1408: \begin{eqnarray}\nonumber

1409: & & |\Psi(\bU) - \Psi(\bV) | \\

1410: &  \le & \sum_{1\le i < j \le d} \sup_{1\le k \le k_0, \, B\in

1411: \calB} \big[\sqrt{2} \{ ( 1 - \bu_i^\tau \bv_i)^{1/2}  + (1 -

1412: \bu_j^\tau \bv_j)^{1/2} \} \sum_{\ell=1}^d \mu_\ell

     +  2|u_{ijk}(B)| I (B_1 \cup B_2) \big] \nonumber \\

1414: & \le & \; 5 \sqrt{2} \sum_{1\le i < j \le d} \{( 1 - \bu_i^\tau

1415: \bv_i)^{1/2} + (1 - \bu_j^\tau \bv_j)^{1/2}\} \sum_{\ell=1}^d

1416: \mu_\ell \nonumber \\

1417: & \le &  10 \sqrt{2} d \sum_{\ell=1}^d \mu_\ell \sum_{i=1}^d (1-

1418: \bu_i^\tau \bv_i)^{1/2}. \label{p5}

1419: \end{eqnarray}

1420:

1421: Now the lemma follows from (\ref{p5}) and the inequality

1422: \[

1423: \sum_{i=1}^d (1- \bu_i^\tau \bv_i) ^{1/2} \le d^{1/2} \Big\{

1424: \sum_{i=1}^d (1- \bu_i^\tau \bv_i) \Big\}^{1/2},

1425: \]

1426: see also (\ref{p1}). This completes the proof.
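
Tracing the constants gives one admissible, though not necessarily sharp,
choice of $c$. This is a bookkeeping sketch which uses
$\sum_{\ell=1}^d \mu_\ell = \tr E(\bX_t \bX_t^\tau)$ and
$\sum_{i=1}^d (1 - \bu_i^\tau \bv_i) = d\, D(\bU, \bV)$ from (\ref{p1}):
\[
|\Psi(\bU) - \Psi(\bV)| \le 10 \sqrt{2}\, d \sum_{\ell=1}^d \mu_\ell \;
d^{1/2} \{ d\, D(\bU, \bV) \}^{1/2}
= 10 \sqrt{2}\, d^2 \, \tr E(\bX_t \bX_t^\tau) \, \{ D(\bU, \bV) \}^{1/2},
\]
so that $c = 10\sqrt{2}\, d^2$ suffices. The inequality displayed before
(\ref{p1}) is cited is itself the Cauchy-Schwarz inequality applied to the
vectors $(1, \cdots, 1)^\tau$ and
$\{(1 - \bu_1^\tau \bv_1)^{1/2}, \cdots, (1 - \bu_d^\tau \bv_d)^{1/2}\}^\tau$.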

1427:

1428:

1429:

1430:

1431:

1432: \askip

1433:

1434: \noindent

1435: {\bf Proof of Theorem 1}.

1436: Since $\bC_{n, k} (B) - \bC_k(B)$ is a real symmetric matrix, it holds

1437: for any unit vectors $\ba$ and $\bb$ that

1438: \[

1439: |\ba^\tau \{ \bC_{n, k} (B) - \bC_k(B)\} \bb| \le || \bC_{n, k} (B) - \bC_k(B)||,

1440: \]

1441: where $|| \bC_{n, k} (B) - \bC_k(B)||$ denotes the sum of the absolute values of

1442: the eigenvalues of $\bC_{n, k} (B) - \bC_k(B)$. This may be obtained by

1443: using the spectral decomposition of $\bC_{n, k} (B) - \bC_k(B)$.
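In more detail, write (in notation introduced only for this remark)
$\bC_{n, k} (B) - \bC_k(B) = \sum_{\ell=1}^d \lambda_\ell \bvxi_\ell \bvxi_\ell^\tau$
with real eigenvalues $\lambda_\ell$ and orthonormal eigenvectors $\bvxi_\ell$.
Since $|\ba^\tau \bvxi_\ell| \le 1$ and $|\bvxi_\ell^\tau \bb| \le 1$ for unit
vectors,
\[
|\ba^\tau \{ \bC_{n, k} (B) - \bC_k(B)\} \bb|
= \Big| \sum_{\ell=1}^d \lambda_\ell \, (\ba^\tau \bvxi_\ell)(\bvxi_\ell^\tau \bb) \Big|
\le \sum_{\ell=1}^d |\lambda_\ell| .
\]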

1444: Consequently it holds uniformly for any orthogonal matrix $\bA$ that

1445: \begin{eqnarray} \nonumber

1446: |\Psi_n(\bA) - \Psi(\bA)| & \leq & \sum_{1 \leq i < j \leq d}

1447:      \sup_{1 \leq k \leq k_0, B \in \calB} \left | \ba_i ^\tau \{ \bC_{n,

1448:      k} (B) - \bC_k(B) \} \ba_j  \right | \\

1449:      & \leq & {d(d-1)\over 2}

1450:      \sup_{1 \leq k \leq k_0, B \in \calB}  \| \bC_{n, k}

1451:      (B) - \bC_k(B) \| .

1452: \label{p6}

1453: \end{eqnarray}

1454: Note the $(i,j)$-th element of $\bC_{n, k} (B) - \bC_k(B)$

1455: is $${1 \over n-k}  \sum_{t=k+1}^n X_{ti}X_{tj}

1456:      I(\bX_{t-k} \in B) - E\{ X_{ti}X_{tj} I(\bX_{t-k} \in B)\},$$

1457: where $X_{ti}$ denotes the $i$-th element of $\bX_t$.

1458: Since $E | X_{ti}X_{tj}| < \infty$ and $\calB$ is a VC-class, the

1459: covering number for the set of functions $\{X_{ti}X_{tj}

1460: I(\bX_{t-k} \in B), B \in \calB\}$ has a polynomial rate of growth

1461: for any underlying probability measure (Theorem 2.6.4, van der

1462: Vaart and Wellner 1996).   Hence, it is a Glivenko-Cantelli class.

1463: It follows now from Theorem 3.4 of Yu (1994) that

1464: \[

1465:     \sup_{B \in \calB} \Big | {1 \over n-k} \sum_{t=k+1}^n

1466:     X_{ti}X_{tj} I(\bX_{t-k} \in B) - E\{ X_{ti}X_{tj} I(\bX_{t-k} \in B)\} \Big |

\scon 0.

1468: \]

1469: Consequently, $$\sup_{B \in \calB} |\lambda_{\max}(B, k)| \scon 0,

1470: \quad

1471: \quad

1472: \sup_{B \in \calB}| \lambda_{\min}(B,k)| \scon 0,$$

1473: where $\lambda_{\max}(B, k)$ and $ \lambda_{\min}(B,k)$ denote, respectively,

1474: the maximum and the minimum eigenvalues of $\bC_{n, k} (B) - \bC_k(B)$.

1475: Thus

1476: $$

1477:     \sup_{B \in \calB}  \| \bC_{n, k} (B) - \bC_k(B) \| \scon 0,

1478: $$

1479: for $k=1, \cdots, k_0$. Now it follows from (\ref{p6}) that

1480: $$

1481:   \sup_{\bA \in \calH_D} | \Psi_n(\bA) - \Psi(\bA) | \scon 0.

1482: $$

1483: Combining this with Lemma~1 above and

1484: the continuity of the argmax mapping (Theorem 3.2.2 and Corollary

1485: 3.2.3, van der Vaart and Wellner, 1996),  it

1486: holds that $D(\hat{\bA}, \bA_0)

1487: \scon 0$.  This completes the proof of the first part of Theorem 1.
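
The passage from elementwise convergence to convergence of the norm used above
can be sketched as follows, writing $m_{ij}$ (notation assumed here) for the
$(i,j)$-th element of $\bC_{n, k} (B) - \bC_k(B)$. Every eigenvalue $\lambda$
of a real symmetric matrix satisfies
$|\lambda| \le (\sum_{i,j} m_{ij}^2)^{1/2} \le \sum_{i,j} |m_{ij}|$, and
therefore
\[
\| \bC_{n, k} (B) - \bC_k(B) \|
\le d \, \max\{ |\lambda_{\max}(B, k)|, \, |\lambda_{\min}(B,k)| \}
\le d \sum_{i,j=1}^d |m_{ij}| .
\]
Taking the supremum over $B \in \calB$ and using the uniform convergence for
each of the $d^2$ pairs $(i,j)$ gives the stated convergence of the norm.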

1488:

1489: \askip

1490:

1491: Under the additional condition $ E | X_{ti}X_{tj}|^{2p} < \infty$

1492: and the mixing condition given in Condition (A4),  Theorem 1 of

1493: Arcones and Yu (1994) implies that the set of functions

1494: $\{X_{ti}X_{tj} I(\bX_{t-k} \in B), B \in \calB\}$ is a Donsker

1495: class, and hence the process $ \{\bDelta_{n, k}(B), B \in \calB

1496: \}$ indexed by $B \in \calB$  converges weakly to a Gaussian

1497: process, where $\bDelta_{n, k} (B) = \sqrt{n} \{ \bC_{n,k}(B) -

1498: \bC_k(B) \}$. It follows from (\ref{p3}) that

\begin{eqnarray} \nonumber
\Psi_n(\bA) &=& \sum_{1 \leq i < j \leq d} \sup_{B \in \calB, 1 \leq
k \leq k_0 } \big [ |\ba_i^\tau \bC_k(B) \ba_j| + n^{-1/2}
\sgn\{\ba_i^\tau \bC_k(B) \ba_j\} \ba_i ^\tau \bDelta_{n, k}(B) \ba_j \\
& & + 2 \ba_i^\tau \bC_{n,k}(B) \ba_j \{ I(B_3) - I(B_4)\}  \big ]  \nonumber \\
& = &  \Psi (\bA) + O_P(n^{-1/2}), \label{p7}
\end{eqnarray}

1506: where

\[
B_3 = \{-n^{-1/2} \ba_i ^\tau \bDelta_{n, k}(B) \ba_j  < \ba_i^\tau
\bC_k(B) \ba_j<0\}, \quad
B_4=  \{0< \ba_i^\tau \bC_k(B) \ba_j < -n^{-1/2}
\ba_i ^\tau \bDelta_{n, k}(B) \ba_j\}.
\]

1513: The last equality in (\ref{p7}) follows from the fact that on

1514: $B_3 \cup B_4$,

1515: \[

1516: |\ba_i^\tau \bC_{n,k}(B) \ba_j| \le |\ba_i^\tau \bC_k(B) \ba_j|

1517: + n^{-1/2}|\ba_i ^\tau \bDelta_{n, k}(B) \ba_j|

1518: \le 2 n^{-1/2}|\ba_i ^\tau \bDelta_{n, k}(B) \ba_j|.

1519: \]

1520: It follows from (\ref{p7}) and condition (A5) that

1521: \begin{eqnarray}

1522: \Psi_n (\bA_0) - \Psi_n(\bA)   =  \Psi(\bA_0) -

1523:     \Psi(\bA) + O_P(n^{-1/2})

1524:  \leq  -a D(\bA_0, \bA) + O_P(n^{-1/2}) \label{p8}.

1525: \end{eqnarray}

Now replace $\bA$ by $\hat{\bA}$. The left hand side of
(\ref{p8}) must be non-negative by the definition of~$\hat{\bA}$, while
the right hand side of (\ref{p8}) would be negative unless

1529: $$

1530: D(\bA_0, \hat{\bA}) = O_P(n^{-1/2}).

1531: $$

1532: This completes the proof.
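
The last step can be written out as follows, under our reading that the
constant $a$ supplied by Condition (A5) is positive. With $\bA = \hat{\bA}$ in
(\ref{p8}),
\[
0 \le \Psi_n (\bA_0) - \Psi_n(\hat{\bA})
\le -a \, D(\bA_0, \hat{\bA}) + O_P(n^{-1/2}),
\]
so that $D(\bA_0, \hat{\bA}) \le a^{-1} O_P(n^{-1/2}) = O_P(n^{-1/2})$.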

1533:

1534:

1535: \section*{Appendix B --- Proof of Theorem 2}

1536:

1537: From the proof of Theorem 1, we have

1538: \begin{equation}

1539:       \sup_{\bA \in \calH} | \Psi_n (\bA) - \Psi(\bA) | \scon 0.

1540:       \label{p9}

1541: \end{equation}

1542: Since $\Psi(\bA)$ is continuous on the compact quotient space

1543: $\calH$, there exists a minimizer $\bA_0$.  It follows that

1544: \begin{eqnarray*}

1545:  \Psi(\hat{\bA}) - \Psi(\hat{\bB}) & = & \Psi(\bA_0) -

1546:  \Psi(\hat{\bB})  + \Psi(\hat{\bA})- \Psi(\bA_0) \\

1547:  & \leq & \Psi(\hat{\bA})- \Psi(\bA_0) \\

1548:  & = & \{\Psi(\hat{\bA}) - \Psi_n( \hat{\bA})\}

1549:         + \{\Psi_n( \hat{\bA}) - \Psi_n( \bA_0 )\}

1550:         + \{ \Psi_n( \bA_0 ) - \Psi( \bA_0)\}.

1551: \end{eqnarray*}

Using the fact that $\Psi_n(\hat{\bA}) - \Psi_n (\bA_0) \leq 0$, we
conclude from (\ref{p9}) that

1554: $$

1555: \liminf \{ \Psi(\hat{\bA}) - \Psi(\hat{\bB})\} \leq 0.

1556: $$

1557: This completes the proof of Theorem 2.
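
The role of (\ref{p9}) in this conclusion can be made explicit. As a sketch:
dropping the non-positive middle term in the three-term decomposition above and
bounding each of the other two terms by the uniform deviation gives
\[
\Psi(\hat{\bA}) - \Psi(\hat{\bB})
\le 2 \sup_{\bA \in \calH} | \Psi_n (\bA) - \Psi(\bA) | \scon 0 .
\]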

1558:

1559:

1560:

1561:

1562: \section*{Appendix C --- Proof of Theorem 3}

1563:

1564:

1565: For each $j$, there are at most $r$ non-zero $\alpha_{jk}$. Since

1566: $\beta_j < 1$, it holds that

1567: \[

1568: \sigma_{tj}^2 = {\ga_j \over 1- \beta_j} +  \sum_{i=1}^d

1569: \alpha_{ji}\sum_{k=1}^\infty \beta_j^{k-1} Z_{t-k,i}^2.

1570: \]

Now Theorem~3 follows from Lemma~2 below immediately by letting
$Y_{tj} = X_{tj}^2$ and $\rho_{tj} = \sigma_{tj}^2$. Note that
Lemma~2 may be proved in a manner similar to the proof of
Theorem~1 of Giraitis et al.~(2000); see also Section~2.7.1 of Fan
and Yao~(2003).
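
The series representation of $\sigma_{tj}^2$ above can be obtained by iterating
the volatility recursion. As a sketch, assume that the extended GARCH(1,1)
recursion has the form
$\sigma_{tj}^2 = \ga_j + \sum_{i=1}^d \alpha_{ji} Z_{t-1,i}^2 + \beta_j \sigma_{t-1,j}^2$;
this is our reading of the model, and it is consistent with the displayed
expansion. Then $m$ steps of substitution give
\[
\sigma_{tj}^2 = \ga_j \sum_{k=1}^m \beta_j^{k-1}
+ \sum_{i=1}^d \alpha_{ji} \sum_{k=1}^m \beta_j^{k-1} Z_{t-k,i}^2
+ \beta_j^m \sigma_{t-m,j}^2 ,
\]
and letting $m \to \infty$ with $0 \le \beta_j < 1$ yields the stated series,
provided the remainder term vanishes. In the notation of Lemma~2 below, with
$Y_{t-k,i}$ identified with the squared series entering the expansion, the
coefficients correspond to
\[
c_j = {\ga_j \over 1- \beta_j}, \qquad
b_{jik} = \alpha_{ji} \beta_j^{k-1}, \qquad
b_{ji\,\cdot} = {\alpha_{ji} \over 1 - \beta_j}.
\]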

1576:

1577: \bigskip

1578:

1579: \noindent {\bf Lemma 2}. Consider a $d$-dimensional ARCH($\infty$)

1580: process $\bY_t = (Y_{t1}, \cdots, Y_{td})^\tau$ defined by

1581: \[

1582: Y_{tj} = \rho_{tj} \zeta_{tj}, \quad \quad \rho_{tj} = c_j +

1583: \sum_{i=1}^d \sum_{k=1}^\infty b_{jik} Y_{t-k, i}

1584: \]

1585: for $j=1, \cdots, d$, where $\{ \zeta_{tj} \}$ is a sequence of

1586: non-negative i.i.d. random variables with $E(\zeta_{tj}) =1$,

1587: $Y_{tj}\ge 0$, $c_j, b_{jik} \ge 0$. Furthermore, for each $j$,

1588: $b_{jik} \ne 0$ for at most $r (\ge 0)$ values of $k$. Then the

1589: above model admits a unique strictly stationary solution $\{ \bY_t

1590: \}$ with the finite mean

1591: \[

1592: E(\bY_t) = (\bI_d - \bB)^{-1} (c_1, \cdots, c_d)^\tau

1593: \]

1594: under the condition $

1595:   \max_{ 1\le j,\, i \le d}  b_{ji\,\cdot} < 1/r,

1596: $ where $b_{ji\,\cdot} = \sum_{k\ge 1} b_{jik}$, and $\bB$ is a

1597: $d\times d$ matrix with $b_{ji\,\cdot}$ as its $(j,i)$-th element.
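
The formula for the mean can be motivated by a short calculation, sketched here
under the assumptions that a stationary solution with finite mean exists and
that $\zeta_{tj}$ is independent of $\rho_{tj}$. Writing $m_j = E(Y_{tj})$
(notation introduced only for this remark) and taking expectations in the
defining equations gives
\[
m_j = E(\rho_{tj}) \, E(\zeta_{tj}) = c_j + \sum_{i=1}^d b_{ji\,\cdot} \, m_i,
\qquad j = 1, \cdots, d,
\]
that is, $(\bI_d - \bB) E(\bY_t) = (c_1, \cdots, c_d)^\tau$, which yields the
stated expression whenever $\bI_d - \bB$ is invertible.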

1598:

1599:

1600:

1601:

1602:

1603: \section*{References}

1604: \begin{description}

1605: \begin{singlespace}

1606: \item Alexander, C. (2001). Orthogonal GARCH. In {\sl Mastering

1607: Risk}. Financial Times-Prentice Hall: London; {\bf 2}, 21-38.

1608:

1609:

\item Arcones, M.A. and Yu, B. (1994).  Central limit theorems for
       empirical processes and U-processes of stationary mixing sequences.
        {\sl Jour. Theor. Probab.}, {\bf 7}, 47--71.

1613:

1614:

1615: \item Back, A. and Weigend, A.S. (1997). A first application on

1616: independent component analysis to extracting structure from stock

1617: returns. {\sl International Journal of Neural Systems}, {\bf 8},473-484.

1618: \item

1619: Bauwens, L., Laurent, S. and Rombouts, J.V.K. (2003). Multivariate GARCH models:

1620:  a survey. {\sl A preprint}.

1621: \item

1622: Bollerslev, T. (1990).  Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. {\sl Review of Economics and Statistics},

1623:  {\bf 72}, 498-505.

1624: \item

1625: Bollerslev, T.R., Engle, R. and Wooldridge, J. (1998). A capital asset pricing

1626: model with time varying covariances. {\sl Journal of Political Economy}, {\bf 96},

1627: 116-131.

1628: \item Chen, M. and An, H. (1998). A note on the stationarity and the existence of moments

1629:       of the GARCH models. {\sl Statistica Sinica}, {\bf 8}, 505-510.

1630: \item Chow, Y.S. and Teicher, H. (1997). {\sl Probability  Theory} (3rd

1631: edition). Springer, New York.

1632: \item Ding, Z. and Granger, C.W.J. (1996). Modeling volatility persistence of speculative returns:

1633:       A new approach. {\sl Journal of Econometrics}, {\bf 73}, 185-215.

1634: \item Ding, Z. and Engle, R. (2001). Large scale conditional covariance matrix modeling, estimation and testing.

1635:       {\sl  Working Paper}, {\bf FIN-01-029}, NYU Stern School of Business.

\item Engle, R. (2002). Dynamic conditional correlation -- a simple class of multivariate

1637:       GARCH models. {\sl Journal of Business and Economic Statistics}, {\bf 20}, 339-350.

1638: \item

Engle, R.F., Ito, T. and Lin, W.-L. (1990). Meteor showers or

1640:                 heat waves? heteroskedastic intra-daily volatility in

1641:                 the foreign exchange market.

1642:                 {\sl Econometrica}, {\bf 58}, 525-542.

1643: \item Engle, R.F. and Kroner, K.F. (1995). Multivariate simultaneous generalised ARCH.

1644:       {\sl Econometric Theory }, {\bf 11}, 122-150.

1645: \item Engle, R.F., Ng, V.K. and Rothschild, M. (1990). Asset pricing with a factor ARCH covariance structure:

1646:       Empirical estimates for treasury bills. {\sl Journal of

1647: Econometrics}, {\bf 45}, 213-238.

1648: \item Engle, R.F. and Sheppard, K. (2001). Theoretical and empirical properties

1649: of dynamic conditional correlation multivariate GARCH. {\sl A preprint}.

1650: \item Fan, J. and Yao, Q. (2003). {\sl Nonlinear Time Series: Nonparametric and Parametric Methods}.

1651:       Springer, New York.

1652: \item Giraitis, L., Kokoszka, P., and Leipus, R. (2000). Stationary ARCH models: Dependence structure and

1653:       central limit theorem. {\sl Econometric Theory}, {\bf 16}, 3--22.

1654: \item Hall, P. and Yao, Q. (2003). Inference for ARCH and GARCH models. {\sl Econometrica},

1655:       {\bf 71}, 285-317.

1656: \item Harvey, A., Ruiz, E. and Shephard, N. (1994). Multivariate stochastic

1657: variance models. {\sl The Review of Economic Studies}, {\bf 61}, 247-264.

1658: \item Hyv\"arinen, A., Karhunen, J. and Oja, E. (2001). {\sl Independent Component

1659:       Analysis}. Wiley, New York.

1660: \item Jerez, M., Casals, J. and Sotoca, S. (2001). The likelihood of multivariate GARCH models is ill-conditioned. {\sl A preprint}.

1661: \item Kiviluoto, K. and Oja, E. (1998). Independent component analysis for parallel financial time series.

1662:       In {\sl Proc. Int. Conf. on Neural Information Processing (ICONIP'98)}, vol.2, pp.895-989, Tokyo.

1663: \item M${\breve {\rm a}}$l${\breve {\rm a}}$roiu, S., Kiviluoto, K. and

1664: Oja, E. (2000). Time series prediction with independent component analysis. {\sl A

1665:        preprint}.

1666: \item Mikosch, T. and Straumann, D. (2004). Stable limits of martingale transforms with application to the

1667:       estimation of GARCH parameters. {\sl A preprint}.

1668: \item Peng, L. and Yao, Q. (2003). Least absolute deviations estimation for ARCH

1669:       and GARCH models. {\sl Biometrika}, {\bf 90}, 967-975.

1670: \item Penzer, J., Wang, M. and Yao, Q. (2004). Approximating volatilities by asymmetric power GARCH functions. {\sl A preprint}.

1671: \item Tsay, R. (2001). {\sl Analysis of Financial Time Series}. Wiley, New York.

1672: \item Tse, Y. K. and Tsui, A.K.C. (1999). A note on diagnosing multivariate conditional heteroscedasticity models.

1673:      {\sl Journal of Time Series Analysis}, {\bf 20}, 679-691.

1674: \item Vilenkin, N. (1968). Special functions and the theory of group representation, translations of

1675:       mathematical monographs. {\sl American Math. Soc.}, Providence, Rhode Island, 22.

1676:

\item van der Vaart, A.W. and  Wellner, J.A. (1996).  {\sl Weak Convergence and
         Empirical Processes}.  Springer, New York.

1679:

1680: \item van der Weide, R. (2002). GO-GARCH: a multivariate generalized orthogonal GARCH model. {\sl Journal of

1681:       Applied Econometrics}, {\bf 17}, 549-564.

1682:

1683: \item Wang, M. and Yao, Q. (2005). Modelling multivariate volatilities: an ad hoc

1684: approach. To appear in ``{\sl Contemporary Multivariate Analysis and

Experimental Designs}'' J. Fan,  G. Li \& R. Li (eds.) World Scientific,

1686: Singapore.

1687:

1688: \item Yu, B. (1994).  Rates of convergence for empirical processes

1689:       of stationary mixing sequences.   {\sl Ann. Statist.}, {\bf 22},

1690:      94-116.

1691:

1692: \end{singlespace}

1693: \end{description}

1694:

1695:

1696:

1697: \begin{figure}[hb]

1698: \centerline{\psfig{figure=CUCGARCH_fig1_7.ps}}

1699: \begin{singlespace}

1700:

1701: \caption[Fig 1] {\sl Boxplots of the errors in estimation for

1702: CUC-GARCH(1,1) model (\ref{ex1}) with

1703: $\bA =\wh \bA$ estimated (upper panel) and the true $\bA$ (lower panel).

1704:  The sample size is $n=1000$.}

1705: \end{singlespace}

1706: \end{figure}

1707:

1708: \newpage

1709:

1710: \begin{figure}

1711: \centerline{\psfig{figure=CUCGARCH_fig5.ps}}

1712: \begin{singlespace}

1713:

1714: \caption[Fig 2] {\sl Plots of daily log return of

1715: (a)  $S\&P$ 500 index, (b) Cisco Systems stock  and (c) Intel

1716: Corporation stock.  Time span is from January 2, 1991 to December 31, 1999 with 2275

1717: observations.}

1718:

1719: \end{singlespace}

1720: \end{figure}

1721:

1722:

1723: \newpage

1724:

1725: \begin{figure}

1726: \centerline{\psfig{figure=CUCGARCH_fig3.ps}}

1727: \begin{singlespace}

1728:

1729: \caption[Fig 3] {\sl Fitted volatility processes based on  CUC-GARCH(1,1) model for daily log returns of

1730: (a)  $S\&P$ 500 index, (b) Cisco Systems stock  and (c) Intel

1731: Corporation stock. }

1732:

1733: \end{singlespace}

1734: \end{figure}

1735:

1736: \newpage

1737:

1738: \begin{figure}

1739: \centerline{\psfig{figure=CUCGARCH_fig4.ps}}

1740: \begin{singlespace}

1741:

1742: \caption[Fig 4] {\sl Fitted conditional correlations based on  CUC-GARCH(1,1) model

1743: for  daily log returns between (a) $S\&P$ 500 index and Cisco Systems

1744: stock, (b) $S\&P$ 500 index and Intel Corporation stock,  and (c)

1745: Cisco Systems stock and Intel Corporation stock.  }

1746: \end{singlespace}

1747: \end{figure}

1748:

1749: \newpage

1750:

1751: \begin{figure}

1752: \centerline{\psfig{figure=CUCGARCH_fig6.ps}}

1753: \begin{singlespace}

1754:

1755: \caption[Fig 5] {\sl Fitted volatility processes based on

1756: Orthogonal-GARCH(1,1) model for daily log returns of (a)  $S\&P$

1757: 500 index, (b) Cisco Systems stock  and (c) Intel Corporation

1758: stock. }

1759:

1760: \end{singlespace}

1761: \end{figure}

1762:

1763:

1764: \newpage

1765:

1766: \begin{figure}

1767: \centerline{\psfig{figure=CUCGARCH_fig7.ps}}

1768: \begin{singlespace}

1769:

1770: \caption[Fig 6] {\sl Plots of dividend adjusted daily log returns of

1771: (a)  Hang Seng index in Hong Kong, (b) Japan Nikkei 225 index, (c)

Shanghai Composite index in China, (d) Singapore Straits Times index,

1773:  and (e) Taiwan Weighted index. Time span is from August 1, 1997 to December 30, 2003 with 1349 observations.}

1774:

1775: \end{singlespace}

1776: \end{figure}

1777:

1778:

1779: \newpage

1780:

1781: \begin{figure}

1782: \centerline{\psfig{figure=CUCGARCH_fig8.ps}}

1783: \begin{singlespace}

1784:

1785: \caption[Fig 7] {\sl Fitted volatility processes based on CUC-Extended GARCH(1,1) model for daily log returns of

1786: (a)  Hang Seng index in Hong Kong, (b) Japan Nikkei 225 index, (c)

Shanghai Composite index in China, (d) Singapore Straits Times index,

1788:  and (e) Taiwan Weighted index. }

1789:

1790: \end{singlespace}

1791: \end{figure}

1792:

1793: \newpage

1794:

1795: \begin{figure}

1796: \centerline{\psfig{figure=CUCGARCH_fig9.ps}}

1797: \begin{singlespace}

1798:

\caption[Fig 8] {\sl Fitted conditional correlations between daily
log-returns of Hang Seng index (HS) and (a) Japan Nikkei 225 index (JN),
(b) Shanghai
Composite index in China (SC), (c) Singapore Straits Times index (ST),
 and (d) Taiwan Weighted index (TW).}

1804:

1805: \end{singlespace}

1806: \end{figure}

1807:

1808:

1809:

1810:

1811:

1812:

1813:

1814:

1815:

1816:

1817: \end{document}

1818: