
1: %\headheight=.2cm

2:

3:

4: \documentclass[12pt]{book}

5: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

6: \usepackage{amsmath}

7: \usepackage{array}

8: \usepackage[doublespacing]{setspace}

9:

10: \setcounter{MaxMatrixCols}{10}

11: %TCIDATA{OutputFilter=LATEX.DLL}

12: %TCIDATA{Version=5.00.0.2606}

13: %TCIDATA{<META NAME="SaveForMode" CONTENT="1">}

14: %TCIDATA{BibliographyScheme=Manual}

15: %TCIDATA{LastRevised=Sunday, July 30, 2006 10:47:37}

16: %TCIDATA{<META NAME="GraphicsSave" CONTENT="32">}

17: %TCIDATA{Language=American English}

18:

19: \textwidth=31.9pc

20: \textheight=46.5pc

21: \oddsidemargin=1pc

22: \evensidemargin=1pc

23: \headsep=15pt

24: \topmargin=.6cm

25: \parindent=1.7pc

26: \parskip=0pt

27: \setcounter{page}{1}

28: \input{tcilatex}

29: \renewcommand{\baselinestretch}{2}

30:

31: \begin{document}

32:

33:

34: %\pagestyle{fancy}

35: \renewcommand{\baselinestretch}{1.2} %\lhead[\fancyplain{} \leftmark]{}

36: %\chead[]{}

37: %\rhead[]{\fancyplain{}\rightmark}

38: %\cfoot{}

39: %\headrulewidth=0pt

40: \markright{

41: }

42: \markboth{\hfill{\footnotesize\rm Cheng-Yuan Liou and  Bruce R. Musicus

43: }\hfill}

44: {\hfill {\footnotesize\rm Cross Entropy Approx of Structured Covariance Matrices} \hfill}

45: \renewcommand{\thefootnote}{} $\ $

46:

47: \fontsize{10.95}{14pt plus.8pt minus .6pt}\selectfont\vspace{0.812pc} %

48: \centerline{\large\bf Cross Entropy Approximation of Structured} \vspace{2pt}

49: \centerline{\large\bf Covariance Matrices} \vspace{0.4cm} %

50: \centerline{Cheng-Yuan Liou$^{1}$ and Bruce R. Musicus$^{2}$} \vspace{0.4cm} %

51: \centerline{\it  } \vspace{0.55cm} \fontsize{9}{11.5pt plus.8pt minus .6pt}%

52: \selectfont

53:

54: \begin{quotation}

55: \noindent \textit{Abstract:} \ We apply two variations of the principle of

56: Minimum Cross Entropy (the Kullback information measure) to fit

57: parameterized probability density models to observed data densities. For an

58: array beamforming problem with $P$ incident narrowband point sources, $N>P$

59: sensors, and colored noise, both approaches yield eigenvector fitting

60: methods similar to that of the MUSIC algorithm[1]. Furthermore, the

61: corresponding cross-entropies are related to the MDL model order selection

62: criterion[2].

63:

64: \vspace{9pt} \noindent \textit{Key words and phrases:} Array Beamforming,

65: Eigenvector methods, Kullback Information Measure, Minimum Cross Entropy,

66: Stochastic Estimation, Structured Covariance

67: \end{quotation}

68:

69: \fontsize{10.95}{14pt plus.8pt minus .6pt}\selectfont\noindent \textbf{1.

70: Introduction}

71:

72: \bigskip Many existing high resolution methods for spectral analysis and for

73: optimal beamforming utilize covariance matrices estimated from observed

74: data. Often, an underlying structure for the covariance matrix is known in

75: advance, and our goal is to estimate the covariance matrix with this

76: structure which best fits the observed data. Previous literature has

77: suggested a variety of methods of optimally estimating structured covariance

78: matrices from data[3,4,5]. In this paper, we will apply the minimum cross

79: entropy (CE)[6,7] and minimum reverse cross-entropy (RCE)[6] principles to

80: estimate the covariance matrix. These principles have proved to be quite

81: powerful in a wide variety of signal processing applications[8,9] and have

been justified as being ``optimal'' under suitable assumptions. In Section 2,
we apply the CE and RCE procedures to the problem of estimating structured
covariance matrices, and in Section 3 we demonstrate the utility of the idea
for a beamforming application.

86:


88:

89: \noindent \textbf{2. Problem Statement}

90:

Let \underline{$x$} be an $N$-dimensional real or complex random vector.
Assume that a Gaussian probability density for \underline{$x$} is either
known a priori, or has been estimated by some procedure from observed
data:\bigskip

95: \begin{equation}

96: p(\underline{x})=N(\underline{m},R)

97: \end{equation}%

where $\underline{m}$ is the expected value of \underline{$x$}, $R$ is
the covariance matrix, $R=E[(\underline{x}-\underline{m})(\underline{x}-\underline{m})^{H}]$,
and \underline{$x$}$^{H}$ is the Hermitian (complex conjugate transpose) of \underline{$x$}.
Suppose we wish to approximate this $p(\underline{x})$ with a parameterized
probability density function (PDF):%

103: \begin{equation}

104: q_{\theta }(\underline{x})=N(\underline{m}_{\theta },R_{\theta })

105: \end{equation}%

where \underline{$\theta $} denotes the vector of unknown parameters of the
model $q_{\theta }(\underline{x})$, which are to be estimated. Conceptually,
we wish to choose \underline{$\theta $} to make $q_{\theta }(\underline{x})$\ optimally
match $p(\underline{x})$. An appropriate objective function is the Kullback
information measure[6], whose minimization is known as the Minimum Cross-Entropy
principle[7]. Because this measure is asymmetric, we can apply it in two
different ways to this problem. Following [8,9,10], we call these the
``Cross-Entropy'' and ``Reverse Cross-Entropy'' methods:%

114: \begin{eqnarray}

115: \text{CE} &\text{:}&\ \ \hat{q}_{\theta }\leftarrow \min\limits_{\underline{%

116: \theta }}H(q_{\theta },p) \\

117: \text{RCE} &\text{:}&\ \ \hat{q}_{\theta }\leftarrow \min\limits_{\underline{%

118: \theta }}H(p,q_{\theta })

119: \end{eqnarray}%

120: where:%

121: \begin{equation}

122: H(p_{1},p_{2})=\dint p_{1}(\underline{x})\log \frac{p_{1}(\underline{x})}{%

123: p_{2}(\underline{x})}d\underline{x}

124: \end{equation}%

125: Kullback[6] has argued that $H(p_{1},p_{2})$\ measures the mean amount of

126: information for discriminating in favor of the hypothesis that $p_{1}$ is

127: the correct density of \underline{$x$}\ rather than $p_{2}$. Shore and

Johnson[7] have argued that minimizing $H(p_{1},p_{2})$ over $p_{1}$ is the
only consistent procedure for estimating a PDF given an a priori
density estimate $p_{2}(\underline{x})$\ combined with new structural
information about the density, such as one or more of its moments. The

132: measure $H(p_{1},p_{2})$ has several pleasing mathematical properties: it is

133: convex in $p_{1}$, and convex in $p_{2}$, and attains its minimum value of

134: zero when $p_{1}(\underline{x})=p_{2}(\underline{x})$\ almost everywhere.

135: Another useful property is that estimating \underline{$\theta $} from either

136: (3) or (4) is straightforward. Substitute (1) and (2) into the CE and RCE

137: formulas to obtain:%

138: \begin{eqnarray*}

139: \text{CE} &\text{:}&H(q_{\theta },p)=\xi \{tr(R^{-1}R_{\theta })-N-\log

140: \left\vert R^{-1}R_{\theta }\right\vert +(\underline{m}_{\theta }-\underline{%

141: m})^{H}R^{-1}(\underline{m}_{\theta }-\underline{m})\} \\

142: \text{RCE} &\text{:}&H(p,q_{\theta })=\xi \{tr(R_{\theta }^{-1}R)-N-\log

143: \left\vert R_{\theta }^{-1}R\right\vert +(\underline{m}_{\theta }-\underline{%

144: m})^{H}R_{\theta }^{-1}(\underline{m}_{\theta }-\underline{m})\}

145: \end{eqnarray*}%

146: where $\xi =1/2$ when \underline{$x$}\ is real and $\xi =1$ when \underline{$%

147: x$}\ is complex. \
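\par As a quick illustration of the asymmetry of the two measures (a worked scalar example added for clarity, not part of the original derivation), take the real scalar case $N=1$, $\xi =1/2$, with equal means, data variance $R=r$ and model variance $R_{\theta }=2r$. With natural logarithms, the two formulas above give:
\begin{eqnarray*}
H(q_{\theta },p) &=&\tfrac{1}{2}\left( 2-1-\log 2\right) \approx 0.153 \\
H(p,q_{\theta }) &=&\tfrac{1}{2}\left( \tfrac{1}{2}-1+\log 2\right) \approx 0.097
\end{eqnarray*}%
Both values are positive, and they differ, which is why the CE and RCE criteria generally lead to different estimates.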

148:

To simplify the remainder of the discussion, assume that the mean is known,
\underline{$m$}$_{\theta }=$\underline{$m$}, so that we can focus on the
estimation of the covariance matrix and compare the results with those of
Burg and Gray[4] and Gray, Anderson, and Sim[5]. The two estimation problems
reduce to minimizing:%

154: \begin{eqnarray}

155: \text{CE} &\text{:}&\ \ \ \ H(q_{\theta },p)=\xi \left\{ tr(R^{-1}R_{\theta

156: })-N-\log \left\vert R^{-1}R_{\theta }\right\vert \right\} \\

157: \text{RCE} &\text{:}&\ \ \ \ H(p,q_{\theta })=\xi \{tr(R_{\theta

158: }^{-1}R)-N-\log \left\vert R_{\theta }^{-1}R\right\vert \}

159: \end{eqnarray}

160:

161: Setting the gradients of the above two objective functions with respect to

162: \underline{$\theta $}\ to zero, we obtain the necessary conditions that

163: \underline{$\widehat{\theta }$} be the optimal solution:%

164: \begin{eqnarray}

165: \text{CE}\text{: } &&tr\left. \left\{ (R^{-1}-R_{\theta }^{-1})\frac{%

166: \partial R_{\theta }}{\partial \theta _{i}}\right\} \right\vert _{\underline{%

167: \theta }=\underline{\widehat{\theta }}}=0 \\

168: \text{RCE}\text{: } &&tr\left. \left\{ (R-R_{\theta })\frac{\partial

169: R_{\theta }^{-1}}{\partial \theta _{i}}\right\} \right\vert _{\underline{%

170: \theta }=\underline{\widehat{\theta }}}=0

171: \end{eqnarray}%

172: \bigskip for all $i$, where $\theta _{i}$ is the $i^{th}$ element of

173: \underline{$\theta $}. When $R_{\theta }$\ is invertible and differentiable

174: in \underline{$\theta $}:%

175: \begin{equation}

176: \frac{\partial R_{\theta }^{-1}}{\partial \theta _{i}}=-R_{\theta }^{-1}%

177: \frac{\partial R_{\theta }}{\partial \theta _{i}}R_{\theta }^{-1}

178: \end{equation}%

179: Substituting this into the RCE formula gives an alternate set of necessary

180: conditions for the optimal RCE solution:%

181: \begin{equation}

182: \text{RCE:}\QTR{sl}{\ \ }tr\left. \left\{ (R_{\theta }^{-1}RR_{\theta

183: }^{-1}-R_{\theta }^{-1})\frac{\partial R_{\theta }}{\partial \theta _{i}}%

184: \right\} \right\vert _{\underline{\theta }=\underline{\widehat{\theta }}}=0

185: \end{equation}
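As a simple check of these conditions (an illustrative special case, not taken from the appendices), consider the one-parameter model $R_{\theta }=\theta I$ with $\theta >0$, so that $\partial R_{\theta }/\partial \theta =I$ and $R_{\theta }^{-1}=\theta ^{-1}I$. The CE condition and the alternate RCE condition above then reduce to:
\begin{eqnarray*}
\text{CE} &\text{:}&\ \ tr\{R^{-1}\}-\frac{N}{\hat{\theta}}=0\ \ \Longrightarrow \ \ \hat{\theta}=\frac{N}{tr\{R^{-1}\}} \\
\text{RCE} &\text{:}&\ \ \frac{tr\{R\}}{\hat{\theta}^{2}}-\frac{N}{\hat{\theta}}=0\ \ \Longrightarrow \ \ \hat{\theta}=\frac{tr\{R\}}{N}
\end{eqnarray*}%
In this special case the CE estimate is the harmonic mean of the eigenvalues of $R$, while the RCE estimate is their arithmetic mean.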

186:

187: \noindent \textbf{3. Application to Array Beamforming}

188:

189: In this section we will apply the CE and RCE methods to fitting a low rank

190: plus noise covariance matrix to data. Such problems arise in a variety of

191: contexts, including narrowband sensor array processing and harmonic

192: retrieval. We focus on the former problem. Let \underline{$x$}$%

193: [n]=(x_{1}[n],...,x_{N}[n])^{T}$ be a vector of sensor measurements at time $%

194: n$, where $N$ is the total number of sensors in the array. Assume that the

195: signal is narrowband (perhaps because the sensor data has been preprocessed

196: through a Fast Fourier Transform of each sensor's data). Let our initial PDF

197: estimate for the data be given by $p(\underline{x}[n])=N(\underline{0},R),$

198: where $R$ is any non-parameterized estimate of the signal covariance, such

199: as $R=\frac{1}{K}\sum_{k=1}^{K}\underline{x}\left[ k\right] \underline{x}^{H}%

200: \left[ k\right] $ where $K$ snapshots of array data are used.

201:

202: Now suppose we wish to model the data $\underline{x}[n]$ as:

203: \begin{equation}

204: \underline{x}[n]=\sum_{i=1}^{P}s_{i}[n]\underline{u}_{i}+\sigma \underline{w}%

205: [n]

206: \end{equation}%

where $s_{1}[n],\ldots ,s_{P}[n]$\ are $P$ source signals, $P<N$, arriving from
unknown directions \underline{$u$}$_{1},\ldots ,\underline{u}_{P}$, with
additive noise \underline{$w$}$[n]$ with gain $\sigma $. Suppose that the
signals $s_{i}[n]$\ are statistically independent, real or complex zero mean
Gaussian random variables with variance $\Lambda _{i}>0$, and that the
noise samples \underline{$w$}$[n]$ are statistically independent, real or
complex zero mean Gaussian random vectors with covariance $W$:%

215: \begin{eqnarray}

216: p(s_{i}[n]) &=&N(0,\Lambda _{i}) \\

217: p(\underline{w}[n]) &=&N(\underline{0},W)

218: \end{eqnarray}

219:

220: Thus the parameterized model PDF of \underline{$x$}$[n]$ is Gaussian:%

221: \begin{equation}

222: q_{\theta }(\underline{x}[n])=N(\underline{0},R_{\theta })

223: \end{equation}%

224: where:%

225: \begin{equation}

226: R_{\theta }=\sum_{i=1}^{P}\Lambda _{i}\underline{u}_{i}\underline{u}%

227: _{i}^{H}+\sigma ^{2}W

228: \end{equation}%

229: We will assume that the noise covariance $W$ is known, but that all the

230: other parameters \underline{$\theta $}$=(\Lambda _{1},...,\Lambda _{P},%

231: \underline{u}_{1},...\underline{u}_{P},\sigma )^{T}$ must be estimated. For

232: convenience, define:%

233: \begin{equation}

234: R_{\theta }=U\Lambda U^{H}+\sigma ^{2}W

235: \end{equation}%

236: where:%

237: \begin{equation}

238: U=\left[

239: \begin{array}{cccc}

240: \underline{u}_{1} & \underline{u}_{2} & ... & \underline{u}_{P}%

241: \end{array}%

242: \right] \text{ and\textsl{\ }}\Lambda =%

243: \begin{bmatrix}

244: \Lambda _{1} &  & 0 \\

245: & \ddots &  \\

246: 0 &  & \Lambda _{P}%

247: \end{bmatrix}%

248: \end{equation}%

249: Suppose there are no a priori constraints on the matrix $U$, and that the

250: only constraints on $\Lambda $\ are that $\Lambda _{i}>0$. This would

251: typically be true if the array were uncalibrated, or subject to heavy

252: unknown multipath distortion. (Note that because we assume an uncalibrated

253: array, we will not be able to directly derive information about the

254: direction of arrival.) Appendices A and B apply the CE and RCE criteria to

this model. They show that the solutions to these two problems are quite
similar, and can be found by the following algorithm:

257:

258: \vspace{0.2in} {\raggedright\textbf{CE and RCE BEAMFORMING ALGORITHMS}}

259:

260: \begin{enumerate}

261: \item Find the generalized eigenvector \underline{$u$}$_{i}$ and eigenvalue $%

262: \lambda _{i}$ solutions to:%

263: \begin{equation}

264: \lambda _{i}R^{-1}\underline{u}_{i}=W^{-1}\underline{u}_{i}

265: \end{equation}%

266: with normalization constraint \underline{$u$}$_{i}^{H}W^{-1}$\underline{$u$}$%

267: _{j}=\delta _{i,j}$.

268:

269: \item Sort the eigenvectors and eigenvalues so that $\lambda _{1}\geq

270: \lambda _{2}\geq ...\geq \lambda _{N}.$ Then the optimal structured

271: covariance matrix approximation $\hat{R}_{\theta }$ to $R$ is :%

272: \begin{equation}

273: \hat{R}_{\theta }=\left(

274: \begin{array}{cccc}

275: \underline{u}_{1} & \underline{u}_{2} & ... & \underline{u}_{P}%

276: \end{array}%

277: \right)

278: \begin{pmatrix}

279: \lambda _{1}-\widehat{\sigma }^{2} &  & 0 \\

280: & \ddots &  \\

281: 0 &  & \lambda _{P}-\widehat{\sigma }^{2}%

282: \end{pmatrix}%

283: \begin{pmatrix}

284: \underline{u}_{1}^{H} \\

285: \underline{u}_{2}^{H} \\

286: . \\

287: . \\

288: . \\

289: \underline{u}_{P}^{H}%

290: \end{pmatrix}%

291: +\widehat{\sigma }^{2}W

292: \end{equation}%

293: where:

294: \begin{equation}

295: \left\{

296: \begin{array}{c}

297: \frac{1}{\widehat{\sigma }^{2}}=\frac{1}{N-P}\sum\limits_{i=P+1}^{N}\frac{1}{%

298: \lambda _{i}}\text{ \ for CE} \\

299: \widehat{\sigma }^{2}=\frac{1}{N-P}\sum\limits_{i=P+1}^{N}\lambda _{i}\text{

300: \ for RCE}%

301: \end{array}%

302: \right.

303: \end{equation}

304:

305: \item The cross entropy for the optimal model is:%

306: \begin{eqnarray}

307: \text{CE}\text{: \ \ \ } &&H(\hat{q}_{\theta },p)=\xi \sum_{i=P+1}^{N}\log (%

308: \frac{\lambda _{i}}{\widehat{\sigma }^{2}}) \\

309: \text{RCE}\text{: \ \ } &&H(p,\hat{q}_{\theta })=\xi \sum_{i=P+1}^{N}\log (%

310: \frac{\hat{\sigma}^{2}}{\lambda _{i}})

311: \end{eqnarray}

312: \end{enumerate}
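To make the three steps concrete, here is a small worked example with hypothetical numbers (chosen for illustration only; they do not come from any data set). Take $N=4$, $P=1$, complex data ($\xi =1$), and suppose the sorted generalized eigenvalues are $\lambda _{1}=6$, $\lambda _{2}=3$, $\lambda _{3}=2$, $\lambda _{4}=1$. The noise level estimates of step 2 are:
\begin{eqnarray*}
\text{CE} &\text{:}&\ \ \frac{1}{\widehat{\sigma }^{2}}=\frac{1}{3}\left( \frac{1}{3}+\frac{1}{2}+1\right) =\frac{11}{18},\ \ \widehat{\sigma }^{2}=\frac{18}{11}\approx 1.64 \\
\text{RCE} &\text{:}&\ \ \widehat{\sigma }^{2}=\frac{1}{3}(3+2+1)=2
\end{eqnarray*}%
so the signal term $\lambda _{1}-\widehat{\sigma }^{2}$ is $48/11\approx 4.36$ for CE and $4$ for RCE. The corresponding cross-entropies of step 3, using natural logarithms, are:
\begin{eqnarray*}
\text{CE} &\text{:}&\ \ H(\hat{q}_{\theta },p)=\log \frac{3\cdot 2\cdot 1}{(18/11)^{3}}\approx 0.314 \\
\text{RCE} &\text{:}&\ \ H(p,\hat{q}_{\theta })=\log \frac{2^{3}}{3\cdot 2\cdot 1}\approx 0.288
\end{eqnarray*}%
Both methods use the same eigenvector $\underline{u}_{1}$; only the noise level, and hence the signal power assigned to $\underline{u}_{1}$, differs.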

313:

314: The estimates of $\hat{R}_{\theta }$ \ and $\hat{\sigma}^{2}$\ will be

315: unique if and only if $\lambda _{P}>\lambda _{P+1}.$ (The estimate of $U$

316: will not be unique.)

317:

An interesting alternative form for the cross-entropy formulas can be found
by substituting the value of $\hat{\sigma}^{2}$\ from (21) into (22) and (23):%

320: \begin{eqnarray}

321: \text{CE}\text{: \ } &&H(\hat{q}_{\theta },p)=\xi (N-P)\log \left( \frac{%

322: \left[ \frac{1}{\lambda _{P+1}},...,\frac{1}{\lambda _{N}}\right] _{avg}}{%

323: \left[ \frac{1}{\lambda _{P+1}},...,\frac{1}{\lambda _{N}}\right] _{geo}}%

324: \right) \\

325: \text{RCE}\text{: \ } &&H(p,\hat{q}_{\theta })=\xi (N-P)\log \left( \frac{%

326: \left[ \lambda _{P+1},...,\lambda _{N}\right] _{avg}}{\left[ \lambda

327: _{P+1},...,\lambda _{N}\right] _{geo}}\right)

328: \end{eqnarray}%

329: where:%

330: \begin{eqnarray}

331: \left[ \beta _{P+1},...,\beta _{N}\right] _{avg} &=&\frac{1}{N-P}%

332: \sum_{i=P+1}^{N}\beta _{i} \\

333: \lbrack \beta _{P+1},...,\beta _{N}]_{geo} &=&(\beta _{P+1}\beta

334: _{P+2}...\beta _{N})^{1/(N-P)}

335: \end{eqnarray}%

The cross-entropies are proportional to the log of the ratio of the
arithmetic mean to the geometric mean of the eigenvalues (or their inverses)
that are not used in building $U$. Since the arithmetic mean of positive
numbers is never smaller than their geometric mean, the cross-entropies are
nonnegative, and they attain their minimum value of zero only if the geometric
mean of $\lambda _{P+1},...,\lambda _{N}$ (or their inverses) equals
their arithmetic mean. This occurs only if these $N-P$ smallest
generalized eigenvalues are all equal.
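\par For instance (hypothetical values, natural logarithms), with $N-P=2$ unused eigenvalues $\lambda _{P+1}=4$ and $\lambda _{N}=1$, the arithmetic mean is $2.5$ and the geometric mean is $2$, so the RCE formula gives:
\begin{equation*}
H(p,\hat{q}_{\theta })=2\xi \log \frac{2.5}{2}\approx 0.446\,\xi
\end{equation*}%
The CE formula gives the same value in this two-eigenvalue case, since the inverses $1/4$ and $1$ have arithmetic mean $0.625$ and geometric mean $0.5$. If instead both eigenvalues equal $2$, the two means coincide and the cross-entropy is zero.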

343:

344: Note the similarity of the RCE formula to the MDL order determination

345: algorithm suggested by Wax and Kailath[2]. The RCE criterion is also

346: strongly related to the Maximum Likelihood problem of estimating the

347: structured covariance matrix given observations \underline{$x$}$_{1},...,$%

348: \underline{$x$}$_{K}$:

349: \begin{equation}

350: \hat{R}_{\theta }\leftarrow \max_{\theta }\log p(\underline{x}_{1},...,%

351: \underline{x}_{K}\mid \underline{\theta })

352: \end{equation}%

353: where:%

354: \begin{equation}

355: p(\underline{x}_{1},...,\underline{x}_{K}\mid \underline{\theta })\text{ }%

356: =\dprod\limits_{i=1}^{K}p(\underline{x}_{i}\mid \underline{\theta })

357: \end{equation}%

and:%
\begin{equation}
p(\underline{x}_{i}\mid \underline{\theta })=N(\underline{0},R_{\theta })
\end{equation}%
This is because, when $R$ is the sample covariance $\frac{1}{K}\sum_{k=1}^{K}\underline{x}_{k}\underline{x}_{k}^{H}$:%
\begin{equation}
H(p,q_{\theta })=-\frac{1}{K}\log p(\underline{x}_{1},...,\underline{x}%
_{K}\mid \underline{\theta })-\xi \left( N+\log \left\vert R\right\vert +N\log \frac{\pi }{\xi }\right)
\end{equation}%
\bigskip Since the second term in (31) does not depend on \underline{$\theta
$}, minimizing $H(p,q_{\theta })$ is equivalent to maximizing the likelihood, and the RCE estimate of $R_{\theta }$\ will be identical to the ML estimate.

369:

370: For the special case when the background noise is white Gaussian noise, $W=I$%

371: , the \underline{$u$}$_{i}$ must satisfy:%

372: \begin{equation}

373: R\underline{u}_{i}=\lambda _{i}\underline{u}_{i}

374: \end{equation}%

and thus the \underline{$u$}$_{i}$ are the eigenvectors of

376: the observed data correlation matrix $R$. This special case is thus quite

377: similar to that used in the MUSIC algorithm[1] and other similar beamforming

378: algorithms.

379:

If subroutines for computing generalized eigenvectors are not available, we
can use subroutines for computing eigenvectors of Hermitian (symmetric, in the
real case) positive definite matrices as follows. Factor $W=W^{1/2}W^{H/2}$\ where $W^{1/2}$\ is
any square root of $W$ (for example, a Cholesky factor) and $W^{H/2}$\ is its Hermitian. Then to compute the
\underline{$u$}$_{i}$:

385:

386: \begin{enumerate}

\item Form the whitened data correlation matrix:%
\begin{equation}
\tilde{R}=W^{-1/2}RW^{-H/2}
\end{equation}%
where $W^{-1/2}$ is the inverse of $W^{1/2}$ and $W^{-H/2}$ is its Hermitian. Note that $\tilde{R}$ is
Hermitian and positive definite.

393:

394: \item Solve for the eigenvectors \underline{$t$}$_{i}$ and corresponding

395: eigenvalues $\lambda _{i}$\ of $\tilde{R}.$%

396: \begin{equation}

397: \tilde{R}\underline{t}_{i}=\lambda _{i}\underline{t}_{i}

398: \end{equation}%

where \underline{$t$}$_{j}^{H}\underline{t}_{i}=\delta _{i,j}$. Sort these
so that the eigenvalues are in descending order.

401:

402: \item Then:%

403: \begin{equation}

404: \underline{u}_{i}=W^{1/2}\underline{t}_{i}

405: \end{equation}

406: \end{enumerate}
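As a sanity check of this procedure, consider a small hypothetical real example with diagonal matrices (chosen only so that the arithmetic can be followed by hand): $W=\mathrm{diag}(4,1)$ and $R=\mathrm{diag}(8,3)$, with $W^{1/2}=\mathrm{diag}(2,1)$. Then:
\begin{eqnarray*}
\tilde{R} &=&W^{-1/2}RW^{-H/2}=\mathrm{diag}(2,3) \\
\lambda _{1} &=&3,\ \underline{t}_{1}=(0,1)^{T},\ \ \ \lambda _{2}=2,\ \underline{t}_{2}=(1,0)^{T} \\
\underline{u}_{1} &=&W^{1/2}\underline{t}_{1}=(0,1)^{T},\ \ \ \underline{u}_{2}=W^{1/2}\underline{t}_{2}=(2,0)^{T}
\end{eqnarray*}%
The generalized eigenvector equation can be verified directly: $\lambda _{2}R^{-1}\underline{u}_{2}=2\cdot \frac{1}{8}\cdot (2,0)^{T}=(\frac{1}{2},0)^{T}=W^{-1}\underline{u}_{2}$, and $\underline{u}_{2}^{H}W^{-1}\underline{u}_{2}=1$. Note that the ordering follows the whitened eigenvalues: the largest eigenvalue of $R$ itself (here $8$) lies in the direction with the most noise and ends up second.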

407:

It is also interesting to consider the effect of using the structured
covariance matrix estimate when forming either a classical or optimal
beamformer. Let \underline{$w$}$_{0}$ be the ideal array response for a
signal in a particular direction. The classical beamformer estimates the
signal $s[n]$\ from the array data as $s[n]=\underline{w}_{0}^{H}\underline{x%
}[n]$. The expected received power from this direction is then $E[|s[n]|^{2}]=%
\underline{w}_{0}^{H}R_{\theta }\underline{w}_{0}.$\ Now suppose that
\underline{$w$}$_{0}$\ is in the space spanned\ by the columns of $R^{-1}U$,
i.e. \underline{$w$}$_{0}=R^{-1}U$\underline{$\alpha $} for some vector
\underline{$\alpha $}. It is shown in Appendix A that $R_{\theta
}^{-1}U=R^{-1}U$, or equivalently $R_{\theta }R^{-1}U=U$. Therefore:%

419: \begin{eqnarray}

420: \underline{w}_{0}^{H}R_{\theta }\underline{w}_{0} &=&\underline{\alpha }%

421: ^{H}U^{H}R^{-H}R_{\theta }R^{-1}U\underline{\alpha }  \notag \\

422: &=&\underline{\alpha }^{H}U^{H}R^{-H}U\underline{\alpha }  \notag \\

423: &=&\underline{w}_{0}^{H}R\underline{w}_{0}

424: \end{eqnarray}%

In this case, replacing $R$ with the structured covariance estimate $%
R_{\theta }$ in the classical beamformer makes no difference. However, if $%
\underline{w}_{0}$ is not in the subspace spanned by $R^{-1}\underline{u}%
_{1},...,R^{-1}\underline{u}_{P}$, then in general $\underline{w}_{0}^{H}R_{\theta }\underline{w}%
_{0}\neq \underline{w}_{0}^{H}R\underline{w}_{0}$, and using the structured covariance
estimate in the classical beamformer will yield a different beam pattern.

431:

A similar statement holds for the optimum minimum variance beamformer, $s[n]=%
\underline{w}^{H}\underline{x}[n],$ which uses a window \underline{$w$}
designed such that the expected response energy \underline{$w$}$%
^{H}R_{\theta }$\underline{$w$}\ is minimized subject to the constraint that
the response to a plane wave from the direction of interest is unity,
\underline{$w$}$^{H}\underline{w}_{0}=1$. The solution is $\underline{w}%
=\left( \underline{w}_{0}^{H}R_{\theta }^{-1}\underline{w}_{0}\right)
^{-1}R_{\theta }^{-1}\underline{w}_{0}$. Note that if \underline{$w$}$_{0}$\
is in the subspace spanned by the columns of $U$, then there exists some
vector \underline{$\alpha $}\ such that \underline{$w$}$_{0}=U$\underline{$%
\alpha $}. Since $R_{\theta }^{-1}U=R^{-1}U,$%
\begin{equation}
R_{\theta }^{-1}\underline{w}_{0}=R_{\theta }^{-1}U\underline{\alpha }%
=R^{-1}U\underline{\alpha }=R^{-1}\underline{w}_{0}
\end{equation}%
which in turn implies:%
\begin{equation}
\underline{w}=\left( \underline{w}_{0}^{H}R_{\theta }^{-1}\underline{w}%
_{0}\right) ^{-1}R_{\theta }^{-1}\underline{w}_{0}=\left( \underline{w}%
_{0}^{H}R^{-1}\underline{w}_{0}\right) ^{-1}R^{-1}\underline{w}_{0}
\end{equation}%

453: In this case, replacing $R$ with the structured covariance estimate $%

454: R_{\theta }$\ in the optimal beamformer makes no difference. However, if

455: \underline{$w$}$_{0}$ is not in the subspace spanned by the columns of $U$,

456: then $R_{\theta }^{-1}\underline{w}_{0}\neq R^{-1}\underline{w}_{0}$, and

457: using the structured covariance estimate in the optimal beamformer will

458: yield a different beam pattern. These results are contrary to the suggestion

459: implied in [5] that replacing $R$ with $R_{\theta }$\ in an optimal

460: beamformer should make no difference.

461:

462: \bigskip

463:

464: \noindent \textbf{4. Conclusion}

465:

\bigskip In this paper, we have derived the optimal solution for correlation
matrix estimation by the CE and RCE principles. The two methods give
nearly identical results in the problem of estimating the sum of a low rank signal
matrix plus a noise matrix, differing only in the value of the noise level
estimate. The RCE method gives the same results as the Maximum Likelihood
approach, and when the noise is white, both methods are similar to MUSIC. It
is interesting that the cross-entropy approach thus provides a unifying
framework for deriving spectral estimation algorithms including Bartlett,
MLM[8], MEM[10], and now MUSIC.

475:

476: \bigskip \newpage

477:

478: \noindent {\Large A \ \ Derivation of CE Beamforming Algorithm}

479:

\bigskip In this appendix we derive the optimal structured covariance
estimate using the CE principle. First, to simplify the effort, let us
define $V=U\Lambda ^{1/2}$, where $\Lambda ^{1/2}=\mathrm{diag}(\Lambda
_{1}^{1/2},...,\Lambda _{P}^{1/2})$. Then:%

484: \begin{equation}

485: R_{\theta }=VV^{H}+\sigma ^{2}W

486: \end{equation}%

487: Substitute this into the CE entropy expression (6), and set the derivatives

488: with respect to the real and imaginary part of every element of the $V$

489: matrix, and with respect to $\sigma ^{2}$, to zero. Arranging these

490: derivatives in complex matrix form gives:%

491: \begin{eqnarray}

492: (R^{-1}-R_{\theta }^{-1})V &=&0 \\

493: tr\{(R^{-1}-R_{\theta }^{-1})W\} &=&0

494: \end{eqnarray}%

Using the Woodbury matrix identity (the matrix inversion lemma):%

496: \begin{equation}

497: R_{\theta }^{-1}=\frac{1}{\sigma ^{2}}W^{-1}-\frac{1}{\sigma ^{2}}W^{-1}V%

498: \text{ }\left[ V^{H}\frac{1}{\sigma ^{2}}W^{-1}V+I\right] ^{-1}V^{H}\frac{1}{%

499: \sigma ^{2}}W^{-1}

500: \end{equation}%

501: Substituting into (40) and simplifying gives:%

502: \begin{equation}

503: R^{-1}V=\frac{1}{\sigma ^{2}}W^{-1}V\left[ V^{H}\frac{1}{\sigma ^{2}}%

504: W^{-1}V+I\right] ^{-1}

505: \end{equation}%

506: This equation has many possible solutions. Let $V$ refer to any one of

507: these. Then let $\Psi =V^{H}W^{-1}V$. Diagonalize $\Psi $ by factoring it: $%

508: \Psi =Q\Phi Q^{H}$, where $\Phi $\ is diagonal and $Q$ is orthonormal, $%

509: Q^{H}Q=I$. Define $\tilde{V}=VQ$. Note that $\tilde{V}$\ is also a solution

510: to (43). In fact,%

511: \begin{equation}

512: R^{-1}\tilde{V}=\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}\left[ \frac{1}{\sigma

513: ^{2}}\Phi +I\right] ^{-1}

514: \end{equation}%

515: and:%

516: \begin{equation}

517: \tilde{V}^{H}W^{-1}\tilde{V}=\Phi

518: \end{equation}%

Let the $P$ columns of $\tilde{V}$ be \underline{$\tilde{v}$}$_{1},...,%
\underline{\tilde{v}}_{P},$ and let the $P$ diagonal elements of $\Phi $\ be
$\phi _{1},...,\phi _{P}$. Then:%

522: \begin{equation}

523: \lambda _{i}R^{-1}\underline{\tilde{v}}_{i}=W^{-1}\underline{\tilde{v}}_{i}

524: \end{equation}%

525: where:%

526: \begin{equation}

527: \lambda _{i}=\phi _{i}+\sigma ^{2}

528: \end{equation}%

529: The columns of $\tilde{V}$ must therefore either be zero, or else must be

530: generalized eigenvector solutions to (46). Because $R$ and $W$ are conjugate

531: symmetric and positive definite, there are always $N$ linearly independent

532: generalized eigenvector solutions \underline{$\tilde{v}$}$_{1},...,%

533: \underline{\tilde{v}}_{N}$ to (46), with corresponding generalized

534: eigenvalues $\lambda _{1},...,\lambda _{N}$ which are positive. Assume

535: without loss of generality that the first $P_{0}$\ columns of $\tilde{V}$

536: are non-zero, where $P_{0}\leq P$\ . These first $P_{0}$\ columns must be

537: selected from among the $N$ possible generalized eigenvectors, in a manner

538: we will determine later. Also note that it is not necessary to estimate $Q$

539: or $V$ directly, since we can construct $R_{\theta }$\ directly from $\tilde{%

540: V}$:%

541: \begin{eqnarray}

542: R_{\theta } &=&VV^{H}+\sigma ^{2}W  \notag \\

543: &=&VQQ^{H}V^{H}+\sigma ^{2}W  \notag \\

544: &=&\tilde{V}\tilde{V}^{H}+\sigma ^{2}W

545: \end{eqnarray}

546:

547: Now to solve for $\sigma ^{2}$. Substitute (42) into (41), and simplify by

548: exploiting the facts that $tr(AB)=tr(BA)$\ and $tr(C+D)=tr(C)+tr(D)$\ and $%

549: tr(\alpha C)=\alpha tr(C)$\ where $A,B$ are matrices, $C,D$ are square

550: matrices, and $\alpha $\ is a scalar.

551:

552: \bigskip

553:

554: \begin{eqnarray}

555: 0 &=&tr\{(R_{\theta }^{-1}-R^{-1})W\}  \notag \\

556: &=&tr\left\{ \left( \frac{1}{\sigma ^{2}}W^{-1}-\frac{1}{\sigma ^{2}}W^{-1}%

557: \tilde{V}\left[ \tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}+I\right]

558: ^{-1}\tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}-R^{-1}\right) W\right\}  \notag

559: \\

560: &=&tr\left\{ \frac{1}{\sigma ^{2}}I\right\} -\frac{1}{\sigma ^{2}}tr\left\{ %

561: \left[ \tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}+I\right] ^{-1}\left[

562: \tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}\right] \right\} -tr\left\{

563: R^{-1}W\right\}  \notag \\

564: &=&\frac{N}{\sigma ^{2}}-\frac{1}{\sigma ^{2}}\sum_{i=1}^{P}\frac{\phi _{i}}{%

565: \phi _{i}+\sigma ^{2}}-tr\{R^{-1}W\}  \notag \\

566: &=&\frac{N-P_{0}}{\sigma ^{2}}+\sum_{i=1}^{P_{0}}\frac{1}{\lambda _{i}}%

567: -tr\{WR^{-1}\}\text{ }

568: \end{eqnarray}%

569: where we used (45) in the fourth line, and (47) in the fifth. This can be

570: further simplified by noticing that if \underline{$\tilde{v}$}$_{i}$\ is any

571: generalized eigenvector solution to (46), then:%

572: \begin{equation}

573: WR^{-1}\underline{\tilde{v}}_{i}=W(\frac{1}{\lambda _{i}}W^{-1}\underline{%

574: \tilde{v}}_{i})=\frac{1}{\lambda _{i}}\underline{\tilde{v}}_{i}

575: \end{equation}%

576: Therefore, the \underline{$\tilde{v}$}$_{i}$ are eigenvectors of $WR^{-1}$\

577: with eigenvalues $1/\lambda _{i}$. Thus:%

578: \begin{equation}

579: tr\{WR^{-1}\}=\sum_{i=1}^{N}\frac{1}{\lambda _{i}}

580: \end{equation}%

581: Substituting back into (49), then solving for $\sigma ^{2}$\ gives:%

582: \begin{equation}

583: \sigma ^{2}=\frac{N-P_{0}}{\sum\limits_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}}

584: \end{equation}
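As a numerical check (hypothetical eigenvalues, for illustration only), take $N=4$, $P_{0}=1$ and $\lambda _{1},...,\lambda _{4}=6,3,2,1$. Then $\sum_{i=2}^{4}1/\lambda _{i}=11/6$, so $\sigma ^{2}=3/(11/6)=18/11$, and the stationarity condition for $\sigma ^{2}$ derived above is indeed satisfied:
\begin{equation*}
\frac{N-P_{0}}{\sigma ^{2}}+\frac{1}{\lambda _{1}}-\sum_{i=1}^{4}\frac{1}{\lambda _{i}}=\frac{11}{6}+\frac{1}{6}-2=0
\end{equation*}%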

585:

586: Now substitute the solution for $\tilde{V}$\ and for $\sigma ^{2}$\ into

587: (48), and then substitute this back into the formula (6) for the

588: cross-entropy. The algebra is simplified by noting that if \underline{$%

589: \tilde{v}$}$_{i}$\ is any generalized eigenvector solution to (46), then:%

590: \begin{eqnarray}

591: R_{\theta }R^{-1}\underline{\tilde{v}}_{i} &=&(\tilde{V}\tilde{V}^{H}+\sigma

592: ^{2}W)\left( \frac{1}{\lambda _{i}}W^{-1}\underline{\tilde{v}}_{i}\right)

593: \notag \\

594: &=&\frac{1}{\lambda _{i}}(\tilde{V}\tilde{V}^{H}W^{-1}\underline{\tilde{v}}%

595: _{i}+\sigma ^{2}\underline{\tilde{v}}_{i})  \notag \\

596: &=&\left\{

597: \begin{array}{cl}

598: \frac{1}{\lambda _{i}}(\phi _{i}+\sigma ^{2})\underline{\tilde{v}}_{i} &

599: \text{for }i=1,....,P_{0} \\

600: \frac{1}{\lambda _{i}}\sigma ^{2}\underline{\tilde{v}}_{i} & \text{for }i%

601: \text{ }=P_{0}+1,...,N%

602: \end{array}%

603: \right.  \notag \\

604: &=&\left\{

605: \begin{array}{cl}

606: \underline{\tilde{v}}_{i} & \text{for }i=1,....,P_{0} \\

607: \frac{\sigma ^{2}}{\lambda _{i}}\underline{\tilde{v}}_{i} & \text{for }%

608: i=P_{0}+1,...,N%

609: \end{array}%

610: \right.

611: \end{eqnarray}%

612: Therefore, the \underline{$\tilde{v}$}$_{i}$ are all eigenvectors of $%

613: R_{\theta }R^{-1}$. The first $P_{0}\ $eigenvalues are equal to 1, and the

614: remainder are equal to $\sigma ^{2}/\lambda _{P_{0}+1},...,\sigma

615: ^{2}/\lambda _{N}$. Putting all this together, the cross-entropy at this

616: solution has the value:%

617: \begin{eqnarray}

618: H(q_{\theta },p) &=&\xi \left\{ tr\{R_{\theta }R^{-1}\}-N-\log \left\vert

619: R_{\theta }R^{-1}\right\vert \right\}  \notag \\

620: &=&\xi \left\{ P_{0}+\sigma ^{2}\sum_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}%

621: -N-\log \prod_{i=P_{0}+1}^{N}\text{\ \ }\frac{\sigma ^{2}}{\lambda _{i}}%

622: \right\}  \notag \\

623: &=&\xi \sum_{i=P_{0}+1}^{N}\log \left( \frac{\lambda _{i}}{\sigma ^{2}}%

624: \right)

625: \end{eqnarray}%

626: Substituting the value of $\sigma ^{2}$\ from (52) gives the alternate form:%

627: \begin{equation}

628: H(q_{\theta },p)=\xi (N-P_{0})\log \left[ \frac{\frac{1}{N-P_{0}}%

629: \sum\limits_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}}{\left(

630: \dprod\limits_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}\right) ^{1/(N-P_{0})}}%

631: \right]

632: \end{equation}
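As an illustrative check of the two forms of the cross-entropy (the numbers here are our own, chosen only for convenience, and are not taken from the experiments in the text), take $N=3$, $P_{0}=1$, and noise eigenvalues $\lambda _{2}=2$, $\lambda _{3}=1$. Then (52) gives
\begin{equation*}
\sigma ^{2}=\frac{N-P_{0}}{\frac{1}{\lambda _{2}}+\frac{1}{\lambda _{3}}}=\frac{2}{\frac{1}{2}+1}=\frac{4}{3},
\end{equation*}
so that (54) evaluates to
\begin{equation*}
H(q_{\theta },p)=\xi \left[ \log \frac{2}{4/3}+\log \frac{1}{4/3}\right] =\xi \log \frac{9}{8},
\end{equation*}
while (55) gives the same value:
\begin{equation*}
H(q_{\theta },p)=2\xi \log \left[ \frac{\frac{1}{2}\left( \frac{1}{2}+1\right) }{\left( \frac{1}{2}\cdot 1\right) ^{1/2}}\right] =2\xi \log \frac{3\sqrt{2}}{4}=\xi \log \frac{9}{8}.
\end{equation*}
The bracket in (55) is the ratio of the arithmetic mean to the geometric mean of the $1/\lambda _{i}$, so this form also makes it clear that $H(q_{\theta },p)\geq 0$, with equality only when all the noise eigenvalues are equal.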

633:

We now return to the question of which of the $N$ possible generalized
eigenvector solutions should be used for the $P_{0}$ non-zero columns of $\tilde{V}$.
Let us call the selected $P_{0}$ eigenvectors $\underline{\tilde{v}}_{1},\ldots ,\underline{\tilde{v}}_{P_{0}}$ the ``signal eigenvectors'', and
the remainder the ``noise eigenvectors''. The signal eigenvectors
satisfy $\underline{\tilde{v}}_{i}\neq 0$; since $W^{-1}>0$, we have $\phi _{i}=\underline{\tilde{v}}_{i}^{H}W^{-1}\underline{\tilde{v}}_{i}>0$ and thus
$\lambda _{i}=\phi _{i}+\sigma ^{2}>\sigma ^{2}$ for $i=1,\ldots ,P_{0}$. We
show that these signal eigenvalues must be the largest eigenvalue solutions
to (46). Suppose this were not true, so that the global optimum solution
corresponded to an $R_{\theta }$ in which one of the signal eigenvalues,
say $\lambda _{P_{0}}$, was smaller than the largest of the noise
eigenvalues, say $\lambda _{P_{0}+1}$. Thus $\sigma ^{2}<\lambda _{P_{0}}<\lambda _{P_{0}+1}$.
But then, as we will see, swapping these
eigensolutions, so that $\underline{\tilde{v}}_{P_{0}+1}$ becomes a signal
eigenvector and $\underline{\tilde{v}}_{P_{0}}$ a noise eigenvector, will
further decrease the cross-entropy, contradicting our assumption of global
optimality. To show this, we regard the cross-entropy as a function of the
noise eigenvalues. Let $H\left( \lambda _{P_{0}+1},\lambda _{P_{0}+2},\ldots ,\lambda _{N}\right) $ represent the cross-entropy with a
model $R_{\theta }$ built using the non-zero solutions $\underline{\tilde{v}}_{1},\ldots ,\underline{\tilde{v}}_{P_{0}-1},\underline{\tilde{v}}_{P_{0}}$, and
let $H\left( \lambda _{P_{0}},\lambda _{P_{0}+2},\ldots ,\lambda _{N}\right) $
represent the cross-entropy with a model $R_{\theta }$ built using the non-zero
solutions $\underline{\tilde{v}}_{1},\ldots ,\underline{\tilde{v}}_{P_{0}-1},\underline{\tilde{v}}_{P_{0}+1}$.
Then because the cross-entropy formula
(55) is an analytic function of the $\lambda _{i}$, by the mean value
theorem:%

\begin{eqnarray}
&&H(\lambda _{P_{0}+1},\lambda _{P_{0}+2},\ldots ,\lambda _{N})-H(\lambda _{P_{0}},\lambda _{P_{0}+2},\ldots ,\lambda _{N}) \notag \\
&=&\left. \frac{\partial H}{\partial \lambda }(\lambda ,\lambda _{P_{0}+2},\ldots ,\lambda _{N})\right\vert _{\lambda =\bar{\lambda}}(\lambda _{P_{0}+1}-\lambda _{P_{0}})
\end{eqnarray}%

668: where $\bar{\lambda}$\ is some value in the range $\lambda _{P_{0}}<\bar{%

669: \lambda}<\lambda _{P_{0}+1}$. But:%

670: \begin{eqnarray}

671: \frac{\partial H}{\partial \lambda } &=&\xi \frac{1}{\lambda ^{2}}\left(

672: \lambda -\frac{N-P_{0}}{\frac{1}{\lambda }+\sum\limits_{i=P_{0}+2}^{N}\frac{1%

673: }{\lambda _{i}}}\right)  \notag \\

674: &>&0

675: \end{eqnarray}%

676: for all $\lambda _{P_{0}}<\lambda <\lambda _{P_{0}+1}$, where the last line

677: is true because:%

678: \begin{eqnarray}

679: \lambda &>&\lambda _{P_{0}}  \notag \\

680: &>&\sigma ^{2}  \notag \\

681: &=&\frac{N-P_{0}}{\sum_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}}  \notag \\

682: &>&\frac{N-P_{0}}{\frac{1}{\lambda }+\sum_{i=P_{0}+2}^{N}\frac{1}{\lambda

683: _{i}}}

684: \end{eqnarray}%

Since $\lambda _{P_{0}+1}-\lambda _{P_{0}}>0$, the difference in (56) must be
positive. Therefore, swapping $\underline{\tilde{v}}_{P_{0}}$ and
$\underline{\tilde{v}}_{P_{0}+1}$ reduces the cross-entropy, and our assumed
global optimum cannot be globally optimal. The $P_{0}$ signal
eigenvalues must therefore be the largest eigenvalue solutions to (46), and
the $P_{0}$ non-zero columns of $\tilde{V}$ must be the corresponding
generalized eigenvectors.
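
To make the derivative used in this argument explicit (a sketch of the intermediate step, in the notation of (55)), let the noise eigenvalues be $\lambda ,\lambda _{P_{0}+2},\ldots ,\lambda _{N}$. Then (55) can be written as
\begin{equation*}
H=\xi \left\{ (N-P_{0})\log \left[ \frac{1}{N-P_{0}}\left( \frac{1}{\lambda }+\sum_{i=P_{0}+2}^{N}\frac{1}{\lambda _{i}}\right) \right] +\log \lambda +\sum_{i=P_{0}+2}^{N}\log \lambda _{i}\right\} ,
\end{equation*}
and differentiating with respect to $\lambda $ gives
\begin{equation*}
\frac{\partial H}{\partial \lambda }=\xi \left\{ \frac{1}{\lambda }-\frac{1}{\lambda ^{2}}\,\frac{N-P_{0}}{\frac{1}{\lambda }+\sum_{i=P_{0}+2}^{N}\frac{1}{\lambda _{i}}}\right\} ,
\end{equation*}
which is the expression in (57). As a small numerical illustration (the eigenvalues are our own hypothetical choice), take $N=3$, $P_{0}=1$ and eigenvalues $4,2,1$. Choosing $\lambda =4$ as the signal eigenvalue leaves noise eigenvalues $2,1$, so $\sigma ^{2}=4/3$ and $H=\xi \log (9/8)\approx 0.118\,\xi $. Choosing $\lambda =2$ instead leaves noise eigenvalues $4,1$, so $\sigma ^{2}=2/(\frac{1}{4}+1)=8/5$ and
\begin{equation*}
H=\xi \left[ \log \frac{4}{8/5}+\log \frac{1}{8/5}\right] =\xi \log \frac{25}{16}\approx 0.446\,\xi .
\end{equation*}
In both cases the signal eigenvalue exceeds $\sigma ^{2}$, as required, and the larger signal eigenvalue gives the smaller cross-entropy, in agreement with the argument above.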

692:

693: Finally, we must show that we should always choose $P_{0}=P$\ eigenvectors.

694: Without loss of generality, let us sort all the eigenvalues $\lambda

695: _{1}\geq \lambda _{2}\geq ...\geq \lambda _{N}$. Let $H_{i}$\ represent the

696: minimum cross-entropy with $i$ non-zero columns in $\tilde{V}$. Then using

697: (55):%

698: \begin{eqnarray}

699: H_{P_{0}}-H_{P_{0}+1} &=&\xi (N-P_{0})\log \left[ \frac{\frac{1}{(N-P_{0})}%

700: \frac{1}{\lambda _{P_{0}+1}}+\left( 1-\frac{1}{(N-P_{0})}\right) \frac{1}{%

701: \bar{\lambda}}}{\left( \frac{1}{\lambda _{P_{0}+1}}\right) ^{\frac{1}{\left(

702: N-P_{0}\right) }}\left( \frac{1}{\bar{\lambda}}\right) ^{1-\frac{1}{(N-P_{0})%

703: }}}\right]  \notag \\

704: &\geq &0

705: \end{eqnarray}%

where $\frac{1}{\bar{\lambda}}=\frac{1}{N-P_{0}-1}\sum_{i=P_{0}+2}^{N}\frac{1}{\lambda _{i}}$
(this $\bar{\lambda}$ is not the mean-value point used earlier) and where we used the inequality $\rho \alpha
+(1-\rho )\beta \geq \alpha ^{\rho }\beta ^{(1-\rho )}$, valid for any $\alpha ,\beta >0$ and $0\leq \rho
\leq 1$, in the last line. Thus the cross-entropy does not increase as $P_{0}$
increases from $0$ to $P$, so the best choice for $P_{0}$ must be $P_{0}=P$.
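
As a brief illustration of this monotonicity (with hypothetical eigenvalues $\lambda _{1}=4$, $\lambda _{2}=2$, $\lambda _{3}=1$ and $N=3$, chosen by us only for convenience), formula (55) gives
\begin{eqnarray*}
H_{0} &=&3\xi \log \left[ \frac{\frac{1}{3}\left( \frac{1}{4}+\frac{1}{2}+1\right) }{\left( \frac{1}{4}\cdot \frac{1}{2}\cdot 1\right) ^{1/3}}\right] =3\xi \log \frac{7}{6}\approx 0.462\,\xi , \\
H_{1} &=&2\xi \log \left[ \frac{\frac{1}{2}\left( \frac{1}{2}+1\right) }{\left( \frac{1}{2}\cdot 1\right) ^{1/2}}\right] =\xi \log \frac{9}{8}\approx 0.118\,\xi , \\
H_{2} &=&\xi \log \left[ \frac{1}{1}\right] =0,
\end{eqnarray*}
so that $H_{0}\geq H_{1}\geq H_{2}$. With a single noise eigenvalue the arithmetic and geometric means coincide, which is why $H_{2}$ vanishes in this example.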

711:

The proof that $R_{\theta }$ is unique when $\lambda _{P}>\lambda _{P+1}$
is messy but straightforward. The key point is that the space spanned by the
signal eigenvectors is uniquely determined. If some of the signal
eigenvalues are repeated, then the eigenvectors themselves may not be uniquely
determined, and thus $\tilde{V}$ may not be uniquely determined.

717:

We get the formulas in the text by defining $U=\tilde{V}\Phi ^{-1/2}$.

719:

720: \bigskip \newpage

721:

722: \noindent {\Large B \ \ \ Derivation of RCE Algorithm}

723:

In this appendix we give the solution to the RCE problem. The derivation is
quite similar to that for the CE problem, so we present it only
briefly. With our Gaussian models, the RCE cross-entropy has the value:%

727: \begin{equation}

728: \text{RCE: \ }H(p,q_{\theta })=\xi \left\{ tr(R_{\theta }^{-1}R)-N-\log

729: \left\vert R_{\theta }^{-1}R\right\vert \right\}

730: \end{equation}%

Differentiating with respect to the real and imaginary parts of $V$ and
setting the derivatives to zero, as before, gives:%

733: \begin{equation}

734: \left( R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1}\right) V=0

735: \end{equation}%

Multiplying both sides on the left by $R^{-1}R_{\theta }$ gives:%

737: \begin{equation}

738: (R_{\theta }^{-1}-R^{-1})V=0

739: \end{equation}%

which is exactly the equation that the solution for $V$ in the CE
problem must satisfy, (40). Therefore, we can construct $R_{\theta }$ from
(48), where the columns of $\tilde{V}$ must be solutions to the generalized
eigenvector problem (46).
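
For completeness, the multiplication step can be written out as follows (a short sketch, assuming only that $R$ and $R_{\theta }$ are invertible, as is already implicit in the use of their inverses):
\begin{eqnarray*}
R^{-1}R_{\theta }\left( R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1}\right) V &=&\left( R^{-1}R_{\theta }R_{\theta }^{-1}RR_{\theta }^{-1}-R^{-1}R_{\theta }R_{\theta }^{-1}\right) V \\
&=&\left( R_{\theta }^{-1}-R^{-1}\right) V.
\end{eqnarray*}
Because $R^{-1}R_{\theta }$ is invertible, the two conditions on $V$ are equivalent.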

744:

Now differentiating (60) with respect to $\sigma ^{2}$ and setting the derivative to
zero gives:%

747: \begin{equation}

748: tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})W\}=0

749: \end{equation}%
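A sketch of the intermediate step, assuming the model form $R_{\theta }=\tilde{V}\tilde{V}^{H}+\sigma ^{2}W$ used elsewhere in this appendix: since $\partial R_{\theta }/\partial \sigma ^{2}=W$, the standard matrix identities give
\begin{equation*}
\frac{\partial R_{\theta }^{-1}}{\partial \sigma ^{2}}=-R_{\theta }^{-1}WR_{\theta }^{-1},\qquad \frac{\partial }{\partial \sigma ^{2}}\log \left\vert R_{\theta }\right\vert =tr\{R_{\theta }^{-1}W\}.
\end{equation*}
Writing $\log \left\vert R_{\theta }^{-1}R\right\vert =\log \left\vert R\right\vert -\log \left\vert R_{\theta }\right\vert $ and using the cyclic property of the trace,
\begin{eqnarray*}
\frac{\partial H(p,q_{\theta })}{\partial \sigma ^{2}} &=&\xi \left\{ -tr\{R_{\theta }^{-1}WR_{\theta }^{-1}R\}+tr\{R_{\theta }^{-1}W\}\right\} \\
&=&-\xi \,tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})W\},
\end{eqnarray*}
which vanishes exactly when the trace condition just stated holds.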

750: Combining this with (61) gives:%

751: \begin{eqnarray}

752: 0 &=&tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})(\tilde{V}%

753: \tilde{V}^{H}+\sigma ^{2}W)\}  \notag \\

754: &=&tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})R_{\theta }\}

755: \notag \\

756: &=&tr\{R_{\theta }^{-1}R-I\}

757: \end{eqnarray}%

758: which implies that:%

759: \begin{equation}

tr\{R_{\theta }^{-1}R\}=N

761: \end{equation}%

But $R_{\theta }^{-1}R$ is similar to $(R_{\theta }R^{-1})^{-1}$, so (53) implies that $R_{\theta }^{-1}R$ has $P_{0}$ eigenvalues equal to
$1$, and the rest have values $\lambda _{P_{0}+1}/\sigma ^{2},\ldots ,\lambda _{N}/\sigma ^{2}$.
Since the trace of a matrix is just the sum of its
eigenvalues:%

766: \begin{equation}

767: P_{0}+\frac{1}{\sigma ^{2}}\sum_{i=P_{0}+1}^{N}\lambda _{i}=N

768: \end{equation}%

769: which gives:%

770: \begin{equation}

771: \sigma ^{2}=\frac{1}{(N-P_{0})}\sum_{i=P_{0}+1}^{N}\lambda _{i}

772: \end{equation}%

Since the trace term equals $N$ and
the determinant is the product of the eigenvalues:%

775: \begin{eqnarray}

776: H(p,q_{\theta }) &=&\xi \{tr\{R_{\theta }^{-1}R\}-N-\log \left\vert

777: R_{\theta }^{-1}R\right\vert \}  \notag \\

778: &=&\xi \sum_{i=P_{0}+1}^{N}\log \frac{\sigma ^{2}}{\lambda _{i}}

779: \end{eqnarray}%
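By analogy with the alternate CE form (55), substituting the value of $\sigma ^{2}$ just obtained into the last expression gives (this restatement is ours, not a formula quoted from the text):
\begin{equation*}
H(p,q_{\theta })=\xi (N-P_{0})\log \left[ \frac{\frac{1}{N-P_{0}}\sum\limits_{i=P_{0}+1}^{N}\lambda _{i}}{\left( \prod\limits_{i=P_{0}+1}^{N}\lambda _{i}\right) ^{1/(N-P_{0})}}\right] ,
\end{equation*}
that is, the logarithm of the ratio of the arithmetic mean to the geometric mean of the noise eigenvalues themselves, rather than of their reciprocals. For instance, with hypothetical values $N=4$, $P_{0}=1$ and noise eigenvalues $3,2,1$, we get $\sigma ^{2}=2$ and
\begin{equation*}
H(p,q_{\theta })=\xi \left[ \log \frac{2}{3}+\log \frac{2}{2}+\log \frac{2}{1}\right] =\xi \log \frac{4}{3}\approx 0.288\,\xi ,
\end{equation*}
which agrees with $3\xi \log \left( 2/6^{1/3}\right) $ from the mean-ratio form.
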

The proofs that we must choose $\lambda _{1},\ldots ,\lambda _{P_{0}}$ to be
the largest eigenvalues, that we should choose $P_{0}=P$, and that the
solution $R_{\theta }$ is unique if $\lambda _{P}>\lambda _{P+1}$, are
similar to the proofs for the CE algorithm.

784:

785: \bigskip

786:

\begin{thebibliography}{99}
\bibitem{Schmidt} Schmidt, R. (1978). Multiple emitter location and signal
parameter estimation. \textit{Proc. RADC Spectral Estimation Workshop},
243--258, Rome, NY.

\bibitem{Wax} Wax, M. and Kailath, T. (1985). Detection of signals by
information theoretic criteria. \textit{IEEE Trans. Acoustics, Speech,
and Sig. Proc.} \textbf{ASSP-33}, 387--392.

\bibitem{NA84} A., N. Q. (1984). On the uniqueness of the maximum likelihood
estimate of structured covariance matrices. \textit{IEEE Trans. Acoustics,
Speech, and Sig. Proc.} \textbf{ASSP-32}, 1249--1251.

\bibitem{Burg82} Burg, J., Luenberger, D., and Wenger, D. (1982). Estimation
of structured covariance matrices. \textit{Proc. IEEE} \textbf{70},
963--974.

\bibitem{Gray87} Gray, D., Anderson, B., and Sim, P. (1987). Estimation of
structured covariances with application to array beamforming.
\textit{Circuits, Systems, and Signal Proc.} \textbf{6-4}, 421--447.

\bibitem{Kullback59} Kullback, S. (1959). \textit{Information Theory and
Statistics}. New York: John Wiley \& Sons.

\bibitem{Shore80} Shore, J. and Johnson, R. (1980). Axiomatic derivation of
the principle of maximum entropy and the principle of minimum
cross-entropy. \textit{IEEE Trans. Info. Theory} \textbf{IT-26}, 26--37.

\bibitem{Liou90} Liou, C.-Y. and Musicus, B. (1990). A separable
cross-entropy approach to power spectral estimation. \textit{IEEE Trans.
Acoustics, Speech, and Sig. Proc.} \textbf{ASSP-38}, 105--113.

\bibitem{Musicus82} Musicus, B. (1982). \textit{Iterative Algorithms for
Optimal Signal Reconstruction and Parameter Identification Given Noisy and
Incomplete Data}. Ph.D. thesis, Dept. of Elec. Engg. and Comp. Sci.,
Massachusetts Institute of Technology.

\bibitem{Shore81} Shore, J. (1981). Minimum cross-entropy spectral analysis.
\textit{IEEE Trans. Acoustics, Speech, and Sig. Proc.} \textbf{ASSP-29}, 230--236.
\end{thebibliography}

828:

829: \bigskip

830:

831: \bigskip

832:

\noindent $^{1}$Department of Computer Science and Information Engineering,
National Taiwan University, Taipei, Taiwan, 106, Republic of China, Tel.: 886
2 23625336 ext. 515, Fax: 886 2 23628167, Email:
cyliou@csie.ntu.edu.tw

\noindent $^{2}$Formerly with the Research Laboratory of Electronics,
Massachusetts Institute of Technology, Cambridge, MA 02139.

840:

841: \bigskip

842:

843: \end{document}

844: