1: %\headheight=.2cm
2:
3:
4: \documentclass[12pt]{book}
5: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
6: \usepackage{amsmath}
7: \usepackage{array}
8: \usepackage[doublespacing]{setspace}
9:
10: \setcounter{MaxMatrixCols}{10}
11: %TCIDATA{OutputFilter=LATEX.DLL}
12: %TCIDATA{Version=5.00.0.2606}
13: %TCIDATA{<META NAME="SaveForMode" CONTENT="1">}
14: %TCIDATA{BibliographyScheme=Manual}
15: %TCIDATA{LastRevised=Sunday, July 30, 2006 10:47:37}
16: %TCIDATA{<META NAME="GraphicsSave" CONTENT="32">}
17: %TCIDATA{Language=American English}
18:
19: \textwidth=31.9pc
20: \textheight=46.5pc
21: \oddsidemargin=1pc
22: \evensidemargin=1pc
23: \headsep=15pt
24: \topmargin=.6cm
25: \parindent=1.7pc
26: \parskip=0pt
27: \setcounter{page}{1}
28: \input{tcilatex}
29: \renewcommand{\baselinestretch}{2}
30:
31: \begin{document}
32:
33:
34: %\pagestyle{fancy}
35: \renewcommand{\baselinestretch}{1.2} %\lhead[\fancyplain{} \leftmark]{}
36: %\chead[]{}
37: %\rhead[]{\fancyplain{}\rightmark}
38: %\cfoot{}
39: %\headrulewidth=0pt
40: \markright{
41: }
42: \markboth{\hfill{\footnotesize\rm Cheng-Yuan Liou and Bruce R. Musicus
43: }\hfill}
44: {\hfill {\footnotesize\rm Cross Entropy Approx of Structured Covariance Matrices} \hfill}
45: \renewcommand{\thefootnote}{} $\ $
46:
47: \fontsize{10.95}{14pt plus.8pt minus .6pt}\selectfont\vspace{0.812pc} %
48: \centerline{\large\bf Cross Entropy Approximation of Structured} \vspace{2pt}
49: \centerline{\large\bf Covariance Matrices} \vspace{0.4cm} %
50: \centerline{Cheng-Yuan Liou$^{1}$ and Bruce R. Musicus$^{2}$} \vspace{0.4cm} %
51: \centerline{\it } \vspace{0.55cm} \fontsize{9}{11.5pt plus.8pt minus .6pt}%
52: \selectfont
53:
54: \begin{quotation}
55: \noindent \textit{Abstract:} \ We apply two variations of the principle of
56: Minimum Cross Entropy (the Kullback information measure) to fit
57: parameterized probability density models to observed data densities. For an
58: array beamforming problem with $P$ incident narrowband point sources, $N>P$
59: sensors, and colored noise, both approaches yield eigenvector fitting
60: methods similar to that of the MUSIC algorithm[1]. Furthermore, the
61: corresponding cross-entropies are related to the MDL model order selection
62: criterion[2].
63:
64: \vspace{9pt} \noindent \textit{Key words and phrases:} Array Beamforming,
65: Eigenvector methods, Kullback Information Measure, Minimum Cross Entropy,
66: Stochastic Estimation, Structured Covariance
67: \end{quotation}
68:
69: \fontsize{10.95}{14pt plus.8pt minus .6pt}\selectfont\noindent \textbf{1.
70: Introduction}
71:
72: \bigskip Many existing high resolution methods for spectral analysis and for
73: optimal beamforming utilize covariance matrices estimated from observed
74: data. Often, an underlying structure for the covariance matrix is known in
75: advance, and our goal is to estimate the covariance matrix with this
76: structure which best fits the observed data. Previous literature has
77: suggested a variety of methods of optimally estimating structured covariance
78: matrices from data[3,4,5]. In this paper, we will apply the minimum cross
79: entropy (CE)[6,7] and minimum reverse cross-entropy (RCE)[6] principles to
80: estimate the covariance matrix. These principles have proved to be quite
81: powerful in a wide variety of signal processing applications[8,9] and have
been justified as being ``optimal'' under suitable assumptions. In section 2,
83: we apply the CE and RCE procedures to the problem of estimating structured
84: covariance matrices, and in section 3 we demonstrate the utility of the idea
85: for a beamforming application.
86:
88:
89: \noindent \textbf{2. Problem Statement}
90:
Let \underline{$x$} be an $N$-dimensional real or complex random vector.
Assume that a Gaussian probability density for \underline{$x$} is either
known a priori, or has been estimated by some procedure from observed
data:\bigskip
95: \begin{equation}
96: p(\underline{x})=N(\underline{m},R)
97: \end{equation}%
where $\underline{m}$ is the expected value of \underline{$x$}, $R$ is
the covariance matrix, $R=E[(\underline{x}-\underline{m})(\underline{x}-\underline{m})^{H}]$,
and $\underline{x}^{H}$ is the Hermitian (complex conjugate transpose) of \underline{$x$}.
101: Suppose we wish to approximate this $p(\underline{x})$ with a parameterized
102: probability density function (PDF):%
103: \begin{equation}
104: q_{\theta }(\underline{x})=N(\underline{m}_{\theta },R_{\theta })
105: \end{equation}%
where \underline{$\theta $} denotes the vector of unknown parameters in the
model $q_{\theta }(\underline{x})$ which are to be estimated. Conceptually,
108: we wish to choose $\theta $ to make $q_{\theta }(\underline{x})$\ optimally
109: match $p(\underline{x})$. An appropriate objective function is the Kullback
110: information measure[6], otherwise known as the Minimum Cross-Entropy
111: principle[7]. Because this measure is asymmetric, we can apply it in two
different ways to this problem. Following [8,9,10] we call these the
``Cross-Entropy'' and ``Reverse Cross-Entropy'' methods:%
114: \begin{eqnarray}
115: \text{CE} &\text{:}&\ \ \hat{q}_{\theta }\leftarrow \min\limits_{\underline{%
116: \theta }}H(q_{\theta },p) \\
117: \text{RCE} &\text{:}&\ \ \hat{q}_{\theta }\leftarrow \min\limits_{\underline{%
118: \theta }}H(p,q_{\theta })
119: \end{eqnarray}%
120: where:%
121: \begin{equation}
122: H(p_{1},p_{2})=\dint p_{1}(\underline{x})\log \frac{p_{1}(\underline{x})}{%
123: p_{2}(\underline{x})}d\underline{x}
124: \end{equation}%
125: Kullback[6] has argued that $H(p_{1},p_{2})$\ measures the mean amount of
126: information for discriminating in favor of the hypothesis that $p_{1}$ is
127: the correct density of \underline{$x$}\ rather than $p_{2}$. Shore and
128: Johnson[7] have argued that minimizing $H(p_{1},p_{2})$ over $p_{1}$ is the
only consistent procedure for estimating a PDF given a prior
130: density estimate $p_{2}(\underline{x})$\ combined with new structural
131: information about the density, such as one or more of its moments. The
132: measure $H(p_{1},p_{2})$ has several pleasing mathematical properties: it is
133: convex in $p_{1}$, and convex in $p_{2}$, and attains its minimum value of
134: zero when $p_{1}(\underline{x})=p_{2}(\underline{x})$\ almost everywhere.
135: Another useful property is that estimating \underline{$\theta $} from either
136: (3) or (4) is straightforward. Substitute (1) and (2) into the CE and RCE
137: formulas to obtain:%
138: \begin{eqnarray*}
139: \text{CE} &\text{:}&H(q_{\theta },p)=\xi \{tr(R^{-1}R_{\theta })-N-\log
140: \left\vert R^{-1}R_{\theta }\right\vert +(\underline{m}_{\theta }-\underline{%
141: m})^{H}R^{-1}(\underline{m}_{\theta }-\underline{m})\} \\
142: \text{RCE} &\text{:}&H(p,q_{\theta })=\xi \{tr(R_{\theta }^{-1}R)-N-\log
143: \left\vert R_{\theta }^{-1}R\right\vert +(\underline{m}_{\theta }-\underline{%
144: m})^{H}R_{\theta }^{-1}(\underline{m}_{\theta }-\underline{m})\}
145: \end{eqnarray*}%
146: where $\xi =1/2$ when \underline{$x$}\ is real and $\xi =1$ when \underline{$%
147: x$}\ is complex. \
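
The two expressions above are easy to evaluate numerically. The following is a minimal sketch, not part of the original derivation, assuming NumPy is available; the function name \texttt{gaussian\_cross\_entropies} and its argument names are hypothetical choices made for illustration.

\begin{verbatim}
```python
import numpy as np

def gaussian_cross_entropies(m, R, m_theta, R_theta, xi=1.0):
    """Return (CE, RCE) = (H(q_theta, p), H(p, q_theta)) for two Gaussians.

    p = N(m, R) and q_theta = N(m_theta, R_theta).
    Use xi = 0.5 for real data and xi = 1.0 for complex data.
    """
    N = R.shape[0]
    d = m_theta - m  # mean mismatch

    def divergence(A, B):
        # xi * { tr(A^-1 B) - N - log|A^-1 B| + d^H A^-1 d }
        M = np.linalg.solve(A, B)
        _, logdet = np.linalg.slogdet(M)
        quad = np.vdot(d, np.linalg.solve(A, d)).real
        return xi * (np.trace(M).real - N - logdet + quad)

    ce = divergence(R, R_theta)   # H(q_theta, p): weighted by R^-1
    rce = divergence(R_theta, R)  # H(p, q_theta): weighted by R_theta^-1
    return ce, rce
```
\end{verbatim}

Both values vanish when the two densities coincide and are positive otherwise; in general they differ from one another, which reflects the asymmetry of the measure.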
148:
149: To simplify the remainder of the discussion, assume that the mean is known,
150: \underline{$m$}$_{\theta }=$\underline{$m$}, so that we can focus on the
151: estimation of the covariance matrix and compare the results with those by
Burg and Gray [4] and Gray, Anderson, and Sim [5]. The two estimation problems
153: reduce to minimizing:%
154: \begin{eqnarray}
155: \text{CE} &\text{:}&\ \ \ \ H(q_{\theta },p)=\xi \left\{ tr(R^{-1}R_{\theta
156: })-N-\log \left\vert R^{-1}R_{\theta }\right\vert \right\} \\
157: \text{RCE} &\text{:}&\ \ \ \ H(p,q_{\theta })=\xi \{tr(R_{\theta
158: }^{-1}R)-N-\log \left\vert R_{\theta }^{-1}R\right\vert \}
159: \end{eqnarray}
160:
161: Setting the gradients of the above two objective functions with respect to
162: \underline{$\theta $}\ to zero, we obtain the necessary conditions that
163: \underline{$\widehat{\theta }$} be the optimal solution:%
164: \begin{eqnarray}
165: \text{CE}\text{: } &&tr\left. \left\{ (R^{-1}-R_{\theta }^{-1})\frac{%
166: \partial R_{\theta }}{\partial \theta _{i}}\right\} \right\vert _{\underline{%
167: \theta }=\underline{\widehat{\theta }}}=0 \\
168: \text{RCE}\text{: } &&tr\left. \left\{ (R-R_{\theta })\frac{\partial
169: R_{\theta }^{-1}}{\partial \theta _{i}}\right\} \right\vert _{\underline{%
170: \theta }=\underline{\widehat{\theta }}}=0
171: \end{eqnarray}%
172: \bigskip for all $i$, where $\theta _{i}$ is the $i^{th}$ element of
173: \underline{$\theta $}. When $R_{\theta }$\ is invertible and differentiable
174: in \underline{$\theta $}:%
175: \begin{equation}
176: \frac{\partial R_{\theta }^{-1}}{\partial \theta _{i}}=-R_{\theta }^{-1}%
177: \frac{\partial R_{\theta }}{\partial \theta _{i}}R_{\theta }^{-1}
178: \end{equation}%
179: Substituting this into the RCE formula gives an alternate set of necessary
180: conditions for the optimal RCE solution:%
181: \begin{equation}
182: \text{RCE:}\QTR{sl}{\ \ }tr\left. \left\{ (R_{\theta }^{-1}RR_{\theta
183: }^{-1}-R_{\theta }^{-1})\frac{\partial R_{\theta }}{\partial \theta _{i}}%
184: \right\} \right\vert _{\underline{\theta }=\underline{\widehat{\theta }}}=0
185: \end{equation}
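
As a simple illustration of how these conditions are used (a worked special case added for clarity, not part of the original derivation), suppose the model has a single scale parameter, $R_{\theta }=\theta W$, with $W$ a known positive definite matrix and $\theta >0$. Then $\partial R_{\theta }/\partial \theta =W$ and $R_{\theta }^{-1}=\theta ^{-1}W^{-1}$, and conditions (8) and (11) reduce to:
\begin{eqnarray*}
\text{CE: } &&tr\{(R^{-1}-\theta ^{-1}W^{-1})W\}=0\ \Rightarrow \ \frac{1}{%
\widehat{\theta }}=\frac{1}{N}tr(R^{-1}W) \\
\text{RCE: } &&tr\{(\theta ^{-2}W^{-1}RW^{-1}-\theta ^{-1}W^{-1})W\}=0\
\Rightarrow \ \widehat{\theta }=\frac{1}{N}tr(W^{-1}R)
\end{eqnarray*}%
In terms of the eigenvalues of $W^{-1}R$, the CE estimate is their harmonic mean and the RCE estimate is their arithmetic mean, so the two criteria agree only when all of these eigenvalues are equal.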
186:
187: \noindent \textbf{3. Application to Array Beamforming}
188:
189: In this section we will apply the CE and RCE methods to fitting a low rank
190: plus noise covariance matrix to data. Such problems arise in a variety of
191: contexts, including narrowband sensor array processing and harmonic
192: retrieval. We focus on the former problem. Let \underline{$x$}$%
193: [n]=(x_{1}[n],...,x_{N}[n])^{T}$ be a vector of sensor measurements at time $%
194: n$, where $N$ is the total number of sensors in the array. Assume that the
195: signal is narrowband (perhaps because the sensor data has been preprocessed
196: through a Fast Fourier Transform of each sensor's data). Let our initial PDF
197: estimate for the data be given by $p(\underline{x}[n])=N(\underline{0},R),$
198: where $R$ is any non-parameterized estimate of the signal covariance, such
199: as $R=\frac{1}{K}\sum_{k=1}^{K}\underline{x}\left[ k\right] \underline{x}^{H}%
200: \left[ k\right] $ where $K$ snapshots of array data are used.
201:
202: Now suppose we wish to model the data $\underline{x}[n]$ as:
203: \begin{equation}
204: \underline{x}[n]=\sum_{i=1}^{P}s_{i}[n]\underline{u}_{i}+\sigma \underline{w}%
205: [n]
206: \end{equation}%
208: where $s_{1}[n],....,s_{P}[n]$\ are $P$ source signals, $P<N$, arriving from
209: unknown directions \underline{$u$}$_{1},...,\underline{u}_{P},$ with
210: additive noise \underline{$w$}$[n]$ with gain $\sigma $. Suppose that
the signals $s_{i}[n]$\ are statistically independent, real or complex zero-mean
Gaussian random variables with variance $\Lambda _{i}>0$, and that the
noise samples \underline{$w$}$[n]$ are statistically independent, real or
complex zero-mean Gaussian random vectors with covariance $W$.%
215: \begin{eqnarray}
216: p(s_{i}[n]) &=&N(0,\Lambda _{i}) \\
217: p(\underline{w}[n]) &=&N(\underline{0},W)
218: \end{eqnarray}
219:
220: Thus the parameterized model PDF of \underline{$x$}$[n]$ is Gaussian:%
221: \begin{equation}
222: q_{\theta }(\underline{x}[n])=N(\underline{0},R_{\theta })
223: \end{equation}%
224: where:%
225: \begin{equation}
226: R_{\theta }=\sum_{i=1}^{P}\Lambda _{i}\underline{u}_{i}\underline{u}%
227: _{i}^{H}+\sigma ^{2}W
228: \end{equation}%
We will assume that the noise covariance $W$ is known, but that all the
other parameters \underline{$\theta $}$=(\Lambda _{1},...,\Lambda _{P},%
\underline{u}_{1},...,\underline{u}_{P},\sigma )^{T}$ must be estimated. For
convenience, write (16) in matrix form:%
233: \begin{equation}
234: R_{\theta }=U\Lambda U^{H}+\sigma ^{2}W
235: \end{equation}%
236: where:%
237: \begin{equation}
238: U=\left[
239: \begin{array}{cccc}
240: \underline{u}_{1} & \underline{u}_{2} & ... & \underline{u}_{P}%
241: \end{array}%
\right] \text{ and }\Lambda =%
243: \begin{bmatrix}
244: \Lambda _{1} & & 0 \\
245: & \ddots & \\
246: 0 & & \Lambda _{P}%
247: \end{bmatrix}%
248: \end{equation}%
249: Suppose there are no a priori constraints on the matrix $U$, and that the
250: only constraints on $\Lambda $\ are that $\Lambda _{i}>0$. This would
251: typically be true if the array were uncalibrated, or subject to heavy
252: unknown multipath distortion. (Note that because we assume an uncalibrated
253: array, we will not be able to directly derive information about the
direction of arrival.) Appendices A and B apply the CE and RCE criteria to
this model. They show that the solutions to these two problems are quite
similar, and can be found by the following algorithm:
257:
258: \vspace{0.2in} {\raggedright\textbf{CE and RCE BEAMFORMING ALGORITHMS}}
259:
260: \begin{enumerate}
\item Find the generalized eigenvectors \underline{$u$}$_{i}$ and eigenvalues $%
\lambda _{i}$, $i=1,...,N$, that solve:%
263: \begin{equation}
264: \lambda _{i}R^{-1}\underline{u}_{i}=W^{-1}\underline{u}_{i}
265: \end{equation}%
266: with normalization constraint \underline{$u$}$_{i}^{H}W^{-1}$\underline{$u$}$%
267: _{j}=\delta _{i,j}$.
268:
269: \item Sort the eigenvectors and eigenvalues so that $\lambda _{1}\geq
270: \lambda _{2}\geq ...\geq \lambda _{N}.$ Then the optimal structured
covariance matrix approximation $\hat{R}_{\theta }$ to $R$ is:%
272: \begin{equation}
273: \hat{R}_{\theta }=\left(
274: \begin{array}{cccc}
275: \underline{u}_{1} & \underline{u}_{2} & ... & \underline{u}_{P}%
276: \end{array}%
277: \right)
278: \begin{pmatrix}
279: \lambda _{1}-\widehat{\sigma }^{2} & & 0 \\
280: & \ddots & \\
281: 0 & & \lambda _{P}-\widehat{\sigma }^{2}%
282: \end{pmatrix}%
283: \begin{pmatrix}
284: \underline{u}_{1}^{H} \\
285: \underline{u}_{2}^{H} \\
286: . \\
287: . \\
288: . \\
289: \underline{u}_{P}^{H}%
290: \end{pmatrix}%
291: +\widehat{\sigma }^{2}W
292: \end{equation}%
293: where:
294: \begin{equation}
295: \left\{
296: \begin{array}{c}
297: \frac{1}{\widehat{\sigma }^{2}}=\frac{1}{N-P}\sum\limits_{i=P+1}^{N}\frac{1}{%
298: \lambda _{i}}\text{ \ for CE} \\
299: \widehat{\sigma }^{2}=\frac{1}{N-P}\sum\limits_{i=P+1}^{N}\lambda _{i}\text{
300: \ for RCE}%
301: \end{array}%
302: \right.
303: \end{equation}
304:
305: \item The cross entropy for the optimal model is:%
306: \begin{eqnarray}
307: \text{CE}\text{: \ \ \ } &&H(\hat{q}_{\theta },p)=\xi \sum_{i=P+1}^{N}\log (%
308: \frac{\lambda _{i}}{\widehat{\sigma }^{2}}) \\
309: \text{RCE}\text{: \ \ } &&H(p,\hat{q}_{\theta })=\xi \sum_{i=P+1}^{N}\log (%
310: \frac{\hat{\sigma}^{2}}{\lambda _{i}})
311: \end{eqnarray}
312: \end{enumerate}
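
The three steps above can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes NumPy, uses the Cholesky factor of $W$ as the square root (so the generalized eigenproblem is solved by whitening), and the function name \texttt{fit\_structured\_covariance} and its arguments are hypothetical.

\begin{verbatim}
```python
import numpy as np

def fit_structured_covariance(R, W, P, method="RCE", xi=1.0):
    """Fit R_theta = U diag(lambda_i - sigma2) U^H + sigma2 * W to R.

    R, W : Hermitian positive definite (N x N); P < N is the signal rank.
    Returns (R_hat, sigma2, H, U), where H is the cross-entropy at the fit.
    """
    # Step 1: generalized eigenproblem lambda R^-1 u = W^-1 u, via whitening.
    L = np.linalg.cholesky(W)                  # W = L L^H
    L_inv = np.linalg.inv(L)
    R_white = L_inv @ R @ L_inv.conj().T
    R_white = 0.5 * (R_white + R_white.conj().T)  # remove rounding asymmetry
    lam, T = np.linalg.eigh(R_white)           # eigenvalues in ascending order
    lam, T = lam[::-1], T[:, ::-1]             # Step 2: sort descending
    U = (L @ T)[:, :P]                         # satisfies u_i^H W^-1 u_j = delta_ij

    # Noise level from the N - P smallest eigenvalues.
    noise = lam[P:]
    if method == "CE":
        sigma2 = 1.0 / np.mean(1.0 / noise)    # harmonic mean
        H = xi * np.sum(np.log(noise / sigma2))
    elif method == "RCE":
        sigma2 = np.mean(noise)                # arithmetic mean
        H = xi * np.sum(np.log(sigma2 / noise))
    else:
        raise ValueError("method must be 'CE' or 'RCE'")

    # Scale column i of U by (lambda_i - sigma2), then add the noise term.
    R_hat = (U * (lam[:P] - sigma2)) @ U.conj().T + sigma2 * W
    return R_hat, sigma2, H, U
```
\end{verbatim}

When $R$ already has the assumed structure exactly, the sketch should return it unchanged with zero cross-entropy; for a general $R$, the CE noise level is never larger than the RCE noise level, since a harmonic mean cannot exceed the corresponding arithmetic mean.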
313:
314: The estimates of $\hat{R}_{\theta }$ \ and $\hat{\sigma}^{2}$\ will be
315: unique if and only if $\lambda _{P}>\lambda _{P+1}.$ (The estimate of $U$
316: will not be unique.)
317:
318: An interesting alternative form for the cross-entropy formulas can be found
by substituting the value of $\hat{\sigma}^{2}$\ from (21) into (22) and (23):%
320: \begin{eqnarray}
321: \text{CE}\text{: \ } &&H(\hat{q}_{\theta },p)=\xi (N-P)\log \left( \frac{%
322: \left[ \frac{1}{\lambda _{P+1}},...,\frac{1}{\lambda _{N}}\right] _{avg}}{%
323: \left[ \frac{1}{\lambda _{P+1}},...,\frac{1}{\lambda _{N}}\right] _{geo}}%
324: \right) \\
325: \text{RCE}\text{: \ } &&H(p,\hat{q}_{\theta })=\xi (N-P)\log \left( \frac{%
326: \left[ \lambda _{P+1},...,\lambda _{N}\right] _{avg}}{\left[ \lambda
327: _{P+1},...,\lambda _{N}\right] _{geo}}\right)
328: \end{eqnarray}%
329: where:%
330: \begin{eqnarray}
331: \left[ \beta _{P+1},...,\beta _{N}\right] _{avg} &=&\frac{1}{N-P}%
332: \sum_{i=P+1}^{N}\beta _{i} \\
333: \lbrack \beta _{P+1},...,\beta _{N}]_{geo} &=&(\beta _{P+1}\beta
334: _{P+2}...\beta _{N})^{1/(N-P)}
335: \end{eqnarray}%
The cross-entropies are proportional to the log of the ratio of the
arithmetic mean to the geometric mean of the eigenvalues (or their inverses)
that are not used in building $U$. Because the arithmetic mean of positive
numbers is never smaller than their geometric mean, the cross-entropies are
non-negative, and attain their minimum value of zero only if the geometric
mean of $\lambda _{P+1},...,\lambda _{N}$ (or their inverses) equals
their arithmetic mean. This occurs only if these $N-P$ smallest
generalized eigenvalues are all equal.
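
For a concrete check of these formulas (an illustrative example with arbitrarily chosen values, not taken from any data), suppose the $N-P=3$ unused eigenvalues are $4,1,1$. The two noise level estimates and cross-entropies are then:
\begin{eqnarray*}
\text{CE: } &&\frac{1}{\widehat{\sigma }^{2}}=\frac{1}{3}\left( \frac{1}{4}%
+1+1\right) =\frac{3}{4},\ \ \ H(\hat{q}_{\theta },p)=\xi \left( \log 3+2\log
\frac{3}{4}\right) =\xi (3\log 3-4\log 2)\approx 0.523\,\xi \\
\text{RCE: } &&\widehat{\sigma }^{2}=\frac{1}{3}(4+1+1)=2,\ \ \ H(p,\hat{q}%
_{\theta })=\xi \left( \log \frac{2}{4}+2\log 2\right) =\xi \log 2\approx
0.693\,\xi
\end{eqnarray*}%
If instead the three eigenvalues were all equal to $2$, both estimates would give $\widehat{\sigma }^{2}=2$ and both cross-entropies would be zero.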
343:
344: Note the similarity of the RCE formula to the MDL order determination
345: algorithm suggested by Wax and Kailath[2]. The RCE criterion is also
346: strongly related to the Maximum Likelihood problem of estimating the
347: structured covariance matrix given observations \underline{$x$}$_{1},...,$%
348: \underline{$x$}$_{K}$:
349: \begin{equation}
350: \hat{R}_{\theta }\leftarrow \max_{\theta }\log p(\underline{x}_{1},...,%
351: \underline{x}_{K}\mid \underline{\theta })
352: \end{equation}%
353: where:%
354: \begin{equation}
355: p(\underline{x}_{1},...,\underline{x}_{K}\mid \underline{\theta })\text{ }%
356: =\dprod\limits_{i=1}^{K}p(\underline{x}_{i}\mid \underline{\theta })
357: \end{equation}%
358: and:%
359: \begin{equation}
p(\underline{x}_{i}\mid \underline{\theta })=N(\underline{0},R_{\theta })
361: \end{equation}%
This is because, when $R$ is the sample covariance $\frac{1}{K}%
\sum_{i=1}^{K}\underline{x}_{i}\underline{x}_{i}^{H}$:%
\begin{equation}
H(p,q_{\theta })=-\frac{1}{K}\log p(\underline{x}_{1},...,\underline{x}%
_{K}\mid \underline{\theta })-\xi \left( N+\log \left\vert R\right\vert
+N\log \frac{\pi }{\xi }\right)
\end{equation}%
\bigskip Since the second term in (31) does not depend on \underline{$\theta
$}, minimizing $H(p,q_{\theta })$ is equivalent to maximizing the likelihood,
and the RCE estimate of $R_{\theta }$\ will be identical to the ML estimate.
369:
370: For the special case when the background noise is white Gaussian noise, $W=I$%
371: , the \underline{$u$}$_{i}$ must satisfy:%
372: \begin{equation}
373: R\underline{u}_{i}=\lambda _{i}\underline{u}_{i}
374: \end{equation}%
and thus the \underline{$u$}$_{i}$ are the eigenvectors of
376: the observed data correlation matrix $R$. This special case is thus quite
377: similar to that used in the MUSIC algorithm[1] and other similar beamforming
378: algorithms.
379:
380: If subroutines for computing generalized eigenvectors are not available, we
381: can use subroutines for computing eigenvectors of symmetric positive
382: definite matrices as follows. Factor $W=W^{1/2}W^{H/2}$\ where $W^{1/2}$\ is
383: any square root of $W$ and $W^{H/2}$\ is its Hermitian. Then to compute the
384: \underline{$u$}$_{i}$:
385:
386: \begin{enumerate}
\item Form the whitened data correlation matrix:%
388: \begin{equation}
389: \tilde{R}=W^{-1/2}RW^{-H/2}
390: \end{equation}%
where $W^{-1/2}$ is the inverse of $W^{1/2}$ and $W^{-H/2}$ is its Hermitian.
Note that $\tilde{R}$ is conjugate symmetric and positive definite.
393:
394: \item Solve for the eigenvectors \underline{$t$}$_{i}$ and corresponding
395: eigenvalues $\lambda _{i}$\ of $\tilde{R}.$%
396: \begin{equation}
397: \tilde{R}\underline{t}_{i}=\lambda _{i}\underline{t}_{i}
398: \end{equation}%
where \underline{$t$}$_{j}^{H}\underline{t}_{i}=\delta _{i,j}$. Sort these
so that the eigenvalues are in descending order.
401:
402: \item Then:%
403: \begin{equation}
404: \underline{u}_{i}=W^{1/2}\underline{t}_{i}
405: \end{equation}
406: \end{enumerate}
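
A short check, added here as a sanity argument and using only the definitions above, confirms that the vectors constructed this way solve (19) with the required normalization. Since $W^{-1}=W^{-H/2}W^{-1/2}$:
\begin{eqnarray*}
RW^{-1}\underline{u}_{i} &=&RW^{-H/2}W^{-1/2}W^{1/2}\underline{t}_{i}=W^{1/2}%
\tilde{R}\underline{t}_{i}=\lambda _{i}W^{1/2}\underline{t}_{i}=\lambda _{i}%
\underline{u}_{i} \\
\underline{u}_{i}^{H}W^{-1}\underline{u}_{j} &=&\underline{t}%
_{i}^{H}W^{H/2}W^{-H/2}W^{-1/2}W^{1/2}\underline{t}_{j}=\underline{t}_{i}^{H}%
\underline{t}_{j}=\delta _{i,j}
\end{eqnarray*}%
The first line is (19) premultiplied by $R$, and the second is the normalization constraint stated with (19).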
407:
408: It is also interesting to consider the effect of using the structured
409: covariance matrix estimate when forming either a classical or optimal
beamformer. Let \underline{$w$}$_{0}\ $be the ideal array response for a
signal in a particular direction. The classical beamformer estimates the
signal $s[n]$\ from the array data as $s[n]=\underline{w}_{0}^{H}\underline{x%
}[n]$. The expected received power from this direction is then $E[|s[n]|^{2}]=%
\underline{w}_{0}^{H}R_{\theta }\underline{w}_{0}.$\ Now suppose that
\underline{$w$}$_{0}$\ is in the space spanned\ by the columns of $R^{-1}U$,
i.e. \underline{$w$}$_{0}=R^{-1}U$\underline{$\alpha $} for some vector
\underline{$\alpha $}. It is shown in Appendix A that $R_{\theta
}^{-1}U=R^{-1}U$. Therefore:%
419: \begin{eqnarray}
420: \underline{w}_{0}^{H}R_{\theta }\underline{w}_{0} &=&\underline{\alpha }%
421: ^{H}U^{H}R^{-H}R_{\theta }R^{-1}U\underline{\alpha } \notag \\
422: &=&\underline{\alpha }^{H}U^{H}R^{-H}U\underline{\alpha } \notag \\
423: &=&\underline{w}_{0}^{H}R\underline{w}_{0}
424: \end{eqnarray}%
425: In this case, replacing $R$ with the structured covariance estimate $%
426: R_{\theta }$ in the classical beamformer makes no difference. However, if $%
427: \underline{w}_{0}$ is not in the subspace spanned by $R^{-1}\underline{u}%
428: _{1},...,R^{-1}\underline{u}_{P}$, then $R_{\theta }^{-1}\underline{w}%
429: _{0}\neq R^{-1}\underline{w}_{0}$, and using the structured covariance
430: estimate in the classical beamformer will yield a different beam pattern.
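
The identity $R_{\theta }^{-1}U=R^{-1}U$ used above can also be verified directly from the algorithm (a short check added for clarity, using only (19) and (20)). Let $\underline{y}_{i}=W^{-1}\underline{u}_{i}$ for $i\leq P$. Then (19) gives $R\underline{y}_{i}=\lambda _{i}\underline{u}_{i}$, while (20) together with the normalization $\underline{u}_{j}^{H}W^{-1}\underline{u}_{i}=\delta _{i,j}$ gives:
\begin{equation*}
\hat{R}_{\theta }\underline{y}_{i}=\sum_{j=1}^{P}(\lambda _{j}-\widehat{%
\sigma }^{2})\underline{u}_{j}(\underline{u}_{j}^{H}W^{-1}\underline{u}_{i})+%
\widehat{\sigma }^{2}\underline{u}_{i}=\lambda _{i}\underline{u}_{i}
\end{equation*}%
Hence $\hat{R}_{\theta }^{-1}\underline{u}_{i}=\underline{y}_{i}/\lambda _{i}=R^{-1}\underline{u}_{i}$ for $i=1,...,P$, whichever of the CE or RCE values of $\widehat{\sigma }^{2}$ is used.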
431:
432: A similar statement holds for the optimum minimum variance beamformer, $s[n]=%
433: \underline{w}^{T}\underline{x}[n],$ which uses a window \underline{$w$}
434: designed such that the expected response energy \underline{$w$}$%
435: ^{T}R_{\theta }$\underline{$w$}\ is minimized subject to the constraint that
436: the response to a plane wave from the direction of interest is unity,
437: \underline{$w$}$^{T}\underline{w}_{0}=1$. The solution is $\underline{w}%
438: =\left( \underline{w}_{0}^{T}R_{\theta }^{-1}\underline{w}_{0}\right)
439: ^{-1}R_{\theta }^{-1}\underline{w}_{0}$. Note that if \underline{$w$}$_{0}$\
440: is in the subspace spanned by the columns of \ $U$, then there exists some
441: vector \underline{$\alpha $}\ such that \underline{$w$}$_{0}=U$\underline{$%
442: \alpha $}. Since $R_{\theta }^{-1}U=R^{-1}U,$%
443: \begin{equation}
444: R_{\theta }^{-1}\underline{w}_{0}=R_{\theta }^{-1}U\underline{\alpha }%
445: =R^{-1}U\underline{\alpha }=R^{-1}\underline{w}_{0}
446: \end{equation}%
447: which in turn implies:%
448: \begin{equation}
449: \underline{w}=\left( \underline{w}_{0}^{T}R_{\theta }^{-1}\underline{w}%
450: _{0}\right) ^{-1}R_{\theta }^{-1}\underline{w}_{0}=\left( \underline{w}%
451: _{0}^{T}R^{-1}\underline{w}_{0}\right) ^{-1}R^{-1}\underline{w}_{0}
452: \end{equation}%
453: In this case, replacing $R$ with the structured covariance estimate $%
454: R_{\theta }$\ in the optimal beamformer makes no difference. However, if
455: \underline{$w$}$_{0}$ is not in the subspace spanned by the columns of $U$,
456: then $R_{\theta }^{-1}\underline{w}_{0}\neq R^{-1}\underline{w}_{0}$, and
457: using the structured covariance estimate in the optimal beamformer will
458: yield a different beam pattern. These results are contrary to the suggestion
459: implied in [5] that replacing $R$ with $R_{\theta }$\ in an optimal
460: beamformer should make no difference.
461:
462: \bigskip
463:
464: \noindent \textbf{4. Conclusion}
465:
\bigskip In this paper, we have derived the optimal solution for correlation
matrix estimation by the CE and RCE principles. The two methods give
nearly identical results in the problem of estimating the sum of a low rank
signal matrix plus a noise matrix, differing only in the value of the noise
level estimate. The RCE method gives the same results as the Maximum Likelihood
approach, and when the noise is white, both methods are similar to MUSIC. It
is interesting that the cross-entropy approach thus provides a unifying
framework for deriving spectral estimation algorithms including Bartlett,
MLM[8], MEM[10], and now MUSIC.
475:
476: \bigskip \newpage
477:
478: \noindent {\Large A \ \ Derivation of CE Beamforming Algorithm}
479:
480: \bigskip In this appendix we derive the optimal structured covariance
481: estimate using the CE principle. First, to simplify the effort, let us
define: $V=U\Lambda ^{1/2}$, where $\Lambda ^{1/2}=diag(\Lambda
_{1}^{1/2},...,\Lambda _{P}^{1/2})$. Then:%
484: \begin{equation}
485: R_{\theta }=VV^{H}+\sigma ^{2}W
486: \end{equation}%
487: Substitute this into the CE entropy expression (6), and set the derivatives
488: with respect to the real and imaginary part of every element of the $V$
489: matrix, and with respect to $\sigma ^{2}$, to zero. Arranging these
490: derivatives in complex matrix form gives:%
491: \begin{eqnarray}
492: (R^{-1}-R_{\theta }^{-1})V &=&0 \\
493: tr\{(R^{-1}-R_{\theta }^{-1})W\} &=&0
494: \end{eqnarray}%
Using the Woodbury matrix inversion lemma:%
496: \begin{equation}
497: R_{\theta }^{-1}=\frac{1}{\sigma ^{2}}W^{-1}-\frac{1}{\sigma ^{2}}W^{-1}V%
498: \text{ }\left[ V^{H}\frac{1}{\sigma ^{2}}W^{-1}V+I\right] ^{-1}V^{H}\frac{1}{%
499: \sigma ^{2}}W^{-1}
500: \end{equation}%
501: Substituting into (40) and simplifying gives:%
502: \begin{equation}
503: R^{-1}V=\frac{1}{\sigma ^{2}}W^{-1}V\left[ V^{H}\frac{1}{\sigma ^{2}}%
504: W^{-1}V+I\right] ^{-1}
505: \end{equation}%
506: This equation has many possible solutions. Let $V$ refer to any one of
507: these. Then let $\Psi =V^{H}W^{-1}V$. Diagonalize $\Psi $ by factoring it: $%
508: \Psi =Q\Phi Q^{H}$, where $\Phi $\ is diagonal and $Q$ is orthonormal, $%
509: Q^{H}Q=I$. Define $\tilde{V}=VQ$. Note that $\tilde{V}$\ is also a solution
510: to (43). In fact,%
511: \begin{equation}
512: R^{-1}\tilde{V}=\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}\left[ \frac{1}{\sigma
513: ^{2}}\Phi +I\right] ^{-1}
514: \end{equation}%
515: and:%
516: \begin{equation}
517: \tilde{V}^{H}W^{-1}\tilde{V}=\Phi
518: \end{equation}%
Let the $P$ columns of $\tilde{V}$ be \underline{$\tilde{v}$}$_{1},...,%
520: \underline{\tilde{v}}_{P},$ and let the $P$ diagonal elements of $\Phi $\ be
521: $\phi _{1},...,\phi _{P}$. Then:%
522: \begin{equation}
523: \lambda _{i}R^{-1}\underline{\tilde{v}}_{i}=W^{-1}\underline{\tilde{v}}_{i}
524: \end{equation}%
525: where:%
526: \begin{equation}
527: \lambda _{i}=\phi _{i}+\sigma ^{2}
528: \end{equation}%
529: The columns of $\tilde{V}$ must therefore either be zero, or else must be
530: generalized eigenvector solutions to (46). Because $R$ and $W$ are conjugate
531: symmetric and positive definite, there are always $N$ linearly independent
532: generalized eigenvector solutions \underline{$\tilde{v}$}$_{1},...,%
533: \underline{\tilde{v}}_{N}$ to (46), with corresponding generalized
534: eigenvalues $\lambda _{1},...,\lambda _{N}$ which are positive. Assume
535: without loss of generality that the first $P_{0}$\ columns of $\tilde{V}$
536: are non-zero, where $P_{0}\leq P$\ . These first $P_{0}$\ columns must be
537: selected from among the $N$ possible generalized eigenvectors, in a manner
538: we will determine later. Also note that it is not necessary to estimate $Q$
539: or $V$ directly, since we can construct $R_{\theta }$\ directly from $\tilde{%
540: V}$:%
541: \begin{eqnarray}
542: R_{\theta } &=&VV^{H}+\sigma ^{2}W \notag \\
543: &=&VQQ^{H}V^{H}+\sigma ^{2}W \notag \\
544: &=&\tilde{V}\tilde{V}^{H}+\sigma ^{2}W
545: \end{eqnarray}
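
Two of the steps above are stated without detail; a sketch of the omitted algebra follows, where the symbol $G$ is introduced here only as shorthand. Writing $G=\frac{1}{\sigma ^{2}}V^{H}W^{-1}V$, the lemma (42) gives:
\begin{equation*}
R_{\theta }^{-1}V=\frac{1}{\sigma ^{2}}W^{-1}V\left[ I-(G+I)^{-1}G\right] =%
\frac{1}{\sigma ^{2}}W^{-1}V(G+I)^{-1}
\end{equation*}%
since $I-(G+I)^{-1}G=(G+I)^{-1}(G+I-G)=(G+I)^{-1}$. Combined with (40), this yields (43). Similarly, because $\Phi $ is diagonal, the $i^{th}$ column of (44) reads:
\begin{equation*}
R^{-1}\underline{\tilde{v}}_{i}=\frac{1}{\sigma ^{2}}\left( \frac{\phi _{i}}{%
\sigma ^{2}}+1\right) ^{-1}W^{-1}\underline{\tilde{v}}_{i}=\frac{1}{\phi
_{i}+\sigma ^{2}}W^{-1}\underline{\tilde{v}}_{i}
\end{equation*}%
which is (46) with $\lambda _{i}$ defined by (47).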
546:
547: Now to solve for $\sigma ^{2}$. Substitute (42) into (41), and simplify by
548: exploiting the facts that $tr(AB)=tr(BA)$\ and $tr(C+D)=tr(C)+tr(D)$\ and $%
549: tr(\alpha C)=\alpha tr(C)$\ where $A,B$ are matrices, $C,D$ are square
550: matrices, and $\alpha $\ is a scalar.
551:
552: \bigskip
553:
554: \begin{eqnarray}
555: 0 &=&tr\{(R_{\theta }^{-1}-R^{-1})W\} \notag \\
556: &=&tr\left\{ \left( \frac{1}{\sigma ^{2}}W^{-1}-\frac{1}{\sigma ^{2}}W^{-1}%
557: \tilde{V}\left[ \tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}+I\right]
558: ^{-1}\tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}-R^{-1}\right) W\right\} \notag
559: \\
560: &=&tr\left\{ \frac{1}{\sigma ^{2}}I\right\} -\frac{1}{\sigma ^{2}}tr\left\{ %
561: \left[ \tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}+I\right] ^{-1}\left[
562: \tilde{V}^{H}\frac{1}{\sigma ^{2}}W^{-1}\tilde{V}\right] \right\} -tr\left\{
563: R^{-1}W\right\} \notag \\
564: &=&\frac{N}{\sigma ^{2}}-\frac{1}{\sigma ^{2}}\sum_{i=1}^{P}\frac{\phi _{i}}{%
565: \phi _{i}+\sigma ^{2}}-tr\{R^{-1}W\} \notag \\
566: &=&\frac{N-P_{0}}{\sigma ^{2}}+\sum_{i=1}^{P_{0}}\frac{1}{\lambda _{i}}%
567: -tr\{WR^{-1}\}\text{ }
568: \end{eqnarray}%
569: where we used (45) in the fourth line, and (47) in the fifth. This can be
570: further simplified by noticing that if \underline{$\tilde{v}$}$_{i}$\ is any
571: generalized eigenvector solution to (46), then:%
572: \begin{equation}
573: WR^{-1}\underline{\tilde{v}}_{i}=W(\frac{1}{\lambda _{i}}W^{-1}\underline{%
574: \tilde{v}}_{i})=\frac{1}{\lambda _{i}}\underline{\tilde{v}}_{i}
575: \end{equation}%
576: Therefore, the \underline{$\tilde{v}$}$_{i}$ are eigenvectors of $WR^{-1}$\
577: with eigenvalues $1/\lambda _{i}$. Thus:%
578: \begin{equation}
579: tr\{WR^{-1}\}=\sum_{i=1}^{N}\frac{1}{\lambda _{i}}
580: \end{equation}%
581: Substituting back into (49), then solving for $\sigma ^{2}$\ gives:%
582: \begin{equation}
583: \sigma ^{2}=\frac{N-P_{0}}{\sum\limits_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}}
584: \end{equation}
585:
586: Now substitute the solution for $\tilde{V}$\ and for $\sigma ^{2}$\ into
587: (48), and then substitute this back into the formula (6) for the
588: cross-entropy. The algebra is simplified by noting that if \underline{$%
589: \tilde{v}$}$_{i}$\ is any generalized eigenvector solution to (46), then:%
590: \begin{eqnarray}
591: R_{\theta }R^{-1}\underline{\tilde{v}}_{i} &=&(\tilde{V}\tilde{V}^{H}+\sigma
592: ^{2}W)\left( \frac{1}{\lambda _{i}}W^{-1}\underline{\tilde{v}}_{i}\right)
593: \notag \\
594: &=&\frac{1}{\lambda _{i}}(\tilde{V}\tilde{V}^{H}W^{-1}\underline{\tilde{v}}%
595: _{i}+\sigma ^{2}\underline{\tilde{v}}_{i}) \notag \\
596: &=&\left\{
597: \begin{array}{cl}
598: \frac{1}{\lambda _{i}}(\phi _{i}+\sigma ^{2})\underline{\tilde{v}}_{i} &
599: \text{for }i=1,....,P_{0} \\
600: \frac{1}{\lambda _{i}}\sigma ^{2}\underline{\tilde{v}}_{i} & \text{for }i%
601: \text{ }=P_{0}+1,...,N%
602: \end{array}%
603: \right. \notag \\
604: &=&\left\{
605: \begin{array}{cl}
606: \underline{\tilde{v}}_{i} & \text{for }i=1,....,P_{0} \\
607: \frac{\sigma ^{2}}{\lambda _{i}}\underline{\tilde{v}}_{i} & \text{for }%
608: i=P_{0}+1,...,N%
609: \end{array}%
610: \right.
611: \end{eqnarray}%
612: Therefore, the \underline{$\tilde{v}$}$_{i}$ are all eigenvectors of $%
613: R_{\theta }R^{-1}$. The first $P_{0}\ $eigenvalues are equal to 1, and the
614: remainder are equal to $\sigma ^{2}/\lambda _{P_{0}+1},...,\sigma
615: ^{2}/\lambda _{N}$. Putting all this together, the cross-entropy at this
616: solution has the value:%
617: \begin{eqnarray}
618: H(q_{\theta },p) &=&\xi \left\{ tr\{R_{\theta }R^{-1}\}-N-\log \left\vert
619: R_{\theta }R^{-1}\right\vert \right\} \notag \\
620: &=&\xi \left\{ P_{0}+\sigma ^{2}\sum_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}%
621: -N-\log \prod_{i=P_{0}+1}^{N}\text{\ \ }\frac{\sigma ^{2}}{\lambda _{i}}%
622: \right\} \notag \\
623: &=&\xi \sum_{i=P_{0}+1}^{N}\log \left( \frac{\lambda _{i}}{\sigma ^{2}}%
624: \right)
625: \end{eqnarray}%
626: Substituting the value of $\sigma ^{2}$\ from (52) gives the alternate form:%
627: \begin{equation}
628: H(q_{\theta },p)=\xi (N-P_{0})\log \left[ \frac{\frac{1}{N-P_{0}}%
629: \sum\limits_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}}{\left(
630: \dprod\limits_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}\right) ^{1/(N-P_{0})}}%
631: \right]
632: \end{equation}
633:
634: Now to return to the issue of which of the $N$ possible generalized
635: eigenvector solutions should be used for the $P_{0}\ $non-zero columns of $%
636: \tilde{V}$. Let us call the selected $P_{0}$ eigenvectors $\underline{\tilde{%
v}}_{1},...,\underline{\tilde{v}}_{P_{0}}$\ the ``signal eigenvectors'', and
let us call the remainder the ``noise eigenvectors''. The signal eigenvectors
639: satisfy $\underline{\tilde{v}}_{i}\neq 0$; since $W^{-1}>0$, then $\phi _{i}=%
640: \underline{\tilde{v}}_{i}^{H}W^{-1}\underline{\tilde{v}}_{i}>0$\ and thus $%
641: \lambda _{i}=\phi _{i}+\sigma ^{2}>\sigma ^{2}$ \ for $i=1,...,P_{0}$. We
642: show that these signal eigenvalues must be the largest eigenvalue solutions
643: to (46). Suppose this were not true, so that the global optimum solution
644: corresponded to an $R_{\theta }$ such that one of the signal eigenvalues,
645: say $\lambda _{P_{0}},$ was smaller than the largest of the noise
646: eigenvalues, say $\lambda _{P_{0}+1}$. Thus $\sigma ^{2}<\lambda
647: _{P_{0}}<\lambda _{P_{0}+1}$. But then, as we will see, swapping these
648: eigensolutions, making $\underline{\tilde{v}}_{P_{0}+1}$ a signal
649: eigenvector and $\underline{\tilde{v}}_{P_{0}}$ a noise eigenvector will
650: further decrease the cross-entropy, contradicting our assumption of global
651: optimality. To show this, let $H\left( \lambda _{P_{0}+1},\lambda
652: _{P_{0}+2},...,\lambda _{N}\right) $ represent the cross-entropy with a
653: model $R_{\theta }$ built using non-zero solutions $\underline{\tilde{v}}%
654: _{1},...,\underline{\tilde{v}}_{P_{0}-1},\underline{\tilde{v}}_{P_{0}}$, and
655: let $H\left( \lambda _{P_{0}},\lambda _{P_{0}+2},...,\lambda _{N}\right) $
656: represent the cross-entropy with a model $R_{\theta }$ built using non-zero
657: solutions $\underline{\tilde{v}}_{1},...,\underline{\tilde{v}}_{P_{0}-1},%
658: \underline{\tilde{v}}_{P_{0}+1}$. Then because the cross-entropy formula
659: (55) is an analytic function of the $\lambda _{i}$, by the mean value
660: theorem:%
661: \begin{eqnarray}
&&H(\lambda _{P_{0}+1},\lambda _{P_{0}+2},\ldots ,\lambda _{N})-H(\lambda
_{P_{0}},\lambda _{P_{0}+2},\ldots ,\lambda _{N}) \notag \\
&=&\left. \frac{\partial H}{\partial \lambda }(\lambda ,\lambda
_{P_{0}+2},\ldots ,\lambda _{N})\right\vert _{\lambda =\bar{\lambda}}(\lambda
_{P_{0}+1}-\lambda _{P_{0}})
667: \end{eqnarray}%
668: where $\bar{\lambda}$\ is some value in the range $\lambda _{P_{0}}<\bar{%
669: \lambda}<\lambda _{P_{0}+1}$. But:%
670: \begin{eqnarray}
671: \frac{\partial H}{\partial \lambda } &=&\xi \frac{1}{\lambda ^{2}}\left(
672: \lambda -\frac{N-P_{0}}{\frac{1}{\lambda }+\sum\limits_{i=P_{0}+2}^{N}\frac{1%
673: }{\lambda _{i}}}\right) \notag \\
674: &>&0
675: \end{eqnarray}%
676: for all $\lambda _{P_{0}}<\lambda <\lambda _{P_{0}+1}$, where the last line
677: is true because:%
678: \begin{eqnarray}
679: \lambda &>&\lambda _{P_{0}} \notag \\
680: &>&\sigma ^{2} \notag \\
681: &=&\frac{N-P_{0}}{\sum_{i=P_{0}+1}^{N}\frac{1}{\lambda _{i}}} \notag \\
682: &>&\frac{N-P_{0}}{\frac{1}{\lambda }+\sum_{i=P_{0}+2}^{N}\frac{1}{\lambda
683: _{i}}}
684: \end{eqnarray}%
Since $\lambda _{P_{0}+1}-\lambda _{P_{0}}>0$, the change in (56) must be
positive. Therefore, swapping $\underline{\tilde{v}}_{P_{0}}$ and $%
\underline{\tilde{v}}_{P_{0}+1}$ reduces the cross-entropy, and our assumed
global optimum cannot in fact be globally optimal. The $P_{0}$ signal
eigenvalues must therefore be the largest eigenvalue solutions to (46), and
the $P_{0}$ non-zero columns of $\tilde{V}$ must be the corresponding
generalized eigenvectors.
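
To illustrate the swapping argument, consider a hypothetical case (the numbers are chosen only for this sketch) with $N=3$, $P_{0}=1$ and generalized eigenvalues $8$, $4$ and $2$. We evaluate the cross-entropy as $\xi \sum \log (\lambda _{i}/\sigma ^{2})$ over the noise eigenvalues, with $\sigma ^{2}$ equal to their harmonic mean, and take $\log $ to be the natural logarithm. If the signal eigenvalue is $8$, the noise eigenvalues are $4$ and $2$, and
\[
\sigma ^{2}=\frac{2}{\frac{1}{4}+\frac{1}{2}}=\frac{8}{3},\qquad \frac{H}{%
\xi }=\log \frac{3}{2}+\log \frac{3}{4}=\log \frac{9}{8}\approx 0.118.
\]
If instead the signal eigenvalue is $4$, the noise eigenvalues are $8$ and $2$, and
\[
\sigma ^{2}=\frac{2}{\frac{1}{8}+\frac{1}{2}}=\frac{16}{5},\qquad \frac{H}{%
\xi }=\log \frac{5}{2}+\log \frac{5}{8}=\log \frac{25}{16}\approx 0.446.
\]
Both choices satisfy the requirement that the signal eigenvalue exceed $\sigma ^{2}$, but the choice with the larger signal eigenvalue gives the smaller cross-entropy, as the argument predicts. Choosing $2$ as the signal eigenvalue would give $\sigma ^{2}=16/3>2$, which violates $\lambda _{i}>\sigma ^{2}$ and is therefore not an admissible solution.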
692:
693: Finally, we must show that we should always choose $P_{0}=P$\ eigenvectors.
694: Without loss of generality, let us sort all the eigenvalues $\lambda
695: _{1}\geq \lambda _{2}\geq ...\geq \lambda _{N}$. Let $H_{i}$\ represent the
696: minimum cross-entropy with $i$ non-zero columns in $\tilde{V}$. Then using
697: (55):%
698: \begin{eqnarray}
699: H_{P_{0}}-H_{P_{0}+1} &=&\xi (N-P_{0})\log \left[ \frac{\frac{1}{(N-P_{0})}%
700: \frac{1}{\lambda _{P_{0}+1}}+\left( 1-\frac{1}{(N-P_{0})}\right) \frac{1}{%
701: \bar{\lambda}}}{\left( \frac{1}{\lambda _{P_{0}+1}}\right) ^{\frac{1}{\left(
702: N-P_{0}\right) }}\left( \frac{1}{\bar{\lambda}}\right) ^{1-\frac{1}{(N-P_{0})%
703: }}}\right] \notag \\
704: &\geq &0
705: \end{eqnarray}%
where $\frac{1}{\bar{\lambda}}=\frac{1}{N-P_{0}-1}\sum_{i=P_{0}+2}^{N}%
\frac{1}{\lambda _{i}}$ and where, in the last line, we used the weighted
arithmetic-geometric mean inequality $\rho \alpha +(1-\rho )\beta \geq
\alpha ^{\rho }\beta ^{(1-\rho )}$, valid for any $\alpha ,\beta >0$ and
$0\leq \rho \leq 1$. Thus the cross-entropy does not increase as $P_{0}$
increases from $0$ to $P$, so the best choice for $P_{0}$ is $P_{0}=P$.
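
As a small numerical illustration (hypothetical values, chosen only for this sketch), let $N=3$ with sorted eigenvalues $8\geq 4\geq 2$, and evaluate the minimum cross-entropy from the arithmetic-to-geometric mean form (55), taking $\log $ to be the natural logarithm:
\[
\frac{H_{0}}{\xi }=3\log \left[ \frac{\frac{1}{3}\left( \frac{1}{8}+\frac{1}{%
4}+\frac{1}{2}\right) }{\left( \frac{1}{8}\cdot \frac{1}{4}\cdot \frac{1}{2}%
\right) ^{1/3}}\right] =3\log \frac{7}{6}\approx 0.462,\qquad \frac{H_{1}}{%
\xi }=2\log \frac{3}{\sqrt{8}}=\log \frac{9}{8}\approx 0.118,\qquad \frac{%
H_{2}}{\xi }=\log \frac{1/2}{1/2}=0.
\]
Hence $H_{0}\geq H_{1}\geq H_{2}$ in this example, and each additional signal eigenvector lowers the cross-entropy. The value $H_{2}$ is reachable only if the model allows $P\geq 2$.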
711:
The proof that $R_{\theta }$ is unique when $\lambda _{P}>\lambda _{P+1}$
is messy but straightforward. The key point is that the space spanned by the
signal eigenvectors is uniquely determined. If some signal eigenvalues are
repeated, then the eigenvectors themselves may not be uniquely
determined, and thus $\tilde{V}$ may not be uniquely determined.
717:
We obtain the formulas in the text by defining $U=\tilde{V}\Phi ^{-1/2}$.
719:
720: \bigskip \newpage
721:
722: \noindent {\Large B \ \ \ Derivation of RCE Algorithm}
723:
In this appendix we give the solution to the RCE problem. The derivation is
quite similar to that for the CE problem, so we present it only
briefly. With our Gaussian models, the RCE cross-entropy has the value:%
727: \begin{equation}
728: \text{RCE: \ }H(p,q_{\theta })=\xi \left\{ tr(R_{\theta }^{-1}R)-N-\log
729: \left\vert R_{\theta }^{-1}R\right\vert \right\}
730: \end{equation}%
Differentiating with respect to the real and imaginary parts of $V$ and
setting the derivatives to zero, as before, gives:%
733: \begin{equation}
734: \left( R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1}\right) V=0
735: \end{equation}%
Multiplying both sides on the left by $R^{-1}R_{\theta }$ gives:%
737: \begin{equation}
738: (R_{\theta }^{-1}-R^{-1})V=0
739: \end{equation}%
which is exactly the same equation that the solution for $V$ in the CE
problem must satisfy, (40). Therefore, we can construct $R_{\theta }$ from
(48), where the columns of $\tilde{V}$ must be solutions to the generalized
eigenvector problem (46).
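
For completeness, the left multiplication by $R^{-1}R_{\theta }$ can be written out term by term, using only the invertibility of $R$ and $R_{\theta }$ that the derivation already assumes:
\[
R^{-1}R_{\theta }\left( R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta
}^{-1}\right) V=\left( R^{-1}R_{\theta }R_{\theta }^{-1}RR_{\theta
}^{-1}-R^{-1}R_{\theta }R_{\theta }^{-1}\right) V=\left( R_{\theta
}^{-1}-R^{-1}\right) V.
\]
Because $R^{-1}R_{\theta }$ is invertible, the stationarity condition before the multiplication and the condition after it have exactly the same solutions $V$.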
744:
Now differentiating (60) with respect to $\sigma ^{2}$ and setting the
derivative to zero gives:%
747: \begin{equation}
748: tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})W\}=0
749: \end{equation}%
750: Combining this with (61) gives:%
751: \begin{eqnarray}
752: 0 &=&tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})(\tilde{V}%
753: \tilde{V}^{H}+\sigma ^{2}W)\} \notag \\
754: &=&tr\{(R_{\theta }^{-1}RR_{\theta }^{-1}-R_{\theta }^{-1})R_{\theta }\}
755: \notag \\
756: &=&tr\{R_{\theta }^{-1}R-I\}
757: \end{eqnarray}%
758: which implies that:%
759: \begin{equation}
tr\{R_{\theta }^{-1}R\}=N
761: \end{equation}%
762: But (53) implies that $R_{\theta }^{-1}R\ $has $P_{0}$ eigenvalues equal to $%
763: 1$, and the rest have values $\lambda _{P_{0}+1}/\sigma ^{2},...,\lambda
764: _{N}/\sigma ^{2}$. Since the trace of a matrix is just the sum of its
765: eigenvalues:%
766: \begin{equation}
767: P_{0}+\frac{1}{\sigma ^{2}}\sum_{i=P_{0}+1}^{N}\lambda _{i}=N
768: \end{equation}%
769: which gives:%
770: \begin{equation}
771: \sigma ^{2}=\frac{1}{(N-P_{0})}\sum_{i=P_{0}+1}^{N}\lambda _{i}
772: \end{equation}%
773: Using the facts that the trace of a matrix is the sum of the eigenvalues,
774: and the determinant is the product of the eigenvalues:%
775: \begin{eqnarray}
776: H(p,q_{\theta }) &=&\xi \{tr\{R_{\theta }^{-1}R\}-N-\log \left\vert
777: R_{\theta }^{-1}R\right\vert \} \notag \\
778: &=&\xi \sum_{i=P_{0}+1}^{N}\log \frac{\sigma ^{2}}{\lambda _{i}}
779: \end{eqnarray}%
780: The proofs that we must choose $\lambda _{1},...,\lambda _{P_{0}}$\ to be
781: the largest eigenvalues, that we should choose $P_{0}=P$\ , and that the
782: solution $R_{\theta }$\ is unique if $\lambda _{P}>\lambda _{P+1}$, are
783: similar to the proofs for the CE algorithm.
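
As an illustrative check of the RCE formulas, consider a hypothetical case (the numbers are chosen only for this sketch) with $N=4$, $P_{0}=1$, noise eigenvalues $\lambda _{2}=6$, $\lambda _{3}=3$, $\lambda _{4}=2$, and any signal eigenvalue $\lambda _{1}\geq 6$. Taking $\log $ to be the natural logarithm, the RCE noise level is the arithmetic mean of the noise eigenvalues:
\[
\sigma ^{2}=\frac{6+3+2}{3}=\frac{11}{3},\qquad \frac{H(p,q_{\theta })}{\xi }%
=\log \frac{(11/3)^{3}}{6\cdot 3\cdot 2}=\log \frac{1331}{972}\approx 0.314.
\]
By contrast, the CE solution uses the harmonic mean of the same noise eigenvalues, which here is $3/\left( \frac{1}{6}+\frac{1}{3}+\frac{1}{2}\right) =3$. Since the arithmetic mean is never smaller than the harmonic mean, the RCE noise level is at least as large as the CE noise level for the same set of noise eigenvalues.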
784:
785: \bigskip
786:
787: \begin{thebibliography}{99}
\bibitem{Schmidt} Schmidt, R. (1978) Multiple emitter location and signal
parameter estimation. \textit{Proc. RADC Spectral Estimation Workshop},
243-258, Rome, NY.
791:
792: \bibitem{Wax} Wax, M. and Kailath, T. (1985) Detection of signals by
793: information theoretic criteria. \ \textit{IEEE Trans. Acoustics, Speech,
794: Sig. Proc.}\ \textbf{ASSP-33}, 387-392.
795:
796: \bibitem{NA84} A., N. Q. (1984) On the uniqueness of the maximum likelihood
797: estimate of structured covariance matrices. \textit{IEEE Trans. Acoustics,
798: Speech, and Sig. Proc.} \textbf{ASSP-32}, 1249-1251.
799:
800: \bibitem{Burg82} Burg, J., Luenberger, D., and Wenger, D. (1982) Estimation
801: of structured covariance\ \ matrices. \textit{Proc. IEEE} \textbf{70},
802: 963-974.
803:
804: \bibitem{Gray87} Gray, D., Anderson, B. and Sim, P. (1987) Estimation of
805: structured covariances\ with application to array beamforming. \textit{%
806: Circuits, Systems, and Signal Proc.} \textbf{6-4}, 421-447.
807:
808: \bibitem{Kullback59} Kullback, S. (1959) \textit{Information Theory and
809: Statistics}. New York: John Wiley \&\ Sons.
810:
811: \bibitem{Shore80} Shore, J. and Johnson, R. (1980) Axiomatic derivation of
812: the principle of\ maximum entropy and the principle of minimum
813: cross-entropy. \textit{IEEE\ Trans. Info. Theory} \textbf{IT-26}, 26-37.
814:
\bibitem{Liou90} Liou, C.-Y. and Musicus, B. (1990) A separable
cross-entropy approach to power spectral estimation. \textit{IEEE Trans.
Acoustics, Speech and Sig. Proc.} \textbf{ASSP-38}, 105-113.
818:
819: \bibitem{Musicus82} Musicus, B. (1982) \textit{Iterative Algorithms for
820: Optimal Signal Reconstruction and\ Parameter Identification Given Noisy and
821: Incomplete Data}. Ph.D. thesis,\ Dept. of Elec. Engg. and Comp. Sci.,
822: Massachusetts Institute of Technology.
823:
824: \bibitem{Shore81} Shore, J. (1981) Minimum cross-entropy spectral analysis.
825: \textit{IEEE Trans.\ Acoustics,} \textit{Speech and Sig. Proc.} \textbf{%
826: ASSP-29}, 230-236.
827: \end{thebibliography}
828:
829: \bigskip
830:
831: \bigskip
832:
833: \noindent $^{1}$Department of Computer Science and Information Engineering,
834: National Taiwan University, Taipei, Taiwan, 106, Republic of China, Tel.:886
835: 2 23625336 ext.515, Fax.:886 2 23628167, \noindent Email:
836: cyliou@csie.ntu.edu.tw
837:
838: \noindent $^{2}$was with Massachusetts Institute of Technology, Research
839: Laboratory of Electronics, Cambridge, MA 02139.
840:
841: \bigskip
842:
843: \end{document}
844: