1: \documentclass[11pt]{article}
2:
3: \usepackage{amssymb}
4: \usepackage{amsmath}
5: \usepackage{fullpage}
6: \usepackage{times}
7: \usepackage{graphicx}
8: \setlength{\oddsidemargin}{-0.25in}
9: \setlength{\evensidemargin}{-0.25in}
10: \setlength{\topmargin}{0.5in}
11: \setlength{\headheight}{0pt}
12: \setlength{\headsep}{0pt}
13: \setlength{\footskip}{0.5in}
14: \setlength{\textheight}{8.75in}
15: \setlength{\textwidth}{7in}
16: \setlength{\marginparwidth}{0in}
17: \setlength{\marginparsep}{0in}
18: \newcommand{\SFW}{\mbox{S$^4$W}}
19: \newcommand{\paper}{paper}
20:
21: \title{Using Hierarchical Data Mining to Characterize \\
22: Performance of Wireless System Configurations}
23: \author{Alex Verstak$^*$, Naren Ramakrishnan$^*$, Kyung Kyoon Bae$^{\dagger}$, William H. Tranter$^{\dagger}$,\\
24: Layne T. Watson$^*$, Jian He$^*$, and Clifford A. Shaffer$^*$\\
25: \large $^*$Department of Computer Science\\
26: \large $^{\dagger}$Bradley Department of Electrical and Computer Engineering\\
27: \large Virginia Polytechnic Institute and State University\\
28: \large Blacksburg, VA 24061\\
29: \,\,\,\,\\
30: \large Theodore S. Rappaport\\
31: \large Department of Electrical and Computer Engineering\\
32: \large University of Texas\\
33: \large Austin, TX 78712}
34:
35: \date{}
36: \begin{document}
37:
38: \maketitle
39:
40: \begin{abstract}
41: \noindent
42: This \paper{} presents a statistical framework for assessing wireless
43: systems performance using hierarchical data mining techniques. We
44: consider WCDMA (wideband code division multiple access) systems with
45: two-branch STTD (space time transmit diversity) and 1/2 rate
46: convolutional coding (forward error correction codes). Monte Carlo
47: simulation estimates the bit error probability (BEP) of the system
48: across a wide range of signal-to-noise ratios (SNRs). A performance
49: database of simulation runs is collected over a targeted space of
50: system configurations. This database is then mined to obtain regions
51: of the configuration space that exhibit acceptable average performance.
52: The shape of the mined regions illustrates the joint influence of
53: configuration parameters on system performance. The role of data
54: mining in this application is to provide explainable and statistically
55: valid design conclusions. The research issue is to define
56: statistically meaningful aggregation of data in a manner that permits
57: efficient and effective data mining algorithms. We achieve a good
58: compromise between these goals and help establish the applicability of
59: data mining for characterizing wireless systems performance.
60: \end{abstract}
61: \thispagestyle{empty}
62: %\newpage
63: %\tableofcontents{}
64: \newpage
65:
66: \section{Introduction}
67:
68: Data mining is becoming increasingly relevant in simulation methodology
69: and computational science~\cite{naren-ayg}. It entails the
70: `non-trivial process of identifying valid, novel, potentially useful,
71: and ultimately understandable patterns in data'~\cite{kdd-cacm}. Data
72: mining can
73: be used in both predictive (e.g., quantitative assessment of factors on
74: some performance metric) and descriptive (e.g., summarization and
75: system characterization) settings. Our goal in this \paper{} is to
76: demonstrate a hierarchical data mining framework applied to the problem
77: of characterizing wireless system performance.
78:
83: This work is done in the context of the \SFW{} problem solving
84: environment~\cite{ipdps-s4w}---`Site-Specific System Simulator for
85: Wireless System Design'. \SFW{} provides site-specific (deterministic)
86: electromagnetic propagation models as well as stochastic wireless
87: system models for predicting the performance of wireless systems in
88: specific environments, such as office buildings. \SFW{} is also
89: designed to support the inclusion of new models into the system,
90: visualization of results produced by the models, integration of
91: optimization loops around the models, validation of models by
92: comparison with field measurements, and management of the results
produced by a large series of experiments. In this \paper{}, we study
the effect of configuration parameters on the bit error probability
(BEP) of a system simulated in~\SFW.
97:
98: The approach we take is to accumulate a performance database of
99: simulation runs that sweep over a targeted space of system
100: configurations. This database is then mined to obtain regions of the
101: configuration space that exhibit acceptable average performance.
102: Exploiting prior knowledge about the underlying simulation, organizing
103: the computational steps in data mining, and interpreting the results at
every stage are important research issues. In addition, we bring out
the often-prevailing tension between making statistically meaningful
106: conclusions and the assumptions required for efficient and effective
107: data mining algorithms. This interplay leads to a novel set of problems
108: that we address in the context of the wireless systems performance
109: domain.
110:
111: Data mining algorithms work in a variety of ways but, for the purposes
112: of this \paper{}, it is helpful to think of them as performing systematic
113: aggregation and redescription of data into higher-level objects. Our
114: work can be viewed as employing three such layers of aggregation:
115: points, buckets, and regions. Points (configurations) are records in
116: the performance database. These records contain configuration
117: parameters as well as unbiased estimates of bit error probabilities
118: that we use as performance metrics. Buckets represent averages of
119: points. We use buckets to reduce data dimensionality to two, which is
120: the most convenient number of dimensions for visualization. Finally,
121: buckets are aggregated into 2D regions of constrained shape. We find
122: regions of buckets where we are most confident that the configurations
123: exhibit acceptable average performance. The shapes of these regions
124: illustrate the nature of the joint influence of the two selected
125: configuration parameters on the configuration performance. Specific
126: region attributes, such as region width, provide estimates for the
127: thresholds of sensitivity of configurations to variations in parameter
128: values.
129:
130: \subsection{Reader's Guide}
131:
132: Our major contribution is the development of a statistical framework
133: for assessing wireless system performance using data mining
134: techniques. The following section outlines wireless systems
135: performance simulation methodology and develops a statistical framework
136: for spatial aggregation of simulation results.
137: Section~\ref{sec:w-example} demonstrates a substantial subset of this
138: framework in the context of a performance study of WCDMA (wideband code
139: division multiple access~\cite{wcdma-holma}) systems that employ
140: two-branch STTD (space-time transmit diversity~\cite{sttd-alamouti})
141: techniques and 1/2 rate convolutional coding (forward error correction
142: codes~\cite{wcdma-holma}). We study the effect of power imbalance
143: between the branches on the BEP of the system across a wide range of
144: average signal-to-noise ratios (SNRs). Section~\ref{sec:gizmo} extends
145: the statistical framework to support computation of optimized regions
146: of the bucket space. Such regions are computed by a well-known data
147: mining algorithm~\cite{fukuda-tods, fukuda-rectilinear}.
148: Section~\ref{sec:experiments} applies these concepts to the example in
149: Section~\ref{sec:w-example}. Section~\ref{sec:conclusion} summarizes
150: present findings and outlines directions for future research.
151:
152: \section{The Statistics of Aggregation and the Aggregation of Statistics}
153: \label{sec:stat}
154:
155: Temporal variations in wireless channels have been extensively studied
156: in the literature~\cite{comm-ziemer}. The present work uses a Monte
157: Carlo simulation of WCDMA wireless systems to study the effect of these
158: variations. The simulation traces a number of frames of random
159: information bits through the encoding filters, the channel (a Rayleigh
160: fading linear filter~\cite{cm-hashemi}), and the decoding filters. The
161: inputs are hardware parameters, average SNR, channel impulse response,
162: and the number of frames to simulate. The output is the bit error
163: rate---the ratio of the number of information bits decoded in error to
164: the total number of information bits simulated. Simulations of this
165: kind statistically model channel variations due to changes in the
166: environment and device movement across a small geographical area
167: (\emph{small-scale fading}~\cite{cm-hashemi}). We refer to this kind
168: of channel variation as \emph{temporal variation} because a system is
169: simulated over a period of time. Further, we say that a given list of
170: inputs to the WCDMA simulation is a \emph{configuration} or a
171: \emph{point} in the configuration space.
172:
173: \begin{figure}
174: \begin{center}
175: \includegraphics[width=4.0in]{wcdma-test}
176: \end{center}
177: \caption[Typical 1D slices of the configuration space.]{Typical 1D
178: slices of the configuration space. The plots show simulated BERs (bit
179: error rates) of wireless systems for five common benchmark
180: channels~\cite{umts-utra} across a typical range of average SNRs.}
181: \label{fig:2d-example}
182: \end{figure}
183:
184: \emph{Spatial variations} are due to changes in system configurations.
185: We use this term to describe two quite different phenomena: changes in
186: the average SNR and channel impulse response due to \emph{large-scale
187: fading}~\cite{cm-hashemi} and variations of hardware parameters. A
188: typical approach to the analysis of spatial variations is to run
several temporal variation simulations (i.e., compute bit error
rates, or BERs, at several
191: points within a given area of interest) and plot 1D or 2D slices of the
192: configuration space, as shown in Figure~\ref{fig:2d-example}. In this
193: \paper{}, we augment this approach with statistically meaningful
194: aggregation of performance estimates across several points. The result
195: of this aggregation is a space of buckets, each bucket representing the
196: aggregation of a number of points. Moving up one level of aggregation
197: in this manner allows us to bring data mining algorithms to operate at
198: the level of buckets. The space of buckets mined by the data mining
199: algorithm is then visualized using color maps. The color of each bucket
200: is the confidence that the points (configurations) that map to this
201: bucket exhibit acceptable average performance.
202:
203: \begin{table}
204: \begin{center}
205: \begin{tabular}{c l}
206: \hline
207: $c,C,R$ & entities (points, buckets, regions) \\
208: $x,b,B$ & random variables \\
209: $E[x],E[b],E[B]$ & true means of random variables $x$, $b$, $B$ \\
210: $\sigma^2,\Sigma^2$ & true variances of random variables $b$, $B$ \\
211: $\hat{x},\hat{b},\hat{B}$ & estimates of means $E[x]$, $E[b]$, $E[B]$ of random variables $x$, $b$, $B$ \\
212: $\hat{\sigma}^2,\hat{\Sigma}^2$ & estimates of variances $\sigma^2$, $\Sigma^2$ of random variables $b$, $B$ \\
213: $P(E)$ & probability of event $E$, where $E$ is a boolean condition \\
214: $F_{N-1}(T)$ & $P(X<T)$ for $X$ having the Student~$t$ distribution with $N-1$ degrees of freedom \\
215: $\{x_k\}_{k=1}^n$ & set $\{x_1,x_2,\ldots,x_n\}$ \\
216: $\{\{x_{kj}\}_{j=1}^{n_k}\}_{k=1}^n$ & set $\{x_{11},x_{12},\ldots,x_{1n_1},x_{21},x_{22},\ldots,x_{2n_2},\ldots,x_{n1},x_{n2},\ldots,x_{nn_n}\}$ \\
217: \hline
218: \end{tabular}
219: \end{center}
220: \caption[Summary of mathematical notation.]{Summary of mathematical
221: notation. Lower case letters are used for points and upper case letters
222: are used for buckets and regions. Additional conventions are
223: introduced in Table~\ref{tab:notation2}.}
224: \label{tab:notation}
225: \end{table}
226:
227: \subsection{The First Level of Aggregation: Points}
228:
229: Table~\ref{tab:notation} summarizes some of the syntactic conventions
230: used in this \paper{}. Mathematically, we can think of the WCDMA
231: simulation as estimating the mean~$E[x_k]$ of a random variable~$x_k$
232: with some (unknown) distribution~\cite{comm-jeruchim} ($x_k$ is one
233: when the information bit is decoded in error or zero when it is decoded
234: correctly). Each BER $\hat{x}_{kj}$, $1\le{}j\le{}n_k$, output by the
235: simulation is an unbiased estimate of the BEP~$E[x_k]$ of the simulated
236: configuration~$c_k$.
237: %The distribution of~$x_k$ is analytically inconvenient.
238: Instead of building a detailed stochastic model of the
239: simulation (analytically, from the
240: distribution of $x_k$), we choose to work with the simpler distribution of
241: the BER $\hat{x}_{kj}$, referred to henceforth as just $b_k$.
242: Thus, each sample from the distribution of $b_k$ is realized by
243: simulating a number of frames and obtaining an estimate
244: of $E[x_k]$.
245: The distribution of~$b_k$ is
246: approximately Gaussian due to the Central Limit Theorem.
247: Technically, we assume
248: that the number of frames per estimate $\hat{x}_{kj}$ is `large enough'
249: so that the Lindeberg condition is satisfied, that the variance
250: of~$\hat{x}_{kj}$ is finite, and that $\{\hat{x}_{kj}\}_{j=1}^{n_k}$
251: are i.i.d. We say that $E[b_k]=E[E[x_k]]$ is the \emph{expected BEP}
252: of configuration~$c_k$ under Rayleigh fading.
253:
254: \subsection{The Second Level of Aggregation: Buckets}
255:
256: Let us now aggregate several points (i.e., random variables) into one
257: bucket. The purpose of this aggregation is to reduce data
258: dimensionality to a size that is easy to visualize, usually one or two
259: dimensions. The basic idea is to linearly average all points that map
to the same bucket, but we must do so carefully in order to preserve a
261: meaningful statistical interpretation. Let $\{b_k\}_{k=1}^n$ be
262: Gaussian random variables with means $\{E[b_k]\}_{k=1}^n$ and variances
263: $\{\sigma^2_k\}_{k=1}^n$. As in the previous paragraph, let each such
264: variable~$b_k$ be the estimated BEP of some configuration~$c_k$,
265: $1\le{}k\le{}n$. For bucket~$C$, define a \emph{bucket} (mixture)
266: \emph{random variable}~$B$ as the convex combination
267: $$B=\sum_{k=1}^np_kb_k,$$ where
each $p_k \geq 0$ and $\sum_{k=1}^{n} p_k = 1$.
269: %$\{p_k\}_{k=1}^n$ are (positive) constant weights of
270: %$\{b_k\}_{k=1}^n$.
271: It is convenient to make $\{p_k\}_{k=1}^n$ the
272: probabilities of occurrence of the configurations $\{c_k\}_{k=1}^n$ in
273: the dataset being analyzed. This setup underlines the dependence of
274: the outputs on the distribution of the inputs and frees the user from
275: having to provide values for the constants $\{p_k\}_{k=1}^n$. It is
276: well known that, as long as $\{b_k\}_{k=1}^n$ are \emph{mutually
277: independent} and Gaussian with means~$\{E[b_k]\}_{k=1}^n$ and
278: variances~$\{\sigma^2_k\}_{k=1}^n$, $B$ is Gaussian with mean
279: $E[B]=\sum_{k=1}^np_kE[b_k]$ and variance
280: $\Sigma^2=\sum_{k=1}^np_k^2\sigma_k^2$~\cite{stat-casella}. The
281: expected value $E[B]$ of the random variable~$B$ can be viewed as the
282: expected BEP of bucket $C=\{c_1,c_2,\ldots,c_n\}$ in a Rayleigh fading
283: environment, conditional on the (discrete) distribution of the
284: configurations in~$C$.
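
As a minimal illustration with hypothetical values (chosen for
convenience, not taken from our dataset), consider a bucket of $n=2$
mutually independent points with $p_1=0.25$, $p_2=0.75$,
$E[b_1]=4\times{}10^{-4}$, $E[b_2]=8\times{}10^{-4}$,
$\sigma_1=2\times{}10^{-4}$, and $\sigma_2=10^{-4}$. Then
$$E[B]=0.25\cdot{}4\times{}10^{-4}+0.75\cdot{}8\times{}10^{-4}=7\times{}10^{-4},$$
$$\Sigma^2=0.25^2\cdot{}4\times{}10^{-8}+0.75^2\cdot{}10^{-8}=8.125\times{}10^{-9},$$
so $\Sigma\approx{}9.0\times{}10^{-5}$. Because $p_k^2\le{}p_k$, the
bucket variance $\Sigma^2$ never exceeds the weighted average
$\sum_{k=1}^np_k\sigma_k^2$ of the point variances (here
$1.75\times{}10^{-8}$).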
285:
The values $\{p_k\}_{k=1}^n$ are what statisticians call
287: \emph{prior probabilities}. For most purposes of this \paper{}, we simply
288: estimate $\{p_k\}_{k=1}^n$ from available data. These values are
289: explicitly or implicitly constructed during experiment design and we
290: assume that they remain constant during experiment analysis. However,
291: one can collect additional data as long as doing so does not change
292: $\{p_k\}_{k=1}^n$. Prior probabilities can come from a number of
293: sources: channel sounding measurements, propagation simulations,
294: hardware and budget constraints, or even educated guesses by wireless
295: system designers. The rest of the \paper{} silently assumes that the
296: values $\{p_k\}_{k=1}^n$ have been established beforehand. It is
297: important to remember that even though the prior probabilities are for
298: the most part transparent to the analysis presented here, they
299: nonetheless always exist and all conclusions of data analysis are made
300: conditional on the prior probabilities.
301:
302: This discussion of $\{p_k\}_{k=1}^n$ can be interpreted as a deferral
303: of the exact definition of~$B$ until experiment setup, or as
304: parameterization of the analysis procedure. A natural question is
305: whether or not this level of parameterization is sufficient. It is
306: sufficient for the purposes of this \paper{} but, strictly speaking,
307: the interrelations between $\{b_k\}_{k=1}^n$ should also be defined
308: during experiment setup. Mutual independence of $\{b_k\}_{k=1}^n$ is a
309: simplifying assumption and it might be desirable to model interactions
310: between $\{b_k\}_{k=1}^n$ in practice. This implies adding covariance
311: terms to~$\Sigma^2$ and re-thinking the distribution of~$B$. Such
312: analysis is necessarily specific to a particular experiment. For the
313: sake of simplicity, the rest of this \paper{} assumes mutual
314: independence of variables in a given bucket.
315:
316: \subsection{Confidence Estimation}
317:
318: Point and bucket estimates of the expected BEP are meaningful
319: performance metrics for wireless systems. Let us also estimate our
320: confidence in these estimates. Confidence analysis enables wireless
321: system designers to make more practical claims than point estimates
322: alone. A statement of the form `this configuration will exhibit
323: acceptable performance in 95\% of the cases' is often preferable to a
324: statement of the form `the expected BEP of this configuration is
325: approximately $5\times{}10^{-4}$'. More precisely, we say that
326: configuration~$c_k$ \emph{exhibits acceptable performance} when the
327: expected BEP $E[b_k]$ of configuration $c_k$ is below some fixed
328: threshold~$T$. This statement is conditional on the temporal
329: simulation assumptions, i.e., Rayleigh fading. Standard values for~$T$
330: are $10^{-3}$ for voice quality systems and $10^{-6}$ for data quality
331: systems. Likewise, we say that bucket~$C$ (a subspace of
332: configurations) \emph{exhibits acceptable average performance} when the
333: expected BEP $E[B]$ of bucket~$C$ is below some fixed threshold~$T$.
334: This statement is conditional on both the temporal simulation
335: assumptions and the distribution of configurations $\{c_k\}_{k=1}^n$ in
336: the bucket (the prior probabilities).
337:
The confidence that configuration~$c_k$ (resp.\ bucket~$C$) exhibits
acceptable (average) performance is $P(E[b_k]<T)$ (resp.\
$P(E[B]<T)$). Since $b_k$ and~$B$ are Gaussian, these probabilities
341: can be estimated as
342: $$P(E[b_k]<T)\approx{}F_{n_k-1}\left(\frac{T-\hat{b}_k}{\hat{\sigma}_k/\sqrt{n_k}}\right),\quad
343: P(E[B]<T)\approx{}F_{N-1}\left(\frac{T-\hat{B}}{\hat{\Sigma}/\sqrt{N}}\right),$$
344: where $F_{N-1}(\cdot)$ is the CDF of the Student~$t$ distribution with
345: $N-1$ degrees of freedom and $n_k$ and~$N$ are the sample sizes for
346: configuration~$c_k$ and bucket~$C$, respectively. For
347: configuration~$c_k$,
348: $$\hat{b}_k=\frac{1}{n_k}\sum_{j=1}^{n_k}\hat{x}_{kj},\quad
349: \hat{\sigma}^2_k=\frac{1}{(n_k-1)}\sum_{j=1}^{n_k}(\hat{x}_{kj}-\hat{b}_k)^2,$$
350: where $\hat{b}_k$ and $\hat{\sigma}_k^2$ are the estimates of the
351: expected BEP and the BEP variance at point~$c_k$, $n_k\ge{}2$ is sample
352: size, and $\{\hat{x}_{kj}\}_{j=1}^{n_k}$ are sample values. For
353: bucket~$C$, we substitute point estimates into
354: $E[B]=\sum_{k=1}^np_kE[b_k]$ and $\Sigma^2=\sum_{k=1}^np_k^2\sigma_k^2$
355: to obtain $$\hat{B}=\sum_{k=1}^n\hat{p}_k\hat{b}_k, \quad
356: \hat{\Sigma}^2=\sum_{k=1}^n\hat{p}_k^2\hat{\sigma}_k^2,$$
357: %\quad N=\min_{1\le{}k\le{}n}n_k,$$
358: where $\hat{B}$ and $\hat{\Sigma}^2$ are
359: the estimates of the expected BEP and the BEP variance at bucket~$C$,
360: %$N$ serves the role of `bucket sample size',
361: and
362: $\{\hat{p}_k\}_{k=1}^n$ are the prior probabilities estimated from the
363: dataset as $\hat{p}_k=n_k/\sum_{i=1}^nn_i$. Observe that
364: $$\hat{B}=\sum_{k=1}^n\hat{p}_k\hat{b}_k=\frac{1}{\sum_{k=1}^nn_k}\sum_{k=1}^n\sum_{j=1}^{n_k}\hat{x}_{kj}$$
365: is exactly the sample mean of all observations in the bucket, but
366: $\hat{\Sigma}^2$ is \emph{not} the variance
367: %and $N$ is \emph{not} the size
368: of this sample. This is the case because
369: $\{\{\hat{x}_{kj}\}_{j=1}^{n_k}\}_{k=1}^n$ are not i.i.d. samples from
370: the mixture distribution of~$B$---they are samples from the constituent
371: distributions of $\{b_k\}_{k=1}^n$.
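
To make the point-level computation concrete, suppose (hypothetically;
these values are not from our dataset) that configuration~$c_k$ has
$n_k=4$ samples
$\{\hat{x}_{kj}\}_{j=1}^{4}=\{6,8,7,7\}\times{}10^{-4}$ and that
$T=10^{-3}$. Then
$$\hat{b}_k=7\times{}10^{-4},\quad
\hat{\sigma}_k^2=\frac{(1+1+0+0)\times{}10^{-8}}{3}\approx{}6.67\times{}10^{-9},\quad
\frac{T-\hat{b}_k}{\hat{\sigma}_k/\sqrt{n_k}}\approx\frac{3\times{}10^{-4}}{4.08\times{}10^{-5}}\approx{}7.35.$$
The $0.995$ quantile of the Student~$t$ distribution with 3 degrees of
freedom is approximately $5.84$, so $F_3(7.35)>0.995$. In this
illustrative case we would be more than 99.5\% confident that $c_k$
exhibits acceptable performance.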
372:
373: \section{Extended Example}
374: \label{sec:w-example}
375:
376: Let us now apply the techniques developed so far to analyze the
377: performance of a space of configurations. The wireless systems under
378: consideration employ WCDMA technology with two-branch STTD and
379: $1/2$~rate convolutional coding. We require that the transmitter has
380: two antennas (branches) separated by a distance large enough for their
381: signals to be uncorrelated, but small enough for the mean path losses
382: and impulse responses of their channels to be approximately equal at
383: receiver locations of interest. We assume Rayleigh flat fading
384: channels, which is reasonable for indoor applications in the ISM and
385: UNII carrier frequency bands (2.4 and 5.2~GHz, respectively). The goal
386: is to study the effect of power imbalance between the branches on the
387: BEP of the configurations across a wide range of average SNRs.
388:
389: This section presents a number of plots that summarize simulated BERs.
390: We also outline the process of statistically significant sampling of
391: the configuration space. The next section develops a data mining
392: methodology that solves a practically important problem: given a
393: dataset similar to the one presented next, find a region of the
394: configuration space where we can confidently claim that configurations
395: will exhibit acceptable (average) performance.
396:
397: \begin{figure}
398: \begin{center}
399: \includegraphics[width=4.0in]{wcdma_sttd1} \\
400: \medskip
401: \includegraphics[width=4.0in]{sttd_stat}
402: \end{center}
403: \caption[BEP estimates for a space of configurations.]{(top) Estimates
404: of the BEPs for a space of configurations $\{c_k\}_{k=1}^M$ ($M=1600$
405: points at 10000 frames per point). The $X$ and~$Y$ axes are the
406: average SNRs of the branches (in~dB). The $Z$ axis is the (base ten)
407: logarithm of the simulated BER. These estimates are not statistically
408: significant. (bottom) Statistically significant estimates
409: $\{\hat{b}_k\}_{k=1}^M$ of the expected BEPs $\{E[b_k]\}_{k=1}^M$ for
410: the same space of configurations $\{c_k\}_{k=1}^M$. For the most part,
411: we are 90\% confident that the estimated expected BEP lies within 10\%
412: of its true value. See text for exceptions.}
413: \label{fig:space}
414: \end{figure}
415:
416: \begin{figure}
417: \begin{center}
418: \includegraphics[width=4.0in]{sttd_stat_fixed_alpha} \\
419: \bigskip
420: \includegraphics[width=4.0in]{sttd_stat_fixed_power}
421: \end{center}
422: \caption[1D slices of the surface in Figure~\ref{fig:space}.]{1D slices
423: of the configuration space $\{c_k\}_{k=1}^M$ with fixed branch power
424: imbalance factor $\alpha=10^{-0.1|S_1-S_2|}$ and varying effective SNR
425: $S=10\log_{10}\left((10^{0.1S_1}+10^{0.1S_2})/2\right)$ (top), and
426: fixed effective SNR~$S$ and varying branch power imbalance
427: factor~$\alpha$ (bottom). These slices were computed from the surface
428: fit onto the data in Figure~\ref{fig:space} (bottom). The entire
429: fitted surface is shown in Figure~\ref{fig:fitted}.}
430: \label{fig:fixed-slices}
431: \end{figure}
432:
433: \begin{figure}
434: \begin{center}
435: \includegraphics[width=4.0in]{sttd_stat_local_fit}
436: \end{center}
437: \caption[A surface fitted onto the data in Figure~\ref{fig:space}.]{A
438: surface fitted onto the statistically significant results in
439: Figure~\ref{fig:space} (bottom). We used a local linear least squares
440: regression with a 5\% neighborhood and tricubic weighting. This
441: procedure was chosen because it can approximate the relatively steep
442: edge of the tolerance region. See~\cite{dm-stat} for details.}
443: \label{fig:fitted}
444: \end{figure}
445:
446: Let us begin with an initial sample of the configuration space, as
447: shown in Figure~\ref{fig:space} (top). This figure shows the simulated
448: BER as a 2D function $\hat{f}(S_1,S_2)$ of the average branch bit
449: energy-to-noise ratios (SNRs) $S_1$ and~$S_2$, in~dB. The parallel
450: simulation ran for three days on 120 machines (AMD Athlon 1.0~GHz) at a
speed of approximately 2.5 points per machine per day. For each of the
820 points $S_2=3,4,\ldots,42$; $S_1=3,4,\ldots,S_2$, we simulated 10000
frames, or 800000 information bits. Since $\hat{f}(S_1,S_2)$ is
454: symmetric~\cite{sttd-stutzman}, we show $M=1600$ points
455: $\{c_k\}_{k=1}^M$ for a full cross-product of $S_1$ and~$S_2$.
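
The point counts can be checked directly. Each of $S_1$ and~$S_2$
takes $42-3+1=40$ integer values, so the triangular sweep
$S_1\le{}S_2$ contains
$$\sum_{m=1}^{40}m=\frac{40\cdot{}41}{2}=820$$
points and the full cross-product contains $M=40^2=1600$ points.
Likewise, 800000 information bits in 10000 frames corresponds to 80
information bits per frame.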
456:
457: Wireless system designers are more accustomed to 1D slices of the
458: configuration space, e.g., the ones shown in
459: Figure~\ref{fig:fixed-slices}. Define the \emph{branch power imbalance
460: factor} $$\alpha=10^{-0.1|S_1-S_2|},$$ where $S_1$ and~$S_2$ are the
461: average SNRs of the branches, in~dB. (This definition applies as long
462: as the mean path losses of the branches are equal.) By definition,
463: $0\le\alpha\le{}1$, where zero corresponds to a total malfunction of
464: one of the branches and one corresponds to a perfect balance of branch
465: powers. The graphs in Figure~\ref{fig:fixed-slices} were obtained by
466: fixing $\alpha$ and varying the \emph{effective SNR}
467: $$S=10\log_{10}\left((10^{0.1S_1}+10^{0.1S_2})/2\right),$$ in~dB (top),
468: and fixing the effective SNR and varying $\alpha$ (bottom). (Note that
469: fixing the effective SNR is equivalent to fixing total transmitter
470: power.) The sample of configurations came from the
471: dataset shown in Figure~\ref{fig:space} (bottom), described in detail
472: later. However, this sample does not contain
473: the exact points for typical slices, so we
474: used a fitted surface---Figure~\ref{fig:fitted}---to approximate the
475: BERs for the slices in~Figure~\ref{fig:fixed-slices}. We choose to
476: work with the axes $S_1,S_2$ in Figure~\ref{fig:space} because it
477: simplifies the discussion later.
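
For instance, take the illustrative values $S_1=10$~dB and
$S_2=20$~dB (chosen for convenience, not drawn from the plotted
slices). Then
$$\alpha=10^{-0.1\cdot{}10}=0.1,\quad
S=10\log_{10}\left((10^{1}+10^{2})/2\right)=10\log_{10}55\approx{}17.4\mbox{ dB},$$
so the effective SNR is about 2.6~dB below that of the stronger
branch. In the balanced case $S_1=S_2=20$~dB, the same formulas give
$\alpha=1$ and $S=20$~dB.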
478:
479: What can be gathered from Figure~\ref{fig:space} (top)? The deep
480: valley along the diagonal is due to the fact that, provided that the
481: effective SNR is fixed, we expect the BEP to be smallest when the
482: branch power is balanced ($S_1=S_2$, $\alpha=1$)~\cite{sttd-stutzman}.
483: Somewhat less expected were (a)~the wide \emph{tolerance region} where
484: $|S_1-S_2|$ is large (up to 12~dB) but the BER is still small, (b)~a
485: very sharp decline in performance at the edge of the tolerance region,
486: and (c)~a region of high local variability in the upper part of the
487: diagonal. The surface is truncated at
488: $$\min_{1\le{}k\le{}M}\{\hat{b}_k\}=3.75\times{}10^{-6}$$ because
489: smaller estimates of the (expected) BEP require an enormous computation
time due to the convergence properties of Monte Carlo estimation (more
491: on this below).
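
A rough calculation illustrates the cost. Under the simplifying
assumption (ours, for illustration only) that bit errors are
independent, a BER estimate from $m$~bits with true BEP~$p$ has
relative standard deviation
$$\frac{\sqrt{p(1-p)/m}}{p}\approx\frac{1}{\sqrt{mp}}$$
for small~$p$. With $m=800000$ and $p=10^{-6}$ we have $mp=0.8$, so a
single sample is expected to contain less than one bit error and the
relative standard deviation exceeds 100\%. Reaching a 10\% relative
standard deviation would require $mp\approx{}100$, i.e., about $10^8$
bits, or 125 samples of 10000 frames each. Bit errors under fading and
convolutional decoding are correlated, so the actual requirement is,
if anything, likely to be higher. Note also that
$3.75\times{}10^{-6}=3/800000$, which corresponds to three bit errors
in 800000 bits.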
492:
493: \subsection{Statistically Significant Sampling Methodology}
494:
495: The initial sample looks reasonable and uncovers interesting trends in
496: system performance, but it does not contain enough information to make
497: statistically significant claims. Estimating the probability that a
498: configuration exhibits acceptable average performance requires several
499: samples per point~$c_k$. The simulation is computationally expensive
500: and different regions of the configuration space exhibit different
501: variability. Therefore, we must define tight stopping criteria for
502: sampling. Figure~\ref{fig:space} (bottom) shows the output obtained
503: with the following (per point~$c_k$) stopping criteria. The criteria
504: are designed to achieve high estimation accuracy.
505: \begin{enumerate}
\item Sampling $\{\hat{x}_{kj}\}$ stops when the absolute error in
the estimate~$\hat{b}_k$ of the expected BEP $E[b_k]$ is smaller than
the \emph{relative accuracy threshold} $\beta=0.1$ times the current
estimate~$\hat{b}_k$, at a $\gamma=0.9$ confidence level, i.e., when
510: $$P(|E[b_k]-\hat{b}_k|<\beta\hat{b}_k)\ge\gamma.$$ We required
511: $n_k\ge{}2$ samples to obtain an estimate~$\hat{\sigma}_k^2$ of the BEP
512: variance~$\sigma_k^2$. Notice that the target is the relative error,
513: not the absolute error, because the range of $\{\hat{b}_k\}_{k=1}^M$ in
514: the configuration space spans four orders of magnitude. Therefore,
515: absolute error measures are misleading.
516: \item Sampling $\{\hat{x}_{kj}\}$ also stops when we can say,
517: with confidence $\gamma=0.9$, that the expected BEP $E[b_k]$ is below
518: the \emph{sampling threshold}~$t=10^{-4}$, i.e., when
519: $$P(E[b_k]<t)\ge\gamma.$$ This work considers voice quality
520: applications, so the exact value of the expected BEP is irrelevant as
521: long as it is smaller than the performance threshold~$T=10^{-3}$. The
522: sampling threshold~$t$ was set to an order of magnitude below the
523: performance threshold~$T$ to avoid large approximation error of a
524: fitted surface near~$T$.
525: \item Finally, sampling $\{\hat{x}_{kj}\}$ stops when more than
526: 50 samples of 10000 frames each are required to satisfy either of the
previous rules. This rule fired in 5\% of the cases, all at the
boundary of the tolerance region and most near the middle of the diagonal.
529: \end{enumerate}
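
One way to evaluate the first criterion, assuming the same Student~$t$
approximation as in Section~\ref{sec:stat} (this is a sketch under
that assumption, not necessarily the exact test used by the sampling
driver), is
$$P(|E[b_k]-\hat{b}_k|<\beta\hat{b}_k)\approx{}2F_{n_k-1}\left(\frac{\beta\hat{b}_k}{\hat{\sigma}_k/\sqrt{n_k}}\right)-1.$$
With $\gamma=0.9$, the criterion then holds when
$\hat{\sigma}_k/\hat{b}_k\le\beta\sqrt{n_k}/q$, where $q$ is the $0.95$
quantile of the Student~$t$ distribution with $n_k-1$ degrees of
freedom. For example, for $n_k=6$ we have $q\approx{}2.015$, so
sampling stops once the sample standard deviation-to-mean ratio drops
below approximately $0.1\cdot\sqrt{6}/2.015\approx{}0.12$.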
530:
531: \noindent Altogether, 5154 samples were collected for an average of 6.3
532: samples per point. Needless to say, the computational expense of such
533: sampling remains too high for practical applications. While a large
534: number of samples is typically desirable (for validation purposes),
535: we will show that our data mining
536: framework makes very effective use of data
537: and thus requires fewer samples in practice.
538: Let us now look at
539: the data in more detail.
540:
541: \subsection{Results of Statistically Significant Sampling}
542:
543: \begin{figure}
544: \begin{center}
545: \includegraphics[width=4.0in]{sttd_stat_point_cdf}
546: \end{center}
547: \caption[Empirical CDF for one of the configurations.]{Empirical CDF of
21 samples for a randomly chosen point vs.\ that of the Gaussian
549: distribution with appropriate mean and variance.}
550: \label{fig:ecdf}
551: \end{figure}
552:
It is likely that the samples output by the WCDMA simulation are
554: approximately Gaussian distributed. Intuitively,
555: %We assumed that the BEPs $\{b_k\}_{k=1}^M$ are Gaussian. Intuitively,
556: we are simulating a large number of information bits (800000) per BEP
557: estimate $\hat{x}_{kj}$, so the Lindeberg condition for the Central
558: Limit Theorem should hold. Figure~\ref{fig:ecdf} shows empirical
559: evidence that this is the case. We have arbitrarily chosen one point
560: among those with 20--30 sample values $\{\hat{x}_{kj}\}_{j=1}^{n_k}$
561: and plotted the empirical CDF of this sample against that of the
562: Gaussian distribution with the mean equal to sample mean~$\hat{b}_k$
563: and the variance equal to sample variance~$\hat{\sigma}_k^2$. The
564: curves are close to each other and the Shapiro-Wilk test yields
565: $W=0.98$ ($0\le W\le 1$) and $p$-value of~$0.88$. Other points also
566: demonstrate similar curves and high values of~$W$, but $p$-values vary
567: significantly. This dataset contains sufficient samples to estimate
568: $\{E[b_k]\}_{k=1}^M$ with high relative accuracy, but 6.3 samples per
569: point are insufficient to formally justify a Gaussian assumption.
570:
571: \begin{figure}
572: \begin{center}
573: \includegraphics[width=4.0in]{sttd_stat_sample_size} \\
574: \bigskip
575: \includegraphics[width=4.0in]{sttd_stat_sample_size_2D}
576: \end{center}
577: \caption[Sample sizes for Figure~\ref{fig:space}.]{Sample sizes for
578: Figure~\ref{fig:space} (bottom). The top part shows the perspective
579: plot and the bottom part shows the scatter plot.}
580: \label{fig:sample-size}
581: \end{figure}
582:
583: \begin{figure}
584: \begin{center}
585: \includegraphics[width=4.0in]{sttd_stat_error} \\
586: \bigskip
587: \includegraphics[width=4.0in]{sttd_stat_error_2D}
588: \end{center}
589: \caption[Sample standard-deviation-to-mean ratios for
590: Figure~\ref{fig:space}.]{Sample standard deviation-to-mean ratios for
591: Figure~\ref{fig:space} (bottom). The top part shows the perspective
592: plot and the bottom part shows the scatter plot.}
593: \label{fig:sample-error}
594: \end{figure}
595:
596: It is also instructive to see some measure of how the sample variance
597: is distributed across the configuration space.
598: Figures~\ref{fig:sample-size} and~\ref{fig:sample-error} show sample
599: sizes and sample standard deviation-to-mean ratios for the samples in
600: Figure~\ref{fig:space} (recall that we prefer relative measures because
601: the range of $\{\hat{b}_k\}_{k=1}^M$ is large). Both figures indicate
602: high variance around the boundary of the tolerance region. This is not
603: surprising because the edges of the tolerance region are relatively
604: steep. Figure~\ref{fig:sample-error} also shows relatively high
605: variance at some points inside the tolerance region. This is because
606: the simulation achieved the sampling threshold~$t=10^{-4}$ and stopped
607: before it achieved the relative accuracy threshold~$\beta=0.1$.
608: Knowing this, one would expect a larger relative variance in the
609: tolerance region. Let us examine why this is not the case.
610:
611: We treat the BEP as a continuous Gaussian random variable~$b_k$, but
612: all sample values $\{\hat{x}_{kj}\}_{j=1}^{n_k}$ are discrete---they
613: are ratios of two integers, the number of errors and the number of bits
614: simulated. The simulation may not detect any bit errors when the
615: expected BEP $E[b_k]$ is relatively small (e.g., one error in the
616: number of bits simulated). Since no channel is perfect, zero is too
617: optimistic an estimate for the expected BEP. Instead, we
618: conservatively assume that at least three bit errors have been
619: detected. This is why the smallest estimate~$\hat{b}_k$ of~$E[b_k]$ is
620: $3/800000=3.75\times{}10^{-6}$. However, using any constant cutoff
621: prevents us from estimating the variance $\sigma_k^2$. We would need
622: to simulate a large number of frames to estimate $\sigma_k^2$ when the
623: expected BEP is small. Instead, we can empirically show that the
624: probability that the expected BEP is smaller than the performance
625: threshold~$T=10^{-3}$ is close to one. Let
626: $\hat{b}_k=3.75\times{}10^{-6}$ be the sample mean, $n_k=2$ be the
627: sample size, and $\sigma_k^2$ be the BEP variance at point~$c_k$ where
628: two independent simulations detected three or fewer bit errors each.
629: Sampling $\{\hat{x}_{kj}\}$ will stop because sample variance is zero,
630: so the first stopping rule applies.
631:
632: We need to show that sampling can indeed stop, i.e., that the
633: probability that the expected BEP is below the performance
634: threshold~$T$ is
635: $$P(E[b_k]<T)\approx{}F_{n_k-1}\left(\frac{T-\hat{b}_k}{\hat{\sigma}_k/\sqrt{n_k}}\right)\ge{}0.995.$$
636: This statement can only be false when
637: $(T-\hat{b}_k)\sqrt{n_k}/\hat{\sigma}_k\le{}64$, or
638: $\hat{\sigma}_k\ge{}2.2\times{}10^{-5}$, almost an order of magnitude bigger
639: than the conservative estimate $\hat{b}_k$ of the expected BEP
640: $E[b_k]$. This is unlikely because Figure~\ref{fig:sample-error}
641: (bottom) shows that the sample standard deviation rarely exceeds the
642: sample mean even by half an order of magnitude. In other words, we do
643: not have accurate estimates for variance~$\sigma_k^2$ in the tolerance
644: region. However, we can still reasonably conclude that configurations
645: exhibit acceptable performance in this region.
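
The bound of~64 used above can be checked by hand. The following is a
sketch under the assumptions already stated ($n_k=2$, so that
$F_{n_k-1}=F_1$, and the conservative estimate
$\hat{b}_k=3.75\times{}10^{-6}$); the symbol~$z$ is introduced here only
for illustration. The Student~$t$ distribution with one degree of
freedom is the standard Cauchy distribution, so
$$F_1(z)=\frac{1}{2}+\frac{1}{\pi}\arctan{}z\ge{}0.995
\quad\mbox{if and only if}\quad
z\ge\tan(0.495\pi)\approx{}63.66.$$
Substituting $z=(T-\hat{b}_k)\sqrt{n_k}/\hat{\sigma}_k$, the confidence
requirement holds whenever
$$\hat{\sigma}_k\le\frac{(10^{-3}-3.75\times{}10^{-6})\sqrt{2}}{63.66}\approx{}2.2\times{}10^{-5},$$
which is roughly $5.9$ times the conservative estimate~$\hat{b}_k$.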
646:
647: % The next section develops a data mining approach to data analysis.
648: % This approach seeks an optimal and statistically defensible region of
649: % the configuration space. Mild assumptions of region connectivity and
650: % rectilinearity make the data mining approach resistant to highly
651: % variable and scarce data. This is important in applications like the
652: % one just described, where data collection is computationally
653: % expensive.
654:
655:
656: \section{The Third Level of Aggregation: Regions}
657: \label{sec:gizmo}
658:
659: Consider a set of buckets $\{C_k\}_{k=1}^M$ with corresponding random
660: variables $\{B_k\}_{k=1}^M$. Given a number of sample values, the
661: framework developed in Section~\ref{sec:stat} allows us to estimate the
662: probabilities $\{P(E[B_k]<T)\}_{k=1}^M$ that buckets $\{C_k\}_{k=1}^M$
663: exhibit acceptable average performance. (All arguments about buckets
664: equally apply to points because a point is a special case of a bucket.)
665: This section is concerned with finding an optimal subset of random
666: variables from among $\{B_k\}_{k=1}^M$. This optimal subset
667: corresponds to an optimal region of a 2D bucket space. We would like
668: to find a sufficiently large admissible region~$R_m$ such that we are
669: sufficiently confident that buckets in~$R_m$ exhibit acceptable average
670: performance.
671:
672: There are many ways to define admissibility and we are interested in
673: adopting a definition that is both meaningful in the wireless domain
674: and permits effective data mining algorithms. Among a space of such
675: admissible regions, we can define different optimality criteria and
676: data mining then reduces to searching within this space. In this
677: \paper{}, a region~$R_m$ is admissible when it has a particular type of
678: shape. We will explore three different criteria for the mining of
679: optimal regions; the algorithms and these criteria are based on the
work of Yoda et al.~\cite{fukuda-rectilinear} and have been adapted
681: to the problem of mining simulation data in this \paper{}.
682:
683: \begin{table}
684: \begin{center}
685: \begin{tabular}{c l}
686: \hline
687: $X,Y$ & parameters that partition the point space into buckets \\
688: $M_X,M_Y$ & $X$ and $Y$ dimensions of the bucket space \\
689: $M=M_X\times{}M_Y$ & number of buckets in the bucket space \\
690: $D_X,D_Y$ & domains of $X$ and $Y$ \\
691: $\eta(m)$ & number of buckets in region $R_m$ \\
692: $C_{\kappa(m,i)}$ & $i$-th bucket in region $R_m$, $1\le{}i\le{}\eta(m)$ \\
693: $n_{\kappa(m,i)}$ & number of samples in bucket $C_{\kappa(m,i)}$\\
694: $x_{\kappa(m,i)},y_{\kappa(m,i)}$ & $X$ and $Y$ values for bucket $C_{\kappa(m,i)}$ \\
695: \hline
696: \end{tabular}
697: \end{center}
698: \caption[Summary of region notation.]{Summary of region notation. Also
699: see Table~\ref{tab:notation}.}
700: \label{tab:notation2}
701: \end{table}
702:
703: Additional notation relating buckets to regions is introduced in
704: Table~\ref{tab:notation2}. Let $X$ and~$Y$ be two discrete parameters
705: to the temporal (e.g., WCDMA) simulations such that $X$ and~$Y$
706: partition the point space into disjoint buckets $\{C_k\}_{k=1}^M$.
707: More precisely, let $X,Y$ have ordinal domains $D_X,D_Y$, let
708: $|D_X|=M_X,|D_Y|=M_Y,|D_X||D_Y|=M$, and assume that the map
709: $\rho:D_X\times{}D_Y\rightarrow{}\{C_k\}_{k=1}^M$ is bijective. In
710: other words, $X$ and~$Y$ define a discrete 2D space of buckets. Since
711: the domains of $X$ and~$Y$ are ordinal, this space is easily visualized
712: as a 2D color map or a 3D perspective plot.
713:
714: \begin{figure}
715: \begin{center}
716: \includegraphics[width=4.0in]{sttd_stat_probabilities}
717: \end{center}
718: \caption[Probabilities that configurations in Figure~\ref{fig:space}
719: exhibit acceptable performance.]{Probabilities
720: $\{P(E[b_k]<T)\}_{k=1}^M$ that configurations $\{c_k\}_{k=1}^M$
721: exhibit acceptable performance with respect to the performance
722: threshold~$T=10^{-3}$ (voice quality system). This perspective plot
723: corresponds to the STTD dataset in Figure~\ref{fig:space} (bottom).
724: The axes $S_1$ and~$S_2$ are rotated 180 degrees counter-clockwise to
725: provide a better view of the surface.}
726: \label{fig:probabilities}
727: \end{figure}
728:
729: For example, the average SNRs $S_1$ and~$S_2$ in the previous section
730: partition the space of configurations into buckets. Both $S_1$
731: and~$S_2$ vary from~3 to~42 in steps of~1 (in~dB), so $M_X=M_Y=40$ and
732: $M=40\times{}40=1600$ (recall, from Section~\ref{sec:w-example}, that
733: only 820 of these points were simulated and the remaining ones were
734: symmetrically reflected). Furthermore, the domains of $S_1$ and~$S_2$
735: are ordinal because the values of~$S_1$ and~$S_2$ are directly related
736: to the powers of the transmitter antennas. In this case, the buckets
737: are simply the points in the space of configurations. In general,
738: buckets can be convex combinations of points, as detailed in
739: Section~\ref{sec:stat}. Recall that we defined the color of a bucket
740: as the probability that the bucket exhibits acceptable average
741: performance. Figure~\ref{fig:probabilities} shows these `colors' as a
742: perspective plot for the STTD example.
743:
744: \subsection{Region Shape}
745:
746: Consider regions (subsets) of buckets in the bucket space. If the
747: shape of these regions is unconstrained, there are $2^M$ possible
748: regions $\{R_m\}_{m=1}^{2^M}$. Let region $R_m$, $1\le{}m\le{}2^M$,
749: consist of buckets $\{C_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$, where
750: $\eta(m)$, $1\le{}m\le{}2^M$, is a mapping from region number~$m$ to
751: the number of buckets in this region, and $\kappa(m,i)$,
752: $1\le{}m\le{}2^M$, $1\le{}i\le\eta(m)$, is a mapping from region
753: number~$m$ and bucket number~$i$ within region~$R_m$ to bucket
754: number~$k$, $1\le{}k\le{}M$, that we use to subscript buckets
755: $\{C_k\}_{k=1}^M$. The exact definitions of $\eta(m)$ and
756: $\kappa(m,i)$ are not important as long as they generate all possible
757: regions (subsets) $\{R_m\}_{m=1}^{2^M}$.
758:
759: \begin{figure}
760: \begin{center}
761: \includegraphics{rr-types-short}
762: \end{center}
763: \caption[Types of admissible regions.]{Some types of admissible
764: (connected rectilinear) regions. When we look at an admissible region
765: from left to right, its upper boundary must first increase and then
766: decrease monotonically, and its lower boundary must first decrease and
767: then increase monotonically.}
768: \label{fig:admissible}
769: \end{figure}
770:
771: The shape of admissible regions should be constrained because
772: unconstrained regions are hard to interpret and tend to overfit the
773: training data. Besides, the problem of selecting an optimal
774: unconstrained region is computationally intractable---all $2^M$
775: possible regions must be considered, where $M=1600$ in the STTD
776: example. The region shape can be constrained in a number of different
777: ways (rectangular, x-monotone, etc.). Our restrictions on region shape
778: are discussed next.
779:
780: Without loss of generality, assume that $D_X=\{1,2,\ldots,M_X\}$ and
781: $D_Y=\{1,2,\ldots,M_Y\}$. Intuitively, region~$R_m$ is rectilinear
782: when its intersection with any horizontal or vertical line is
783: connected. More formally, region~$R_m$ is \emph{rectilinear} if and
784: only if whenever buckets $C_{\kappa(m,i)}$ at
785: $(x_{\kappa(m,i)},y_{\kappa(m,i)})$ and $C_{\kappa(m,j)}$ at
786: $(x_{\kappa(m,j)},y_{\kappa(m,j)})$ are both in~$R_m$, then
787: (a)~$\rho(r,s) = C_{\kappa(m,i)}$ and $\rho(r,t) = C_{\kappa(m,j)}$
788: imply buckets $\rho(r,u)$ are also in~$R_m$ for all
789: $u \in [s,t]$, and
790: (b)~$\rho(r,t) = C_{\kappa(m,i)}$ and $\rho(s,t) = C_{\kappa(m,j)}$
791: imply buckets $\rho(u,t)$ are also in~$R_m$ for all
$u \in [r,s]$. Here $[a,b]$ denotes the set of all integers between the
integers $a$ and~$b$, inclusive.
794: %
795: %(a)~$x_{\kappa(m,i)}=x_{\kappa(m,j)}$ implies that each bucket
796: %$C_{\kappa(m,l)}$ where
797: %$$(y_{\kappa(m,l)}-y_{\kappa(m,i)})(y_{\kappa(m,l)}-y_{\kappa(m,j)})<0$$
798: %is also in~$R_m$, and (b)~likewise for
799: %$y_{\kappa(m,i)}=y_{\kappa(m,j)}$.
800: We use Manhattan geometry to define
801: connectedness. Region~$R_m$ is \emph{connected} if and only if for
802: every pair of buckets $C_{\kappa(m,i)}$ and $C_{\kappa(m,j)}$ in~$R_m$
803: there exists a sequence of buckets
804: $$C_{\kappa(m,i)}=C_{\kappa(m,l_1)},C_{\kappa(m,l_2)},\ldots,C_{\kappa(m,l_n)}=C_{\kappa(m,j)}$$
805: in~$R_m$ such that for every $1\le{}k<n$
806: $${\parallel \rho^{-1} (C_{\kappa(m,l_k)}) - \rho^{-1} (C_{\kappa(m,l_{k+1})}) \parallel}_1 = 1.$$
807: %$$|x_{\kappa(m,l_k)}-x_{\kappa(m,l_{k+1})}|+|y_{\kappa(m,l_k)}-y_{\kappa(m,l_{k+1})}|=1.$$
808: Furthermore, we say that region~$R_m$ is \emph{admissible} if it is
809: both rectilinear and connected.
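
Two small hypothetical regions illustrate why both conditions are
needed (the coordinates are chosen for illustration only and are not
taken from the STTD dataset). The region
$$\{\rho(1,2),\rho(1,1),\rho(2,1),\rho(3,1),\rho(3,2)\}$$
is connected, because consecutive buckets in the listed order are at
Manhattan distance one. It is not rectilinear: $\rho(1,2)$
and~$\rho(3,2)$ are in the region, but $\rho(2,2)$ is not, which
violates condition~(b) with $r=1$, $s=3$, $t=2$, and $u=2$.
Conversely, the region
$$\{\rho(1,1),\rho(2,2)\}$$
is rectilinear, because no two of its buckets share a row or a column
and conditions (a) and~(b) hold trivially. It is not connected, because
its two buckets are at Manhattan distance two and the region contains
no other bucket to link them.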
810:
811: This definition of admissibility can be viewed as a relaxed definition
812: of convexity. Geometrically, it is easy to see that region~$R_m$ is
813: admissible if and only if, when we look at~$R_m$ from left to right,
814: its upper boundary first increases and then decreases monotonically (a
815: pseudoconcave function), and its lower boundary first decreases and
816: then increases monotonically (a pseudoconvex function). In other
817: words, the region boundary need not be strictly convex or strictly
818: concave, but it must be pseudoconvex or pseudoconcave. Admissible
819: regions are informally summarized in Figure~\ref{fig:admissible}. All
820: admissible regions are composed of regions of four primitive types: W
821: (region gets wider from left to right), N (region gets narrower), U
822: (region slants up), and D (region slants down). Twelve combinations of
823: these types yield all types of admissible regions: W, WU, WUN, WD, WDN,
824: WN, UN, DN, U, D, N, and the empty region.
825:
Our choice of connected rectilinear regions is due primarily to
heuristic considerations. These considerations are commonly
828: applicable, but must be re-evaluated for each study. Both the
829: connectedness and the rectilinearity restrictions can be justified for
830: the STTD example (see next section). In general, it is easy to justify
831: connectedness, but hard to justify rectilinearity. We advocate the use
832: of connected rectilinear regions primarily because this shape is
833: resistant to noise in the sample, not because we can analytically show
834: that the region boundary is rectilinear. In data mining, the choice of
835: region shape is most commonly dictated by the desired tradeoff between
836: bias and variance~\cite{dm-stat}. Regions with flexible shape exhibit
837: small bias (they can fit any data) but high variance (they can be
838: overly sensitive to a particular dataset). Regions with rigid shape
839: exhibit high bias but small variance. Connected rectilinear regions
840: provide a reasonable tradeoff between bias and variance for many
841: applications.
842:
843: \subsection{Evaluating Regions}
844:
845: Another prerequisite to finding regions with the desired properties is
846: a definition of region `goodness'. Let us map bucket confidence
847: $P(E[B_{\kappa(m,i)}]<T)$ to a discrete range $[0\ldots{}1000]$ and
848: define the \emph{hit of bucket} $C_{\kappa(m,i)}$ as
849: $$h_{\kappa(m,i)}=\lfloor{}1000P(E[B_{\kappa(m,i)}]<T)+0.5\rfloor$$
($\lfloor{}z\rfloor$ denotes the largest integer that does not exceed
$z$), the \emph{support of bucket} $C_{\kappa(m,i)}$ as
852: $$s_{\kappa(m,i)}=1000$$ (this constant was chosen to make the
853: discretization error reasonably small), the \emph{hit of region} $R_m$
854: as $$H_m=\sum_{i=1}^{\eta(m)}h_{\kappa(m,i)},$$ and the \emph{support
855: of region} $R_m$ as
856: $$S_m=\sum_{i=1}^{\eta(m)}s_{\kappa(m,i)}=1000\eta(m).$$ The key to
857: efficient computation of optimized-confidence and optimized-support
858: admissible regions is the definition of region confidence as
859: $$\Theta_m=H_m/S_m,$$ where $H_m$ is the hit and~$S_m$ is the support
860: of region~$R_m$. Let us explore the implications of these definitions
861: in more detail.
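
To make these definitions concrete, consider a hypothetical region of
$\eta(m)=3$ buckets with confidences $0.887$, $0.995$, and~$1$
(illustrative values, not taken from the STTD dataset). The bucket
hits are $\lfloor{}887.5\rfloor=887$, $\lfloor{}995.5\rfloor=995$, and
$\lfloor{}1000.5\rfloor=1000$, so
$$H_m=887+995+1000=2882,\qquad{}S_m=3\times{}1000=3000,\qquad{}\Theta_m=\frac{2882}{3000}\approx{}0.961.$$
In this case $\Theta_m$ equals the average bucket confidence
$(0.887+0.995+1)/3$. In general the two differ by at most the
discretization error of $0.0005$ per bucket.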
862:
863: \subsubsection{Model-Based and Model-Free Analyses}
864:
Suppose that $n_{\kappa(m,i)}=6$ samples have been collected for bucket
$C_{\kappa(m,i)}$ that consists of a single point. Let the sample mean
be $\hat{B}_{\kappa(m,i)}=5\times{}10^{-4}$ and the sample standard
deviation be $\hat{\Sigma}_{\kappa(m,i)}=8.87\times{}10^{-4}$.
Furthermore, suppose that five of these samples have a BEP below
$10^{-3}$ and one has a BEP above $10^{-3}$. Then,
871: $$P(E[B_{\kappa(m,i)}]<T)\approx{}F_5\left(\frac{10^{-3}-5\times{}10^{-4}}{8.87\times{}10^{-4}/\sqrt{6}}\right)\approx{}0.887.$$
872: A purely model-free approach would interpret the above simulation
873: results as `bucket $C_{\kappa(m,i)}$ will exhibit acceptable average
874: performance in~5 out of~6 cases.' A strongly model-based approach would
875: interpret the simulation results as `we are 88.7\% confident that
876: bucket $C_{\kappa(m,i)}$ exhibits acceptable average performance.' Our
877: interpretation lies between the model-based approach and a model-free
878: approach and posits that `bucket $C_{\kappa(m,i)}$ will exhibit
879: acceptable average performance in~887 out of~1000 cases.' These
880: interpretations provide confidence estimates under different
881: simplifying assumptions.
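
The value $0.887$ in this example can be reproduced step by step,
assuming (as in Section~\ref{sec:stat}) that $F_5$ is the CDF of the
Student~$t$ distribution with five degrees of freedom. The estimated
standard error of the sample mean is
$$\frac{\hat{\Sigma}_{\kappa(m,i)}}{\sqrt{n_{\kappa(m,i)}}}=\frac{8.87\times{}10^{-4}}{\sqrt{6}}\approx{}3.62\times{}10^{-4},$$
so the argument of~$F_5$ is
$$\frac{10^{-3}-5\times{}10^{-4}}{3.62\times{}10^{-4}}\approx{}1.38,$$
and $F_5(1.38)\approx{}0.887$. With the hit defined above, this bucket
receives
$$h_{\kappa(m,i)}=\lfloor{}1000\times{}0.887+0.5\rfloor=887,$$
which is the source of the `887 out of~1000 cases' reading.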
882:
883: The model-free interpretation does not take either sample variance or
884: sample distribution into account. This interpretation is only reliable
885: for a sufficiently large number of samples, which is a luxury in our
886: application. Our middle-ground interpretation explicitly accounts for
887: sample variance and sample distribution. When sample size is small,
888: our interpretation provides a statistically valid estimate of
889: confidence that the bucket exhibits acceptable average performance.
890: For a single bucket, this interpretation is as good as a strongly
891: model-based interpretation, modulo a reasonably small discretization
892: error. However, our interpretation diverges from the model-based
893: interpretation at the region level.
894:
895: A strongly model-based analysis procedure would define a region random
896: variable
897: $$Q_m=\frac{1}{W_m}\sum_{i=1}^{\eta(m)}w_{\kappa(m,i)}B_{\kappa(m,i)},$$
898: where $\{B_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ are bucket random
899: variables, $\{w_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ are \emph{a priori}
900: (positive) constant weights, and
901: $$W_m=\sum_{i=1}^{\eta(m)}w_{\kappa(m,i)}$$ is a normalization factor
902: that maps these weights to probabilities of bucket occurrence in the
903: region. A procedure similar to that in Section~\ref{sec:stat} would
904: then be used to estimate $P(E[Q_m]<T)$ for a threshold~$T$. The
905: result of this calculation can be interpreted as the probability that
906: region~$R_m$ exhibits acceptable average performance, conditional on
907: the temporal simulation assumptions, the bucketing prior probabilities,
908: and the region prior probabilities. However, as we shall see later,
909: this definition of region confidence violates a property that permits
910: an efficient data mining algorithm.
911:
912: We think of region confidence in terms of average bucket confidence
913: over the whole region, namely,
914: $$\Theta_m\approx{}\frac{1}{\eta(m)}\sum_{i=1}^{\eta(m)}P(E[B_{\kappa(m,i)}]<T).$$
915: (If region size $\eta(m)$ is large enough, we can reasonably expect the
916: discretization errors to cancel each other.) This interpretation
917: of~$\Theta_m$ does not correspond to the strongly model-based
918: probability that region~$R_m$ exhibits acceptable average performance.
919: Instead, we define a region random variable~$P_m$ as the probability
920: that \emph{any} bucket $C_{\kappa(m,i)}$ in region~$R_m$ exhibits
921: acceptable average performance. Then, we estimate the expected value
922: $E[P_m]$ across the region~$R_m$ by the sample mean
923: $\hat{P}_m\approx\Theta_m$ of estimates of bucket confidences
924: $\{P(E[B_{\kappa(m,i)}]<T)\}_{i=1}^{\eta(m)}$.
925:
926: How do these two definitions relate to each other? It is easy to show
927: that they are equivalent only under very restrictive assumptions.
928: Basically, we are assuming that the buckets are mutually independent,
929: that population variance is small, and that the region is consistent,
930: i.e., `good' and `bad' buckets are never mixed in the same region. Let
931: bucket random variables $\{B_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ be
932: mutually independent, let the estimates
933: $\{\hat{\Sigma}_{\kappa(m,i)}^2\}_{i=1}^{\eta(m)}$ of bucket variances
934: be approximately equal to zero, and let the estimates
935: $\{\hat{B}_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ of bucket expected BEPs be
936: either all greater than the performance threshold~$T$ or all smaller
937: than the performance threshold~$T$ (i.e., all
938: $\{T-\hat{B}_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ have the same sign).
939: Then, bucket confidences
940: $$P(E[B_{\kappa(m,i)}]<T)\approx{}F_{n_{\kappa(m,i)}-1}\left(\frac{T-\hat{B}_{\kappa(m,i)}}{\hat{\Sigma}_{\kappa(m,i)}/\sqrt{n_{\kappa(m,i)}}}\right),$$
941: $1\le{}i\le\eta(m)$, will be either all approximately equal to zero
942: ($\hat{B}_{\kappa(m,i)}>T$), or all approximately equal to one
943: ($\hat{B}_{\kappa(m,i)}<T$). Therefore, region confidence~$\Theta_m$
944: will be approximately equal to zero or one. Likewise, the strongly
945: model-based region confidence
946: $$P(E[Q_m]<T)\approx{}F_{\eta(m)-1}\left(\frac{T-\hat{Q}_m}{\hat{\Psi}_m/\sqrt{\eta(m)}}\right)$$
947: will be approximately equal to zero or one because the estimate
948: $\hat{\Psi}^2_m$ of region variance is (see Section~\ref{sec:stat})
949: $$\hat{\Psi}_m^2=\frac{1}{W_m^2}\sum_{i=1}^{\eta(m)}w_{\kappa(m,i)}^2\hat{\Sigma}_{\kappa(m,i)}^2\approx{}0.$$
950: The sign of $T-\hat{Q}_m$ determines whether $P(E[Q_m]<T)$ is
951: approximately equal to zero or one. After a minor rearrangement of
952: terms,
953: $$T-\hat{Q}_m=\frac{1}{W_m}\sum_{i=1}^{\eta(m)}w_{\kappa(m,i)}(T-\hat{B}_{\kappa(m,i)}).$$
954: We assumed that $\{T-\hat{B}_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ have the
955: same sign, so we have shown that $P(E[Q_m]<T)\approx\Theta_m$. The
956: equality is asymptotically exact as all variance estimates
957: $\{\hat{\Sigma}_{\kappa(m,i)}^2\}_{i=1}^{\eta(m)}$ approach zero. This
958: argument applies regardless of the distributions of
959: $\{B_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$, as long as these random
960: variables are mutually independent.
961:
962: \subsection{Optimized Regions}
963:
964: We now pursue the definition of optimized regions. Given a
\emph{slope}~$\tau$, $0\le\tau\le{}1$, define the \emph{gain} of region~$R_m$,
966: $1\le{}m\le{}2^M$, as $$G(R_m,\tau)=H_m-\tau{}S_m,$$ where $H_m$ is the
967: region hit and~$S_m$ is the region support. Let an
968: \emph{optimized-gain admissible region}~$R_\tau$ with respect to
969: slope~$\tau$, $0\le{}\tau\le{}1$, be an admissible region with the
970: maximum gain $G(R_\tau,\tau)$ over all admissible regions (this region
971: need not be unique). Optimized-gain admissible regions are easy to
972: define, compute, and analyze, but hard to interpret. Common practice
973: is to define optimized-confidence and optimized-support admissible
974: regions. Admissible region~$R_*$ is an \emph{optimized-confidence
975: admissible region} with respect to a given support
976: threshold~$1000\eta$, $0\le{}\eta\le{}M$, if $R_*$ has the maximum
977: confidence $\Theta_*=H_*/S_*$ among all admissible regions with support
978: of at least $1000\eta$. Likewise, admissible region~$R_\diamond$ is an
979: \emph{optimized-support admissible region} with respect to a given
980: confidence threshold~$\theta$, $0\le{}\theta\le{}1$, if $R_\diamond$
981: has the maximum support $S_\diamond=1000\eta(\diamond)$ among all
982: admissible regions with confidence of at least~$\theta$. In other
983: words, we can either fix the region confidence~$\theta$ and find the
984: largest region~$R_\diamond$ with confidence of at least~$\theta$, or we
985: can fix the minimum region size (support) $1000\eta$ and find the most
986: confident region~$R_*$ with support of at least $1000\eta$.
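
A short worked example may help relate gain to confidence. Because
$S_m=1000\eta(m)$, the gain can be rewritten bucket by bucket as
$$G(R_m,\tau)=\sum_{i=1}^{\eta(m)}\left(h_{\kappa(m,i)}-1000\tau\right)=S_m(\Theta_m-\tau).$$
Hence a nonempty region has positive gain exactly when its confidence
exceeds~$\tau$, and adding a bucket raises the gain exactly when the
hit of that bucket exceeds $1000\tau$. For instance, take a
hypothetical region with bucket hits 887, 995, and~1000 (illustrative
values only) and slope $\tau=0.9$. Then
$$G(R_m,0.9)=2882-0.9\times{}3000=182.$$
Adding a fourth bucket with hit~600 changes the gain by
$600-900=-300$, to $-118$, so an optimized-gain region for this slope
would not include it unless the shape constraints required it to reach
more valuable buckets.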
987:
Observe that~$\tau$ in the definition of an optimized-gain admissible
region controls the tradeoff between confidence and support.
We can find a small region with high confidence or a large region with
low confidence, but both objectives cannot be maximized
992: simultaneously. Increasing~$\tau$ will increase the confidence of the
993: optimized-gain admissible region, but decrease its support. Likewise,
994: decreasing~$\tau$ will decrease the confidence of the optimized-gain
995: admissible region, but increase its support. Therefore, we can find
996: approximate optimized-confidence and optimized-support admissible
997: regions by a binary search for the value of~$\tau$ where the respective
998: threshold is barely satisfied. The search can stop at a given level of
999: precision $\Delta\tau$, where the lower bound on $\Delta\tau$ can be
1000: found in~\cite{fukuda-rectilinear} (they show that the number of steps
1001: in this search is logarithmic in the support $1000M$ of the bucket
1002: space). This algorithm is approximate because an optimized-confidence
(resp.\ optimized-support) admissible region need not be an
1004: optimized-gain admissible region for any value of~$\tau$. Yoda et
1005: al.~\cite{fukuda-rectilinear} argue that this approximation is
1006: reasonable for large datasets.
1007:
1008: Let us revisit the definition of region `goodness'. Geometrically, the
1009: buckets with the same value of~$X$ are the columns and the buckets with
1010: the same value of~$Y$ are the rows. An optimized-gain admissible
1011: region can be computed in $O(M_XM_Y^2)$ time by a set of rules of the
1012: following form. Recall that a region of type W gets wider from left to
1013: right (see Figure~\ref{fig:admissible}). Let $R_W(m,[s,t])$ be the
1014: region of type W with maximum gain $f_W(m,[s,t])$ over all admissible
1015: regions of type W that end in column~$m$ and span rows~$s$ through~$t$
1016: in this column. Then, either (a)~$m$ is the first column of
1017: $R_W(m,[s,t])$, or (b)~$R_W(m,[s,t])$ includes the region
$R_W(m-1,[s',t'])$ with the maximum gain $f_W(m-1,[s',t'])$ over all
admissible regions that end in column~$m-1$ and span rows $s'\ge{}s$
through $t'\le{}t$ in this column. The algorithm
in~\cite{fukuda-rectilinear} keeps the
1021: regions with maximum gain for every region type and every triple
1022: $(m,[s,t])$ in a dynamic programming table. These locally maximal
1023: regions then grow according to a set of rules that compute an
1024: optimized-gain admissible region. This efficient greedy algorithm for
1025: computing optimized-gain admissible regions depends on the property of
1026: the gain function that we refer to as monotonicity. Let
1027: $0\le\tau\le{}1$ be a slope and~$R_{m'}$ and~$R_{m''}$ be two
1028: admissible regions with gains $G(R_{m'},\tau)\ge{}G(R_{m''},\tau)$.
1029: The gain function $G(R_m,\tau)$ is \emph{monotonic} if for any region
1030: $R_k$ disjoint with both~$R_{m'}$ and~$R_{m''}$
1031: $$G(R_{m'}\cup{}R_k,\tau)\ge{}G(R_{m''}\cup{}R_k,\tau),$$ where the
1032: union of regions is defined in the obvious way. It is easy to see that
1033: our gain function
1034: $$G(R_m,\tau)=H_m-\tau{}S_m=\sum_{i=1}^{\eta(m)}\lfloor{}1000P(E[B_{\kappa(m,i)}]<T)+0.5\rfloor-1000\tau\eta(m)$$
1035: is monotonic because it is additive. However, a strongly model-based
1036: gain function $$G^{(M)}(R_m,\tau)=P(E[Q_m]<T)-\tau\eta(m)/M$$ is not
1037: monotonic even if we assume independence of bucket random variables
1038: $\{B_{\kappa(m,i)}\}_{i=1}^{\eta(m)}$ that make up~$Q_m$. To the best
1039: of our knowledge, only monotonic gain functions are known to result in
1040: practical algorithms for computing optimized-gain admissible regions.
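
The two ingredients of the preceding paragraph can be written out
explicitly. What follows is only a sketch: the symbol $g(m,[s,t])$ is
introduced here for illustration and denotes the gain of the buckets in
column~$m$, rows $s$ through~$t$. The complete set of rules for all
region types is given in~\cite{fukuda-rectilinear}, which need not
organize the computation in this literal form. Read literally, cases
(a) and~(b) for regions of type~W suggest the recurrence
$$f_W(m,[s,t])=g(m,[s,t])+\max\left\{0,\;\max_{s\le{}s'\le{}t'\le{}t}f_W(m-1,[s',t'])\right\},$$
where the zero corresponds to case~(a) and the inner maximum to
case~(b). Monotonicity follows from additivity. If $R_k$ is disjoint
from both $R_{m'}$ and~$R_{m''}$, then hits and supports simply add, so
$$G(R_{m'}\cup{}R_k,\tau)=G(R_{m'},\tau)+G(R_k,\tau)\ge{}G(R_{m''},\tau)+G(R_k,\tau)=G(R_{m''}\cup{}R_k,\tau).$$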
1041:
1042: What happens when no estimates of mean and/or variance are available
1043: for some bucket~$C_k$? The answer to this question depends on
1044: problem-specific considerations. As was demonstrated in
1045: Section~\ref{sec:w-example}, it is sometimes possible to provide
1046: conservative estimates for these values. For example, we have
1047: empirically shown that the expected BEPs of some configurations
1048: $\{c_k\}$ are smaller than $T=10^{-3}$ with confidence
1049: $P(E[b_k]<T)\ge{}0.995$. Likewise, we know that as the effective SNR
approaches negative infinity (in~dB), the BEP approaches 0.5, which is
the probability of incorrectly guessing the value of a random bit when
the transmitter is turned off. Thus, we can let $P(E[b_k]<T)=0$ for
1053: points with sufficiently small effective SNRs and a reasonable
1054: performance threshold~$T$. If no such estimates are available, we can
1055: simply omit the missing buckets from the probability computation. This
1056: must be done with care because such buckets will contribute nothing to
1057: the confidence of the region. This fact can be used to reduce the
1058: computational expense of sampling.
1059:
1060: This section has highlighted the sometimes contradictory objectives
1061: that aggregation must satisfy: permit valid statistical interpretations
1062: and afford structure that can be exploited by data mining algorithms.
1063: Our approach has been a judicious mix of concepts from both statistics
1064: and data mining. We showed that our formulation of the data mining
1065: problem lies between the completely model-free approach and the
1066: strongly model-based approach.
1067: %In particular, mutual independence of
1068: %bucket random variables is crucial to efficient computation of
1069: %optimized admissible regions. If the covariance terms are
1070: %non-negligible, our approach is no longer asymptotically equivalent to
1071: %the strongly model-based approach. Likewise, interactions between the
1072: %buckets break the monotonicity of the gain function and make one
1073: %question the greedy region expansion strategy.
1074: The next section
1075: applies the data mining methodology described here to the example in
1076: Section~\ref{sec:w-example}.
1077:
1078:
1079: \section{Optimized-Support Regions for the STTD Example}
1080: \label{sec:experiments}
1081:
1082: This section continues the example in Section~\ref{sec:w-example}.
1083: First, we show that optimized-gain regions are both rectilinear and
1084: connected for this example. It immediately follows that
1085: optimized-support and optimized-confidence regions are also
1086: admissible. An optimized-support admissible region is presented next.
1087: We show that the elaborate region mining setup leads to simple
1088: engineering interpretations. Finally, we look at the performance of
1089: data mining when the number of samples is small. Three-fold
1090: cross-validation shows that data mining performs well under these
1091: circumstances.
1092:
1093: \subsection{Justification of Data Mining for the STTD Example}
1094:
1095: Let the average SNRs $S_1=X$ and~$S_2=Y$ partition the space of
1096: configurations in Figure~\ref{fig:space} into disjoint points (buckets)
1097: $\{c_k\}_{k=1}^M$, $1\le{}M\le{}1600$.
1098: We now give an intuitive
1099: argument to justify the suitability of the data mining algorithm for the
1100: STTD study.
1101: Without loss of generality,
1102: consider only the points with $X\le{}Y$, i.e., $S_1\le{}S_2$. It is
1103: easy to extend all arguments to $X>Y$, but this adds little to the
1104: discussion.
1105:
1106: Let $c_1$ at $(x_1,y_1)$ and $c_2$ at $(x_2,y_1)$, $x_1<x_2<y_1$, be
1107: two points in an optimized-gain region (of arbitrary shape) for some
1108: slope $0<\tau<1$ (see Figure~\ref{fig:admissible-example}). This means
1109: that the confidences of these points are one, and thus the expected
1110: BEPs of these points are smaller than the performance threshold~$T$.
1111: When $x_1,x_2<y_1$ and $y_1$ is fixed, the BEP is a monotonically
1112: decreasing function of $x$---increasing~$x$ decreases the power
1113: imbalance and increases the effective SNR, so the BEP must decrease.
1114: Therefore, the expected BEP of any point~$c_u$ at $(x_u,y_1)$,
1115: $x_1<x_u<x_2$, is below the performance threshold~$T$. Thus, the
1116: confidences of points~$\{c_u\}$ are one and these points must also be in the
1117: optimized-gain region~$R_\tau$. Three more symmetric arguments of this
1118: kind show that optimized-gain regions are rectilinear.
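
The monotonicity step above can be written compactly. As a sketch, let
$p(x,y)$ denote the expected BEP of the configuration at $(x,y)$; this
symbol is introduced here only for illustration. The assumption is
that, for fixed~$y_1$, $p(\cdot,y_1)$ is decreasing on $x\le{}y_1$.
Then, for every $x_1<x_u<x_2<y_1$,
\[
p(x_u,y_1) \;<\; p(x_1,y_1) \;<\; T,
\]
so membership of~$c_1$ alone already forces every point to its right,
up to the diagonal $x=y_1$, to meet the threshold; $c_2$ merely marks
the end of the segment under consideration.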
1119:
1120: Likewise, let $c_1$ at $(x_1,y_1)$ and $c_2$ at $(x_2,y_2)$,
1121: $x_1<x_2<y_1<y_2$, be two points in an optimized-gain rectilinear
1122: region (refer to Figure~\ref{fig:admissible-example}). Since $c_1$ is
1123: in the optimized-gain region and $x_1<x_2$, the point at $(x_2,y_1)$ is
1124: also in this region because it has a smaller BEP than~$c_1$. Since the
1125: optimized-gain region is rectilinear, there is a horizontal path from
1126: $(x_1,y_1)$ to $(x_2,y_1)$ and a vertical path from $(x_2,y_1)$ to
1127: $(x_2,y_2)$. Thus, there is a Manhattan path from $(x_1,y_1)$ to
1128: $(x_2,y_2)$. Arguments of this kind show that optimized-gain
1129: rectilinear regions must be connected as long as they are `wide
1130: enough'.
1131:
1132: To summarize, we have shown that
1133: optimized-gain (and thus optimized-support and optimized-confidence)
1134: regions are admissible. The data mining
1135: algorithm described in Section~\ref{sec:gizmo}, which results in
1136: optimal admissible regions, is thus appropriate for the
1137: STTD example. We now show and interpret data mining results.
1138:
1139: \begin{figure}
1140: \begin{center}
1141: \includegraphics[width=4.0in]{admissible-example}
1142: \end{center}
1143: \caption[Why optimal regions are admissible.]{Points for arguments
1144: about region shape (see text).}
1145: \label{fig:admissible-example}
1146: \end{figure}
1147:
1148: \subsection{Optimized-Support Admissible Regions}
1149:
1150: \begin{figure}
1151: \begin{center}
1152: \includegraphics[width=4.0in]{p_region} \\
1153: \end{center}
1154: \caption[Optimized-support admissible regions for
1155: Figure~\ref{fig:space}.]{Optimized-support admissible region for data
1156: in Figure~\ref{fig:space} (bottom) with the confidence
1157: threshold~$\theta=0.99$ and the performance threshold~$T=10^{-3}$.}
1158: \label{fig:region}
1159: \end{figure}
1160:
1161: Figure~\ref{fig:region} shows an optimized-support admissible region
1162: for the confidence threshold~$\theta=0.99$. Intuitively, this is the
1163: largest admissible region where we can claim, with confidence of at
1164: least~$0.99$, that configurations exhibit acceptable performance. This
1165: claim is conditional on temporal simulation assumptions and on mutual
1166: independence of configurations in the region. The shape of this region
1167: confirms that, under a fixed effective SNR, the BEP is minimal when the
1168: average SNRs of the two branches are equal. The width of this region
1169: shows the largest acceptable power imbalance. For this example, the
1170: system tolerates power imbalance of up to 12~dB. However, the width of
1171: the optimized region is not uniform. The region is narrower for small
1172: effective SNRs and wider for large effective SNRs. This means that
1173: configurations with low effective SNRs are more sensitive to power
imbalance than configurations with high effective SNRs. None of these
observations is new to an informed reader. The contribution of data
mining in this context lies not in qualitative discoveries but in
statistically significant quantitative results.
1178:
1179: \begin{figure*}
1180: \begin{center}
1181: \begin{tabular}{c c}
1182: \includegraphics[width=250pt]{p_1} &
1183: \includegraphics[width=250pt]{p_2} \\
1184: \includegraphics[width=250pt]{p_3} &
1185: \includegraphics[width=250pt]{p_all} \\
1186: \end{tabular}
1187: \end{center}
1188: \caption[Cross-validation of optimized-support admissible
1189: regions.]{Cross-validation of optimized-support admissible regions with
the confidence threshold~$\theta=0.95$. The regions in the top left,
1191: bottom left, and top right have been computed with $n_k=2$ independent
1192: samples per bucket. There are $758\pm{}2$ buckets ($47\%$ of all data)
1193: per such region. The region in the bottom right has been computed from
1194: the statistically significant data in Figure~\ref{fig:space} (bottom).
1195: It consists of 766 buckets ($48\%$ of all data). Red (dark)
1196: corresponds to low bucket confidence and white (light) corresponds to
1197: high bucket confidence w.r.t. the voice quality threshold~$T=10^{-3}$.}
1198: \label{fig:cv}
1199: \end{figure*}
1200:
1201: Let us see how the data mining algorithm performs when data is scarce.
1202: The initial sample of the configuration space in Figure~\ref{fig:space}
1203: (top) contains one sample value per bucket. The statistically
1204: significant sample in Figure~\ref{fig:space} (bottom) contains at least
1205: two additional sample values per bucket (recall that we required at
1206: least two sample values to estimate bucket variance $\sigma_k^2$).
1207: Therefore, three-fold cross-validation is the most elaborate
1208: cross-validation procedure that this dataset affords. The regions in
1209: the top left, bottom left, and top right of Figure~\ref{fig:cv} have
1210: been computed for sample values in Figure~\ref{fig:space} (top) and the
1211: first two sample values per bucket in Figure~\ref{fig:space} (bottom).
1212: Each of these regions has been computed with two out of the three
1213: sample values per bucket. The region in the lower right has been
1214: computed with all data in Figure~\ref{fig:space} (bottom). All four
1215: regions are optimized-support admissible regions with the confidence
1216: threshold~$\theta=0.95$. The regions are overlaid on top of the
1217: color-coded bucket confidence values. Red (dark) corresponds to low
1218: confidence and white (light) corresponds to high confidence that
1219: configuration $c_k$ exhibits acceptable average performance w.r.t. the
1220: voice quality threshold~$T=10^{-3}$.
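
In symbols, the folds might be written as follows. The notation
$b_{k,i}$ for the $i$-th BEP sample value in bucket~$c_k$
($i=1,2,3$) and the use of the unbiased variance estimator are
assumptions made for this sketch. Fold $j\in\{1,2,3\}$ omits sample
value~$j$ and retains $n_k=2$ values per bucket, so the bucket mean and
variance estimates reduce to
\[
\hat\mu_k^{(j)} = \frac{1}{2}\sum_{i\ne{}j} b_{k,i}, \qquad
\bigl(\hat\sigma_k^{(j)}\bigr)^2
  = \sum_{i\ne{}j}\bigl(b_{k,i}-\hat\mu_k^{(j)}\bigr)^2
  = \frac{(b_{k,i_1}-b_{k,i_2})^2}{2},
\]
where $\{i_1,i_2\}=\{1,2,3\}\setminus\{j\}$. With a single degree of
freedom per bucket, such variance estimates are necessarily coarse,
which is what makes the comparison against the full-data region
informative.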
1221:
The regions in Figure~\ref{fig:cv} are identical except for the
lower-left corner. This is not surprising because this part of the
configuration space exhibits high relative variance. Also, the data is
symmetric but the regions are asymmetric in the lower-left corner.
1226: Recall that optimized-gain admissible regions, and thus
1227: optimized-support admissible regions, are not unique. The ties in
1228: region gains are broken arbitrarily. Therefore, region asymmetry is an
1229: additional indicator of region instability.
1230:
1231: Figure~\ref{fig:cv} also shows that additional data improves image
1232: contrast but does not significantly affect region shape. Collecting
1233: additional sample values separates the points into ones with low
1234: confidence and ones with high confidence. A curious side effect occurs
1235: when the difference in confidence estimates of low-confidence points
1236: falls below the discretization error (1/1000). In this case, the
1237: `confidence slack' $1-\theta$ is allocated to arbitrary points with low
1238: confidence. One way to correct this situation is to raise the
1239: confidence threshold~$\theta$---after all, more accurate data should
1240: afford stronger claims. Another alternative is to lower the
1241: discretization threshold. In general, optimized regions work best when
1242: the data is noisy. A contour plot will suffice when the data is highly
1243: accurate.
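
A rough calculation indicates how little slack is available. Assume,
in line with the mutual independence condition, that the confidence of
a region~$R$ is the product of its bucket confidences, written here
as~$\phi_k$ (a symbol introduced only for this illustration). The
requirement $\prod_{c_k\in{}R}\phi_k\ge\theta$ then bounds the
geometric mean of the bucket confidences:
\[
\Bigl(\prod_{c_k\in{}R}\phi_k\Bigr)^{1/|R|} \;\ge\; \theta^{1/|R|},
\qquad
0.95^{1/766} \approx 0.99993 .
\]
For the 766-bucket region of Figure~\ref{fig:cv} with $\theta=0.95$,
the geometric mean may therefore fall short of one by less than
$10^{-4}$, well below the discretization step of $1/1000$. Under this
assumption, almost all buckets in the region must have confidence
indistinguishable from one, and the slack is absorbed by a small number
of low-confidence points.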
1244:
1245: It can also be seen that the high contrast created by the sharp edge of
1246: the tolerance region is advantageous to data mining. The region is
1247: stable where the contrast is high. When the image is blurred, data
1248: mining tries to avoid the questionable boundary points.
1249:
1250: To summarize, this section has demonstrated that optimized-gain regions
1251: are rectilinear and connected for a non-trivial space of wireless
1252: system configurations. We have also shown that optimized-support
1253: admissible regions are easy to interpret. Finally, we have shown that
1254: data mining works well when sample sizes are small.
1255:
1256: \section{Discussion and Future Work}
1257: \label{sec:conclusion}
1258:
1259: We have demonstrated a hierarchical formulation of data mining suitable
1260: for assessing performance of wireless system configurations. WCDMA
1261: simulation results are systematically aggregated and redescribed,
1262: leading to intuitive regions that allow the engineer to evaluate
1263: wireless system configuration parameters. We have shown that the
1264: assumptions about region shape and properties made by data mining
1265: algorithms can be valid in the wireless design context; the patterns
1266: mined hence lead to explainable and statistically valid design
1267: conclusions. As a methodology, data mining is thus shown to be
1268: extremely powerful when coupled with statistically meaningful
1269: performance evaluation.
1270:
To the authors' knowledge, this work is the first application of data
mining methodology to problems in wireless system design.
1273: Therefore, a large number of extensions are possible and called for.
1274: We outline possible extensions at the three levels of aggregation:
1275: points, buckets, and regions.
1276:
1277: At the point level, it may be advantageous to model temporal
1278: simulations more precisely. This paper assumes a `large enough' number
1279: of frames per simulation and works with the distribution of estimated BEPs.
1280: We have shown reasonable analytical and empirical evidence that this
1281: distribution is Gaussian. The advantage of this problem formulation is
1282: the independence of spatial aggregation from the assumptions of
1283: temporal simulation. This helps introduce wireless engineers to the
1284: methodology of data mining for studying design problems. However, a
1285: stronger model of temporal simulation (e.g., Markov chains
1286: in~\cite{fsmc-wang}) may yield appreciable gains in software
1287: performance. This direction is worth pursuing because few research
1288: groups have access to parallel computing facilities of the scale used
1289: in this work. For instance, the initial sample of the configuration
1290: space in Figure~\ref{fig:space} (top) would take one year of
1291: computation time on a modern workstation. The study presented in this
1292: paper would clearly be impossible without significant computational
1293: power.
1294:
1295: Aggregation of points into buckets is the least developed part of this
1296: work. Suppose that we would like to simulate the effects of
1297: interference on configuration performance. Assume that the
1298: distribution of the average strengths of the interfering signals is
1299: known \emph{a priori} (e.g., estimated by ray tracing). We can either
1300: make this distribution known to the temporal simulation, or,
1301: alternatively, run several temporal simulations for different strengths
1302: of interfering signals. The former is more accurate and
1303: computationally more efficient, but the latter is more generic and
1304: simpler to implement. Bucketing of simulation results with varying
1305: simulation parameters is intended to approximate the performance of a
1306: single device under varying conditions. This paper does not employ
1307: such bucketing but instead builds all the necessary kinds of parameter
variation into the temporal simulation (arguably the right way to do
it). However, bucketing may be necessary when one has
1310: to work with a given dataset (e.g., measurements). Bucket space can be
1311: viewed as a configuration space for a more complex temporal
1312: simulation. Therefore, an in-depth treatment of bucketing is
1313: orthogonal to the primary topic of this paper, which is data mining.
1314:
1315: Significant work remains to be done at the region level as well. For
1316: instance, the assumption of small variance could conceivably be relaxed.
1317: One can
1318: also pursue the relatively difficult task of incorporating strongly
1319: model-based prior knowledge into the data mining algorithm, or the
1320: somewhat easier task of applying different kinds of region mining
1321: algorithms to problems in wireless system design.
1322:
1323: Defining additional case studies is another obvious direction for
1324: future work. We have studied a relatively small part of the parameter
1325: space of modern wireless systems. More studies of this type must be
1326: performed to highlight the merits and the shortcomings of data mining
1327: in this domain.
1328:
1329: Finally, the strict staging of data collection and data mining can be
1330: relaxed. One can fruitfully interleave the two activities and have the
1331: results of data mining drive subsequent data collection. In
1332: data-scarce domains, it would be advantageous to focus the data
1333: collection effort on only those regions deemed most important to
support a particular data mining objective. Methodologies for
closing the loop in this manner are becoming increasingly
prevalent~\cite{sampling-cise}. Such interleaving will also help define
alternative criteria for evaluating experiment designs and layouts.
1338:
1339: \bibliographystyle{alpha}
1340: \bibliography{paper}
1341: \end{document}
1342: