\section{Performance Analysis}
\label{sec_performance_analysis}

In this section, we analyze the performance of the proposed
classification algorithm. We show that the optimality loss due to
relaxation and random rounding is negligible if the total number of
samples $N$ is sufficiently large. Therefore, our algorithm is
near-optimal with reduced computational complexity.

Our discussion relies on the inequality in Lemma
\ref{azuma_inequality}, which is a variation of the Azuma inequality
\cite{azuma67} proven by Janson \cite{janson98}.
\begin{lemma} \cite{janson98}
\label{azuma_inequality}\emph{(Azuma Inequality)} Let
$Z_1,\dots,Z_N$ be independent random variables, with $Z_k$ taking
values in a set $\Lambda_k$. Assume that a (measurable) function
$f:\Lambda_1\times \Lambda_2\times \cdots \times
\Lambda_N\rightarrow {\mathbb R}$ satisfies the following Lipschitz
condition (L).
\begin{itemize}
\item (L) If the vectors $z,z'\in\prod_{k=1}^{N}\Lambda_k$ differ only
in the $k$th coordinate, then $|f(z)-f(z')|\leq c_k$, $k=1,\ldots,N$.
\end{itemize}
Then, the random variable $X=f(Z_1,\ldots,Z_N)$ satisfies, for any
$t\geq 0$,
\begin{align}
{\mathbb P}(X\geq {\mathbb E}X+t)\leq
\exp\left(\frac{-2t^2}{\sum_{k=1}^{N}c_k^2}\right),
\end{align}
\begin{align}
{\mathbb P}(X\leq {\mathbb E}X-t)\leq
\exp\left(\frac{-2t^2}{\sum_{k=1}^{N}c_k^2}\right).
\end{align}
\end{lemma}
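\begin{remark}
As an illustration of how Lemma \ref{azuma_inequality} is typically
applied, consider the special case of a sum. The symbols $g_k$,
$\alpha_k$, and $\beta_k$ are introduced only for this sketch. Suppose
that $f(z)=\sum_{k=1}^{N}g_k(z_k)$ with $\alpha_k\leq
g_k(z_k)\leq\beta_k$ for all $z_k\in\Lambda_k$. Changing only the
$k$th coordinate changes $f$ by at most $\beta_k-\alpha_k$, so
condition (L) holds with $c_k=\beta_k-\alpha_k$. Adding the two
one-sided bounds then gives
\begin{align*}
{\mathbb P}\left(\left|X-{\mathbb E}X\right|\geq t\right)\leq
2\exp\left(\frac{-2t^2}{\sum_{k=1}^{N}(\beta_k-\alpha_k)^2}\right).
\end{align*}
\end{remark}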
As in the previous sections, we use $a_{ni}^{\ast}$ to denote the
solution of the relaxed optimization problem. We use $p_i^{\ast}$,
$\left(\sigma_i^{\ast}\right)^2$, and $\mu_i^{\ast}$ to denote the
corresponding occurrence probability, variance, and mean. That is,
\begin{align}
& \mu_i^{\ast}=\frac{\sum_{n=1}^{N}a_{ni}^{\ast}x_n}{\sum_{n=1}^{N}a_{ni}^{\ast}}, \\
& (\sigma_i^\ast)^2= \left(\frac{1}{\sum_{n=1}^{N}a_{ni}^\ast}\right)\sum_{n=1}^{N}a_{ni}^\ast\left(x_n-\mu_i^\ast\right)^2,\\
& p_i^\ast=\frac{\sum_{n=1}^{N}a_{ni}^\ast}{N}.
\end{align}
We use $z_1,\ldots,z_N$ to denote the classification scheme
obtained from Algorithm \ref{relaxation_algorithm}. In the
following, with a slight abuse of notation, we use $a_{ni}$ to denote
the randomly rounded version of the variable $a_{ni}^\ast$, i.e.,
\begin{align}
a_{ni}=\left\{\begin{array}{ll}
1, & \mbox{if }z_n=i, \\
0, & \mbox{otherwise.}
\end{array}\right.
\end{align}
Similarly, we use $p_i$, $\sigma_i^2$, and $\mu_i$ to denote the
corresponding occurrence probability, variance, and mean. That is,
\begin{align}
& \mu_i=\frac{\sum_{n=1}^{N}a_{ni}x_n}{\sum_{n=1}^{N}a_{ni}}, \\
& \sigma_i^2= \left(\frac{1}{\sum_{n=1}^{N}a_{ni}}\right)\sum_{n=1}^{N}a_{ni}\left(x_n-\mu_i\right)^2,\\
& p_i=\frac{\sum_{n=1}^{N}a_{ni}}{N}.
\end{align}
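\begin{remark}
The following toy example, which is purely illustrative and not taken
from any experiment, may help to fix the notation. Let $N=4$,
$(x_1,x_2,x_3,x_4)=(0,1,2,3)$, and suppose that the relaxed solution
for class $i=1$ is
$(a_{11}^\ast,a_{21}^\ast,a_{31}^\ast,a_{41}^\ast)=(1,0.5,0.5,0)$.
Then
\begin{align*}
& p_1^\ast=\frac{2}{4}=0.5, \qquad
\mu_1^\ast=\frac{0+0.5+1+0}{2}=0.75, \\
& (\sigma_1^\ast)^2=\frac{1}{2}\left(1\cdot 0.75^2+0.5\cdot
0.25^2+0.5\cdot 1.25^2\right)=0.6875.
\end{align*}
If the rounding happens to produce $(z_1,z_2,z_3,z_4)=(1,1,2,2)$, then
$(a_{11},a_{21},a_{31},a_{41})=(1,1,0,0)$ and
\begin{align*}
p_1=0.5, \qquad \mu_1=\frac{0+1}{2}=0.5, \qquad
\sigma_1^2=\frac{1}{2}\left(0.5^2+0.5^2\right)=0.25.
\end{align*}
With such a small $N$, the rounded statistics can differ noticeably
from the relaxed ones. The results below bound the probability of
large deviations as $N$ grows.
\end{remark}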
\begin{definition}
Let $\epsilon_1,\epsilon_2,\epsilon_3$ be arbitrary positive real
numbers. We say that a classification scheme is
$(\epsilon_1,\epsilon_2,\epsilon_3)$-typical if the following
conditions hold for all $i$, $1\leq i\leq J$,
\begin{align}
\left|\sum_{n=1}^{N}a_{ni}-\sum_{n=1}^{N}a_{ni}^\ast\right|\leq
\epsilon_1 N,
\end{align}
\begin{align}
\left|\sum_{n=1}^{N}a_{ni}x_n-\sum_{n=1}^{N}a_{ni}^\ast
x_n\right|\leq \epsilon_2 N,
\end{align}
\begin{align}
\left|\sum_{n=1}^{N}a_{ni}\left(x_n-\mu_i^\ast\right)^2-\sum_{n=1}^{N}a_{ni}^\ast\left(x_n-\mu_i^\ast\right)^2\right|\leq
\epsilon_3 N.
\end{align}
\end{definition}
\begin{lemma}
If $\epsilon_1,\epsilon_2,\epsilon_3$ all go to zero, then for
$(\epsilon_1,\epsilon_2,\epsilon_3)$-typical classification schemes,
$\mu_i$, $p_i$, and $\sigma_i^2$ go to $\mu_i^\ast$, $p_i^\ast$, and
$\left(\sigma_i^\ast\right)^2$, respectively.
\end{lemma}
\begin{proof}
It can be easily checked from the first two typicality conditions
that $p_i$ goes to $p_i^\ast$ and $\mu_i$ goes to $\mu_i^\ast$. For
$\sigma_i^2$, we notice that
\begin{align}
& \sum_{n=1}^{N}a_{ni}(x_n-\mu_i)^2 \\
& =\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast}+\mu_i^{\ast}-\mu_i)^2\\
& =\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast})^2+
\sum_{n=1}^{N}a_{ni}(\mu_i^{\ast}-\mu_i)^2 \\
& \hspace{0.5in} +2
\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast})(\mu_i^{\ast}-\mu_i) \\
& =\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast})^2+
p_iN(\mu_i^{\ast}-\mu_i)^2 \\
& \hspace{0.5in} +2
(\mu_i^{\ast}-\mu_i)\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast}) \\
& =\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast})^2-
p_iN(\mu_i^{\ast}-\mu_i)^2,
\end{align}
where the last step uses
$\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast})=p_iN(\mu_i-\mu_i^{\ast})$,
which follows from the definitions of $\mu_i$ and $p_i$. Therefore,
\begin{align}
\sigma_i^2 & =\frac{\sum_{n=1}^{N}a_{ni}(x_n-\mu_i)^2}{\sum_{n=1}^{N}a_{ni}}\\
&
=\frac{\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^{\ast})^2}{\sum_{n=1}^{N}a_{ni}}-(\mu_i^{\ast}-\mu_i)^2.
\end{align}
The first term goes to $\left(\sigma_i^\ast\right)^2$ by the first and
third typicality conditions, and the second term goes to zero. It
follows that $\sigma_i^2$ goes to
$\left(\sigma_i^\ast\right)^2$.\qed
\end{proof}
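\begin{remark}
The convergence in the lemma above can be made quantitative. The
following is a sketch under the additional assumption, not stated
above, that $p_i^\ast>\epsilon_1$. The first typicality condition
gives $|p_i-p_i^\ast|\leq\epsilon_1$, and hence
$\sum_{n=1}^{N}a_{ni}\geq(p_i^\ast-\epsilon_1)N$. Writing
$\mu_i-\mu_i^\ast$ over the common denominator $\sum_{n=1}^{N}a_{ni}$
and applying the first two typicality conditions gives
\begin{align*}
\left|\mu_i-\mu_i^\ast\right|
& =\frac{\left|\left(\sum_{n=1}^{N}a_{ni}x_n-\sum_{n=1}^{N}a_{ni}^\ast x_n\right)
+\mu_i^\ast\left(\sum_{n=1}^{N}a_{ni}^\ast-\sum_{n=1}^{N}a_{ni}\right)\right|}{\sum_{n=1}^{N}a_{ni}} \\
& \leq\frac{\epsilon_2+|\mu_i^\ast|\epsilon_1}{p_i^\ast-\epsilon_1}.
\end{align*}
The same argument applied to the first and third conditions gives
\begin{align*}
\left|\frac{\sum_{n=1}^{N}a_{ni}(x_n-\mu_i^\ast)^2}{\sum_{n=1}^{N}a_{ni}}-(\sigma_i^\ast)^2\right|
\leq\frac{\epsilon_3+(\sigma_i^\ast)^2\epsilon_1}{p_i^\ast-\epsilon_1},
\end{align*}
so that, by the expression for $\sigma_i^2$ derived in the proof,
\begin{align*}
\left|\sigma_i^2-(\sigma_i^\ast)^2\right|\leq
\frac{\epsilon_3+(\sigma_i^\ast)^2\epsilon_1}{p_i^\ast-\epsilon_1}
+\left(\frac{\epsilon_2+|\mu_i^\ast|\epsilon_1}{p_i^\ast-\epsilon_1}\right)^2.
\end{align*}
All three bounds vanish as $\epsilon_1,\epsilon_2,\epsilon_3$ go to
zero.
\end{remark}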
\begin{theorem}
\label{main_theorem} Let $\epsilon_1,\epsilon_2,\epsilon_3$ be
arbitrary positive real numbers. Let $V=\max_n x_n-\min_n x_n$.
Then, the probability that the classification scheme obtained from
Algorithm \ref{relaxation_algorithm} is not
$(\epsilon_1,\epsilon_2,\epsilon_3)$-typical is upper bounded as
follows:
\begin{align}
& {\mathbb P}\left(\mbox{the classification scheme is not
}(\epsilon_1,\epsilon_2,\epsilon_3)\mbox{-typical}\right) \\
& \leq 2J
\exp\left(-2\epsilon_1^2N\right)+2J\exp\left(\frac{-2\epsilon_2^2N}{V^2}\right)+
2J\exp\left(\frac{-2\epsilon_3^2N}{V^4}\right).
\end{align}
\end{theorem}
\begin{proof}
By applying the Azuma inequality in Lemma \ref{azuma_inequality} to
each class $i$, we can show that
\begin{align}
{\mathbb
P}\left(\left|\sum_{n=1}^{N}a_{ni}-\sum_{n=1}^{N}a_{ni}^\ast\right|\geq
\epsilon_1 N\right)\leq 2\exp\left(-2\epsilon_1^2N\right),
\end{align}
\begin{align}
{\mathbb
P}\left(\left|\sum_{n=1}^{N}a_{ni}x_n-\sum_{n=1}^{N}a_{ni}^\ast
x_n\right|\geq \epsilon_2 N\right)\leq
2\exp\left(\frac{-2\epsilon_2^2N}{V^2}\right),
\end{align}
\begin{align}
& {\mathbb
P}\left(\left|\sum_{n=1}^{N}a_{ni}\left(x_n-\mu_i^\ast\right)^2-\sum_{n=1}^{N}a_{ni}^\ast\left(x_n-\mu_i^\ast\right)^2\right|\geq
\epsilon_3 N\right) \\
& \leq 2\exp\left(\frac{-2\epsilon_3^2N}{V^4}\right).
\end{align}
The theorem follows from a union bound over these three events and
the $J$ classes.\qed
\end{proof}
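\begin{remark}
The three bounds in the proof above can be obtained from Lemma
\ref{azuma_inequality} as sketched below. The sketch assumes that
Algorithm \ref{relaxation_algorithm} rounds the samples independently
with ${\mathbb P}(z_n=i)=a_{ni}^\ast$, so that ${\mathbb
E}a_{ni}=a_{ni}^\ast$. This is our reading of the random rounding
step and not a restatement of it. For a fixed class $i$, take
$Z_n=z_n$ and $t=\epsilon_j N$ for the $j$th bound. The three sums are
functions of $(z_1,\ldots,z_N)$ that satisfy condition (L) with
\begin{align*}
c_n=1, \qquad c_n=|x_n|, \qquad c_n=(x_n-\mu_i^\ast)^2,
\end{align*}
respectively. For the first sum, $\sum_{n=1}^{N}c_n^2=N$, and adding
the two one-sided bounds gives
\begin{align*}
2\exp\left(\frac{-2\epsilon_1^2N^2}{N}\right)=2\exp\left(-2\epsilon_1^2N\right).
\end{align*}
For the third sum, $\mu_i^\ast$ is a weighted average of the samples
and therefore lies between $\min_n x_n$ and $\max_n x_n$, so
$c_n\leq V^2$ and $\sum_{n=1}^{N}c_n^2\leq NV^4$. For the second sum,
the stated exponent corresponds to $c_n\leq V$, which holds when
$|x_n|\leq V$ for all $n$ (for instance, when $\min_n x_n=0$). We
treat this normalization as an assumption of the sketch.
\end{remark}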
\begin{corollary}
\label{main_corollary} For fixed
$\epsilon_1,\epsilon_2,\epsilon_3>0$, if the number of samples $N$ is
sufficiently large, then the classification scheme obtained from
Algorithm \ref{relaxation_algorithm} is
$(\epsilon_1,\epsilon_2,\epsilon_3)$-typical with probability close
to one.
\end{corollary}
\begin{proof}
For fixed $\epsilon_1,\epsilon_2,\epsilon_3$ and bounded $V$, each
term of the upper bound in Theorem \ref{main_theorem} decays
exponentially in $N$, so the bound is close to zero for sufficiently
large $N$.\qed
\end{proof}
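\begin{remark}
Theorem \ref{main_theorem} also indicates how large $N$ needs to be.
As a sketch, let $\delta\in(0,1)$ be a target failure probability (a
symbol introduced here for illustration). Each of the three terms in
the bound is at most $\delta/3$, and hence the bound is at most
$\delta$, whenever
\begin{align*}
N\geq\ln\left(\frac{6J}{\delta}\right)
\max\left\{\frac{1}{2\epsilon_1^2},\frac{V^2}{2\epsilon_2^2},\frac{V^4}{2\epsilon_3^2}\right\}.
\end{align*}
For instance, with the hypothetical values $J=3$, $\delta=0.01$, and
$\epsilon_1=0.05$, the first term alone requires
$N\geq\ln(1800)/0.005\approx 1.5\times 10^{3}$.
\end{remark}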
\begin{corollary}
\label{existence_corollary} If the number of samples $N$ is
sufficiently large, then there exists at least one
$(\epsilon_1,\epsilon_2,\epsilon_3)$-typical classification scheme.
\end{corollary}
\begin{proof}
By Corollary \ref{main_corollary}, Algorithm
\ref{relaxation_algorithm} constructs such a classification scheme
with probability close to one, and in particular with positive
probability. Hence, at least one such scheme exists.\qed
\end{proof}
\begin{remark}
Theorem \ref{main_theorem} and Corollary \ref{existence_corollary}
imply that the gap between the optimal classification gain achieved
in the relaxed optimization and the optimal classification gain
achieved in the integer optimization goes to zero asymptotically. In
other words, the continuous relaxation incurs an asymptotically
vanishing optimality loss.
\end{remark}