physics0703039/LD.tex
1: \subsection{Linear discriminant analysis (LD)}\index{Linear Discriminant}\index{LD}
2: \label{sec:ld}
3: 
4: Linear discriminant analysis classifies data using a linear model, 
5: where \textit{linear} refers to the discriminant function $y(\mathbf{x})$ being 
6: linear in the parameters $\beta$:
7: \beq
8: 	y(\mathbf{x})=\mathbf{x}^\top\beta + \beta_0\;,
9: \eeq
10: where $\beta_0$ (denoted the {\em bias}) is adjusted so that $y(\mathbf{x})\geq0$ 
11: for signal and $y(\mathbf{x})<0$ for background. It can be shown that this is equivalent 
12: to the Fisher discriminant, which seeks to maximise the ratio of between-class 
13: variance to within-class variance by projecting the data onto a linear subspace.
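
As a brief sketch of this equivalence (the symbols $\mu_S$, $\mu_B$, $S_W$ and $S_B$ are introduced here for illustration only and are not used elsewhere in this section), let $\mu_S$ and $\mu_B$ denote the sample means of the input variables for signal and background, and let $S_W$ and $S_B$ denote the within-class and between-class scatter matrices, with $S_W$ assumed to be invertible. The Fisher criterion for a projection direction $\beta$ can then be written as
\beq
	J(\beta)=\frac{\beta^\top S_B\,\beta}{\beta^\top S_W\,\beta}\;,
	\qquad S_B=(\mu_S-\mu_B)(\mu_S-\mu_B)^\top\;,
\eeq
which is maximised, up to an arbitrary normalisation, by $\beta\propto S_W^{-1}(\mu_S-\mu_B)$. For two classes, the least-squares solution yields a coefficient vector pointing in the same direction, so that the two approaches differ only in the scale of $y(\mathbf{x})$ and in the choice of the bias $\beta_0$.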
14: 
15: \subsubsection{Booking options}
16: 
17: The LD is booked via the command:
18: \begin{codeexample}
19: \begin{tmvacode}
20: factory->BookMethod( Types::kLD, "LD" );
21: \end{tmvacode}
22: \caption[.]{\codeexampleCaptionSize Booking of the linear discriminant: the first argument is 
23: 		   a predefined enumerator, the second argument is a user-defined 
24: 	   	string identifier. No method-specific options are available.
25:         See Sec.~\ref{sec:usingtmva:booking} for more information on the booking.}
26: \end{codeexample}
27: 
28: No specific options are defined for this method beyond those shared with all the other 
29: methods (\cf Option Table~\ref{opt:mva::methodbase} on page~\pageref{opt:mva::methodbase}).
30: 
31: \subsubsection{Description and implementation}
32: 
33: Assuming that there are $m+1$ parameters $\beta_0, \ldots ,\beta_m$ to be estimated using 
34: a training set consisting of $n$ events, the defining equation for $\beta$ is
35: \beq
36: 	Y=X\mathbf{\beta}\;,
37: \eeq
38: where we have absorbed $\beta_0$ into the vector $\beta$ and introduced the matrices
39: \beq
40: 	Y=\left( \begin{array}{c}
41: 	y_1\\
42: 	y_2\\
43: 	\vdots\\
44: 	y_n \end{array} \right) \mbox{  and  } X=\left( \begin{array}{cccc}
45: 							1 & x_{11} & \cdots & x_{1m} \\
46: 							1 & x_{21} & \cdots & x_{2m} \\
47: 							\vdots & \vdots & \ddots & \vdots \\
48: 							1 & x_{n1} & \cdots & x_{nm} \end{array} \right)\;,
49: \eeq
50: where the constant column in $X$ represents the bias $\beta_0$ and $Y$ is composed of 
51: the target values with $y_i=1$ if the $i$th event belongs to the signal class and $y_i=0$ 
52: if the $i$th event belongs to the background class. Applying the method of least squares, 
53: we now obtain the {\em normal equations} for the classification problem, given by
54: \beq
55: 	X^TX\beta=X^TY \Longleftrightarrow \beta=(X^TX)^{-1}X^TY\;.
56: \eeq
57: The transformation $(X^TX)^{-1}X^T$ is known as the \textit{Moore-Penrose pseudoinverse} 
58: of $X$ and can be regarded as a generalisation of the matrix inverse to non-square 
59: matrices. In this form it requires that $X$ has full column rank, so that $X^TX$ is invertible.
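
To make the least-squares step explicit, the following sketch may be helpful (the symbols $R(\beta)$, $\bar{x}$ and $\bar{y}$ are introduced here for illustration only). The normal equations follow from minimising the sum of squared residuals,
\beq
	R(\beta)=(Y-X\beta)^T(Y-X\beta)\;,
	\qquad
	\frac{\partial R}{\partial \beta}=-2X^T(Y-X\beta)=0\;.
\eeq
In the special case of a single input variable ($m=1$), the solution reduces to the familiar expressions
\beq
	\beta_1=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;,
	\qquad
	\beta_0=\bar{y}-\beta_1\bar{x}\;,
\eeq
where $\bar{x}$ and $\bar{y}$ are the sample means of the input variable and of the targets. With targets $y_i\in\{0,1\}$, $\bar{y}$ is simply the fraction of signal events in the training sample.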
60: 
61: Weighted events are taken into account by introducing a diagonal weight matrix $W$, 
62: whose diagonal elements are the event weights, and modifying the normal equations as follows:
63: \beq
64: 	\beta=(X^TWX)^{-1}X^TWY\;.
65: \eeq
66: Considering two events $\mathbf{x}_1$ and $\mathbf{x}_2$ on the decision boundary, we 
67: have $y(\mathbf{x}_1)=y(\mathbf{x}_2)=0$ and hence $(\mathbf{x}_1-\mathbf{x}_2)^T\beta=0$, 
68: where $\beta$ here excludes the bias $\beta_0$. The LD can thus be interpreted geometrically 
69: as determining the decision boundary by finding a vector $\beta$ orthogonal to it.
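
This geometric picture can be made slightly more quantitative. As a sketch, with $\beta=(\beta_1,\ldots,\beta_m)^T$ denoting the coefficient vector without the bias and $d(\mathbf{x})$ a symbol introduced here for illustration, the signed distance of an arbitrary event $\mathbf{x}$ from the decision boundary is
\beq
	d(\mathbf{x})=\frac{\mathbf{x}^T\beta+\beta_0}{\|\beta\|}=\frac{y(\mathbf{x})}{\|\beta\|}\;,
\eeq
so that the discriminant value is proportional to the distance from the boundary, and the boundary itself lies at a distance $|\beta_0|/\|\beta\|$ from the origin.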
70: 
71: \subsubsection{Variable ranking}
72: 
73: The present implementation of LD provides a ranking of the input variables based on the 
74: coefficients of the variables in the linear combination that forms the decision boundary. 
75: The order of importance of the discriminating variables is assumed to agree with the 
76: order of the absolute values of the coefficients.
77: 
78: \subsubsection{Regression with LD}
79: 
80: It is straightforward to apply the LD algorithm to linear regression by replacing the 
81: binary targets $y_i \in \{0,1\}$ in the training data with the measured values of the 
82: function which is to be estimated. The resulting function $y(\mathbf{x})$ is then 
83: the best estimate for the data obtained by least-squares regression.
84: 
85: \subsubsection{Performance}
86: 
87: The LD is optimal for Gaussian distributed variables with linear correlations (\cf the 
88: standard toy example that comes with TMVA) and can be competitive with likelihood and 
89: nonlinear discriminants in certain cases. No discrimination is achieved when a variable 
90: has the same sample mean for signal and background, but the LD can often benefit from 
91: suitable transformations of the input variables. For example, if a variable 
92: $x \in [-1,1]$ has a signal distribution proportional to $x^2$ and a uniform background 
93: distribution, its mean value is zero in both cases, leading to no separation. The 
94: simple transformation $x \rightarrow |x|$ renders this variable powerful for use 
95: with the LD.
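
The effect of the transformation can be checked by a short calculation. Assuming normalised densities $p_S(x)=\frac{3}{2}x^2$ for signal and $p_B(x)=\frac{1}{2}$ for background on $[-1,1]$ (the symbols $p_S$ and $p_B$ are introduced here for illustration only), the means of $x$ vanish by symmetry, whereas the means of $|x|$ differ:
\beq
	\langle |x| \rangle_S = 2\int_0^1 x\,\frac{3}{2}x^2\,dx = \frac{3}{4}\;,
	\qquad
	\langle |x| \rangle_B = 2\int_0^1 x\,\frac{1}{2}\,dx = \frac{1}{2}\;.
\eeq
The transformed variable therefore has different class means, which a linear discriminant can exploit.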
96: 
97: %%% Local Variables: 
98: %%% mode: latex
99: %%% TeX-master: "TMVAUsersGuide"
100: %%% End: 
101: