1: \subsection{Linear discriminant analysis (LD)}\index{Linear Discriminant}\index{LD}
2: \label{sec:ld}
3:
Linear discriminant analysis (LD) classifies data using a linear model,
where \textit{linear} refers to the discriminant function $y(\mathbf{x})$ being
linear in the parameters $\mathbf{\beta}$:
7: \beq
8: y(\mathbf{x})=\mathbf{x}^\top\beta + \beta_0\;,
9: \eeq
10: where $\beta_0$ (denoted the {\em bias}) is adjusted so that $y(\mathbf{x})\geq0$
11: for signal and $y(\mathbf{x})<0$ for background. It can be shown that this is equivalent
12: to the Fisher discriminant, which seeks to maximise the ratio of between-class
13: variance to within-class variance by projecting the data onto a linear subspace.
14:
15: \subsubsection{Booking options}
16:
17: The LD is booked via the command:
18: \begin{codeexample}
19: \begin{tmvacode}
20: factory->BookMethod( Types::kLD, "LD" );
21: \end{tmvacode}
22: \caption[.]{\codeexampleCaptionSize Booking of the linear discriminant: the first argument is
23: a predefined enumerator, the second argument is a user-defined
24: string identifier. No method-specific options are available.
25: See Sec.~\ref{sec:usingtmva:booking} for more information on the booking.}
26: \end{codeexample}
27:
28: No specific options are defined for this method beyond those shared with all the other
29: methods (\cf Option Table~\ref{opt:mva::methodbase} on page~\pageref{opt:mva::methodbase}).
30:
31: \subsubsection{Description and implementation}
32:
Assuming that there are $m+1$ parameters $\beta_0, \ldots ,\beta_m$ to be estimated using
a training set consisting of $n$ events, the defining equation for $\mathbf{\beta}$ is
35: \beq
36: Y=X\mathbf{\beta}\;,
37: \eeq
38: where we have absorbed $\beta_0$ into the vector $\beta$ and introduced the matrices
39: \beq
40: Y=\left( \begin{array}{c}
41: y_1\\
42: y_2\\
43: \vdots\\
44: y_n \end{array} \right) \mbox{ and } X=\left( \begin{array}{cccc}
45: 1 & x_{11} & \cdots & x_{1m} \\
46: 1 & x_{21} & \cdots & x_{2m} \\
47: \vdots & \vdots & \ddots & \vdots \\
48: 1 & x_{n1} & \cdots & x_{nm} \end{array} \right)\;,
49: \eeq
50: where the constant column in $X$ represents the bias $\beta_0$ and $Y$ is composed of
51: the target values with $y_i=1$ if the $i$th event belongs to the signal class and $y_i=0$
52: if the $i$th event belongs to the background class. Applying the method of least squares,
53: we now obtain the {\em normal equations} for the classification problem, given by
54: \beq
55: X^TX\beta=X^TY \Longleftrightarrow \beta=(X^TX)^{-1}X^TY\;.
56: \eeq
The transformation $(X^TX)^{-1}X^T$ is known as the \textit{Moore-Penrose pseudoinverse}
of $X$ and can be regarded as a generalisation of the matrix inverse to non-square
matrices. It requires that the matrix $X$ has full column rank, so that $X^TX$ is invertible.
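The step from the least-squares criterion to the normal equations can be sketched
as follows; the symbol $S(\beta)$ for the sum of squared residuals is introduced here
for illustration only. One minimises
\beq
S(\beta)=\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{k=1}^{m}x_{ik}\beta_k\Big)^2=(Y-X\beta)^T(Y-X\beta)\;,
\eeq
and setting the gradient with respect to $\beta$ to zero gives
\beq
\frac{\partial S}{\partial\beta}=-2X^T(Y-X\beta)=0\;,
\eeq
which rearranges to $X^TX\beta=X^TY$.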
60:
61: If weighted events are used, this is simply taken into account by introducing a diagonal
62: weight matrix $W$ and modifying the normal equations as follows:
63: \beq
64: \beta=(X^TWX)^{-1}X^TWY\;.
65: \eeq
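As a sketch of where the weights enter, assume $W=\mathrm{diag}(w_1,\ldots,w_n)$ with
$w_i$ the weight of the $i$th event (the symbols $w_i$ and $S_W$ are introduced here for
illustration only). Each squared residual is then multiplied by its event weight, and for
$W$ equal to the unit matrix the unweighted case is recovered. The weighted sum of
squared residuals and its stationarity condition read
\beq
S_W(\beta)=(Y-X\beta)^TW(Y-X\beta)\;,\qquad X^TW(Y-X\beta)=0\;.
\eeq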
Considering two events $\mathbf{x}_1$ and $\mathbf{x}_2$ on the decision boundary, we
have $y(\mathbf{x}_1)=y(\mathbf{x}_2)=0$ and hence $(\mathbf{x}_1-\mathbf{x}_2)^T\beta=0$,
where $\beta$ here denotes the coefficient vector without the bias $\beta_0$.
Thus the LD can be geometrically interpreted as determining the decision
boundary by finding a vector $\beta$ orthogonal to it.
70:
71: \subsubsection{Variable ranking}
72:
73: The present implementation of LD provides a ranking of the input variables based on the
74: coefficients of the variables in the linear combination that forms the decision boundary.
75: The order of importance of the discriminating variables is assumed to agree with the
76: order of the absolute values of the coefficients.
77:
78: \subsubsection{Regression with LD}
79:
80: It is straightforward to apply the LD algorithm to linear regression by replacing the
binary targets $y_i \in \{0,1\}$ in the training data with the measured values of the
82: function which is to be estimated. The resulting function $y(\mathbf{x})$ is then
83: the best estimate for the data obtained by least-squares regression.
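As a small worked illustration (the three data points are made up for this example
and are not taken from TMVA), consider fitting $y(x)=\beta_0+\beta_1x$ to the points
$(x_i,y_i)=(0,1),\,(1,2),\,(2,4)$. The matrices entering the normal equations become
\beq
X=\left( \begin{array}{cc}
1 & 0 \\
1 & 1 \\
1 & 2 \end{array} \right),\quad
Y=\left( \begin{array}{c}
1 \\
2 \\
4 \end{array} \right),\quad
X^TX=\left( \begin{array}{cc}
3 & 3 \\
3 & 5 \end{array} \right),\quad
X^TY=\left( \begin{array}{c}
7 \\
10 \end{array} \right)\;,
\eeq
and the least-squares solution is
\beq
\beta=(X^TX)^{-1}X^TY=\frac{1}{6}\left( \begin{array}{cc}
5 & -3 \\
-3 & 3 \end{array} \right)\left( \begin{array}{c}
7 \\
10 \end{array} \right)=\left( \begin{array}{c}
5/6 \\
3/2 \end{array} \right)\;,
\eeq
so that the fitted function is $y(x)=5/6+3x/2$.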
84:
85: \subsubsection{Performance}
86:
87: The LD is optimal for Gaussian distributed variables with linear correlations (\cf the
88: standard toy example that comes with TMVA) and can be competitive with likelihood and
89: nonlinear discriminants in certain cases. No discrimination is achieved when a variable
90: has the same sample mean for signal and background, but the LD can often benefit from
91: suitable transformations of the input variables. For example, if a variable
92: $x \in [-1,1]$ has a signal distribution of the form $x^2$ and a uniform background
distribution, the mean value of $x$ is zero in both cases, leading to no separation. The
simple transformation $x \rightarrow |x|$ renders this variable powerful for use
with the LD.
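This can be checked with a short calculation, assuming the normalised densities
$p_S(x)=\frac{3}{2}x^2$ for signal and $p_B(x)=\frac{1}{2}$ for background on $[-1,1]$
(the symbols $p_S$ and $p_B$ and the normalisation constants are introduced here; the
example itself only fixes the shapes). Both densities are even in $x$, so that
$\langle x\rangle_S=\langle x\rangle_B=0$, whereas after the transformation
\beq
\langle |x|\rangle_S=\int_{-1}^{1}|x|\,\frac{3}{2}x^2\,dx=\frac{3}{4}\;,\qquad
\langle |x|\rangle_B=\int_{-1}^{1}|x|\,\frac{1}{2}\,dx=\frac{1}{2}\;,
\eeq
so the class means differ and a linear discriminant can separate the two samples.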
96:
97: %%% Local Variables:
98: %%% mode: latex
99: %%% TeX-master: "TMVAUsersGuide"
100: %%% End:
101: