\subsection{Linear discriminant analysis (LD)}\index{Linear Discriminant}\index{LD}
\label{sec:ld}

The linear discriminant analysis provides data classification using a linear model,
where \textit{linear} refers to the discriminant function $y(\mathbf{x})$ being
linear in the parameters $\mathbf{\beta}$
\beq
	y(\mathbf{x})=\mathbf{x}^\top\beta + \beta_0\;,
\eeq
where $\beta_0$ (denoted the {\em bias}) is adjusted so that $y(\mathbf{x})\geq0$
for signal and $y(\mathbf{x})<0$ for background. It can be shown that this is equivalent
to the Fisher discriminant, which seeks to maximise the ratio of between-class
variance to within-class variance by projecting the data onto a linear subspace.
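
To make the ratio mentioned above concrete, the Fisher criterion can be sketched as
follows. The symbols $\mu_S$ and $\mu_B$ (class means of the input variables) and
$S_B$ and $S_W$ (between-class and within-class scatter matrices) are notation assumed
here for illustration only, and $\beta$ denotes the coefficients without the bias $\beta_0$:
\beq
	J(\beta)=\frac{\beta^\top S_B\,\beta}{\beta^\top S_W\,\beta}\;, \qquad
	S_B=(\mu_S-\mu_B)(\mu_S-\mu_B)^\top\;,
\eeq
where $S_W$ is taken to be the sum of the signal and background covariance matrices.
Under these assumptions, and provided $S_W$ is invertible, $J$ is maximised by any
$\beta \propto S_W^{-1}(\mu_S-\mu_B)$.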

\subsubsection{Booking options}

The LD is booked via the command:
\begin{codeexample}
\begin{tmvacode}
factory->BookMethod( Types::kLD, "LD" );
\end{tmvacode}
\caption[.]{\codeexampleCaptionSize Booking of the linear discriminant: the first argument is
        a predefined enumerator, the second argument is a user-defined
        string identifier. No method-specific options are available.
        See Sec.~\ref{sec:usingtmva:booking} for more information on the booking.}
\end{codeexample}

No specific options are defined for this method beyond those shared with all the other
methods (\cf Option Table~\ref{opt:mva::methodbase} on page~\pageref{opt:mva::methodbase}).

\subsubsection{Description and implementation}

Assuming that there are $m+1$ parameters $\beta_0, \cdots ,\beta_m$ to be estimated using
a training set consisting of $n$ events, the defining equation for $\mathbf{\beta}$ is
\beq
	Y=X\mathbf{\beta}\;,
\eeq
where we have absorbed $\beta_0$ into the vector $\beta$ and introduced the matrices
\beq
	Y=\left( \begin{array}{c}
	y_1\\
	y_2\\
	\vdots\\
	y_n \end{array} \right) \mbox{  and  } X=\left( \begin{array}{cccc}
		1 & x_{11} & \cdots & x_{1m} \\
		1 & x_{21} & \cdots & x_{2m} \\
		\vdots & \vdots & \ddots & \vdots \\
		1 & x_{n1} & \cdots & x_{nm} \end{array} \right)\;,
\eeq
where the constant column in $X$ represents the bias $\beta_0$ and $Y$ is composed of
the target values with $y_i=1$ if the $i$th event belongs to the signal class and $y_i=0$
if the $i$th event belongs to the background class. Since this system is in general
overdetermined ($n>m+1$), it is solved by the method of least squares, which yields
the {\em normal equations} for the classification problem, given by
\beq
	X^TX\beta=X^TY \Longleftrightarrow \beta=(X^TX)^{-1}X^TY\;.
\eeq
The transformation $(X^TX)^{-1}X^T$ is known as the \textit{Moore-Penrose pseudoinverse}
of $X$ and can be regarded as a generalisation of the matrix inverse to non-square
matrices. In this form it requires that the matrix $X$ has full column rank, so that
$X^TX$ is invertible.
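
The step from the least-squares criterion to the normal equations can be made explicit.
As a sketch, writing $S(\beta)$ (a symbol introduced here) for the sum of squared
residuals, one has
\beq
	S(\beta)=(Y-X\beta)^T(Y-X\beta)\;, \qquad
	\frac{\partial S}{\partial\beta}=-2X^T(Y-X\beta)=0
	\;\Longrightarrow\; X^TX\beta=X^TY\;.
\eeq
As an illustrative toy case (the numbers are invented for this example), take a single
input variable ($m=1$) and $n=4$ events, with two background events at $x=0,1$ and two
signal events at $x=2,3$:
\beq
	X=\left( \begin{array}{cc} 1 & 0\\ 1 & 1\\ 1 & 2\\ 1 & 3 \end{array} \right)\;, \quad
	Y=\left( \begin{array}{c} 0\\ 0\\ 1\\ 1 \end{array} \right)\;, \quad
	X^TX=\left( \begin{array}{cc} 4 & 6\\ 6 & 14 \end{array} \right)\;, \quad
	X^TY=\left( \begin{array}{c} 2\\ 5 \end{array} \right)\;,
\eeq
so that
\beq
	\beta=\frac{1}{20}\left( \begin{array}{cc} 14 & -6\\ -6 & 4 \end{array} \right)
	\left( \begin{array}{c} 2\\ 5 \end{array} \right)
	=\left( \begin{array}{c} -0.1\\ 0.4 \end{array} \right)\;,
\eeq
i.e.\ $y(x)=0.4\,x-0.1$, which takes the values $-0.1$, $0.3$, $0.7$ and $1.1$ at the
four events.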

If weighted events are used, this is simply taken into account by introducing a diagonal
weight matrix $W$ and modifying the normal equations as follows:
\beq
	\beta=(X^TWX)^{-1}X^TWY\;.
\eeq
Considering two events $\mathbf{x}_1$ and $\mathbf{x}_2$ on the decision boundary, we
have $y(\mathbf{x}_1)=y(\mathbf{x}_2)=0$ and hence $(\mathbf{x}_1-\mathbf{x}_2)^T\beta=0$,
where the bias $\beta_0$ cancels in the difference and $\beta$ here denotes the
coefficients $\beta_1,\ldots,\beta_m$ only.
Thus we see that the LD can be geometrically interpreted as determining the decision
boundary by finding a vector $\beta$ orthogonal to it.
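
Returning to the weighted normal equations above, they follow from minimising a weighted
sum of squared residuals. As a sketch, with $w_i$ denoting the weight of the $i$th event
and $Q(\beta)$ the weighted sum (both symbols are introduced here for illustration),
$W=\mathrm{diag}(w_1,\ldots,w_n)$ and
\beq
	Q(\beta)=\sum_{i=1}^{n} w_i\Big(y_i-\beta_0-\sum_{k=1}^{m}x_{ik}\beta_k\Big)^2
	=(Y-X\beta)^TW(Y-X\beta)\;,
\eeq
where $\beta$ in the matrix form denotes the full parameter vector including $\beta_0$.
Setting the gradient $-2X^TW(Y-X\beta)$ to zero gives $X^TWX\beta=X^TWY$, and the
unweighted normal equations are recovered for $w_i=1$.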

\subsubsection{Variable ranking}

The present implementation of LD provides a ranking of the input variables based on the
coefficients of the variables in the linear combination that forms the decision boundary.
The order of importance of the discriminating variables is assumed to agree with the
order of the absolute values of the coefficients.

\subsubsection{Regression with LD}

It is straightforward to apply the LD algorithm to linear regression by replacing the
binary targets $y_i \in \{0,1\}$ in the training data with the measured values of the
function which is to be estimated. The resulting function $y(\mathbf{x})$ is then
the best linear estimate of the target values in the least-squares sense.

\subsubsection{Performance}

The LD is optimal for Gaussian-distributed variables with linear correlations (\cf the
standard toy example that comes with TMVA) and can be competitive with likelihood and
nonlinear discriminants in certain cases. No discrimination is achieved when a variable
has the same sample mean for signal and background, but the LD can often benefit from
suitable transformations of the input variables. For example, if a variable
$x \in [-1,1]$ has a signal distribution of the form $x^2$ and a uniform background
distribution, the mean value is zero in both cases, leading to no separation. The
simple transformation $x \rightarrow |x|$ renders this variable powerful for use
with the LD.
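
This example can be checked explicitly. As a sketch, assume the signal density is
normalised on $[-1,1]$ as $p_S(x)=\frac{3}{2}x^2$ and the background density as
$p_B(x)=\frac{1}{2}$ (the symbols $p_S$ and $p_B$ are introduced here for illustration).
Both means of $x$ vanish by symmetry, whereas
\beq
	\langle |x| \rangle_S=\int_{-1}^{1}|x|\,\frac{3}{2}x^2\,dx=\frac{3}{4}\;, \qquad
	\langle |x| \rangle_B=\int_{-1}^{1}|x|\,\frac{1}{2}\,dx=\frac{1}{2}\;,
\eeq
so the transformed variable has different means for the two classes, which a linear
discriminant can exploit.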

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "TMVAUsersGuide"
%%% End: