0308:astro-ph0308100/WeightedAnalysisTeq.tex

1: \chapter{Weighted Analysis Technique}

2: \label{WeightedAnalysisTechnique:chap}

3:

4: \section{Introduction}

5:

6: This chapter describes the theoretical basis for the weighted analysis technique, which was inspired by the unique requirements of performing a real-time transient search in a high data rate gamma-ray observatory like Milagro. The basic requirements for a transient search analysis are to maximize signal sensitivity while keeping the computational cost low enough to enable a real-time analysis on many time scales. Any analysis technique involves compromises between sensitivity, model independence, and speed, and I developed the weighted analysis technique as an alternative to the compromises made by the standard binned or maximum likelihood analyses \cite{CygnusTeq}.

7:

8: As is common in wide-field gamma-ray observatories, Milagro has a highly variable point spread function (PSF), with an order-of-magnitude difference in width from the best events to the worst. (See Chapter \ref{Characterization:chap}, where Milagro's PSF is characterized.)  In an optimal binned analysis the sky is divided into equal area bins and the number of events observed in each bin is counted, with the size of the bins chosen to maximize the signal-to-noise ratio for the detector's average PSF.  In essence the binned analysis discards the information on the quality of an individual event, and instead treats every event as if it were drawn from the average PSF distribution. Maximum likelihood techniques, of which there are several approaches, can use the event-by-event PSF information and are the most sensitive methods for analyzing wide-field gamma-ray observations.  However, most implementations are computationally slow because they require fitting model parameters, and this fit usually requires an iterative fitting algorithm such as MINUIT \cite{MINUIT}.

9:

10: A third analysis method is the Gaussian weighting technique, which has been used by the Fly's Eye and JANZOS experiments as a compromise between the optimal bin and maximum likelihood analyses.  In his thesis, Woodhams describes a technique which uses the Gaussian PSFs of individual events to identify excesses, and assumes large statistics so that the central limit theorem can be used to determine the significances of the excesses \cite{Woodhams}. A similar method was used by the Fly's Eye group to analyze data from Cygnus X-3, but they only outlined the technique used and never published the full analysis method.  From the sketch of the analysis provided in Cassiday et al.\ \cite*{FlysEyeTeq} it is not obvious whether the PSF was Gaussian or arbitrary in shape, and they used an extensive Monte Carlo simulation to determine the significance of the excess.\footnote{There is a nice review of point search techniques in Alexandreas et al.\ \cite*{CygnusTeq}, where the authors claim that Gaussian weighting is as sensitive as maximum likelihood in the limit of large statistics and provides a substantial gain in computation time.  Unfortunately, while plausible, these claims are not supported by their citations.}

11:

12: The weighted analysis technique presented in this chapter is an extension of the Gaussian weighting technique as developed by Woodhams \cite*{Woodhams} to PSFs of arbitrary shape and Poisson statistics. To introduce the weighted analysis technique, I return to first principles and build a somewhat idealized sky map, then describe how this sky map can be used as the basis for an analysis.

13:

14:

15:

16: \section{Building a Sky Map}

17: \label{WATSkymap}

18:

19: Let's return to the basics, and imagine making a sky map. An idealized sky map would represent our complete knowledge of the sky --- using all of the available information and adding no spurious or biased information. The question of making a sky map becomes what do we know, and how do we represent that knowledge?

20:

21: Surprisingly, the question ``What do we know?'' can be quite subtle. Do we use just the information we measure directly (event positions and characteristics), or do we go to the next step and include sources such as the Crab pulsar or characteristics such as an expected source spectrum in the definition of what we know? Including knowledge of sources leads to maximum likelihood analyses, with differences in how much information we include about the sources (spectra, angular extent, etc.) leading to different types of maximum likelihood analysis. Alternately, we could limit the definition of what we know to directly measured quantities --- event positions and characteristics. This approach delays questions of source identification until after the sky map is created. Both approaches are valid, but from different points of view. The weighted analysis technique uses the latter viewpoint and includes only directly measured quantities.

22:

23: The second question is how to represent our knowledge of the event positions and characteristics in a sky map? The PSF is defined as the normalized probability density distribution for the true event position given the measured event position:

24: \begin{equation}

25: \label{PSFdef}

26: PSF(\vec{k}_t-\vec{k}_{m}) = \partial P(\vec{k}_{t}|\vec{k}_{m})/ \partial \Omega,

27: \end{equation}

28: where $\vec{k}$ is a vector on the unit sphere, $\vec{k}_{m}$ is the measured location, and $\vec{k}_{t}$ is the true location.\footnote{Since in astronomy we are only concerned with the direction of the initial photon, it is useful to represent this direction as a vector on the unit sphere.  The PSF can also be defined by $\partial P(\vec{k}_{m}|\vec{k}_{t})/ \partial \Omega$, which is identical to the form in Equation \ref{PSFdef} by Bayes' formula if $P(\vec{k}_{t})$ is uniform on the scale of the PSF.  For studying analyses, the definition in Equation \ref{PSFdef} is more convenient because $\vec{k}_{m}$ is the observed quantity.} For gamma-ray telescopes, the width and shape of the PSF depend on the characteristics $\psi_i$ of the individual event, and may vary considerably from one event to the next. The PSF can also be multiplied by the probability $P_\gamma$ that the event was a photon at all (as opposed to background) to create a value that is the photon probability density for the $i^{\rm th}$ event's true position being at a position $\vec{k}$ on the sky and being a photon:

29: \begin{equation}

30: \label{probDensityEq}

31: p_i(\vec{k}) = p(\vec{k}|\vec{k}_i, \psi_i) = PSF(\vec{k}-\vec{k}_{i}, \psi_i)P_{\gamma}(\psi_i).

32: \end{equation}

33:

34: A sky map can be created by adding together the photon probability distributions of many events to form an overall map of the photon probability distribution (see Figure \ref{PSFSummationDiagram} for a graphical description).  The total photon probability density distribution $w$ is given by the sum of the individual photon probability densities:

35: \begin{equation}

36: \label{weightSum:Eq}

37: w(\vec{k})=\sum_i^ {\text{all showers}} p_i(\vec{k})=\sum_i^{\text{all showers}}PSF(\vec{k}-\vec{k}_{i}, \psi_i)P_{\gamma}(\psi_i).

38: \end{equation}

39: There is some information loss in forming this sky map because the characteristics of the incoming showers cannot be uniquely determined from the sky map.  In essence the sky map represents total photon probability density, but we have lost the individual $p_i(\vec{k})$ values that make up the sum.\footnote{The $p_i(\vec{k})$ information can be retained if  there is a small set of PSFs and $P_\gamma$s instead of continuous distributions.  In this case a separate sky map for each combination of PSF and $P_\gamma$ can be created and this individual information retained.  Sky maps of this type were used in a maximum likelihood analysis by the EGRET collaboration \cite{EgretTeq} for PSFs which were binned in energy and a binary $P_\gamma$ cut. If the PSF or $P_\gamma$ distributions are continuous only the original list of event positions and characteristics retains all of the information, and any sky map is an approximation.} The sky map of the photon probability distribution uses all of the event-by-event knowledge, and represents our knowledge of the spatial distribution of events.

40:

41:

42: \begin{figure}

43: \begin{center}

44: \includegraphics[height=6.75in]{PSFSummationDiagram.ps}

45: \caption[PSF Summation Diagram]{This diagram shows how individual photons are added to form a sky map in the weighted analysis technique.  In the top frame we have events with varying PSFs.  The lower frame shows how these events can be added together to form a probability density map, first as a 1-dimensional example of three events, then a more realistic 2-dimensional example incorporating many events.}

46: \label{PSFSummationDiagram}

47: \end{center}

48: \end{figure}

49:

50: A continuous sky map created by adding the PSFs of individual photons represents a somewhat idealized map which is difficult to manipulate with a computer. The spatial scale for fluctuations across the sky map is set by the width of  the narrowest PSF. The sky map can be digitized by sampling the total photon probability density $w(\vec{k})$ at individual points on the map surface.  If the spacing of these samples is small compared to the narrowest PSF the information loss can be made arbitrarily small (Figure \ref{SkyMapSamplingDiagram}). There are several nice features of this digitized sky map. Because the value at a location on the sky map represents the photon probability density at that point and is not an integral over nearby locations (as is common in binned analyses), the spacing between the points need not be uniform and tiling problems associated with binning a spherical sky are avoided. Additionally, two sky maps which share a sampling pattern can be summed. An 80 second sky map can be formed by adding two sky maps of 40 seconds duration --- a significant computational advantage when hunting over multiple time scales.
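
As a concrete illustration of the additivity noted above, suppose the map is sampled at fixed locations $\vec{k}_j$ and the data are split into two disjoint time intervals $T_1$ and $T_2$ (the index $j$ and the interval labels are notation introduced here for illustration only).  Because Equation \ref{weightSum:Eq} is a plain sum over events, the sampled map for the combined interval separates into the two sub-maps:
\begin{equation}
w_{T_1\cup T_2}(\vec{k}_j)=\sum_{i\in T_1}p_i(\vec{k}_j)+\sum_{i\in T_2}p_i(\vec{k}_j)=w_{T_1}(\vec{k}_j)+w_{T_2}(\vec{k}_j).
\end{equation}
Provided the two sub-maps share the same sample locations, the map for the longer time scale then costs one addition per sample location, regardless of how many events each sub-map contains.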

51:

52: \begin{figure}

53: \begin{center}

54: \includegraphics[width=5.75in]{SkyMapSamplingDiagram.ps}

55: \caption[Sky Map Sampling Diagram]{The shaded background represents the smooth variation of the probability density seen in an example sky map, with the red dots representing locations where the probability density has been sampled.  The sampling pattern need not be uniform, and as long as the sample spacing is small compared to the narrowest PSF, all the information in the continuous sky map is captured in the sampled map.}

56: \label{SkyMapSamplingDiagram}

57: \end{center}

58: \end{figure}

59:

60: In this discussion spectral information has been ignored, but can be added in a completely analogous manner. The key is determining the normalized energy probability density function $\partial P(E_t|E_m,\psi_i)/\partial E$ --- the one dimensional energy analog to the PSF --- for each event. This energy distribution can then be multiplied by $P_\gamma(\psi_i)$ to form the photon energy probability density $p_i(E)$ and added as a third independent axis of the sky map.  The resulting three dimensional sky map is harder to visualize, but again represents the total photon probability distribution.  Similarly, the probability density of any other parameter of interest (such as polarization) can be added to a multidimensional sky map to enrich the representation of the data.  In conclusion, we can form a digitized sky map that represents our direct knowledge by summing the photon probability density distributions of each event and digitizing the resulting map.

61:

62:

63:

64: \section{Source Identification}

65: \label{WATSourceID}

66:

67: Now that we have a sky map of the photon probability distribution we need to tackle the issue of source identification, and again the analysis approach depends on exactly what question is being asked. A maximum likelihood analysis could be directly applied to the photon probability distribution of the sky map. Maximum likelihood is a very flexible approach, allowing searches for sources of different types and characteristics, and forming the sky map first could lead to significant time savings when the number of photons exceeds the number of sampled map locations (very similar to binned maximum likelihood techniques). However, maximum likelihood techniques tend to be too slow for searching multiple time scales in real time. If we are developing a new technique, the question becomes ``What are we looking for, and what compromises are we willing to make?''

68:

69: The weighted analysis technique was developed for discovery mode real-time GRB searches in the Milagro experiment. We expect signals to be transient point sources,\footnote{In a transient source the size of the object in light seconds must be smaller than the emission time to allow different parts of the object to be causally connected.  This requirement can be relaxed somewhat in relativistic outflows due to time dilation, but any signal with a duration less than a few hours which originates at galactic or cosmological distances will have an angular extent much smaller than the Milagro PSF, and appear to be a point source.} but the spectrum of a TeV transient is highly model dependent and uncertain. Furthermore, since we don't know the signal duration we need an analysis which is computationally fast enough to handle a real-time search over multiple time scales. So we want a search for point sources which is fast, sensitive, and not strongly biased by spectral expectations. The compromise we are willing to make is that this is a discovery mode search:  we just need to identify sources. Once sources are identified, they can be analyzed at length using slower, more precise methods.

70:

71: Because we are performing a discovery mode search, the relevant statistic is the probability of the background producing the observed signal.  Since we expect point sources, we can look at each of the sampled locations independently and ask ``What is the probability of the background producing the observed photon probability density?'' Mathematically, we need to determine the probability that the background could produce a photon probability density greater than or equal to the observed value ($w_{obs}$) given the spectrum of probability densities observed in the background ($g(w)$),

72: \begin{equation}

73: \label{baseProb}

74: P(w\ge w_{obs}|g(w,N,\vec{k})).

75: \end{equation}

76: Note that in general the probability density spectrum is dependent on both the number of events ($N$) and the position in the sky ($\vec{k}$).  For small $N$, the probability density spectrum is typically very skewed, with many small values from the tails of the PSF and $P_\gamma$ distributions and only a few large values. However, as $N$ becomes large the central limit theorem comes into play and the probability density spectrum becomes Gaussian-distributed around the mean.  The probability density spectrum can also be spatially dependent if the distribution of event characteristics changes with position in the sky (if the distribution of $\psi_i$ in Equation \ref{weightSum:Eq} depends on $\vec{k}$).
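
The large-$N$ limit can be written out explicitly under a simplifying assumption: that the $N$ background contributions $p_i(\vec{k})$ at a given location are independent draws from a single distribution with mean $\mu_p$ and variance $\sigma_p^2$ (these two symbols are introduced here for illustration).  The central limit theorem then gives
\begin{equation}
\langle w\rangle = N\mu_p, \qquad \text{Var}(w)=N\sigma_p^2, \qquad P(w\ge w_{obs}|g(w,N,\vec{k}))\approx\frac{1}{2}\,\text{erfc}\Biggl(\frac{w_{obs}-N\mu_p}{\sigma_p\sqrt{2N}}\Biggr).
\end{equation}
This Gaussian form is only a sketch of the asymptotic behavior; for the small $N$ typical of short time scales the skewed spectrum described above must be used instead.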

77:

78: Equation \ref{baseProb} represents the full probability of the background producing an observed probability density if the PSFs used to generate the sky map --- called the weighting functions $PSF'$ --- cover the entire sky and $N$ is deterministic.  In implementing the weighted analysis technique, the weighting functions are often truncated at some angular distance (see Table \ref{Cutoff:table}).  Truncating the weighting function significantly improves the computational speed of the analysis because only sample locations near the position of a new event must be updated, not the entire sky map.  The cost of truncating the weighting functions is a further complication of the statistics.  When the weighting functions are truncated, the number of events summed to form the photon probability density will vary from location to location.  Furthermore, the number of events summed at a location on the sky map ($N_{obs}$) experiences Poisson fluctuations around the expected background value $N_{exp}$, which adds a second observable to the probability.  Determining the probability of the background producing an event which is equally or more signal-like than the current observation is subtle when there are two or more independent variables ($w_{obs}$ and $N_{obs}$ in this case), but can be approximated by\footnote{For a complete explanation of the relevant statistics and this approximation, please see Appendix \ref{ProbAppendix}.}

79: \begin{equation}

80: \label{CorrectProbEq}

81: \sum_{N=N_{obs}}^\infty P(w\ge w_{obs}|g(w,N,\vec{k}))P(N|N_{exp}).

82: \end{equation}

83: If the background probability density spectrum $g(w)$ changes slowly with $N$ ($\partial P(w\ge w_{obs}|g(w,N,\vec{k}))/\partial N \ll \partial P(N|N_{exp})/\partial N$) then equation \ref{CorrectProbEq} can be further approximated by:

84: \begin{equation}

85: \label{Prob:Eq}

86: P(w\ge w_{obs}|g(w,N_{obs},\vec{k}))P(N\ge N_{obs}|N_{exp}).

87: \end{equation}

88: The first term in Equation \ref{Prob:Eq} is just the probability of the observed probability density being produced by the background, and can be easily measured in a background dominated experiment like Milagro. The second term is the Poisson probability of seeing an observed number of events given a background expectation, and is only important if the weighting functions used have a finite extent.  Qualitatively the two terms serve distinct purposes. The first term depends on how gamma-like the events are from the $P_{\gamma}$ values and how clumped the events are from the PSF values (see Equation \ref{weightSum:Eq}), and asks how likely it is that the background could have produced the observed probability density. The second term is looking for a simple excess of events, and becomes important for Milagro only if the PSF is truncated at a given angular distance so that there is an effective bin size for each PSF. Typically $g(w)$ varies slowly with $N$, but a small truncation distance can introduce correlation between the probability terms in Equation \ref{Prob:Eq} which must be accounted for (see Section \ref{WATExamples}).
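
For reference, the second term of Equation \ref{Prob:Eq} has the standard Poisson form.  Written as a complement it becomes a finite sum, which is convenient to evaluate numerically (the summation index $n$ is introduced here):
\begin{equation}
P(N\ge N_{obs}|N_{exp})=\sum_{n=N_{obs}}^{\infty}\frac{N_{exp}^{\,n}\,e^{-N_{exp}}}{n!}=1-\sum_{n=0}^{N_{obs}-1}\frac{N_{exp}^{\,n}\,e^{-N_{exp}}}{n!}.
\end{equation}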

89:

90: The probability of the background producing an observation as given by Equation \ref{Prob:Eq} can be used to identify signals in the data.  The typical probability threshold for a source discovery is set at $\sim$5$\sigma$, or a one-sided Gaussian tail probability less than $2.9\times10^{-7}$.  For a single search location this is exactly the threshold that would be used.  However, in a GRB search we are looking at hundreds of millions of independent locations and multiple time scales, and probabilities less than $2.9\times10^{-7}$ are quite common.  By definition, if 1000 independent locations are analyzed, $\sim$1 will have a probability $\le1/1000$ and $\sim$10 will have a probability  $\le 1/100$.  A log-log plot of the probability histogram has a characteristic slope of $-1$ if there are no sources, and this can be used as a diagnostic to ensure that the distributions used to calculate the probability are correct.  For a search involving many locations, the source discovery threshold is set at ``$5\sigma$'' below the probability where one background event is expected (for 1000 locations $10^{-3}\times[2.9\times10^{-7}] = 2.9\times10^{-10}$).  For a real-time search, the approximate number of independent locations that will be analyzed is determined in advance, and used to set the probability threshold for the discovery of transient TeV emission.
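
The trials correction described above can be summarized in two lines.  Here $M$ denotes the approximate number of independent locations and time scales searched, $p_0$ the single-location discovery threshold, and $p_{disc}$ the corrected threshold; all three symbols are introduced here for illustration.  If the background probabilities are uniformly distributed, the expected number of locations with probability at or below $p$ is
\begin{equation}
\langle n(\le p)\rangle \approx Mp ,
\end{equation}
which equals one at $p=1/M$.  Setting the discovery threshold a factor $p_0$ below that point gives
\begin{equation}
p_{disc}\approx\frac{p_0}{M},
\end{equation}
which reproduces the factor of $10^{-3}$ quoted for $M=1000$.  This is a sketch that treats the searched locations as fully independent, which is only approximately true for overlapping weighting functions and time windows.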

91:

92: Signal identification in the weighted analysis technique has several nice features. First, $P(w \ge w_{obs}|g(w,N_{obs},\vec{k}))$ can be stored in computer lookup tables. Because no parameter fitting is required to determine the probability, calculating the significance of a signal is very fast. Second, because we are simply looking for something which does not look like the background, we are less model sensitive than some maximum likelihood implementations. However, the weighted analysis method does sacrifice some features like estimating the observed spectrum in favor of speed. A source identified with this method would need to be reanalyzed with maximum likelihood to obtain all the information from a signal.

93:

94: \section{Sensitivity}

95: \label{WATSensitivity}

96:

97: We would like to compare the sensitivity of the weighted analysis technique to maximum likelihood and binned analyses. Unfortunately, there is no simple analytic way of comparing the various maximum likelihood analyses to either the optimal binned analysis or the weighted analysis technique. Qualitatively we expect the weighted analysis to do well because it is using all available information, but it is safe to say that it only approaches the sensitivity of a well implemented maximum likelihood analysis. That being said, many common maximum likelihood implementations require a model signal, and their sensitivity can be significantly impacted by mistakes in the original model.\footnote{The oral physics tradition maintains that maximum likelihood is superior to all other techniques.  In the literature there are counterexamples which show cases where maximum likelihood is not ideal \cite{Eadie}, though it may be due to how the maximum likelihood analysis was implemented.  I was not able to find citations proving the superiority of maximum likelihood.}

98:

99: Comparing the weighted analysis technique to an optimal binned analysis also requires a model signal and a Monte Carlo simulation in most cases. However, we can illustrate the key differences using a few toy models with analytic solutions. In the limit of large statistics, we can obtain analytic solutions for both the binned and weighted analysis techniques for a detector with a single Gaussian PSF, and a detector with events drawn from two Gaussian PSFs of different widths. The limit of large statistics allows us to use the central limit theorem to calculate the variance of $w_{obs}$, and the combination of the large statistics limit, no weighting for background rejection, and Gaussian PSFs reduces the weighted analysis technique to the Gaussian weighting analysis developed by Woodhams \cite*{Woodhams}. Since the Gaussian weighting and weighted analysis techniques are identical in this limit, we will refer to them generically as weighted analyses in the following discussion.

100:

101: For these toy models, $N$ is the number of signal photons and $b$ is the number of background events per square degree. After background subtraction, the significance of the signal is given by the signal/noise ratio and has the form $AN/\sqrt{b}$, where $A$ characterizes the sensitivity of the search and is the object of the following calculations. Because the PSFs in these models are symmetric, $\vec{k}-\vec{k}_i$ depends only on the angular separation between the source location ($\vec{k}$) and the reconstructed event location ($\vec{k}_i$).  To simplify the equations, $r$ is used to denote the angular separation in degrees between the source and reconstructed positions.

102:

103: For a single Gaussian, the signal observed in a circular bin of radius $R$ is given by the integral of the PSF

104: \begin{equation}

105: \label{ }

106: \text{Signal}=\int_0^R\frac{N}{2\pi\sigma^2}e^{-r^2/2\sigma^2}\,2\pi r\,dr = N(1-e^{-R^2/2\sigma^2}),

107: \end{equation}

108: and the noise is given by the square root of the number of background events, $\sqrt{b\pi R^2}$.  This ratio is maximized for $R=1.585\sigma$, giving a sensitivity parameter $A$ of $0.255/\sigma$ for a Gaussian PSF.
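
The optimal bin radius follows from a short calculation, sketched here with the dimensionless radius $x=R/\sigma$ (a symbol introduced for this derivation).  The signal-to-noise ratio is
\begin{equation}
\frac{\text{Signal}}{\text{Noise}}=\frac{N}{\sqrt{b}}\,\frac{1-e^{-x^2/2}}{\sqrt{\pi}\,\sigma\,x},
\end{equation}
and setting its derivative with respect to $x$ to zero gives the condition
\begin{equation}
x^2e^{-x^2/2}=1-e^{-x^2/2}\quad\Longrightarrow\quad e^{x^2/2}=1+x^2 .
\end{equation}
The nonzero root, found numerically, is $x\approx1.585$.  Substituting back,
\begin{equation}
A=\frac{1-e^{-x^2/2}}{\sqrt{\pi}\,\sigma\,x}\approx\frac{0.715}{1.772\times1.585\,\sigma}\approx\frac{0.255}{\sigma}.
\end{equation}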

109:

110: For the weighted analyses, the signal from a point source with $N$ photons is the probability distribution of the photon positions (the true point spread function $PSF$) times the weight given to each photon (the weighting function $PSF'$):

111: \begin{equation}

112: \label{ }

113: \text{Signal} = \int_0^{\infty} N\ PSF\ PSF'\ 2\pi r\,dr.

114: \end{equation}

115:  Since the weighting function and the PSF are the same Gaussian function in this example, the integral becomes

116: \begin{equation}

117: \label{ }

118: \text{Signal} = \int_0^{\infty} N\Bigl[\frac{1}{2\pi \sigma^2}e^{-r^2/2\sigma^2}\Bigr]^2 2\pi r\,dr = \frac{N}{4\pi\sigma^2}.

119: \end{equation}

120: The noise is given by the square root of the variance of the probability density.  In the limit of large statistics, the variance is given by integrating the flat background distribution multiplied by the square of the weighting function:

121: \begin{equation}

122: \label{ }

123: \text{Noise} = \Biggl[ \int_0^{\infty} b\ PSF'^2\ 2\pi r\,dr \Biggr]^{1/2}.

124: \end{equation}

125: Since the weighting function is the same Gaussian as the true PSF, this is the same integral as used for the signal with $b$ replacing $N$. The signal to noise ratio becomes $\frac{N}{\sqrt{4\pi\sigma^2}\sqrt{b}}$, giving a sensitivity parameter $A$ of $0.282/\sigma$.  This implies that the weighted analyses are $\sim10\%$ more sensitive than an optimal bin analysis.  Woodhams \cite*{Woodhams} argued that this 10\% improvement should be a lower limit, and that detectors which have a spectrum of PSFs should benefit even more from a weighted analysis.

126:

127: The next toy model has two Gaussian PSFs, with 25\% of the events coming from a PSF of width 0.33$\sigma$, and 75\% from a PSF of width 1$\sigma$. Following the previous calculation, the optimal bin size is $0.764\sigma$ and the sensitivity parameter is $0.312/\sigma$ for the optimal binned analysis.  For the weighted analyses the sensitivity parameter is $0.489/\sigma$, or a $\sim 56\%$ improvement in sensitivity over the binned analysis. This is the kind of improvement we expected from a weighted analysis technique.
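
The two-Gaussian numbers can be reproduced with a short calculation, sketched here under three assumptions that are not spelled out in the text: the narrow width is taken as exactly $\sigma/3$, signal and background events share the same 25\%/75\% mix, and each event is weighted by its own PSF.  With fractions $f_j$ and widths $\sigma_j$ (notation introduced here), the single-Gaussian integrals give
\begin{equation}
\text{Signal}=N\sum_j\frac{f_j}{4\pi\sigma_j^2},\qquad \text{Noise}^2=b\sum_j\frac{f_j}{4\pi\sigma_j^2},
\end{equation}
so that
\begin{equation}
A_{weighted}=\Biggl[\sum_j\frac{f_j}{4\pi\sigma_j^2}\Biggr]^{1/2}=\frac{1}{\sigma}\sqrt{\frac{0.25\times9+0.75}{4\pi}}=\frac{1}{\sigma}\sqrt{\frac{3}{4\pi}}\approx\frac{0.489}{\sigma}.
\end{equation}
For the binned analysis the corresponding expression is
\begin{equation}
A_{binned}(R)=\frac{1}{\sqrt{\pi}\,R}\sum_j f_j\bigl(1-e^{-R^2/2\sigma_j^2}\bigr),
\end{equation}
which evaluates to approximately $0.31/\sigma$ at $R=0.764\sigma$, consistent with the value quoted above.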

128:

129: However, the improvement depends very much on the spectrum of PSFs, and in special circumstances the improvement can be zero. To show that the 10\% improvement from a single Gaussian PSF is not a lower limit, consider a spectrum of PSFs given by $g(\sigma)$.  The general problem of finding the signal in a round bin becomes

130: \begin{equation}

131: \label{ }

132: N\iint_0^R PSF(\sigma,r)g(\sigma)2\pi r \, dr \, d\sigma,

133: \end{equation}

134: and the signal in a weighted analysis becomes

135: \begin{equation}

136: \label{ }

137: N\iint PSF^2(\sigma,r) g(\sigma)2\pi r \, dr \, d\sigma.

138: \end{equation}

139: For a flat spectrum of Gaussian PSFs from width 0.1$\sigma$ to width 1$\sigma$, the weighted analysis gives less than a 7\% improvement over the binned analysis despite the wide range of PSFs used. In retrospect, this can be explained by reversing the order of integration. By integrating the spectrum of PSFs first (over $d\sigma$), a composite PSF can be obtained which has a distinctly non-Gaussian profile. By choosing the appropriate PSF and spectrum, a composite PSF with a top-hat profile could be generated, and in this extreme case the optimal binned analysis would be just as effective as the weighted analyses. This can be seen by realizing that the weighted analysis technique with a top-hat weighting function

140: \begin{equation}

141: \label{tophat:Eq}

142:  \Theta(\Delta\vec{k}) = \begin{cases}

143:       a & |\Delta\vec{k}|<R \\

144:       0 & |\Delta\vec{k}|\ge R\ ,

145: \end{cases}

146: \end{equation}

147: is identical to a binned analysis.  In Equation \ref{tophat:Eq}, $R$ is the size of the bin and $a$ is a constant. Returning to the probability of a background fluctuation producing the observed signal as defined in Equation \ref{CorrectProbEq}, the top-hat weighting function leads to $g(w)=\delta(w-aN_{obs})$ since all the events with a non-zero probability density have a probability density of $a$.  Consequently, the total observed probability density $w_{obs}$ is deterministic and the first probability term is always equal to 1.  The total probability of the background producing the observed signal is solely determined by the second term which is simply the Poisson probability of seeing $N_{obs}$ events inside a bin of radius $R$ --- exactly the same result as a binned analysis.  It can also be shown that the optimal weighting function to use in the weighted analysis technique is the true PSF \cite{Woodhams}. Since the optimal weighting function is the true PSF, and the weighted analysis with a top-hat weighting function is identical to a binned analysis, it follows that the sensitivity of a weighted analysis is never worse than a binned analysis, and would only be equal for a detector with a top-hat composite PSF.  In general, the less square the composite PSF is, the more effective a weighted analysis will be.

148:

149: One final topic we can explore with simple examples is model sensitivity.  Returning to the example with two Gaussian PSFs, we can compare the sensitivity of both analyses to signals where all the signal events come from either the narrow or wide PSFs while the expectation is still for a 25\% -- 75\% division between the PSFs. In these examples, the bin size or background distributions will be wrong, and we can explore how errors in the expected PSF affect the sensitivity of the analysis.  If the PSF of the signal is $0.33\sigma$ (all narrow PSF events), the weighted analyses are more than twice as sensitive as the binned analysis (114\% improvement).  At the opposite extreme, if the PSF of the signal is $1\sigma$ (all wide PSF events), then the binned analysis is nearly 13\% more effective than the weighted analyses.  This surprising result arises because the weighted analysis techniques only use information from a single position on the sky map to identify excesses. An excess in photon probability at one location can be produced by either a few high quality photons with narrow PSFs, or a larger number of poor quality photons with wide PSFs.  The weighted analyses determine the significance of a signal by looking at only one position on the sky map and implicitly assuming that the excess has the expected spatial distribution. Another way of looking at this is that the power of the weighted analysis techniques comes from weighting the events with the expected PSF.  However, if the expected PSF is wrong, there can be times when the expected optimal bin/top-hat PSF from a binned analysis happens to be more accurate than the expected PSF.  This shows that there is some model dependence in the weighted analysis technique which can be detrimental in certain specific scenarios.
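
These two limiting cases can be checked with the same toy-model integrals, sketched here assuming the narrow width is exactly $\sigma/3$, the background keeps the 25\%/75\% mix (so the noise terms are unchanged), and the bin radius stays at $R=0.764\sigma$.  If every signal event is narrow,
\begin{equation}
A_{weighted}=\frac{9/(4\pi\sigma^2)}{\sqrt{3/(4\pi\sigma^2)}}=\frac{9}{\sqrt{12\pi}\,\sigma}\approx\frac{1.47}{\sigma},\qquad
A_{binned}=\frac{1-e^{-9R^2/2\sigma^2}}{\sqrt{\pi}\,R}\approx\frac{0.685}{\sigma},
\end{equation}
a ratio of about 2.14.  If every signal event is wide,
\begin{equation}
A_{weighted}=\frac{1}{\sqrt{12\pi}\,\sigma}\approx\frac{0.163}{\sigma},\qquad
A_{binned}=\frac{1-e^{-R^2/2\sigma^2}}{\sqrt{\pi}\,R}\approx\frac{0.187}{\sigma},
\end{equation}
so the weighted sensitivity is about 0.87 times the binned value, or roughly 13\% lower.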

150:

151: In the preceding examples the $P_\gamma$ term from Equation \ref{probDensityEq} has been assumed to be one.  This is equivalent to a hard background cut which treats all events passing the cut identically ($P_\gamma = 0$ or 1).  The weighted analysis technique can use an analog $P_\gamma$ value instead of a hard cut, and this will magnify the sensitivity advantage of the weighted analysis technique over a binned analysis. This can be seen by observing that a background cut is equivalent to a 1-dimensional bin in the cut parameter, and the same argument which showed that the sensitivity of the weighted analysis technique is greater than or equal to that of the binned analysis applies (if the correct PSF and $P_\gamma$ distributions are used).  In effect, background rejection adds a third dimension to the analysis, and an optimal binned analysis with a background cut uses a step-like probability distribution in all three dimensions, whereas the weighted analysis uses the expected probability distributions.

152:

153: In general, the sensitivity of two analyses can only be compared using Monte Carlo simulation and an expected signal. There are a number of subtleties which have been masked by the simplicity of these examples, including the effect of fluctuations (on all parameters) in the limit of low statistics. For GRB searches, the limit of large statistics does not hold and the similarity between Gaussian weighting and the weighted analysis technique is broken. Gaussian weighting as developed by Woodhams \cite*{Woodhams} can only be used in the limit of large statistics, and the weighted analysis technique can be seen as an extension of Gaussian weighting to arbitrary PSF and the regime of Poisson statistics. Alexandreas et al.\ \cite*{CygnusTeq} performed a Monte Carlo simulation to compare the sensitivity of maximum likelihood and optimal binned analyses, and for the simple case of a single Gaussian PSF (see the first example in this section) they also observed a $\sim10\%$ improvement with maximum likelihood.  This implies that the sensitivity of the weighted analysis technique is similar to that of maximum likelihood in this limit. The weighted analysis technique is more sensitive than the binned analysis for much but not all of the possible phase space, and should approach the sensitivity of well implemented maximum likelihood searches for at least some of the phase space.

154:

155:

156:

157: \section{Summary}

158:

159: The weighted analysis technique fits a particular analysis niche. For discovery mode GRB searches with variable PSF instruments, we want an analysis which is fast and uses all available information. Binned analyses are very fast, but sacrifice sensitivity by ignoring the variable PSF typical of wide-field gamma-ray telescopes. Maximum likelihood techniques use all available information, and can give valuable information like the estimated spectrum, but are computationally slow.  The weighted analysis technique is a compromise between binned and maximum likelihood techniques: it lands somewhere in the middle on computational speed but, like advanced likelihood techniques, uses the event-by-event PSF and $P_\gamma$ information for source identification.

160:

161:

162:

163:

164: