% 0402:cs0402014/cs0402014

\documentclass{elsart}

% Use the option doublespacing or reviewcopy to obtain double line spacing
% \documentclass[doublespacing]{elsart}

% if you use PostScript figures in your article
% use the graphics package for simple commands
% \usepackage{graphics}
% or use the graphicx package for more complicated commands
% \usepackage{graphicx}
% or use the epsfig package if you prefer to use the old commands
% \usepackage{epsfig}

% The amssymb package provides various useful mathematical symbols

\usepackage{amsmath,amsfonts,amssymb}
\usepackage{latexsym}

\begin{document}

\begin{frontmatter}
  \title{Self-Organising Networks for Classification: developing Applications to
    Science Analysis for Astroparticle Physics}
  \author{A.~De Angelis, P.~Boinee, M.~Frailis, E.~Milotti}
  \address{Dipartimento di Fisica
    dell'Universit\`a di~Udine and INFN Trieste, via delle Scienze~208,
    I-33100~Udine, Italy}


\begin{abstract}
  Physics analysis in astroparticle experiments requires the ability to
  recognize new phenomena. To establish what is new, it is important to
  develop tools for automatic classification that can compare the final
  result with data from different detectors. A typical example is the
  detection and classification of Gamma Ray Bursts and their possible
  association with known sources: in the coming years, physicists will need
  tools to associate data from optical databases, from satellite experiments
  (EGRET, GLAST), and from Cherenkov telescopes (MAGIC, HESS, CANGAROO,
  VERITAS).
\end{abstract}

% Title, authors and addresses

% use the thanksref command within \title, \author or \address for footnotes;
% use the corauthref command within \author for corresponding author footnotes;
% use the ead command for the email address,
% and the form \ead[url] for the home page:
% \title{Title\thanksref{label1}}
% \thanks[label1]{}
% \author{Name\corauthref{cor1}\thanksref{label2}}
% \ead{email address}
% \ead[url]{home page}
% \thanks[label2]{}
% \corauth[cor1]{}
% \address{Address\thanksref{label3}}
% \thanks[label3]{}

% use optional labels to link authors explicitly to addresses:
% \author[label1,label2]{}
% \address[label1]{}
% \address[label2]{}

\begin{keyword}
SOM \sep Classification \sep Clustering \sep GRB
% keywords here, in the form: keyword \sep keyword

% PACS codes here, in the form: \PACS code \sep code
% \PACS
\end{keyword}
\end{frontmatter}

% main text
\section{Introduction}
The classification of events according to their features is an important
problem in many physics experiments. Such an analysis can be performed:
\begin{itemize}
\item in a supervised way, when the analyst has a set of examples whose
  correct classification is known. This is possible, for example, in most
  problems of particle physics at accelerators, where the detectors and the
  underlying physics are generally well understood and good simulations are
  available;
\item in an unsupervised way, when the events are partitioned into classes of
  similar elements without using additional information. This is the case
  especially in fields operating in a discovery regime, such as astroparticle
  physics.
\end{itemize}

The idea of automatic classification is not new in particle and astroparticle
physics: using classifiers to clean up the signal and to separate competing
signals when nonlinear effects and high-order correlations are important has
been standard practice in particle physics since the DELPHI analysis of the
branching fraction of the Z boson into $b\bar{b}$ pairs \cite{del92}.

There is a substantial literature on the use of automatic classifiers in
astroparticle physics (see for example \cite{praveen} and references therein).
Such classification has mostly been performed with Multilayer Perceptrons,
while most of the work on unsupervised classification uses Independent
Component Analysis (see for example \cite{ciccia} and references therein).
Studies based on Self-Organizing Maps and Growing Self-Organizing Networks
\cite{p2,p3,p6,p7,p8} have started only recently \cite{p16}, but a general
framework for multiwavelength classification is still missing.

\section{A case study in astroparticle physics}
Gamma-ray astroparticle physics is a relatively young science, whereas its
counterpart, optical astrophysics, is one of the oldest. Many of the objects
observed in the gamma-ray sky, which are sensitive to high-energy physics
phenomena, have an optical counterpart or are clearly related to optical
objects. Identifying the signature of a new phenomenon therefore requires the
ability to classify observations and to recognize what is not new.

Astrophysical databases contain large amounts of data; one example comes from
the growing number of experiments studying Gamma Ray Bursts (GRBs), whose data
sets are available in several archives (see e.g.~Ref.~\cite{p1}).

Large data sets are also available from systematic sky surveys. The size of
such databases is currently of the order of $10^{12}$~bytes, and in the near
future it is expected to grow by three orders of magnitude, to about
$10^{15}$~bytes, thanks to the technological development of telescopes and
detectors. Surveys cover a wide energy range (from $10^{-7}$ to $10^{14}$~eV,
i.e.\ 21 orders of magnitude), and they are heterogeneous (mission-oriented,
platform- and instrument-dependent). The recorded attributes vary from survey
to survey (polarization, etc.), and numerical simulations have to be matched
to real data.

This complexity raises nontrivial data management issues (see
\cite{frailis}); moreover, uniform interfaces are needed to access such
complex data. A few projects have started in recent years with the simple
purpose of making the data readable.

\section{The project at the University of Udine}

At the University of Udine we are developing a project that involves data
organization, data mining and analysis tools for the study of gamma-ray
sources, in particular Gamma Ray Bursts. Such tools are much needed: most of
the sources detected by EGRET were unidentified.

The sources detected by GLAST \cite{glast} and MAGIC \cite{magic} will be
compared with existing databases to find out what is new; the new objects can
then be classified with an unsupervised classifier.

Another important analysis tool is a powerful visualization package, which
should present many variables at once while giving the user control over
several visual properties. To display high-dimensional data sets, visual
properties such as color and size can be added to position. When encoding all
the variables in a single view becomes confusing, several separate views can
be used instead, linked together so that a selection made in one view is
highlighted in the others.

\subsection{Classification of GRBs}

The core of the analysis is the classification strategy. With the growing
number of experiments dedicated to GRBs \cite{grb}, it is essential to optimize
the techniques for this complex task. Pattern recognition algorithms based on
Artificial Intelligence (AI) are a natural candidate: the automated
classification of vector data into a given (or an arbitrary) number of classes
is a well-established technique in machine learning, and several varieties of
AI-based classifiers exist~\cite{p16}.

Clustering is the unsupervised classification of patterns (observations, data
items or feature vectors) into groups called clusters~\cite{p4}. It is useful
in many exploratory pattern-analysis, grouping, decision-making and
machine-learning tasks, including data mining, document retrieval, image
segmentation and pattern classification.

Self-Organising Neural Networks~\cite{p2,p3,p6,p7,p8} are often used to
cluster input data: the network groups similar patterns and represents each
group by a single unit, and this grouping is done automatically on the basis
of the correlations in the data. Well-known examples of self-organising
Artificial Neural Networks (ANNs) used for clustering include Kohonen's
Self-Organising Maps, the Self-Organising Tree Algorithm (SOTA) and Growing
Cell Structures (GCS).

In our prototype, Self-Organizing Maps (SOMs) were used.
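
The grouping performed by a SOM can be illustrated with a minimal sketch of
the standard online Kohonen algorithm, written here in plain Python using only
the standard library. It is not the code of our prototype: the rectangular
grid, the Gaussian neighbourhood function, the linearly decaying learning rate
and radius, the synthetic test data, and the names \texttt{train\_som} and
\texttt{best\_matching\_unit} are illustrative assumptions.
\begin{verbatim}
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the unit whose weights are closest to x."""
    return min(weights,
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Kohonen map online; return {(i, j): weight vector}."""
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every unit with a randomly chosen input vector.
    weights = {(i, j): list(rng.choice(data))
               for i in range(rows) for j in range(cols)}
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bi, bj = best_matching_unit(weights, x)
            for (i, j), w in weights.items():
                d2 = (i - bi) ** 2 + (j - bj) ** 2      # squared distance on the grid
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])      # move unit towards the input
            step += 1
    return weights


# Usage: two synthetic, well-separated groups of 2-D feature vectors.
gen = random.Random(1)
group_a = [(gen.gauss(0.0, 0.3), gen.gauss(0.0, 0.3)) for _ in range(20)]
group_b = [(gen.gauss(5.0, 0.3), gen.gauss(5.0, 0.3)) for _ in range(20)]
som = train_som(group_a + group_b, rows=3, cols=3)
units_a = {best_matching_unit(som, x) for x in group_a}
units_b = {best_matching_unit(som, x) for x in group_b}
print(sorted(units_a), sorted(units_b))
\end{verbatim}
After training, each input vector is assigned to its best-matching unit, so
that the populated units act as cluster prototypes; note that a single
physical cluster may occupy several neighbouring units of the map.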

\section{Research Perspectives}

Data mining and knowledge discovery are a promising area in which the potential
of self-organizing networks has not yet been fully exploited. Clustering huge
data sets without knowing the number of clusters in advance is a task at which
such networks should excel.

Hybrid neural networks, combining different self-organizing networks, may also
lead to more efficient clustering.

Visualisation plays an important role in cluster analysis: advanced
visualisation techniques~\cite{p15} such as Galaxies, Correlation Tool,
OmniViz Pro and Hypercube help in analyzing clusters, and integrating them
with neural networks could provide interesting results.

GRB classification~\cite{p16} could serve as a benchmark case study: possible
applications could be tested on data sets from the GRB catalogs, using for
example light curves or band-spectral parameters as features.
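
If the band-spectral parameters mentioned above are taken to be those of the
Band function (an assumption on our part; catalogs may also provide other
spectral fits), each burst is described by the low- and high-energy photon
indices $\alpha$ and $\beta$ and the break energy $E_0$, with photon spectrum
\begin{equation*}
N(E) =
\begin{cases}
A\,(E/100\,\mathrm{keV})^{\alpha}\,e^{-E/E_0}, & E < (\alpha-\beta)E_0,\\
A\left[\frac{(\alpha-\beta)E_0}{100\,\mathrm{keV}}\right]^{\alpha-\beta}
e^{\beta-\alpha}\,(E/100\,\mathrm{keV})^{\beta}, & E \ge (\alpha-\beta)E_0,
\end{cases}
\end{equation*}
and peak energy $E_{\rm peak}=(2+\alpha)E_0$ of the $\nu F_\nu$ spectrum. The
following plain-Python sketch evaluates this spectrum and standardizes the
feature vectors before clustering, so that no parameter dominates the
Euclidean distance used by a self-organizing network. The function names
\texttt{band\_spectrum}, \texttt{peak\_energy} and \texttt{standardize}, the
choice of $(\alpha,\beta,\log_{10}E_{\rm peak})$ as features, and the fit
values in the usage lines are all hypothetical.
\begin{verbatim}
import math
from statistics import mean, pstdev

E_PIVOT_KEV = 100.0  # conventional pivot energy of the Band function


def band_spectrum(e_kev, amp, alpha, beta, e0_kev):
    """Band-function photon spectrum N(E); assumes alpha > beta, energies in keV."""
    e_break = (alpha - beta) * e0_kev
    if e_kev < e_break:
        return amp * (e_kev / E_PIVOT_KEV) ** alpha * math.exp(-e_kev / e0_kev)
    return (amp * (e_break / E_PIVOT_KEV) ** (alpha - beta)
            * math.exp(beta - alpha) * (e_kev / E_PIVOT_KEV) ** beta)


def peak_energy(alpha, e0_kev):
    """Peak of the nu F_nu spectrum, E_peak = (2 + alpha) E0 (needs alpha > -2)."""
    return (2.0 + alpha) * e0_kev


def standardize(rows):
    """Rescale each feature (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


# Usage with made-up fit results (alpha, beta, E0 in keV), NOT catalog data.
fits = [(-1.0, -2.3, 250.0), (-0.8, -2.5, 400.0), (-1.5, -3.0, 90.0)]
features = [(a, b, math.log10(peak_energy(a, e0))) for a, b, e0 in fits]
print(standardize(features))
\end{verbatim}
The two branches of $N(E)$ join continuously at $E=(\alpha-\beta)E_0$, which
is a convenient sanity check for any implementation of the fit function.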

The separation of gamma rays from hadrons is another important and difficult
problem in gamma-ray experiments. This classification problem has been
addressed with supervised neural networks trained on simulated data. It is
very likely that the simulation will need substantial adjustments to better
reflect the real data, after which the network has to be retrained on the
improved simulation. The drawbacks of this approach are the ambiguity of the
network output and the need to refine the network continually to improve the
separation. Applying Self-Organizing Networks would be useful here, since the
classification could then be automatic and model-independent.

The final research perspective is a library of Science Tools for AstroParticle
Physics. Such a library should include tools for data mining, tools for
optimizing feature selection (i.e., choosing among the physical
characteristics that can be extracted from different detectors, in particular
GLAST, MAGIC, and X-ray and soft gamma-ray detectors such as INTEGRAL, CHANDRA
and SWIFT), and a powerful visualization package.

\begin{thebibliography}{99}

\bibitem{del92} DELPHI Coll., Phys. Lett. B295 (1992) 383;\\
  L.~L\"onnblad, C.~Peterson and T.~R\"ognvaldsson, Nucl. Phys. B349 (1991)
  675;\\
  C.~Bortolotto, A.~De Angelis and L.~Lanceri, Nucl. Instr. and Meth. A306
  (1991) 457.

\bibitem{praveen} P.~Boinee, A.~De Angelis, E.~Milotti, ``Automatic
  Classification using Self-Organizing Neural Networks in Astrophysical
  Experiments'', in: S.~Ciprini, A.~De Angelis, P.~Lubrano, O.~Mansutti
  (eds.), Proc. of ``Science with the New Generation of High Energy Gamma-ray
  Experiments'' (Perugia, Italy, May 2003), Forum, Udine (2003), p.~177,
  arXiv:cs.NE/0307031.

\bibitem{ciccia} C.~Cecchi, F.~Marcucci, G.~Tosti, ``An application of the
  Independent Component Analysis methodology to gamma ray astrophysical
  imaging'', in: S.~Ciprini, A.~De Angelis, P.~Lubrano, O.~Mansutti (eds.),
  Proc. of ``Science with the New Generation of High Energy Gamma-ray
  Experiments'' (Perugia, Italy, May 2003), Forum, Udine (2003), p.~168,
  arXiv:astro-ph/0306563.

\bibitem{p2} T.~Kohonen, ``Self-Organizing Maps'', Springer, Berlin (1995).

\bibitem{p3} B.~Fritzke, ``Growing self-organizing networks -- why?'', ESANN,
  Bruges (1996).

\bibitem{p4} A.~K.~Jain, R.~C.~Dubes, ``Algorithms for Clustering Data'',
  Prentice Hall, Englewood Cliffs, New Jersey (1988).

\bibitem{p6} B.~Fritzke, ``Kohonen feature maps and growing cell structures --
  a performance comparison'', NIPS, Denver (1992).

\bibitem{p7} B.~Fritzke, ``Unsupervised clustering with growing cell
  structures'', IJCNN, Seattle (1991).

\bibitem{p8} B.~Fritzke, ``Growing self-organizing networks -- history, status
  quo, and perspectives'', in: E.~Oja {\it et al.} (eds.), ``Kohonen Maps'',
  Proc. of WSOM-99, Elsevier (1999).

\bibitem{p16} H.~J.~Rajaniemi, P.~M\"ah\"onen, ``Classifying gamma-ray bursts
  using self-organizing maps'', Astrophys. J. 566 (2002) 202.

\bibitem{p1} http://www.batse.msfc.nasa.gov/batse/grb/catalog/current/

\bibitem{frailis} M.~Frailis, A.~De Angelis, V.~Roberto, ``Data Management and
  Mining in Astrophysical Databases'', in: S.~Ciprini, A.~De Angelis,
  P.~Lubrano, O.~Mansutti (eds.), Proc. of ``Science with the New Generation
  of High Energy Gamma-ray Experiments'' (Perugia, Italy, May 2003), p.~157,
  arXiv:cs.DB/0307032.

\bibitem{p15} SPIRE, http://www.pnl.gov/infoviz/spire/spire.html

\bibitem{glast} http://glast.gsfc.nasa.gov/

\bibitem{magic} http://hegra1.mppmu.mpg.de/MAGICWeb/

\bibitem{grb} http://www.batse.msfc.nasa.gov/batse/grb/

\end{thebibliography}

\end{document}