1: \documentclass{elsart}
2:
3: % Use the option doublespacing or reviewcopy to obtain double line spacing
4: % \documentclass[doublespacing]{elsart}
5:
6: % if you use PostScript figures in your article
7: % use the graphics package for simple commands
8: % \usepackage{graphics}
9: % or use the graphicx package for more complicated commands
10: % \usepackage{graphicx}
11: % or use the epsfig package if you prefer to use the old commands
12: % \usepackage{epsfig}
13:
14: % The amssymb package provides various useful mathematical symbols
15:
16: \usepackage{amsmath,amsfonts,amssymb}
17: \usepackage{latexsym}
18:
19: \begin{document}
20:
21: \begin{frontmatter}
22: \title{Self-Organising Networks for Classification: developing Applications to
23: Science Analysis for Astroparticle Physics}
\author{A.~De Angelis,
P.~Boinee, M.~Frailis, E.~Milotti}
26: \address{Dipartimento di Fisica
27: dell'Universit\`a di~Udine and INFN Trieste, via delle Scienze~208,
28: I-33100~Udine, Italy}
29:
30:
31: \begin{abstract}
32: Physics analysis in astroparticle experiments requires the capability of
33: recognizing new phenomena; in order to establish what is new, it is important
34: to develop tools for automatic classification, able to compare the final
result with data from different detectors. A typical example is the problem
of Gamma Ray Burst detection, classification, and possible association with
known sources: for this task, in the coming years physicists will need tools to
associate data from optical databases, from satellite experiments (EGRET,
GLAST), and from Cherenkov telescopes (MAGIC, HESS, CANGAROO, VERITAS).
40: \end{abstract}
41:
42: % Title, authors and addresses
43:
44: % use the thanksref command within \title, \author or \address for footnotes;
45: % use the corauthref command within \author for corresponding author footnotes;
46: % use the ead command for the email address,
47: % and the form \ead[url] for the home page:
48: % \title{Title\thanksref{label1}}
49: % \thanks[label1]{}
50: % \author{Name\corauthref{cor1}\thanksref{label2}}
51: % \ead{email address}
52: % \ead[url]{home page}
53: % \thanks[label2]{}
54: % \corauth[cor1]{}
55: % \address{Address\thanksref{label3}}
56: % \thanks[label3]{}
57:
58: % use optional labels to link authors explicitly to addresses:
59: % \author[label1,label2]{}
60: % \address[label1]{}
61: % \address[label2]{}
62:
63: \begin{keyword}
64: SOM \sep Classification \sep Clustering \sep GRB
65: % keywords here, in the form: keyword \sep keyword
66:
67: % PACS codes here, in the form: \PACS code \sep code
68: \PACS
69: \end{keyword}
70: \end{frontmatter}
71:
72: % main text
73: \section{Introduction}
Clustering of features is an important problem in many physics experiments. Such
an analysis task can be performed:
\begin{itemize}
\item in a supervised way, when the analyst has some examples for which the
correct classification is known. This is possible, for example, in most
problems related to particle physics at accelerators, where detectors and the
underlying physics are generally well understood and good simulations are
available;
\item in an unsupervised way, when the events are partitioned into classes of
similar elements without using additional information. This is the case
especially for fields operating in a discovery regime, such as astroparticle
physics.
\end{itemize}
87:
88: The idea of automatic classification is not new in particle and astroparticle
physics. Cleaning up the signal and separating concurrent signals when
nonlinear effects and high-order correlations are important has been standard
practice in particle physics since the analysis of the branching fraction of
the Z boson into $b\bar{b}$ pairs by DELPHI \cite{del92}.
93:
A substantial literature exists on the use of automatic classifiers in
astroparticle physics (see for example \cite{praveen} and references therein).
Such classification has mostly been done with Multilayer Perceptrons,
while the bulk of the work based on unsupervised classification uses
Independent Component Analysis (see for example \cite{ciccia} and references
therein). Studies based on Self-Organizing Maps and Growing Self-Organizing
Networks \cite{p2,p3,p6,p7,p8} have recently started \cite{p16}, but a general
framework for multiwavelength classification is still missing.
102:
103:
104: \section{A case study in astroparticle physics}
105: Gamma-ray astroparticle physics is a relatively new science; it has as a
106: counterpart optical astrophysics, one of the oldest sciences. Many of the
107: objects we observe in the gamma sky, sensitive to the phenomena of high-energy
108: physics, have an optical counterpart or clear relations to optical objects.
109: Finding what is a signature of a new phenomenon requires the ability to classify
110: observations, and the ability to recognize what is not new.
111:
112: Astrophysical databases contain large amounts of data; one example is given by
113: the growing number of experiments studying Gamma Ray Bursts (GRBs). Data sets
can be found in several archives (see e.g.\ Ref.~\cite{p1}).
115:
Large datasets are available from systematic sky surveys. The size of such
databases is now of the order of $10^{12}$ bytes, but in the near future it will
grow by three orders of magnitude, to about $10^{15}$ bytes, thanks to the
technological development of telescopes and detectors. Surveys cover a wide
energy range (from $10^{-7}$ to $10^{14}$~eV), and they are heterogeneous
(mission-oriented, platform and instrument dependent). The attributes recorded
vary from survey to survey (polarization etc.); numerical simulations have to
be matched to real data.
123:
Such complexity poses nontrivial data management issues (see \cite{frailis});
moreover, we need uniform interfaces to access complex data. A few projects
have started in recent years with the simple purpose of making the data readable.
127:
128: \section{The project at the University of Udine}
129:
130: At the University of Udine we are developing a project involving data
131: organization, data mining and analysis tools for the analysis of gamma sources
132: (Gamma Ray Bursts in particular: most of the EGRET sources were unidentified).
133:
134: The sources detected by GLAST \cite{glast} and MAGIC \cite{magic} will be
compared with existing databases to detect what is new. What is new can then be
classified with an unsupervised classifier.
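
As an illustration of such a comparison, the following minimal sketch flags
detected sources that have no counterpart in a reference catalogue within a
fixed angular radius. It is not the procedure used in the project: the
positional cross-match itself, the function names
\texttt{angular\_separation\_deg} and \texttt{unmatched\_sources}, the
\texttt{ra}/\texttt{dec} field names, and the matching radius are all
assumptions made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All four inputs are sky coordinates in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def unmatched_sources(detections, catalogue, radius_deg):
    """Return the detections with no catalogue entry within radius_deg.

    Each source is a dict with hypothetical keys "ra" and "dec" (degrees).
    """
    return [
        d for d in detections
        if all(
            angular_separation_deg(d["ra"], d["dec"], c["ra"], c["dec"]) > radius_deg
            for c in catalogue
        )
    ]


if __name__ == "__main__":
    # Purely synthetic positions, for illustration only.
    catalogue = [{"ra": 10.0, "dec": 20.0}]
    detections = [{"ra": 10.05, "dec": 20.0}, {"ra": 200.0, "dec": -30.0}]
    print(unmatched_sources(detections, catalogue, radius_deg=0.5))
```

In practice the radius would have to reflect the positional uncertainty of
each detector, and for large catalogues the brute-force loop would be replaced
by a spatial index.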
137:
Another important analysis tool is a powerful visualization package: the idea is
to present many variables together visually, offering a degree of control over a
number of different visual properties. Since the data sets are
high-dimensional, visual properties such as color and size can be added to
position in order to display more variables at once. When a single view
combining these properties becomes difficult to read, multiple linked views can
be used instead.
144:
145:
146: \subsection{Classification of GRBs}
147:
The kernel of the analysis is the strategy for the classification. With the
growing number of experiments dedicated to GRBs \cite{grb}, it is essential to
optimize the techniques for the complex task of classification. Artificial
Intelligence (AI) based pattern recognition algorithms are one possible
candidate: automated linear classification of vector data into a given number
(or an arbitrary number) of classes is a well-established technique in the field
of machine learning. Several varieties of AI-based classifiers
exist~\cite{p16}.
156:
157: Clustering is the unsupervised classification of patterns~\cite{p4}
158: (observations, data items or feature vectors) into groups called clusters.
159: Clustering is useful in several exploratory pattern analysis, grouping, decision
160: making and machine learning situations including data mining, document
161: retrieval, image segmentation and pattern classification.
162:
163: Self-Organising Neural Networks~\cite{p2,p3,p6,p7,p8} are often used to cluster
164: input data. Similar patterns are grouped by the network and are represented by a
165: single unit. This grouping is done automatically on the basis of data
correlations. Well-known examples of Self-Organising Artificial Neural Networks
(ANN) used for clustering include Kohonen's Self-Organising Maps, the
Self-Organising Tree Algorithm (SOTA), and Growing Cell Structures (GCS).
169:
170: In our prototype, Self-Organizing Maps (SOM) were used.
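
For reference, the standard SOM training step can be summarised as follows. For
each input vector $x$, the best-matching unit $c$ is found and every unit $i$
is moved towards $x$:
\begin{equation*}
c = \operatorname*{arg\,min}_i \lVert x - w_i \rVert, \qquad
w_i \leftarrow w_i + \alpha(t)\, h_{ci}(t)\, (x - w_i), \qquad
h_{ci}(t) = \exp\!\left(-\frac{\lVert r_c - r_i \rVert^2}{2\sigma(t)^2}\right),
\end{equation*}
where $w_i$ is the weight vector of unit $i$, $r_i$ its position on the map
grid, $\alpha(t)$ a decreasing learning rate and $\sigma(t)$ a shrinking
neighbourhood width. The Python sketch below implements this rule on a
rectangular grid. It is a generic illustration and not the code of the
prototype: the function names, the exponential decay schedules and all default
parameter values are assumptions.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda unit: squared_distance(weights[unit], x))


def train_som(data, rows, cols, epochs=20, lr_start=0.5, lr_end=0.01,
              sigma_end=0.5, seed=0):
    """Train a rows x cols SOM and return {(row, col): weight vector}.

    All defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    sigma_start = max(rows, cols) / 2.0
    # Initialise every unit with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            # Exponential decay of learning rate and neighbourhood width.
            lr = lr_start * (lr_end / lr_start) ** frac
            sigma = sigma_start * (sigma_end / sigma_start) ** frac
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2D points (purely synthetic).
    demo_rng = random.Random(1)
    group_a = [[demo_rng.gauss(0, 0.3), demo_rng.gauss(0, 0.3)] for _ in range(20)]
    group_b = [[demo_rng.gauss(10, 0.3), demo_rng.gauss(10, 0.3)] for _ in range(20)]
    som = train_som(group_a + group_b, rows=1, cols=2)
    print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [10.0, 10.0]))
```

After training, each input vector is assigned to its best-matching unit, so
that inputs mapped to the same unit (or to neighbouring units) form a cluster.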
171:
172:
\section{Research Perspectives}
174:
175: One promising area where the potential of self-organizing networks has not been
fully exploited is certainly data mining and knowledge discovery. Clustering
huge data sets without knowing the number of clusters in advance is something
such a strategy should excel at.
179:
Building hybrid neural networks (combining various self-organizing networks) can
result in efficient clustering.
182:
Visualisation has an important role in cluster analysis. Advanced visualisation
techniques~\cite{p15} such as Galaxies, Correlation Tool, OmniViz Pro, and
Hypercube help in analyzing clusters. Integrating these
techniques with neural networks can provide interesting results.
187:
GRB classification~\cite{p16} could be a case study to use as a
benchmark. Possible applications could be tested on data sets from the GRB
catalogs, for example using light curves or band-spectral parameters.
191:
Separation of gammas from hadrons is another important and difficult problem in
gamma-ray experiments. The classification problem has been addressed with
supervised neural networks, where the separation is based on the study of
simulated data. It is very likely that severe adjustments have to be
made to the simulation to better reflect the data, and the network training has
to be redone with the improved simulation. The disadvantages of this approach
are the ambiguity of the output and the need to refine the network constantly
to improve the separation. Applying Self-Organizing Networks would be useful, as
the classification could be automatic and model-independent.
201:
The final research perspective is a library of Science Tools for AstroParticle
Physics. Such a library should include tools for data mining, tools for
optimizing feature selection (physical characteristics which can
be extracted from different detectors, in particular GLAST, MAGIC, and X-ray
detectors like INTEGRAL, CHANDRA, SWIFT), and a powerful visualization package.
207:
208:
209: \begin{thebibliography}{99}
210:
211:
\bibitem{del92} DELPHI Coll., Phys. Lett. B295 (1992) 383;\\
L. L\"onnblad, C. Peterson and T. R\"ognvaldsson, Nucl. Phys. B349 (1991) 675;\\
C. Bortolotto, A. De Angelis and L. Lanceri, Nucl. Instr. and Methods A306
(1991) 457.
216:
\bibitem{praveen} P. Boinee, A. De Angelis, E. Milotti, ``Automatic
Classification using Self-Organizing Neural Networks in Astrophysical
Experiments'', in: S. Ciprini, A. De Angelis, P. Lubrano and O. Mansutti
(eds.): Proc. of ``Science with the New Generation of High Energy Gamma-ray
Experiments'' (Perugia, Italy, May 2003). Forum, Udine 2003, p.~177,
arXiv:cs.NE/0307031.
223:
\bibitem{ciccia} C. Cecchi, F. Marcucci, G. Tosti, ``An application of the
Independent Component Analysis methodology to gamma ray astrophysical
imaging'', in: S. Ciprini, A. De Angelis, P. Lubrano and O. Mansutti (eds.):
Proc. of ``Science with the New Generation of High Energy Gamma-ray
Experiments'' (Perugia, Italy, May 2003). Forum, Udine 2003, p.~168,
arXiv:astro-ph/0306563.
230:
231: \bibitem{p2} T.~Kohonen, ``Self-Organizing Maps'', Springer, Berlin (1995).
232:
233: \bibitem{p3} B.~Fritzke, ``Growing self-organizing networks - why?'', ESANN,
234: Bruges (1996).
235:
\bibitem{p4} A.~K.~Jain, R.~C.~Dubes, ``Algorithms for Clustering Data'',
Prentice Hall, Englewood Cliffs, New Jersey (1988).
238: \bibitem{p6} B.~Fritzke, ``Kohonen feature maps and growing cell structures - a
239: performance comparison'', NIPS, Denver (1992).
240:
241: \bibitem{p7} B.~Fritzke, ``Unsupervised clustering with growing cell
242: structures'', IJCNN, Seattle (1991).
243:
244: \bibitem{p8} B.~Fritzke, ``Growing self-organizing networks - history, status
245: quo, and perspectives'', in ``Kohonen Maps'', Proceedings of WSOM-99, eds. E.
246: Oja {\it et al.}, Elsevier (1999).
247:
\bibitem{p16} H.~J.~Rajaniemi, P.~M\"ah\"onen, ``Classifying GRB using SOM'',
Astrophys. J. 566 (2002) 202.
250:
251: \bibitem{p1} http://www.batse.msfc.nasa.gov/batse/grb/catalog/current/
252:
\bibitem{frailis} M. Frailis, A. De Angelis, V. Roberto, ``Data Management and
Mining in Astrophysical Databases'', in: S. Ciprini, A. De Angelis, P. Lubrano
and O. Mansutti (eds.): Proc. of ``Science with the New Generation of High
Energy Gamma-ray Experiments'' (Perugia, Italy, May 2003). Forum, Udine 2003,
p.~157, arXiv:cs.DB/0307032.
258:
259: \bibitem{p15} SPIRE, http://www.pnl.gov/infoviz/spire/spire.html
260:
261: \bibitem{glast} http://glast.gsfc.nasa.gov/
262:
263: \bibitem{magic} http://hegra1.mppmu.mpg.de/MAGICWeb/
264:
265: \bibitem{grb} http://www.batse.msfc.nasa.gov/batse/grb/
266:
267: \end{thebibliography}
268:
269: \end{document}
270:
271: