\documentclass{elsart}

% Use the option doublespacing or reviewcopy to obtain double line spacing
% \documentclass[doublespacing]{elsart}

% if you use PostScript figures in your article
% use the graphics package for simple commands
% \usepackage{graphics}
% or use the graphicx package for more complicated commands
% \usepackage{graphicx}
% or use the epsfig package if you prefer to use the old commands
% \usepackage{epsfig}

% The amssymb package provides various useful mathematical symbols

\usepackage{amsmath,amsfonts,amssymb}
\usepackage{latexsym}

\begin{document}

\begin{frontmatter}
  \title{Self-Organising Networks for Classification: Developing Applications to
    Science Analysis for Astroparticle Physics}
  \author{A.~De Angelis,
    P.~Boinee, M.~Frailis, E.~Milotti}
  \address{Dipartimento di Fisica
    dell'Universit\`a di~Udine and INFN Trieste, via delle Scienze~208,
    I-33100~Udine, Italy}

\begin{abstract}
  Physics analysis in astroparticle experiments requires the capability of
  recognizing new phenomena. To establish what is new, it is important to
  develop tools for automatic classification that can compare the final result
  with data from different detectors. A typical example is the problem of Gamma
  Ray Burst detection, classification, and possible association with known
  sources: in the coming years, physicists will need tools to associate data
  from optical databases, from satellite experiments (EGRET, GLAST), and from
  Cherenkov telescopes (MAGIC, HESS, CANGAROO, VERITAS).
\end{abstract}

% Title, authors and addresses

% use the thanksref command within \title, \author or \address for footnotes;
% use the corauthref command within \author for corresponding author footnotes;
% use the ead command for the email address,
% and the form \ead[url] for the home page:
% \title{Title\thanksref{label1}}
% \thanks[label1]{}
% \author{Name\corauthref{cor1}\thanksref{label2}}
% \ead{email address}
% \ead[url]{home page}
% \thanks[label2]{}
% \corauth[cor1]{}
% \address{Address\thanksref{label3}}
% \thanks[label3]{}

% use optional labels to link authors explicitly to addresses:
% \author[label1,label2]{}
% \address[label1]{}
% \address[label2]{}

\begin{keyword}
SOM \sep Classification \sep Clustering \sep GRB
% keywords here, in the form: keyword \sep keyword

% PACS codes here, in the form: \PACS code \sep code
  \PACS
\end{keyword}
\end{frontmatter}

% main text
\section{Introduction}
Clustering of features is an important problem in many physics experiments. Such
an analysis task can be performed:
\begin{itemize}
\item in a supervised way, when the analyst has some examples for which the
  correct classification is known. This is possible, for example, in most
  problems related to particle physics at accelerators, where the detectors and
  the underlying physics are generally well understood and good simulations are
  available;
\item in an unsupervised way, when the events are partitioned into classes of
  similar elements without using additional information. This is the case
  especially in fields operating in a discovery regime, such as astroparticle
  physics.
\end{itemize}

The idea of automatic classification is not new in particle and astroparticle
physics. Cleaning up the signal and separating concurrent signals when
nonlinear effects and high-order correlations are important has been standard
practice in particle physics since the DELPHI analysis of the branching fraction
of the Z boson into $b\bar{b}$ pairs~\cite{del92}.

A substantial literature exists on the use of automatic classifiers in
astroparticle physics (see for example \cite{praveen} and references therein).
Supervised classification has mostly been done with Multilayer Perceptrons,
while most of the work on unsupervised classification uses Independent
Component Analysis (see for example \cite{ciccia} and references therein).
Studies based on Self-Organizing Maps and Growing Self-Organizing Networks
\cite{p2,p3,p6,p7,p8} have started only recently \cite{p16}, and a general
framework for multiwavelength classification is still missing.


\section{A case study in astroparticle physics}
Gamma-ray astroparticle physics is a relatively new science, whereas its
counterpart, optical astrophysics, is one of the oldest. Many of the objects
observed in the gamma-ray sky, which probe high-energy phenomena, have an
optical counterpart or are clearly related to optical objects. Identifying the
signature of a new phenomenon therefore requires the ability to classify
observations and to recognize what is not new.

Astrophysical databases contain large amounts of data; one example is the data
collected by the growing number of experiments studying Gamma Ray Bursts (GRBs).
Such data sets can be found in several archives (see e.g.~Ref.~\cite{p1}).

Large datasets are available from systematic sky surveys. The size of such
databases is now of the order of $10^{12}$ bytes, and in the near future it will
grow by three orders of magnitude, to about $10^{15}$ bytes, thanks to the
technological development of telescopes and detectors. Surveys cover a wide
energy range (from $10^{-7}$ to $10^{14}$~eV), and they are heterogeneous
(mission-oriented, platform- and instrument-dependent). The recorded attributes
vary from survey to survey (e.g.~polarization), and numerical simulations have
to be matched to real data.

Such complexity poses nontrivial data-management issues (see \cite{frailis});
moreover, uniform interfaces are needed to access complex data. A few projects
have started in recent years with the simple purpose of making the data readable.

\section{The project at the University of Udine}

At the University of Udine we are developing a project involving data
organization, data mining and analysis tools for the study of gamma-ray sources
(Gamma Ray Bursts in particular; most of the EGRET sources remain
unidentified).

The sources detected by GLAST~\cite{glast} and MAGIC~\cite{magic} will be
compared with existing databases to detect what is new; new objects can then be
classified with an unsupervised classifier.

Another important analysis tool is a powerful visualization package: the idea is
to present many variables together while offering control over a number of
different visual properties. For high-dimensional data sets, visual properties
such as color and size can be added to position. When using all these
properties in a single view becomes confusing, multiple linked views can be
used instead.


\subsection{Classification of GRBs}

The kernel of the analysis is the classification strategy. With the growing
number of experiments dedicated to GRBs~\cite{grb}, it is essential to optimize
the techniques for this complex task. Pattern recognition algorithms based on
Artificial Intelligence (AI) are one possible candidate: automated
classification of vector data into a given (or arbitrary) number of classes is
a well-established technique in machine learning, and several varieties of
AI-based classifiers exist~\cite{p16}.

Clustering is the unsupervised classification of patterns~\cite{p4}
(observations, data items or feature vectors) into groups called clusters.
Clustering is useful in many exploratory pattern-analysis, grouping,
decision-making and machine-learning situations, including data mining,
document retrieval, image segmentation and pattern classification.

Self-Organising Neural Networks~\cite{p2,p3,p6,p7,p8} are often used to cluster
input data. Similar patterns are grouped by the network and represented by a
single unit; this grouping is done automatically on the basis of correlations in
the data. Well-known examples of Self-Organising Artificial Neural Networks
(ANN) used for clustering include Kohonen's Self-Organising Maps, the
Self-Organising Tree Algorithm (SOTA) and Growing Cell Structures (GCS).

In our prototype we used Self-Organizing Maps (SOMs).
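
The Kohonen learning rule behind such maps can be summarized as follows. For
each input vector $\mathbf{x}$, the best-matching unit $c$ is the unit whose
weight vector $\mathbf{w}_c$ is closest to $\mathbf{x}$, and every unit $i$ is
then updated as
\[
\mathbf{w}_i \leftarrow \mathbf{w}_i + \eta(t)\, h_{ci}(t)\, (\mathbf{x} - \mathbf{w}_i),
\qquad
h_{ci}(t) = \exp\!\left(-\frac{\|\mathbf{r}_c - \mathbf{r}_i\|^2}{2\sigma^2(t)}\right),
\]
where $\mathbf{r}_i$ is the position of unit $i$ on the map grid, and both the
learning rate $\eta(t)$ and the neighbourhood width $\sigma(t)$ decrease during
training. The Python sketch below is a minimal, dependency-free illustration of
this rule, not the code of our prototype; the function names
\texttt{train\_som} and \texttt{best\_matching\_unit}, the rectangular grid, the
Gaussian neighbourhood and the linear decay schedules are illustrative
assumptions.

\begin{verbatim}
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position (row, col) of the unit closest to x."""
    return min(weights,
               key=lambda u: sum((wk - xk) ** 2 for wk, xk in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular Kohonen map on equal-length feature vectors.

    Illustrative sketch (assumed choices): Gaussian neighbourhood and linearly
    decaying learning rate and width. Returns {(row, col): weight_vector}.
    """
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every unit with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # inputs in random order
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # eta(t): linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # sigma(t), floored at 0.5
            c = best_matching_unit(weights, x)
            for (i, j), w in weights.items():
                d2 = (i - c[0]) ** 2 + (j - c[1]) ** 2   # squared grid distance
                h = math.exp(-d2 / (2.0 * sigma ** 2))   # neighbourhood h_ci(t)
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
\end{verbatim}

For example, training a $3\times3$ map on two well-separated groups of
two-dimensional points sends the two groups to disjoint sets of units. In a
real analysis each input vector would instead hold the suitably normalised
features of one event or source.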


\section{Research Perspectives}

One promising area in which the potential of self-organizing networks has not
yet been fully exploited is data mining and knowledge discovery. Clustering huge
data sets without knowing the number of clusters in advance is a task at which
such strategies should excel.
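
A minimal way to let the number of clusters emerge from the data is to grow
the network: start from a single prototype and insert new ones only while the
representation error remains too large. The Python sketch below is a
deliberately simplified, hypothetical scheme in that spirit, not Fritzke's
Growing Cell Structures: it uses plain winner-take-all competitive learning
without any neighbourhood topology, and it inserts each new prototype at the
worst-represented input. The function name \texttt{grow\_codebook}, the
mean-squared-error threshold \texttt{max\_error} and the default parameter
values are illustrative assumptions.

\begin{verbatim}
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def grow_codebook(data, max_error, max_units=20, passes=10, lr=0.1, seed=0):
    """Add prototypes until the mean squared quantisation error drops below
    max_error or max_units is reached; returns (prototypes, mean_error)."""
    rng = random.Random(seed)
    units = [list(rng.choice(data))]  # start from a single prototype
    while True:
        # Winner-take-all competitive learning: only the closest prototype moves.
        for _ in range(passes):
            for x in rng.sample(data, len(data)):
                w = min(units, key=lambda u: sq_dist(u, x))
                for k in range(len(w)):
                    w[k] += lr * (x[k] - w[k])
        # Quantisation error of each input with respect to its closest prototype.
        errors = [min(sq_dist(u, x) for u in units) for x in data]
        mean_error = sum(errors) / len(errors)
        if mean_error < max_error or len(units) >= max_units:
            return units, mean_error
        # Grow: place a new prototype on the worst-represented input.
        worst = max(range(len(data)), key=lambda i: errors[i])
        units.append(list(data[worst]))
\end{verbatim}

On three compact, well-separated groups of points this sketch stops once each
group has its own prototype; the threshold, rather than a preset number of
clusters, controls how far the codebook grows.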

Hybrid neural networks, combining different self-organizing networks, may also
result in efficient clustering.

Visualisation plays an important role in cluster analysis. Advanced
visualisation techniques~\cite{p15} such as Galaxies, Correlation Tool, OmniViz
Pro and Hypercube are valuable for analyzing clusters, and integrating them with
neural networks can provide interesting results.

GRB classification~\cite{p16} could be a case study to use as a
benchmark. Possible applications could be tested on data sets from the GRB
catalogs, for example using light curves or band-spectral parameters.
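
GRB catalogue parameters such as durations, fluences or peak fluxes typically
span several orders of magnitude, and a distance-based method such as a SOM
would otherwise be dominated by whichever feature has the largest numerical
range. One common preprocessing step, shown here as an assumption rather than
as the procedure of \cite{p16} or of our prototype, is to take the base-10
logarithm of each strictly positive feature and then rescale every column to
zero mean and unit standard deviation. The Python function
\texttt{standardise\_log\_features} is hypothetical.

\begin{verbatim}
import math


def standardise_log_features(rows):
    """Log10-transform strictly positive features and z-score each column.

    rows: list of equal-length lists of positive numbers (one row per burst).
    Returns a new list of rows; a constant column is mapped to zeros.
    """
    logged = [[math.log10(v) for v in row] for row in rows]
    n = len(logged)
    n_feat = len(logged[0])
    means = [sum(r[j] for r in logged) / n for j in range(n_feat)]
    # Population standard deviation of each log-transformed column.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in logged) / n)
            for j in range(n_feat)]
    return [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
         for j in range(n_feat)]
        for r in logged
    ]
\end{verbatim}

For the toy input rows $(0.1, 10^{-7})$, $(1, 10^{-6})$ and $(10, 10^{-5})$,
which are not catalogue values, both output columns become approximately
$(-1.22, 0, 1.22)$, so the two features enter the distance computation on an
equal footing.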

Separation of gamma rays from hadrons is another important and difficult problem
in gamma-ray experiments. This classification problem has been addressed with
supervised neural networks trained on simulated data. It is very likely that
severe adjustments will have to be made to the simulation to better reflect the
data, and the network training will then have to be redone with the improved
simulation. A further disadvantage of this approach is the ambiguity of the
network output, which requires constant refinement of the network to improve
the separation. Applying Self-Organizing Networks would be useful because the
classification could then be automatic and model-independent.

The final research perspective is a library of Science Tools for AstroParticle
Physics. Such a library should include tools for data mining, tools for
optimizing feature selection (physical characteristics that can be extracted
from different detectors, in particular GLAST, MAGIC, and X-ray detectors such
as INTEGRAL, CHANDRA and SWIFT), and a powerful visualization package.


\begin{thebibliography}{99}

\bibitem{del92} DELPHI Coll., Phys. Lett. B295 (1992) 383;\\
  L.~L\"onnblad, C.~Peterson and T.~R\"ognvaldsson, Nucl. Phys. B349 (1991) 675;\\
  C.~Bortolotto, A.~De Angelis and L.~Lanceri, Nucl. Instr. and Methods A306
  (1991) 457.

\bibitem{praveen} P.~Boinee, A.~De Angelis, E.~Milotti, ``Automatic
  Classification using Self-Organizing Neural Networks in Astrophysical
  Experiments'', in: S.~Ciprini, A.~De Angelis, P.~Lubrano and O.~Mansutti
  (eds.): Proc. of ``Science with the New Generation of High Energy Gamma-ray
  Experiments'' (Perugia, Italy, May 2003), Forum, Udine 2003, p.~177,
  arXiv:cs.NE/0307031.

\bibitem{ciccia} C.~Cecchi, F.~Marcucci, G.~Tosti, ``An application of the
  Independent Component Analysis methodology to gamma ray astrophysical
  imaging'', in: S.~Ciprini, A.~De Angelis, P.~Lubrano and O.~Mansutti (eds.):
  Proc. of ``Science with the New Generation of High Energy Gamma-ray
  Experiments'' (Perugia, Italy, May 2003), Forum, Udine 2003, p.~168,
  astro-ph/0306563.

\bibitem{p2} T.~Kohonen, ``Self-Organizing Maps'', Springer, Berlin (1995).

\bibitem{p3} B.~Fritzke, ``Growing self-organizing networks -- why?'', ESANN,
  Bruges (1996).

\bibitem{p4} A.~K.~Jain, R.~C.~Dubes, ``Algorithms for Clustering Data'',
  Prentice Hall, Englewood Cliffs, New Jersey (1988).

\bibitem{p6} B.~Fritzke, ``Kohonen feature maps and growing cell structures --
  a performance comparison'', NIPS, Denver (1992).

\bibitem{p7} B.~Fritzke, ``Unsupervised clustering with growing cell
  structures'', IJCNN, Seattle (1991).

\bibitem{p8} B.~Fritzke, ``Growing self-organizing networks -- history, status
  quo, and perspectives'', in: E.~Oja {\it et al.} (eds.): ``Kohonen Maps'',
  Proceedings of WSOM-99, Elsevier (1999).

\bibitem{p16} H.~J.~Rajaniemi, P.~M\"ah\"onen, ``Classifying Gamma-Ray Bursts
  using Self-organizing Maps'', Astrophys. J. 566 (2002) 202.

\bibitem{p1} http://www.batse.msfc.nasa.gov/batse/grb/catalog/current/

\bibitem{frailis} M.~Frailis, A.~De Angelis, V.~Roberto, ``Data Management and
  Mining in Astrophysical Databases'', in: S.~Ciprini, A.~De Angelis, P.~Lubrano
  and O.~Mansutti (eds.): Proc. of ``Science with the New Generation of High
  Energy Gamma-ray Experiments'' (Perugia, Italy, May 2003), Forum, Udine 2003,
  p.~157, arXiv:cs.DB/0307032.

\bibitem{p15} SPIRE, http://www.pnl.gov/infoviz/spire/spire.html

\bibitem{glast} http://glast.gsfc.nasa.gov/

\bibitem{magic} http://hegra1.mppmu.mpg.de/MAGICWeb/

\bibitem{grb} http://www.batse.msfc.nasa.gov/batse/grb/

\end{thebibliography}

\end{document}
