1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: %%                                                                       %%
3: %%                                                                       %%
4: %%     Plain-TeX Template for Camera-Ready Manuscript Preparation        %%
5: %%                                                                       %%
6: %%              CMT28 (St. Louis) Workshop Proceedings                   %%
7: %%                                                                       %%
8: %%                Vol. 20, Condensed Matter Theories                     %%
9: %%                                                                       %%
10: %%                     Nova Science Publishers                           %% 
11: %%                                                                       %%
12: %%                                                                       %% 
13: %%               (Prepared by J. W. Clark, October 2002)                 %%
14: %%                                                                       %%
15: %%                                                                       %%
16: %%                                                                       %%
17: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
109: %%
110: %%
111: %%The following three lines specify type size, dimensions of text (23.5cm 
112: %%by 15.5cm), and line spacing.  The spacing is slightly looser than
113: %%single space (which would be \baselineskip = 12 truept) to allow room 
114: %%for embedded symbols with superscripts and subscripts.  
115: \magnification=\magstep1 
116: \font\bigbfont=cmbx10 scaled\magstep1
117: \font\bigifont=cmti10 scaled\magstep1
118: \font\bigrfont=cmr10 scaled\magstep1
119: \vsize = 23.5 truecm
120: \hsize = 15.5 truecm
121: \hoffset = .2truein
122: \baselineskip = 14 truept
123: \overfullrule = 0pt
124: \parskip = 3 truept
125: \def\frac#1#2{{#1\over#2}}
126: \def\diff{{\rm d}}
127: \def\bfk{{\bf k}}
128: \def\eps{\epsilon}
129: \nopagenumbers
143: \topinsert
144: \vskip 3.2 truecm
145: \endinsert
146: \centerline{\bigbfont MODELING NUCLEAR PROPERTIES} 
148: \vskip 8 truept
149: \centerline{\bigbfont WITH SUPPORT VECTOR MACHINES}
152: \vskip 20 truept
154: \centerline{\bigifont H. Li and J. W. Clark}
155: \vskip 8truept
156: \centerline{\bigrfont McDonnell Center for the Space Sciences 
157: and Department of Physics}
158: \vskip 2 truept
159: \centerline{\bigrfont Washington University, St. Louis, Missouri 63130, USA} 
162: \vskip 11 truept
163: \centerline{\bigifont E. Mavrommatis and S.~Athanassopoulos}
164: \vskip 8 truept
165: \centerline{\bigrfont Physics Department, Division of Nuclear \&
166: Particle Physics}
167: \vskip 2 truept
168: \centerline{\bigrfont University of Athens, GR-15771 Athens, Greece}
169: \vskip 11 truept
170: \centerline{\bigifont K. A. Gernoth}
171: \vskip 8 truept
172: \centerline{\bigrfont Department of Physics, University of Manchester}
173: \vskip 2 truept
174: \centerline{\bigrfont Manchester M13 9PL, United Kingdom}
175: 
176: \vskip 1.8 truecm
177: 
178: \centerline{\bf 1.  INTRODUCTION}
179: \vskip 12 truept
180: Artificial neural networks and other machine-learning strategies can provide 
181: a valuable complement to theory-driven models of the systematics of nuclear data.
182: A significant effort to exploit the potential of data-driven methodologies
183: receives strong motivation from the current thrust toward experimental and
184: theoretical exploration of nuclei far from stability.  It is made possible
by the availability of a growing body of excellent experimental data on 
186: nuclear species numbering in the thousands.  In outline, statistical models 
187: based on supervised learning are developed as follows.  Suppose, for
188: example, we wish to predict the atomic mass $M$ of a nuclear species, or
189: nuclide, specifying only its mass number $A$ and atomic number $Z$,
190: or alternatively its proton and neutron numbers $(Z,N)$.  A learning
191: machine has an input interface where $(Z,N)$ are fed to the device in 
192: coded form and an output interface where an estimate of the
193: mass appears for decoding.  In between there is a system or network of 
194: interconnected elements that acts to process the incoming information and 
195: produce an appropriate output.  These processing elements may resemble biological 
196: neurons, receiving signals from other units through weighted connections,
197: and displaying nonlinear response to summed input signals.  Given
198: a body of training data to be used as examples of the desired
199: mapping, in this case $(Z,N) \to M$, a suitable learning algorithm
200: is used to adjust the parameters of the network, e.g., the weights of 
201: the connections between the processing elements, so that the learning
202: machine (i) generates responses at the output interface that reproduce,
203: or closely fit, the atomic masses of the training nuclei, and (ii) serves
204: as a reliable predictor of the masses of test nuclei absent from the
205: training set.  This second requirement is a strong one -- the system
206: should not merely serve as a lookup table for masses of known nuclei;
207: it should also perform well in the much more difficult task of 
208: generalization.
209: 
210: The last two decades have seen much activity and considerable progress
211: in the development and application of supervised learning machines
212: of the type described -- which are designed to learn by example.
213: The most popular implementation is the multilayer feedforward
214: neural network (or multilayer perceptron), taught by the backpropagation
215: learning algorithm in one or another of its many variations [1-3].
216: A significant measure of success has been achieved in constructing
217: global models of nuclear properties based on such neural networks,
218: with applications to atomic masses, neutron separation 
219: energies, spins and parities of nuclear ground states, stability versus 
220: instability, branching ratios for different decay modes, and 
221: beta-decay lifetimes.  (For reviews, see Ref.~[4], and for recent
222: results on atomic-mass prediction, see Ref.~[5].)  
223: 
224: The support vector machine (SVM) [6-8], a principled and powerful approach
225: to problems in classification and nonlinear regression, came on the
226: scene in the 1990s.  It has become a standard tool in statistical
227: modeling, and for many problems it is considered the method of 
228: choice.  We have begun to explore the promise of SVMs for modeling
229: and prediction of nuclear properties.  The first results of this
230: effort are reported here.  
231: 
232: Section 2 provides an introduction to support vector machines and the
233: ANOVA decomposition that facilitates their effective implementation.
234: Section 3 summarizes the results obtained for the atomic-mass problem,
235: and compares the predictive performance of the SVM models with
236: that of multilayer backpropagation networks and state-of-the-art
237: ``theory-thick'' models.  Additional results and comparisons for 
238: beta-decay halflives and for ground-state spins and parities are 
239: presented in Secs.~4 and 5, respectively.  Concluding remarks 
240: are made in Sec.~6.
241: \vskip 28 truept
242: 
243: \centerline{\bf 2.  SUPPORT VECTOR MACHINE AND ANOVA DECOMPOSITION}
244: \vskip 12 truept
245: 
246: The support vector machine (SVM), pioneered by Vapnik [6-8], may be viewed 
247: as an approximate realization of the goal of structural risk minimization
[9,3].  Let $({\bf x}_1,y_1),\ldots,({\bf x}_P,y_P)$ be a set of training 
249: data drawn from a function $y=f({\bf x})$.  Here, ${\bf x}$ is the input 
250: variable, a vector of dimension $n$, while $y$ is the output variable, 
251: a unique real number for given ${\bf x}$.  (In the example considered
252: in Sec.~1, ${\bf x}$ is a vector formed from the two components $Z$ and
253: $N$, while $y$ is the mass $M$.)   The support vector machine is based 
254: on a suitable nonlinear mapping ${\bf x} \to \varphi({\bf x})$ from the 
255: input space to a feature space of higher dimension $m > n$.  
256: 
257: Applied to the task of regression, the SVM
258: learning strategy begins by posing an approximation $\hat y$ to the output $y$ 
259: as a linear combination of certain basis functions $\varphi_i({\bf x})$
260: in the feature space, with corresponding linear weights connecting 
261: the feature space to the output space.
262: Thus, 
263: $$
264: {\hat y} = {\hat f}({\bf x},{\bf w}) = \sum_{j=1}^m w_j \varphi_j({\bf x})\,, 
265: \eqno(1)
266: $$
267: where ${\bf w}$ is an $m$-dimensional vector composed of weights
268: $w_j$, $j=1,\ldots,m$.  (A bias term $b$ may be included in Eq.~(1)
269: by starting the sum at $j=0$ and introducing $w_0 \equiv b$ and
270: $\varphi_0({\bf x}) \equiv 1$.) To determine the image vectors $\varphi_j({\bf x})$ 
271: and their weights $w_j$, consider an $\epsilon$-insensitive loss function 
272: defined, for input ${\bf x}$, by 
$$
\left| y - {\hat f}({\bf x},{\bf w}) \right|_\epsilon =
\cases{ 0 \,, & if $| y - {\hat f}({\bf x},{\bf w}) | < \epsilon$, \cr
| y - {\hat f}({\bf x},{\bf w}) | - \epsilon \,, & otherwise. \cr}
$$
Thus the loss vanishes when the magnitude of the error $y - {\hat f}$
lies within a tolerance $\epsilon$, and otherwise equals the amount by
which that magnitude exceeds $\epsilon$.
281: The tolerance parameter $\epsilon$ is at the disposal of the machine's 
282: user.  The {\it primal} optimization problem then becomes one of minimizing the overall loss (or cost function, or empirical risk), as given by the 
283: sum of the individual losses for all the training patterns,
284: $$
285: E_\epsilon ({\bf w}) =  \sum_{i=1}^P \left|y_i-{\hat f}({\bf x}_i,{\bf w})
286: \right|_\epsilon \,, \eqno(2) 
287: $$
288: subject to the inequality $\sum_{j=1}^m w_j^2 < c_0$, where $c_0$ is
289: a user-determined constant.
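
As an illustration of the $\epsilon$-insensitive loss and of the empirical risk of Eq.~(2), the following minimal Python sketch may be helpful.  It assumes plain lists of target and predicted values; the function names and the sample numbers are illustrative only and are not taken from the nuclear data.

```python
def eps_insensitive_loss(y, y_hat, eps):
    """Loss |y - y_hat|_eps: zero inside the tolerance band, linear outside."""
    return max(0.0, abs(y - y_hat) - eps)


def empirical_risk(targets, predictions, eps):
    """Eq. (2): sum of the individual losses over all training patterns."""
    return sum(eps_insensitive_loss(y, f, eps)
               for y, f in zip(targets, predictions))


# Illustrative numbers only (not taken from the nuclear data).
targets = [1.00, 2.00, 3.00]
predictions = [1.05, 2.50, 2.00]

# The first pattern lies inside the tolerance band and contributes nothing;
# the other two contribute the excess of their error over eps.
risk = empirical_risk(targets, predictions, eps=0.1)
```

Errors smaller than the tolerance are ignored entirely, which is why only a subset of the training patterns ends up shaping the fitted function.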
290: 
291: Vapnik has shown that an equivalent solution of this constrained 
292: optimization problem can be obtained by solving the corresponding {\it dual 
293: problem}, which may be stated as follows [3].
294: \item{1.}
295: Choose a kernel of the form
296: $$
297: K({\bf x},{\bf x}_i) = \sum_{j=1}^m \varphi_j({\bf x})\varphi_j({\bf x}_i) \,,
298: \eqno(3)
299: $$
300: symmetrical in its vector arguments and continuous in their components,
301: and qualifying as an inner product in some space, so as to meet the
302: conditions of Mercer's theorem [10,3].
303: \item{2.}
304: Given the training sample $\{ ({\bf x}_i,y_i) \}$, $i=1, \ldots, P$,
assemble the concave quadratic functional 
306: $$
307: Q(\{\alpha_i,\alpha_i'\}) = \sum_{i=1}^P y_i(\alpha_i - \alpha_i')
308: -\epsilon \sum_{i=1}^P (\alpha_i + \alpha_i')
309: - {1 \over 2} \sum_{i=1}^P \sum_{l=1}^P ( \alpha_i - \alpha_i')
310: (\alpha_l - \alpha_l') K({\bf x}_i, {\bf x}_l) \,. \eqno(4)
311: $$
312: \item{3.}
313: Maximize $Q$ subject to the constraints
314: $$
315: \sum_{i=1}^P (\alpha_i - \alpha_i') = 0\,, \qquad 0 \leq \alpha_i\,,\,\alpha_i' 
316: \leq C \,, \eqno(5)
317: $$
318: 
319: \noindent
320: where $C$ is a user-determined constant.
The optimal approximating function then takes the form
$$
{\hat f}_{\rm opt}({\bf x},{\bf w}) = {\bf w}^T \varphi({\bf x})
= \sum_{i=1}^{P} (\alpha_i - \alpha_i') K({\bf x},{\bf x}_i) \,, \eqno(6)
$$
where ${\bf w}^T$ is the transpose of the column vector ${\bf w}$ and
$\varphi({\bf x})$ is the column vector of basis functions $\varphi_j({\bf x})$.
327: The subset of training patterns $i$ for which $\alpha_i - \alpha_i'$
328: does not vanish then defines the {\it support vectors} of the machine,
329: corresponding to the training examples that are the most 
330: salient to solution of the problem.
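
The kernel form of Eq.~(6) follows from the standard result that the optimal weight vector is a combination of the images of the training inputs, ${\bf w} = \sum_{i=1}^P (\alpha_i - \alpha_i') \varphi({\bf x}_i)$, together with the definition (3) of the kernel.  The Python sketch below evaluates the kernel expansion and picks out the support vectors.  It is only a sketch: the multipliers are chosen by hand to satisfy the constraints (5) and are not the solution of any real training problem, and all names are hypothetical.

```python
import math


def predict(x, train_x, alpha, alpha_prime, kernel):
    """Evaluate Eq. (6): sum_i (alpha_i - alpha_i') K(x, x_i)."""
    return sum((a - ap) * kernel(x, xi)
               for xi, a, ap in zip(train_x, alpha, alpha_prime))


def support_vector_indices(alpha, alpha_prime, tol=1e-12):
    """Indices i of training patterns with nonvanishing alpha_i - alpha_i'."""
    return [i for i, (a, ap) in enumerate(zip(alpha, alpha_prime))
            if abs(a - ap) > tol]


def rbf(x, z, gamma=1.0):
    """RBF kernel of Eq. (8); gamma = 1 is an arbitrary illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


# Hypothetical multipliers, chosen by hand to satisfy the constraints (5)
# with C = 0.1; they are not the solution of any real training problem.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alpha = [0.1, 0.0, 0.0]
alpha_prime = [0.0, 0.0, 0.1]

support = support_vector_indices(alpha, alpha_prime)   # patterns 0 and 2
value = predict((0.0, 0.0), train_x, alpha, alpha_prime, rbf)
```

Only the patterns returned by the index function enter the sum with nonzero weight, so the cost of a prediction scales with the number of support vectors rather than with the size of the training set.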
331: 
332: The parameters $\epsilon$ and $C$ provide the user with control over 
333: the complexity of the machine, as measured by the so-called VC dimension 
334: [11,3], and hence over its performance in generalization. 
335: Careful tuning of these parameters is necessary.
336: 
337: Different choices for the inner-product kernel $K({\bf x},{\bf x}_i)$ yield 
338: different versions of the support vector machine.  The most popular are (i) the
339: polynomial learning machine, corresponding to 
340: $$
341: K({\bf x},{\bf x}_i) =
342: ({\bf x}^T{\bf x}_i + 1)^p  \eqno(7)
343: $$
344: (with user-selected power $p$),
345: (ii) the radial-basis function (RBF) network, corresponding to 
346: $$
347: K({\bf x},{\bf x}_i) =\exp \left( - \gamma ||{\bf x} -{\bf x}_i ||^2\right) \eqno(8)
348: $$
349: (with user-selected width parameter $\gamma$), and (iii) the 
350: two-layer perceptron [1-3], with 
351: $$
352: K({\bf x},{\bf x}_i) =\tanh (\beta_1 {\bf x}^T{\bf x}_i + \beta_2)  \eqno(9)
353: $$
354: (freedom in setting the parameters $\beta_1$ and $\beta_2$ being
355: restricted by Mercer's theorem).
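
For concreteness, the three kernels of Eqs.~(7)--(9) can be written as short Python functions.  This is a minimal sketch using only the standard library; the function names are illustrative, and the code does not check whether the chosen $\beta_1$ and $\beta_2$ satisfy Mercer's conditions.

```python
import math


def dot(x, z):
    """Ordinary inner product x^T z of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, p):
    """Eq. (7): (x^T z + 1)^p with user-selected power p."""
    return (dot(x, z) + 1.0) ** p


def rbf_kernel(x, z, gamma):
    """Eq. (8): exp(-gamma ||x - z||^2) with width parameter gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def tanh_kernel(x, z, beta1, beta2):
    """Eq. (9): tanh(beta1 x^T z + beta2); Mercer validity is NOT checked."""
    return math.tanh(beta1 * dot(x, z) + beta2)


x, z = (1.0, 2.0), (3.0, 4.0)
k_poly = polynomial_kernel(x, z, p=2)   # (1*3 + 2*4 + 1)^2
k_rbf = rbf_kernel(x, x, gamma=2.5)     # identical arguments
```

All three functions are symmetric in their two arguments, as required of an inner-product kernel.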
356: 
357: We are most interested in creating predictive statistical models 
358: capable of estimating a real-valued function $f({\bf x})$ from given
359: values for its independent variables comprising ${\bf x}$.  For that
360: reason, we have outlined the design of SVMs for solving problems 
361: of nonlinear regression.  However, the support vector machine was originally 
362: introduced to solve yes/no classification problems, and applied to problems 
363: in which positive and negative cases are either separable by a hyperplane in the
364: input space (trivial), or not (nontrivial).  For problems that are
365: not linearly separable in this sense, the input vectors are mapped 
366: nonlinearly into a higher-dimensional feature space, in which separation 
367: by a hyperplane becomes possible.  The principle of structural risk 
368: minimization then dictates that an {\it optimal} hyperplane be sought in
369: this space, such that the margin of separation between positive and negative 
cases is maximized.  It is known [7,8] from general learning theory that the 
371: error rate of a learning machine on test data (i.e., in generalization or prediction)
372: is bounded by the sum of two terms, namely the error rate on the training 
373: data and a term involving the VC dimension.  For a linearly separable 
374: problem treated by a SVM, the first term is zero and the second is 
375: minimized.  Thus, good generalization is achieved even without 
376: building into the model any explicit knowledge about the problem to be 
377: solved, beyond the raw training data.  This desirable feature is maintained 
378: approximately in application of SVMs to nonseparable classification 
379: problems and to the generically more difficult problems of regression.
380: 
381: The support vector machine may be broadly viewed as a kind of 
382: feedforward neural network, in that the inner-product kernels 
383: $K({\bf x},{\bf x}_i)$ provide a layer of hidden units that effect 
384: nonlinear processing of the inputs and provide weighted linear outputs,
385: which are summed by an output unit.  As seen above, the familiar structures of
386: radial-basis-function networks and perceptrons with one hidden layer can 
be realized as special cases by suitable choices of kernel.
But a support vector machine does more: it also embodies an algorithm 
389: that automatically determines the number of hidden units appropriate to 
390: the problem at hand, whatever the choice of kernel.  This more general 
391: scope of the SVM approach stands in contrast to the backpropagation 
392: learning algorithm [1-3], which is designed especially for training 
393: multilayer perceptrons.
394: 
395: In addition to the benefits already mentioned, the support
396: vector machine offers other significant advantages over the 
397: more traditional approaches to supervised learning based
398: on neural networks, which involve dependence on trial
399: and error, rules of thumb, and heuristics.  The support
400: vector machine offers a generic way to control model complexity.
401: The curse of dimensionality is overcome by the pivotal strategy
402: of introducing an inner-product kernel conforming to Mercer's
403: theorem and solving the constrained optimization problem in its dual 
404: version, thereby determining the dimension of the feature space
405: as the number of support vectors distilled from the training set.
406: The procedure naturally incorporates regularization.  The
407: use of the $\epsilon$-insensitive cost function (2) in the 
408: regression application lends robustness to the machine by avoiding 
certain drawbacks of the least-squares estimator employed in
410: the backpropagation learning algorithm (e.g., sensitivity to outliers 
411: and to distributions with additive noise having a long tail).  
412: Importantly, the SVM is guaranteed to find a global minimum of 
413: the error surface.  For a more detailed and systematic development 
414: of the properties of SVMs, the reader is directed to Haykin's excellent 
415: text [3], as well as the authoritative monographs of Vapnik [7,8].
416: 
417: Our investigations of the potential of support vector machines for
418: the design of global statistical models of nuclear properties make use 
419: of the RBF kernel (8), as well as a simplified version of what is 
420: called ANOVA decomposition [12].  ANalysis Of VAriance (ANOVA) is a 
421: scheme for imposing a structure on multi-dimensional kernels that are 
422: generated from one-dimensional kernels, in a way that gives better control 
423: over the capacity of the machine (as measured by the VC dimension).  
424: An ANOVA kernel we have found to be well suited to the regression
425: problem posed by the nuclear (atomic) mass data is rooted in
426: the RBF kernel and has the form
427: $$
428: K({\bf x},{\bf x}_i) =  \left(\sum_{l=1}^n \exp\left[-\gamma\left(
429:   x^{(l)} - x_i^{(l)} \right)^2     \right]             \right)^d   \,,
430: $$
431: where the user-selected parameter $\gamma$ can take any positive
432: value and the power $d$ is usually an integer.  We 
433: shall call this the ANOVA kernel.
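
A minimal Python sketch of this ANOVA kernel is given below.  The default values $\gamma = 2.5$ and $d = 8$ are those quoted for the mass models of Sec.~3; the function name and the sample inputs are illustrative assumptions.

```python
import math


def anova_rbf_kernel(x, z, gamma=2.5, d=8):
    """ANOVA kernel: (sum_l exp(-gamma (x_l - z_l)^2))^d.

    Each input component contributes its own one-dimensional RBF factor;
    the sum over components is then raised to the power d.
    """
    total = sum(math.exp(-gamma * (a - b) ** 2) for a, b in zip(x, z))
    return total ** d


# Coded inputs (Z, N) scaled to [0, 1]; the numbers are illustrative only.
x = (0.30, 0.70)
z = (0.35, 0.60)
k_same = anova_rbf_kernel(x, x)   # two unit terms, so 2**8
k_xz = anova_rbf_kernel(x, z)
```

For an input of dimension $n$ the kernel takes its largest value, $n^d$, when the two arguments coincide, and it decreases as any component of the arguments moves apart.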
434: \vskip 28 truept
435: \centerline{\bf 3.  SVM MODELS OF NUCLEAR MASS SYSTEMATICS}
436: \vskip 12 truept
437: 
438: SVM regression models have been trained to predict $(\Delta M)c^2$
439: in MeV, where $\Delta M$ is the mass excess (or mass defect) 
440: defined by the difference $M - A$ between the atomic mass $M$,
441: measured in amu, and the mass number $A$ of the nuclide in question.  
442: In our initial study, we focus on a database given by the union 
443: ${\rm O}\, \oplus {\rm N}\, \oplus {\rm NB}$ of three data sets.  The first
444: consists of the set of 1323 ``old'' (O) experimental mass assignments 
445: which the 1981 semi-empirical droplet-model mass formula of M\"oller 
446: and Nix [13] was intended to reproduce.  The second is a set of 
447: 351 ``new'' (N) experimental mass assignments for nuclei that lie 
mostly beyond the edges of the 1981 data (as viewed in the $N$--$Z$ plane).  
449: In addition to the O and N sets, a set of 158 nuclides with more 
450: recently measured masses (the NB set of ``even newer'' nuclides)
451: is employed in the modeling process.  In earlier work [14-16,5], these 
452: three data sets have been used to quantify the extrapolation capability 
453: (the so-called extrapability) of different global mass models (based 
454: either on nuclear theory or neural networks).
455: 
456: The set ${\rm O}\, \oplus {\rm N} \, \oplus {\rm NB}$ is divided by a 
457: random-sampling procedure into three nonoverlapping subsets, namely 
458: a training set (80\%), a validation set (10\%), and a test set (10\%), 
459: in the indicated approximate proportions.  (In all work reported
460: in this paper, random samplings are drawn from a uniform
461: distribution.)   Training, validation, and
462: test sets are each further subdivided into four subsets labeled EE,
463: EO, OE, and OO, composed respectively of nuclides belonging to the 
464: four ``even-oddness'' classes: even-$Z$-even-$N$, even-$Z$-odd-$N$, 
465: odd-$Z$-even-$N$, and odd-$Z$-odd-$N$.  For convenience, values 
466: of the input variables are encoded by a linear transformation that
467: scales and shifts given values of $Z$ and $N$ to lie in the interval
468: $[0,1]$.  A similar linear transformation decodes the learning machine's 
469: raw output, which lies in the interval $[-1,1]$, so as to provide an 
470: estimate of the corresponding mass excess in MeV.
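
The coding and decoding steps just described might be sketched in Python as follows.  The numerical ranges are hypothetical, since the actual limits used in the study are not stated here, and the helper names are illustrative.

```python
def make_encoder(lo, hi):
    """Return a linear map taking the interval [lo, hi] onto [0, 1]."""
    span = float(hi - lo)
    return lambda value: (value - lo) / span


def make_decoder(lo, hi):
    """Return a linear map taking a raw output in [-1, 1] onto [lo, hi]."""
    half_span = (hi - lo) / 2.0
    return lambda raw: lo + (raw + 1.0) * half_span


# Hypothetical ranges, for illustration only; the actual limits used in
# the study are not stated in the text.
Z_MIN, Z_MAX = 8, 108
N_MIN, N_MAX = 8, 160
DM_MIN, DM_MAX = -100.0, 200.0   # assumed mass-excess range in MeV

encode_z = make_encoder(Z_MIN, Z_MAX)
encode_n = make_encoder(N_MIN, N_MAX)
decode_mass_excess = make_decoder(DM_MIN, DM_MAX)

x = (encode_z(50), encode_n(70))       # coded input for Z = 50, N = 70
midpoint = decode_mass_excess(0.0)     # raw output 0 maps to mid-range
```

Because both maps are linear, relative spacings of the nuclides in $Z$ and $N$, and of the decoded mass excesses, are preserved.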
471: 
472: Effectively, we divide the mass problem into four separate problems,
473: one for each of the four ``even-oddness'' classes in $Z$ and $N$.
474: In doing so, we are actually incorporating some domain knowledge into 
475: the learning strategy.  Distinctive quantum-mechanical features of nuclei, 
476: abundantly supported by empirical evidence, include quantized angular 
477: momenta, magic numbers, shell structure, and pairing energies, 
478: all of which stem from the fact that $Z$ and $N$ 
479: are integers, even or odd.
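
The partition of the data described above can be sketched in Python.  This is one possible reading of the procedure (a uniform random split into training, validation, and test sets, followed by grouping according to even-oddness class); the function names, the random seed, and the toy list of nuclides are assumptions made for illustration.

```python
import random


def even_oddness_class(z, n):
    """Label a nuclide EE, EO, OE, or OO from the parities of Z and N."""
    return ("E" if z % 2 == 0 else "O") + ("E" if n % 2 == 0 else "O")


def split_and_classify(nuclides, seed=0, f_train=0.8, f_valid=0.1):
    """Shuffle uniformly, split roughly 80/10/10, then group by class."""
    rng = random.Random(seed)
    shuffled = list(nuclides)
    rng.shuffle(shuffled)
    n_train = round(f_train * len(shuffled))
    n_valid = round(f_valid * len(shuffled))
    parts = {
        "training": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }
    result = {}
    for name, members in parts.items():
        groups = {"EE": [], "EO": [], "OE": [], "OO": []}
        for z, n in members:
            groups[even_oddness_class(z, n)].append((z, n))
        result[name] = groups
    return result


# Toy list of 100 (Z, N) pairs, standing in for the real database.
toy_nuclides = [(z, n) for z in range(8, 18) for n in range(8, 18)]
subsets = split_and_classify(toy_nuclides)
```

Each of the four classes then supplies its own training, validation, and test sets, so that four independent regression problems can be posed.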
480: 
481: A SVM model is developed individually for each of the four nuclear classes
482: EE, EO, OE, and OO.  SVM regression (with ANOVA-RBF specification of kernels) 
483: is carried out separately for the respective training sets, thereby
484: constructing a predictive model whose reliability is judged by its
485: performance on the examples in the test set.  Following established
486: practice, performance of each of the four models on its corresponding
validation set has been used to guide the final determination of the adjustable 
488: parameters.  Ideally, the test set should have {\it no} role in choosing
489: these parameters (although in some cases a weak influence is allowed).
490: 
491: As is usual in global models of the atomic-mass table, the quality
492: of a given model is judged by the smallness of the root-mean-square (rms)
493: error $\sigma$ in the mass excess $\Delta M$, averaged over the data 
494: set in question (training, validation, or test set for a given 
495: class of nuclides).  To be competitive, a model should have
values of $\sigma$ below 1 MeV.  It should be noted, however, that
497: only in a few cases has a rigorous test of predictive performance
498: been made for the traditional theoretical models of semi-empirical
499: character.  (An important exception is found in the work of
500: M\"oller, Nix, and collaborators [15,16], who introduce the
501: notion of extrapability, which is equivalent to our 
502: generalization.)
503: 
504: Some of the better results obtained in the present exploratory
505: study are displayed in Table 1.  The performance of these models, all 
506: with RBF parameter $\gamma = 2.5$ and ANOVA degree $d=8$, is evidently
507: of high quality.  
508: 
509: \topinsert
510: \centerline{\bf{Table 1}}
511: \vskip 12 truept
512: \noindent
513: Performance of SVM global models of atomic mass.  For all four models,
514: the RBF parameter $\gamma$ is 2.5 and the ANOVA degree is $d=8$.  The
other SVM parameters have been left at the default values $C=0.1$ and $\epsilon = 0.001$.
516: \vskip 27 truept
517: 
518: \input tables.tex
519: \nrows= 6
520: \ncols= 7
521: \begintable
{} | & \quad Training & Set \quad \quad \quad | &  \quad Validation &  Set 
\quad \quad \quad | &  \quad \quad \quad Test &  Set \quad \quad \cr
524: Classes | & \# Nuclides &  $\sigma$(MeV) | & \# Nuclides  & $\sigma$(MeV) |
525: & \# Nuclides  & $\sigma$(MeV) \cr
526: EE | & 381 & 0.58 | & 48 & 0.71 | & 48 & 0.99 \cr
527: EO | & 360 & 0.89 | & 45 & 0.68 | & 45 & 0.62 \cr
528: OE | & 371 & 0.70 | & 46 & 0.78 | & 46 & 0.88 \cr
529: OO | & 353 & 0.75 | & 44 & 0.74 | & 45 & 0.97 
530: \endtable
531: \vskip 14 truept
532: \endinsert
533: 
534: Similar learning experiments can be found among the studies
535: of Ref.~[5] based on multilayer perceptrons and modified 
536: backpropagation training, although procedural differences
537: preclude direct comparisons of performance.  The best model obtained 
538: using O as the training set, NB as validation set, and N
539: as test set gave rms error figures on these sets of 0.71 MeV, 
540: 2.28 MeV, and 2.16 MeV, respectively.  Another strategy yielded 
541: better results.  The set ${\rm O}\, \oplus \, {\rm N}$ was first ``purified'' by 
542: removing 20 nuclides with poorly measured masses.  A random sample 
543: M1 consisting of 1303 of the remaining 1654 examples (some 79\%) was 
544: used as the training set.  The complementary set, M2, played the 
545: role of validation set, and the NB set was used for testing
546: the trained model.  The best model found in this way produced
rms errors on the three sets of 0.44 MeV (M1), 0.44 MeV (M2),
548: and 0.95 MeV (NB).  It should be noted that this level of
549: performance on the mass problem was achieved after more
550: than a decade of successive improvements in the choices of
551: architectures, coding schemes, and training algorithms.
552: 
553: In addition to the four class-specific models SVM-EE, SVM-EO, SVM-OE,
554: and SVM-OO reported on in Table 1, we also constructed a single SVM 
555: model (denoted SVM-S) using the full O data set as the training sample, 
556: without making a distinction between EE, EO, OE, and OO nuclides.  
557: In this case, the NB nuclei are used as a validation set, guiding
558: the determination of the RBF and ANOVA parameters.  The parameters 
559: associated with the SVM-S model are again $\gamma = 2.5$ 
and $d = 8$, along with $C= 0.1$ and $\epsilon = 0.001$.  This
561: model yields rms errors of 0.70 MeV on the training set O and
562: 0.75 MeV on the validation set NB, with a $\sigma$ value of 
1.41 MeV on the N nuclei, regarded as a test set.  (These results
are cited incorrectly in Ref.~[5].)  A proper averaging over
565: the four nuclidic classes permits a comparison between the
566: SVM-S model and the four models represented in Table 1.   The
567: composite performance of the latter models is then reflected
568: in $\sigma$ values of 0.73 MeV, 0.73 MeV, and 0.88 MeV
569: in training, validation, and testing, respectively.  
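
One plausible reading of the averaging over classes is a nuclide-count-weighted mean of the squared rms errors, followed by a square root; this is an assumption, since the procedure is not spelled out.  The Python sketch below applies it to the rounded entries of Table 1 and gives values within about 0.01 MeV of the composite figures quoted above.  The function names are illustrative.

```python
import math


def rms_error(predicted, measured):
    """Root-mean-square deviation between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def composite_rms(counts, sigmas):
    """Combine per-class rms errors, weighting each squared error by the
    number of nuclides in its class (assumed averaging procedure)."""
    total = sum(counts)
    return math.sqrt(sum(n * s * s for n, s in zip(counts, sigmas)) / total)


# Entries of Table 1, ordered EE, EO, OE, OO: (numbers of nuclides, sigma in MeV).
table1 = {
    "training": ([381, 360, 371, 353], [0.58, 0.89, 0.70, 0.75]),
    "validation": ([48, 45, 46, 44], [0.71, 0.68, 0.78, 0.74]),
    "test": ([48, 45, 46, 45], [0.99, 0.62, 0.88, 0.97]),
}

composite = {name: composite_rms(counts, sigmas)
             for name, (counts, sigmas) in table1.items()}
```

Small differences from the quoted values are to be expected, because the table entries are rounded to two decimal places before being combined.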
570: 
571: In some cases, meaningful comparisons may be drawn between the 
572: performance of statistical mass models based on multilayer perceptrons 
573: and support vector machines, and the traditional mass models based on 
574: nuclear theory and phenomenology.  Starting with the simple liquid-drop
575: model, such traditional theory-thick models have evolved over 
576: seven decades to achieve a high degree of sophistication and
577: precision.  For example, the 1992 FRDM model of M\"oller and Nix [15] 
578: gives $\sigma$ values of 0.67 MeV on the O set (when fitted to
579: this set) and 0.74 MeV on the N set (a true measure
580: of predictive performance of the model).  The more enhanced
581: FRDM model of Ref.~[16], which is fitted to the data set
582: ${\rm M1} \, \oplus \, {\rm M2}$, yields rms errors of 0.68 MeV (M1),
583: 0.71 MeV (M2), and 0.70 MeV (NB).  The HFB2 model of Pearson 
584: and collaborators [17] gives respective errors of 0.67 MeV,
585: 0.68 MeV, and 0.73 MeV.  (We note that the result of Ref.~[17]
586: on the ``test set'' NB cannot be regarded as a prediction, since
587: the nuclei involved were used in adjusting model parameters.)
588: 
589: With additional refinements, it is not unreasonable to expect 
590: that SVM models can equal (and possibly surpass) the levels of robustness
591: and predictive accuracy achieved with theory-thick models and with 
592: multilayer perceptron models.  However, a conclusive statement
593: must await a thorough SVM study based on the recent AME03 mass
594: evaluation carried out by Audi {\it et al.}~[18].
595: \vskip 28 truept
596: \centerline{\bf 4.  SVM MODELS OF BETA-DECAY HALFLIVES}
597: \vskip 12 truept
598: 
599: \vbox{
600: We now turn to a second problem of regression in the statistical analysis 
601: of nuclear properties via support vector machines, namely fitting and
602: prediction of the beta-decay halflives of nuclides $(Z,N)$ that decay 100\% via 
603: the $\beta^-$ mode.  The data for this problem have been culled 
604: from the on-line repository at the Brookhaven National Nuclear Data
605: Center (http:$//$www.nndc.bnl.gov).  The data employed are current to May 
606: 2005 and consist of a total of 932 examples.  Restricting 
607: attention to examples with halflives below $10^6$ s leaves 
608: 633 nuclides.  When measured in seconds, the experimental values 
609: of $T_{1/2}$ range over 26 orders of magnitude, so it is 
610: more appropriate to regress $L = \log T_{1/2}$ instead of the
611: halflife itself, and to adopt the rms error $\sigma_L$ of the estimate
612: of $L$ as a figure of merit in learning, validation, and prediction
613: phases of the analysis.
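
For definiteness, the figure of merit may be written out explicitly.  The
following is a sketch under two assumptions not stated above: that the
logarithm is taken to base 10 (with $T_{1/2}$ in seconds), and that the
sum runs over the $n$ nuclides of the set in question, with
$L_i^{\rm SVM}$ and $L_i^{\rm exp}$ (labels introduced here) denoting
the model output and the experimental value for nuclide $i$:
$$
\sigma_L = \left[ {1 \over n} \sum_{i=1}^{n}
\left( L_i^{\rm SVM} - L_i^{\rm exp} \right)^2 \right]^{1/2} .
$$
On this reading, $\sigma_L = 1$ corresponds to a typical discrepancy of
a factor of 10 between calculated and measured halflives.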
614: 
615: As in the case of the mass problem, separate SVM models are
616: constructed for EE, EO, OE, and OO classes of nuclides.  However,
617: we make the simpler RBF choice of kernel, instead of pursuing
618: the more elaborate ANOVA option.   (Implementation based on the
619: ANOVA decomposition is much more demanding in terms of 
620: computer time.)  Each of the four data subsets (EE, EO, OE, OO) is
621: subdivided into training, validation, and test sets in the
622: approximate proportions 80\%, 10\%, and 10\%, respectively.
623: 
624: The results obtained from the SVM regressions are summarized in Tables 2 
625: and 3.  Table 2 gives the parameters and performance measures of the
626: models constructed for the full set of data, regardless of measured
627: lifetime.  Table 3 displays the corresponding results when nuclides with
628: $T_{1/2} \geq 10^6$ s are removed from the database.
629: 
630: A similar study [19] (see also Ref.~[20]) has been carried out 
631: with multilayer feedforward neural networks trained by ``vanilla''
632: backpropagation, for data available in 1995 (766 examples in total).
633: However, this study did not employ the now-standard protocol in 
634: which a validation set is used in making the final model selection.  
635: Also, no subdivision into the four even-oddness classes was made.  
636: Instead, the full data set (or the restricted set of examples with 
637: $T_{1/2} < 10^6$~s) was split into a training set of approximately 
638: 75\% of the examples and a test set consisting of the remainder.
639: }
640: 
641: \topinsert
642: \centerline{\bf{Table 2}}
643: \vskip 12 truept
644: \noindent
645: Performance of SVM global models of $\beta$-decay halflives $T_{1/2}$ 
646: (including examples having $T_{1/2} > 10^6$ s).  For all four models, 
647: $C=1$ and $\varepsilon =0.001$.
648: \vskip 30 truept
649: \input tables.tex
650: \nrows=6
651: \ncols=8
652: \begintable
653:          \|  \quad Learning & Set \qquad   | \quad Validation & Set \qquad  
654: |\qquad Test & Set \qquad | RBF kernel     \crthick
655: Classes  \|\# Nuclides & $\sigma_L$  |\# Nuclides & $\sigma_L$ |\# Nuclides& $\sigma_L$|$\gamma$\crthick
656: 
657: EE       \|  ~137      & 2.88~ | ~16        & 3.61~ |~15        & 1.72~| 5.44 \crnorule
658: EO       \|  ~198      & 2.75~ | ~24        & 2.27~ |~22        & 2.17~| 7.27 \crnorule 
659: OE       \|  ~187      & 2.37~ | ~22        & 2.76~ |~20        & 2.38~| 9.99 \crnorule
660: OO       \|  ~236      & 2.62~ | ~29        & 2.07~ |~26        & 2.96~| 9.55
661: \endtable
662: %\vskip 1.5truecm
663: \vskip 1.2truecm
664: \centerline{\bf{Table 3}}
665: \vskip 12 truept
666: \noindent
667: Performance of SVM global models of $\beta$-decay halflives (with a cutoff 
668: at $10^6$ s).  For all four models, $C=1$, $\varepsilon =0.001$.
669: \vskip 30 truept
670: \input tables.tex
671: \nrows=6
672: \ncols=8
673: \begintable
674:          \|  \quad Learning & Set \qquad   | \quad Validation & Set \qquad 
675:  | \qquad Test & Set \qquad | RBF kernel     \crthick
676: Classes  \|\# Nuclides & $\sigma_L$  |\# Nuclides & $\sigma_L$ |\# Nuclides& $\sigma_L$|$\gamma$\crthick
677: 
678: EE       \|  ~96       & 1.34~ | ~11        & 0.52~ |~10        & 1.20~| 1.78 \crnorule
679: EO       \|  ~140      & 0.90~ | ~17        & 0.69~ |~15        & 1.22~| 9.97 \crnorule 
680: OE       \|  ~122      & 1.55~ | ~14        & 0.63~ |~13        & 1.18~| 0.84 \crnorule
681: OO       \|  ~159      & 1.00~ | ~19        & 1.28~ |~17        & 1.34~| 8.87
682: \endtable
683: \vskip 14truept
684: \endinsert
685: \vskip 1.3truecm
686: %\vskip 1truecm
687: 
688: Comparison of the rms errors shown in Tables 2 and 3 with the
689: corresponding performance figures from the earlier work [19,20] shows
690: an improvement (reduction) in rms error values by about a factor
691: of 2, in both learning and prediction, for both the full and restricted
692: data sets.  Comparison may also be made with results from 
693: traditional nuclear theory (e.g.~Refs.~[21-23]).  Since the 
694: cited neural-network models could already attain performance in fitting 
695: and prediction comparable to that exhibited by these theory-thick models, 
696: we can say with some confidence that the SVM models are capable of a 
697: predictive acuity superior to the best of the traditional global 
698: models currently in play.
699: \vfill\eject
700: 
701: We should also call attention to the greatly improved quality of
702: neural-network models of $\beta$-decay systematics, achieved in
703: very recent studies [24].  Data based on the AME03 evaluation 
704: are divided into training, validation, and test sets in the
705: respective proportions 60\%, 20\%, and 20\%, both with and
706: without the restriction to halflives not greater than $10^6$~s,
707: but without subdivision into even-oddness classes.  In the
708: case where the restriction is imposed, the best results 
709: found for the error measure $\sigma_L$ are 0.55 (training),
710: 0.61 (validation), and 0.64 (prediction).  The corresponding
711: averages for the model represented in Table 3 are 1.43, 0.89,
712: and 1.24, respectively, so further refinement of the SVM models
713: will be needed to match the performance of the best multilayer
714: perceptrons.
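
The validation-set average just quoted can be checked approximately
against Table 3.  As a sketch, assume (the averaging procedure is not
spelled out here) that the class errors $\sigma_{L,c}$ are combined in
quadrature with weights given by the numbers of nuclides $n_c$ in the
corresponding subsets, $c$ running over EE, EO, OE, and OO:
$$
\bar\sigma_L = \left[ {\sum_c n_c \, \sigma_{L,c}^2 \over \sum_c n_c}
\right]^{1/2} .
$$
For the validation sets of Table 3 this gives
$$
\left[ {11 (0.52)^2 + 17 (0.69)^2 + 14 (0.63)^2 + 19 (1.28)^2
\over 61} \right]^{1/2} \approx 0.88 ,
$$
consistent with the quoted 0.89 to within the rounding of the
tabulated entries.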
715: \vskip 28 truept
716: 
717: \centerline{\bf 5.  SVM MODELS OF GROUND-STATE SPINS AND PARITIES} 
718: \vskip 12 truept
719: 
720: In a third illustration of what is possible, the SVM approach is applied
721: to construct global statistical models of the ground-state spins and parities 
722: of nuclei.  (In this context, ``spin'' refers to the total angular momentum 
723: quantum number $J$ of the nuclear state.)   As in the exercises described
724: in Secs.~3 and 4, we again divide the nuclei under consideration into EE, 
725: EO, OE, and OO classes.  In the spin problem, this subdivision is of
726: obvious importance, since the law of angular momentum addition in
727: quantum mechanics dictates that the states of EE and OO nuclei can
728: only have integral values of $J$, whereas the spins of EO and OE 
729: nuclei must be half-odd-integral.  In fact, all EE nuclei are known to
730: have spin/parity $J^\pi = 0^+$.  Clearly, we may exclude this class from
731: consideration, since its modeling is a trivial task for any viable
732: learning machine.
733: 
734: The parity property of nuclear states presents the simplest kind 
735: of classification problem, with two mutually exclusive outcomes, even 
736: or odd.  Moreover, because the spin quantum number $J$ is restricted
737: by quantum theory to a finite set of discrete values, global modeling 
738: of spin systematics is also most efficiently treated, within the 
739: SVM framework, as a problem of classification rather than function 
740: approximation or regression.  In our study, we consider
741: $J$ values ranging from 0 to 23/2 in steps of 1/2, the 
742: integral values being available for OO nuclei and the half-odd-integral 
743: values for EO and OE nuclei.  This specification of the problem 
744: may be construed as introducing some basic domain knowledge into the 
745: model-building process.
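
Under this specification the class labels can be enumerated explicitly.
The sets below simply restate the range given above; the symbols
${\cal J}_{\rm OO}$ and ${\cal J}_{\rm EO,OE}$ are notation introduced
here for illustration:
$$
\eqalign{
{\cal J}_{\rm OO} &= \{ 0, 1, 2, \ldots, 11 \} , \cr
{\cal J}_{\rm EO,OE} &= \{ 1/2, 3/2, 5/2, \ldots, 23/2 \} . \cr}
$$
Each classifier thus chooses among at most 12 labels, so that uniformly
random assignment over a full set would be correct in only about
$1/12 \simeq 8\%$ of cases.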
746: 
747: Data for the spins and parities of nuclear ground states have been taken from 
748: the on-line Brookhaven database.  Based on simple RBF kernels, separate 
749: SVM classifier models of these two properties have been developed for each 
750: of the three nontrivial even-oddness cases.
751: 
752: Let us first discuss our findings for the parity problem.  In treating 
753: this problem, the data for each of the cases EO, OE, and OO are divided 
754: at random into training, validation, and test sets in the approximate 
755: proportions 80\%, 10\%, and 10\%, respectively.  Performance is measured in
756: terms of the percentages of correct classifications within these
757: subsets.  The primary results are summarized in Table 4.  It is apparent 
758: that modeling parity is an easy task for SVMs.  Judging from available
759: results [25,14], it is also relatively easy for neural networks 
760: (although SVM performance is somewhat superior).
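
As an aid to reading the parity results in Tables 4 and 5, the score on
a subset of $n$ nuclides, of which $n_{\rm corr}$ (a label introduced
here) are classified correctly, is
$$
{\rm Score} = 100\% \times {n_{\rm corr} \over n} .
$$
With test sets of about 50 nuclides, a single example therefore shifts
the score by roughly $100\%/50 = 2\%$.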
761: 
762: For the models of Table 4, performance on the training sets is 
763: perfect.  If we are willing to make a small sacrifice in the quality
764: of reproduction of the input data, slightly better performance on the
765: OE and OO test sets can be achieved, as seen in Table 5 (at some cost in validation scores).
766: It is interesting that this second model corresponds to a quite different
767: error minimum under variation of the parameter $\gamma$.  In general,
768: there may be many such minima of similar depth.
769: 
770: We have not yet conducted a full training-validation-test process
771: for the spin problem.  Accordingly, we present only preliminary
772: results, which nevertheless are illuminating.  In the first experiment 
773: to be reported (see Table 6), each of the three spin data sets EO, OE, and OO is
774: divided randomly into {\it two} subsets, a training set and a 
775: complementary second set.  The training set contains approximately
776: 90\% of the examples of the given even-oddness class, and the second set, the 
777: remaining $\sim 10$\%.
778: 
779: \topinsert
780: \centerline{\bf{Table 4}}
781: \vskip 12 truept
782: \noindent
783: Performance of SVM global models of ground-state parity. 
784: For all three models, $C=0.1$, $\varepsilon =0.01$.  Model selection
785: is guided by best performance on the validation set, consistent with
786: a perfect score on the training set.
787: \vskip 27 truept
788: \input tables.tex
789: \nrows=5
790: \ncols=6
791: \begintable
792: \|  \quad Learning & Set \qquad   | \quad Validation & Set \qquad  
793:  |\qquad Test & Set \qquad | RBF kernel \crthick
794: Classes  \|\# Nuclides & Score  |\# Nuclides & Score |\# Nuclides&
795: Score|$\gamma$\crthick
796: EO   \|  ~474     & 100\%~ | ~58        & 93\%~ |~52        & 83\%~| 9.232 \crnorule
797: OE   \|  ~466     & 100\%~ | ~57        & 89\%~ |~51        & 90\%~| 9.482 \crnorule
798: OO   \|  ~434     & 100\%~ | ~53        & 87\%~ |~48        & 84\%~| 9.176 
799: \endtable
800: \vskip 1truecm
801: \centerline{\bf{Table 5}}
802: \vskip 12 truept
803: \noindent
804: Performance of SVM global models of ground-state parity.
805: For all three models, $C=0.1$, $\varepsilon =0.01$.  In this case,
806: model selection is guided by best performance on the validation
807: set, allowing for minimal nonzero error rate on the training set.
808: \vskip 27 truept
809: \input tables.tex
810: \nrows=5
811: \ncols=8
812: \begintable
813: \| \quad Learning & Set \qquad   | \quad Validation & Set \qquad  | 
814: \qquad Test & Set \qquad | RBF kernel \crthick
815: Classes \|\# Nuclides & Score  |\# Nuclides & Score |\# Nuclides& Score|$\gamma$\crthick
816: EO  \|  ~474      & 100\%~ | ~58        & 91\%~ |~52        & 83\%~| 0.678 \crnorule
817: OE  \|  ~466      & 95\%~  | ~57        & 84\%~ |~51        & 92\%~| 0.180 \crnorule
818: OO  \|  ~434      & 96\%~  | ~53        & 83\%~ |~48        & 86\%~| 0.240
819: \endtable
820: \vskip 14truept
821: \endinsert
822: \vskip 1truecm
823: 
824: \topinsert
825: \centerline{\bf{Table 6}}
826: \vskip 12 truept
827: \noindent
828: Performance of SVM global models of nuclear ground-state spin. 
829: For all three models, $C=0.1$, $\varepsilon =0.01$.  Model selection
830: is guided by best performance on the validation set, consistent with a
831: perfect score on the training set.
832: \vskip 27 truept
833: \input tables.tex
834: \nrows=5
835: \ncols=6
836: \begintable
837: \| \quad \quad Learning & Set  \qquad  | \quad Validation/Test & Set \qquad \
838: | RBF kernel  \crthick
839: Classes  \|\# Nuclides & Score  |\# Nuclides & Score | $\gamma$    \crthick
840: EO       \|  ~528      & 100\%~ | ~58        & 81\%~ | 9.217       \crnorule
841: OE       \|  ~522      & 100\%~ | ~57        & 68\%~ | 9.001       \crnorule
842: OO       \|  ~488      & 100\%~ | ~54        & 43\%~ | 4.002
843: 
844: \endtable
845: \vskip 14truept
846: \endinsert
847: 
848: \noindent
849: The second set is used to help pin down the RBF parameter
850: $\gamma$ and thereby plays a role in model selection.  Hence it must be
851: interpreted as a validation set.  SVM models are constructed for a range 
852: of $\gamma$ values, and the model whose $\gamma$ value produces the
853: lowest error on the second data set (while scoring 100\% on the
854: training set) is selected.  There is no real test set in this experiment. 
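
\noindent
In symbols, and only as a sketch of the rule just described, let
$S_{\rm tr}(\gamma)$ and $S_{2}(\gamma)$ (notation introduced here)
denote the percentage scores on the training set and on the second set
for the model built with a given $\gamma$.  The selected value is then
$$
\gamma^* = \arg\max_{\gamma} \, \bigl\{ S_{2}(\gamma) \, : \,
S_{\rm tr}(\gamma) = 100\% \bigr\} ,
$$
the maximum being taken over whatever grid of $\gamma$ values is
scanned (the grid itself is not specified here).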
855: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
856: \topinsert
857: \centerline{\bf{Table 7}}
858: \vskip 12 truept
859: \noindent
860: Performance of SVM global models of nuclear ground-state spin. 
861: For all three models, $C=0.1$, $\varepsilon =0.01$.  The
862: parameter $\gamma$ is fixed at the value determined for Table 6.
863: The test set influences model choice only indirectly.
864: \vskip 27 truept
865: \input tables.tex
866: \nrows=5
867: \ncols=6
868: \begintable
869: \|   Learning & Set    | Validation & Set   |Test & Set | RBF kernel \crthick
870: Classes \|\# Nuclides & Score |\# Nuclides & Score |\# Nuclides& Score~|$\gamma$\crthick
871: EO   \|  ~476      & 100\%~ | ~58        & 79\%~ |~52        & 60\%~~| 9.217 \crnorule
872: OE   \|  ~470      & 100\%~ | ~57        & 61\%~ |~52        & 79\%~~| 9.001 \crnorule
873: OO       \|  ~440      & 100\%~ | ~54        & 39\%~ |~48        & 38\%~~| 4.002
874: \endtable
875: \vskip 14truept
876: \endinsert
877: 
878: In an alternative experiment, we have implemented a protocol intermediate
879: between the training-validation scheme leading to Table 6, and the full 
880: training-validation-test procedure.  The data for each of the three 
881: even-oddness classes involved are divided into three subsets as follows.  
882: The second subset is taken to be identical to the second subset formed 
883: in the first experiment.  The first subset, used as the training set, 
884: consists of 80\% of the examples for the class in question, these being 
885: chosen at random from the corresponding training set created in 
886: the first experiment.  The 10\% that are
887: not so chosen constitute the third subset, which is regarded as
888: a test set.  Then, using the {\it same} parameter $\gamma$ as
889: determined in the first experiment with the aid of the second
890: subset, new SVM models are developed from the examples in the reduced 
891: training set.  These models are used to generate spin values for both 
892: second and third subsets -- values which may differ from those given by the
893: models developed in the first experiment (see Table 7).  Although 
894: it is not legitimate to interpret the third subset as a test set in the 
895: purest sense, its influence on model selection is indirect.
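
The bookkeeping of this three-way split can be read off from Tables 6
and 7: in each class the training set of the first experiment is the
union of the reduced training set and the third subset,
$$
\eqalign{
{\rm EO:} \quad & 528 = 476 + 52 , \cr
{\rm OE:} \quad & 522 = 470 + 52 , \cr
{\rm OO:} \quad & 488 = 440 + 48 . \cr}
$$
For EO nuclei, for example, the first, second, and third subsets then
comprise $476/586 \simeq 81\%$, $58/586 \simeq 10\%$, and
$52/586 \simeq 9\%$ of the class, respectively.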
896: 
897: From the results shown in Tables 6 and 7, one may plausibly infer
898: that support vector machines can perform very well on the
899: problem of predicting nuclear ground-state spins.  While further
900: experiments are needed to affirm this conclusion, it is already
901: of interest to compare our SVM models with other global models
902: of nuclear spin systematics.  Global nuclear structure calculations
903: within the macroscopic/microscopic approach [26] reproduce
904: the ground-state spins of odd-$A$ nuclei with an accuracy of
905: 60\% (agreement being found in 428 examples out of 713).  
906: (In this work, there is no clear distinction between fitting 
907: and prediction, or between training, validation, and test 
908: sets.) Multilayer feedforward neural networks do somewhat 
909: better [25,14].  Averaging over results of three experiments involving 
910: nets having a single hidden layer and trained with backpropagation, 
911: the performance for odd-$A$ nuclei reaches 62\% on what are 
912: effectively validation sets, the training sets being 
913: reproduced to an accuracy of 93\%.  In an experiment in which 
914: the connection weights of feedforward nets with one hidden layer 
915: are determined by a conjugate gradient procedure, performance
916: at the level of 99.5\% on the training set and 73.2\% on 
917: a validation set has been achieved for OE nuclei.  The
918: spins of odd-odd nuclei are notoriously difficult to predict.
919: This is reflected in the performance figures of neural-network
920: (perceptron) models on the OO category, which are typically 
921: 75\% correct on training-set examples and only 15\% in validation or 
922: testing.
923: 
924: Placed in the context of earlier work, both statistical
925: and phenomenological, the results in Tables 6--7 
926: for the first SVM models of nuclear spin speak for themselves.
927: \vskip 28truept
928: 
929: \centerline{\bf 6.  CONCLUDING REMARKS}
930: \vskip 12truept
931: 
932: We have made initial studies of the potential of support vector machines (SVMs)
933: for providing statistical models of nuclear systematics with demonstrable
934: predictive power.  Using SVM regression and classification procedures,
935: we have created global models of atomic masses, beta-decay halflives, 
936: and ground-state spins and parities.  These models exhibit performance
937: in both data-fitting and prediction that is comparable to that of
938: the best global models from nuclear phenomenology and microscopic theory,
939: as well as the best statistical models based on multilayer feedforward
940: neural networks.  Further work to develop the scope, acuity, and reliability
941: of SVM applications to nuclear physics seems to be warranted.  In particular,
942: the full body of data in the AME03 atomic-mass evaluation [18] must be brought to
943: bear in construction of SVM models of mass systematics, and the treatment
944: of the spin problem begun here needs to be completed.  Fruitful applications
945: to nucleon separation energies, $\alpha$-decay halflives, 
946: branching ratios of nuclear decay, nuclear deformations, neutron
947: cross sections, and other nuclear properties may also be on the horizon.
948: \vskip 28truept
949: 
950: \centerline{\bf ACKNOWLEDGMENTS}
951: \vskip 12 truept
952: This research was supported in part by the U.~S.~National 
953: Science Foundation under Grant No.~PHY-0140316.  For the regression
954: problems, we made use of the on-line mySVM software and instruction manual
955: of Stefan R\"uping (Dortmund) [27], and for classification problems 
956: we implemented the SVM-multiclass software of Thorsten Joachims 
957: (Cornell) [28]. 
958: \vskip 28truept
959: 
960: \centerline{\bf REFERENCES}
961: \vskip 12 truept
962: \item{[1]}
963: D.~E.~Rumelhart, G.~E.~Hinton, and R.~J.~Williams, in {\it
964: Parallel Distributed Processing: Explorations in the Microstructure
965: of Cognition}, Vol.~1, edited by D.~E.~Rumelhart {\it et al.} (MIT Press,
966: Cambridge, MA, 1986). 
967: \item{[2]}
968: J.~Hertz, A.~Krogh, and R.~G.~Palmer, {\it Introduction to the Theory
969: of Neural Computation} (Addison-Wesley, Redwood City, CA, 1991).
970: \item{[3]}
971: S.~Haykin, {\it Neural Networks: A Comprehensive Foundation}, Second
972: Edition (McMillan, New York, 1999).
973: \item{[4]}
974: J.~W.~Clark, T.~Lindenau, and M.~L.~Ristig, {\it Scientific Applications
975: of Neural Nets} (Springer-Verlag, Berlin, 1999).
976: \item{[5]}
977: S.~Athanassopoulos, E.~Mavrommatis, K.~A.~Gernoth, and J.~W.~Clark,
978: {\it Nucl.~Phys.~A} {\bf 743}, 222 (2004).
979: \item{[6]}
980: C.~Cortes and V.~Vapnik, {\it Machine Learning} {\bf 20}, 273 (1995).
981: \item{[7]}
982: V.~N.~Vapnik, {\it The Nature of Statistical Learning Theory} (Springer-Verlag, 
983: New York, 1995).
984: \item{[8]}
985: V.~N.~Vapnik, {\it Statistical Learning Theory} (Wiley, New York, 1998).
986: \item{[9]}
987: V.~N.~Vapnik, in {\it Advances in Neural Information Processing Systems},
988: Vol.~4 (Morgan Kaufmann, San Mateo, CA, 1992), p.~831.
989: \item{[10]}
990: J.~Mercer, {\it Philos.~Trans.~R.~Soc.~London A} {\bf 209},
991: 415 (1909).
992: \item{[11]}
993: V.~N.~Vapnik and A.~Ya.~Chervonenkis, {\it Theory of Probability
994: and Its Applications} {\bf 16}, 264 (1971).
995: \item{[12]}
996: M.~O.~Stitson, A.~Gammerman, V.~Vapnik, V.~Vovk, C.~Watkins, and J.~Weston,
997: in {\it Advances in Kernel Methods -- Support Vector Learning},
999: edited by B.~Sch\"olkopf, C.~Burges, and A.~J.~Smola  
1000: (MIT Press, Cambridge, MA, 1999), p.~285.
1001: \item{[13]}
1002: P.~M\"oller and J.~R.~Nix, {\it At.~Data~Nucl.~Data Tables} {\bf 26}, 
1003: 165 (1981).
1004: \item{[14]}
1005: K.~A.~Gernoth, J.~W.~Clark, J.~S.~Prater, and H.~Bohr, {\it Phys.~Lett.} {\bf B300},
1006: 1 (1993).
1007: \item{[15]}
1008: P.~M\"oller and J.~R.~Nix, {\it J.~Phys.~G} {\bf 20}, 1681 (1994).
1009: \item{[16]}
1010: P.~M\"oller, J.~R.~Nix, W.~D.~Myers, and W.~J.~Swiatecki, 
1011: {\it At.~Data Nucl.~Data Tables} {\bf 59}, 185 (1995).
1012: \item{[17]}
1013: M.~Samyn, S.~Goriely, P.-H.~Heenen, J.~M.~Pearson, and F.~Tondeur,
1014: {\it Nucl.~Phys.~A} {\bf 700}, 142 (2002);
1015: S.~Goriely, M.~Samyn, P.-H.~Heenen, J.~M.~Pearson, and F.~Tondeur,
1016: {\it Phys.~Rev.~C} {\bf 66}, 024326 (2002).
1017: \item{[18]}
1018: A.~H.~Wapstra, G.~Audi, and C.~Thibault, {\it Nucl.~Phys.~A} {\bf 729}, 337
1019: (2003).
1020: \item{[19]}
1021: E.~Mavrommatis, A.~Dakos, K.~A.~Gernoth, and J.~W.~Clark, in {\it Condensed
1022: Matter Theories}, Vol. 13, edited by J.~da Providencia and F.~B.~Malik
1023: (Nova Science Publishers, Commack, NY, 199), p.~423.
1024: \item{[20]}
1025: J. W. Clark, E. Mavrommatis, S. Athanassopoulos, A. Dakos, and
1026: K. A. Gernoth, in {\it Fission Dynamics of Atomic Clusters and Nuclei}, 
1027: edited by D.~M.~Brink, F.~F.~Karpechine, F.~B.~Malik, and J.~da Providencia 
1028: (World Scientific, Singapore, 2001), p.~76. [nucl-th/0109081]
1029: \item{[21]}
1030: A.~Staudt, E.~Bender, K.~Muto, and H.~V.~Klapdor, 
1031: {\it At.~Data Nucl.~Data Tables} {\bf 44}, 80 (1990).
1032: \item{[22]}
1033: H.~Homma, E.~Bender, M.~Hirsch, K.~Muto, H.~V.~Klapdor-Kleingrothaus,
1034: {\it Phys.~Rev.~C} {\bf 54}, 2972 (1996).
1035: \item{[23]}
1036: P.~M\"oller, J.~R.~Nix, and K.~L.~Kratz, 
1037: {\it At.~Data Nucl.~Data Tables} {\bf 66}, 131 (1997).
1038: \item{[24]}
1039: N.~Costiris, A.~Dakos, E.~Mavrommatis, K.~A.~Gernoth, and J.~W.~Clark,
1040: to be published.
1041: \item{[25]}
1042: J.~W.~Clark, S.~Gazula, K.~A.~Gernoth, J.~Hasenbein, J.~S.~Prater, 
1043: and H.~Bohr, in {\it Recent Progress in Many-Body Theories}, 
1044: Vol.~3, edited by T.~L.~Ainsworth, C.~E.~Campbell, B.~E.~Clements, 
1045: and E.~Krotscheck (Plenum, New York, 1992), p.~371.
1046: \item{[26]}
1047: P.~M\"oller and J.~R.~Nix, {\it Nucl.~Phys.~A}~{\bf 520}, 369c (1990).
1048: \item{[27]}
1049: S. R\"uping, mySVM, 
1050: %${\rm http://www}$-ai.cs.uni-dortmund.de${\rm /SOFTWARE/MYSVM/}$ 
1051: http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/
1052: (2004).
1053: \item{[28]}
1054: T. Joachims, Multi-Class Support Vector Machine,
1055: %${\rm http://www.cs.cornell.edu/}$ People/tj/svm\_light/svm\_multiclass.html (2004).
1056: http://www.cs.cornell.edu/People/tj/svm\_light/svm\_multiclass.html (2004).
1057: \bye
1058: