1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: %% %%
3: %% %%
4: %% Plain-TeX Template for Camera-Ready Manuscript Preparation %%
5: %% %%
6: %% CMT28 (St. Louis) Workshop Proceedings %%
7: %% %%
8: %% Vol. 20, Condensed Matter Theories %%
9: %% %%
10: %% Nova Science Publishers %%
11: %% %%
12: %% %%
13: %% (Prepared by J. W. Clark, October 2002) %%
14: %% %%
15: %% %%
16: %% %%
17: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
18: %%
19: %% Note: This template is actually my own joint paper for the
20: %% Vanderbilt (CMT22) proceedings (with unrelated inserts
%% that show how to do two tables and two figures).
22: %%
23: %% The template includes all the essential elements
24: %% such as title and by-lines, section headings and
25: %% associated spacing specifications, figure captions
26: %% & commands for embedded figures, references of all
27: %% types, and a good sampling of fairly complicated
28: %% equations. If there are remaining ambiguities, please
29: %% question me by e-mail at:
30: %%
31: %% jwc@wustl.edu
32: %%
33: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
34: %%
35: %% IF YOU CHOOSE NOT TO USE THIS TEMPLATE, BUT INSTEAD USE ANOTHER
36: %% WORD PROCESSING SCHEME (LaTeX, Word, etc.), YOUR TROUBLES ARE
37: %% STILL NOT OVER, SINCE YOU WILL STILL NEED TO MATCH -- AS EXACTLY
38: %% AS POSSIBLE -- THE plain-TeX OUTPUT FROM THIS TEMPLATE, WITH
39: %% RESPECT TO ALL ASPECTS OF THE FORMAT, INCLUDING FONTS, SPACINGS,
40: %% TABLES, FIGURES, REFERENCES, ETC. CURRENT STANDARDS OF PUBLICATION
41: %% REQUIRE A UNIFORM APPEARANCE FOR CONTRIBUTIONS TO CONFERENCE
42: %% PROCEEDINGS VOLUMES. This can be very time-consuming.
43: %%
44: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
45: %%
46: %% In addition to this template, you will need to put the following
47: %% files (provided!):
48: %%
49: %% tables.tex - needed to construct tables
50: %% psfig.sty - needed to embed postscript figures
51: %% fig1.ps & fig2.eps - the sample postscript figures to be
%% embedded in the paper - replace these with your
53: %% postscript figures as required
54: %%
55: %% You must put these files in the same directory as the main TeX
56: %% file of your paper.
57: %%
58: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
59: %%
60: %% The standard command for generating a ".dvi" file of this TeX
61: %% file (which is called 26template.tex) is
62: %%
63: %% tex 26template.tex
64: %%
65: %% The command for printing out the .dvi file, 26template.dvi, on your
66: %% laser printer will generally be site-dependent, but what basically
67: %% needs to be done is to convert the .dvi file to a .ps (postscript)
68: %% file, say, with the dvips command
69: %%
70: %% dvips -o 26template.ps 26template.dvi
71: %%
72: %% and then print with
73: %%
74: %% lpr -Pprintername 26template.ps
75: %%
76: %%
77: %% The typefont of the paper should be "computer modern". This
78: %% is the default font in plain TeX. If your system is set up
79: %% for another font, please switch back to the computer modern
80: %% (cm) fonts.
81: %%
82: %% Unfortunately, the same TeX file may not print out identically
%% at all sites. Therefore you may find it necessary in some cases
84: %% to make minor adjustments of page and line breaks.
85: %%
86: %% Examples of page-break and line-break adjustments are given in
87: %% this template, but it is suggested that you not worry about such
88: %% details until the need actually arises, as indicated by your TeX
89: %% output.
90: %%
91: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
92: %%
93: %% Comment lines, which begin with the percent symbol %, are not printed.
94: %% To make a printable % sign in text, use \%.
95: %%
96: %% Style comment: American, not English, convention is to be followed
97: %% for a list of items or names, e.g. red, blue, and green; Peter, Paul,
98: %% and Mary -- there is a comma before "and". Also follow American
99: %% conventions for spelling.
100: %%
101: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
102: %%
103: %% As an aid to those who received travel support from the U.S. Army
104: %% research office, a suitable statement to this effect is included
105: %% in the ACKNOWLEDGMENTS section of the template.
106: %%
107: %%
108: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
109: %%
110: %%
111: %%The following three lines specify type size, dimensions of text (23.5cm
112: %%by 15.5cm), and line spacing. The spacing is slightly looser than
113: %%single space (which would be \baselineskip = 12 truept) to allow room
114: %%for embedded symbols with superscripts and subscripts.
115: \magnification=\magstep1
116: \font\bigbfont=cmbx10 scaled\magstep1
117: \font\bigifont=cmti10 scaled\magstep1
118: \font\bigrfont=cmr10 scaled\magstep1
119: \vsize = 23.5 truecm
120: \hsize = 15.5 truecm
121: \hoffset = .2truein
122: \baselineskip = 14 truept
123: \overfullrule = 0pt
124: \parskip = 3 truept
125: \def\frac#1#2{{#1\over#2}}
126: \def\diff{{\rm d}}
127: \def\bfk{{\bf k}}
128: \def\eps{\epsilon}
129: \nopagenumbers
130: %%This command suppresses the printing of page numbers.
131: %%You should number the pages with blue pencil in upper right corner
132: %%if you send camera-ready copy. Of course, submission by email
133: %%to cmt28@wuphys.wustl.edu is preferred!!
134: %%
135: %%THE FOLLOWING THREE COMMANDS LEAVE SOME SPACE AT THE TOP OF THE LEAD PAGE.
136: %%(the command "\vskip 4 truecm" actually results in about 4.5 cm of empty
137: %%space at the top, or about 19.6%). The publisher will probably reset the
138: %%chapter heading (your title and by-line), but you should follow my
139: %%19-20% prescription anyway! In my design I am following the Les Houches
140: %%lecture notes volume produced by Nova. If you have LOTS of authors and
141: %%by-lines you may want to allow a bit less space at the top (e.g. if you
142: %%have 3 or more sets of authors and institutions).
143: \topinsert
144: \vskip 3.2 truecm
145: \endinsert
146: \centerline{\bigbfont MODELING NUCLEAR PROPERTIES}
147: %%If your title is longer than one line, continue thus:
148: \vskip 8 truept
149: \centerline{\bigbfont WITH SUPPORT VECTOR MACHINES}
150: %%Don't forget to remove the % signs from the 2 preceding lines if you
151: %%use them to lengthen the title!
152: \vskip 20 truept
153: %%Now comes your by-line with institutional addresses.
154: \centerline{\bigifont H. Li and J. W. Clark}
155: \vskip 8truept
156: \centerline{\bigrfont McDonnell Center for the Space Sciences
157: and Department of Physics}
158: \vskip 2 truept
159: \centerline{\bigrfont Washington University, St. Louis, Missouri 63130, USA}
160: %%In case of multiple institutions, use the following lines, iterated
161: %%as necessary.
162: \vskip 11 truept
163: \centerline{\bigifont E. Mavrommatis and S.~Athanassopoulos}
164: \vskip 8 truept
165: \centerline{\bigrfont Physics Department, Division of Nuclear \&
166: Particle Physics}
167: \vskip 2 truept
168: \centerline{\bigrfont University of Athens, GR-15771 Athens, Greece}
169: \vskip 11 truept
170: \centerline{\bigifont K. A. Gernoth}
171: \vskip 8 truept
172: \centerline{\bigrfont Department of Physics, University of Manchester}
173: \vskip 2 truept
174: \centerline{\bigrfont Manchester M13 9PL, United Kingdom}
175:
176: \vskip 1.8 truecm
177:
178: \centerline{\bf 1. INTRODUCTION}
179: \vskip 12 truept
180: Artificial neural networks and other machine-learning strategies can provide
181: a valuable complement to theory-driven models of the systematics of nuclear data.
182: A significant effort to exploit the potential of data-driven methodologies
183: receives strong motivation from the current thrust toward experimental and
184: theoretical exploration of nuclei far from stability. It is made possible
by the availability of a growing body of excellent experimental data on
186: nuclear species numbering in the thousands. In outline, statistical models
187: based on supervised learning are developed as follows. Suppose, for
188: example, we wish to predict the atomic mass $M$ of a nuclear species, or
189: nuclide, specifying only its mass number $A$ and atomic number $Z$,
190: or alternatively its proton and neutron numbers $(Z,N)$. A learning
191: machine has an input interface where $(Z,N)$ are fed to the device in
192: coded form and an output interface where an estimate of the
193: mass appears for decoding. In between there is a system or network of
194: interconnected elements that acts to process the incoming information and
195: produce an appropriate output. These processing elements may resemble biological
196: neurons, receiving signals from other units through weighted connections,
197: and displaying nonlinear response to summed input signals. Given
198: a body of training data to be used as examples of the desired
199: mapping, in this case $(Z,N) \to M$, a suitable learning algorithm
200: is used to adjust the parameters of the network, e.g., the weights of
201: the connections between the processing elements, so that the learning
202: machine (i) generates responses at the output interface that reproduce,
203: or closely fit, the atomic masses of the training nuclei, and (ii) serves
204: as a reliable predictor of the masses of test nuclei absent from the
205: training set. This second requirement is a strong one -- the system
206: should not merely serve as a lookup table for masses of known nuclei;
207: it should also perform well in the much more difficult task of
208: generalization.
209:
210: The last two decades have seen much activity and considerable progress
211: in the development and application of supervised learning machines
212: of the type described -- which are designed to learn by example.
213: The most popular implementation is the multilayer feedforward
214: neural network (or multilayer perceptron), taught by the backpropagation
215: learning algorithm in one or another of its many variations [1-3].
216: A significant measure of success has been achieved in constructing
217: global models of nuclear properties based on such neural networks,
218: with applications to atomic masses, neutron separation
219: energies, spins and parities of nuclear ground states, stability versus
220: instability, branching ratios for different decay modes, and
221: beta-decay lifetimes. (For reviews, see Ref.~[4], and for recent
222: results on atomic-mass prediction, see Ref.~[5].)
223:
224: The support vector machine (SVM) [6-8], a principled and powerful approach
225: to problems in classification and nonlinear regression, came on the
226: scene in the 1990s. It has become a standard tool in statistical
227: modeling, and for many problems it is considered the method of
228: choice. We have begun to explore the promise of SVMs for modeling
229: and prediction of nuclear properties. The first results of this
230: effort are reported here.
231:
232: Section 2 provides an introduction to support vector machines and the
233: ANOVA decomposition that facilitates their effective implementation.
234: Section 3 summarizes the results obtained for the atomic-mass problem,
235: and compares the predictive performance of the SVM models with
236: that of multilayer backpropagation networks and state-of-the-art
237: ``theory-thick'' models. Additional results and comparisons for
238: beta-decay halflives and for ground-state spins and parities are
239: presented in Secs.~4 and 5, respectively. Concluding remarks
240: are made in Sec.~6.
241: \vskip 28 truept
242:
243: \centerline{\bf 2. SUPPORT VECTOR MACHINE AND ANOVA DECOMPOSITION}
244: \vskip 12 truept
245:
246: The support vector machine (SVM), pioneered by Vapnik [6-8], may be viewed
247: as an approximate realization of the goal of structural risk minimization
[9,3]. Let $({\bf x}_1,y_1),\ldots,({\bf x}_P,y_P)$ be a set of training
249: data drawn from a function $y=f({\bf x})$. Here, ${\bf x}$ is the input
250: variable, a vector of dimension $n$, while $y$ is the output variable,
251: a unique real number for given ${\bf x}$. (In the example considered
252: in Sec.~1, ${\bf x}$ is a vector formed from the two components $Z$ and
253: $N$, while $y$ is the mass $M$.) The support vector machine is based
254: on a suitable nonlinear mapping ${\bf x} \to \varphi({\bf x})$ from the
255: input space to a feature space of higher dimension $m > n$.
256:
257: Applied to the task of regression, the SVM
258: learning strategy begins by posing an approximation $\hat y$ to the output $y$
259: as a linear combination of certain basis functions $\varphi_i({\bf x})$
260: in the feature space, with corresponding linear weights connecting
261: the feature space to the output space.
262: Thus,
263: $$
264: {\hat y} = {\hat f}({\bf x},{\bf w}) = \sum_{j=1}^m w_j \varphi_j({\bf x})\,,
265: \eqno(1)
266: $$
267: where ${\bf w}$ is an $m$-dimensional vector composed of weights
268: $w_j$, $j=1,\ldots,m$. (A bias term $b$ may be included in Eq.~(1)
269: by starting the sum at $j=0$ and introducing $w_0 \equiv b$ and
270: $\varphi_0({\bf x}) \equiv 1$.) To determine the image vectors $\varphi_j({\bf x})$
271: and their weights $w_j$, consider an $\epsilon$-insensitive loss function
272: defined, for input ${\bf x}$, by
$$
|y - {\hat f}({\bf x},{\bf w})|_\epsilon =
\cases{0 & if $|y - {\hat f}({\bf x},{\bf w})| \leq \epsilon$, \cr
|y - {\hat f}({\bf x},{\bf w})| - \epsilon & otherwise. \cr}
$$
In words, the loss is the amount by which the magnitude of the error
$y - {\hat f}$ exceeds the tolerance $\epsilon$, and zero otherwise.
281: The tolerance parameter $\epsilon$ is at the disposal of the machine's
282: user. The {\it primal} optimization problem then becomes one of minimizing the overall loss (or cost function, or empirical risk), as given by the
283: sum of the individual losses for all the training patterns,
284: $$
285: E_\epsilon ({\bf w}) = \sum_{i=1}^P \left|y_i-{\hat f}({\bf x}_i,{\bf w})
286: \right|_\epsilon \,, \eqno(2)
287: $$
288: subject to the inequality $\sum_{j=1}^m w_j^2 < c_0$, where $c_0$ is
289: a user-determined constant.
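As a minimal sketch (Python, not part of the original study; the function names are hypothetical), the $\epsilon$-insensitive loss and the empirical risk $E_\epsilon$ of Eq.~(2) can be written as follows, given the targets $y_i$ and the machine outputs ${\hat f}({\bf x}_i,{\bf w})$ as plain lists.

```python
def eps_insensitive_loss(y, y_hat, eps):
    """|y - y_hat|_eps: zero inside the eps-tube, linear in the excess outside it."""
    return max(0.0, abs(y - y_hat) - eps)


def empirical_risk(ys, y_hats, eps):
    """E_eps of Eq. (2): sum of eps-insensitive losses over all training patterns."""
    return sum(eps_insensitive_loss(y, yh, eps) for y, yh in zip(ys, y_hats))


if __name__ == "__main__":
    # Errors of 0.05, 0.3 and 0.5 with eps = 0.1 contribute about 0, 0.2 and 0.4.
    print(empirical_risk([1.0, 2.0, 3.0], [1.05, 1.7, 3.5], eps=0.1))
```

Errors smaller than $\epsilon$ cost nothing, while larger errors are penalized only linearly rather than quadratically.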
290:
291: Vapnik has shown that an equivalent solution of this constrained
292: optimization problem can be obtained by solving the corresponding {\it dual
293: problem}, which may be stated as follows [3].
294: \item{1.}
295: Choose a kernel of the form
296: $$
297: K({\bf x},{\bf x}_i) = \sum_{j=1}^m \varphi_j({\bf x})\varphi_j({\bf x}_i) \,,
298: \eqno(3)
299: $$
300: symmetrical in its vector arguments and continuous in their components,
301: and qualifying as an inner product in some space, so as to meet the
302: conditions of Mercer's theorem [10,3].
303: \item{2.}
304: Given the training sample $\{ ({\bf x}_i,y_i) \}$, $i=1, \ldots, P$,
assemble the concave functional
306: $$
307: Q(\{\alpha_i,\alpha_i'\}) = \sum_{i=1}^P y_i(\alpha_i - \alpha_i')
308: -\epsilon \sum_{i=1}^P (\alpha_i + \alpha_i')
309: - {1 \over 2} \sum_{i=1}^P \sum_{l=1}^P ( \alpha_i - \alpha_i')
310: (\alpha_l - \alpha_l') K({\bf x}_i, {\bf x}_l) \,. \eqno(4)
311: $$
312: \item{3.}
313: Maximize $Q$ subject to the constraints
314: $$
315: \sum_{i=1}^P (\alpha_i - \alpha_i') = 0\,, \qquad 0 \leq \alpha_i\,,\,\alpha_i'
316: \leq C \,, \eqno(5)
317: $$
318:
\noindent
where $C$ is a user-determined constant.
The optimal approximating function then takes the form
$$
{\hat f}_{\rm opt}({\bf x},{\bf w}) = {\bf w}^T\varphi({\bf x})
= \sum_{i=1}^{P} (\alpha_i - \alpha_i') K({\bf x},{\bf x}_i) \,, \eqno(6)
$$
where ${\bf w}^T$ is the transpose of the column vector ${\bf w}$,
$\varphi({\bf x})$ is the column vector with components $\varphi_j({\bf x})$,
and the optimal weights are ${\bf w} = \sum_{i=1}^P (\alpha_i - \alpha_i')\,\varphi({\bf x}_i)$.
327: The subset of training patterns $i$ for which $\alpha_i - \alpha_i'$
328: does not vanish then defines the {\it support vectors} of the machine,
329: corresponding to the training examples that are the most
330: salient to solution of the problem.
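The dual formulation can be made concrete with a short Python sketch (illustrative only; all names are hypothetical). It evaluates $Q$ of Eq.~(4) for given dual variables, checks the constraints (5), evaluates the expansion (6), and lists the support vectors. It does not solve the quadratic program itself; in practice the $\alpha_i$ and $\alpha_i'$ would be supplied by a dedicated QP solver.

```python
def dual_objective(alpha, alpha_p, ys, K, eps):
    """Q of Eq. (4); K is the Gram matrix with K[i][l] = K(x_i, x_l)."""
    P = len(ys)
    beta = [a - ap for a, ap in zip(alpha, alpha_p)]  # alpha_i - alpha_i'
    linear = sum(y * b for y, b in zip(ys, beta))
    tube = eps * sum(a + ap for a, ap in zip(alpha, alpha_p))
    quad = sum(beta[i] * beta[l] * K[i][l] for i in range(P) for l in range(P))
    return linear - tube - 0.5 * quad


def is_feasible(alpha, alpha_p, C, tol=1e-9):
    """Constraints (5): sum_i (alpha_i - alpha_i') = 0 and 0 <= alpha_i, alpha_i' <= C."""
    balanced = abs(sum(a - ap for a, ap in zip(alpha, alpha_p))) <= tol
    boxed = all(-tol <= v <= C + tol for v in list(alpha) + list(alpha_p))
    return balanced and boxed


def predict(x, xs_train, alpha, alpha_p, kernel):
    """f_opt(x) of Eq. (6): sum_i (alpha_i - alpha_i') K(x, x_i), no bias term."""
    return sum((a - ap) * kernel(x, xi)
               for a, ap, xi in zip(alpha, alpha_p, xs_train))


def support_vectors(alpha, alpha_p, tol=1e-8):
    """Indices of training patterns with nonvanishing alpha_i - alpha_i'."""
    return [i for i, (a, ap) in enumerate(zip(alpha, alpha_p)) if abs(a - ap) > tol]


if __name__ == "__main__":
    # Toy check with a linear kernel and two training patterns (values chosen by hand).
    linear_kernel = lambda u, v: sum(a * b for a, b in zip(u, v))
    xs = [[0.0], [1.0]]
    ys = [0.0, 1.0]
    alpha, alpha_p = [0.0, 0.5], [0.5, 0.0]
    K = [[linear_kernel(u, v) for v in xs] for u in xs]
    print(is_feasible(alpha, alpha_p, C=1.0))
    print(dual_objective(alpha, alpha_p, ys, K, eps=0.1))
    print(predict([2.0], xs, alpha, alpha_p, linear_kernel))
    print(support_vectors(alpha, alpha_p))
```

Only the kernel values enter, so the feature map $\varphi$ never has to be constructed explicitly.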
331:
332: The parameters $\epsilon$ and $C$ provide the user with control over
333: the complexity of the machine, as measured by the so-called VC dimension
334: [11,3], and hence over its performance in generalization.
335: Careful tuning of these parameters is necessary.
336:
337: Different choices for the inner-product kernel $K({\bf x},{\bf x}_i)$ yield
338: different versions of the support vector machine. The most popular are (i) the
339: polynomial learning machine, corresponding to
340: $$
341: K({\bf x},{\bf x}_i) =
342: ({\bf x}^T{\bf x}_i + 1)^p \eqno(7)
343: $$
344: (with user-selected power $p$),
345: (ii) the radial-basis function (RBF) network, corresponding to
346: $$
K({\bf x},{\bf x}_i) =\exp \left( - \gamma \|{\bf x} -{\bf x}_i \|^2\right) \eqno(8)
348: $$
349: (with user-selected width parameter $\gamma$), and (iii) the
350: two-layer perceptron [1-3], with
351: $$
352: K({\bf x},{\bf x}_i) =\tanh (\beta_1 {\bf x}^T{\bf x}_i + \beta_2) \eqno(9)
353: $$
354: (freedom in setting the parameters $\beta_1$ and $\beta_2$ being
355: restricted by Mercer's theorem).
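For concreteness, the three kernels (7)--(9) are sketched below in Python (illustrative only; the function names are hypothetical). Inputs are plain lists of floats, for example scaled values of $(Z,N)$.

```python
import math


def dot(u, v):
    """Inner product x^T x_i of two input vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(x, xi, p):
    """Eq. (7): (x^T x_i + 1)^p with user-selected power p."""
    return (dot(x, xi) + 1.0) ** p


def rbf_kernel(x, xi, gamma):
    """Eq. (8): exp(-gamma ||x - x_i||^2) with user-selected width parameter gamma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)


def tanh_kernel(x, xi, beta1, beta2):
    """Eq. (9): tanh(beta1 x^T x_i + beta2); Mercer's condition holds only for some beta1, beta2."""
    return math.tanh(beta1 * dot(x, xi) + beta2)
```

Any of these can be passed as the `kernel` argument wherever a kernel function $K({\bf x},{\bf x}_i)$ is required.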
356:
357: We are most interested in creating predictive statistical models
358: capable of estimating a real-valued function $f({\bf x})$ from given
359: values for its independent variables comprising ${\bf x}$. For that
360: reason, we have outlined the design of SVMs for solving problems
361: of nonlinear regression. However, the support vector machine was originally
362: introduced to solve yes/no classification problems, and applied to problems
363: in which positive and negative cases are either separable by a hyperplane in the
364: input space (trivial), or not (nontrivial). For problems that are
365: not linearly separable in this sense, the input vectors are mapped
366: nonlinearly into a higher-dimensional feature space, in which separation
367: by a hyperplane becomes possible. The principle of structural risk
368: minimization then dictates that an {\it optimal} hyperplane be sought in
this space, such that the margin of separation between positive and negative
cases is maximized. It is known [7,8] from general learning theory that the
error rate of a learning machine on test data (i.e., in generalization or prediction)
is bounded by the sum of two terms, namely the error rate on the training
data and a term involving the VC dimension. For a linearly separable
problem treated by an SVM, the first term is zero and the second is
375: minimized. Thus, good generalization is achieved even without
376: building into the model any explicit knowledge about the problem to be
377: solved, beyond the raw training data. This desirable feature is maintained
378: approximately in application of SVMs to nonseparable classification
379: problems and to the generically more difficult problems of regression.
380:
381: The support vector machine may be broadly viewed as a kind of
382: feedforward neural network, in that the inner-product kernels
383: $K({\bf x},{\bf x}_i)$ provide a layer of hidden units that effect
384: nonlinear processing of the inputs and provide weighted linear outputs,
385: which are summed by an output unit. As seen above, the familiar structures of
386: radial-basis-function networks and perceptrons with one hidden layer can
be realized as special cases by suitable choices of kernel.
But a support vector machine does more: it also embodies an algorithm
389: that automatically determines the number of hidden units appropriate to
390: the problem at hand, whatever the choice of kernel. This more general
391: scope of the SVM approach stands in contrast to the backpropagation
392: learning algorithm [1-3], which is designed especially for training
393: multilayer perceptrons.
394:
395: In addition to the benefits already mentioned, the support
396: vector machine offers other significant advantages over the
397: more traditional approaches to supervised learning based
398: on neural networks, which involve dependence on trial
399: and error, rules of thumb, and heuristics. The support
400: vector machine offers a generic way to control model complexity.
401: The curse of dimensionality is overcome by the pivotal strategy
402: of introducing an inner-product kernel conforming to Mercer's
403: theorem and solving the constrained optimization problem in its dual
404: version, thereby determining the dimension of the feature space
405: as the number of support vectors distilled from the training set.
406: The procedure naturally incorporates regularization. The
407: use of the $\epsilon$-insensitive cost function (2) in the
408: regression application lends robustness to the machine by avoiding
certain drawbacks of the least-squares estimator employed in
410: the backpropagation learning algorithm (e.g., sensitivity to outliers
411: and to distributions with additive noise having a long tail).
Importantly, because the dual problem is a convex quadratic program,
the SVM is guaranteed to find a global minimum of
the error surface. For a more detailed and systematic development
414: of the properties of SVMs, the reader is directed to Haykin's excellent
415: text [3], as well as the authoritative monographs of Vapnik [7,8].
416:
417: Our investigations of the potential of support vector machines for
418: the design of global statistical models of nuclear properties make use
419: of the RBF kernel (8), as well as a simplified version of what is
420: called ANOVA decomposition [12]. ANalysis Of VAriance (ANOVA) is a
421: scheme for imposing a structure on multi-dimensional kernels that are
422: generated from one-dimensional kernels, in a way that gives better control
423: over the capacity of the machine (as measured by the VC dimension).
424: An ANOVA kernel we have found to be well suited to the regression
425: problem posed by the nuclear (atomic) mass data is rooted in
426: the RBF kernel and has the form
427: $$
428: K({\bf x},{\bf x}_i) = \left(\sum_{l=1}^n \exp\left[-\gamma\left(
429: x^{(l)} - x_i^{(l)} \right)^2 \right] \right)^d \,,
430: $$
431: where the user-selected parameter $\gamma$ can take any positive
432: value and the power $d$ is usually an integer. We
433: shall call this the ANOVA kernel.
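A minimal Python sketch of the ANOVA kernel above follows, together with the Gram matrix $K({\bf x}_i,{\bf x}_l)$ that enters Eq.~(4). The function names are hypothetical; $\gamma = 2.5$ and $d = 8$ are the values used for the mass models of Sec.~3.

```python
import math


def anova_rbf_kernel(x, xi, gamma, d):
    """(sum_l exp[-gamma (x^(l) - x_i^(l))^2])^d: one 1-D RBF per input component."""
    s = sum(math.exp(-gamma * (a - b) ** 2) for a, b in zip(x, xi))
    return s ** d


def gram_matrix(xs, kernel):
    """Matrix of kernel values K(x_i, x_l) over a list of input vectors."""
    return [[kernel(a, b) for b in xs] for a in xs]


if __name__ == "__main__":
    # Three hypothetical scaled (Z, N) inputs in [0, 1].
    xs = [[0.1, 0.2], [0.5, 0.9], [0.8, 0.4]]
    G = gram_matrix(xs, lambda a, b: anova_rbf_kernel(a, b, gamma=2.5, d=8))
    for row in G:
        print(["%.3g" % v for v in row])
```

Each input component contributes its own one-dimensional RBF factor, so for two inputs the kernel is bounded by $2^d$, attained when ${\bf x} = {\bf x}_i$.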
434: \vskip 28 truept
435: \centerline{\bf 3. SVM MODELS OF NUCLEAR MASS SYSTEMATICS}
436: \vskip 12 truept
437:
438: SVM regression models have been trained to predict $(\Delta M)c^2$
439: in MeV, where $\Delta M$ is the mass excess (or mass defect)
440: defined by the difference $M - A$ between the atomic mass $M$,
441: measured in amu, and the mass number $A$ of the nuclide in question.
442: In our initial study, we focus on a database given by the union
443: ${\rm O}\, \oplus {\rm N}\, \oplus {\rm NB}$ of three data sets. The first
444: consists of the set of 1323 ``old'' (O) experimental mass assignments
445: which the 1981 semi-empirical droplet-model mass formula of M\"oller
446: and Nix [13] was intended to reproduce. The second is a set of
447: 351 ``new'' (N) experimental mass assignments for nuclei that lie
448: mostly beyond the edges of the 1981 data (as viewed in the $N-Z$ plane).
449: In addition to the O and N sets, a set of 158 nuclides with more
450: recently measured masses (the NB set of ``even newer'' nuclides)
451: is employed in the modeling process. In earlier work [14-16,5], these
452: three data sets have been used to quantify the extrapolation capability
453: (the so-called extrapability) of different global mass models (based
454: either on nuclear theory or neural networks).
455:
456: The set ${\rm O}\, \oplus {\rm N} \, \oplus {\rm NB}$ is divided by a
457: random-sampling procedure into three nonoverlapping subsets, namely
458: a training set (80\%), a validation set (10\%), and a test set (10\%),
459: in the indicated approximate proportions. (In all work reported
460: in this paper, random samplings are drawn from a uniform
461: distribution.) Training, validation, and
462: test sets are each further subdivided into four subsets labeled EE,
463: EO, OE, and OO, composed respectively of nuclides belonging to the
464: four ``even-oddness'' classes: even-$Z$-even-$N$, even-$Z$-odd-$N$,
465: odd-$Z$-even-$N$, and odd-$Z$-odd-$N$. For convenience, values
466: of the input variables are encoded by a linear transformation that
467: scales and shifts given values of $Z$ and $N$ to lie in the interval
468: $[0,1]$. A similar linear transformation decodes the learning machine's
469: raw output, which lies in the interval $[-1,1]$, so as to provide an
470: estimate of the corresponding mass excess in MeV.
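The random 80/10/10 division and the linear encodings described above can be sketched as follows (Python, illustrative only). The helper names, the fixed seed, and the choice of scaling ranges are assumptions, since the text does not specify them.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle uniformly and cut into training (~80%), validation (~10%), test (~10%)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(0.8 * len(shuffled))
    n_val = round(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


class LinearScaler:
    """Affine map from the interval [lo, hi] onto [a, b], with its inverse."""

    def __init__(self, lo, hi, a, b):
        self.lo, self.hi, self.a, self.b = lo, hi, a, b

    def encode(self, v):
        return self.a + (self.b - self.a) * (v - self.lo) / (self.hi - self.lo)

    def decode(self, u):
        return self.lo + (self.hi - self.lo) * (u - self.a) / (self.b - self.a)
```

For example, `LinearScaler(z_min, z_max, 0.0, 1.0)` would encode $Z$, and a scaler onto $[-1,1]$ built from the range of training mass excesses would decode the machine's raw output back to MeV.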
471:
472: Effectively, we divide the mass problem into four separate problems,
473: one for each of the four ``even-oddness'' classes in $Z$ and $N$.
474: In doing so, we are actually incorporating some domain knowledge into
475: the learning strategy. Distinctive quantum-mechanical features of nuclei,
476: abundantly supported by empirical evidence, include quantized angular
477: momenta, magic numbers, shell structure, and pairing energies,
478: all of which stem from the fact that $Z$ and $N$
479: are integers, even or odd.
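Assigning a nuclide to its even-oddness class, and grouping a data set accordingly, takes only a few lines; the Python sketch below (with hypothetical names) assumes records of the form (Z, N, value).

```python
def parity_class(Z, N):
    """Return 'EE', 'EO', 'OE', or 'OO': parity of Z first, then parity of N."""
    return ("E" if Z % 2 == 0 else "O") + ("E" if N % 2 == 0 else "O")


def partition_by_class(nuclides):
    """Group (Z, N, value) records into the four even-oddness classes."""
    groups = {"EE": [], "EO": [], "OE": [], "OO": []}
    for Z, N, value in nuclides:
        groups[parity_class(Z, N)].append((Z, N, value))
    return groups
```

A separate SVM is then trained on each of the four groups.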
480:
An SVM model is developed individually for each of the four nuclear classes
EE, EO, OE, and OO. SVM regression (with ANOVA-RBF specification of kernels)
is carried out separately for the respective training sets, thereby
constructing a predictive model whose reliability is judged by its
performance on the examples in the test set. Following established
practice, the performance of each of the four models on its corresponding
validation set has been used to guide the final determination of the adjustable
488: parameters. Ideally, the test set should have {\it no} role in choosing
489: these parameters (although in some cases a weak influence is allowed).
490:
491: As is usual in global models of the atomic-mass table, the quality
492: of a given model is judged by the smallness of the root-mean-square (rms)
493: error $\sigma$ in the mass excess $\Delta M$, averaged over the data
494: set in question (training, validation, or test set for a given
495: class of nuclides). To be competitive, a model should have
values of $\sigma$ below 1 MeV. It should be noted, however, that
only in a few cases has a rigorous test of predictive performance
been made for the traditional theoretical models of semi-empirical
character. (An important exception is found in the work of
M\"oller, Nix, and collaborators [15,16], who introduced the
notion of extrapability, which is equivalent to what we call
generalization.)
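For definiteness, the figure of merit $\sigma$ can be computed as below (Python sketch; the name is hypothetical) from lists of measured and model mass excesses, in MeV, for one data set.

```python
import math


def rms_error(observed, predicted):
    """sigma = sqrt(mean((observed - predicted)^2)) over one data set."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))
```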
503:
504: Some of the better results obtained in the present exploratory
505: study are displayed in Table 1. The performance of these models, all
506: with RBF parameter $\gamma = 2.5$ and ANOVA degree $d=8$, is evidently
507: of high quality.
508:
509: \topinsert
510: \centerline{\bf{Table 1}}
511: \vskip 12 truept
512: \noindent
513: Performance of SVM global models of atomic mass. For all four models,
the RBF parameter $\gamma$ is 2.5 and the ANOVA degree is $d=8$. The
other SVM parameters have been left at their default values, $C=0.1$ and $\varepsilon = 0.001$.
516: \vskip 27 truept
517:
518: \input tables.tex
519: \nrows= 6
520: \ncols= 7
521: \begintable
522: {} | & \quad Learning & Set \quad \quad \quad | & \quad Validation & Set
523: \quad \quad \quad | & \quad \quad \quad Test & Set \quad \quad \cr
524: Classes | & \# Nuclides & $\sigma$(MeV) | & \# Nuclides & $\sigma$(MeV) |
525: & \# Nuclides & $\sigma$(MeV) \cr
526: EE | & 381 & 0.58 | & 48 & 0.71 | & 48 & 0.99 \cr
527: EO | & 360 & 0.89 | & 45 & 0.68 | & 45 & 0.62 \cr
528: OE | & 371 & 0.70 | & 46 & 0.78 | & 46 & 0.88 \cr
529: OO | & 353 & 0.75 | & 44 & 0.74 | & 45 & 0.97
530: \endtable
531: \vskip 14 truept
532: \endinsert
533:
534: Similar learning experiments can be found among the studies
535: of Ref.~[5] based on multilayer perceptrons and modified
536: backpropagation training, although procedural differences
537: preclude direct comparisons of performance. The best model obtained
538: using O as the training set, NB as validation set, and N
539: as test set gave rms error figures on these sets of 0.71 MeV,
540: 2.28 MeV, and 2.16 MeV, respectively. Another strategy yielded
541: better results. The set ${\rm O}\, \oplus \, {\rm N}$ was first ``purified'' by
542: removing 20 nuclides with poorly measured masses. A random sample
543: M1 consisting of 1303 of the remaining 1654 examples (some 79\%) was
544: used as the training set. The complementary set, M2, played the
545: role of validation set, and the NB set was used for testing
546: the trained model. The best model found in this way produced
rms errors on the three sets of 0.44 MeV (M1), 0.44 MeV (M2),
548: and 0.95 MeV (NB). It should be noted that this level of
549: performance on the mass problem was achieved after more
550: than a decade of successive improvements in the choices of
551: architectures, coding schemes, and training algorithms.
552:
553: In addition to the four class-specific models SVM-EE, SVM-EO, SVM-OE,
554: and SVM-OO reported on in Table 1, we also constructed a single SVM
555: model (denoted SVM-S) using the full O data set as the training sample,
556: without making a distinction between EE, EO, OE, and OO nuclides.
557: In this case, the NB nuclei are used as a validation set, guiding
558: the determination of the RBF and ANOVA parameters. The parameters
559: associated with the SVM-S model are again $\gamma = 2.5$
560: and $d = 8$, along with $C= 0.1$ and $\varepsilon = 0.001$. This
561: model yields rms errors of 0.70 MeV on the training set O and
562: 0.75 MeV on the validation set NB, with a $\sigma$ value of
563: 1.41 MeV on the N nuclei, regarded as a test set. (These results
564: are erroneously cited in Ref.~[5].) A proper averaging over
565: the four nuclidic classes permits a comparison between the
566: SVM-S model and the four models represented in Table 1. The
567: composite performance of the latter models is then reflected
568: in $\sigma$ values of 0.73 MeV, 0.73 MeV, and 0.88 MeV
569: in training, validation, and testing, respectively.
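The text does not state how the composite $\sigma$ values are formed. One natural assumption is to pool the squared errors, weighting each class by its number of nuclides, $\sigma^2 = \sum_k n_k\sigma_k^2/\sum_k n_k$; the Python sketch below applies this rule to the entries of Table 1.

```python
import math

# (number of nuclides, sigma in MeV) for EE, EO, OE, OO, copied from Table 1.
TABLE1 = {
    "training":   [(381, 0.58), (360, 0.89), (371, 0.70), (353, 0.75)],
    "validation": [(48, 0.71), (45, 0.68), (46, 0.78), (44, 0.74)],
    "test":       [(48, 0.99), (45, 0.62), (46, 0.88), (45, 0.97)],
}


def pooled_rms(entries):
    """Assumed pooling rule: sqrt(sum_k n_k sigma_k^2 / sum_k n_k)."""
    total = sum(n for n, _ in entries)
    return math.sqrt(sum(n * s ** 2 for n, s in entries) / total)


for name, entries in TABLE1.items():
    print(f"{name}: {pooled_rms(entries):.2f} MeV")
```

Applied to the rounded entries of Table 1, this rule gives about 0.74, 0.73, and 0.88 MeV for the training, validation, and test sets, within 0.01 MeV of the composite values quoted above; the small residual difference may stem from rounding of the tabulated $\sigma$ values or from a different averaging rule.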
570:
571: In some cases, meaningful comparisons may be drawn between the
572: performance of statistical mass models based on multilayer perceptrons
573: and support vector machines, and the traditional mass models based on
574: nuclear theory and phenomenology. Starting with the simple liquid-drop
575: model, such traditional theory-thick models have evolved over
576: seven decades to achieve a high degree of sophistication and
577: precision. For example, the 1992 FRDM model of M\"oller and Nix [15]
578: gives $\sigma$ values of 0.67 MeV on the O set (when fitted to
579: this set) and 0.74 MeV on the N set (a true measure
of predictive performance of the model). The more refined
581: FRDM model of Ref.~[16], which is fitted to the data set
582: ${\rm M1} \, \oplus \, {\rm M2}$, yields rms errors of 0.68 MeV (M1),
583: 0.71 MeV (M2), and 0.70 MeV (NB). The HFB2 model of Pearson
584: and collaborators [17] gives respective errors of 0.67 MeV,
585: 0.68 MeV, and 0.73 MeV. (We note that the result of Ref.~[17]
586: on the ``test set'' NB cannot be regarded as a prediction, since
587: the nuclei involved were used in adjusting model parameters.)
588:
589: With additional refinements, it is not unreasonable to expect
590: that SVM models can equal (and possibly surpass) the levels of robustness
591: and predictive accuracy achieved with theory-thick models and with
592: multilayer perceptron models. However, a conclusive statement
593: must await a thorough SVM study based on the recent AME03 mass
evaluation carried out by Audi {\it et al.}~[18].
595: \vskip 28 truept
596: \centerline{\bf 4. SVM MODELS OF BETA-DECAY HALFLIVES}
597: \vskip 12 truept
598:
599: \vbox{
600: We now turn to a second problem of regression in the statistical analysis
601: of nuclear properties via support vector machines, namely fitting and
602: prediction of the beta-decay halflives of nuclides $(Z,N)$ that decay 100\% via
603: the $\beta^-$ mode. The data for this problem have been culled
604: from the on-line repository at the Brookhaven National Nuclear Data
605: Center (http:$//$www.nndc.bnl.gov). The data employed are current to May
606: 2005 and consist of a total of 932 examples. Restricting
607: attention to examples with halflives below $10^6$ s leaves
608: 633 nuclides. When measured in seconds, the experimental values
609: of $T_{1/2}$ range over 26 orders of magnitude, so it is
610: more appropriate to regress $L = \log T_{1/2}$ instead of the
611: halflife itself, and to adopt the rms error $\sigma_L$ of the estimate
of $L$ as a figure of merit in the learning, validation, and prediction
phases of the analysis.
614:
615: As in the case of the mass problem, separate SVM models are
616: constructed for EE, EO, OE, and OO classes of nuclides. However,
we adopt the simpler RBF kernel instead of pursuing
the more elaborate ANOVA option. (Implementation based on the
619: ANOVA decomposition is much more demanding in terms of
620: computer time.) Each of the four data subsets (EE, EO, OE, OO) is
621: subdivided into training, validation, and test sets in the
622: approximate proportions 80\%, 10\%, and 10\%, respectively.
623:
624: The results obtained from the SVM regressions are summarized in Tables 2
625: and 3. Table 2 gives the parameters and performance measures of the
626: models constructed for the full set of data, regardless of measured
halflife. Table 3 displays the corresponding results when nuclides with
628: $T_{1/2} \geq 10^6$ s are removed from the database.
629:
630: A similar study [19] (see also Ref.~[20]) has been carried out
631: with multilayer feedforward neural networks trained by ``vanilla''
backpropagation, for data available in 1995 (766 examples in total).
633: However, this study did not employ the now-standard protocol in
634: which a validation set is used in making the final model selection.
635: Also, no subdivision into the four even-oddness classes was made.
636: Instead, the full data set (or the restricted set of examples with
637: $T_{1/2} < 10^6$~s) was split into a training set of approximately
638: 75\% of the examples and a test set consisting of the remainder.
639: }
640:
641: \topinsert
642: \centerline{\bf{Table 2}}
643: \vskip 12 truept
644: \noindent
645: Performance of SVM global models of $\beta$-decay halflives $T_{1/2}$
(including examples having $T_{1/2} \geq 10^6$~s). For all four models,
647: $C=1$ and $\varepsilon =0.001$.
648: \vskip 30 truept
649: \input tables.tex
650: \nrows=6
651: \ncols=8
652: \begintable
653: \| \quad Learning & Set \qquad | \quad Validation & Set \qquad
654: |\qquad Test & Set \qquad | RBF kernel \crthick
655: Classes \|\# Nuclides & $\sigma_L$ |\# Nuclides & $\sigma_L$ |\# Nuclides& $\sigma_L$|$\gamma$\crthick
656:
657: EE \| ~137 & 2.88~ | ~16 & 3.61~ |~15 & 1.72~| 5.44 \crnorule
658: EO \| ~198 & 2.75~ | ~24 & 2.27~ |~22 & 2.17~| 7.27 \crnorule
659: OE \| ~187 & 2.37~ | ~22 & 2.76~ |~20 & 2.38~| 9.99 \crnorule
660: OO \| ~236 & 2.62~ | ~29 & 2.07~ |~26 & 2.96~| 9.55
661: \endtable
662: %\vskip 1.5truecm
663: \vskip 1.2truecm
664: \centerline{\bf{Table 3}}
665: \vskip 12 truept
666: \noindent
667: Performance of SVM global models of $\beta$-decay halflives (with a cutoff
668: at $10^6$ s). For all four models, $C=1$, $\varepsilon =0.001$.
669: \vskip 30 truept
670: \input tables.tex
671: \nrows=6
672: \ncols=8
673: \begintable
674: \| \quad Learning & Set \qquad | \quad Validation & Set \qquad
675: | \qquad Test & Set \qquad | RBF kernel \crthick
676: Classes \|\# Nuclides & $\sigma_L$ |\# Nuclides & $\sigma_L$ |\# Nuclides& $\sigma_L$|$\gamma$\crthick
677:
678: EE \| ~96 & 1.34~ | ~11 & 0.52~ |~10 & 1.20~| 1.78 \crnorule
679: EO \| ~140 & 0.90~ | ~17 & 0.69~ |~15 & 1.22~| 9.97 \crnorule
680: OE \| ~122 & 1.55~ | ~14 & 0.63~ |~13 & 1.18~| 0.84 \crnorule
681: OO \| ~159 & 1.00~ | ~19 & 1.28~ |~17 & 1.34~| 8.87
682: \endtable
683: \vskip 14truept
684: \endinsert
685: \vskip 1.3truecm
686: %\vskip 1truecm
687:
688: Comparison of the rms errors shown in Tables 2 and 3 with the
689: corresponding performance figures from the earlier work [19,20] shows
an improvement (reduction) in rms error values by about a factor
of 2, in both learning and prediction, for both the full and restricted
data sets. Comparison may also be made with results from
traditional nuclear theory (e.g.,~Refs.~[21--23]). Since the
694: cited neural-network models could already attain performance in fitting
695: and prediction comparable to that exhibited by these theory-thick models,
696: we can say with some confidence that the SVM models are capable of a
697: predictive acuity superior to the best of the traditional global
698: models currently in play.
699: \vfill\eject
700:
701: We should also call attention to the greatly improved quality of
702: neural-network models of $\beta$-decay systematics, achieved in
703: very recent studies [24]. Data based on the AME03 evaluation
704: are divided into training, validation, and test sets in the
705: respective proportions 60\%, 20\%, and 20\%, both with and
706: without the restriction to halflives not greater than $10^6$~s,
707: but without subdivision into even-oddness classes. In the
708: case where the restriction is imposed, the best results
709: found for the error measure $\sigma_L$ are 0.55 (training),
710: 0.61 (validation), and 0.64 (prediction). The corresponding
averages for the models represented in Table 3 are 1.19, 0.89,
712: and 1.24, respectively, so further refinement of the SVM models
will be needed to match the performance of the best multilayer
714: perceptrons.
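
The pooled values quoted for the SVM models can be reconstructed from the
entries of Table 3 if, as we assume here, each is the rms error over all
nuclides of the corresponding set, so that the class values $\sigma_c$ are
combined with weights given by the class sizes $n_c$. For the validation
column this gives
$$\eqalign{\bar\sigma_L &= \Bigl[{\sum_c n_c\sigma_c^2\over\sum_c n_c}\Bigr]^{1/2}
= \Bigl[{11(0.52)^2+17(0.69)^2+14(0.63)^2+19(1.28)^2\over 61}\Bigr]^{1/2}\cr
&\approx \bigl(47.75/61\bigr)^{1/2} \approx 0.88,\cr}$$
which agrees with the quoted 0.89 to within the rounding of the tabulated
entries, whereas an unweighted mean of the four class values would give 0.78.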
715: \vskip 28 truept
716:
717: \centerline{\bf 5. SVM MODELS OF GROUND-STATE SPINS AND PARITIES}
718: \vskip 12 truept
719:
720: In a third illustration of what is possible, the SVM approach is applied
721: to construct global statistical models of the ground-state spins and parities
722: of nuclei. (In this context, ``spin'' refers to the total angular momentum
723: quantum number $J$ of the nuclear state.) As in the exercises described
724: in Secs.~3 and 4, we again divide the nuclei under consideration into EE,
725: EO, OE, and OO classes. In the spin problem, this subdivision is of
726: obvious importance, since the law of angular momentum addition in
727: quantum mechanics dictates that the states of EE and OO nuclei can
728: only have integral values of $J$, whereas the spins of EO and OE
nuclei must be half-odd-integral. In fact, all EE nuclei are known to
have ground-state spin and parity $J^\pi = 0^+$. Clearly, we may exclude this class from
731: consideration, since its modeling is a trivial task for any viable
732: learning machine.
733:
734: The parity property of nuclear states presents the simplest kind
735: of classification problem, with two mutually exclusive outcomes, even
or odd. Moreover, because the spin quantum number $J$ is restricted
by quantum theory to discrete values (and, for ground states, in practice
to a limited range), global modeling
of spin systematics is also most efficiently treated, within the
SVM framework, as a problem of classification rather than function
approximation or regression. In our study, we consider
$J$ values ranging from 0 to 23/2 in steps of 1/2, the
integral values being assigned to OO nuclei and the half-odd-integral
values to EO and OE nuclei. This specification of the problem
744: may be construed as introducing some basic domain knowledge into the
745: model-building process.
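
For definiteness, the label sets just described can be written out. The
enumeration below simply follows from the stated range of 0 to 23/2 in
steps of 1/2; whether every label is actually populated in the data is
not stated here:
$$J \in \{0, 1, 2, \ldots, 11\}\quad({\rm OO}), \qquad
J \in \{{1\over 2}, {3\over 2}, \ldots, {23\over 2}\}\quad({\rm EO,\ OE}).$$
Each spin classifier therefore chooses among 12 classes, whereas each
parity classifier chooses between the two labels $+$ and $-$ of the
parity $\pi$.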
746:
Data on the spins and parities of nuclear ground states have been taken from
748: the on-line Brookhaven database. Based on simple RBF kernels, separate
749: SVM classifier models of these two properties have been developed for each
750: of the three nontrivial even-oddness cases.
751:
752: Let us first discuss our findings for the parity problem. In treating
753: this problem, the data for each of the cases EO, OE, and OO are divided
754: at random into training, validation, and test sets in the approximate
755: proportions 80\%, 10\%, and 10\%, respectively. Performance is measured in
756: terms of the percentages of correct classifications within these
757: subsets. The primary results are summarized in Table 4. It is apparent
758: that modeling parity is an easy task for SVMs. Judging from available
759: results [25,14], it is also relatively easy for neural networks
760: (although SVM performance is somewhat superior).
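
The scores can be translated back into numbers of nuclei, since a score is
simply the fraction $n_{\rm correct}/n$ of the $n$ nuclides in a set whose
parity is assigned correctly, expressed as a percentage. The counts are not
tabulated, but for the EO row of Table 4 the rounded percentages are
consistent with only one integer each,
$${54\over 58}\approx 0.93\quad({\rm validation}), \qquad
{43\over 52}\approx 0.83\quad({\rm test}),$$
corresponding to 4 and 9 misclassified parities, respectively.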
761:
762: For the models of Table 4, performance on the training sets is
763: perfect. If we are willing to make a small sacrifice in the quality
of reproduction of the input data, slightly better performance on the
test sets can be achieved, albeit with somewhat lower validation scores,
as seen in Table 5. It is interesting that this second set of models
corresponds to quite different error minima under variation of the
parameter $\gamma$ (compare the values of $\gamma$ in Tables 4 and 5).
In general, there may be many such minima of similar depth.
769:
770: We have not yet conducted a full training-validation-test process
771: for the spin problem. Accordingly, we present only preliminary
772: results, which nevertheless are illuminating. In the first experiment
to be reported (see Table 6), each of the three spin data sets EO, OE, and OO is
774: divided randomly into {\it two} subsets, a training set and a
775: complementary second set. The training set contains approximately
776: 90\% of the examples of the given even-oddness class, and the second set, the
777: remaining $\sim 10$\%.
778:
779: \topinsert
780: \centerline{\bf{Table 4}}
781: \vskip 12 truept
782: \noindent
783: Performance of SVM global models of ground-state parity.
For all three models, $C=0.1$, $\varepsilon =0.01$. Model selection
785: is guided by best performance on the validation set, consistent with
786: a perfect score on the training set.
787: \vskip 27 truept
788: \input tables.tex
789: \nrows=5
790: \ncols=6
791: \begintable
792: \| \quad Learning & Set \qquad | \quad Validation & Set \qquad
793: |\qquad Test & Set \qquad | RBF kernel \crthick
794: Classes \|\# Nuclides & Score |\# Nuclides & Score |\# Nuclides&
795: Score|$\gamma$\crthick
796: EO \| ~474 & 100\%~ | ~58 & 93\%~ |~52 & 83\%~| 9.232 \crnorule
797: OE \| ~466 & 100\%~ | ~57 & 89\%~ |~51 & 90\%~| 9.482 \crnorule
798: OO \| ~434 & 100\%~ | ~53 & 87\%~ |~48 & 84\%~| 9.176
799: \endtable
800: \vskip 1truecm
801: \centerline{\bf{Table 5}}
802: \vskip 12 truept
803: \noindent
804: Performance of SVM global models of ground-state parity.
For all three models, $C=0.1$, $\varepsilon =0.01$. In this case,
model selection is guided by best performance on the validation
set, allowing a small nonzero error rate on the training set.
808: \vskip 27 truept
809: \input tables.tex
810: \nrows=5
811: \ncols=8
812: \begintable
813: \| \quad Learning & Set \qquad | \quad Validation & Set \qquad |
814: \qquad Test & Set \qquad | RBF kernel \crthick
815: Classes \|\# Nuclides & Score |\# Nuclides & Score |\# Nuclides& Score|$\gamma$\crthick
816: EO \| ~474 & 100\%~ | ~58 & 91\%~ |~52 & 83\%~| 0.678 \crnorule
817: OE \| ~466 & 95\%~ | ~57 & 84\%~ |~51 & 92\%~| 0.180 \crnorule
818: OO \| ~434 & 96\%~ | ~53 & 83\%~ |~48 & 86\%~| 0.240
819: \endtable
820: \vskip 14truept
821: \endinsert
822: \vskip 1truecm
823:
824: \topinsert
825: \centerline{\bf{Table 6}}
826: \vskip 12 truept
827: \noindent
828: Performance of SVM global models of nuclear ground-state spin.
829: For all three models, $C=0.1$, $\varepsilon =0.01$. Model selection
is guided by best performance on the validation set, consistent with a
831: perfect score on the training set.
832: \vskip 27 truept
833: \input tables.tex
834: \nrows=5
835: \ncols=6
836: \begintable
837: \| \quad \quad Learning & Set \qquad | \quad Validation/Test & Set \qquad \
838: | RBF kernel \crthick
839: Classes \|\# Nuclides & Score |\# Nuclides & Score | $\gamma$ \crthick
840: EO \| ~528 & 100\%~ | ~58 & 81\%~ | 9.217 \crnorule
841: OE \| ~522 & 100\%~ | ~57 & 68\%~ | 9.001 \crnorule
842: OO \| ~488 & 100\%~ | ~54 & 43\%~ | 4.002
843:
844: \endtable
845: \vskip 14truept
846: \endinsert
847:
848: \noindent
849: The second set is used to help pin down the RBF parameter
850: $\gamma$ and thereby plays a role in model selection. Hence it must be
851: interpreted as a validation set. SVM models are constructed for a range
852: of $\gamma$ values, and the model whose $\gamma$ value produces the
853: lowest error on the second data set (while scoring 100\% on the
854: training set) is selected. There is no real test set in this experiment.
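
The selection rule just described can be summarized in symbols. Here
$E_{\rm train}(\gamma)$ and $E_{\rm val}(\gamma)$ are shorthand introduced
only for this illustration, denoting the fractions of misclassified nuclides
in the training set and in the second set for a model built with RBF
parameter $\gamma$ (with $C$ and $\varepsilon$ held fixed), and $\Gamma$ is
the grid of $\gamma$ values scanned, whose extent is not specified here:
$$\gamma^* = \mathop{\rm arg\,min}_{\gamma\in\Gamma,\; E_{\rm train}(\gamma)=0}
E_{\rm val}(\gamma).$$
Because $E_{\rm val}$ enters this choice directly, the score on the second
set tends to be an optimistic estimate of performance on unseen nuclei,
which is why that set cannot double as a test set.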
855: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
856: \topinsert
857: \centerline{\bf{Table 7}}
858: \vskip 12 truept
859: \noindent
860: Performance of SVM global models of nuclear ground-state spin.
861: For all three models, $C=0.1$, $\varepsilon =0.01$. The
862: parameter $\gamma$ is fixed at the value determined for Table 6.
863: The test set influences model choice only indirectly.
864: \vskip 27 truept
865: \input tables.tex
866: \nrows=5
867: \ncols=6
868: \begintable
869: \| Learning & Set | Validation & Set |Test & Set | RBF kernel \crthick
870: Classes \|\# Nuclides & Score |\# Nuclides & Score |\# Nuclides& Score~|$\gamma$\crthick
871: EO \| ~476 & 100\%~ | ~58 & 79\%~ |~52 & 60\%~~| 9.217 \crnorule
872: OE \| ~470 & 100\%~ | ~57 & 61\%~ |~52 & 79\%~~| 9.001 \crnorule
873: OO \| ~440 & 100\%~ | ~54 & 39\%~ |~48 & 38\%~~| 4.002
874: \endtable
875: \vskip 14truept
876: \endinsert
877:
In an alternative experiment, we have implemented a protocol intermediate
between the training-validation scheme leading to Table 6 and the full
training-validation-test procedure. The data for each of the three
even-oddness classes involved are divided into three subsets as follows.
The second subset is taken to be identical to the second subset formed
in the first experiment. The first subset, used as the training set,
consists of 80\% of the examples for the class in question, these being
chosen at random from the corresponding training set created in
the first experiment. The remaining examples of that training set (about
10\% of the class) constitute the third subset, which is regarded as
a test set. Then, using the {\it same} parameter $\gamma$ as
determined in the first experiment with the aid of the second
subset, new SVM models are developed from the examples in the reduced
training set. These models are used to generate spin values for both
the second and third subsets; these values may differ from those given by the
models developed in the first experiment (see Table 7). Strictly speaking,
the third subset is not a test set in the purest sense, since its
examples belonged to the training set of the models from which $\gamma$
was selected; its influence on model selection is, however, only indirect.
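
The bookkeeping of this protocol can be checked against Tables 6 and 7:
for each class, the reduced training set and the third subset together
make up the training set of the first experiment,
$$528 = 476 + 52\ ({\rm EO}), \qquad 522 = 470 + 52\ ({\rm OE}), \qquad
488 = 440 + 48\ ({\rm OO}),$$
while the second subset (58, 57, and 54 nuclides, respectively) is the same
in both tables. For EO nuclei, for example, the reduced training set thus
holds $476/586 \approx 0.81$ of the class, close to the nominal 80\%.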
896:
897: From the results shown in Tables 6 and 7, one may plausibly infer
898: that support vector machines can perform very well on the
899: problem of predicting nuclear ground-state spins. While further
900: experiments are needed to affirm this conclusion, it is already
901: of interest to compare our SVM models with other global models
902: of nuclear spin systematics. Global nuclear structure calculations
903: within the macroscopic/microscopic approach [26] reproduce
904: the ground-state spins of odd-$A$ nuclei with an accuracy of
905: 60\% (agreement being found in 428 examples out of 713).
906: (In this work, there is no clear distinction between fitting
907: and prediction, or between training, validation, and test
908: sets.) Multilayer feedforward neural networks do somewhat
909: better [25,14]. Averaging over results of three experiments involving
910: nets having a single hidden layer and trained with backpropagation,
911: the performance for odd-$A$ nuclei reaches 62\% on what are
912: effectively validation sets, the training sets being
913: reproduced to an accuracy of 93\%. In an experiment in which
914: the connection weights of feedforward nets with one hidden layer
915: are determined by a conjugate gradient procedure, performance
916: at the level of 99.5\% on the training set and 73.2\% on
917: a validation set has been achieved for OE nuclei. The
918: spins of odd-odd nuclei are notoriously difficult to predict.
919: This is reflected in the performance figures of neural-network
920: (perceptron) models on the OO category, which are typically
921: 75\% correct on training-set examples and only 15\% in validation or
922: testing.
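
To set the odd-$A$ entries of Table 6 beside the pooled figures just
quoted, the EO and OE rows can be combined. The counts of correct
assignments are inferred here from the rounded percentages (47 of 58 and
39 of 57 are the only integers consistent with 81\% and 68\%), giving
$${47+39\over 58+57} = {86\over 115}\approx 0.75
\qquad {\rm versus} \qquad {428\over 713}\approx 0.60$$
for the macroscopic/microscopic calculations of Ref.~[26]. Since the
second subsets served as validation sets, the neural-network figure of
62\% on effectively validation sets is the more closely matched comparison.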
923:
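Placed in the context of earlier work, both statistical
and phenomenological, the results in Tables 6 and 7
for the first SVM models of nuclear spin compare favorably, most
strikingly in the difficult OO category, where the SVM scores of
38--43\% on the second and third subsets are well above the roughly
15\% typical of perceptron models in validation or testing.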
927: \vskip 28truept
928:
929: \centerline{\bf 6. CONCLUDING REMARKS}
930: \vskip 12truept
931:
932: We have made initial studies of the potential of support vector machines (SVM)
933: for providing statistical models of nuclear systematics with demonstrable
934: predictive power. Using SVM regression and classification procedures,
935: we have created global models of atomic masses, beta-decay halflives,
936: and ground-state spins and parities. These models exhibit performance
937: in both data-fitting and prediction that is comparable to that of
938: the best global models from nuclear phenomenology and microscopic theory,
939: as well as the best statistical models based on multilayer feedforward
940: neural networks. Further work to develop the scope, acuity, and reliability
941: of SVM applications to nuclear physics seems to be warranted. In particular,
942: the full body of data in the AME03 atomic-mass evaluation [18] must be brought to
bear in the construction of SVM models of mass systematics, and the treatment
944: of the spin problem begun here needs to be completed. Fruitful applications
945: to nucleon separation energies, $\alpha$-decay halflives,
946: branching ratios of nuclear decay, nuclear deformations, neutron
947: cross sections, and other nuclear properties may also be on the horizon.
948: \vskip 28truept
949:
950: \centerline{\bf ACKNOWLEDGMENTS}
951: \vskip 12 truept
952: This research was supported in part by the U.~S.~National
953: Science Foundation under Grant No.~PHY-0140316. For the regression
954: problems, we made use of the on-line mySVM software and instruction manual
955: of Stefan R\"uping (Dortmund) [27], and for classification problems
we used the SVM-multiclass software of Thorsten Joachims
957: (Cornell) [28].
958: \vskip 28truept
959:
960: \centerline{\bf REFERENCES}
961: \vskip 12 truept
962: \item{[1]}
963: D.~E.~Rumelhart, G.~E.~Hinton, and R.~J.~Williams, in {\it
964: Parallel Distributed Processing: Explorations in the Microstructure
965: of Cognition}, Vol.~1, edited by D.~E.~Rumelhart {\it et al.} (MIT Press,
966: Cambridge, MA, 1986).
967: \item{[2]}
968: J.~Hertz, A.~Krogh, and R.~G.~Palmer, {\it Introduction to the Theory
969: of Neural Computation} (Addison-Wesley, Redwood City, CA, 1991).
970: \item{[3]}
971: S.~Haykin, {\it Neural Networks: A Comprehensive Foundation}, Second
972: Edition (McMillan, New York, 1999).
973: \item{[4]}
974: J.~W.~Clark, T.~Lindenau, and M.~L.~Ristig, {\it Scientific Applications
975: of Neural Nets} (Springer-Verlag, Berlin, 1999).
976: \item{[5]}
977: S.~Athanassopoulos, E.~Mavrommatis, K.~A.~Gernoth, and J.~W.~Clark,
978: {\it Nucl.~Phys.~A} {\bf 743}, 222 (2004).
979: \item{[6]}
980: C.~Cortes and V.~Vapnik, {\it Machine Learning} {\bf 20}, 273 (1995).
981: \item{[7]}
982: V.~N.~Vapnik, {\it The Nature of Statistical Learning Theory} (Springer-Verlag,
983: New York, 1995).
984: \item{[8]}
985: V.~N.~Vapnik, {\it Statistical Learning Theory} (Wiley, New York, 1998).
986: \item{[9]}
987: V.~N.~Vapnik, in {\it Advances in Neural Information Processing Systems},
988: Vol.~4 (Morgan Kaufmann, San Mateo, CA, 1992), p.~831.
989: \item{[10]}
J.~Mercer, {\it Phil.~Trans.~R.~Soc.~Lond.~A} {\bf 209},
415 (1909).
992: \item{[11]}
V.~N.~Vapnik and A.~Ya.~Chervonenkis, {\it Theory Probab.~Appl.}
{\bf 16}, 264 (1971).
995: \item{[12]}
996: M.~O.~Stitson, A.~Gammerman, V.~Vapnik, V.~Vovk, C.~Watkins, and J.~Weston,
997: in {\it Advances in Kernel Methods -- Support Vector Learning},
edited by B.~Sch\"olkopf, C.~Burges, and A.~J.~Smola
1000: (MIT Press, Cambridge, MA, 1999), p.~285.
1001: \item{[13]}
1002: P.~M\"oller and J.~R.~Nix, {\it At.~Data~Nucl.~Data Tables} {\bf 26},
1003: 165 (1981).
1004: \item{[14]}
1005: K.~A.~Gernoth, J.~W.~Clark, J.~S.~Prater, and H.~Bohr, {\it Phys.~Lett.} {\bf B300},
1006: 1 (1993).
1007: \item{[15]}
1008: P.~M\"oller and J.~R.~Nix, {\it J.~Phys.~G} {\bf 20}, 1681 (1994).
1009: \item{[16]}
1010: P.~M\"oller, J.~R.~Nix, W.~D.~Myers, and W.~J.~Swiatecki,
1011: {\it At.~Data Nucl.~Data Tables} {\bf 59}, 185 (1995).
1012: \item{[17]}
1013: M.~Samyn, S.~Goriely, P.-H.~Heenen, J.~M.~Pearson, and F.~Tondeur,
{\it Nucl.~Phys.~A} {\bf 700}, 142 (2002);
1015: S.~Goriely, M.~Samyn, P.-H.~Heenen, J.~M.~Pearson, and F.~Tondeur,
1016: {\it Phys.~Rev.~C} {\bf 66}, 024326 (2002).
1017: \item{[18]}
G.~Audi, A.~H.~Wapstra, and C.~Thibault, {\it Nucl.~Phys.~A} {\bf 729}, 337
(2003).
1020: \item{[19]}
1021: E.~Mavrommatis, A.~Dakos, K.~A.~Gernoth, and J.~W.~Clark, in {\it Condensed
1022: Matter Theories}, Vol. 13, edited by J.~da Providencia and F.~B.~Malik
1023: (Nova Science Publishers, Commack, NY, 199), p.~423.
1024: \item{[20]}
J.~W.~Clark, E.~Mavrommatis, S.~Athanassopoulos, A.~Dakos, and
K.~A.~Gernoth, in {\it Fission Dynamics of Atomic Clusters and Nuclei},
1027: edited by D.~M.~Brink, F.~F.~Karpechine, F.~B.~Malik, and J.~da Providencia
1028: (World Scientific, Singapore, 2001), p.~76. [nucl-th/0109081]
1029: \item{[21]}
1030: A.~Staudt, E.~Bender, K.~Muto, and H.~V.~Klapdor,
1031: {\it At.~Data Nucl.~Data Tables} {\bf 44}, 80 (1990).
1032: \item{[22]}
H.~Homma, E.~Bender, M.~Hirsch, K.~Muto, and H.~V.~Klapdor-Kleingrothaus,
1034: {\it Phys.~Rev.~C} {\bf 54}, 2972 (1996).
1035: \item{[23]}
1036: P.~M\"oller, J.~R.~Nix, and K.~L.~Kratz,
1037: {\it At.~Data Nucl.~Data Tables} {\bf 66}, 131 (1997).
1038: \item{[24]}
1039: N.~Costiris, A.~Dakos, E.~Mavrommatis, K.~A.~Gernoth, and J.~W.~Clark,
1040: to be published.
1041: \item{[25]}
1042: J.~W.~Clark, S.~Gazula, K.~A.~Gernoth, J.~Hasenbein, J.~S.~Prater,
1043: and H.~Bohr, in {\it Recent Progress in Many-Body Theories},
1044: Vol.~3, edited by T.~L.~Ainsworth, C.~E.~Campbell, B.~E.~Clements,
1045: and E.~Krotscheck (Plenum, New York, 1992), p.~371.
1046: \item{[26]}
1047: P.~M\"oller and J.~R.~Nix, {\it Nucl.~Phys.~A}~{\bf 520}, 369c (1990).
1048: \item{[27]}
S.~R\"uping, mySVM,
1051: http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/
1052: (2004).
1053: \item{[28]}
T.~Joachims, Multi-Class Support Vector Machine,
http://www.cs.cornell.edu/People/tj/svm\_light/svm\_multiclass.html (2004).
1057: \bye
1058: