0506:nucl-th0506080/li.tex

1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2: %%                                                                       %%

3: %%                                                                       %%

4: %%     Plain-TeX Template for Camera-Ready Manuscript Preparation        %%

5: %%                                                                       %%

6: %%              CMT28 (St. Louis) Workshop Proceedings                   %%

7: %%                                                                       %%

8: %%                Vol. 20, Condensed Matter Theories                     %%

9: %%                                                                       %%

10: %%                     Nova Science Publishers                           %%

11: %%                                                                       %%

12: %%                                                                       %%

13: %%               (Prepared by J. W. Clark, October 2002)                 %%

14: %%                                                                       %%

15: %%                                                                       %%

16: %%                                                                       %%

17: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


111: %%The following three lines specify type size, dimensions of text (23.5cm

112: %%by 15.5cm), and line spacing.  The spacing is slightly looser than

113: %%single space (which would be \baselineskip = 12 truept) to allow room

114: %%for embedded symbols with superscripts and subscripts.

115: \magnification=\magstep1

116: \font\bigbfont=cmbx10 scaled\magstep1

117: \font\bigifont=cmti10 scaled\magstep1

118: \font\bigrfont=cmr10 scaled\magstep1

119: \vsize = 23.5 truecm

120: \hsize = 15.5 truecm

121: \hoffset = .2truein

122: \baselineskip = 14 truept

123: \overfullrule = 0pt

124: \parskip = 3 truept

125: \def\frac#1#2{{#1\over#2}}

126: \def\diff{{\rm d}}

127: \def\bfk{{\bf k}}

128: \def\eps{\epsilon}

129: \nopagenumbers


143: \topinsert

144: \vskip 3.2 truecm

145: \endinsert

146: \centerline{\bigbfont MODELING NUCLEAR PROPERTIES}

148: \vskip 8 truept

149: \centerline{\bigbfont WITH SUPPORT VECTOR MACHINES}

152: \vskip 20 truept

154: \centerline{\bigifont H. Li and J. W. Clark}

155: \vskip 8truept

156: \centerline{\bigrfont McDonnell Center for the Space Sciences

157: and Department of Physics}

158: \vskip 2 truept

159: \centerline{\bigrfont Washington University, St. Louis, Missouri 63130, USA}

162: \vskip 11 truept

163: \centerline{\bigifont E. Mavrommatis and S.~Athanassopoulos}

164: \vskip 8 truept

165: \centerline{\bigrfont Physics Department, Division of Nuclear \&

166: Particle Physics}

167: \vskip 2 truept

168: \centerline{\bigrfont University of Athens, GR-15771 Athens, Greece}

169: \vskip 11 truept

170: \centerline{\bigifont K. A. Gernoth}

171: \vskip 8 truept

172: \centerline{\bigrfont Department of Physics, University of Manchester}

173: \vskip 2 truept

174: \centerline{\bigrfont Manchester M13 9PL, United Kingdom}

175:

176: \vskip 1.8 truecm

177:

178: \centerline{\bf 1.  INTRODUCTION}

179: \vskip 12 truept

180: Artificial neural networks and other machine-learning strategies can provide

181: a valuable complement to theory-driven models of the systematics of nuclear data.

182: A significant effort to exploit the potential of data-driven methodologies

183: receives strong motivation from the current thrust toward experimental and

184: theoretical exploration of nuclei far from stability.  It is made possible

185: by the availability of a growing body of excellent experimental data on

186: nuclear species numbering in the thousands.  In outline, statistical models

187: based on supervised learning are developed as follows.  Suppose, for

188: example, we wish to predict the atomic mass $M$ of a nuclear species, or

189: nuclide, specifying only its mass number $A$ and atomic number $Z$,

190: or alternatively its proton and neutron numbers $(Z,N)$.  A learning

191: machine has an input interface where $(Z,N)$ are fed to the device in

192: coded form and an output interface where an estimate of the

193: mass appears for decoding.  In between there is a system or network of

194: interconnected elements that acts to process the incoming information and

195: produce an appropriate output.  These processing elements may resemble biological

196: neurons, receiving signals from other units through weighted connections,

197: and displaying nonlinear response to summed input signals.  Given

198: a body of training data to be used as examples of the desired

199: mapping, in this case $(Z,N) \to M$, a suitable learning algorithm

200: is used to adjust the parameters of the network, e.g., the weights of

201: the connections between the processing elements, so that the learning

202: machine (i) generates responses at the output interface that reproduce,

203: or closely fit, the atomic masses of the training nuclei, and (ii) serves

204: as a reliable predictor of the masses of test nuclei absent from the

205: training set.  This second requirement is a strong one -- the system

206: should not merely serve as a lookup table for masses of known nuclei;

207: it should also perform well in the much more difficult task of

208: generalization.

209:

210: The last two decades have seen much activity and considerable progress

211: in the development and application of supervised learning machines

212: of the type described -- which are designed to learn by example.

213: The most popular implementation is the multilayer feedforward

214: neural network (or multilayer perceptron), taught by the backpropagation

215: learning algorithm in one or another of its many variations [1-3].

216: A significant measure of success has been achieved in constructing

217: global models of nuclear properties based on such neural networks,

218: with applications to atomic masses, neutron separation

219: energies, spins and parities of nuclear ground states, stability versus

220: instability, branching ratios for different decay modes, and

221: beta-decay lifetimes.  (For reviews, see Ref.~[4], and for recent

222: results on atomic-mass prediction, see Ref.~[5].)

223:

224: The support vector machine (SVM) [6-8], a principled and powerful approach

225: to problems in classification and nonlinear regression, came on the

226: scene in the 1990s.  It has become a standard tool in statistical

227: modeling, and for many problems it is considered the method of

228: choice.  We have begun to explore the promise of SVMs for modeling

229: and prediction of nuclear properties.  The first results of this

230: effort are reported here.

231:

232: Section 2 provides an introduction to support vector machines and the

233: ANOVA decomposition that facilitates their effective implementation.

234: Section 3 summarizes the results obtained for the atomic-mass problem,

235: and compares the predictive performance of the SVM models with

236: that of multilayer backpropagation networks and state-of-the-art

237: ``theory-thick'' models.  Additional results and comparisons for

238: beta-decay halflives and for ground-state spins and parities are

239: presented in Secs.~4 and 5, respectively.  Concluding remarks

240: are made in Sec.~6.

241: \vskip 28 truept

242:

243: \centerline{\bf 2.  SUPPORT VECTOR MACHINE AND ANOVA DECOMPOSITION}

244: \vskip 12 truept

245:

246: The support vector machine (SVM), pioneered by Vapnik [6-8], may be viewed

247: as an approximate realization of the goal of structural risk minimization

248: [9,3].  Let $({\bf x}_1,y_1),\ldots,({\bf x}_P,y_P)$ be a set of training

249: data drawn from a function $y=f({\bf x})$.  Here, ${\bf x}$ is the input

250: variable, a vector of dimension $n$, while $y$ is the output variable,

251: a unique real number for given ${\bf x}$.  (In the example considered

252: in Sec.~1, ${\bf x}$ is a vector formed from the two components $Z$ and

253: $N$, while $y$ is the mass $M$.)   The support vector machine is based

254: on a suitable nonlinear mapping ${\bf x} \to \varphi({\bf x})$ from the

255: input space to a feature space of higher dimension $m > n$.

256:

257: Applied to the task of regression, the SVM

258: learning strategy begins by posing an approximation $\hat y$ to the output $y$

259: as a linear combination of certain basis functions $\varphi_i({\bf x})$

260: in the feature space, with corresponding linear weights connecting

261: the feature space to the output space.

262: Thus,

263: $$

264: {\hat y} = {\hat f}({\bf x},{\bf w}) = \sum_{j=1}^m w_j \varphi_j({\bf x})\,,

265: \eqno(1)

266: $$

267: where ${\bf w}$ is an $m$-dimensional vector composed of weights

268: $w_j$, $j=1,\ldots,m$.  (A bias term $b$ may be included in Eq.~(1)

269: by starting the sum at $j=0$ and introducing $w_0 \equiv b$ and

270: $\varphi_0({\bf x}) \equiv 1$.) To determine the image vectors $\varphi_j({\bf x})$

271: and their weights $w_j$, consider an $\epsilon$-insensitive loss function

272: defined, for input ${\bf x}$, by

273: $$

274: |y - {\hat f}({\bf x},{\bf w})|_\epsilon =

275: \cases{ |y - {\hat f}({\bf x},{\bf w})| - \epsilon & if $|y - {\hat f}({\bf x},{\bf w})| \geq \epsilon$, \cr

276:         0 & otherwise, \cr}

277: $$

278: i.e., the loss is the amount by which the magnitude of the error

279: $y - {\hat f}({\bf x},{\bf w})$ exceeds a tolerance $\epsilon$, and is

280: taken to be zero otherwise.

281: The tolerance parameter $\epsilon$ is at the disposal of the machine's

282: user.  The {\it primal} optimization problem then becomes one of minimizing the overall loss (or cost function, or empirical risk), as given by the

283: sum of the individual losses for all the training patterns,

284: $$

285: E_\epsilon ({\bf w}) =  \sum_{i=1}^P \left|y_i-{\hat f}({\bf x}_i,{\bf w})

286: \right|_\epsilon \,, \eqno(2)

287: $$

288: subject to the inequality $\sum_{j=1}^m w_j^2 < c_0$, where $c_0$ is

289: a user-determined constant.

290:
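The primal problem is often written in an equivalent slack-variable form.
What follows is a sketch of that standard textbook formulation, not a result
derived in this paper; the slack variables $\xi_i$ and $\xi_i'$ are notation
introduced only for this illustration.  One minimizes
$$
\Phi({\bf w},\xi,\xi') = C \sum_{i=1}^P (\xi_i + \xi_i')
+ {1 \over 2}\, {\bf w}^T{\bf w}
$$
subject to
$$
y_i - {\bf w}^T\varphi({\bf x}_i) \leq \epsilon + \xi_i\,, \qquad
{\bf w}^T\varphi({\bf x}_i) - y_i \leq \epsilon + \xi_i'\,, \qquad
\xi_i\,,\,\xi_i' \geq 0\,,
$$
for $i = 1, \ldots, P$.  In this form the constant $C$ sets the trade-off
between the accumulated violations of the $\epsilon$ tolerance and the size
of the weight vector, and the same $C$ reappears as the upper bound on the
multipliers of the dual problem.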

291: Vapnik has shown that an equivalent solution of this constrained

292: optimization problem can be obtained by solving the corresponding {\it dual

293: problem}, which may be stated as follows [3].

294: \item{1.}

295: Choose a kernel of the form

296: $$

297: K({\bf x},{\bf x}_i) = \sum_{j=1}^m \varphi_j({\bf x})\varphi_j({\bf x}_i) \,,

298: \eqno(3)

299: $$

300: symmetrical in its vector arguments and continuous in their components,

301: and qualifying as an inner product in some space, so as to meet the

302: conditions of Mercer's theorem [10,3].

303: \item{2.}

304: Given the training sample $\{ ({\bf x}_i,y_i) \}$, $i=1, \ldots, P$,

305: assemble the convex functional

306: $$

307: Q(\{\alpha_i,\alpha_i'\}) = \sum_{i=1}^P y_i(\alpha_i - \alpha_i')

308: -\epsilon \sum_{i=1}^P (\alpha_i + \alpha_i')

309: - {1 \over 2} \sum_{i=1}^P \sum_{l=1}^P ( \alpha_i - \alpha_i')

310: (\alpha_l - \alpha_l') K({\bf x}_i, {\bf x}_l) \,. \eqno(4)

311: $$

312: \item{3.}

313: Maximize $Q$ subject to the constraints

314: $$

315: \sum_{i=1}^P (\alpha_i - \alpha_i') = 0\,, \qquad 0 \leq \alpha_i\,,\,\alpha_i'

316: \leq C \,, \eqno(5)

317: $$

318:

319: \noindent

320: where $C$ is a user-determined constant.

321: The optimal approximating function then takes the forms

322: $$

323: {\hat f}_{\rm opt}({\bf x},{\bf w}) = {\bf w}^T \varphi({\bf x})

324: = \sum_{i=1}^{P} (\alpha_i - \alpha_i') K({\bf x},{\bf x}_i) \,, \eqno(6)

325: $$

326: where ${\bf w}^T$ is the transpose of the column vector ${\bf w}$.

327: The subset of training patterns $i$ for which $\alpha_i - \alpha_i'$

328: does not vanish then defines the {\it support vectors} of the machine,

329: corresponding to the training examples that are the most

330: salient to solution of the problem.

331:
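The step from the weight vector to the kernel expansion of Eq.~(6) can be
made explicit.  As a sketch, assuming the standard stationarity condition of
the underlying Lagrangian (which is not derived in this paper), the optimal
weight vector is a linear combination of the images of the training inputs,
$$
{\bf w} = \sum_{i=1}^P (\alpha_i - \alpha_i')\, \varphi({\bf x}_i) \,,
$$
so that, using the definition (3) of the kernel,
$$
{\bf w}^T \varphi({\bf x})
= \sum_{i=1}^P (\alpha_i - \alpha_i') \sum_{j=1}^m
\varphi_j({\bf x}_i)\varphi_j({\bf x})
= \sum_{i=1}^P (\alpha_i - \alpha_i')\, K({\bf x},{\bf x}_i) \,.
$$
Only kernel values enter the final expression, so the feature map
$\varphi$ never has to be evaluated explicitly.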

332: The parameters $\epsilon$ and $C$ provide the user with control over

333: the complexity of the machine, as measured by the so-called VC dimension

334: [11,3], and hence over its performance in generalization.

335: Careful tuning of these parameters is necessary.

336:

337: Different choices for the inner-product kernel $K({\bf x},{\bf x}_i)$ yield

338: different versions of the support vector machine.  The most popular are (i) the

339: polynomial learning machine, corresponding to

340: $$

341: K({\bf x},{\bf x}_i) =

342: ({\bf x}^T{\bf x}_i + 1)^p  \eqno(7)

343: $$

344: (with user-selected power $p$),

345: (ii) the radial-basis function (RBF) network, corresponding to

346: $$

347: K({\bf x},{\bf x}_i) =\exp \left( - \gamma ||{\bf x} -{\bf x}_i ||^2\right) \eqno(8)

348: $$

349: (with user-selected width parameter $\gamma$), and (iii) the

350: two-layer perceptron [1-3], with

351: $$

352: K({\bf x},{\bf x}_i) =\tanh (\beta_1 {\bf x}^T{\bf x}_i + \beta_2)  \eqno(9)

353: $$

354: (freedom in setting the parameters $\beta_1$ and $\beta_2$ being

355: restricted by Mercer's theorem).

356:
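A small worked case may help to show how a kernel encodes a feature space.
The choice $n=2$, $p=2$ is made here purely for illustration, with the
components of ${\bf x}$ written as $x^{(1)}$ and $x^{(2)}$.  Expanding the
polynomial kernel (7) gives
$$
({\bf x}^T{\bf x}_i + 1)^2 = 1 + 2x^{(1)}x_i^{(1)} + 2x^{(2)}x_i^{(2)}
+ \bigl(x^{(1)}x_i^{(1)}\bigr)^2 + \bigl(x^{(2)}x_i^{(2)}\bigr)^2
+ 2\,x^{(1)}x^{(2)}x_i^{(1)}x_i^{(2)} \,,
$$
which has the inner-product form (3) with the $m=6$ basis functions
$$
\varphi({\bf x}) = \Bigl( 1,\ \sqrt{2}\,x^{(1)},\ \sqrt{2}\,x^{(2)},\
\bigl(x^{(1)}\bigr)^2,\ \bigl(x^{(2)}\bigr)^2,\
\sqrt{2}\,x^{(1)}x^{(2)} \Bigr) \,.
$$
The two-dimensional input is thus mapped into a six-dimensional feature
space, in accord with the requirement $m > n$.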

357: We are most interested in creating predictive statistical models

358: capable of estimating a real-valued function $f({\bf x})$ from given

359: values for its independent variables comprising ${\bf x}$.  For that

360: reason, we have outlined the design of SVMs for solving problems

361: of nonlinear regression.  However, the support vector machine was originally

362: introduced to solve yes/no classification problems, and applied to problems

363: in which positive and negative cases are either separable by a hyperplane in the

364: input space (trivial), or not (nontrivial).  For problems that are

365: not linearly separable in this sense, the input vectors are mapped

366: nonlinearly into a higher-dimensional feature space, in which separation

367: by a hyperplane becomes possible.  The principle of structural risk

368: minimization then dictates that an {\it optimal} hyperplane be sought in

369: this space, such that the margin of separation between positive and negative

370: cases is maximized.  It is known [7,8] from general learning theory that the

371: error rate of a learning machine on test data (i.e., in generalization or prediction)

372: is bounded by the sum of two terms, namely the error rate on the training

373: data and a term involving the VC dimension.  For a linearly separable

374: problem treated by a SVM, the first term is zero and the second is

375: minimized.  Thus, good generalization is achieved even without

376: building into the model any explicit knowledge about the problem to be

377: solved, beyond the raw training data.  This desirable feature is maintained

378: approximately in application of SVMs to nonseparable classification

379: problems and to the generically more difficult problems of regression.

380:
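For orientation, one commonly quoted form of the generalization bound for
two-class problems is sketched here.  It is given as an illustration rather
than as the precise statement used in Refs.~[7,8], and the symbols $R$
(expected error rate on unseen data), $R_{\rm emp}$ (error rate on the $P$
training examples), $h$ (VC dimension), and $\eta$ are introduced only for
this purpose.  With probability at least $1-\eta$,
$$
R \leq R_{\rm emp} + \sqrt{ { h \left[ \ln (2P/h) + 1 \right]
- \ln (\eta/4) } \over P } \,.
$$
The second term grows with $h$ and shrinks with $P$, which is why control of
the VC dimension translates into control of generalization performance.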

381: The support vector machine may be broadly viewed as a kind of

382: feedforward neural network, in that the inner-product kernels

383: $K({\bf x},{\bf x}_i)$ provide a layer of hidden units that effect

384: nonlinear processing of the inputs and provide weighted linear outputs,

385: which are summed by an output unit.  As seen above, the familiar structures of

386: radial-basis-function networks and perceptrons with one hidden layer can

387: be realized as special cases by suitable choices of kernel, as specified

388: above.  But a support vector machine does more: it also embodies an algorithm

389: that automatically determines the number of hidden units appropriate to

390: the problem at hand, whatever the choice of kernel.  This more general

391: scope of the SVM approach stands in contrast to the backpropagation

392: learning algorithm [1-3], which is designed especially for training

393: multilayer perceptrons.

394:

395: In addition to the benefits already mentioned, the support

396: vector machine offers other significant advantages over the

397: more traditional approaches to supervised learning based

398: on neural networks, which involve dependence on trial

399: and error, rules of thumb, and heuristics.  The support

400: vector machine offers a generic way to control model complexity.

401: The curse of dimensionality is overcome by the pivotal strategy

402: of introducing an inner-product kernel conforming to Mercer's

403: theorem and solving the constrained optimization problem in its dual

404: version, thereby determining the dimension of the feature space

405: as the number of support vectors distilled from the training set.

406: The procedure naturally incorporates regularization.  The

407: use of the $\epsilon$-insensitive cost function (2) in the

408: regression application lends robustness to the machine by avoiding

409: certain drawbacks of the least-square estimator employed in

410: the backpropagation learning algorithm (e.g., sensitivity to outliers

411: and to distributions with additive noise having a long tail).

412: Importantly, the SVM is guaranteed to find a global minimum of

413: the error surface.  For a more detailed and systematic development

414: of the properties of SVMs, the reader is directed to Haykin's excellent

415: text [3], as well as the authoritative monographs of Vapnik [7,8].

416:

417: Our investigations of the potential of support vector machines for

418: the design of global statistical models of nuclear properties make use

419: of the RBF kernel (8), as well as a simplified version of what is

420: called ANOVA decomposition [12].  ANalysis Of VAriance (ANOVA) is a

421: scheme for imposing a structure on multi-dimensional kernels that are

422: generated from one-dimensional kernels, in a way that gives better control

423: over the capacity of the machine (as measured by the VC dimension).

424: An ANOVA kernel we have found to be well suited to the regression

425: problem posed by the nuclear (atomic) mass data is rooted in

426: the RBF kernel and has the form

427: $$

428: K({\bf x},{\bf x}_i) =  \left(\sum_{l=1}^n \exp\left[-\gamma\left(

429:   x^{(l)} - x_i^{(l)} \right)^2     \right]             \right)^d   \,,

430: $$

431: where the user-selected parameter $\gamma$ can take any positive

432: value and the power $d$ is usually an integer.  We

433: shall call this the ANOVA kernel.
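As a concrete illustration, assume that the input vector has just the two
components used in the mass problem, namely the scaled proton and neutron
numbers, denoted here by $z$ and $\nu$ (symbols introduced only for this
sketch).  The ANOVA kernel then reads
$$
K({\bf x},{\bf x}_i) = \left( e^{-\gamma (z - z_i)^2}
+ e^{-\gamma (\nu - \nu_i)^2} \right)^d \,.
$$
Each exponential lies between 0 and 1, so $0 < K \leq n^d$ in general.  For
$n=2$ and $d=8$ the maximum, attained at ${\bf x} = {\bf x}_i$, is $2^8 = 256$,
whereas a pair of nuclides that agree in only one coordinate and are far
apart in the other gives $K \approx 1$.  A large power $d$ therefore strongly
favors training examples that are close to ${\bf x}$ in both $Z$ and $N$.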

434: \vskip 28 truept

435: \centerline{\bf 3.  SVM MODELS OF NUCLEAR MASS SYSTEMATICS}

436: \vskip 12 truept

437:

438: SVM regression models have been trained to predict $(\Delta M)c^2$

439: in MeV, where $\Delta M$ is the mass excess (or mass defect)

440: defined by the difference $M - A$ between the atomic mass $M$,

441: measured in amu, and the mass number $A$ of the nuclide in question.

442: In our initial study, we focus on a database given by the union

443: ${\rm O}\, \oplus {\rm N}\, \oplus {\rm NB}$ of three data sets.  The first

444: consists of the set of 1323 ``old'' (O) experimental mass assignments

445: which the 1981 semi-empirical droplet-model mass formula of M\"oller

446: and Nix [13] was intended to reproduce.  The second is a set of

447: 351 ``new'' (N) experimental mass assignments for nuclei that lie

448: mostly beyond the edges of the 1981 data (as viewed in the $N$--$Z$ plane).

449: In addition to the O and N sets, a set of 158 nuclides with more

450: recently measured masses (the NB set of ``even newer'' nuclides)

451: is employed in the modeling process.  In earlier work [14-16,5], these

452: three data sets have been used to quantify the extrapolation capability

453: (the so-called extrapability) of different global mass models (based

454: either on nuclear theory or neural networks).

455:

456: The set ${\rm O}\, \oplus {\rm N} \, \oplus {\rm NB}$ is divided by a

457: random-sampling procedure into three nonoverlapping subsets, namely

458: a training set (80\%), a validation set (10\%), and a test set (10\%),

459: in the indicated approximate proportions.  (In all work reported

460: in this paper, random samplings are drawn from a uniform

461: distribution.)   Training, validation, and

462: test sets are each further subdivided into four subsets labeled EE,

463: EO, OE, and OO, composed respectively of nuclides belonging to the

464: four ``even-oddness'' classes: even-$Z$-even-$N$, even-$Z$-odd-$N$,

465: odd-$Z$-even-$N$, and odd-$Z$-odd-$N$.  For convenience, values

466: of the input variables are encoded by a linear transformation that

467: scales and shifts given values of $Z$ and $N$ to lie in the interval

468: $[0,1]$.  A similar linear transformation decodes the learning machine's

469: raw output, which lies in the interval $[-1,1]$, so as to provide an

470: estimate of the corresponding mass excess in MeV.

471:
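The precise coding is not spelled out above.  A minimal sketch, assuming a
simple min-max scaling over the database (the extreme values $Z_{\min}$,
$Z_{\max}$, $N_{\min}$, $N_{\max}$, $E_{\min}$, and $E_{\max}$ are
hypothetical symbols introduced here), would be
$$
x^{(1)} = {Z - Z_{\min} \over Z_{\max} - Z_{\min}} \,, \qquad
x^{(2)} = {N - N_{\min} \over N_{\max} - N_{\min}} \,,
$$
for the inputs, and
$$
(\Delta M)c^2 \approx E_{\min} + {1 \over 2}\,({\hat y} + 1)\,
(E_{\max} - E_{\min})
$$
for decoding a raw output ${\hat y} \in [-1,1]$ into a mass excess in MeV,
where $E_{\min}$ and $E_{\max}$ bound the mass excesses to be represented.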

472: Effectively, we divide the mass problem into four separate problems,

473: one for each of the four ``even-oddness'' classes in $Z$ and $N$.

474: In doing so, we are actually incorporating some domain knowledge into

475: the learning strategy.  Distinctive quantum-mechanical features of nuclei,

476: abundantly supported by empirical evidence, include quantized angular

477: momenta, magic numbers, shell structure, and pairing energies,

478: all of which stem from the fact that $Z$ and $N$

479: are integers, even or odd.

480:

481: A SVM model is developed individually for each of the four nuclear classes

482: EE, EO, OE, and OO.  SVM regression (with ANOVA-RBF specification of kernels)

483: is carried out separately for the respective training sets, thereby

484: constructing a predictive model whose reliability is judged by its

485: performance on the examples in the test set.  Following established

486: practice, performance of each of the four models on its corresponding

487: validation set has been used to guide the final determination of the adjustable

488: parameters.  Ideally, the test set should have {\it no} role in choosing

489: these parameters (although in some cases a weak influence is allowed).

490:

491: As is usual in global models of the atomic-mass table, the quality

492: of a given model is judged by the smallness of the root-mean-square (rms)

493: error $\sigma$ in the mass excess $\Delta M$, averaged over the data

494: set in question (training, validation, or test set for a given

495: class of nuclides).  To be competitive, a model should have

496: values of $\sigma$ below 1 MeV.  It should be noted, however, that

497: only in a few cases has a rigorous test of predictive performance

498: been made for the traditional theoretical models of semi-empirical

499: character.  (An important exception is found in the work of

500: M\"oller, Nix, and collaborators [15,16], who introduce the

501: notion of extrapability, which is equivalent to our

502: generalization.)

503:

504: Some of the better results obtained in the present exploratory

505: study are displayed in Table 1.  The performance of these models, all

506: with RBF parameter $\gamma = 2.5$ and ANOVA degree $d=8$, is of high quality:

507: the rms error lies below 1 MeV on every training, validation, and test set.

508:

509: \topinsert

510: \centerline{\bf{Table 1}}

511: \vskip 12 truept

512: \noindent

513: Performance of SVM global models of atomic mass.  For all four models,

514: the RBF parameter $\gamma$ is 2.5 and the ANOVA degree is $d=8$.  The

515: other SVM parameters have been defaulted at $C=0.1$ and $\epsilon = 0.001$.

516: \vskip 27 truept

517:

518: \input tables.tex

519: \nrows= 6

520: \ncols= 7

521: \begintable

522: {} | & \quad Learning & Set \quad \quad \quad | &  \quad Validation &  Set

523: \quad \quad \quad | &  \quad \quad \quad Test &  Set \quad \quad \cr

524: Classes | & \# Nuclides &  $\sigma$(MeV) | & \# Nuclides  & $\sigma$(MeV) |

525: & \# Nuclides  & $\sigma$(MeV) \cr

526: EE | & 381 & 0.58 | & 48 & 0.71 | & 48 & 0.99 \cr

527: EO | & 360 & 0.89 | & 45 & 0.68 | & 45 & 0.62 \cr

528: OE | & 371 & 0.70 | & 46 & 0.78 | & 46 & 0.88 \cr

529: OO | & 353 & 0.75 | & 44 & 0.74 | & 45 & 0.97

530: \endtable

531: \vskip 14 truept

532: \endinsert

533:

534: Similar learning experiments can be found among the studies

535: of Ref.~[5] based on multilayer perceptrons and modified

536: backpropagation training, although procedural differences

537: preclude direct comparisons of performance.  The best model obtained

538: using O as the training set, NB as validation set, and N

539: as test set gave rms error figures on these sets of 0.71 MeV,

540: 2.28 MeV, and 2.16 MeV, respectively.  Another strategy yielded

541: better results.  The set ${\rm O}\, \oplus \, {\rm N}$ was first ``purified'' by

542: removing 20 nuclides with poorly measured masses.  A random sample

543: M1 consisting of 1303 of the remaining 1654 examples (some 79\%) was

544: used as the training set.  The complementary set, M2, played the

545: role of validation set, and the NB set was used for testing

546: the trained model.  The best model found in this way produced

547: rms errors on the three sets of 0.44 MeV (M1), 0.44 MeV (M2),

548: and 0.95 MeV (NB).  It should be noted that this level of

549: performance on the mass problem was achieved after more

550: than a decade of successive improvements in the choices of

551: architectures, coding schemes, and training algorithms.

552:

553: In addition to the four class-specific models SVM-EE, SVM-EO, SVM-OE,

554: and SVM-OO reported on in Table 1, we also constructed a single SVM

555: model (denoted SVM-S) using the full O data set as the training sample,

556: without making a distinction between EE, EO, OE, and OO nuclides.

557: In this case, the NB nuclei are used as a validation set, guiding

558: the determination of the RBF and ANOVA parameters.  The parameters

559: associated with the SVM-S model are again $\gamma = 2.5$

560: and $d = 8$, along with $C= 0.1$ and $\epsilon = 0.001$.  This

561: model yields rms errors of 0.70 MeV on the training set O and

562: 0.75 MeV on the validation set NB, with a $\sigma$ value of

563: 1.41 MeV on the N nuclei, regarded as a test set.  (These results

564: are erroneously cited in Ref.~[5].)  A proper averaging over

565: the four nuclidic classes permits a comparison between the

566: SVM-S model and the four models represented in Table 1.   The

567: composite performance of the latter models is then reflected

568: in $\sigma$ values of 0.73 MeV, 0.73 MeV, and 0.88 MeV

569: in training, validation, and testing, respectively.

570:

571: In some cases, meaningful comparisons may be drawn between the

572: performance of statistical mass models based on multilayer perceptrons

573: and support vector machines, and the traditional mass models based on

574: nuclear theory and phenomenology.  Starting with the simple liquid-drop

575: model, such traditional theory-thick models have evolved over

576: seven decades to achieve a high degree of sophistication and

577: precision.  For example, the 1992 FRDM model of M\"oller and Nix [15]

578: gives $\sigma$ values of 0.67 MeV on the O set (when fitted to

579: this set) and 0.74 MeV on the N set (a true measure

580: of predictive performance of the model).  The more enhanced

581: FRDM model of Ref.~[16], which is fitted to the data set

582: ${\rm M1} \, \oplus \, {\rm M2}$, yields rms errors of 0.68 MeV (M1),

583: 0.71 MeV (M2), and 0.70 MeV (NB).  The HFB2 model of Pearson

584: and collaborators [17] gives respective errors of 0.67 MeV,

585: 0.68 MeV, and 0.73 MeV.  (We note that the result of Ref.~[17]

586: on the ``test set'' NB cannot be regarded as a prediction, since

587: the nuclei involved were used in adjusting model parameters.)

588:

589: With additional refinements, it is not unreasonable to expect

590: that SVM models can equal (and possibly surpass) the levels of robustness

591: and predictive accuracy achieved with theory-thick models and with

592: multilayer perceptron models.  However, a conclusive statement

593: must await a thorough SVM study based on the recent AME03 mass

594: evaluation carried out by Audi {\it et al.}~[18].

595: \vskip 28 truept

596: \centerline{\bf 4.  SVM MODELS OF BETA-DECAY HALFLIVES}

597: \vskip 12 truept

598:

599: \vbox{

600: We now turn to a second problem of regression in the statistical analysis

601: of nuclear properties via support vector machines, namely fitting and

602: prediction of the beta-decay halflives of nuclides $(Z,N)$ that decay 100\% via

603: the $\beta^-$ mode.  The data for this problem have been culled

604: from the on-line repository at the Brookhaven National Nuclear Data

605: Center (http:$//$www.nndc.bnl.gov).  The data employed are current to May

606: 2005 and consist of a total of 932 examples.  Restricting

607: attention to examples with halflives below $10^6$ s leaves

608: 633 nuclides.  When measured in seconds, the experimental values

609: of $T_{1/2}$ range over 26 orders of magnitude, so it is

610: more appropriate to regress $L = \log T_{1/2}$ instead of the

611: halflife itself, and to adopt the rms error $\sigma_L$ of the estimate

612: of $L$ as a figure of merit in learning, validation, and prediction

613: phases of the analysis.
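For concreteness, the figure of merit may be written out explicitly.  The following is only a sketch under two assumptions not stated above: that the logarithm is taken to base 10 with $T_{1/2}$ in seconds, and that $\sigma_L$ is the ordinary root-mean-square deviation over the $n$ examples of a given set.  The symbols $L_i^{\rm exp}$ and $L_i^{\rm SVM}$ (notation introduced here for illustration) denote the measured and model values for example $i$:
$$
\sigma_L = \left[ {1 \over n} \sum_{i=1}^{n}
\left( L_i^{\rm SVM} - L_i^{\rm exp} \right)^2 \right]^{1/2} ,
\qquad L_i = \log_{10} T_{1/2,i} \, .
$$
On this reading, $\sigma_L = 1$ would correspond to a typical discrepancy of one order of magnitude (a factor of 10) in the halflife itself.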

614:

615: As in the case of the mass problem, separate SVM models are

616: constructed for EE, EO, OE, and OO classes of nuclides.  However,

617: we make the simpler RBF choice of kernel, instead of pursuing

618: the more elaborate ANOVA option.   (Implementation based on the

619: ANOVA decomposition is much more demanding in terms of

620: computer time.)  Each of the four data subsets (EE, EO, OE, OO) is

621: subdivided into training, validation, and test sets in the

622: approximate proportions 80\%, 10\%, and 10\%, respectively.
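As a reminder of what the single RBF parameter controls, suppose the kernel takes the widely used Gaussian form (an assumption of this sketch; the normalization adopted in the software may differ).  For two input patterns ${\bf x}$ and ${\bf x}'$, for example suitably scaled values of $Z$ and $N$, one then has
$$
K({\bf x},{\bf x}') = \exp\left( -\gamma \, \vert {\bf x} - {\bf x}' \vert^2 \right) .
$$
In this form a larger $\gamma$ shortens the range over which training examples influence one another, so that the fit becomes more local, while a smaller $\gamma$ yields a smoother and more global interpolation.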

623:

624: The results obtained from the SVM regressions are summarized in Tables 2

625: and 3.  Table 2 gives the parameters and performance measures of the

626: models constructed for the full set of data, regardless of measured

627: lifetime.  Table 3 displays the corresponding results when nuclides with

628: $T_{1/2} \geq 10^6$ s are removed from the database.

629:

630: A similar study [19] (see also Ref.~[20]) has been carried out

631: with multilayer feedforward neural networks trained by ``vanilla''

632: backpropagation, for data available in 1995 (766 examples in total).

633: However, this study did not employ the now-standard protocol in

634: which a validation set is used in making the final model selection.

635: Also, no subdivision into the four even-oddness classes was made.

636: Instead, the full data set (or the restricted set of examples with

637: $T_{1/2} < 10^6$~s) was split into a training set of approximately

638: 75\% of the examples and a test set consisting of the remainder.

639: }

640:

641: \topinsert

642: \centerline{\bf{Table 2}}

643: \vskip 12 truept

644: \noindent

645: Performance of SVM global models of $\beta$-decay halflives $T_{1/2}$

646: (including examples having $T_{1/2} >  10^6$ s).  For all four models,

647: $C=1$ and $\varepsilon =0.001$.

648: \vskip 30 truept

649: \input tables.tex

650: \nrows=6

651: \ncols=8

652: \begintable

653:          \|  \quad Learning & Set \qquad   | \quad Validation & Set \qquad

654: |\qquad Test & Set \qquad | RBF kernel     \crthick

655: Classes  \|\# Nuclides & $\sigma_L$  |\# Nuclides & $\sigma_L$ |\# Nuclides& $\sigma_L$|$\gamma$\crthick

656:

657: EE       \|  ~137      & 2.88~ | ~16        & 3.61~ |~15        & 1.72~| 5.44 \crnorule

658: EO       \|  ~198      & 2.75~ | ~24        & 2.27~ |~22        & 2.17~| 7.27 \crnorule

659: OE       \|  ~187      & 2.37~ | ~22        & 2.76~ |~20        & 2.38~| 9.99 \crnorule

660: OO       \|  ~236      & 2.62~ | ~29        & 2.07~ |~26        & 2.96~| 9.55

661: \endtable

662: %\vskip 1.5truecm

663: \vskip 1.2truecm

664: \centerline{\bf{Table 3}}

665: \vskip 12 truept

666: \noindent

667: Performance of SVM global models of $\beta$-decay halflives (with a cutoff

668: at $10^6$ s).  For all four models, $C=1$, $\varepsilon =0.001$.

669: \vskip 30 truept

670: \input tables.tex

671: \nrows=6

672: \ncols=8

673: \begintable

674:          \|  \quad Learning & Set \qquad   | \quad Validation & Set \qquad

675:  | \qquad Test & Set \qquad | RBF kernel     \crthick

676: Classes  \|\# Nuclides & $\sigma_L$  |\# Nuclides & $\sigma_L$ |\# Nuclides& $\sigma_L$|$\gamma$\crthick

677:

678: EE       \|  ~96       & 1.34~ | ~11        & 0.52~ |~10        & 1.20~| 1.78 \crnorule

679: EO       \|  ~140      & 0.90~ | ~17        & 0.69~ |~15        & 1.22~| 9.97 \crnorule

680: OE       \|  ~122      & 1.55~ | ~14        & 0.63~ |~13        & 1.18~| 0.84 \crnorule

681: OO       \|  ~159      & 1.00~ | ~19        & 1.28~ |~17        & 1.34~| 8.87

682: \endtable

683: \vskip 14truept

684: \endinsert

685: \vskip 1.3truecm

686: %\vskip 1truecm

687:

688: Comparison of the rms errors shown in Tables 2 and 3 with the

689: corresponding performance figures from the earlier work [19,20] shows

690: an improvement (reduction) in rms error values by about a factor of

691: 2, in both learning and prediction, for both the full and restricted

692: data sets.  Comparison may also be made with results from

693: traditional nuclear theory (e.g.~Refs.~[21-23]).  Since the

694: cited neural-network models could already attain performance in fitting

695: and prediction comparable to that exhibited by these theory-thick models,

696: we can say with some confidence that the SVM models are capable of a

697: predictive acuity superior to the best of the traditional global

698: models currently in play.

699: \vfill\eject

700:

701: We should also call attention to the greatly improved quality of

702: neural-network models of $\beta$-decay systematics, achieved in

703: very recent studies [24].  Data based on the AME03 evaluation

704: are divided into training, validation, and test sets in the

705: respective proportions 60\%, 20\%, and 20\%, both with and

706: without the restriction to halflives not greater than $10^6$~s,

707: but without subdivision into even-oddness classes.  In the

708: case where the restriction is imposed, the best results

709: found for the error measure $\sigma_L$ are 0.55 (training),

710: 0.61 (validation), and 0.64 (prediction).  The corresponding

711: averages for the models represented in Table 3 are 1.43, 0.89,

712: and 1.24, respectively, so further refinement of the SVM models

713: will be needed to match the performance of the best multilayer

714: perceptrons.
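The averaging over classes is not spelled out in detail.  One plausible convention, assumed here purely for illustration, is to pool the four classes by weighting each squared error $\sigma_{L,c}^2$ with the number of nuclides $n_c$ in the corresponding set.  For the validation sets of Table 3 this would give
$$
\sigma_L^{\rm val} = \left[ { 11\,(0.52)^2 + 17\,(0.69)^2 + 14\,(0.63)^2 + 19\,(1.28)^2
\over 11 + 17 + 14 + 19 } \right]^{1/2}
= \left[ { 47.75 \over 61 } \right]^{1/2} \simeq 0.88 ,
$$
which is close to the quoted value of 0.89, the small difference presumably arising from rounding of the tabulated entries.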

715: \vskip 28 truept

716:

717: \centerline{\bf 5.  SVM MODELS OF GROUND-STATE SPINS AND PARITIES}

718: \vskip 12 truept

719:

720: In a third illustration of what is possible, the SVM approach is applied

721: to construct global statistical models of the ground-state spins and parities

722: of nuclei.  (In this context, ``spin'' refers to the total angular momentum

723: quantum number $J$ of the nuclear state.)   As in the exercises described

724: in Secs.~3 and 4, we again divide the nuclei under consideration into EE,

725: EO, OE, and OO classes.  In the spin problem, this subdivision is of

726: obvious importance, since the law of angular momentum addition in

727: quantum mechanics dictates that the states of EE and OO nuclei can

728: only have integral values of $J$, whereas the spins of EO and OE

729: nuclei must be half-odd-integral.  In fact, all EE nuclei are known to

730: have spin/parity $J^\pi = 0^+$.  Clearly, we may exclude this class from

731: consideration, since its modeling is a trivial task for any viable

732: learning machine.

733:

734: The parity property of nuclear states presents the simplest kind

735: of classification problem, with two mutually exclusive outcomes, even

736: or odd.  Moreover, because the spin quantum number $J$ is restricted

737: by quantum theory to a finite set of discrete values, global modeling

738: of spin systematics is also most efficiently treated, within the

739: SVM framework, as a problem of classification rather than function

740: approximation or regression.  In our study, we consider

741: $J$ values ranging from 0 to 23/2 in steps of 1/2, the

742: integral values being available for OO nuclei and the half-odd integral

743: values, for EO and OE nuclei.  This specification of the problem

744: may be construed as introducing some basic domain knowledge into the

745: model-building process.

746:

747: Data for the spins and parities of nuclear ground states have been taken from

748: the on-line Brookhaven database.  Based on simple RBF kernels, separate

749: SVM classifier models of these two properties have been developed for each

750: of the three nontrivial even-oddness cases.

751:

752: Let us first discuss our findings for the parity problem.  In treating

753: this problem, the data for each of the cases EO, OE, and OO are divided

754: at random into training, validation, and test sets in the approximate

755: proportions 80\%, 10\%, and 10\%, respectively.  Performance is measured in

756: terms of the percentages of correct classifications within these

757: subsets.  The primary results are summarized in Table 4.  It is apparent

758: that modeling parity is an easy task for SVMs.  Judging from available

759: results [25,14], it is also relatively easy for neural networks

760: (although SVM performance is somewhat superior).
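To make the performance measure concrete, take the score to be the fraction of correctly classified examples in a given set, as the description above indicates (the symbols $n_{\rm correct}$ and $n_{\rm total}$ are introduced here only for illustration):
$$
{\rm Score} = { n_{\rm correct} \over n_{\rm total} } \, .
$$
On this reading, the EO test-set entry of Table 4, namely 83\% of 52 nuclides, would correspond to 43 correct parity assignments ($43/52 \simeq 0.827$) and 9 misclassifications.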

761:

762: For the models of Table 4, performance on the training sets is

763: perfect.  If we are willing to make a small sacrifice in the quality

764: of reproduction of the input data, slightly better performance on the

765: validation and test sets can be achieved, as seen in Table 5.

766: It is interesting that this second model corresponds to a quite different

767: error minimum under variation of the parameter $\gamma$.  In general,

768: there may be many such minima of similar depth.

769:

770: We have not yet conducted a full training-validation-test process

771: for the spin problem.  Accordingly, we present only preliminary

772: results, which nevertheless are illuminating.  In the first experiment

773: to be reported (see Table 6), each of the three spin data sets EO, OE, and OO is

774: divided randomly into {\it two} subsets, a training set and a

775: complementary second set.  The training set contains approximately

776: 90\% of the examples of the given even-oddness class, and the second set, the

777: remaining $\sim 10$\%.

778:

779: \topinsert

780: \centerline{\bf{Table 4}}

781: \vskip 12 truept

782: \noindent

783: Performance of SVM global models of ground-state parity.

784: For all three models, $C=0.1$, $\varepsilon =0.01$.  Model selection

785: is guided by best performance on the validation set, consistent with

786: a perfect score on the training set.

787: \vskip 27 truept

788: \input tables.tex

789: \nrows=5

790: \ncols=6

791: \begintable

792: \|  \quad Learning & Set \qquad   | \quad Validation & Set \qquad

793:  |\qquad Test & Set \qquad | RBF kernel \crthick

794: Classes  \|\# Nuclides & Score  |\# Nuclides & Score |\# Nuclides&

795: Score|$\gamma$\crthick

796: EO   \|  ~474     & 100\%~ | ~58        & 93\%~ |~52        & 83\%~| 9.232 \crnorule

797: OE   \|  ~466     & 100\%~ | ~57        & 89\%~ |~51        & 90\%~| 9.482 \crnorule

798: OO   \|  ~434     & 100\%~ | ~53        & 87\%~ |~48        & 84\%~| 9.176

799: \endtable

800: \vskip 1truecm

801: \centerline{\bf{Table 5}}

802: \vskip 12 truept

803: \noindent

804: Performance of SVM global models of ground-state parity.

805: For all three models, $C=0.1$, $\varepsilon =0.01$.  In this case,

806: model selection is guided by best performance on the validation

807: set, allowing for minimal nonzero error rate on the training set.

808: \vskip 27 truept

809: \input tables.tex

810: \nrows=5

811: \ncols=8

812: \begintable

813: \| \quad Learning & Set \qquad   | \quad Validation & Set \qquad  |

814: \qquad Test & Set \qquad | RBF kernel \crthick

815: Classes \|\# Nuclides & Score  |\# Nuclides & Score |\# Nuclides& Score|$\gamma$\crthick

816: EO  \|  ~474      & 100\%~ | ~58        & 91\%~ |~52        & 83\%~| 0.678 \crnorule

817: OE  \|  ~466      & 95\%~  | ~57        & 84\%~ |~51        & 92\%~| 0.180 \crnorule

818: OO  \|  ~434      & 96\%~  | ~53        & 83\%~ |~48        & 86\%~| 0.240

819: \endtable

820: \vskip 14truept

821: \endinsert

822: \vskip 1truecm

823:

824: \topinsert

825: \centerline{\bf{Table 6}}

826: \vskip 12 truept

827: \noindent

828: Performance of SVM global models of nuclear ground-state spin.

829: For all three models, $C=0.1$, $\varepsilon =0.01$.  Model selection

830: is guided by best performance on the validation set, consistent with a

831: perfect score on the training set.

832: \vskip 27 truept

833: \input tables.tex

834: \nrows=5

835: \ncols=6

836: \begintable

837: \| \quad \quad Learning & Set  \qquad  | \quad Validation/Test & Set \qquad \

838: | RBF kernel  \crthick

839: Classes  \|\# Nuclides & Score  |\# Nuclides & Score | $\gamma$    \crthick

840: EO       \|  ~528      & 100\%~ | ~58        & 81\%~ | 9.217       \crnorule

841: OE       \|  ~522      & 100\%~ | ~57        & 68\%~ | 9.001       \crnorule

842: OO       \|  ~488      & 100\%~ | ~54        & 43\%~ | 4.002

843:

844: \endtable

845: \vskip 14truept

846: \endinsert

847:

848: \noindent

849: The second set is used to help pin down the RBF parameter

850: $\gamma$ and thereby plays a role in model selection.  Hence it must be

851: interpreted as a validation set.  SVM models are constructed for a range

852: of $\gamma$ values, and the model whose $\gamma$ value produces the

853: lowest error on the second data set (while scoring 100\% on the

854: training set) is selected.  There is no real test set in this experiment.
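Stated compactly, and in notation introduced here only for illustration, let $E_{\rm train}(\gamma)$ and $E_{\rm val}(\gamma)$ denote the misclassification rates on the training set and on the second set for the model built with a given $\gamma$.  The selection rule just described then amounts to
$$
\gamma^{*} = \arg \min_{\gamma \in \Gamma_0} E_{\rm val}(\gamma) ,
\qquad
\Gamma_0 = \{ \gamma : E_{\rm train}(\gamma) = 0 \} ,
$$
where the minimization runs over the finite set of $\gamma$ values actually tried.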

855: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

856: \topinsert

857: \centerline{\bf{Table 7}}

858: \vskip 12 truept

859: \noindent

860: Performance of SVM global models of nuclear ground-state spin.

861: For all three models, $C=0.1$, $\varepsilon =0.01$.  The

862: parameter $\gamma$ is fixed at the value determined for Table 6.

863: The test set influences model choice only indirectly.

864: \vskip 27 truept

865: \input tables.tex

866: \nrows=5

867: \ncols=6

868: \begintable

869: \|   Learning & Set    | Validation & Set   |Test & Set | RBF kernel \crthick

870: Classes \|\# Nuclides & Score |\# Nuclides & Score |\# Nuclides& Score~|$\gamma$\crthick

871: EO   \|  ~476      & 100\%~ | ~58        & 79\%~ |~52        & 60\%~~| 9.217 \crnorule

872: OE   \|  ~470      & 100\%~ | ~57        & 61\%~ |~52        & 79\%~~| 9.001 \crnorule

873: OO       \|  ~440      & 100\%~ | ~54        & 39\%~ |~48        & 38\%~~| 4.002

874: \endtable

875: \vskip 14truept

876: \endinsert

877:

878: In an alternative experiment, we have implemented a protocol intermediate

879: between the training-validation scheme leading to Table 6, and the full

880: training-validation-test procedure.  The data for each of the three

881: even-oddness classes involved are divided into three subsets as follows.

882: The second subset is taken to be identical to the second subset formed

883: in the first experiment.  The first subset, used as the training set,

884: consists of 80\% of the examples for the class in question, these being

885: chosen at random from the corresponding training set created in

886: the first experiment.  The 10\% that are

887: not so chosen constitute the third subset, which is regarded as

888: a test set.  Then, using the {\it same} parameter $\gamma$ as

889: determined in the first experiment with the aid of the second

890: subset, new SVM models are developed from the examples in the reduced

891: training set.  These models are used to generate spin values for both

892: second and third subsets -- values which may differ from those given by the

893: models developed in the first experiment (see Table 7).  Although

894: it is not legitimate to interpret the third subset as a test set in the

895: purest sense, its influence on model selection is indirect.

896:

897: From the results shown in Tables 6 and 7, one may plausibly infer

898: that support vector machines can perform very well on the

899: problem of predicting nuclear ground-state spins.  While further

900: experiments are needed to affirm this conclusion, it is already

901: of interest to compare our SVM models with other global models

902: of nuclear spin systematics.  Global nuclear structure calculations

903: within the macroscopic/microscopic approach [26] reproduce

904: the ground-state spins of odd-$A$ nuclei with an accuracy of

905: 60\% (agreement being found in 428 examples out of 713).

906: (In this work, there is no clear distinction between fitting

907: and prediction, or between training, validation, and test

908: sets.) Multilayer feedforward neural networks do somewhat

909: better [25,14].  Averaging over results of three experiments involving

910: nets having a single hidden layer and trained with backpropagation,

911: the performance for odd-$A$ nuclei reaches 62\% on what are

912: effectively validation sets, the training sets being

913: reproduced to an accuracy of 93\%.  In an experiment in which

914: the connection weights of feedforward nets with one hidden layer

915: are determined by a conjugate gradient procedure, performance

916: at the level of 99.5\% on the training set and 73.2\% on

917: a validation set has been achieved for OE nuclei.  The

918: spins of odd-odd nuclei are notoriously difficult to predict.

919: This is reflected in the performance figures of neural-network

920: (perceptron) models on the OO category, which are typically

921: 75\% correct on training-set examples and only 15\% in validation or

922: testing.

923:

924: Placed in the context of earlier work, both statistical

925: and phenomenological, the results in Tables 6--7

926: for the first SVM models of nuclear spin speak for themselves.
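For a rough numerical comparison, the EO and OE entries may be pooled into a single odd-$A$ figure by weighting each score with the number of nuclides in the set.  This pooling is an assumption made here for illustration, and the tabulated scores are themselves rounded.  For the validation sets of Table 6 one finds
$$
{ 0.81 \times 58 + 0.68 \times 57 \over 58 + 57 } = { 85.74 \over 115 } \simeq 0.75 ,
$$
and for the test sets of Table 7,
$$
{ 0.60 \times 52 + 0.79 \times 52 \over 52 + 52 } = { 72.28 \over 104 } = 0.695 .
$$
These values may be set beside the 60\% of the macroscopic/microscopic calculations and the 62\% reached by the neural networks on validation sets, with the caveat that the underlying data sets differ, so that the comparison is only indicative.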

927: \vskip 28truept

928:

929: \centerline{\bf 6.  CONCLUDING REMARKS}

930: \vskip 12truept

931:

932: We have made initial studies of the potential of support vector machines (SVMs)

933: for providing statistical models of nuclear systematics with demonstrable

934: predictive power.  Using SVM regression and classification procedures,

935: we have created global models of atomic masses, beta-decay halflives,

936: and ground-state spins and parities.  These models exhibit performance

937: in both data-fitting and prediction that is comparable to that of

938: the best global models from nuclear phenomenology and microscopic theory,

939: as well as the best statistical models based on multilayer feedforward

940: neural networks.  Further work to develop the scope, acuity, and reliability

941: of SVM applications to nuclear physics seems to be warranted.  In particular,

942: the full body of data in the AME03 atomic-mass evaluation [18] must be brought to

943: bear in construction of SVM models of mass systematics, and the treatment

944: of the spin problem begun here needs to be completed.  Fruitful applications

945: to nucleon separation energies, $\alpha$-decay halflives,

946: branching ratios of nuclear decay, nuclear deformations, neutron

947: cross sections, and other nuclear properties may also be on the horizon.

948: \vskip 28truept

949:

950: \centerline{\bf ACKNOWLEDGMENTS}

951: \vskip 12 truept

952: This research was supported in part by the U.~S.~National

953: Science Foundation under Grant No.~PHY-0140316.  For the regression

954: problems, we made use of the on-line mySVM software and instruction manual

955: of Stefan R\"uping (Dortmund) [27], and for classification problems

956: we implemented the SVM-multiclass software of Thorsten Joachims

957: (Cornell) [28].

958: \vskip 28truept

959:

960: \centerline{\bf REFERENCES}

961: \vskip 12 truept

962: \item{[1]}

963: D.~E.~Rumelhart, G.~E.~Hinton, and R.~J.~Williams, in {\it

964: Parallel Distributed Processing: Explorations in the Microstructure

965: of Cognition}, Vol.~1, edited by D.~E.~Rumelhart {\it et al.} (MIT Press,

966: Cambridge, MA, 1986).

967: \item{[2]}

968: J.~Hertz, A.~Krogh, and R.~G.~Palmer, {\it Introduction to the Theory

969: of Neural Computation} (Addison-Wesley, Redwood City, CA, 1991).

970: \item{[3]}

971: S.~Haykin, {\it Neural Networks: A Comprehensive Foundation}, Second

972: Edition (McMillan, New York, 1999).

973: \item{[4]}

974: J.~W.~Clark, T.~Lindenau, and M.~L.~Ristig, {\it Scientific Applications

975: of Neural Nets} (Springer-Verlag, Berlin, 1999).

976: \item{[5]}

977: S.~Athanassopoulos, E.~Mavrommatis, K.~A.~Gernoth, and J.~W.~Clark,

978: {\it Nucl.~Phys.~A} {\bf 743}, 222 (2004).

979: \item{[6]}

980: C.~Cortes and V.~Vapnik, {\it Machine Learning} {\bf 20}, 273 (1995).

981: \item{[7]}

982: V.~N.~Vapnik, {\it The Nature of Statistical Learning Theory} (Springer-Verlag,

983: New York, 1995).

984: \item{[8]}

985: V.~N.~Vapnik, {\it Statistical Learning Theory} (Wiley, New York, 1998).

986: \item{[9]}

987: V.~N.~Vapnik, in {\it Advances in Neural Information Processing Systems},

988: Vol.~4 (Morgan Kaufmann, San Mateo, CA, 1992), p.~831.

989: \item{[10]}

990: J.~Mercer, {\it Philosophical Transactions of the Royal Society of London, Series A} {\bf 209},

991: 415 (1909).

992: \item{[11]}

993: V.~N.~Vapnik and A.~Ya.~Chervonenkis, {\it Theory of Probability

994: and Its Applications} {\bf 16}, 264 (1971).

995: \item{[12]}

996: M.~O.~Stitson, A.~Gammerman, V.~Vapnik, V.~Vovk, C.~Watkins, and J.~Weston,

997: in {\it Advances in Kernel Methods -- Support Vector Learning},

999: edited by B.~Sch\"olkopf, C.~Burges, and A.~J.~Smola

1000: (MIT Press, Cambridge, MA, 1999), p.~285.

1001: \item{[13]}

1002: P.~M\"oller and J.~R.~Nix, {\it At.~Data~Nucl.~Data Tables} {\bf 26},

1003: 165 (1981).

1004: \item{[14]}

1005: K.~A.~Gernoth, J.~W.~Clark, J.~S.~Prater, and H.~Bohr, {\it Phys.~Lett.} {\bf B300},

1006: 1 (1993).

1007: \item{[15]}

1008: P.~M\"oller and J.~R.~Nix, {\it J.~Phys.~G} {\bf 20}, 1681 (1994).

1009: \item{[16]}

1010: P.~M\"oller, J.~R.~Nix, W.~D.~Myers, and W.~J.~Swiatecki,

1011: {\it At.~Data Nucl.~Data Tables} {\bf 59}, 185 (1995).

1012: \item{[17]}

1013: M.~Samyn, S.~Goriely, P.-H.~Heenen, J.~M.~Pearson, and F.~Tondeur,

1014: {\it Nucl.~Phys.~A} {\bf 700}, 142 (2002);

1015: S.~Goriely, M.~Samyn, P.-H.~Heenen, J.~M.~Pearson, and F.~Tondeur,

1016: {\it Phys.~Rev.~C} {\bf 66}, 024326 (2002).

1017: \item{[18]}

1018: A.~H.~Wapstra, G.~Audi, and C.~Thibault, {\it Nucl.~Phys.~A} {\bf 729}, 337

1019: (2003).

1020: \item{[19]}

1021: E.~Mavrommatis, A.~Dakos, K.~A.~Gernoth, and J.~W.~Clark, in {\it Condensed

1022: Matter Theories}, Vol. 13, edited by J.~da Providencia and F.~B.~Malik

1023: (Nova Science Publishers, Commack, NY, 199), p.~423.

1024: \item{[20]}

1025: J.~W.~Clark, E.~Mavrommatis, S.~Athanassopoulos, A.~Dakos, and

1026: K.~A.~Gernoth, in {\it Fission Dynamics of Atomic Clusters and Nuclei},

1027: edited by D.~M.~Brink, F.~F.~Karpechine, F.~B.~Malik, and J.~da Providencia

1028: (World Scientific, Singapore, 2001), p.~76. [nucl-th/0109081]

1029: \item{[21]}

1030: A.~Staudt, E.~Bender, K.~Muto, and H.~V.~Klapdor,

1031: {\it At.~Data Nucl.~Data Tables} {\bf 44}, 80 (1990).

1032: \item{[22]}

1033: H.~Homma, E.~Bender, M.~Hirsch, K.~Muto, and H.~V.~Klapdor-Kleingrothaus,

1034: {\it Phys.~Rev.~C} {\bf 54}, 2972 (1996).

1035: \item{[23]}

1036: P.~M\"oller, J.~R.~Nix, and K.~L.~Kratz,

1037: {\it At.~Data Nucl.~Data Tables} {\bf 66}, 131 (1997).

1038: \item{[24]}

1039: N.~Costiris, A.~Dakos, E.~Mavrommatis, K.~A.~Gernoth, and J.~W.~Clark,

1040: to be published.

1041: \item{[25]}

1042: J.~W.~Clark, S.~Gazula, K.~A.~Gernoth, J.~Hasenbein, J.~S.~Prater,

1043: and H.~Bohr, in {\it Recent Progress in Many-Body Theories},

1044: Vol.~3, edited by T.~L.~Ainsworth, C.~E.~Campbell, B.~E.~Clements,

1045: and E.~Krotscheck (Plenum, New York, 1992), p.~371.

1046: \item{[26]}

1047: P.~M\"oller and J.~R.~Nix, {\it Nucl.~Phys.~A}~{\bf 520}, 369c (1990).

1048: \item{[27]}

1049: S. R\"uping, mySVM,

1050: %${\rm http://www}$-ai.cs.uni-dortmund.de${\rm /SOFTWARE/MYSVM/}$

1051: http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/

1052: (2004).

1053: \item{[28]}

1054: T. Joachims (2004), Multi-Class Support Vector Machine,

1055: %${\rm http://www.cs.cornell.edu/}$ People/tj/svm\_light/svm\_multiclass.html (2004).

1056: http://www.cs.cornell.edu/People/tj/svm\_light/svm\_multiclass.html (2004).

1057: \bye

1058: