0006:physics0006064/ver3.tex

1: \documentclass[12pt]{article}

2: %\documentstyle[osa,osabib,manuscript]{revtex}

3: %\newcommand{\MF}{{\large{\manual META}\-{\manual FONT}}}

4: %\newcommand{\manual}{rm}        % Substitute rm (Roman) font.

5: %\newcommand\bs{\char '134 }     % add backslash char to \tt font

6: %

7: \usepackage{times}

8:

9: \textheight 8.5truein \textwidth 6.5truein \hoffset=0truein

10: \voffset=0.truein

11:

12: \begin{document}

13:

14: \def\bA{{\bf A}}

15: \def\bP{{\bf P}}

16: \def\bE{{\bf E}}

17: \def\ba{{\bf a}}

18: \def\be{{\bf e}}

19: \def\bv{{\bf v}}

20: \def\bk{{\bf k}}

21:

22: \bibliographystyle{prsty}

23:

24:

25: \title{Improvements of the Discrete Dipole Approximation method}

26:

27: \author{Piotr J. Flatau \\

28: Scripps Institution of Oceanography, University of California, San

29: Diego, \\La Jolla, California 92093-0221}

30:

31:

32:

33: \maketitle

34:

35: \begin{abstract}

36: We report improvements in complex conjugate gradient algorithms

37: applied to the discrete dipole approximation (DDA). It is shown

38: that computational time is reduced by using  the Bi-CGSTAB version

39: of the CG algorithm, with diagonal left preconditioning.

40:

41: Key words: scattering, non-spherical particles, discrete dipole

42: approximation.

43:

44: \begin{center}

45: Optics Letters 1997, volume 22, number 16, 1205-1207.

46: \newline {\copyright\ Optical Society of America, 1997.}

47: \end{center}

48: \end{abstract}

49:

50:

51:

52:

53: The discrete-dipole approximation (DDA) is a flexible technique

54: for computing scattering and absorption by targets of arbitrary

55: geometry. The use of the DDA for scattering calculations is

56: reviewed in Ref.~\cite{Draine94a}. Rather than ``direct'' methods,

57: iterative methods have proven effective and efficient for

58: solving the system of linear equations arising in the DDA

59: problem.

60:

61: In this paper we perform a systematic study of various

62: non-stationary iterative (conjugate gradient) methods in search

63: of the most efficient one. We document the implementation of these

64: methods in our public domain code DDSCAT.5a.\cite{Draine94a}

65:

66:

67:

68:

69: Numerical aspects of the discrete dipole approximation continue to

70: be of great interest. Yung \cite{Yung78a} applied a conjugate

71: gradient method to the DDA. Hoekstra

72: \cite{Hoekstra94b} identifies Yung's scheme as the conjugate

73: gradient (CG) algorithm proposed by Hestenes  \cite{Hes52a}.

74: Rahola \cite{Rahola96a} discusses the solution of dense systems of

75: linear equations in the discrete-dipole approximation and the choice

76: of the best iterative method in this application. Draine

77: \cite{Draine88a} implemented a conjugate gradient method based on

78: work of Petravic  and Kuo-Petravic. \cite{Petravic79a} This

79: implementation is quite robust and has been  used for many years.

80: \cite{Draine94a} However, Lumme and Rahola \cite{Lumme94a} applied

81: the quasi-minimal residual (QMR) conjugate gradient algorithm to

82: the system of linear equations arising in DDA applications.

83: They claim that the QMR method is approximately 3 times faster

84: than the one employed in the DDSCAT code.

85: \cite{Draine94a} It was this work which prompted us to perform the

86: analysis reported here.

87:

88:

89:

90:

91:

92: PIM\cite{Cunha95a} is a collection of Fortran 77 routines designed

93: to solve systems of linear equations  on parallel and sequential

94: computers using a variety of iterative methods.

95:

96: PIM contains implementations of various methods:

97: Conjugate-Gradients (CG); Conjugate-Gradients for normal equations

98: with minimization of the residual norm (CGNR); Conjugate-Gradients

99: for normal equations with minimization of the error norm (CGNE);

100: Bi-Conjugate-Gradients (Bi-CG); Conjugate-Gradients squared (CGS);

101: the stabilised version of Bi-Conjugate-Gradients (Bi-CGSTAB); the

102: restarted, stabilised version of Bi-Conjugate-Gradients

103: (RBi-CGSTAB); the restarted, generalized minimal residual

104: (RGMRES); the restarted, generalized conjugate residual (RGCR);

105: the quasi-minimal residual with coupled two-term recurrences

106: (QMR); the transpose-free quasi-minimal residual (TFQMR); and

107: Chebyshev acceleration. The routines allow the use of

108: preconditioners; the user may choose to use left-, right- or

109: symmetric-preconditioning.

110:

111:

112: The convergence rate of iterative methods depends on the

113: coefficient matrix. Hence one may attempt to transform the linear

114: system into one that is equivalent (in the sense that it has the

115: same solution) but is easier to solve. A preconditioner is a

116: matrix $M$ that effects such a transformation. It is possible to

117: introduce left and right preconditioners.\cite{Barrett94a} The

118: simplest preconditioner consists of just the diagonal of the

119: coefficient matrix. This is known as the (point) Jacobi

120: preconditioner.
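
As a sketch of these transformations, write the linear system as $A x = b$ (the symbols $A$, $x$, $b$, and $y$ are notation introduced here for illustration only). Left preconditioning replaces the system by
\begin{equation}
M^{-1} A x = M^{-1} b ,
\end{equation}
while right preconditioning solves for an auxiliary vector $y$,
\begin{equation}
A M^{-1} y = b , \qquad x = M^{-1} y .
\end{equation}
For the Jacobi preconditioner $M = {\rm diag}(A)$, so applying $M^{-1}$ amounts to dividing each component of a vector by the corresponding diagonal element of $A$.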

121:

122:

123:

124:

125: To compare these different algorithms we have used them to find

126: solutions to the problem of scattering by a homogeneous sphere.

127: The scattering problem is specified by the usual size parameter

128: $x= 2 \pi a / \lambda$, where $a$ is the radius and $\lambda$ is the wavelength.

129:

130: Tables~\ref{table1} and \ref{table4} present the number of

131: iterations and CPU time for size parameters $x=0.1$ and $x=1$ and

132: for several values of refractive index. The conjugate gradient

133: methods are defined as above. Label (L) indicates left Jacobi

134: preconditioning. For example CGNE(L) is the conjugate gradient

135: method  for normal equations with minimization of the error norm

136: and left Jacobi preconditioning. Similarly, (R) indicates right

137: Jacobi preconditioning. CPU time (sequential Silicon Graphics

138: workstation) is normalized to the ``best'' method. A star indicates

139: that the method did not converge in the maximum allowed number of

140: iterations or that the method failed to converge. A fractional error

141: of $10^{-5}$ was used as the stopping criterion. The DDSCAT.5a

142: code\cite{Draine94a}  with the newly implemented GPFA fast Fourier

143: transform method was used. For Bi-CGSTAB and CGNE we also used left and

144: right Neumann polynomial preconditioners truncated after the first

145: term. Thus, Bi-CGSTAB(N)(L)  indicates the stabilised version of

146: the Bi-Conjugate-Gradients method with the left Neumann polynomial

147: preconditioner.
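
The idea behind a Neumann polynomial preconditioner can be sketched as follows; the symbols $D$, $G$, and $m$ are notation introduced here, and the truncation order that corresponds to the ``first term'' above is an assumption, not something stated in the text. With $D = {\rm diag}(A)$ and $G = I - D^{-1} A$, one has $A = D (I - G)$, and when the Neumann series converges,
\begin{equation}
A^{-1} = (I - G)^{-1} D^{-1} = \sum_{k=0}^{\infty} G^{k} D^{-1} .
\end{equation}
Truncating the series at order $m$ gives the approximate inverse
\begin{equation}
M^{-1} = \sum_{k=0}^{m} G^{k} D^{-1} .
\end{equation}
The case $m=0$ reduces to the Jacobi preconditioner. Assuming $m=1$ is meant, $M^{-1} = (2 I - D^{-1} A) D^{-1}$, so each application of the preconditioner requires one additional multiplication by $A$.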

148:

149: Table~\ref{table1} presents results for size parameter $x=0.1$ and

150: real refractive index $n=1.33, 2, 3, 5$ as well as one case

151: with a small imaginary part of the refractive index, $n=(5, 0.0001)$.

152: In Table~\ref{table1} the CPU times are

153: normalized to the CG(L)  method, which was found to be the best

154: method. For example, it is 4.0 times faster than

155: CGNE for $n=(1.33,0)$. For larger values of the real refractive

156: index CGNE is almost an order of magnitude slower than

157: CG. This is because more iterations are needed for

158: the same convergence and because the cost of one CG iteration is less

159: than the cost of one CGNE iteration. The QMR algorithm is never

160: competitive and actually fails to converge for large real

161: refractive indices. For small refractive index the Bi-CGSTAB

162: algorithm is comparable to CG and requires fewer iterations.

163: However, the cost per iteration is larger than for CG,

164: which offsets the advantage of the smaller number of iterations. The

165: Petravic  and Kuo-Petravic \cite{Petravic79a} algorithm used by us

166: for many years \cite{Draine88a} is similar to CGNR and CGNE.

167: However, we observed on occasion slightly different convergence

168: rates due to stabilization of the Petravic  and Kuo-Petravic algorithm

169: every 10th iteration. \cite{Draine88a} This is true for all other

170: cases. CG, CGNE, and CGNR each require storage of $6 \times N$,

171: BiCG requires $8 \times N$, CGS, Bi-CGSTAB, and TFQMR require

172: $10 \times N$, and QMR requires $11 \times N$. Thus, for purely real

173: refractive index, CG is not only the fastest method but

174: also requires the least amount of temporary storage. It can be seen

175: that left preconditioning by the inverse of the diagonal of the DDA

176: matrix \cite{Draine94a}  reduces the number of iterations needed.

177: The added time needed for division by the diagonal elements is

178: generally negligible in comparison to the time saved by the smaller

179: number of iterations. Among Bi-CGSTAB,

180: Bi-CGSTAB(L), and Bi-CGSTAB(R), only the left Jacobi preconditioned

181: variant converges for the largest refractive index. Restarted

182: methods (RBi-CGSTAB and RGCR) appear not to be competitive, but

183: further study may be needed (we used an orthogonal basis of 10

184: vectors for all restarted methods). The CG method is also

185: competitive in cases with small absorption (see last column of

186: Table~\ref{table1}). We have also calculated (not presented)

187: results for size parameter $x=0.1$ and increasing imaginary part

188: of the refractive index, $n=(1.33,0)$, $(1.33, 0.01)$, $(1.33,0.1)$,

189: $(1.33,1)$, $(1.33, 2)$, $(1.33,3)$. BiCGSTAB(L) proved to

190: be the most robust method. However, CGS(L) is competitive and

191: faster for $n=(1.33, 3)$. Both CGS(L) and BiCGSTAB(L) require the

192: same number of iterations for convergence and their cost is

193: similar. These methods are between 1.6 and 2.9 times faster

194: than CGNR, the method used in the DDSCAT code. QMR and

195: TFQMR, which Lumme and Rahola \cite{Lumme94a} claim to be faster

196: than CGNR and the DDSCAT implementation, occasionally fail to converge,

197: and when they work they are only slightly better in

198: this case. As before, left Jacobi preconditioning is almost always

199: beneficial. The CG(L) algorithm is faster than BiCGSTAB(L) for

200: refractive index $n=(1.33,0)$, $(1.33,0.01)$, $(1.33,0.1)$.

201:

202:

203: Table~\ref{table4} is for size parameter $x=1$. All the results

204: are normalized to Bi-CGSTAB(L). This method is clearly superior to

205: the CGNR method and is 2--4.3 times faster. It can be seen that CGNR

206: converges slowly, and has not satisfied the stopping criterion in

207: 140 iterations for $n=(3,0.0001)$. For this larger value of the size

208: parameter the QMR algorithm does not converge well but its smooth

209: version TFQMR does. However, TFQMR is slower than

210: Bi-CGSTAB(L) and comparable to CGNR. The CG(L) method for

211: refractive index $n=(1.33,0)$ and $n=(1.33,0.01)$ is faster than the

212: reference scheme Bi-CGSTAB(L). It can be seen that the Neumann

213: polynomial preconditioning Bi-CGSTAB(N)(L) or Bi-CGSTAB(N)(R) does

214: reduce the number of iterations needed for certain cases of

215: refractive index.  However, the cost associated with the additional

216: calculations always offsets this improved convergence rate. As

217: before, the  left Jacobi preconditioner is superior to right or

218: no-preconditioner cases. CG(L) works well for small refractive

219: index but is comparable to Bi-CGSTAB(L). The QMR algorithm fails

220: to converge but the transpose-free quasi-minimal residual (TFQMR)

221: algorithm converges well and is comparable to CGNR. The CG method

222: is theoretically valid for Hermitian positive definite matrices.

223: The matrix arising in the DDA is not Hermitian but complex symmetric.

224: Therefore, strictly speaking, the CG method  is not valid for use

225: in the DDA. Users are advised to test the CG  method when

226: extrapolating the results presented here to different size parameters,

227: particle sizes,  and  refractive index values.
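
To make the distinction explicit (the notation $A^{T}$ for the transpose and $A^{H}$ for the conjugate transpose is introduced here for illustration): a Hermitian matrix satisfies
\begin{equation}
A = A^{H} \equiv \bar{A}^{T} ,
\end{equation}
whereas a complex symmetric matrix satisfies only
\begin{equation}
A = A^{T} .
\end{equation}
The two conditions coincide when all elements of $A$ are real; in general a complex symmetric matrix has $A^{H} = \bar{A} \neq A$.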

228: \begin{table}[ht]

229: \caption{\label{table1}CPU time (normalized) and, in parentheses, number of

230: iterations for $x=0.1$. A star indicates that the method did not converge.}

231: \begin{tabular}{lccccc}

232:  Method  & $n=(1.33,0)$ & $(2,0)$ & $(3,0)$ & $(5,0)$ & $(5,0.0001)$  \\

233: \hline CGNE &    4.0(9) &    4.9(24) &    8.7(76) &  *(540) &

234: *(540) \\ CGNE(L) &    3.3(7) &    4.0(19) &    7.8(67) &  *(540)

235: &  *(540) \\ CGNE(R) &    4.1(9) &    4.9(24) &    8.8(76) &

236: *(540) &  *(540) \\ CGNE(N)(L) &    3.1(4) &    4.1(13) &

237: 19.3(113) &  *(540) &  *(540) \\ CGNE(N)(R) &    4.4(6) & 7.7(25)

238: &  *(140) &  *(540) &  *(540) \\ CGNR &    4.0(9) & 4.7(23) &

239: 8.0(69) &  *(540) &  *(540) \\ CGNR(L) &    3.3(7) & 4.0(19) &

240: 5.9(50) &    4.6(329) &    3.5(330) \\ CGNR(R) & 4.1(9) &

241: 4.7(23) &    8.0(69) &  *(540) &  *(540) \\ QMR & 3.7(6) &

242: 3.3(11) &    3.3(19) &  *(111) &  *(78) \\ QMR(L) & 2.6(4) &

243: 2.8(9) &    2.7(15) &  *(75) &  *(92) \\ QMR(R) & 3.8(6) &

244: 3.4(11) &    3.4(19) &  *(268) &  *(540) \\ CG & 1.4(6) &

245: 1.2(11) &    1.2(20) &    1.2(163) &    1.1(213) \\ CG(L) &

246: 1.0(4) &    1.0(9) &    1.0(16) &    1.0(138) & 1.0(182) \\ CG(R)

247: &    1.4(6) &    1.2(11) &    1.2(20) & 1.1(157) &    1.2(213) \\

248: BiCG &    2.3(6) &    2.1(11) &  *(140) &  *(540) &  *(540) \\

249: BiCG(L) &    1.6(4) &    1.8(9) &  *(140) & *(540) &  *(540) \\

250: BiCG(R) &    2.4(6) &    2.2(11) &  *(140) & *(540) &  *(540) \\

251: Bi-CGSTAB &    1.8(4) &    1.5(7) &    1.5(13) &  *(540) &  *(540)

252: \\ Bi-CGSTAB(L) &    1.4(3) &    1.3(6) & 1.3(11) &    4.0(281) &

253: 4.2(388) \\ Bi-CGSTAB(R) &    1.8(4) & 1.5(7) &    1.6(13) &

254: *(540) &  *(540) \\ Bi-CGSTAB(N)(L) & 1.9(2) &    6.8(17) &

255: 14.9(65) &  *(540) &  *(540) \\ Bi-CGSTAB(N)(R) &    2.1(2) &

256: 13.1(33) &   27.9(122) &  *(540) & *(540) \\ TFQMR &    3.8(5) &

257: 3.4(9) &    4.3(19) &  *(540) & *(540) \\ TFQMR(L) &    3.1(4) &

258: 3.1(8) &    4.1(18) &  *(540) &  *(540) \\ TFQMR(R) &    3.9(5) &

259: 3.5(9) &    4.4(19) & *(540) &  *(540) \\ CGS &    1.7(4) &

260: 1.5(7) &    1.4(12) & *(540) &  *(540) \\ CGS(L) &    1.4(3) &

261: 1.3(6) &    1.2(10) & *(540) &  *(540) \\ CGS(R) &    1.8(4) &

262: 1.5(7) &    1.4(12) & *(540) &  *(540) \\ RGCR &    4.3(2) &

263: 2.8(2) &    2.5(3) & *(14) &  *(14) \\ RGCR(L) &    2.0(1) &

264: 2.4(2) &    2.1(2) & *(14) &  *(14) \\ RGCR(R) &    4.4(2) &

265: 2.8(2) &    2.7(3) & *(14) &  *(14) \\ RBi-CGSTAB &  *(12) &

266: *(12) &  *(12) &  *(12) & *(12) \\ RBi-CGSTAB(L) &  *(12) &  *(12)

267: &  *(12) &  *(12) & *(12) \\ RBi-CGSTAB(R) &  *(12) &  *(12) &

268: *(12) &  *(12) & *(12) \\ \hline

269: \end{tabular}

270: \end{table}

271: \begin{table}[ht]

272: \caption{\label{table4}CPU time (normalized) and, in parentheses, number of

273: iterations for $x=1$. A star indicates that the method did not converge.}

274: \begin{tabular}{lccccc}

275:  Method  & $n=(1.33,0)$ & $(1.33,0.01)$ & $(1.33,1)$ & $(2,0)$ & $(3,0.0001)$  \\

276: \hline CGNE &    3.2(10) &    3.2(10) &    2.0(16) &    4.5(33) &

277: *(140) \\ CGNE(L) &    2.7(8) &    2.6(8) &    1.7(13) & 3.8(27) &

278: *(140) \\ CGNE(R) &    3.2(10) &    3.2(10) & 2.0(16) &    4.6(33)

279: &  *(140) \\ CGNE(N)(L) &    2.6(5) & 2.6(5) &  *(140) &  *(140) &

280: *(140) \\ CGNE(N)(R) &    3.6(7) & 3.6(7) &  *(140) &  *(140) &

281: *(140) \\ CGNR &    3.5(11) & 3.5(11) &    2.0(16) &    4.3(32) &

282: *(140) \\ CGNR(L) &    2.7(8) &    2.6(8) &    1.7(13) &

283: 3.7(27) &  *(140) \\ CGNR(R) & 3.5(11) &    3.5(11) &    2.0(16) &

284: 4.4(32) &  *(140) \\ QMR & *(47) &  *(58) &  *(25) &  *(76) &

285: *(50) \\ QMR(L) &    5.3(12) & *(59) &  *(21) &  *(71) &  *(39) \\

286: QMR(R) &  *(52) &  *(63) & *(22) &  *(70) &  *(37) \\ CG &

287: 1.3(8) &    1.3(8) &  *(140) & *(140) &  *(140) \\ CG(L) &

288: 0.9(5) &    0.9(5) &  *(140) & *(140) &  *(140) \\ CG(R) &

289: 1.3(8) &    1.3(8) &  *(140) & *(140) &  *(140) \\ BiCG &  *(140)

290: &  *(140) &  *(140) &  *(140) & *(140) \\ BiCG(L) &  *(140) &

291: *(140) &  *(140) &  *(140) & *(140) \\ BiCG(R) &  *(140) &  *(140)

292: &  *(140) &  *(140) & *(140) \\ Bi-CGSTAB &    1.3(4) &    1.3(4)

293: &    1.2(10) & 1.2(9) &    1.1(24) \\ Bi-CGSTAB(L) &    1.0(3) &

294: 1.0(3) & 1.0(8) &    1.0(7) &    1.0(21) \\ Bi-CGSTAB(R) &

295: 1.3(4) & 1.3(4) &    1.2(10) &    1.3(9) &    1.1(24) \\

296: Bi-CGSTAB(N)(L) & 1.4(2) &    1.4(2) &    1.5(6) &    4.6(17) &

297: *(140) \\ Bi-CGSTAB(N)(R) &    1.5(2) &    1.5(2) &    1.6(6) &

298: *(140) & *(140) \\ TFQMR &    3.3(6) &    3.3(6) &    3.3(14) &

299: 3.4(13) &    3.8(42) \\ TFQMR(L) &    2.8(5) &    2.8(5) &

300: 3.0(13) & 3.0(11) &    3.7(40) \\ TFQMR(R) &    3.4(6) &    3.4(6)

301: & 3.3(14) &    3.4(13) &    3.9(42) \\ CGS &    1.3(4) &    1.3(4)

302: & 1.3(11) &    1.4(10) &    1.6(34) \\ CGS(L) &    1.0(3) & 1.0(3)

303: &    1.2(10) &    1.3(9) &    1.1(23) \\ CGS(R) &    1.3(4) &

304: 1.3(4) &    1.3(11) &    1.4(10) &    1.6(33) \\ RGCR & 3.7(2) &

305: 4.5(2) &  *(14) &    3.3(3) &  *(14) \\ RGCR(L) & 3.2(2) &

306: 3.1(2) &    6.5(6) &    2.9(3) &  *(14) \\ RGCR(R) & 3.7(2) &

307: 3.7(2) &    7.4(7) &    3.3(3) &  *(14) \\ \hline

308: \end{tabular}

309: \end{table}

310:

311:

312:

313:

314:

315:

316:

317:

318:

319:

320: We recommend use of the stabilized version of the Bi-conjugate

321: gradient algorithm with left Jacobi preconditioning

322: [Bi-CGSTAB(L)]. This algorithm requires 67\% greater storage than

323: the CGNR algorithm, but is typically 2-3 times faster.
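
The storage figure follows from the requirements quoted earlier, namely $6 \times N$ for CGNR and $10 \times N$ for Bi-CGSTAB; as a short check,
\begin{equation}
\frac{10 N - 6 N}{6 N} = \frac{2}{3} \approx 67\% .
\end{equation}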

324:

325: The recent version of the discrete dipole approximation code DDSCAT.5a

326: developed by Draine and Flatau contains improvements documented in

327: this paper. The code is available via anonymous ftp from the

328: \verb|ftp.astro.princeton.edu| site or from the Light Scattering

329: and Radiative Transfer Codes Library --- \verb|SCATTERLIB|

330: (\verb|http://atol.ucsd.edu/~pflatau|).

331:

332:

333:

334:

335:

336:

337:

338:

339: I have been supported in part by the Office of Naval Research

340: Young Investigator Program and in part by DuPont Corporate

341: Educational Assistance. I would like to thank Drs M. J. Wolff and

342: A. E. Ilin who  helped with computer tests. Bruce Draine checked

343: the manuscript. Dr. R. J.  Riegert of Du Pont is acknowledged for

344: his continuing interest in DDSCAT developments.

345:

346:

347:

348:

349:

350: \bibliography{all,cg,fft,local}

351:

352:

353: \end{document}

354: