1: \documentclass[12pt]{article}
2: %\documentstyle[osa,osabib,manuscript]{revtex}
3: %\newcommand{\MF}{{\large{\manual META}\-{\manual FONT}}}
4: %\newcommand{\manual}{rm} % Substitute rm (Roman) font.
5: %\newcommand\bs{\char '134 } % add backslash char to \tt font
6: %
7: \usepackage{times}
8:
9: \textheight 8.5truein \textwidth 6.5truein \hoffset=0truein
10: \voffset=0.truein
11:
12: \begin{document}
13:
14: \def\bA{{\bf A}}
15: \def\bP{{\bf P}}
16: \def\bE{{\bf E}}
17: \def\ba{{\bf a}}
18: \def\be{{\bf e}}
19: \def\bv{{\bf v}}
20: \def\bk{{\bf k}}
21:
22: \bibliographystyle{prsty}
23:
24:
25: \title{Improvements of the Discrete Dipole Approximation method}
26:
27: \author{Piotr J. Flatau \\
28: Scripps Institution of Oceanography, University of California, San
29: Diego, \\La Jolla, California 92093-0221}
30:
31:
32:
33: \maketitle
34:
35: \begin{abstract}
36: We report improvements in complex conjugate gradient algorithms
37: applied to the discrete dipole approximation (DDA). It is shown
38: that computational time is reduced by using the Bi-CGSTAB version
39: of the CG algorithm, with diagonal left preconditioning.
40:
41: Key words: scattering, non-spherical particles, discrete dipole
42: approximation.
43:
44: \begin{center}
45: Optics Letters 1997, volume 22, number 16, 1205-1207.
46: \newline {\copyright\ Optical Society of America, 1997.}
47: \end{center}
48: \end{abstract}
49:
50:
51:
52:
The discrete-dipole approximation (DDA) is a flexible technique
for computing scattering and absorption by targets of arbitrary
geometry; it is reviewed in Ref.~\cite{Draine94a}. Iterative
methods, rather than ``direct'' methods, have proven effective and
efficient for solving the linear system of equations arising in
the DDA.
60:
In this paper we perform a systematic study of various
non-stationary iterative (conjugate gradient) methods in search
of the most efficient one. We document the implementation of these
methods in our public domain code DDSCAT.5a.\cite{Draine94a}
65:
66:
67:
68:
Numerical aspects of the discrete dipole approximation continue to
be of great interest. Yung \cite{Yung78a} applied a conjugate
gradient method to the DDA. Hoekstra
\cite{Hoekstra94b} identifies Yung's scheme as the conjugate
gradient (CG) algorithm proposed by Hestenes \cite{Hes52a}.
Rahola \cite{Rahola96a} discusses the solution of dense systems of
linear equations in the discrete-dipole approximation and the choice
of the best iterative method for this application. Draine
\cite{Draine88a} implemented a conjugate gradient method based on
the work of Petravic and Kuo-Petravic. \cite{Petravic79a} This
implementation is quite robust and has been used for many years.
\cite{Draine94a} However, Lumme and Rahola \cite{Lumme94a} applied
the quasi-minimal residual (QMR) conjugate gradient algorithm to
the system of linear equations arising in DDA applications.
They claim that the QMR method is approximately 3 times faster
than the one employed in the DDSCAT code.
\cite{Draine94a} It was this work that prompted us to perform the
analysis reported here.
87:
88:
89:
90:
91:
92: PIM\cite{Cunha95a} is a collection of Fortran 77 routines designed
93: to solve systems of linear equations on parallel and sequential
94: computers using a variety of iterative methods.
95:
PIM contains implementations of the following methods:
conjugate gradients (CG); conjugate gradients for normal equations
with minimization of the residual norm (CGNR); conjugate gradients
for normal equations with minimization of the error norm (CGNE);
bi-conjugate gradients (Bi-CG); conjugate gradients squared (CGS);
the stabilised version of bi-conjugate gradients (Bi-CGSTAB); the
restarted, stabilised version of bi-conjugate gradients
(RBi-CGSTAB); the restarted generalized minimal residual method
(RGMRES); the restarted generalized conjugate residual method (RGCR);
the quasi-minimal residual method with coupled two-term recurrences
(QMR); the transpose-free quasi-minimal residual method (TFQMR); and
Chebyshev acceleration. The routines allow the use of
preconditioners; the user may choose left, right, or
symmetric preconditioning.
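
For orientation, the two normal-equation variants can be written
out explicitly. This is a sketch in generic notation, not taken
from the PIM documentation: ${\bf A}{\bf x}={\bf b}$ stands for a
general linear system, ${\bf y}$ is an auxiliary vector, and the
superscript $H$ denotes the conjugate transpose (all of these
symbols are introduced here for illustration). In the standard
formulation, CGNR applies CG to
\begin{equation}
{\bf A}^{H}{\bf A}\,{\bf x} = {\bf A}^{H}{\bf b},
\end{equation}
which minimizes the residual norm $\|{\bf b}-{\bf A}{\bf x}\|$,
whereas CGNE applies CG to
\begin{equation}
{\bf A}{\bf A}^{H}{\bf y} = {\bf b}, \qquad {\bf x} = {\bf A}^{H}{\bf y},
\end{equation}
which minimizes the error norm. In both cases the effective matrix
is Hermitian and positive definite for any nonsingular ${\bf A}$,
but each iteration requires one product with ${\bf A}$ and one
with ${\bf A}^{H}$.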
110:
111:
112: The convergence rate of iterative methods depends on the
113: coefficient matrix. Hence one may attempt to transform the linear
114: system into one that is equivalent (in the sense that it has the
115: same solution) but is easier to solve. A preconditioner is a
matrix $M$ that effects such a transformation. Both left and right
preconditioners can be introduced.\cite{Barrett94a} The
118: simplest preconditioner consists of just the diagonal of the
119: coefficient matrix. This is known as the (point) Jacobi
120: preconditioner.
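
In symbols, and only as a sketch in generic notation (here
${\bf A}{\bf x}={\bf b}$ stands for the linear system and ${\bf u}$
for an auxiliary vector; these symbols are introduced for
illustration), left preconditioning replaces the system by
\begin{equation}
M^{-1}{\bf A}\,{\bf x} = M^{-1}{\bf b},
\end{equation}
while right preconditioning solves
\begin{equation}
{\bf A}M^{-1}{\bf u} = {\bf b}, \qquad {\bf x} = M^{-1}{\bf u}.
\end{equation}
For the Jacobi preconditioner,
$M = {\rm diag}(A_{11}, A_{22}, \ldots)$, so applying $M^{-1}$
amounts to one division per unknown.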
121:
122:
123:
124:
To compare these algorithms we have used them to solve the
problem of scattering by a homogeneous sphere.
The scattering problem is specified by the usual size parameter
$x= 2 \pi a / \lambda$, where $a$ is the sphere radius and
$\lambda$ is the wavelength.
129:
Tables~\ref{table1} and \ref{table4} present the number of
iterations and the CPU time for size parameters $x=0.1$ and $x=1$
and for several values of the refractive index. The conjugate
gradient methods are defined as above. The label (L) indicates left
Jacobi preconditioning; for example, CGNE(L) is the conjugate
gradient method for normal equations with minimization of the error
norm and left Jacobi preconditioning. Similarly, (R) indicates right
Jacobi preconditioning. Each table entry gives the CPU time (on a
sequential Silicon Graphics workstation), normalized to the
``best'' method, followed in parentheses by the number of
iterations. An asterisk indicates that the method did not converge
within the maximum allowed number of iterations or otherwise
failed to converge. A fractional error of
$10^{-5}$ was used as the stopping criterion. The DDSCAT.5a
code\cite{Draine94a} with the newly implemented GPFA fast Fourier
transform method was used. For Bi-CGSTAB and CGNE we also used left
and right Neumann polynomial preconditioners truncated after the
first term. Thus, Bi-CGSTAB(N)(L) indicates the stabilised version
of the bi-conjugate gradients method with the left Neumann
polynomial preconditioner.
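
The Neumann polynomial preconditioner can be motivated as follows.
This is a sketch in generic notation, and the precise form and
truncation used inside PIM should be checked against its
documentation. Write ${\bf D}$ for the diagonal of the coefficient
matrix ${\bf A}$ and define
${\bf G} = {\bf I} - {\bf D}^{-1}{\bf A}$ (both symbols are
introduced here for illustration). Since
${\bf A} = {\bf D}({\bf I}-{\bf G})$, whenever the series converges
\begin{equation}
{\bf A}^{-1} = ({\bf I}-{\bf G})^{-1}{\bf D}^{-1}
 = \sum_{k=0}^{\infty} {\bf G}^{k}\,{\bf D}^{-1}.
\end{equation}
Keeping only the $k=0$ term recovers the Jacobi preconditioner
${\bf D}^{-1}$, while keeping the terms through $k=1$ gives
\begin{equation}
M^{-1} = ({\bf I}+{\bf G})\,{\bf D}^{-1}
 = (2{\bf I} - {\bf D}^{-1}{\bf A})\,{\bf D}^{-1},
\end{equation}
whose application costs one additional product with ${\bf A}$ each
time the preconditioner is applied.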
148:
Table~\ref{table1} presents results for size parameter $x=0.1$ and
real refractive index $n=1.33$, 2, 3, 5, as well as one case
with a small imaginary part of the refractive index, $n=(5, 0.0001)$.
The CPU times in Table~\ref{table1} are
normalized to the CG(L) method, which was found to be the fastest
method. For example, it is 4.0 times faster than
CGNE for $n=(1.33,0)$. For larger values of the real refractive
index CGNE is almost an order of magnitude slower than CG.
This is because more iterations are needed for
the same convergence and because one CG iteration costs less
than one CGNE iteration. The QMR algorithm is never
competitive and actually fails to converge for large real
refractive indices. For small refractive index the Bi-CGSTAB
algorithm is comparable to CG and requires fewer iterations.
However, its cost per iteration is larger than that of CG,
which offsets the advantage of the smaller number of iterations. The
Petravic and Kuo-Petravic \cite{Petravic79a} algorithm that we have
used for many years \cite{Draine88a} is similar to CGNR and CGNE.
However, we occasionally observed slightly different convergence
rates because the Petravic and Kuo-Petravic algorithm is stabilized
every 10th iteration. \cite{Draine88a} This is true for all other
cases. The storage requirements of CG, CGNE, and CGNR are $6 \times
N$; for BiCG the requirement is $8 \times N$; for CGS, Bi-CGSTAB,
and TFQMR it is $10 \times N$; and QMR requires $11 \times N$.
Thus, for purely real refractive index, CG is not only the fastest
method but also requires the least temporary storage. It can be seen
that left preconditioning by the inverse of the diagonal of the DDA
matrix \cite{Draine94a} reduces the number of iterations needed.
The added time needed for division by the diagonal elements is
generally negligible compared with the time saved by the smaller
number of iterations. Of Bi-CGSTAB,
Bi-CGSTAB(L), and Bi-CGSTAB(R), only the variant with left Jacobi
preconditioning converges for the larger refractive indices. The
restarted methods (RBi-CGSTAB and RGCR) appear not to be
competitive, but further study may be needed (we used an orthogonal
basis of 10 vectors for all restarted methods). The CG method is
also competitive in cases with small absorption (see the last
column of Table~\ref{table1}).

We have also calculated results (not presented) for size parameter
$x=0.1$ and increasing imaginary part
of the refractive index, $n=(1.33,0)$, $(1.33,0.01)$, $(1.33,0.1)$,
$(1.33,1)$, $(1.33,2)$, $(1.33,3)$. BiCGSTAB(L) proved to
be the most robust method. However, CGS(L) is competitive and is
faster for $n=(1.33,3)$. Both CGS(L) and BiCGSTAB(L) require the
same number of iterations for convergence, and their costs are
similar. These methods are between 1.6 and 2.9 times faster
than CGNR, the method used in the DDSCAT code. QMR and
TFQMR, which Lumme and Rahola \cite{Lumme94a} claim to be faster
than CGNR and the DDSCAT implementation, occasionally do not
converge, and when they do converge they are only slightly better in
this case. As before, left Jacobi preconditioning is almost always
beneficial. The CG(L) algorithm is faster than BiCGSTAB(L) for
refractive index $n=(1.33,0)$, $(1.33,0.01)$, and $(1.33,0.1)$.
201:
202:
Table~\ref{table4} is for size parameter $x=1$. All the results
are normalized to Bi-CGSTAB(L). This method is clearly superior to
the CGNR method and is 2 to 4.3 times faster. It can be seen that
CGNR converges slowly and has not satisfied the stopping criterion
within 140 iterations for $n=(3,0.0001)$. For this larger value of
the size parameter the QMR algorithm does not converge well, but its
smoother transpose-free variant (TFQMR) does. However, TFQMR is
slower than Bi-CGSTAB(L) and comparable to CGNR. The CG(L) method is
faster than the reference scheme Bi-CGSTAB(L) for
refractive index $n=(1.33,0)$ and $n=(1.33,0.01)$. It can be seen
that the Neumann polynomial preconditioning, Bi-CGSTAB(N)(L) or
Bi-CGSTAB(N)(R), does reduce the number of iterations needed for
certain values of the refractive index. However, the cost of the
additional calculations always offsets this improved convergence
rate. As before, the left Jacobi preconditioner is superior to right
preconditioning or no preconditioning. CG(L) works well for small
refractive index but is comparable to Bi-CGSTAB(L).

The CG method is theoretically valid for Hermitian positive definite
matrices. The matrix arising in the DDA is symmetric but not
Hermitian. Therefore, strictly speaking, the CG method is not valid
for use in the DDA. Users are advised to test the CG method when
extrapolating the results presented here to different size
parameters, particle sizes, and refractive index values.
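
The distinction between symmetric and Hermitian can be stated
compactly. As a sketch in generic notation (${\bf A}$ denotes the
coefficient matrix, superscripts $T$ and $H$ denote the transpose
and the conjugate transpose, and the overbar denotes complex
conjugation; these symbols are introduced here for illustration),
a complex symmetric matrix satisfies
\begin{equation}
{\bf A}^{T} = {\bf A}, \qquad
{\bf A}^{H} = \overline{{\bf A}} \neq {\bf A}
\quad \mbox{unless ${\bf A}$ is real},
\end{equation}
so the usual CG convergence theory, which assumes
${\bf A}^{H}={\bf A}$ and ${\bf x}^{H}{\bf A}{\bf x}>0$ for all
${\bf x}\neq 0$, does not apply. This is consistent with
Table~\ref{table4}, where CG, CG(L), and CG(R) did not converge
within 140 iterations for $n=(1.33,1)$, $(2,0)$, and $(3,0.0001)$.
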
\begin{table}[ht]
\caption{\label{table1}CPU time (normalized) and number of
iterations (in parentheses) for $x=0.1$.}
\begin{tabular}{lccccc}
Method & $n=(1.33,0)$ & $(2,0)$ & $(3,0)$ & $(5,0)$ & $(5,0.0001)$ \\
\hline
CGNE            & 4.0(9) & 4.9(24)  & 8.7(76)   & *(540)   & *(540)   \\
CGNE(L)         & 3.3(7) & 4.0(19)  & 7.8(67)   & *(540)   & *(540)   \\
CGNE(R)         & 4.1(9) & 4.9(24)  & 8.8(76)   & *(540)   & *(540)   \\
CGNE(N)(L)      & 3.1(4) & 4.1(13)  & 19.3(113) & *(540)   & *(540)   \\
CGNE(N)(R)      & 4.4(6) & 7.7(25)  & *(140)    & *(540)   & *(540)   \\
CGNR            & 4.0(9) & 4.7(23)  & 8.0(69)   & *(540)   & *(540)   \\
CGNR(L)         & 3.3(7) & 4.0(19)  & 5.9(50)   & 4.6(329) & 3.5(330) \\
CGNR(R)         & 4.1(9) & 4.7(23)  & 8.0(69)   & *(540)   & *(540)   \\
QMR             & 3.7(6) & 3.3(11)  & 3.3(19)   & *(111)   & *(78)    \\
QMR(L)          & 2.6(4) & 2.8(9)   & 2.7(15)   & *(75)    & *(92)    \\
QMR(R)          & 3.8(6) & 3.4(11)  & 3.4(19)   & *(268)   & *(540)   \\
CG              & 1.4(6) & 1.2(11)  & 1.2(20)   & 1.2(163) & 1.1(213) \\
CG(L)           & 1.0(4) & 1.0(9)   & 1.0(16)   & 1.0(138) & 1.0(182) \\
CG(R)           & 1.4(6) & 1.2(11)  & 1.2(20)   & 1.1(157) & 1.2(213) \\
BiCG            & 2.3(6) & 2.1(11)  & *(140)    & *(540)   & *(540)   \\
BiCG(L)         & 1.6(4) & 1.8(9)   & *(140)    & *(540)   & *(540)   \\
BiCG(R)         & 2.4(6) & 2.2(11)  & *(140)    & *(540)   & *(540)   \\
Bi-CGSTAB       & 1.8(4) & 1.5(7)   & 1.5(13)   & *(540)   & *(540)   \\
Bi-CGSTAB(L)    & 1.4(3) & 1.3(6)   & 1.3(11)   & 4.0(281) & 4.2(388) \\
Bi-CGSTAB(R)    & 1.8(4) & 1.5(7)   & 1.6(13)   & *(540)   & *(540)   \\
Bi-CGSTAB(N)(L) & 1.9(2) & 6.8(17)  & 14.9(65)  & *(540)   & *(540)   \\
Bi-CGSTAB(N)(R) & 2.1(2) & 13.1(33) & 27.9(122) & *(540)   & *(540)   \\
TFQMR           & 3.8(5) & 3.4(9)   & 4.3(19)   & *(540)   & *(540)   \\
TFQMR(L)        & 3.1(4) & 3.1(8)   & 4.1(18)   & *(540)   & *(540)   \\
TFQMR(R)        & 3.9(5) & 3.5(9)   & 4.4(19)   & *(540)   & *(540)   \\
CGS             & 1.7(4) & 1.5(7)   & 1.4(12)   & *(540)   & *(540)   \\
CGS(L)          & 1.4(3) & 1.3(6)   & 1.2(10)   & *(540)   & *(540)   \\
CGS(R)          & 1.8(4) & 1.5(7)   & 1.4(12)   & *(540)   & *(540)   \\
RGCR            & 4.3(2) & 2.8(2)   & 2.5(3)    & *(14)    & *(14)    \\
RGCR(L)         & 2.0(1) & 2.4(2)   & 2.1(2)    & *(14)    & *(14)    \\
RGCR(R)         & 4.4(2) & 2.8(2)   & 2.7(3)    & *(14)    & *(14)    \\
RBi-CGSTAB      & *(12)  & *(12)    & *(12)     & *(12)    & *(12)    \\
RBi-CGSTAB(L)   & *(12)  & *(12)    & *(12)     & *(12)    & *(12)    \\
RBi-CGSTAB(R)   & *(12)  & *(12)    & *(12)     & *(12)    & *(12)    \\
\hline
\end{tabular}
\end{table}
\begin{table}[ht]
\caption{\label{table4}CPU time (normalized) and number of
iterations (in parentheses) for $x=1$.}
\begin{tabular}{lccccc}
Method & $n=(1.33,0)$ & $(1.33,0.01)$ & $(1.33,1)$ & $(2,0)$ & $(3,0.0001)$ \\
\hline
CGNE            & 3.2(10) & 3.2(10) & 2.0(16) & 4.5(33) & *(140)  \\
CGNE(L)         & 2.7(8)  & 2.6(8)  & 1.7(13) & 3.8(27) & *(140)  \\
CGNE(R)         & 3.2(10) & 3.2(10) & 2.0(16) & 4.6(33) & *(140)  \\
CGNE(N)(L)      & 2.6(5)  & 2.6(5)  & *(140)  & *(140)  & *(140)  \\
CGNE(N)(R)      & 3.6(7)  & 3.6(7)  & *(140)  & *(140)  & *(140)  \\
CGNR            & 3.5(11) & 3.5(11) & 2.0(16) & 4.3(32) & *(140)  \\
CGNR(L)         & 2.7(8)  & 2.6(8)  & 1.7(13) & 3.7(27) & *(140)  \\
CGNR(R)         & 3.5(11) & 3.5(11) & 2.0(16) & 4.4(32) & *(140)  \\
QMR             & *(47)   & *(58)   & *(25)   & *(76)   & *(50)   \\
QMR(L)          & 5.3(12) & *(59)   & *(21)   & *(71)   & *(39)   \\
QMR(R)          & *(52)   & *(63)   & *(22)   & *(70)   & *(37)   \\
CG              & 1.3(8)  & 1.3(8)  & *(140)  & *(140)  & *(140)  \\
CG(L)           & 0.9(5)  & 0.9(5)  & *(140)  & *(140)  & *(140)  \\
CG(R)           & 1.3(8)  & 1.3(8)  & *(140)  & *(140)  & *(140)  \\
BiCG            & *(140)  & *(140)  & *(140)  & *(140)  & *(140)  \\
BiCG(L)         & *(140)  & *(140)  & *(140)  & *(140)  & *(140)  \\
BiCG(R)         & *(140)  & *(140)  & *(140)  & *(140)  & *(140)  \\
Bi-CGSTAB       & 1.3(4)  & 1.3(4)  & 1.2(10) & 1.2(9)  & 1.1(24) \\
Bi-CGSTAB(L)    & 1.0(3)  & 1.0(3)  & 1.0(8)  & 1.0(7)  & 1.0(21) \\
Bi-CGSTAB(R)    & 1.3(4)  & 1.3(4)  & 1.2(10) & 1.3(9)  & 1.1(24) \\
Bi-CGSTAB(N)(L) & 1.4(2)  & 1.4(2)  & 1.5(6)  & 4.6(17) & *(140)  \\
Bi-CGSTAB(N)(R) & 1.5(2)  & 1.5(2)  & 1.6(6)  & *(140)  & *(140)  \\
TFQMR           & 3.3(6)  & 3.3(6)  & 3.3(14) & 3.4(13) & 3.8(42) \\
TFQMR(L)        & 2.8(5)  & 2.8(5)  & 3.0(13) & 3.0(11) & 3.7(40) \\
TFQMR(R)        & 3.4(6)  & 3.4(6)  & 3.3(14) & 3.4(13) & 3.9(42) \\
CGS             & 1.3(4)  & 1.3(4)  & 1.3(11) & 1.4(10) & 1.6(34) \\
CGS(L)          & 1.0(3)  & 1.0(3)  & 1.2(10) & 1.3(9)  & 1.1(23) \\
CGS(R)          & 1.3(4)  & 1.3(4)  & 1.3(11) & 1.4(10) & 1.6(33) \\
RGCR            & 3.7(2)  & 4.5(2)  & *(14)   & 3.3(3)  & *(14)   \\
RGCR(L)         & 3.2(2)  & 3.1(2)  & 6.5(6)  & 2.9(3)  & *(14)   \\
RGCR(R)         & 3.7(2)  & 3.7(2)  & 7.4(7)  & 3.3(3)  & *(14)   \\
\hline
\end{tabular}
\end{table}
310:
311:
312:
313:
314:
315:
316:
317:
318:
319:
We recommend the use of the stabilized version of the bi-conjugate
gradient algorithm with left Jacobi preconditioning
[Bi-CGSTAB(L)]. This algorithm requires 67\% more storage than
the CGNR algorithm, but is typically 2--3 times faster.
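
The storage figure follows directly from the requirements quoted
earlier, $10 \times N$ for Bi-CGSTAB and $6 \times N$ for CGNR:
\begin{equation}
\frac{10N - 6N}{6N} = \frac{2}{3} \approx 0.67 .
\end{equation}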
324:
The recent version of the discrete dipole approximation code,
DDSCAT.5a, developed by Draine and Flatau, contains the improvements
documented in this paper. The code is available via anonymous ftp
from the \verb|ftp.astro.princeton.edu| site or from the Light
Scattering and Radiative Transfer Codes Library, \verb|SCATTERLIB|
(\verb|http://atol.ucsd.edu/~pflatau|).
331:
332:
333:
334:
335:
336:
337:
338:
I have been supported in part by the Office of Naval Research
Young Investigator Program and in part by DuPont Corporate
Educational Assistance. I would like to thank Drs.\ M. J. Wolff and
A. E. Ilin, who helped with computer tests. Bruce Draine checked
the manuscript. Dr. R. J. Riegert of DuPont is acknowledged for
his continuing interest in DDSCAT developments.
345:
346:
347:
348:
349:
350: \bibliography{all,cg,fft,local}
351:
352:
353: \end{document}
354: