\documentclass[12pt]{article}
%\documentstyle[osa,osabib,manuscript]{revtex}
%\newcommand{\MF}{{\large{\manual META}\-{\manual FONT}}}
%\newcommand{\manual}{rm}        % Substitute rm (Roman) font.
%\newcommand\bs{\char '134 }     % add backslash char to \tt font
%
\usepackage{times}

\textheight 8.5truein \textwidth 6.5truein \hoffset=0truein
\voffset=0.truein

\begin{document}

\def\bA{{\bf A}}
\def\bP{{\bf P}}
\def\bE{{\bf E}}
\def\ba{{\bf a}}
\def\be{{\bf e}}
\def\bv{{\bf v}}
\def\bk{{\bf k}}

\bibliographystyle{prsty}

\title{Improvements of the Discrete Dipole Approximation method}

\author{Piotr J. Flatau \\
Scripps Institution of Oceanography, University of California, San
Diego, \\La Jolla, California 92093-0221}

\maketitle
\begin{abstract}
We report improvements in complex conjugate gradient algorithms
applied to the discrete dipole approximation (DDA). We show that
computational time is reduced by using the Bi-CGSTAB version of
the CG algorithm with left diagonal (Jacobi) preconditioning.

Key words: scattering, non-spherical particles, discrete dipole
approximation.

\begin{center}
Optics Letters 1997, volume 22, number 16, 1205--1207.
\newline {\copyright\ Optical Society of America, 1997.}
\end{center}
\end{abstract}

The discrete dipole approximation (DDA) is a flexible technique
for computing scattering and absorption by targets of arbitrary
geometry; it is reviewed in Ref.~\cite{Draine94a}. For the linear
systems of equations arising in DDA problems, iterative methods
have proven more effective and efficient than ``direct'' methods.

In this paper we systematically study various non-stationary
iterative (conjugate gradient) methods in search of the most
efficient one. We document the implementation of these methods in
our public domain code DDSCAT.5a \cite{Draine94a}.

Numerical aspects of the discrete dipole approximation continue to
be of great interest. Yung \cite{Yung78a} applied a conjugate
gradient method to the DDA. Hoekstra \cite{Hoekstra94b} identifies
Yung's scheme as the conjugate gradient (CG) algorithm proposed by
Hestenes \cite{Hes52a}. Rahola \cite{Rahola96a} discusses the
solution of dense systems of linear equations in the DDA and the
choice of the best iterative method for this application. Draine
\cite{Draine88a} implemented a conjugate gradient method based on
the work of Petravic and Kuo-Petravic \cite{Petravic79a}. This
implementation is quite robust and has been used for many years
\cite{Draine94a}. However, Lumme and Rahola \cite{Lumme94a} applied
the quasi-minimal residual (QMR) conjugate gradient algorithm to
the system of linear equations arising in DDA applications. They
claim that the QMR method is approximately 3 times faster than the
one employed in the DDSCAT code \cite{Draine94a}. It was this work
that prompted us to perform the analysis reported here.

PIM \cite{Cunha95a} is a collection of Fortran 77 routines designed
to solve systems of linear equations on parallel and sequential
computers using a variety of iterative methods.

PIM contains implementations of the following methods: conjugate
gradients (CG); conjugate gradients for normal equations with
minimization of the residual norm (CGNR); conjugate gradients for
normal equations with minimization of the error norm (CGNE);
bi-conjugate gradients (Bi-CG); conjugate gradients squared (CGS);
the stabilized version of bi-conjugate gradients (Bi-CGSTAB); the
restarted, stabilized version of bi-conjugate gradients
(RBi-CGSTAB); the restarted generalized minimal residual method
(RGMRES); the restarted generalized conjugate residual method
(RGCR); the quasi-minimal residual method with coupled two-term
recurrences (QMR); the transpose-free quasi-minimal residual
method (TFQMR); and Chebyshev acceleration. The routines allow the
use of preconditioners; the user may choose left, right, or
symmetric preconditioning.

The convergence rate of iterative methods depends on the
coefficient matrix. Hence one may attempt to transform the linear
system into one that is equivalent (in the sense that it has the
same solution) but is easier to solve. A preconditioner is a
matrix $M$ that effects such a transformation. It can be applied
from the left or from the right \cite{Barrett94a}. The simplest
preconditioner consists of just the diagonal of the coefficient
matrix; it is known as the (point) Jacobi preconditioner.
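
As a sketch of the transformations described above, write the DDA
system generically as $Ax=b$. The symbols $A$, $x$, $b$, $y$, and
$D$ are notation introduced here for illustration, following
standard usage; they are not taken from the DDSCAT or PIM
documentation. Left and right preconditioning with a nonsingular
matrix $M$ then read
\begin{equation}
M^{-1} A x = M^{-1} b \qquad \mbox{(left)},
\end{equation}
\begin{equation}
A M^{-1} y = b, \quad x = M^{-1} y \qquad \mbox{(right)},
\end{equation}
and the Jacobi choice is
\begin{equation}
M = D = {\rm diag}(a_{11}, a_{22}, \ldots, a_{NN}).
\end{equation}
Both forms have the same solution $x$ as the original system. For
the diagonal choice, applying $M^{-1}$ to a vector costs only one
division per unknown.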

To compare these algorithms we used them to solve the problem of
scattering by a homogeneous sphere. The scattering problem is
specified by the usual size parameter $x= 2 \pi a / \lambda$,
where $a$ is the radius of the sphere and $\lambda$ is the
wavelength.

Tables~\ref{table1} and \ref{table4} present the CPU time and the
number of iterations for size parameters $x=0.1$ and $x=1$ and
for several values of the refractive index. The conjugate gradient
methods are labeled as defined above. The label (L) indicates left
Jacobi preconditioning; for example, CGNE(L) is the conjugate
gradient method for normal equations with minimization of the
error norm and left Jacobi preconditioning. Similarly, (R)
indicates right Jacobi preconditioning. CPU time (on a sequential
Silicon Graphics workstation) is normalized to the ``best''
method, and the number of iterations is given in parentheses. An
asterisk indicates that the method did not converge within the
maximum allowed number of iterations or otherwise failed to
converge. A fractional error of $10^{-5}$ was used as the stopping
criterion. The DDSCAT.5a code \cite{Draine94a} with the newly
implemented GPFA fast Fourier transform method was used. For
Bi-CGSTAB and CGNE we also tested left and right Neumann
polynomial preconditioners truncated after the first term, labeled
(N). Thus, Bi-CGSTAB(N)(L) indicates the stabilized version of the
bi-conjugate gradient method with the left Neumann polynomial
preconditioner.
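
The Neumann polynomial preconditioner mentioned above can be
written out as follows. This is a sketch in generic notation: $A$
for the DDA matrix, $D$ for its diagonal, $G$ for the iteration
matrix, and $k$ for the truncation order are symbols introduced
here, and the reading of ``truncated after the first term'' as
$k=1$ is an assumption that the text does not spell out. With
$G = I - D^{-1}A$ one has $A = D(I-G)$, and the Neumann series
gives
\begin{equation}
A^{-1} = (I - G)^{-1} D^{-1} = \sum_{i=0}^{\infty} G^{i} D^{-1},
\end{equation}
which converges when the spectral radius of $G$ is less than one.
Keeping the terms up to $i=k$ defines the preconditioner
\begin{equation}
M_k^{-1} = \sum_{i=0}^{k} \left(I - D^{-1}A\right)^{i} D^{-1},
\qquad
M_1^{-1} = \left(2I - D^{-1}A\right) D^{-1}.
\end{equation}
Under this reading, $k=0$ recovers the Jacobi preconditioner, while
$k=1$ requires one additional multiplication by $A$ each time the
preconditioner is applied.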

Table~\ref{table1} presents results for size parameter $x=0.1$ and
real refractive index $n=1.33, 2, 3, 5$, as well as one case with
a small imaginary part of the refractive index, $n=(5, 0.0001)$.
In Table~\ref{table1} the CPU times are normalized to the CG(L)
method, which was found to be the best method. For example, it is
4.0 times faster than CGNE for $n=(1.33,0)$. For larger values of
the real refractive index CGNE is almost an order of magnitude
slower than CG, both because more iterations are needed for the
same convergence and because one CG iteration costs less than one
CGNE iteration. The QMR algorithm is never competitive and
actually fails to converge for large real refractive indices. For
small refractive index the Bi-CGSTAB algorithm is comparable to CG
and requires fewer iterations; however, its cost per iteration is
larger, which offsets the advantage of the smaller number of
iterations.

The Petravic and Kuo-Petravic \cite{Petravic79a} algorithm, which
we have used for many years \cite{Draine88a}, is similar to CGNR
and CGNE. However, we occasionally observed slightly different
convergence rates because the Petravic and Kuo-Petravic algorithm
is stabilized every 10th step \cite{Draine88a}. This is true for
all other cases.

The storage requirement is $6 \times N$ for CG, CGNE, and CGNR;
$8 \times N$ for BiCG; $10 \times N$ for CGS, Bi-CGSTAB, and
TFQMR; and $11 \times N$ for QMR. Thus, for a purely real
refractive index, CG is not only the fastest method but also
requires the least temporary storage.

Left preconditioning by the inverse of the diagonal of the DDA
matrix \cite{Draine94a} reduces the number of iterations needed.
The added time needed for division by the diagonal elements is
generally negligible in comparison with the time saved by the
smaller number of iterations. Among Bi-CGSTAB, Bi-CGSTAB(L), and
Bi-CGSTAB(R), only the variant with left Jacobi preconditioning
converges for the larger refractive indices. The restarted methods
(RBi-CGSTAB and RGCR) appear not to be competitive, but further
study may be needed (we used an orthogonal basis of 10 vectors for
all restarted methods). The CG method is also competitive in cases
with small absorption (see the last column of
Table~\ref{table1}).

We have also calculated results (not presented) for size parameter
$x=0.1$ and increasing imaginary part of the refractive index,
$n=(1.33,0)$, $(1.33,0.01)$, $(1.33,0.1)$, $(1.33,1)$, $(1.33,2)$,
$(1.33,3)$. Bi-CGSTAB(L) proved to be the most robust method.
However, CGS(L) is competitive and is faster for $n=(1.33,3)$.
Both CGS(L) and Bi-CGSTAB(L) require the same number of iterations
for convergence and their cost is similar. These methods are
between 1.6 and 2.9 times faster than CGNR, the method used in the
DDSCAT code. QMR and TFQMR, which Lumme and Rahola
\cite{Lumme94a} claim to be faster than CGNR and the DDSCAT
implementation, on occasion do not converge, and when they work
they are only slightly better in this case. As before, left Jacobi
preconditioning is almost always beneficial. The CG(L) algorithm
is faster than Bi-CGSTAB(L) for refractive index $n=(1.33,0)$,
$(1.33,0.01)$, $(1.33,0.1)$.

Table~\ref{table4} is for size parameter $x=1$. All the results
are normalized to Bi-CGSTAB(L). This method is clearly superior to
CGNR: it is 2--4.3 times faster. CGNR converges slowly and has not
satisfied the stopping criterion within 140 iterations for
$n=(3,0.0001)$. For this larger value of the size parameter the
QMR algorithm does not converge well, but its smooth version,
TFQMR, does. However, TFQMR is slower than Bi-CGSTAB(L) and
comparable to CGNR. The CG(L) method is slightly faster than the
reference scheme Bi-CGSTAB(L) for refractive index $n=(1.33,0)$
and $n=(1.33,0.01)$, but it fails to converge for the other
refractive indices in Table~\ref{table4}. Neumann polynomial
preconditioning, Bi-CGSTAB(N)(L) or Bi-CGSTAB(N)(R), does reduce
the number of iterations needed for certain values of the
refractive index. However, the cost of the additional calculations
always offsets this improved convergence rate. As before, the left
Jacobi preconditioner is superior to the right preconditioner and
to no preconditioning.

The CG method is theoretically valid for Hermitian positive
definite matrices. The matrix arising in the DDA is complex
symmetric rather than Hermitian. Therefore, strictly speaking, the
CG method is not valid for use in the DDA. Users are advised to
test the CG method when extrapolating the results presented here
to different size parameters, particle sizes, and refractive index
values.
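
The distinction between symmetric and Hermitian matrices drawn
above can be stated compactly. In generic notation (the entries
$a_{jk}$ and the overbar for complex conjugation are symbols
introduced here for illustration), the DDA matrix satisfies
\begin{equation}
A^{T} = A, \qquad a_{jk} = a_{kj},
\end{equation}
whereas the standard convergence theory for CG assumes
\begin{equation}
A^{H} = A, \quad a_{jk} = \overline{a_{kj}}, \qquad
x^{H} A x > 0 \ \mbox{for all}\ x \neq 0.
\end{equation}
The two symmetry conditions coincide only when $A$ is real. As a
small example, the complex symmetric matrix
\begin{equation}
\left( \begin{array}{cc} 1 & i \\ i & 1 \end{array} \right)
\end{equation}
has $a_{12} = i$ but $\overline{a_{21}} = -i$, so it is not
Hermitian.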
\begin{table}[ht]
\caption{\label{table1}Normalized CPU time and, in parentheses,
number of iterations for $x=0.1$. An asterisk marks runs that did
not converge.}
\begin{tabular}{lccccc}
Method & $n=(1.33,0)$ & $(2,0)$ & $(3,0)$ & $(5,0)$ & $(5,0.0001)$ \\
\hline
CGNE & 4.0(9) & 4.9(24) & 8.7(76) & *(540) & *(540) \\
CGNE(L) & 3.3(7) & 4.0(19) & 7.8(67) & *(540) & *(540) \\
CGNE(R) & 4.1(9) & 4.9(24) & 8.8(76) & *(540) & *(540) \\
CGNE(N)(L) & 3.1(4) & 4.1(13) & 19.3(113) & *(540) & *(540) \\
CGNE(N)(R) & 4.4(6) & 7.7(25) & *(140) & *(540) & *(540) \\
CGNR & 4.0(9) & 4.7(23) & 8.0(69) & *(540) & *(540) \\
CGNR(L) & 3.3(7) & 4.0(19) & 5.9(50) & 4.6(329) & 3.5(330) \\
CGNR(R) & 4.1(9) & 4.7(23) & 8.0(69) & *(540) & *(540) \\
QMR & 3.7(6) & 3.3(11) & 3.3(19) & *(111) & *(78) \\
QMR(L) & 2.6(4) & 2.8(9) & 2.7(15) & *(75) & *(92) \\
QMR(R) & 3.8(6) & 3.4(11) & 3.4(19) & *(268) & *(540) \\
CG & 1.4(6) & 1.2(11) & 1.2(20) & 1.2(163) & 1.1(213) \\
CG(L) & 1.0(4) & 1.0(9) & 1.0(16) & 1.0(138) & 1.0(182) \\
CG(R) & 1.4(6) & 1.2(11) & 1.2(20) & 1.1(157) & 1.2(213) \\
BiCG & 2.3(6) & 2.1(11) & *(140) & *(540) & *(540) \\
BiCG(L) & 1.6(4) & 1.8(9) & *(140) & *(540) & *(540) \\
BiCG(R) & 2.4(6) & 2.2(11) & *(140) & *(540) & *(540) \\
Bi-CGSTAB & 1.8(4) & 1.5(7) & 1.5(13) & *(540) & *(540) \\
Bi-CGSTAB(L) & 1.4(3) & 1.3(6) & 1.3(11) & 4.0(281) & 4.2(388) \\
Bi-CGSTAB(R) & 1.8(4) & 1.5(7) & 1.6(13) & *(540) & *(540) \\
Bi-CGSTAB(N)(L) & 1.9(2) & 6.8(17) & 14.9(65) & *(540) & *(540) \\
Bi-CGSTAB(N)(R) & 2.1(2) & 13.1(33) & 27.9(122) & *(540) & *(540) \\
TFQMR & 3.8(5) & 3.4(9) & 4.3(19) & *(540) & *(540) \\
TFQMR(L) & 3.1(4) & 3.1(8) & 4.1(18) & *(540) & *(540) \\
TFQMR(R) & 3.9(5) & 3.5(9) & 4.4(19) & *(540) & *(540) \\
CGS & 1.7(4) & 1.5(7) & 1.4(12) & *(540) & *(540) \\
CGS(L) & 1.4(3) & 1.3(6) & 1.2(10) & *(540) & *(540) \\
CGS(R) & 1.8(4) & 1.5(7) & 1.4(12) & *(540) & *(540) \\
RGCR & 4.3(2) & 2.8(2) & 2.5(3) & *(14) & *(14) \\
RGCR(L) & 2.0(1) & 2.4(2) & 2.1(2) & *(14) & *(14) \\
RGCR(R) & 4.4(2) & 2.8(2) & 2.7(3) & *(14) & *(14) \\
RBi-CGSTAB & *(12) & *(12) & *(12) & *(12) & *(12) \\
RBi-CGSTAB(L) & *(12) & *(12) & *(12) & *(12) & *(12) \\
RBi-CGSTAB(R) & *(12) & *(12) & *(12) & *(12) & *(12) \\
\hline
\end{tabular}
\end{table}
\begin{table}[ht]
\caption{\label{table4}Normalized CPU time and, in parentheses,
number of iterations for $x=1$. An asterisk marks runs that did
not converge.}
\begin{tabular}{lccccc}
Method & $n=(1.33,0)$ & $(1.33,0.01)$ & $(1.33,1)$ & $(2,0)$ & $(3,0.0001)$ \\
\hline
CGNE & 3.2(10) & 3.2(10) & 2.0(16) & 4.5(33) & *(140) \\
CGNE(L) & 2.7(8) & 2.6(8) & 1.7(13) & 3.8(27) & *(140) \\
CGNE(R) & 3.2(10) & 3.2(10) & 2.0(16) & 4.6(33) & *(140) \\
CGNE(N)(L) & 2.6(5) & 2.6(5) & *(140) & *(140) & *(140) \\
CGNE(N)(R) & 3.6(7) & 3.6(7) & *(140) & *(140) & *(140) \\
CGNR & 3.5(11) & 3.5(11) & 2.0(16) & 4.3(32) & *(140) \\
CGNR(L) & 2.7(8) & 2.6(8) & 1.7(13) & 3.7(27) & *(140) \\
CGNR(R) & 3.5(11) & 3.5(11) & 2.0(16) & 4.4(32) & *(140) \\
QMR & *(47) & *(58) & *(25) & *(76) & *(50) \\
QMR(L) & 5.3(12) & *(59) & *(21) & *(71) & *(39) \\
QMR(R) & *(52) & *(63) & *(22) & *(70) & *(37) \\
CG & 1.3(8) & 1.3(8) & *(140) & *(140) & *(140) \\
CG(L) & 0.9(5) & 0.9(5) & *(140) & *(140) & *(140) \\
CG(R) & 1.3(8) & 1.3(8) & *(140) & *(140) & *(140) \\
BiCG & *(140) & *(140) & *(140) & *(140) & *(140) \\
BiCG(L) & *(140) & *(140) & *(140) & *(140) & *(140) \\
BiCG(R) & *(140) & *(140) & *(140) & *(140) & *(140) \\
Bi-CGSTAB & 1.3(4) & 1.3(4) & 1.2(10) & 1.2(9) & 1.1(24) \\
Bi-CGSTAB(L) & 1.0(3) & 1.0(3) & 1.0(8) & 1.0(7) & 1.0(21) \\
Bi-CGSTAB(R) & 1.3(4) & 1.3(4) & 1.2(10) & 1.3(9) & 1.1(24) \\
Bi-CGSTAB(N)(L) & 1.4(2) & 1.4(2) & 1.5(6) & 4.6(17) & *(140) \\
Bi-CGSTAB(N)(R) & 1.5(2) & 1.5(2) & 1.6(6) & *(140) & *(140) \\
TFQMR & 3.3(6) & 3.3(6) & 3.3(14) & 3.4(13) & 3.8(42) \\
TFQMR(L) & 2.8(5) & 2.8(5) & 3.0(13) & 3.0(11) & 3.7(40) \\
TFQMR(R) & 3.4(6) & 3.4(6) & 3.3(14) & 3.4(13) & 3.9(42) \\
CGS & 1.3(4) & 1.3(4) & 1.3(11) & 1.4(10) & 1.6(34) \\
CGS(L) & 1.0(3) & 1.0(3) & 1.2(10) & 1.3(9) & 1.1(23) \\
CGS(R) & 1.3(4) & 1.3(4) & 1.3(11) & 1.4(10) & 1.6(33) \\
RGCR & 3.7(2) & 4.5(2) & *(14) & 3.3(3) & *(14) \\
RGCR(L) & 3.2(2) & 3.1(2) & 6.5(6) & 2.9(3) & *(14) \\
RGCR(R) & 3.7(2) & 3.7(2) & 7.4(7) & 3.3(3) & *(14) \\
\hline
\end{tabular}
\end{table}

We recommend the use of the stabilized version of the bi-conjugate
gradient algorithm with left Jacobi preconditioning
[Bi-CGSTAB(L)]. This algorithm requires 67\% more storage than
the CGNR algorithm but is typically 2--3 times faster.
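
The storage figure quoted above follows from the per-method
requirements given earlier ($10 \times N$ for Bi-CGSTAB and
$6 \times N$ for CGNR). As a quick check, assuming $N$ denotes the
same vector length in both counts,
\begin{equation}
\frac{10N - 6N}{6N} = \frac{2}{3} \approx 0.67,
\end{equation}
that is, about 67\% more temporary storage, independent of $N$.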

The recent version of the discrete dipole approximation code,
DDSCAT.5a, developed by Draine and Flatau, contains the
improvements documented in this paper. The code is available via
anonymous ftp from \verb|ftp.astro.princeton.edu| or from the
Light Scattering and Radiative Transfer Codes Library,
\verb|SCATTERLIB| (\verb|http://atol.ucsd.edu/~pflatau|).

I have been supported in part by the Office of Naval Research
Young Investigator Program and in part by DuPont Corporate
Educational Assistance. I would like to thank Drs.\ M. J. Wolff and
A. E. Ilin, who helped with computer tests. Bruce Draine checked
the manuscript. Dr.\ R. J. Riegert of DuPont is acknowledged for
his continuing interest in DDSCAT developments.

\bibliography{all,cg,fft,local}

\end{document}