1: \documentclass[letter,useAMS,usenatbib]{mn2e}
2:
3: \usepackage[english]{babel} \usepackage{subfigure}
4: \usepackage{graphicx}
5:
6: \usepackage[fleqn]{amsmath}
7: \usepackage{color}
8:
9: \usepackage[varg]{txfonts}
10:
11: \citestyle{aa}
12:
13: \bibliographystyle{mn2e}
14:
15: \topmargin -1.3cm
16:
17: %%%%%%%%%%%%%%% Author definitions %%%%%%%%%%%%%%%%%%%%%%
18: %%%%% 1. Journals
19:
20: \newcommand{\aj}{AJ} % Astronomical Journal
21: \newcommand{\aap}{A\&A} % Astronomy and Astrophysics
22: \newcommand{\aaps}{A\&AS} % Astronomy and Astrophysics Supplement Series
23: \newcommand{\apj}{ApJ} % Astrophysical Journal
24: \newcommand{\apjs}{ApJS} % Astrophysical Journal Supplement Series
25: \newcommand{\apjl}{ApJL} % Astrophysical Journal Letters
26: \newcommand{\araa}{ARAA} % Annual Reviews in Astronomy and Astrophysics
27: \newcommand{\mnras}{MNRAS} % Monthly Notices of the Royal Astronomical Society
28:
29: \newcommand{\noi}{\noindent}
30:
31: %%%%%%%%%%%%%%% Title %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
32:
33: \title[Strong lensing on adaptive grids]{Bayesian strong gravitational-lens modelling on adaptive
34: grids:\\ objective detection of mass substructure in galaxies}
35:
36: \author[S. Vegetti \& L. V. E. Koopmans.]{ Simona Vegetti\thanks{E-mail:
37: vegetti@astro.rug.nl} \& L. V. E. Koopmans\\ Kapteyn
38: Astronomical Institute, University of Groningen, PO Box 800,
39: 9700\,AV Groningen, the Netherlands}
40:
41: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
42:
43: \begin{document}
44:
\date{Accepted for publication in MNRAS}
46:
47: \pagerange{\pageref{firstpage}--\pageref{lastpage}} \pubyear{2008}
48:
49: \maketitle
50:
51: \label{firstpage}
52:
53: \begin{abstract}
54:
55: We introduce a new adaptive and fully Bayesian grid-based method
56: to model strong gravitational lenses with extended images. The
57: primary goal of this method is to quantify the level of luminous
58: and dark-mass substructure in massive galaxies, through their
59: effect on highly-magnified arcs and Einstein rings. The method is
60: adaptive on the source plane, where a Delaunay tessellation is
61: defined according to the lens mapping of a regular grid onto the
62: source plane. The Bayesian penalty function allows us to recover
63: the best non-linear potential-model parameters and/or a grid-based
64: potential correction and to objectively quantify the level of
65: regularization for both the source and the potential. In addition,
66: we implement a Nested-Sampling technique to quantify the
67: errors on all non-linear mass model parameters -- marginalized
68: over all source and regularization parameters -- and allow an
69: objective ranking of different potential models in terms of the
70: marginalized evidence. In particular, we are interested in
71: comparing very smooth lens mass models with ones that
72: contain mass-substructures. The algorithm has been tested on a range
73: of simulated data sets, created from a model of a realistic
74: lens system. One of the lens systems is characterized by a smooth
75: potential with a power-law density profile, twelve
76: include a Navarro, Frenk and White (NFW) dark-matter substructure of different masses and at
77: different positions and one contains two NFW dark substructures with the
78: same mass but with different positions.
79: Reconstruction of the source and of the lens
80: potential for all of these systems shows the method is able, in a
81: realistic scenario, to identify perturbations with masses $\ga 10^7\rm
82: M_\odot$ when located \emph{on} the Einstein ring. For
83: positions both inside and outside of the ring, masses of at least
84: $10^9\rm M_\odot$ are required (i.e. roughly the Einstein ring of
85: the perturber needs to overlap with that of the main lens). Our
86: method provides a fully novel and objective test of mass
87: substructure in massive galaxies.
88:
89: \end{abstract}
90:
91: \begin{keywords}
92: gravitational lensing --- dark matter --- galaxies: structure ---
93: galaxies: haloes
94: \end{keywords}
95:
96:
97: \section{Introduction}
98:
99: At the present time, the most popular cosmological model for
100: structure formation is the $\Lambda \text{CDM}$ paradigm. While this
101: model has been very successful in describing the Universe on large
102: scales and in reproducing numerous observational results
103: \citep[e.g.,][]{Reiss98, Efstathiou02, Burles01,
104: Philips01, Jaffe01, Percival01, deBernardis02, Hamilton02, Croft02,
105: Tonry03, Spergel03, Komatsu08}, important discrepancies still
106: persist on small scales. In particular, some of these involve the
107: dark matter distribution within galactic haloes
108: \citep[e.g.,][]{Moore94, Burkert95, McGaugh98,
109: Binney01, Blok01, deBlok02, McGaugh03, Simon03,Rhee04,Kuzio06}
and the number of galaxy satellites, i.e.\ the
\emph{Missing Satellite Problem}.
112:
113: \noi According to the standard scenario, structures form in a
114: hierarchical fashion via merging and accretion of smaller objects
115: \citep{Toomre77, Frenk88, White91, Barnes92, Cole00}. As shown by
116: the latest numerical simulations, in which high mass and force
117: resolution is achieved, the progenitor population is only weakly
118: affected by virialization processes and a large number of sub-haloes
119: is able to survive after merging. The number of substructures
within the Local Group, however, is predicted to be 1--2 orders of
121: magnitude higher than what is effectively observed
122: \citep[e.g.,][]{Kauffmann93, Moore99, Klypin99,
123: Moore01,Diemand07b,Diemand07a}.
124:
125: \noi Two different classes of solutions have been suggested to
126: alleviate this problem, cosmological and astrophysical. Cosmological
127: solutions address the basis of the $\Lambda \text{CDM}$ paradigm
128: itself and mostly concentrate on the properties of the dark matter,
allowing, for example, for a warm \citep{Colin00}, decaying
130: \citep{Cen01}, self-interacting \citep{Spergel00}, repulsive
131: \citep{Goodman00}, or annihilating nature
\citep{Riotto00}. Alternatively, the $\Lambda \text{CDM}$ picture can
be modified by the introduction of a break in the power spectrum at
small scales \citep[e.g.,][]{Kamionkowski00, Zentner03}.
135:
136: \noi From an astrophysical point of view, the number of visible
137: satellites can be reduced by suppressing the gas collapse/cooling
138: \citep[e.g.,][]{Bullock00, Kravtsov04, Moore06} via supernova
139: feedback, photoionization or reionization. This would result in a
140: high mass-to-light ratio ($M/L$) in the substructures. If these
141: high-$M/L$ substructures indeed exist, different methods
142: for indirect detection are possible. The dark substructure may be
143: detectable for example through its effects on stellar streams
144: \citep[e.g.,][]{Ibata02, Mayer02}, via $\gamma$-rays from dark
145: matter annihilation \citep{Bergstrom99, Calcaneo00, Stoehr03,
146: Colafrancesco06} or through gravitational lensing \citep[e.g.,][]{Dalal02,
147: Koopmans05}.
148:
149: \noi While the first two approaches are limited to the local
150: Universe, gravitational lensing allows one to explore the mass
151: distribution of galaxies outside the Local Group and at a relatively
152: high redshift. Moreover, gravitational lensing is independent of the
153: baryonic content, of the dynamical state of the system and of the
nature of dark matter. For example, when a point source in a lens
system lies close to a fold or cusp of the caustic, the image fluxes
should sum to zero if the sign of the image parities
is taken into account \citep{Blandford86,Zakharov95}. This relation is, however, violated by
many observed lensed quasars with cusp and
fold images.
158: As first suggested by \citet{Mao98}, these flux ratio anomalies
159: can be related to the presence of (dark matter) substructure around the
160: lensing galaxy on scales smaller than the image
161: separation \citep{Bradac02, Chiba02, Dalal02,
162: Metcalf02, Keeton03, Kochanek04, Bradac04, Keeton05}.
Nevertheless, subsequent studies of similar
164: gravitationally lensed systems have shown that
165: the required mass fraction in substructure is higher than what is
166: obtained in numerical simulations \citep{Mao04, Maccio06,Diemand07b}. In
167: addition, for a significant number of cases the observed flux ratio
168: anomalies can be explained by taking into account the luminous dwarf
169: satellite population \citep{Trotter00, Ros00,
170: Koopmans02, Kochanek04, Chen07, McKean07, More08}. Whether the mass fraction
171: of CDM substructures is quantifiable via flux ratio anomalies is
172: therefore a question still open for debate. Alternatively,
173: \citet{Koopmans05} showed that dark matter substructure in lensing
174: galaxies can be detected by modelling of multiple images or Einstein
rings from extended sources.
176:
\noi In this paper, we develop an adaptive grid-based modelling
code for extended lensed sources and grid-based potentials, to fully
quantify this procedure. The method presented here is a significant
improvement on the techniques introduced by \citet{Warren03},
181: \citet{Dye05}, \citet{Koopmans05}, \citet{Suyu106},
182: \citet{suyu206} and \citet{Brewer06}. In order to detect mass substructure in lens
183: galaxies one needs to solve simultaneously for the source surface
184: brightness distribution and the lens potential. A semilinear
185: technique for the reconstruction of grid-based sources, given a
186: parametric lens potential, was first introduced by
187: \citet{Warren03}. The method was subsequently extended by
188: \citet{Koopmans05} and \citet{Suyu106} in order to include a
189: grid-based potential for the lens and by \citet{Barnabe07} to
190: include galaxy dynamics. \citet{Dye05} introduced an
191: adaptive gridding on the source plane; this would minimize the
192: covariance between pixels and decrease the computational
effort. However, the method is still lacking an objective procedure
to quantify the level of regularization. \citet{suyu206} and \citet{Brewer06} encoded the
semi-linear method within the framework of Bayesian statistics
\citep{MacKay92, MacKay03}. Although this is a vast improvement, the fixed
grids do not allow one to take into account the correct number of
degrees of freedom, and proper evidence comparison is difficult.
In the implementation described here, these issues have
been solved:
201:
202: \smallskip
203:
204: \noi {\bf (i)} the procedure is fully Bayesian; this allows us to
205: determine the best set of non-linear parameters for a given
206: potential and the linear parameters of the source, to objectively
207: set the level of regularization and to compare/rank different
208: potential families;
209:
210: \smallskip
211:
\noi {\bf (ii)} using a Delaunay tessellation, the source grid
automatically adapts in such a way that the computational effort
is mostly concentrated in high magnification regions;
215:
216:
217: \smallskip
218:
219: \noi {\bf (iii)} the source-grid triangles are re-computed at every
220: step of the modelling so that the source and the image plane always
221: perfectly map onto each other and the number of degrees of freedom
222: remains constant during Bayesian evidence maximisation.
223:
224: \smallskip
225:
226: \noi For the first time in the framework of grid-based lensing
227: modelling, we use the Nested-Sampling technique by
228: \citet{Skilling04} to compute the full marginalized Bayesian
229: evidence of the data \citep{MacKay92, MacKay03}. This approach not
230: only provides statistical errors on the lens parameters, but also
231: consistently quantifies the relative evidence of a smooth potential
232: against one containing substructures. As such, our method
233: provides a fully objective way to rank these two hypotheses given
234: the data, which is the goal set out in this paper.
235:
\noi The paper is organized as follows. In Section 2 we give a
general overview of the data model. In Section 3 we present in
238: detail how the data model can be inverted and the source and lens
239: potential reconstructed. In Section 4 we review the basics of
240: Bayesian statistics and of the Nested-Sampling technique for
241: evidence computation. In Section 5 we describe how the method has
242: been tested and how its ability in detecting substructures,
243: depending on the perturbation mass and position, has been
244: studied. Finally in Section 6 conclusions are drawn and future
245: applications are discussed.
246:
247: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
248:
249: \section{Construction of the lensing operators}
250:
251: In this section, we describe the data model which relates the
252: unknown source brightness distribution and lens potential to the
253: known data of the lensed images. The aim is to put this procedure in
254: a fully self-consistent mathematical framework, excluding as much as
255: possible any subjective intervention into the modelling. The core
of the method presented here is based on an Occam's razor argument.
257: From a Bayesian evidence point of view, correlated features in the
258: lensed images are most likely due to structure in the source, rather
259: than being the result of small-scale perturbations of the lens
260: potential in front of all the lensed images. On the other hand,
261: uncorrelated structure in the lensed images is most likely due to
262: small-scale perturbations of the lens potential.
263:
264: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
265:
266: \subsection{The data, source and potential grids}\label{sec:grids}
267: The main idea of grid-based lensing techniques is to use a
268: grid-based reconstruction of the source and of the lens potential.
269: Here we introduce the general geometry of the problem, explicitly
270: shown in Fig. \ref{fig:grid}. Consider a lensed image $\bmath d$
271: of an unknown extended source $\bmath s$. Both $\bmath d$ and
272: $\bmath s$ are vectors that describe the surface brightness
273: distributions on a set of spatial points $\bmath x_i^d$ and $\bmath
274: y_j^s$ in the lens and source plane, respectively
275: \citep[e.g.,][]{Warren03,Koopmans05, suyu206}. In general, these are
related through the lens equation ${\bmath y_i^d} = {\bmath x_i^d} -
{\bmath \nabla} \psi({\bmath x_i^d})$, where ${\bmath x}_i^d$
is the spatial position associated with the $i$th element, $d_i$, of
the vector $\bmath d$, and $\psi({\bmath x_i^d})$
is the lensing potential, described in more detail below.
We note that ${\bmath y}_i^d$ does not necessarily coincide with any of the
positions $\bmath y_j^s$ at which the $j$th brightness value
of the vector $\bmath s$ is defined. In our implementation, the grid on the
284: source plane is fully adaptive and is directly constructed from a
285: subset of the $N_d$ pixels in the image plane, with spatial
286: boundaries of the image grid included. In particular, as shown
schematically in Fig.~\ref{fig:grid}, $N_s$ pixels, each located
at a position $\bmath x_j^s$ on the image grid, are cast back to the
source plane, giving the positions $\bmath y_j^s$.
The set of positions $\{ \bmath y_j^s \}$ constitutes
291: the vertices of a Delaunay triangulation. In this way, we define an
292: irregular adaptive grid, where vertex positions in the source plane
293: are related to positions on the image plane via the lens equation
294: and every vertex value represents an unknown source surface
295: brightness level.
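
\noi As a purely illustrative example of this casting (the singular
isothermal sphere and its Einstein radius $b$ are assumptions made
here for illustration and are not the lens model used in this paper),
consider $\psi(\bmath x)=b\,|\bmath x|$. The lens equation then gives
%
\begin{equation}
  \bmath y_j^s = \bmath x_j^s - b\,\frac{\bmath x_j^s}{|\bmath x_j^s|}\,,
\end{equation}
%
so that image pixels with $|\bmath x_j^s|\approx b$ are mapped close
to the origin of the source plane. Vertices cast back from the region
around the Einstein ring therefore crowd together, and the
triangulation is automatically finest where the magnification is
highest.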
296:
297: \noi We assume the lens potential to be the
298: superposition of a parametric smooth component with linear local
299: perturbations related to the presence of e.g. CDM substructures or
300: dwarf galaxies:
301: %
302: \begin{equation}
303: \psi(\bmath x,\bmath \eta)=\psi_s(\bmath x,\bmath
304: \eta)+\delta\psi(\bmath x).
305: \end{equation}
306: %
307: While $\psi_s(\bmath x,\bmath \eta)$ assumes a parametric form,
308: with parameters $\bmath\eta$, $\delta \psi(\bmath x)$ is a function
309: that is pixelized on a regular Cartesian grid of points $\bmath
310: x_k^{\delta\psi}$ with values
311: $\delta \psi_k$. The set $\{\delta \psi_k\}$ is written as a vector
312: $\delta\bmath{\psi}$. Given the observational set of data $\bmath d$,
313: we now wish to recover the source distribution $\bmath s$ and the
314: lens potential $\psi({\bmath x}, \bmath\eta)$ simultaneously. To do
315: this we need to mathematically relate the brightness values $\bmath
316: d$ to the unknown brightness values $\bmath s$. As described in the
317: next subsection, this can be done through a linear operation on
318: $\bmath s$ and $\delta \bmath{\psi}$, where the operator itself is a
319: function of an initial guess of the lens potential.
320:
321:
322:
323: \begin{figure}
324: \begin{center}
325: \includegraphics[width=\hsize]{fig1}
326: \caption {A schematic overview of the non-linear source and
327: potential reconstruction method, as implemented in this
    paper. On the left-hand side, on the image plane, two grids
329: are defined: one for the potential corrections and one for the
330: lensed image. A subset of $N_s$ of the $N_d$ image pixels
331: located at the positions $\bmath x^s_i$ on the image plane
332: (filled circles) is cast back to the source plane (on the
333: right) on $\bmath{y}^s_i$ through the lens equation. These
334: form the vertices of an adaptive grid on the source plane. The
335: remaining image pixels (open circles) are also cast to the
336: source plane to the positions $\bmath{y}_i^d$ (we note that
    this set of points includes $\bmath{y}^s_i$). Because
    surface brightness is conserved, i.e.\ $S(\bmath
    x^d_i)=S(\bmath y^d_i)$, the surface brightness at the open
    circles is represented by a linear superposition of the
341: surface brightness at the three triangle vertices that enclose
342: it. Similarly the potential correction at a point
343: $\bmath{x}_i^{\delta\psi}$ is given by linear interpolation of
344: the potential corrections at the surrounding pixels (large
345: rectangular pixels on the image plane). }
346: \label{fig:grid}
347: \end{center}
348: \end{figure}
349:
350:
351: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
352:
353: \subsection{The source and potential operator}
354:
355: We now derive the explicit relation between the unknown source
356: distribution $\bmath s$, the potential correction $\delta
357: \bmath{\psi}$, the smooth potential $\psi_s(\bmath x,\bmath\eta)$
358: and the image brightness $\bmath d$.
359:
360: \noi Consider a generic triangle $\widehat{\rm{ABC}}$ on the source
361: plane (Fig. \ref{fig:single}), then the source surface brightness
362: ${s_{\rm P}}$ on a point P, located inside the triangle at the
363: position ${\bmath y}_{\rm P}^d$, can be related to the surface brightness on
364: the vertices A, B and C through a simple linear relation
365: %
366: \begin{equation}
367: {s_{\rm P}}=w_{\rm A}{s_{\rm A}}+w_{\rm B}{s_{\rm B}}+w_{\rm
368: C}{s_{\rm C}}\,.
369: \end{equation}
370: %
371: \noi An explicit expression for the bilinear interpolation weights
372: $w_{\rm{A}}$, $w_{\rm B}$ and $w_{\rm C}$ can be obtained by
373: considering the point $\rm P_1 $, at the intersection of the line
374: $\overline{\rm {AP}}$ with the line $\overline{\rm{CB}}$. The source
375: intensities at P and $\rm P_1$ are also related to each other
376: through a linear interpolation. On the other hand, the surface
377: brightness in $\rm P_1$ is directly related to the values on the
378: triangle vertices $\rm B$ and $\rm C$
379: %
380: \begin{equation}
381: \left\{
382: \begin{array}{l}
383: s_{\rm P} = \frac{d_{\rm {PA} }
384: }{d_{\rm{P_1A}}}(s_{\rm{P_1}}-s_{\rm A})+s_{\rm A}\\ s_{\rm
385: {P_1}}= \frac{ d_{ \rm {P_1B} } }{ d_{\rm{CB}} }(s_{\rm
386: C}-s_{\rm B})+s_{\rm B}
387: \end{array}
388: \right.\,
389: \label{equ:arr}
390: \end{equation}
391: %
392: \noi where $d_{\rm {PA}}$ and $d_{\rm {P_1A}}$ are the absolute
393: distances between the points P and A and the points $\rm P_1$ and A
394: respectively; $d_{ \rm{P_1B}}$ and $d_{\rm {CB}}$ are the distances
395: between the points $\rm {P_1}$ and B and the points C and B
396: respectively. Solving (\ref{equ:arr}), we obtain the weights
397: %
398: \begin{equation}
399: \left\{
400: \begin{array}{l}
401: w_{\rm A}= 1-\frac{d_{\rm {PA}}}{d_{\rm{P_1A}}}\\ w_{\rm B}=
402: \frac{d_{\rm{PA}}}{d_{\rm{P_1A}}}
403: \left(1-\frac{d_{\rm{P_1B}}}{d_{\rm{CB}}}\right)\\
404: w_{\rm C}=
405: \frac{d_{\rm{PA}}d_{\rm{P_1B}}}{d_{\rm{P_1A}}d_{\rm{CB}}}
406: \end{array}
407: \right.\,
408: \end{equation}
409: %
410: \noi with $\sum_{i=\rm A,\rm B,\rm C }{w_i}=1$. Because
411: gravitational lensing conserves the surface brightness, i.e.\ $S(\bmath
412: x_i^d) = S(\bmath y_i^d)$, the mapping between the two planes (when
$\delta\bmath\psi=0$) can be expressed as a system of $N_d$ coupled
414: linear equations
415: %
416: \begin{equation}
417: \mathbf{B\,L}(\bmath \eta)\bmath s =\bmath d + \bmath n\,,
418: \label{equ: src_linear_blurred}
419: \end{equation}
420: %
421: where $\mathbf L(\bmath \eta)$ and $\mathbf B$ are the lensing and
422: the blurring operators respectively \citep[see e.g.][]{Warren03,
423: Treu04, Koopmans05, Suyu106}. The blurring operator is a square
424: sparse matrix which accounts for the effects of the PSF. Each row of
425: the lensing operator (a sparse matrix) contains at most the three
426: bilinear interpolation weights, $w_{\rm A ,B, C}$, placed at the columns that
427: correspond to the three source vertices that enclose the associated
428: source position. For a vertex point, there is only one weight equal
to unity. In the case $N_s = N_d$ (i.e.\ all image positions are used to
create the source grid), all weights are equal to unity. In this
case, the system of equations is under-constrained and strong
regularization is required.
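
\noi As a simple check of these weights (the coordinates below are
hypothetical and chosen only for illustration), take ${\rm A}=(0,0)$,
${\rm B}=(1,0)$, ${\rm C}=(0,1)$ and ${\rm P}=(1/4,1/4)$. The line
$\overline{\rm AP}$ meets $\overline{\rm CB}$ at ${\rm
P_1}=(1/2,1/2)$, so that $d_{\rm PA}/d_{\rm P_1A}=1/2$ and $d_{\rm
P_1B}/d_{\rm CB}=1/2$, and
%
\begin{equation}
  w_{\rm A}=1-\frac{1}{2}=\frac{1}{2}\,,\quad
  w_{\rm B}=\frac{1}{2}\left(1-\frac{1}{2}\right)=\frac{1}{4}\,,\quad
  w_{\rm C}=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}\,.
\end{equation}
%
The weights sum to unity and reproduce the position of P, $w_{\rm
A}\bmath y_{\rm A}+w_{\rm B}\bmath y_{\rm B}+w_{\rm C}\bmath y_{\rm
C}=(1/4,1/4)$, as expected for linear interpolation within a
triangle. The row of $\mathbf L$ associated with this image pixel
would then contain the three entries $1/2$, $1/4$ and $1/4$ in the
columns of A, B and C, and zeros elsewhere.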
433:
434: \noi By pixelating $\delta \psi(\bmath x)$ on a regular Cartesian
435: grid, a similar argument as for the source can be applied to the
436: potential correction; all potential values, $\{\delta \psi_k\}$, and
437: their derivatives on the image plane can be related to this limited
438: set of points through bilinear interpolation
439: \citep[see][]{Koopmans05, Suyu08}. It is then possible to derive from
440: equation~(\ref{equ: src_linear_blurred}) a new set of linear
441: equations,
442: %
443: \begin{equation}
444: \mathbf{M_c}\left(\bmath{\eta},\bmath\psi\right)\,\bmath r = \bmath d +
445: \bmath n,
446: \label{equ: src_pot_linear_blurred}
447: \end{equation}
448: %
449: where
450: %
451: \begin{equation}
452: \bmath r\equiv\left(
453: \begin{array}{c}
454: \bmath s\\ \delta\bmath \psi
455: \end{array}
456: \right)\,.
457: \end{equation}
458: %
459: \noi More specifically, $\bmath\psi$ is the sum of all the previous
460: corrections $\delta\bmath\psi$ and the operator $\mathbf{M_c}$ is a
461: block matrix reading
462: \begin{equation}
  \mathbf{M_c}\equiv \mathbf B \left [\mathbf L(\bmath \eta, \bmath
  \psi)\, | -\mathbf{D_s}(\bmath s_{\rm MP})\mathbf {D_{\psi}}
  \right]\,.
466: \label{equ:block_matrix}
467: \end{equation}
468: %
469: \noi ${\mathbf L}({\bmath \eta}, {\bmath \psi})$ is the
470: lensing operator introduced above, $\mathbf{D_s}(\bmath s_{\rm MP})$
471: is a sparse matrix whose entries depend on the surface brightness
472: gradient of the previously-best source model at $\bmath{y}^d_i$ and
473: $\mathbf{D_\psi}$ is a matrix that determines the gradient of
474: $\delta\bmath\psi$ at all corresponding points $\bmath{x}^d_i$
475: \citep[see] [for details]{Koopmans05}. The generic structure of
476: these matrices is given by
477: %
478: \begin{equation}
479: \mathbf{D_{s}}= \left(
480: \begin{array}{ccccc}
481: ...&&& \\ \\ &\frac{\partial S({\bmath y}^d_i)}{\partial y_1}&
482: \frac{\partial S({\bmath y}^d_i)}{\partial y_2} & \\ \\ &
483: &\frac{\partial S({\bmath y}^d_{i+1})}{\partial y_1}&
484: \frac{\partial S({\bmath y}^d_{i+1})}{\partial y_2} \\ \\ & & & & ...\\
485: \end{array}
486: \right)
487: \end{equation}
488: %
489: and
490: %
491: \begin{equation}
  \mathbf{D_{\psi}}= \left(
493: \begin{array}{ccccc}
494: ...&\\ \\ &\frac{\partial \delta\psi (\bmath{x}^d_i)}{\partial
495: x_1}&\\ &\frac{\partial \delta\psi (\bmath{x}^d_i)}{\partial
496: x_2} & \\ \\ &&\frac{\partial \delta\psi (\bmath{x}^d_{i+1})}{\partial
497: x_1} \\ &&\frac{\partial \delta\psi (\bmath{x}^d_{i+1})}{\partial
498: x_2} \\ & & & &...\\
499: \end{array}
500: \right)
501: \end{equation}
502: %
where the index $i$ runs over all the $\bmath{x}_i^d$ and $\bmath{y}_i^d$,
i.e.\ triangle vertices included. The ``functions'' $S$ and $\delta
\psi$ and their derivatives can be obtained through bilinear
interpolation and finite differencing from $\bmath s$ and $\delta
\bmath \psi$, respectively.
508:
509: \noi It is clear from the structure of these matrices that the
510: first-order correction to the model, as a result of $\delta \psi$,
511: is equal to $\delta d_i= -\bmath {\nabla} S(\bmath{y}^d_i) \cdot
512: \bmath{\nabla} \delta \psi(\bmath{x}^d_i)$ at every point
$\bmath{x}^d_i$ \citep[see e.g.][for a derivation]{Koopmans05}.
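
\noi A brief sketch of why this holds, assuming $\delta\psi$ is small
enough that a first-order Taylor expansion is adequate: perturbing
the potential shifts the source-plane position of each image pixel,
$\bmath y^d_i \rightarrow \bmath y^d_i -
\bmath\nabla\delta\psi(\bmath x^d_i)$, so that by conservation of
surface brightness
%
\begin{equation}
  S\left(\bmath y^d_i - \bmath\nabla\delta\psi(\bmath x^d_i)\right)
  \simeq S(\bmath y^d_i) - \bmath\nabla S(\bmath
  y^d_i)\cdot\bmath\nabla\delta\psi(\bmath x^d_i)\,.
\end{equation}
%
The first term corresponds to row $i$ of $\mathbf L\,\bmath s$ and
the second to row $i$ of $-\mathbf{D_s}\mathbf{D_\psi}\,\delta\bmath\psi$,
which is the block structure of equation~(\ref{equ:block_matrix})
before blurring.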
514:
\noi As for the surface brightness itself, the first derivatives at
a generic point P on the source plane can also be expressed as functions
of the corresponding values at the triangle vertices A, B and C, yielding
518: %
519: \begin{eqnarray}
520: \frac{\partial {s_{\rm P}}}{\partial y_{1}} & = &w_{\rm
521: A}\frac{\partial {s_{\rm A}}}{\partial y_{1}}+w_{\rm
522: B}\frac{\partial {s_{\rm B}}}{\partial y_{1}}+w_{\rm
523: C}\frac{\partial {s_{\rm C}}}{\partial y_{1}}\nonumber\\
524: \frac{\partial {s_{\rm P}}}{\partial y_{2}} & = &w_{\rm
525: A}\frac{\partial {s_{\rm A}}}{\partial y_{2}}+w_{\rm
526: B}\frac{\partial {s_{\rm B}}}{\partial y_{2}}+w_{\rm
527: C}\frac{\partial {s_{\rm C}}}{\partial y_{2}}
528: \end{eqnarray}
529: %
530: For the generic vertex $j= \rm{A, B,C}$ these are given by
531: $\frac{\partial \bmath{s_j}}{\partial y_{1}}=-\frac{n_0}{n_2}$
532: and $\frac{\partial \bmath{s_j}}{\partial
533: y_{2}}=-\frac{n_1}{n_2}$, where $\bmath{N}\equiv(n_0,n_1,n_2)$ is the
534: unit-length surface normal vector at the vertex $j$ and is defined
535: as the average of the adjacent per-face normal vectors. For
536: $\delta\bmath\psi$ and its gradients, on a rectangular grid with
rectangular pixels, we follow \citet{Koopmans05}.
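
\noi The relation between the gradient and the normal can be seen
from a single planar facet (a sketch; the coefficients $a$, $b$ and
$c$ are introduced here only for illustration). If the surface
brightness over a triangle is $s=a\,y_1+b\,y_2+c$, the surface
$(y_1,y_2,s)$ has a normal proportional to the cross product of its
two tangent vectors,
%
\begin{equation}
  (1,0,a)\times(0,1,b)=(-a,-b,1)\,,
\end{equation}
%
so that $-n_0/n_2=a=\partial s/\partial y_1$ and
$-n_1/n_2=b=\partial s/\partial y_2$, independently of the
normalization of $\bmath N$. Averaging the adjacent face normals
before taking these ratios then gives a gradient estimate at the
vertex.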
538:
539: \begin{figure}
540: \begin{center}
541: \subfigure[]{\centering \includegraphics[width=4.5cm]{fig2a}
542: \label{fig:single}
543: }
544: \hspace{.5in}
545:
546:
547: \subfigure[]{ \includegraphics[width=3cm]{fig2b}
548: \label{fig:double_x}
549: }
550: \hspace{.25in} \subfigure[]{
551: \includegraphics[width=3cm]{fig2c}
552: \label{fig:double_y}
553: }
554:
555: \caption{Generic triangles from the
556: source grid. Both the source surface brightness and its
557: derivatives at the points P, $\rm P_1$ and $\rm P_2$ are given
    by linear superposition of the values at the vertices of the
559: surrounding triangles.}
560: \label{fig:triangles}
561:
562: \end{center}
563: \end{figure}
564:
565:
566: \section{Inverting the data model}\label{sec:inverting}
567:
568: \noi As shown above, in both cases of solving for the source alone,
569: or solving for the source plus a potential correction, a {\sl linear
570: data model} can be constructed. In this section, we give a
571: general overview of how this set of linear equations can be
572: (iteratively) solved. A more thorough Bayesian description and
573: motivation can be found in Section~4.
574:
575: \subsection{The penalty function}
576: Before we go into the details of the method, we first restate that
577: for a given lens potential $\psi(\bmath x, {\bmath \eta})$ and
578: potential correction $\bmath \psi_n = \sum^n_{i=1}
579: \delta {\bmath \psi_i}$, on a grid, the source surface brightness vector
580: $\bmath s$ and the data vector $\bmath d$ can be related through a
581: linear (matrix) operator
582: %
583: \begin{equation}
584: \mathbf {M_c}({\bmath \eta}, {\bmath \psi}_{n-1}, \bmath
585: s_{n-1})\bmath r_n={\bmath d} + {\bmath n},
586: \label{equ: src_linear}
587: \end{equation}
588: now explicitly written with their dependencies on the source and
589: potential and with
590: \begin{equation}
591: \bmath r_n= \left(\begin{array}{c}\bmath s_{n} \\
592: \delta\bmath\psi_n \\
593: \end{array}
594: \right).
595: \end{equation}
596: %
In this equation $\bmath s_n$ is a model of the source
brightness distribution at a given iteration $n$ (we describe the
iterative scheme below). We assume the noise $\bmath n$ to be
Gaussian, which is a good approximation for the HST images to which the
method will be applied. Even in the case of deviations from Gaussianity,
the central limit theorem ensures that, for many data points, the probability density
distribution is often well approximated by a Normal distribution. \\
604: \noi Because of the ill-posed nature of this relation,
605: equation (\ref{equ: src_linear}) cannot simply be inverted. Instead a
606: penalty function which expresses the mismatch between the data and
607: the model has to be defined by
608: \begin{equation}\label{eqn:penalty}
609: P(\bmath s,\delta \bmath \psi \,|\, {\bmath \eta}, {\bmath \lambda},
610: {\bmath s}_{n-1}, {\bmath
611: \psi}_{n-1})=\chi^2+\lambda_s^2\|\mathbf{H_s} \bmath s\|^2_2
612: +\lambda_{\delta\psi}^2 \|\mathbf{H_{\delta\psi}} \delta\bmath
613: \psi\|^2_2\,,
614: \end{equation}
615: with
616: \begin{equation}\label{eqn:chi2}
617: \chi^2 = [\mathbf {M_c}({\bmath \eta}, \bmath \psi_{n-1}, \bmath
618: s_{n-1})\, \bmath r - {\bmath d}]^{\rm T} \, {\mathbf {C_d^{-1}}} \,
619: [\mathbf {M_c}({\bmath \eta}, \bmath \psi_{n-1}, \bmath
620: s_{n-1})\,\bmath r - {\bmath d}].
621: \end{equation}
622:
623: \noi The second and third term in the penalty function contain prior
624: information, or beliefs about the smoothness of the source and of
625: the potential respectively and $\mathbf{C_d}$ is the diagonal
626: covariance matrix of the data. The level of regularization is set by
627: the regularization parameters $\bmath \lambda$, one for the source and one
628: for the potential \citep[see][for a more general
629: discussion]{Koopmans05, suyu206}. In a Bayesian framework, this
630: penalty function is related to the posterior probability of the
631: model given the data (see Section 4). In the following two sections
632: we describe how to solve for the linear and non-linear parameters of
633: the penalty function (except for $\bmath \lambda$, which is described
634: in Section 4).
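
\noi For a diagonal covariance matrix the $\chi^2$ term reduces to a
sum over image pixels. Writing $\sigma_i$ for the noise standard
deviation of pixel $i$, so that $(\mathbf{C_d})_{ii}=\sigma_i^2$ (the
symbol $\sigma_i$ is introduced here for illustration only),
%
\begin{equation}
  \chi^2=\sum_{i=1}^{N_d}\frac{\left[(\mathbf{M_c}\,\bmath
  r)_i-d_i\right]^2}{\sigma_i^2}\,.
\end{equation}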
635:
636: \subsubsection{Solving for the linear parameters}
637: \label{sec:solvelinear}
638: The most probable solution, $\bmath{r_{\rm MP}}$, minimizing the
639: penalty function is obtained by solving the set of linear equations
640: \begin{equation}
641: (\mathbf{M_c^T C_d^{-1}M_c+R^T R})\,\bmath
642: r=\mathbf{M_c^TC_d^{-1}}\bmath d.
643: \label{equ: src_pot_penalty}
644: \end{equation}
645: The regularization matrix is given by
646: \begin{equation}
647: {\mathbf R^{\rm T}} {\mathbf R} = \left(
648: \begin{array}{cc}
649: \lambda_s^2\mathbf{H_s^{\rm T}} \mathbf{H_s} & \\ &
650: \lambda^2_{\delta\psi}\mathbf{H_{\delta\psi}^{\rm T}}
651: \mathbf{H_{\delta\psi}}
652: \end{array} \right).
653: \end{equation}
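
\noi This follows from setting the gradient of the penalty function
with respect to $\bmath r$ to zero, with $\mathbf{M_c}$ held fixed at
its value from the previous iteration. Writing the two regularization
terms together as $\|\mathbf R\,\bmath r\|_2^2=\bmath r^{\rm
T}\mathbf{R^{\rm T}R}\,\bmath r$ and using the symmetry of
$\mathbf{C_d^{-1}}$,
%
\begin{equation}
  \frac{\partial P}{\partial \bmath r}=2\,\mathbf{M_c^{\rm
  T}C_d^{-1}}\left(\mathbf{M_c}\,\bmath r-\bmath
  d\right)+2\,\mathbf{R^{\rm T}R}\,\bmath r=0\,,
\end{equation}
%
which rearranges to equation~(\ref{equ: src_pot_penalty}).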
654:
655: \noi The solution of this symmetric positive definite set of
656: equations can be found using e.g.\ a Cholesky decomposition
657: technique. By solving equation (\ref{equ: src_pot_penalty}), adding
658: the correction $\delta \bmath \psi_n$ to the previously-best
659: potential $\bmath \psi_{n-1}$ and iterating this procedure, both the
660: source and the potential should converge to the minimum of the
661: penalty function $P(\bmath s_n,\delta \bmath \psi_{n} \,|\, {\bmath
662: \eta}, {\bmath \lambda}, {\bmath s}_{n-1}, {\bmath \psi}_{n-1})$. At
663: every step of this iterative procedure the matrices $\mathbf {M_c}$
664: and $\mathbf R$ have to be recalculated for the new updated
665: potential $\bmath \psi_n$ and source $\bmath s_n$. While the
666: potential grid points are kept spatially fixed in the image plane,
667: the Delaunay tessellation grid of the source is re-built at every
668: iteration to ensure that the number of degrees of freedom is kept
669: constant during the entire optimization process.
670:
671: \noi Note that because the source and the potential corrections are
672: independent, they require their own form ($\mathbf H$) and level
673: ($\lambda$) of regularization. The most common forms of
674: regularization are the zeroth-order, the gradient and the
675: curvature. As shown by \citet{suyu206} the best form depends on the
676: nature of the source distribution and can be assessed via Bayesian
677: evidence maximisation. For the source, we chose the curvature
678: regularization defined for the Delaunay tessellation of the source
679: plane.
680:
\noi Specifically, one can combine the gradient and curvature
matrices in the $y_1$ and $y_2$ directions: $\mathbf{H_{s}^{\rm
T}}\mathbf{H_{s}}=\mathbf{H_{s,y_1}^{\rm
T}}\mathbf{H_{s,y_1}}+\mathbf{H_{s,y_2}^{\rm T}}\mathbf{H_{s,y_2}}$.
Both $\mathbf{H_{s,y_1}}$ and $\mathbf{H_{s,y_2}}$ can be obtained
analogously by considering the pairs of triangles in
Fig.~\ref{fig:double_x} and Fig.~\ref{fig:double_y},
respectively.
689:
690: \noi For every generic point C on the source plane we consider the
691: pair of triangles $\widehat{\rm{ABC}}$ and $\widehat{\rm{DCE}}$ and
692: define the curvature in C in the $y_1$ direction as:
693: %
694: \begin{equation}
695: {s''_{C,y_1}}
696: \equiv \frac{1}{d_{CP}}({s_P}-{s_C}) -\frac{1}{d_{CQ}}({s_C}-{s_Q})\,.
697: \label{equ:curvature}
698: \end{equation}
699: This is not the second derivative, but we find that this alternative
700: curvature definition gives much better results than using the second
701: derivative directly. The reason is that it gives equal weight to all
702: triangles, independently of their relative sizes (for identical
703: rectangular pixels this problem does not arise since the above
704: definition is equal to the second derivative up to a proportionality
constant). With this definition a much smoother solution is obtained.
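
\noi As a consistency check (a sketch, assuming a regular grid with
spacing $h$ in the $y_1$ direction, where $h$ is a symbol introduced
here only for illustration), setting $d_{CP}=d_{CQ}=h$ in equation
(\ref{equ:curvature}) gives
\begin{equation}
{s''_{C,y_1}} = \frac{s_P - 2\,s_C + s_Q}{h} \approx
h\,\left.\frac{\partial^2 s}{\partial y_1^2}\right|_{C}\,,
\end{equation}
i.e.\ the standard second-order finite difference multiplied by $h$,
which is the proportionality constant referred to above.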
706:
707: \noi P and Q
708: are given by intersecting the line
709: $\overline{\rm{CP_1}}$ with the line $\overline{\rm{ED}}$ and the
710: line $\overline{\rm{CP_2}}$ with the line $\overline{\rm{AB}}$
711: respectively. Specifically, $\rm{P_1}$ and $\rm{P_2}$ are defined as
712: very small displacements from the point C in the $y_1$ direction %
713: \begin{eqnarray}
714: y_{2}^{\rm{P_1}} & = & y_{2}^{\rm{P_2}} = y_{2}^{\rm C}\nonumber\\
715: y_{1}^{\rm{P_{1,2}}} & = & y_{1}^{\rm C} \pm \delta y_1.
716: \end{eqnarray}
717: %
The source surface brightness in P and Q can be obtained by
linear interpolation between the source values in D and E, and
between those in A and B, respectively:
721: %
722: \begin{eqnarray}
723: s_{\rm P}&=&\frac{d_{\rm{PD}}}{d_{\rm{ED}}}(s_{\rm E}-s_{\rm
724: D})+s_{\rm D}\label{equ:s_p} \nonumber \\ s_{\rm
725: Q}&=&\frac{d_{\rm{QA}}}{d_{\rm{AB}}}({s_{\rm B}}-s_{\rm
A})+s_{\rm A}\label{equ:s_q}\,.
727: \end{eqnarray}
728: %
729: \noi Substituting (\ref{equ:s_p}) in
730: (\ref{equ:curvature}) gives
731: %
732: \begin{multline}
733: {s''_{C,y_1}}=-\left(\frac{1}{d_{\rm
734: {CP}}}+\frac{1}{d_{\rm {CQ}}}\right){s_{\rm C}}+\frac{d_{\rm
735: PD}}{d_{\rm CP}d_{\rm DE}}s_{\rm E}+\\ \frac{d_{\rm
736: {QA}}}{d_{\rm{CQ}}d_{\rm{AB}}}s_{\rm B}+\frac{d_{\rm{PE}}}
737: {d_{\rm{CP}}{d_{\rm{DE}} }}s_{\rm D}+\frac{d_{\rm
738: {QB}}}{d_{\rm{CQ}}d_{\rm{AB}}}s_{\rm A}\,.
739: \end{multline}
740: %
\noi Each row of the regularization matrix $\mathbf{H_{s,y_1}}$
corresponds to one point C and contains the five interpolation
weights, placed at the columns that correspond to the five vertices
A, B, C, D and E. The curvature in the $y_2$ direction is derived in
an analogous way using the pair of triangles in
Fig.~\ref{fig:double_y}. We refer again to \citet{Koopmans05} for
details on the potential regularization matrix
$\mathbf{H_{\delta \psi}}$.
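
\noi A quick check on these weights can be written down (a sketch,
assuming that P lies on the segment $\overline{\rm{ED}}$ and Q on the
segment $\overline{\rm{AB}}$, so that $d_{\rm PD}+d_{\rm PE}=d_{\rm
DE}$ and $d_{\rm QA}+d_{\rm QB}=d_{\rm AB}$). The five entries of
each row of $\mathbf{H_{s,y_1}}$ then sum to
\begin{equation}
-\left(\frac{1}{d_{\rm CP}}+\frac{1}{d_{\rm CQ}}\right)
+\frac{d_{\rm PD}+d_{\rm PE}}{d_{\rm CP}\,d_{\rm DE}}
+\frac{d_{\rm QA}+d_{\rm QB}}{d_{\rm CQ}\,d_{\rm AB}}=0\,,
\end{equation}
so that a constant surface-brightness distribution has zero
curvature and is not penalised by the regularization.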
748:
749: \subsubsection{Solving for the non-linear parameters}
750: \label{sec:solvenonlinear}
751: In order to recover the non-linear parameters $\bmath \eta$, we need
752: to minimize the penalty function $P(\bmath s, {\bmath \eta}\,|\,
753: {\bmath \lambda}, {\bmath \psi})$. We allow for a correction,
754: $\bmath \psi$, to the parametric potential $\psi(\bmath \eta,\bmath
755: x)$ (not necessarily zero), but do not allow it to be changed while
756: optimising for $\bmath s$ and ${\bmath \eta}$. In all cases, we keep
757: $\bmath \lambda$ fixed during the optimization. Given an
758: initial guess for the non-linear parameters $\bmath \eta_0$, we then
759: minimize the penalty function defined in Section
760: \ref{sec:solvelinear}, under the conditions outlined above
761: ($\bmath\psi$ is constant and $\delta\bmath\psi \equiv \bmath 0$).
We use a non-linear optimizer \citep[in our case Downhill-Simplex
with Simulated Annealing;][]{Press92} to change $\bmath \eta$ at
764: every step and to minimize the joint penalty function $P(\bmath s,
765: {\bmath \eta}\,|\, {\bmath \lambda}, {\bmath \psi})$. The
766: optimization of $\bmath s$ is implicitly embedded in the
767: optimization of $\bmath \eta$ by solving equation (\ref{equ:
768: src_pot_penalty}) only for $\bmath s$, every time $\bmath \eta$ is
769: modified.
770:
771: \subsection{The optimization strategy}\label{sec:strategy}
772:
773: We have implemented a multi-fold optimization scheme for solving the
774: linear equation (\ref{equ: src_linear}). This scheme is not unique,
775: but stabilises the numerical optimization of this rather complex set
of equations. Solving for all parameters simultaneously would be
777: computationally prohibitive and usually shows poor convergence
778: properties.
779:
780: \subsubsection{Optimization steps}
781:
Our optimization scheme is similar to a {\sl line-search}
optimization, in which different sets of unknown parameters are
consecutively kept fixed while the others are optimized. The sets
$\{\delta \bmath \psi, \bmath s\}$, $\{\bmath \eta,
\bmath s \}$ and $\{\bmath \lambda, \bmath s \}$ define the three
different groups of parameters, of which only one is solved for at
a time. The individual steps, in no particular order, are then:
789:
790: \noi {\bf (i)} {We assume $\bmath \eta$
791: and $\bmath \lambda$ to be constant vectors and iteratively solve
792: for $\delta\bmath\psi$ and the source $\bmath s$. In this case, at
793: every iteration we solve for $\bmath r$ and adjust $\bmath \psi$,
794: using the linear correction to the potential $\delta \bmath
795: \psi$. This was described in Section \ref{sec:solvelinear}.}
796:
797: \noi {\bf (ii)} {We assume $\bmath\psi$ and
798: $\bmath \lambda$ to be constant vectors and
799: $\delta\bmath\psi_i=\bmath 0$ at every iteration and only solve
800: for the non-linear potential parameters $\bmath \eta$ and the
801: source $\bmath s$. This was described in Section
802: \ref{sec:solvenonlinear}. We note that part of step (i) is also
803: implicitly carried out in step (ii) (i.e.\ solving for $\bmath s$).}
804:
\noi {\bf (iii)} {We assume both (i) and (ii), above, and solve for
the regularization parameter $\lambda_s$ of the source and the source
itself, $\bmath s$. This requires a Bayesian approach and will be
described in more detail in Section~4. We have not attempted to
optimize for $\lambda_{\delta \psi}$, but will study this
in future publications.}
811:
812: \noi The overall goal, however, remains to solve for the \emph{full}
813: set of unknown parameters $\{ {\bmath \eta}, {\bmath \psi}_n, \bmath
814: s_n \}$ for $n\rightarrow \infty$ (or some large number). In
815: particular if an overall smooth (on scales of the image separations)
816: potential model $\psi(\bmath \eta)$ does not allow a proper
817: reconstruction of the lens system, we add an additional and more
818: flexible potential correction $\delta{\bmath \psi}$,
819: which can describe a more complex mass structure.
820:
821: \subsubsection{Line-search optimization scheme}
822:
823: In practice, we find that the optimal strategy to minimize the
824: penalty function is the following, in order:
825:
826: \noi {\bf (1)} {We set $\lambda_{\rm s}$ to a large constant value
827: such that the source model remains relatively smooth throughout
828: the optimization (i.e.\ the peak brightness of the model is a
829: factor of a few below that of the data) and keep
830: $\bmath\psi_n=\bmath 0$ \citep[see also][]{suyu206, Suyu08}. We then
831: solve for $\bmath \eta$ and $\bmath s$ that minimize the penalty
832: function}.
833:
834: \noi {\bf (2)} {Once the best $\bmath \eta$ and $\bmath s$ are
835: found, a Bayesian approach is used to find the best value of
836: $\lambda_{\rm s}$ for the source only. At this point
837: $\bmath\psi$ is still kept equal to zero.}
838:
839: \noi {\bf (3)} {Given the new value of $\lambda_{\rm s}$, step (1) is repeated
840: to find improved values of $\bmath \eta$ and $\bmath s$. Since the
841: sensitivity of $\lambda_{\rm s}$ to changes in $\bmath \eta$ is
842: rather weak, at this point the best values of $\bmath \eta$,
843: $\bmath s$ and $\bmath \lambda$ have been found.}
844:
845: \noi {\bf (4)} {Next, all the above parameters are kept fixed and we
846: solve for $\bmath r$, this time assuming a very large value for
847: $\lambda_{\delta \psi}$ to keep the potential correction (and
848: convergence) smooth. We adjust $\bmath \psi$ at every iteration
849: until convergence is reached
\citep[e.g.][]{Suyu08}. At this point we stop the optimization
851: procedure.}
852:
853: \noi {\bf (5)} {The smooth model with $\bmath \psi = \bmath 0$ and
854: the same model with $\bmath \psi \neq \bmath 0$ are then compared
through their Bayesian evidence values, and errors on the
parameters are estimated through the Nested Sampling of
\citet{Skilling04} (Section~4).}
858:
\noi Fig.~\ref{fig:flow} shows a complete flow diagram of our
optimization scheme. In the next section we place
equation (\ref{eqn:penalty}) and model ranking on a formal Bayesian
footing. Readers mostly interested in the application and
tests of the method can continue with Section~5.
864:
865: \begin{figure*}
866: \begin{center}
867: \includegraphics[width=\hsize,clip=]{fig3}
868: \caption {A schematic overview of the non-linear source and
869: potential reconstruction method.}
870: \label{fig:flow}
871: \end{center}
872: \end{figure*}
873:
874: \section{A Bayesian approach to data fitting and model selection}
875: \label{sec:bayes}
876:
877: When trying to constrain the physical properties of the lens galaxy,
878: within the grid-based approach, three different problems are
879: faced. Given the linear relation in equation (\ref{equ:
880: src_pot_linear_blurred}) we need to determine the linear parameters
881: $\bmath r$ for a certain set of data $\bmath d$ and a form for the
882: smooth potential $\psi_{s}(\bmath x,\bmath \eta)$. We then aim to
883: find the best values for the parameters $\bmath \eta$ and $\bmath
884: \lambda$ and finally, on a more general level, we wish to infer the
885: best model for the overall potential and quantitatively rank
886: different potential families. In particular, we want to compare smooth models with models
887: that also include a potential grid for substructure (with more free
888: parameters). These issues can all be quantitatively and objectively
889: addressed within the framework of Bayesian statistics. In the
890: context of data modelling three levels of inference can be
891: distinguished \citep{MacKay92, suyu206}.
892:
893: \medskip
894:
895: \noi {\bf (1)} First level of inference: linear optimization. We
896: assume the model $\mathbf{M_c}$, which depends on a given potential
897: and source model, to be true and for a fixed form $\mathbf R$ and
898: level ($\bmath\lambda$) of regularization, we derive from Bayes'
899: theorem the following expression:
900: \begin{equation}
901: P\left(\bmath r\,|\,\bmath d,\bmath\lambda,\bmath \eta,\mathbf
902: {M_c},\mathbf R\right)=\frac{P(\bmath d \,|\,\bmath r,\bmath \eta,
903: \mathbf{M_c})\, P(\bmath r\,|\,\bmath\lambda,\mathbf R)}{P(\bmath
904: d \,|\,\bmath\lambda,\bmath \eta,\mathbf{M_c},\mathbf R)}\,.
905: \end{equation}
906: The likelihood term, in case of Gaussian noise, for a covariance
907: matrix $\mathbf{C_d}$, is given by
908: \begin{equation}
909: P(\bmath d \,|\,\bmath r, \bmath\eta,\mathbf{M_c})=
910: \frac{1}{Z_d}\exp{[-E_d(\bmath d \,|\,\bmath
911: r,\bmath\eta,\mathbf{M_c})]}\,
912: \end{equation}
913: where
914: \begin{equation}
915: Z_d=(2\pi)^{N_d/2}(\det \ \mathbf{C_d})^{1/2}
916: \end{equation}
917: and (see equation \ref{eqn:chi2})
918: \begin{equation}
E_d(\bmath d \,|\,\bmath r,\bmath\eta,\mathbf{M_c})=
\frac{1}{2}\,\chi^2=\frac{1}{2}\left(\mathbf{M_c} \bmath
r-\bmath d\right)^{\rm T}\mathbf{C_d}^{-1}\left(\mathbf{M_c}
\bmath r-\bmath d\right)\,.
923: \end{equation}
Because of the presence of noise and the frequent singularity of
$\mathbf{M_c^{\rm T}} \mathbf{M_c}$, it is not possible to
926: simply invert the linear relation in equation (\ref{equ:
927: src_pot_linear_blurred}) but an additional penalty function must be
928: defined through the introduction of a prior probability $P(\bmath r
929: \,|\,\bmath\lambda,\mathbf R)$ on $\bmath s$ and on $\delta\bmath
930: \psi$. In our implementation of the method, the prior assumes a
931: quadratic form, with minimum in $\bmath r=\bmath 0$ and sets the
932: level of smoothness (specified in $\mathbf H$ and $\bmath\lambda$)
933: for the solution
934: \begin{equation}
935: P(\bmath r\,|\,\bmath\lambda,\mathbf R)=
936: \frac{1}{Z_r}\exp{\left[-\bmath\lambda E_r(\bmath r\,|\,\mathbf
937: R)\right]}\,,
938: \end{equation}
939: with
940: \begin{equation}
Z_r(\bmath\lambda)=\int {d\bmath r\, e^{-\bmath\lambda E_r}}=
e^{-\bmath\lambda
E_r(\bmath 0)}\left(\frac{2\pi}{\bmath\lambda}\right)^{N_r/2}(\det\mathbf
C)^{-1/2}\,,
945: \end{equation}
946: \begin{equation}
947: E_r=\frac{1}{2}\|\mathbf R\bmath r\|^2_2
948: \end{equation}
949: and
950: \begin{equation}
\mathbf C=\nabla \nabla E_r=\mathbf {R}^{\rm T}\,\mathbf R\,.
952: \end{equation}
953: The normalization constant $P(\bmath d\,|\,\bmath\lambda,\bmath
954: \eta,\mathbf{M_c},\mathbf R)$ is called the evidence and plays an
955: important role at higher levels of inference. In this specific case
956: it reads
957: \begin{equation}
958: P(\bmath d\,|\,\bmath\lambda,\bmath \eta,\mathbf{M_c},\mathbf R)
959: =\frac{\int{d\bmath r\exp{(-M(\bmath r))}}}{Z_d Z_r}\,,
960: \end{equation}
961: \noi where
962: \begin{equation}
963: M(\bmath r)=E_d+ E_r\,.
964: \end{equation}
The most probable solution for the linear parameters is found by
966: maximizing the posterior probability
967: \begin{equation}
968: P(\bmath r\,|\,\bmath d,\bmath\lambda,\bmath
969: \eta,\mathbf{M_c},\mathbf R)=\frac{\exp(-M(\bmath
970: r))}{\int{d\bmath r\,\exp(-M(\bmath r))}}\,.
971: \label{equ:posterior}
972: \end{equation}
973: The condition $\partial (E_d+ E_r)/\partial \bmath r=0$ now yields the
974: set of linear equations already introduced in Section
975: \ref{sec:solvelinear}:
976: \begin{equation}
977: \left(\mathbf{M_c^{\rm T}} \mathbf{C_d}^{-1} \mathbf{M_c}+\mathbf
978: R^{\rm T} \mathbf R\right)\bmath r = \mathbf{M_c^{\rm T}}
979: \mathbf{C_d}^{-1}\bmath d\,.
980: \label{equ:src_pot_penalty_bayes}
981: \end{equation}
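\noi For completeness, the intermediate step can be written out
explicitly (a sketch, assuming, as in Section \ref{sec:solvelinear},
that the regularization constants are absorbed into $\mathbf R$ and
that $\mathbf{C_d}$ is symmetric). The two gradients are
\begin{eqnarray}
\frac{\partial E_d}{\partial \bmath r}&=&\mathbf{M_c^{\rm T}}
\mathbf{C_d}^{-1}\left(\mathbf{M_c}\bmath r-\bmath d\right)\,,\nonumber\\
\frac{\partial E_r}{\partial \bmath r}&=&\mathbf R^{\rm T}\mathbf
R\,\bmath r\,,
\end{eqnarray}
and setting their sum to zero gives equation
(\ref{equ:src_pot_penalty_bayes}).
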
982: Equation (\ref{equ:src_pot_penalty_bayes}) is solved iteratively
983: using a Cholesky decomposition technique.
984:
985: \noi {\bf (2)} Second level of inference: non-linear optimization.
986: At this level we want to infer the non-linear parameters $\bmath
987: \eta$ and the hyper-parameter $\lambda_{\rm s}$ for the
988: source. Since at this point we are interested only in the smooth
989: component of the lens potential, we set $\delta\bmath \psi=0$ and
990: for a fixed family $\psi_s(\bmath \eta)$, form of the regularization
991: $\mathbf R$ and model $\mathbf{M_c}$, we maximize the posterior
992: probability
993:
994: \begin{equation}\label{equ:posterior_2}
995: P(\bmath\lambda,\bmath \eta\,|\,\bmath d,\mathbf{M_c},\mathbf
996: R)=\frac{P(\bmath d\,|\,\bmath \lambda,\bmath \eta,\mathbf{M_c},\mathbf
997: R)P(\bmath \lambda,\bmath \eta)}{P(\bmath d\,|\,\mathbf{M_c},\mathbf
998: R)}\,.
999: \end{equation}
1000:
\noi Assuming a prior $P(\bmath \lambda,\bmath \eta)$ that is flat in
$\log(\lambda_s)$ and $\bmath\eta$, this reduces to maximizing the
1003: evidence $P(\bmath d\,|\,\bmath\lambda,\bmath
1004: \eta,\mathbf{M_c},\mathbf R)$ (which here plays the role of the
1005: likelihood) for $\bmath \eta$ and $\bmath\lambda$. The evidence can
1006: be computed by integrating over the posterior (\ref{equ:posterior_2})
1007: %
1008: \begin{equation}
1009: P(\bmath d\,|\,\bmath\lambda,\bmath \eta,\mathbf{M_c},\mathbf R)=\int{d\bmath
1010: r\, P(\bmath d\,|\,\bmath r,\bmath
1011: \eta,\mathbf{M_c})P(\bmath r\,|\,\bmath\lambda,\mathbf
1012: R)}\,.
1013: \label{equ:evidence}
1014: \end{equation}
1015: %
1016: Because of the assumptions we made (Gaussian noise and quadratic
1017: form of regularization), this integral can be solved analytically
1018: and yields
1019: %
1020: \begin{equation}
1021: P(\bmath d\,|\,\bmath\lambda,\bmath \eta,\mathbf{M_c},\mathbf R)=
1022: \frac{Z_M(\bmath\lambda, \bmath \eta)}{Z_d Z_r(\bmath\lambda)}\,,
1023: \end{equation}
1024: %
1025: where
1026: %
1027: \begin{equation}
1028: Z_M(\bmath\lambda, \bmath \eta)=\exp{(-M(\bmath
1029: r_{\rm MP}))}\left(2\pi\right)^{N_r/2}(\det \ \mathbf A)^{-1/2}\,,
1030: \end{equation}
1031: %
1032:
1033: \noi with $\mathbf A=\nabla\nabla M(\bmath r).$ Again we proceed in an
1034: iterative fashion: using a simulated annealing technique we maximize
1035: the evidence (\ref{equ:evidence}) for the parameters $\bmath
1036: \eta$. Every step of the maximisation generates a new model
1037: $\mathbf{M_c}(\psi(\bmath \eta_i))$, for which the most probable
1038: source $\bmath s_{\rm{MP}}$ is reconstructed as described in Section
1039: \ref{sec:inverting}. At this starting step the level of the source
1040: regularization is set to a relatively large initial value
1041: $\lambda_{s,0}$; in this way we ensure the solution to be smooth (at
1042: least at this first level) and the exploration of the $\bmath \eta$
1043: space to be faster. Subsequently we fix the best model
1044: $\mathbf{M_c}(\bmath \eta_0)$ found at the previous iteration and,
1045: using the same technique, we maximize the evidence for the source
1046: regularization level $\lambda_s$. The procedure is repeated until
1047: the total evidence has reached its maximum. In principle we should
1048: have built a nested loop for $\lambda_s$ at every step of the
1049: $\bmath \eta$ exploration, but in practice the regularization
1050: constant only changes slightly with $\bmath \eta$ and the alternate
1051: loop described above gives a faster way to reach the maximum
1052: (line-search method).
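
\noi In practice it is convenient to work with the logarithm of the
evidence. Combining the expressions for $Z_d$ and $Z_M$ given above
yields (a sketch, assuming that the regularization constants are
absorbed into $\mathbf R$, so that $\mathbf A=\mathbf{M_c^{\rm
T}}\mathbf{C_d}^{-1}\mathbf{M_c}+\mathbf R^{\rm T}\mathbf R$)
\begin{multline}
\log P(\bmath d\,|\,\bmath\lambda,\bmath \eta,\mathbf{M_c},\mathbf R)=
-M(\bmath r_{\rm MP})-\frac{1}{2}\log\det\mathbf A
-\log Z_r(\bmath\lambda)\\
+\frac{N_r-N_d}{2}\log(2\pi)-\frac{1}{2}\log\det\mathbf{C_d}\,.
\end{multline}
If a Cholesky factor $\mathbf{L_A}$ of $\mathbf A$ is already
available from the linear solve ($\mathbf A=\mathbf{L_A}\mathbf{L_A^{\rm
T}}$; the symbol $\mathbf{L_A}$ is introduced here only for
illustration), the determinant follows at negligible extra cost from
$\log\det\mathbf A=2\sum_i\log\,(\mathbf{L_A})_{ii}$.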
1053:
1054: \noi {\bf (3)} At the third level of inference Bayesian statistics
1055: provides an objective and quantitative procedure for model
1056: comparison and ranking on the basis of the evidence,
1057: \begin{equation}
1058: P(\mathbf{M_c},\mathbf R\,|\,\bmath d) \propto P(\bmath
1059: d\,|\,\mathbf{M_c},\mathbf R)P(\mathbf{M_c},\mathbf R)\,.
1060: \end{equation}
1061: For a flat prior $P(\mathbf{M_c},\mathbf R)$ (at this level of
1062: inference we can make little to no assumptions) different models can
1063: be compared according to their value of $P(\bmath
1064: d\,|\,\mathbf{M_c},\mathbf R)$, which is related to the evidence of
1065: the previous level by the following relation
1066: \begin{equation}
1067: P(\bmath d\,|\,\mathbf{M_c},\mathbf R)=\int{d\bmath\lambda\, d\bmath
1068: \eta \,P(\bmath d\,|\,\bmath \lambda,\bmath \eta,\mathbf{M_c},\mathbf
1069: R) P(\bmath\lambda,\bmath\eta)}\,.
1070: \label{equ:evidence_integral}
1071: \end{equation}
1072: Being multidimensional and highly non-linear, the integral
1073: (\ref{equ:evidence_integral}) is carried out numerically through a
1074: Nested-Sampling technique \citep{Skilling04}, which is described in
1075: more detail in the next section. A by-product of this method is an
1076: exploration of the posterior probability (\ref{equ:posterior_2}),
1077: allowing for error analysis of the non-linear parameters and of the
1078: evidence itself.
1079:
1080: \subsection{Model selection: smooth versus clumpy models}\label{sec:nested sampling}
1081:
1082: In the previous section we introduced the main structure of the
1083: Bayesian inference for model fitting and model selection. While
1084: parameter fitting simply determines how well a model matches the
1085: data and can be easily attained with the relatively simple analytic
1086: integrations of the first and second level of inference, model
1087: selection itself requires the highly non-linear and multidimensional
1088: integral (\ref{equ:evidence_integral}) to be solved. This
marginalized evidence can be used to assign probabilities to models
and to establish whether the data require, or allow,
additional parameters. Given two competing models $\rm M_0$
and $\rm M_1$ with respective marginalized evidences ${\cal{E}}_0$ and
${\cal{E}}_1$, the logarithmic Bayes factor, $\Delta {\cal{E}} \equiv
\log{\cal{E}}_0 - \log{\cal{E}}_1$, quantifies how well $\rm M_0$ is
supported by the data when compared with $\rm M_1$ and
automatically incorporates Occam's razor. The literature typically
suggests weighing the Bayes factor using Jeffreys' scale
\citep{Jeffreys61}, which however provides only a qualitative
indication: $\Delta {\cal{E}} < 1$ is not significant, $1 < \Delta
{\cal{E}}< 2.5$ is significant, $2.5 < \Delta {\cal{E}}< 5$ is
strong and $\Delta {\cal{E}} > 5$ is decisive.
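
\noi To give a feeling for these thresholds (a sketch, assuming that
natural logarithms are used and that the two models have equal prior
probability), the posterior odds in favour of $\rm M_0$ are
\begin{equation}
\frac{P({\rm M_0}\,|\,\bmath d)}{P({\rm M_1}\,|\,\bmath d)}=
\frac{{\cal{E}}_0}{{\cal{E}}_1}=e^{\Delta {\cal{E}}}\,,
\end{equation}
so that $\Delta {\cal{E}}=1$, $2.5$ and $5$ correspond to odds of
roughly $3:1$, $12:1$ and $150:1$, respectively.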
1102:
1103:
1104: \noi In order to evaluate this marginalized evidence with a high
1105: enough accuracy we implemented the new evidence algorithm known as
1106: Nested Sampling, proposed by \citet{Skilling04}. Specifically, we
1107: would like to compare two different models: one in which the lens
1108: potential is smooth and one in which substructures are present, with
e.g.\ an NFW profile. While the first is defined by the non-linear
parameters of the lens potential and of the source regularization
only, the second also allows for three extra parameters: the mass of
the substructure and its position on the lens plane (see
Section \ref{sec:test}).
1114:
1115: \subsection{Model ranking: nested sampling}
1116:
Here, we provide a short description of how Nested Sampling can
1118: be used to compute the marginalized evidence and errors on the model
1119: parameters; a more detailed one can be found in
1120: \citet{Skilling04}. The Nested-Sampling algorithm integrates the
1121: likelihood over the prior volume by moving through thin nested
1122: likelihood surfaces. Introducing the fraction of total prior
1123: mass $X$, within which the likelihood exceeds ${\cal L^*}$, hence
1124: %
1125: \begin{equation}
1126: X=\int_{{\cal{L}}>{\cal{L^*}}}{dX}\,,
1127: \end{equation}
1128: %
1129: with
1130: %
1131: \begin{equation}
1132: dX=P\left(\bmath\lambda,\bmath\eta\right)d\bmath\lambda\,d\bmath\eta\,,
1133: \end{equation}
1134: %
1135: the multi-dimensional integral (\ref{equ:evidence_integral})
1136: relating the likelihood $\cal{L}$ and the marginalized evidence
1137: $\cal{E}$ can be reduced to a one-dimensional integral with positive
1138: and decreasing integrand
1139: %
1140: \begin{equation}
1141: {\cal{E}}=\int_0^1{dX\,{\cal{L}}(X)}\,.
1142: \end{equation}
1143:
\noi Here ${\cal L}(X)$ is the likelihood of the (possibly disjoint)
iso-likelihood surface in parameter space which encloses a total prior
mass of $X$. If the likelihood ${\cal{L}}_j={\cal{L}}(X_j)$ can be
evaluated for each of a given set of decreasing points, $0 < X_j <
X_{j-1} < \dots < 1$, then the total evidence ${\cal{E}}$ can be
obtained, for example, with the trapezoid rule,
${\cal{E}}=\sum_{j=1}^m{\cal{E}}_j=\sum_{j=1}^m{\frac{{\cal{L}}_j}{2}}\left(X_{j-1}-X_{j+1}\right)$.
1151:
1152: \noi The power of the method is that the values of $X_j$ do not
1153: have to be explicitly calculated, but can be statistically
1154: estimated. Specifically, the marginalized evidence is obtained
1155: through the following iterative scheme:
1156:
\noi {\bf (1)} the likelihood ${\cal{L}}$ is computed for $\rm N$
different points, called active points, which are randomly drawn
from the prior volume;

\noi {\bf (2)} the point $X_j$ with the lowest likelihood is found
and the corresponding prior volume is estimated statistically: after
$j$ iterations the average volume decreases as $ X_j/X_{j-1}=t $,
where $t$ is the expectation value of the largest of $\rm N$ numbers
uniformly distributed in $\left(0,1\right)$;
1167:
1168: \noi {\bf (3)} the term
1169: ${\cal{E}}_j=\frac{{\cal{L}}_j}{2}\left(X_{j-1}-X_{j+1}\right)$ is
1170: added to the current value of the total evidence;
1171:
1172: \noi {\bf (4)} $X_j$ is replaced by a new point randomly
1173: distributed within the remaining prior volume and satisfying the
1174: condition ${\cal{L}} > {\cal{L}}^* \equiv {\cal{L}}_j$;
1175:
1176: \noi {\bf (5)} the above steps are repeated until a stopping
1177: criterion is satisfied.
1178:
\noi By climbing up the iso-likelihood surfaces, the method, in
general, finds and quantifies the small region in which the bulk
of the evidence is located.
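
\noi The statistical estimate of the prior volume in step (2) can be
made explicit (a sketch following the standard argument of
\citet{Skilling04}). The largest of $\rm N$ numbers drawn uniformly
from $\left(0,1\right)$ has probability density ${\rm N}\,t^{{\rm
N}-1}$, so that
\begin{equation}
\langle t\rangle=\frac{\rm N}{{\rm N}+1}\,,\qquad
\langle \log t\rangle=-\frac{1}{\rm N}\,,
\end{equation}
and, starting from $X_0=1$, the prior mass remaining after $j$
iterations is approximately $X_j\thickapprox e^{-j/{\rm N}}$.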
1182:
\noi Different stopping criteria can be chosen. Following
\citet{Skilling04}, we stop the iteration when $j \gg \rm{N}H$,
where $H$ is minus the logarithm of that fraction of prior mass which
contains the bulk of the posterior mass. In practical terms this
means that the procedure should be stopped only when most of the
evidence has been included. Given the areas ${\cal{E}}_j$, in fact,
the likelihood initially increases faster than the widths decrease,
until its maximum is reached; across this maximum, located in the
region $X\thickapprox e^{-H}$, the likelihood flattens off
and the decreasing widths dominate the increasing
${\cal{L}}_j$. Since $X_j\thickapprox e^{-j/\rm{N}}$, it
takes $\rm{N}H$ iterations to reach the dominating areas. These
$\rm{N}H$ iterations are random and are subject to a standard
deviation uncertainty $\sqrt{\rm{N}H}$, corresponding to a
standard deviation on the logarithmic evidence of $\sqrt{\rm{N}
H}/ \rm{N}$:
1199:
1200: \begin{equation}
1201: {\log \cal{E}}=
1202: \log\left(\sum_j{{\cal{E}}_j}\right)\mathrm{~~~with~~~}
1203: \sigma_{\log{\cal E}}=\sqrt{\frac{H}{\rm{N}}}\,.
1204: \end{equation}
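
\noi As a purely illustrative example (the numbers are hypothetical
and not taken from the runs presented in this paper), for $H=20$ and
${\rm N}=100$ active points about ${\rm N}H=2000$ iterations are
needed to reach the bulk of the posterior mass, with a scatter of
$\sqrt{2000}\thickapprox 45$ iterations, so that
\begin{equation}
\sigma_{\log{\cal E}}=\sqrt{\frac{20}{100}}\thickapprox 0.45\,.
\end{equation}
Increasing ${\rm N}$ by a factor of four halves this uncertainty, at
the cost of four times as many iterations.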
1205:
1206: \subsubsection{Posterior probability distributions}
1207:
1208:
1209: \noi For the lens parameters, the substructure position and the
1210: logarithm of the source regularization, priors are chosen to be
1211: uniform on a symmetric interval around the best values which we have
1212: determined at the second level of the Bayesian inference. The size
of the interval is at least one order of magnitude larger than
the errors on the parameters. In practice, we first carry out a fast
run of the Nested Sampling with few active points $\rm{N}$; this gives us
1216: an estimate for the non-linear parameter errors. Using the product
1217: $2\times N_{\rm dim}\times \sigma_\eta$, where $N_{\rm dim}$ is the
1218: total number of parameters and $\sigma_\eta$ the corresponding
1219: standard deviation, we can then roughly enclose the bulk of the
1220: likelihood (note that this can be double-checked and corrected in
1221: hindsight, if the posterior probability functions are truncated at
1222: the prior boundaries). Priors on the parameters are taken in such a
1223: way that this maximum is fully included in the total integral of the
1224: marginalized evidence. For the main lens parameters and for the
regularization constant the same priors are used for models with and
1226: without substructure. For the substructure mass a flat prior between
1227: $M_{\rm min}=4.0\times 10^6M_\odot$ and $M_{\rm
1228: max}=4.0\times 10^9M_\odot$ is adopted, with the two limits given by N-body
1229: simulations \citep[e.g.][]{Diemand07b, Diemand07a}. In reality,
1230: the method does not require the parameters to be well known a
priori, but limiting the exploration to the best-fit region
considerably reduces the computational effort without significantly
altering the evidence estimation. From Bayes' theorem we have that
1234: the posterior probability density $p_j$ is given by
1235: %
1236: \begin{equation}
1237: p_j(t)=
1238: \frac{{\cal{L}}_j}{2}\left(X_{j-1}-X_{j+1}\right)/{\cal{E}}(t)=w_j/{\cal{E}}(t)\,.
1239: \end{equation}
1240: %
1241: The existing set of points $\left(\bmath\eta, \bmath\lambda
1242: \right)_1$,..., $\left(\bmath\eta, \bmath\lambda \right)_{\rm N}$
then gives us a set of posterior values that can then be used to
1244: obtain mean values and standard deviations on the non-linear
1245: parameters
1246: %
1247: \begin{equation}
1248: \langle\bmath\eta\rangle=\sum_j{w_j\bmath\eta_j}/\sum_j{w_j}\,,
1249: \end{equation}
1250: %
1251: and similarly for $\bmath\lambda$. These samples also provide a
1252: sampling of the full joint probability density
1253: function. Marginalising over this function, the full marginalized
probability density distribution of each parameter can be determined
1255: (see Section 5.5).
1256:
1257: \section{Testing and calibrating the method}\label{sec:test}
1258:
1259: In this section we describe the procedure to test the method
1260: introduced above and to assess its ability to detect dark matter
1261: substructures in realistic data sets (e.g. from HST). A set of mock
1262: data, mimicking a typical Einstein ring, is created. We generate
1263: fourteen different lens models, of which $\rm L_0$ is purely
1264: smooth, $\rm L_{1 \le i < 13}$ are given by the superposition
1265: of the same smooth potential with a single NFW dark matter substructure of
1266: varying mass and position and $\rm L_{13}$
1267: contains two NFW dark matter substructures with
the same mass but with different positions (see Table \ref{tab:lenses}).
1269: A first approximate reconstruction of the source and of the lens potential
1270: is performed by recovering the best non-linear lens parameters
1271: $\bmath\eta$ and the level of source regularization
1272: $\lambda_s$. These values are then used for the linear grid-based
1273: optimization, which provides initial values of the substructure
1274: position and mass. Three extra runs of the non-linear optimization are then
1275: performed to recover the best set
1276: $\left(\bmath\eta_b,\lambda_{s,b}\right)$ for the main lens and the
1277: best mass and position of the substructure (solely modelled with a
1278: NFW density profile). Finally by means of the Nested-Sampling
1279: technique described in Section \ref{sec:nested sampling} we
1280: compute the marginalized evidence, equation (\ref{equ:evidence_integral}), for
1281: every model twice, once under the hypothesis of a smooth lens and
1282: once allowing for the presence of one or two extra mass
1283: substructures. Comparison between these two models allows us to
1284: assess whether the presence of substructure in the model improves
1285: the evidence despite the larger number of free parameters.
1286:
1287: \subsection{Mock data realisations}
1288:
1289: A set of simulated data with realistic noise is generated from a
1290: model based on the real lens SLACS J1627$-$0055
1291: \citep{Koopmans06,Bolton06,Treu06}. We assume the lens to be well
1292: described by a power-law (PL) profile \citep{Barkana98}. Using the
1293: optimization technique described in Section (\ref{sec:bayes}) we find
1294: the best set of non-linear parameters
1295: $\left(\bmath\eta_b,\lambda_{s,b}\right)$. In particular
1296: $\bmath\eta$ contains the lens strength $b$, and some of the
1297: lens-geometry parameters: the position angle $\theta$, the
1298: axis ratio $f$, the centre coordinates $\bmath x_0$ and the density
1299: profile slope $q$, $\left(\rho \propto r^{-(2q+1)}\right)$. If
1300: necessary, information about external shear can be included. The
1301: best parameters are used to create fourteen different lenses and
their corresponding lensed images. One of the systems is given by a
smooth PL model, while twelve include a dark matter
substructure with virial mass $\rm M_{vir}=10^7 \rm M_\odot, 10^8
\rm M_\odot,10^9 \rm M_\odot$ located at one of four positions: on the lowest surface
brightness point of the ring $P_0$, on a high surface brightness
point of the ring $P_1$, inside the ring $P_2$ or outside the ring
$P_3$ (see Table \ref{tab:lenses}). The fourteenth lens
contains two substructures, each with a mass of $\rm M_{vir}=10^8 M_\odot$,
located at $P_0$ and $P_1$, respectively. The substructures are assumed
to have a NFW profile
1312: %
1313: \begin{equation}
1314: \rho\left(r\right)={\rho_s}{\left(r_s/r\right)\left[1+\left(r/r_s\right)\right]^{-2}}\,,
1315: \end{equation}
1316: %
1317: where the concentration $c=r_{\mathrm {vir}}/r_s$ and the scaling radius $r_s$
1318: are obtained from the virial mass using the empirical scaling laws
1319: provided by \citet{Diemand07b, Diemand07a}. The source has an
elliptical Gaussian surface brightness profile centred on the origin
1321: %
1322: \begin{equation}
1323: s\left(\bmath y\right) = s_0 \exp\left[ - (y_1/\delta y_1)^2 - (y_2/\delta y_2)^2 \right]\,.
1324: \end{equation}
1325: %
1326: We assume $s_0=0.25$, $\delta y_1=0.01$ and $\delta y_2=0.04$.
1327:
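\noi For reference, the link between the virial mass and the NFW
parameters used above follows from integrating the density profile out
to $r_{\mathrm{vir}}=c\,r_s$. The expression below is the standard NFW
enclosed-mass relation, quoted here only as an illustration. The
overdensity definition of $r_{\mathrm{vir}}$ adopted by
\citet{Diemand07b, Diemand07a} is not restated here.
%
\begin{equation*}
M_{\mathrm{vir}}=4\pi\rho_s r_s^3\left[\ln\left(1+c\right)-\frac{c}{1+c}\right]\,.
\end{equation*}
%
Hence, once $c$ and $r_s$ are fixed by the scaling laws, the
normalisation $\rho_s$ follows directly from $M_{\mathrm{vir}}$.
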
1328: \begin{table}
1329: \begin{center}
1330: \caption {Non-smooth (PL+NFW) lens models. At each of the $P_i$
1331: positions a NFW perturbation of virial mass $m_{sub}$ is superimposed
1332: on a smooth PL mass model distribution.}
\begin{tabular}{ccc}
\hline
Lens & $\bmath x_{sub}$ $\left( \mathrm{arcsec} \right)$ & $m_{sub}$ $\left( M_\odot \right)$\\
\hline
$\rm L_1$ & $P_0= (+0.90 ; +1.19)$ & $10^7$\\
$\rm L_2$ & & $10^8$\\
$\rm L_3$ & & $10^9$\\
\\
$\rm L_4$ & $P_1= (-0.50 ; -1.00)$ & $10^7$\\
$\rm L_5$ & & $10^8$\\
$\rm L_6$ & & $10^9$\\
\\
$\rm L_7$ & $P_2 = (-0.10 ; -0.60)$ & $10^7$\\
$\rm L_8$ & & $10^8$\\
$\rm L_9$ & & $10^9$\\
\\
$\rm L_{10}$ & $P_3 = (-0.90 ; -1.40)$ & $10^7$\\
$\rm L_{11}$ & & $10^8$\\
$\rm L_{12}$ & & $10^9$\\
\\
$\rm L_{13}$ & $P_0$ and $P_1$ & $10^8$\\
\hline
\end{tabular}
1344: \label{tab:lenses}
1345: \end{center}
1346: \end{table}
1347:
1348: \subsection{Non-linear reconstruction of the main lens}
1349:
1350: We start by choosing an initial parameter set $\bmath\eta_{0}$ for
1351: the main lens, which is offset from $\bmath\eta_{\rm true}$ that we
1352: used to create the simulated data. Assuming the lens does not
1353: contain any substructure we run the non-linear procedure described
1354: in Section (\ref{sec:bayes}) and optimize $\{\bmath\eta,\lambda_{s}\}$
1355: for each of the considered systems. At every step of the
1356: optimization a new set $\{\bmath\eta_i,\lambda_{s,i}\}$ is obtained
1357: and the corresponding lensing operator $\mathbf{M_c}(\bmath
\eta_{i},\lambda_{s,i})$ has to be re-computed. The images are
defined on an 81 by 81 pixel $\left(N_d= 6561\right)$ regular
Cartesian grid, while the sources are reconstructed on a Delaunay
tessellation grid of $N_s= 441$ vertices. The number of image
points used for the source grid construction is effectively a form
of prior, and the marginalized evidence (equation \ref{equ:evidence_integral}) can be used to
test this choice. To check whether the number of image pixels used
affects the result of our modelling, we consider the smooth lens
$\rm L_0$ and perform the non-linear reconstruction using one pixel out of every sixteen, nine, four and
one. In each of these cases the recovered lens parameters agree within their respective errors (see Table~3).
This suggests that, for this particular case, the choice of the number of pixels does not influence the quality of the reconstruction.
In real systems, the dynamic range of the lensed images could be much
higher and a case-by-case choice based on the marginalized evidence has to be made.
In Fig. \ref{fig:best1_upr}, the residuals for the system $\rm L_1$ are shown; the noise
level is in general reached and only small residuals are observed at
the position of the substructure.
Whether the level of such residuals, and therefore the corresponding detection
of the substructure, is significant is an issue we will address later on in
terms of the total marginalized evidence.
1377:
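\noi To give a feeling for the grid sizes involved, suppose, purely for
illustration, that using one pixel out of every $n^2$ means keeping
every $n$-th pixel along each axis of the $81\times81$ image grid,
starting from the first one. This sampling rule is an assumption made
here, and the symbol $N_s(n)$ is introduced only for this example. The
number of source vertices would then be
%
\begin{equation*}
N_s(n)=\left(\left\lfloor\frac{81-1}{n}\right\rfloor+1\right)^2\,,
\end{equation*}
%
which gives $N_s(4)=21^2=441$, $N_s(3)=27^2=729$, $N_s(2)=41^2=1681$
and $N_s(1)=81^2=6561=N_d$. Under this assumption, the $N_s=441$
vertices quoted above would correspond to the one-in-sixteen case.
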
1378: \subsection{Linear reconstruction: substructure detection}\label{sec:linear rec}
1379:
1380: The non-linear optimization provides us with a first good
1381: approximate solution for the source and for the smooth component of
1382: the lens potential. While this is a good description for the smooth
model $\rm L_0$ (see Fig. \ref{fig:best_smooth}), the residuals
(e.g. Fig. \ref{fig:best1_outside_01}) for
the perturbed models $\rm L_{i\ge1}$ indicate that the
\emph{no-substructure} hypothesis is improbable and that
perturbations to the main potential have to be considered. If the
perturbation is small, this can be done by temporarily assuming that
$\bmath{\eta}_{i=1}$ reflects the true mass model distribution for the
main lens and reconstructing the source and the potential correction by
means of equation (\ref{equ:src_pot_penalty_bayes}). In order to
1392: keep the potential corrections in the linear regime, where the
1393: approximation (\ref{equ:src_pot_penalty_bayes}) is valid, both the
1394: source and the potential need to be initially over-regularised:
1395: $\lambda_s=10\,\lambda_{s,1}$ and
1396: $\lambda_{\delta\psi}=3.0\times10^5$ \citep{Koopmans05,
suyu206}. For each of the possible substructure positions we
identify the lowest-mass substructure we are able to recover. In the
two most favourable cases, $\rm L_1$ and $\rm L_4$, in which the
substructure sits on the Einstein ring, a perturbation of $10^7 \rm
M_\odot$ is readily reconstructed. For these two positions, higher-mass
models, with the exception of $\rm L_2$, will not be analysed further. The systems $\rm
1403: L_{7,8,9}$ and $\rm L_{10,11,12}$, in which the substructure is
1404: located, respectively, inside and outside the ring, represent more
1405: difficult scenarios. In the first case all perturbations below $10^9
1406: \rm M_\odot$ can be mimicked by an increase of the mass of the main
1407: lens within the ring, while in the second case these cannot be
1408: easily distinguished from an external shear effect. For the models
$\rm L_{1,2,4,9,12}$ convergence is reached after 150 iterations and
the perturbations are recovered near their known positions (Figs. 8 and 9). The
grid-based potential reconstruction indeed leads to a good first
estimate of the substructure position.
1413:
1414:
1415:
1416: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1417:
1418: \subsection{Non-linear reconstruction: main lens and substructure}\label{sec:non-linear rec}
1419:
In order to compare with numerical simulations, the mass of the
substructure is required. Performing this evaluation with a
grid-based reconstruction is more complicated and requires some
assumptions (e.g.\ an aperture). To alleviate this problem we assume a
parametric model, in which the substructures are described by a NFW
density profile, and we recover the corresponding non-linear
parameters, mass and position, using the non-linear Bayesian
optimization previously described.
1428:
\noi To quantify the mass and position of the substructure and to
update the non-linear parameters when a substructure is added, we
adopt a multi-step non-linear procedure that converges relatively
quickly to a best PL+NFW mass model. At this level, we neglect the
smooth lens $\rm L_0$, for which a satisfactory model has already
been obtained after the first non-linear optimization, and the
perturbed models $ \rm L_{7,8,10,11}$ for which the substructure
could not be recovered. We proceed as follows:
1437:
1438: \medskip
1439:
1440: \noi {\bf (i)} we fix the main lens parameters to the best values
1441: found in Section (\ref{sec:linear rec}),
1442: $\{\bmath\eta_1,\lambda_{\rm s,1}\}$. We set the substructure
1443: mass to some guess value. We optimize for the substructure position
1444: $\bmath x_{\rm sub,1}$.
1445:
\noi {\bf (ii)} we fix $\{\bmath\eta_1,\lambda_{s,1}\}$ and
the substructure position $\bmath x_{\rm sub,1}$. We optimize for the
substructure mass $m_{\rm sub,1}$.
1449:
\noi {\bf (iii)} we run the non-linear procedure described in
Section (\ref{sec:bayes}), alternately optimising for the main
lens, source, and substructure parameters and for the level of source
regularization.
1454:
1455: \medskip
1456:
\noi This leads to a new set of parameters, $\{\bmath\eta_{\rm b},
\lambda_{\rm s,b}, m_{\rm sub,b}, \bmath x_{\rm sub,b}\}$. Final
results for the considered models are listed in
Table 3 and the
corresponding residuals are shown in Figs. \ref{fig:best1_upr}--\ref{fig:best1_outside_01}. For all the considered lenses the final
reconstruction converges to the noise level.
1463:
1464: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1465:
1466: \subsection{Multiple substructures}
1467: The lens system $\rm L_{13}$ represents a more complex case in which two substructures
1468: are included. In particular we are interested in testing
1469: whether both substructures are detectable and whether their effect may be hidden by the
1470: presence of external shear. As for the previously considered cases, we first perform a non-linear
1471: reconstruction of the main lens parameters assuming a single PL mass model.
1472: For this particular system we also include the strength $\rm \Gamma_{sh}$ and the position
1473: angle $\rm \theta_{sh}$ of the external shear as free parameters. Results for this first step of the reconstruction
1474: are shown in Fig. \ref{fig:linear_a}. We then run the linear potential
1475: reconstruction. One of the two substructures is detected although a significant
1476: level of image residuals is left (Fig. \ref{fig:best1_sub_double}).
The combined effect of external shear ($\rm \Gamma_{sh}=-0.031$) and the substructure in $P_1$
is not sufficient to explain the perturbation generated by the second substructure at the lowest surface
brightness point of the Einstein ring. We therefore include a NFW substructure at
the recovered position and run a non-linear reconstruction for the new PL+NFW model
(Fig. \ref{fig:linear_b}). We are then also able to detect the second substructure (Fig. \ref{fig:best2_sub_double}).
Finally, we run a global non-linear reconstruction for the
PL+2NFW model (Fig. \ref{fig:linear_c}); the noise level is reached and the strength of the external shear is consistent with zero ($\rm \Gamma_{sh}=0.0001$).
1484:
1485: \subsection{Nested sampling: the evidence for substructure}
1486:
When modelling systems such as $\rm L_{0}$ or $\rm L_{i\ge1}$ we assume
that the best recovered values, under the hypothesis of a single
power-law, provide a good description of the true mass distribution
and that any residual that may be observed could be an indication of
the presence of mass substructure. Model comparison within the
framework of Bayesian statistics gives us the possibility to test
this assumption.
1494:
1495: \subsubsection{Marginalized Bayesian evidence}
1496:
1497: In order to statistically compare two models the
1498: marginalized evidence (\ref{equ:evidence_integral}) has to be
1499: computed. As described in Section (\ref{sec:nested sampling}) this
1500: multi-dimensional and non-linear integral can be evaluated using the
Nested-Sampling technique of
\citet{Skilling04}. Specifically, the two mass models we wish to
compare are a single PL, M$_0$, and a PL+NFW
substructure, M$_1$. The first one is completely defined by the
1505: non-linear parameters $\left(\bmath \eta, \lambda_s\right)$, while the
1506: second needs three extra parameters, namely the substructure mass
1507: and position. For all these parameters prior probabilities have to
1508: be defined:
1509: %
1510: \begin{equation}
1511: P\left(\eta_i\right)= \left\{
1512: \begin{array}{ll}
1513: \text{constant} & {\rm ~~~for~~~} |\eta_{\rm {b},i}-\eta_i|\leq\delta\eta_i \\ &\\
1514: 0 & {\rm ~~~for~~~} |\eta_{\rm {b},i}-\eta_i| > \delta\eta_i
1515: \end{array}
1516: \right.
1517: \end{equation}
1518: and
1519: \begin{equation}
1520: P\left( x_{\rm {sub},i}\right)= \left\{
1521: \begin{array}{ll}
1522: \text{constant} & {\rm ~~~for~~~} | x_{\rm {sub,b},i}- x_{\rm {sub},i}|\leq\delta
1523: x_{\rm {sub},i}\\ &\\ 0 & {\rm ~~~for~~~} | x_{\rm {sub,b},i}- x_{\rm {sub},i}| > \delta
1524: x_{\rm {sub},i}
1525: \end{array}
1526: \right.
1527: \end{equation}
1528:
1529: \noi where the elements of $\delta\eta_i$ and $\delta x_{\rm sub,i}$
1530: are empirically assessed such that the bulk of the evidence
1531: likelihood is included \citep[see][]{Skilling04}. The prior on the
1532: substructure mass is flat between the lower and upper mass limits
1533: given by numerical simulations \citep[e.g.][]{Diemand07b,
1534: Diemand07a}. Given the lenses $\rm L_{0,1,2,4,9,12,13}$ we run the
1535: Nested Sampling twice, once for the single PL model and
once for the PL+NFW (+NFW) one. The two marginalized evidences and their
numerical errors are compared in Table~2. Although a number of authors suggest
the use of Jeffreys' scale \citep{Jeffreys61} for model comparison, we adopt here a
more conservative criterion. In particular, we note that the
1540: perturbed model M$_1$ for the lens system $\rm L_0$ is basically
1541: consistent with a single smooth PL model M$_0$, with $\Delta{\cal
{E}}\sim 7.85$. The Bayes factor for the system $\rm L_4$ is of
the order of $\Delta{\cal {E}} \sim 21.5$ in favour of the smooth
model M$_0$, indicating that the detection of such a low-mass
substructure cannot formally be claimed at a significant level. We
think that this substructure is nevertheless clearly visible in the
grid-based results because that particular solution is the
maximum-posterior (MP) solution, whereas the evidence gives the
integral over the entire parameter space. This implies that there
1550: must be many solutions near the MP solution that do not show the
1551: substructure. This indicates that our approach of quantifying the
evidence for substructure is very conservative. On the other hand,
the Bayes factor for the lens $\rm L_1$, $\Delta{\cal {E}} = -17.1$,
clearly shows that the detection of a $10^7 M_\odot$ substructure
can be significant when the latter is located at a different
position on the ring. Finally, the higher-mass perturbations analysed here are
easily detectable independently of their position relative to the
image ring; the Bayes factors for $\rm L_2$, $\rm L_9$, $\rm L_{12}$ and $\rm L_{13}$
are, respectively, $\Delta{\cal {E}} = -213.0 $,
$\Delta{\cal {E}} = -2609.7$, $\Delta{\cal {E}} = -4603.4$ and $\Delta{\cal {E}} = -1835.7$.
Substructure properties for these systems are also confidently
recovered.
The main difference between Jeffreys' scale and our criterion for
quantifying the significance level of the substructure detection is observed
for the system $\rm L_1$. Had we adopted Jeffreys' scale, this detection
would have to be claimed as decisive, whereas we regard it as only significant.
1567:
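\noi To make the sign convention explicit, the values quoted above are
consistent with taking $\Delta{\cal E}=\log{\cal E}({\rm
M}_0)-\log{\cal E}({\rm M}_1)$, so that positive values favour the
smooth model. This definition is inferred here from the entries of
Table \ref{tab:evidence} and should be read as an assumption. As a
worked check for $\rm L_0$, $\rm L_1$ and $\rm L_4$,
%
\begin{align*}
\Delta{\cal E}\left({\rm L_0}\right) &= 26332.70-26324.85 = 7.85\,,\\
\Delta{\cal E}\left({\rm L_1}\right) &= 20366.86-20383.95 = -17.09\,,\\
\Delta{\cal E}\left({\rm L_4}\right) &= 20292.40-20270.87 = 21.53\,.
\end{align*}
%
If, in addition, the numerical errors of the two Nested-Sampling runs
are assumed to be independent, they add in quadrature,
$\sigma_{\Delta{\cal E}}=(\sigma_0^2+\sigma_1^2)^{1/2}$, which for
$\rm L_0$ gives $(0.33^2+0.30^2)^{1/2}\approx0.45$. The differences
above are therefore much larger than the numerical uncertainty of the
integration.
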
1568: \begin{figure}
1569: \begin{center}
1570: \includegraphics[width=8cm]{fig4}
1571: \caption{Results of the non-linear optimization for the smooth
1572: lens $\rm {L_0}$. The top-right panel shows the original mock
1573: data, while the top-left one shows the final
1574: reconstruction. On the second row the source reconstruction
1575: (left) and the image residuals (right) are shown.}
1576: \label{fig:best_smooth}
1577: \end{center}
1578: \end{figure}
1579:
1580:
1581: \subsection{Posterior probabilities}
1582:
1583: As discussed in Section (\ref{sec:nested sampling}) an interesting
1584: by-product of the Nested-Sampling procedure is an exploration of the
1585: posterior probability (\ref{equ:posterior_2}) which provides us with
1586: statistical errors on the model parameters, see Tables 3 and 4. The
1587: relative posterior probabilities for $\rm L_0$, $\rm L_1$ and $\rm
1588: L_2$ are plotted in Fig.~\ref{fig:smooth_weights},
1589: Fig.~\ref{fig:pert0001_weights} and
Fig.~\ref{fig:pert001_weights}, respectively. Let us start by
considering the lens system $\rm L_0$ and the relative probability
distribution for the substructure mass. Although the model M$_1$, in
terms of marginalized evidence, is consistent with the single smooth
PL model M$_0$, there is a small probability for the presence of a
substructure with mass up to a few $10^8 M_\odot$ located as far as
possible from the ring. The effect of such objects on the lensed
image would be very small and could be easily hidden by introducing
artificial features in the source structure, as suggested by the
posterior distributions for the source regularization constant.
This means that, from the image point of view, a smooth single PL
model and a perturbed PL+NFW model with a substructure of $10^8 M_\odot$,
located far from the ring, are not distinguishable from each other as
long as the effect of the perturber can be hidden in the structure
of the source. From a probabilistic point of view, however, the second
scenario is less likely. A similar argument can be
1606: applied to the lens $\rm L_1$ for which a strong degeneracy between
1607: the mass and the position of the substructure is observed. We
1608: conclude therefore that, although this substructure can be detected
1609: at a statistically significant level, its mass and position cannot
be confidently assessed yet. In contrast, for systems such as $\rm
L_{2,9,12}$, the effect of the substructure is so strong that it
cannot be mimicked by the source structure or by a different
combination of the substructure parameters. For these cases not only
is the detection highly significant, but the properties of the
perturber can also be confidently constrained with minimal biases.
1616:
1617: \begin{table}
1618: \begin{center}
\caption{Marginalized evidence and corresponding standard
1620: deviation as obtained via the Nested-Sampling
1621: integration. Results are shown for the hypothesis of a smooth
1622: lens (PL) and the hypothesis of a clumpy lens potential
1623: (PL+NFW).}
1624: \begin{tabular}{cccc}
\hline
Lens & Model & $\log {\cal E} \,$ & $\sigma_{{\log {\cal E}}}\,$\\
\hline
$\rm L_0$ & PL & 26332.70 & 0.33\\
 & PL+NFW & 26324.85 & 0.30\\
\\
$\rm L_1$ & PL & 20366.86 & 0.34\\
 & PL+NFW & 20383.95 & 0.30\\
\\
$\rm L_4$ & PL & 20292.40 & 0.33\\
 & PL+NFW & 20270.87 & 0.29\\
\\
$\rm L_9$ & PL & 17669.41 & 0.45\\
 & PL+NFW & 20279.13 & 0.36\\
\\
$\rm L_{12}$ & PL & 15786.91 & 0.33\\
 & PL+NFW & 20390.35 & 0.37\\
\\
$\rm L_{13}$ & PL & 18509.76 & 0.24\\
 & PL+2 NFW & 20346.48 & 0.49\\
\hline
1636: \end{tabular}
1637: \label{tab:evidence}
1638: \end{center}
1639: \end{table}
1640:
1652:
1653: \begin{figure*}
1654: \begin{center}
1655: \includegraphics[width=0.45\hsize]{fig5a}
1656: \hfill
1657: \includegraphics[width=0.45\hsize]{fig5b}
1658: \caption{{\bf Left panel:} Results of the first non-linear
1659: reconstruction for the smooth component of the perturbed lens
1660: L$_1$. The top-right panel shows the original mock
1661: data, while the top-left one shows the final
1662: reconstruction. On the second row the source reconstruction
1663: (left) and the image residuals (right) are shown. {\bf Right
1664: panel:} Final results of the non-linear reconstruction for the
1665: perturbed lens L$_1$. The top-right panel shows the
1666: original mock data, while the top-left one shows the final
1667: model reconstruction obtained after a non-linear optimization
1668: involving the lens parameters and the substructure position
and mass. The recovered source is plotted in the lower-left
panel and the image residuals in the lower-right panel.}
1672: \label{fig:best1_upr}
1673: \end{center}
1674: \end{figure*}
1675:
1676: \begin{figure*}
1677: \begin{center}
1678: \includegraphics[width=0.45\hsize]{fig6a}
1679: \hfill
1680: \includegraphics[width=0.45\hsize]{fig6b}
\caption{Same as Figure~\ref{fig:best1_upr}, but for L$_2$.}
1682: \label{fig:best1_upr_001}
1683: \end{center}
1684: \end{figure*}
1685:
1686:
1687:
1688: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1689:
\section{Conclusions and future work}
1691:
1692: We have introduced a fully Bayesian adaptive method for objectively
1693: detecting mass substructure in gravitational lens galaxies. The
1694: implemented method has the following specific features:
1695:
1696: \begin{itemize}
1697:
\item Arbitrary imaging data sets defined on a regular grid can be
  modelled, as long as only lensed structure is included. The code
  is specifically tailored to high-resolution HST data sets with a
  compact PSF that can be sampled by a small number of pixels.
1702:
1703: \item Different parametric two-dimensional mass-models can be used,
  with a set of free parameters $\bmath \eta$. Currently, we have
1705: implemented the elliptical power-law density models from
1706: \citet{Barkana98}, but other models can easily be included.
1707: Multiple parametric mass models can be simultaneously optimized.
1708:
1709: \item A grid-based correction to the parametric potential can
  iteratively be determined for any perturbation that cannot easily
1711: be modelled within the chosen family of potential models (e.g.\
1712: warps, twists, mass-substructures, etc.).
1713:
1714: \item The source surface-brightness structure is determined on a
1715: fully adaptive Delaunay tessellation grid, which is updated with
1716: every change of the lens potential.
1717:
1718: \item Both model-parameter optimization and model ranking are fully
1719: embedded in a Bayesian framework. The method takes special care not
1720: to change the number of degrees of freedom during the
1721: optimization, which is ensured by the adaptive source grid. Methods
  with a fixed source surface-brightness grid cannot do this.
1723:
1724: \item Both source and potential solutions are regularised, based on
1725: a smoothness criterion. The choice of regularization can be
1726: modified and the level of regularization is set by Bayesian
  optimization of the evidence. The data themselves determine what
1728: level of regularization is needed. Hence overly smooth or overly
1729: irregular structure is automatically penalised.
1730:
1731: \item The maximum-posterior and the full marginalized probability
1732: distribution function of {\sl all} linear and non-linear
1733: parameters can be determined, marginalized over all other
1734: parameters (including regularization). Hence a full exploration
1735: of {\sl all} uncertainties of the model is undertaken.
1736:
\item The full marginalized evidence (i.e.\ the probability of the
  data given the model, marginalized over all model parameters) is calculated, which can be used to rank
1739: {\sl any} set of model assumptions (e.g. pixel size, PSF) or model
1740: families. In our case, we intend to compare smooth models with
1741: models that include mass substructure. The marginalized evidence
1742: implicitly includes Occam's razor and can be used to assess whether
1743: substructure or any other assumption is justified, compared to a
1744: null-hypothesis.
1745:
1746: \end{itemize}
1747:
1748: \noi The method has been tested and calibrated on a set of
1749: artificial but realistic lens systems, based on the
1750: lens system SLACS J1627$-$0055.
1751:
1752: \noi The ensemble of mock data consists of a smooth PL lens and
1753: thirteen clumpy models including one or two NFW substructures. Different values
1754: for the mass and the substructure position have been considered.
1755: Using the Bayesian optimization strategy developed in this paper we are
1756: able to recover the smooth PL system and all perturbed models with a
1757: substructure mass $ \ga 10^7 M_\odot$ when located at the lowest
1758: surface brightness point on the Einstein ring and with a mass $\geq
1759: 10^9 M_\odot$ when located just inside or outside the ring (i.e.\
1760: their Einstein rings need to overlap roughly). For all these models
1761: we have convincingly recovered the best set of non-linear parameters
1762: describing the lens potential and objectively set the level of
1763: regularization.
1764:
1765: \noi Furthermore, our implementation of the Nested-Sampling
1766: technique provides statistical errors for {\sl all} model parameters
1767: and allows us to objectively rank and compare different potential
1768: models in terms of Bayesian evidence, removing as much as possible
1769: any subjective choices. Any choice can quantitatively be
ranked. For each of the lens systems we compare a completely smooth PL
mass model with a perturbed PL+NFW (+NFW) one. The method developed here
allows us to solve simultaneously for the lens potential and the
lensed source. The latter, in particular, is reconstructed on an
adaptive grid which is re-computed at every step of the
optimization, so that the correct number
of degrees of freedom is taken into account.
1777:
1778:
1779: \begin{figure*}
1780: \begin{center}
1781: \includegraphics[width=0.45\hsize]{fig7a}
1782: \hfill
1783: \includegraphics[width=0.45\hsize]{fig7b}
\caption{Same as Figure~\ref{fig:best1_upr}, but for L$_{12}$.}
1785: \label{fig:best1_outside_01}
1786: \end{center}
1787: \end{figure*}
1788:
1789: \begin{figure*}
1790: \begin{center}
1791: \includegraphics[width=\hsize]{fig8}
1792: \caption{Results of the linear source and potential
1793: reconstruction for the lens L$_1$. The first row shows
1794: the original model (left), the reconstructed model (middle)
1795: and the current-best source, as well as the corresponding adaptive grid.
1796: On the second row the image
1797: residuals (left), the total potential convergence (middle) and
1798: the substructure convergence (right) are shown. Note
1799: that the substructure, although weak, is reconstructed at
1800: the correct position.}
1801: \label{fig:best1_sub_upr}
1802: \end{center}
1803: \end{figure*}
1804:
1805: \begin{figure*}
1806: \begin{center}
1807: \includegraphics[width=\hsize]{fig9}
\caption{Same as Figure~\ref{fig:best1_sub_upr}, but for L$_2$. We note
1809: that the substructure is extremely
1810: well reconstructed, both at the correct position and in mass.}
1811: \label{fig:best1_sub_upr_001}
1812: \end{center}
1813: \end{figure*}
1814:
1815:
1816: \begin{figure*}
1817: \begin{center}
1818: \subfigure[]{ \includegraphics[width=0.45\hsize]{fig10a}
1819: \label{fig:linear_a}
1820: }
1821: \hfill
1822: \subfigure[]{ \includegraphics[width=0.45\hsize]{fig10b}
1823: \label{fig:linear_b}
1824: }
1825:
1826: \subfigure[]{\centering \includegraphics[width=0.45\hsize]{fig10c}
1827: \label{fig:linear_c}
1828: }
1829:
\caption{Non-linear reconstruction of the lens $\rm L_{13}$ for a single PL model, a PL+NFW and
1831: a PL+2NFW one.}
1832: \label{fig:best_double}
1833: \end{center}
1834: \end{figure*}
1835:
1836: \begin{figure*}
1837: \begin{center}
1838: \includegraphics[width=\hsize]{fig11}
1839: \caption{Results of the first linear source and potential
1840: reconstruction for the lens L$_{13}$. The first row shows
1841: the original model (left), the reconstructed model (middle)
1842: and the image residuals. On the second row the current-best source (left), the total potential convergence (middle) and
1843: the substructure convergence (right) are shown. Note
1844: that the substructure, although weak, is reconstructed at
1845: the correct position.}
1846: \label{fig:best1_sub_double}
1847: \end{center}
1848: \end{figure*}
1849:
1850: \begin{figure*}
1851: \begin{center}
1852: \includegraphics[width=\hsize]{fig12}
1853: \caption{ Results of the second linear source and potential
1854: reconstruction for the lens L$_{13}$. }
1855: \label{fig:best2_sub_double}
1856: \end{center}
1857: \end{figure*}
1858:
\noi In this paper we have considered systems that contain at most two CDM substructures. Although this may appear to be a very small
number when compared with predictions from N-body simulations within the virial radius, it represents a realistic scenario.
As we have shown, our method, with current HST data, is mostly sensitive
to perturbations with mass $\ga 10^7\rm M_\odot$ located on the Einstein ring ($\Delta\theta\sim\mu\theta_{\rm ER}$).
The projected volume that we are able to probe is therefore small compared to the projected volume within the virial radius.
The probability that more than two substructures have this right combination of mass and position is relatively low and we expect most
real systems to be dominated by one or at most two perturbers.

\noi Despite these new results, further improvements can still be
made. We think, for example, that an adaptive source grid based on surface
brightness, rather than magnification, or a combination of the two, could be
more suitable for the scientific problem considered here.
1870:
\noi The method will soon be applied to real systems, for example
1872: from the \emph{Sloan Lens ACS Survey} sample of massive early-type galaxies
1873: \citep{Koopmans06,Bolton06,Treu06}. This will lead to powerful new
1874: constraints or limits on the fraction and mass distribution of
1875: substructure. Results will be compared with CDM simulations.
1876:
1877: \section*{Acknowledgements} The authors would like to thank Matteo
1878: Barnab\`e, Oliver Czoske, Antonaldo Diaferio, Phil Marshall, Sherry Suyu and the anonymous referee for useful
1879: discussions. They also thank the Kavli Institute for Theoretical Physics
1880: for hosting the gravitational lensing workshop in fall 2006, during which
important parts of this work were carried out. SV and LVEK are supported (in part) through an
1882: NWO-VIDI program subsidy (project number 639.042.505).
1883:
1884:
1885: \bibliography{ms}
1886:
1887: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1888:
1889: \begin{figure*}
1890: \begin{center}
1891: \includegraphics[width=\hsize]{fig13}
\caption{Posterior probability distributions for the non-linear
    parameters of the smooth lens model $\rm L_0$ as obtained from
    the Nested-Sampling evidence exploration. In particular,
    results for two different models are shown: a smooth PL
    potential (blue histograms) and a perturbed PL+NFW lens
    (black histograms). From the upper left, the lens strength, the
1898: position angle, the axis ratio, the slope, the logarithm of
1899: the source regularization constant, the substructure mass and
1900: position are plotted.}
1901: \label{fig:smooth_weights}
1902: \end{center}
1903: \end{figure*}
1904:
1905: \begin{figure*}
1906: \begin{center}
1907: \includegraphics[width=\hsize]{fig14}
\caption{Same as Figure~\ref{fig:smooth_weights}, but for L$_1$.}
1909: \label{fig:pert0001_weights} \end{center}
1910: \end{figure*}
1911:
1912: \begin{figure*}
1913: \begin{center}
1914: \includegraphics[width=\hsize]{fig15}
\caption{Same as Figure~\ref{fig:smooth_weights}, but for L$_2$.}
1916: \label{fig:pert001_weights}
1917: \end{center}
1918: \end{figure*}
1919:
1920:
1921: \begin{figure*}
1922: \begin{center}
1923: \includegraphics[width=\hsize]{table_3.ps}
1924: \end{center}
1925: \end{figure*}
1926:
1927: \begin{figure*}
1928: \begin{center}
1929: \includegraphics[width=\hsize]{table_4.ps}
1930: \end{center}
1931: \end{figure*}
1932:
1933:
1934: \clearpage
1935:
1936: \newpage \label{lastpage}
1937:
1938:
1939:
1940: \end{document}
1941: