cs0405101/TN1.tex
1: \documentclass{tlp}
2: \usepackage{color}
3: \usepackage{latexsym}
4: 
5: \title[Worst-Case Groundness Analysis using $\Def$]
6:       {Worst-Case Groundness Analysis using \\
7:        Definite Boolean Functions}
8: \author[Samir Genaim, Jacob M. Howe and Michael Codish]
9:        {SAMIR GENAIM\\
10:          Department of Computer Science\\
11:          Ben-Gurion University
12:        \and JACOB M. HOWE\\
13:          Computing Laboratory\\
14:          University of Kent
15:        \and  MICHAEL CODISH\\
16:          Department of Computer Science\\
17:          Ben-Gurion University}
18: \newcommand{\Pos}{\textsf{Pos}}
19: \newcommand{\Def}{\textsf{Def}}
20: \newtheorem{proposition}{Proposition}[section]
21: \newtheorem{theorem}{Theorem}[section]
22: 
23: \def \lquote    {[\![}
24: \def \rquote    {]\!]}
25: \def\qquote#1{\lquote #1 \rquote}
26: 
27: \begin{document}
28: 
29: \maketitle
30: \thispagestyle{empty}
31: \begin{abstract}
32:   This note illustrates theoretical worst-case scenarios for
33:   groundness analyses obtained through abstract interpretation over
34:   the abstract domains of \emph{definite} ($\Def$) and \emph{positive}
35:   ($\Pos$) Boolean functions.
36: %
37:   For $\Def$, an example is given for which any $\Def$-based abstract
38:   interpretation for groundness analysis follows a chain which is
39:   exponential in the number of argument positions as well as in the
40:   number of clauses but sub-exponential in the size of the program.
41: %
42:   For $\Pos$, we strengthen a previous result by illustrating an
43:   example for which any $\Pos$-based abstract interpretation for
44:   groundness analysis follows a chain which is exponential in the size
45:   of the program.
46: %
47:  It remains an open problem to determine if the worst case for $\Def$
48:  is really as bad as that for $\Pos$.
49: \end{abstract}
50: 
51: 
52: \section{Introduction}
53: 
54: Boolean functions play an important role in various formal methods for
55: specification, verification and analysis of software systems.  In
56: program analysis, Boolean functions are often used to approximate
57: properties of the set of states encountered at a given program point.
58: For example, a conjunction $x\wedge y$ could specify that variables
59: $x$ and $y$ satisfy some property whenever control reaches a given
60: program point. A Boolean function $\varphi_1 \rightarrow \varphi_2$
61: could specify that if $\varphi_1$ is satisfied at a program point
62: (perhaps depending on the unknown inputs to the program) then also
63: $\varphi_2$ is satisfied. A disjunction $\varphi_1 \vee\varphi_2$
64: could arise as a consequence of a branch in the control where
65: $\varphi_1$ and $\varphi_2$ approximate properties of the
66: \texttt{then} and \texttt{else} branches respectively.
67: 
68: For program analysis using Boolean functions, we often consider the
69: \emph{positive} Boolean functions, $\Pos$. Namely, those for which
70: $f(1,\ldots,1) = 1$ (denoting $false$ and $true$ by $0$ and $1$
71: respectively).  This restriction is natural as, due to the element of
72: approximation, the result of an analysis is not a ``\emph{yes/no}''
73: answer, but rather a ``\emph{yes/maybe not}'' answer. In this case
74: there is no ``negative'' information. Sophisticated $\Pos$-based
75: analyzers implemented using binary decision diagrams
76: \cite{Bryant:CS92} have been shown \cite{CLCVH95} to give good
77: experimental results with regards to precision as well as the
78: efficiency of the analyzers. However, scalability is a problem and
79: inputs (programs) for which the analysis requires an exponential
80: number of iterations or exponentially large data structures are
81: encountered \cite{codish-worstcase}.
82: 
83: The domain, $\Def$, of definite Boolean functions is a subdomain of
84: $\Pos$. These are the positive functions whose sets of models are
85: closed under intersection. The domain $\Def$ is less expressive than
86: $\Pos$. For example, the formula $x\lor y$ is not in $\Def$. However,
87: $\Def$-based analyzers can be implemented using less complex data
88: structures and can be faster than $\Pos$-based analyzers.
89: %
90: For goal dependent groundness analyses (where a description of the
91: inputs to the program being analyzed is given) $\Def$ has been shown
92: to provide a reasonable tradeoff between efficiency and precision
93: \cite{King:esop99, King:esop00}.
94: 
95: The work described in \cite{codish-worstcase} illustrates a series of
96: pathological inputs for $\Pos$-based groundness analysis. That paper
97: defines a predicate $chain(x_1,\ldots,x_n)$ using $n$ clauses and
98: illustrates that its $\Pos$-based groundness analysis requires $2^{n}$
99: iterations. However, given that the size of the program (the total
100: number of arguments), is quadratic in $n$ ($m=n^2+n$), the
101: number of iterations is sub-exponential in the size of the input
102: ($2^{n}$ or $2^{O(\sqrt{m})}$).
103: %
104: It has been suggested that $\Def$ analyses might provide better
105: scalability properties than $\Pos$ due to the restriction to functions
106: whose models are closed under intersection.
107: %
108: This note makes two contributions:
109: \begin{enumerate}
110: \item It demonstrates that the worst-case behavior of a $\Def$-based
111:   analysis is (at least) as bad as that described in
112:   \cite{codish-worstcase} for $\Pos$-based analyses; and
113: \item It demonstrates that the worst-case behavior of a $\Pos$-based
114:   analysis is exponential in the size of the input.
115: \end{enumerate}
116: % 
117: We have not succeeded to demonstrate a worst-case analysis for $\Def$
118: for which the number of iterations is exponential in the size of the
119: input, nor to prove that $\Def$-based groundness analysis has
120: sub-exponential worst-case behaviour. This remains an open problem.
121: 
122: \section{A potential worst-case for $\Def$}\label{s2}
123: 
124: Consider an $n$-ary Boolean function $f$. A model $M$ of $f$ can be
125: viewed as a sequence $(b_1,\ldots,b_n)$ of zero's and one's such that
126: $f(b_1,\ldots,b_n)=1$. 
127: %
128: For the sake of our construction, we order $n$-ary models according to
129: their value as $n$-digit binary numbers. So a model $M_1$ is smaller
130: or equal to a model $M_2$ if and only if the binary number
131: corresponding to $M_1$ is less or equal to the binary number
132: corresponding to $M_2$.
133: %
134: The intersection of models is defined as usual so that
135: $(a_1,\ldots,a_n) \cap (b_1,\ldots,b_n) = (c_1,\ldots,c_n)$ where
136: $c_i=1$ if and only if $a_i=b_i=1$.
137: 
138: Let us first comment on the series of programs which demonstrates the
139: potential worst-case behavior of a $\Pos$-based groundness analysis
140: \cite{codish-worstcase}. The analysis of the predicate $chain/n$
141: enumerates the models of the (constant) $n$-ary Boolean function $1$
142: ($true$) in reverse order. Starting from the initial approximation
143: (which has no models), each consecutive approximation is a function
144: which has one new model that was not in the previous iteration. For
145: example, when $n=3$, the models accumulate in the following order:
146: $(1,1,1),(1,1,0), (1,0,1), (1,0,0),\ldots,(0,0,0)$ and the
147: $\Pos$-based analysis totals 8 iterations.
148: %
149: In contrast the corresponding $\Def$-based analysis totals 4
150: iterations because at each iteration the current set of models is
151: closed under intersection. So for example, in the third iteration, the
152: set $\{(1,1,1),(1,1,0), (1,0,1)\}$ is closed to give
153: $\{(1,1,1),(1,1,0), (1,0,1),(1,0,0)\}$.
154: 
155: We now construct a series of programs which demonstrates the potential
156: worst-case behavior of a $\Def$-based groundness analysis. This
157: construction is based on the following observation:
158: 
159: \begin{proposition}\label{prop1}
160: Let $M$ be an $n$-ary model. Then the set of $n$-ary models
161: smaller or equal to $M$ is closed under intersection.
162: \end{proposition}
163: 
164: \begin{proof}
165:   The result follows from the following observation: If $M_1$ and
166:   $M_2$ are $n$-ary models, then $M_1\cap M_2$ is no larger than $M_1$
167:   (and no larger than $M_2$). This is because $M_1\cap M_2$ is
168:   obtained from $M_1$ (or from $M_2$) by changing some one's to
169:   zero's. 
170: \end{proof}
171: 
172: A consequence of Proposition \ref{prop1} is that the domain of
173: definite Boolean functions over $n$ variables contains a chain of
174: length $2^n$. To demonstrate such a chain consider an enumeration
175: $M_0,\ldots,M_{2^n-1}$ of the $n$-ary models according to their binary
176: ordering (so $M_0=(0,\ldots,0)$ and $M_{2^n-1}=(1,\ldots,1)$). Observe
177: that $M_i$ is the $n$-ary binary representation of $i$.  Define a
178: sequence $F=(f_0,\ldots,f_{2^n-1})$ as follows: let $f_0$ be the
179: Boolean function with the empty set of models and for $0<i<2^n-1$
180: define $f_i$ to be the Boolean function whose models are
181: $\{M_0,\ldots,M_{i-1}\} \cup \{M_{2^n-1}\}$.
182: %
183: >From the construction it is clear that $F$ forms a chain. Moreover,
184: the elements of $F$ are in $\Def$: They are positive because they have
185: $M_{2^n-1}$ as a model; and from Proposition \ref{prop1}, it follows
186: that they are closed under intersection. The chain $F$ is of length
187: $2^n-1$ because, for $1<i<2^n-1$ $f_i$ has exactly one model more than
188: $f_{i-1}$.
189: %
190: This is the setting for our construction.
191: 
192: The ($\Def$-based) groundness analysis of the following predicate
193: ``$p/n$'' iterates through the chain $F$. The arguments typeset in
194: boldface highlight the case for $n=4$.  The program size is quadratic
195: in $n$ and consists of a single predicate of arity $n$ with $n+1$
196: binary clauses. The analysis of the program can be viewed as counting
197: from zero to $2^n-2$ in its arguments.
198: 
199: {\small
200: \definecolor{light}{gray}{0.25}
201: %\def\m#1{ \mbox{\small $#1$}}
202: \def\m#1{ \mbox{\textcolor{light}{$#1$}}}
203: \[\begin{array}{lcrclcl}
204: p(\m{X_n},    &\m{\ldots,}& \bf X_4, X_3, X_2, c )&\leftarrow& 
205:               p(\m{X_n,}   &\m{\ldots,}& \bf X_4, X_3, X_2, X_1).\\
206: p(\m{X_{n-1},}&\m{\ldots,}& \bf X_3, X_2, c , X_1)&\leftarrow& 
207:               p(\m{X_{n-1},} &\m{\ldots,}& \bf X_3, X_2, X_1, c ).\\
208: p(\m{X_{n-2},}&\m{\ldots,}& \bf X_2, c , X_1, X_1)&\leftarrow& 
209:               p(\m{X_{n-2},} &\m{\ldots,}& \bf X_2, X_1, c , c ).\\
210: &\vdots\\
211: p(\bf c  , X_1,&\m{\ldots,}& \bf X_1, X_1)&\leftarrow& 
212:               p(\bf X_1, c , &\m{\ldots,}& \bf c , c ).\\
213: p(\bf X_1, X_1,&\m{\ldots,}& \bf X_1, X_1)&\hspace{-8mm}.&
214: \end{array}\]
215: }
216: 
217: % \section{Discussion}
218: 
219: % There is a subtle difference between the series of example programs
220: % given in \cite{codish-worstcase} and the one demonstrated in this
221: % note. Although conceptually, we switch the role of ``true'' and
222: % ``false''.
223: 
224: 
225: \section{A Challenge}
226: 
227: The $\Def$- and $\Pos$-based groundness analyses of the predicate
228: $p/n$ program in the series given in this note involve an exponential
229: number of iterations and compute an $n$-ary Boolean function. The same
230: is true for the $\Pos$-based analysis of the series given in
231: \cite{codish-worstcase}.
232: %
233: However, it is important to note that complexity is typically
234: expressed in terms of the size of the input to a problem and that the
235: size of the program defining $p/n$ in both series is quadratic in $n$
236: ($m=n^2+n$). Hence, formally speaking, we have shown that both $\Def$
237: and $\Pos$-based groundness analyses may potentially involve a number
238: of iterations which is $2^{O(\sqrt{m})}$.  This is bad enough, but
239: sub-exponential.
240: 
241: For $\Pos$, we can strengthen the result. The following program is of
242: size linear in $n$ ($m=11\cdot n$) and its $\Pos$-based groundness
243: analysis requires $2^n-2$ iterations.
244: 
245: {\small
246: \[\begin{array}{l}
247: \begin{array}{l}
248: p(X_1, \ldots, X_1). \\
249: p(A_1, \ldots, A_n) ~\leftarrow~
250:          p(B_1, \ldots, B_n), ~s(A_1, \ldots, A_n,~B_1, \ldots, B_n).\\
251: \end{array} \\[2ex]
252: %
253: \begin{array}{llcll}
254: s(c~, X_1, \ldots, X_1, & X_1, c~, \ldots,  c~ ).\\
255: s(W, A_1, \ldots,A_{n-1}, & W, B_1, \ldots,B_{n-1}) & \leftarrow &
256:          s(A_1,\ldots,A_n, & B_1,\ldots,B_n).\\
257: \end{array}
258: \end{array}
259: \]
260: }
261: 
262: Intuitively, the $2n$ arguments of the predicate $s/2n$ represent two
263: $n$-digit binary numbers (the first is the successor of the second) so
264: that the $n$ recursive clauses from the program in Section \ref{s2}
265: can be simulated by two clauses for $s/2n$. The base case of $s/2n$
266: corresponds to the last recursive clause. However, the analysis of
267: $s/2n$ does not follow an exponential chain so we still need the
268: predicate $p/n$ to get the worst-case behaviour.
269: %
270: This approach does not work for $\Def$ because the result in $\Pos$
271: for $s/2n$ is not closed under intersection. 
272: 
273: 
274: 
275: \section{Conclusion}
276: 
277: We have demonstrated a $2^{O(m)}$ worst case complexity for $\Pos$ and
278: at least $2^{O(\sqrt{m})}$ for $\Def$ (where $m$ is the size of the
279: program). It remains to be determined if the worst case for $\Def$ is
280: really as bad as that for $\Pos$ or perhaps $\Def$ has better
281: worst-case behaviour.
282: 
283: \begin{theorem}
284: Groundness analysis using \Def\ has a potentional
285: worst-case behaviour involving $2^{O(\sqrt{m})}$ iterations, where $m$
286: is the size of the program.    
287: \end{theorem}
288: 
289: \begin{theorem}
290: Groundness analysis using \Pos\ has a
291:   worst-case behaviour involving $2^{O(m)}$ iterations, where $m$ is
292:   the size of the program.  
293: \end{theorem}
294: 
295: \bibliographystyle{tlp}
296: 
297: %\bibliography{worst}
298: 
299: \begin{thebibliography}{}
300: 
301: \bibitem[\protect\citename{Bryant, }1992]{Bryant:CS92}
302: Bryant, R. (1992).
303: \newblock Symbolic {Boolean} manipulation with ordered binary-decision
304:   diagrams.
305: \newblock {\em Acm computing surveys}, {\bf 24}(3), 293--318.
306: 
307: \bibitem[\protect\citename{Codish, }1999]{codish-worstcase}
308: Codish, Michael. (1999).
309: \newblock Worst-case groundness analysis using positive {Boolean} functions.
310: \newblock {\em The journal of logic programming}, {\bf 41}(1), 125--128.
311: 
312: \bibitem[\protect\citename{Howe \& King, }2000]{King:esop00}
313: Howe, J.~M., \& King, A. (2000).
314: \newblock Implementing groundness analysis with definite {B}oolean functions.
315: \newblock {\em Pages  200--214 of:} Smolka, G. (ed), {\em European symposium on
316:   programming}.
317: \newblock Lecture Notes in Computer Science, vol. 1782.
318: \newblock Springer-Verlag.
319: 
320: \bibitem[\protect\citename{King {\em et~al.}\relax, }1999]{King:esop99}
321: King, Andy, Smaus, Jan-Georg, \& Hill, Pat. (1999).
322: \newblock Quotienting share for dependency analysis.
323: \newblock {\em Pages  59--73 of:} Swierstra, Doaitse (ed), {\em European
324:   symposium on programming}.
325: \newblock Lecture Notes in Computer Science, vol. 1576.
326: \newblock Springer-Verlag.
327: 
328: \bibitem[\protect\citename{{Van Hentenryck} {\em et~al.}\relax, }1995]{CLCVH95}
329: {Van Hentenryck}, P., Cortesi, A., \& {Le Charlier}, B. (1995).
330: \newblock Evaluation of the domain {$Prop$}.
331: \newblock {\em Journal of logic programming}, {\bf 23}(3), 237--278.
332: 
333: \end{thebibliography}
334: 
335: 
336: 
337: \end{document}
338: 
339: 
340: 
341: 
342: 
343: 
344: 
345: