0405:cs0405101/TN1.tex

1: \documentclass{tlp}

2: \usepackage{color}

3: \usepackage{latexsym}

4:

5: \title[Worst-Case Groundness Analysis using $\Def$]

6:       {Worst-Case Groundness Analysis using \\

7:        Definite Boolean Functions}

8: \author[Samir Genaim, Jacob M. Howe and Michael Codish]

9:        {SAMIR GENAIM\\

10:          Department of Computer Science\\

11:          Ben-Gurion University

12:        \and JACOB M. HOWE\\

13:          Computing Laboratory\\

14:          University of Kent

15:        \and  MICHAEL CODISH\\

16:          Department of Computer Science\\

17:          Ben-Gurion University}

18: \newcommand{\Pos}{\textsf{Pos}}

19: \newcommand{\Def}{\textsf{Def}}

20: \newtheorem{proposition}{Proposition}[section]

21: \newtheorem{theorem}{Theorem}[section]

22:

23: \def \lquote    {[\![}

24: \def \rquote    {]\!]}

25: \def\qquote#1{\lquote #1 \rquote}

26:

27: \begin{document}

28:

29: \maketitle

30: \thispagestyle{empty}

31: \begin{abstract}

32:   This note illustrates theoretical worst-case scenarios for

33:   groundness analyses obtained through abstract interpretation over

34:   the abstract domains of \emph{definite} ($\Def$) and \emph{positive}

35:   ($\Pos$) Boolean functions.

36: %

37:   For $\Def$, an example is given for which any $\Def$-based abstract

38:   interpretation for groundness analysis follows a chain which is

39:   exponential in the number of argument positions as well as in the

40:   number of clauses but sub-exponential in the size of the program.

41: %

42:   For $\Pos$, we strengthen a previous result by illustrating an

43:   example for which any $\Pos$-based abstract interpretation for

44:   groundness analysis follows a chain which is exponential in the size

45:   of the program.

46: %

47:  It remains an open problem to determine if the worst case for $\Def$

48:  is really as bad as that for $\Pos$.

49: \end{abstract}

50:

51:

52: \section{Introduction}

53:

54: Boolean functions play an important role in various formal methods for

55: specification, verification and analysis of software systems.  In

56: program analysis, Boolean functions are often used to approximate

57: properties of the set of states encountered at a given program point.

58: For example, a conjunction $x\wedge y$ could specify that variables

59: $x$ and $y$ satisfy some property whenever control reaches a given

60: program point. A Boolean function $\varphi_1 \rightarrow \varphi_2$

61: could specify that if $\varphi_1$ is satisfied at a program point

62: (perhaps depending on the unknown inputs to the program) then also

63: $\varphi_2$ is satisfied. A disjunction $\varphi_1 \vee\varphi_2$

64: could arise as a consequence of a branch in the control where

65: $\varphi_1$ and $\varphi_2$ approximate properties of the

66: \texttt{then} and \texttt{else} branches respectively.

67:

68: For program analysis using Boolean functions, we often consider the

69: \emph{positive} Boolean functions, $\Pos$. Namely, those for which

70: $f(1,\ldots,1) = 1$ (denoting $false$ and $true$ by $0$ and $1$

71: respectively).  This restriction is natural as, due to the element of

72: approximation, the result of an analysis is not a ``\emph{yes/no}''

73: answer, but rather a ``\emph{yes/maybe not}'' answer. In this case

74: there is no ``negative'' information. Sophisticated $\Pos$-based

75: analyzers implemented using binary decision diagrams

76: \cite{Bryant:CS92} have been shown \cite{CLCVH95} to give good

77: experimental results with regards to precision as well as the

78: efficiency of the analyzers. However, scalability is a problem and

79: inputs (programs) for which the analysis requires an exponential

80: number of iterations or exponentially large data structures are

81: encountered \cite{codish-worstcase}.

82:

83: The domain, $\Def$, of definite Boolean functions is a subdomain of

84: $\Pos$. These are the positive functions whose sets of models are

85: closed under intersection. The domain $\Def$ is less expressive than

86: $\Pos$. For example, the formula $x\lor y$ is not in $\Def$. However,

87: $\Def$-based analyzers can be implemented using less complex data

88: structures and can be faster than $\Pos$-based analyzers.

89: %

90: For goal dependent groundness analyses (where a description of the

91: inputs to the program being analyzed is given) $\Def$ has been shown

92: to provide a reasonable tradeoff between efficiency and precision

93: \cite{King:esop99, King:esop00}.

94:

95: The work described in \cite{codish-worstcase} illustrates a series of

96: pathological inputs for $\Pos$-based groundness analysis. That paper

97: defines a predicate $chain(x_1,\ldots,x_n)$ using $n$ clauses and

98: illustrates that its $\Pos$-based groundness analysis requires $2^{n}$

99: iterations. However, given that the size of the program (the total

100: number of arguments), is quadratic in $n$ ($m=n^2+n$), the

101: number of iterations is sub-exponential in the size of the input

102: ($2^{n}$ or $2^{O(\sqrt{m})}$).

103: %

104: It has been suggested that $\Def$ analyses might provide better

105: scalability properties than $\Pos$ due to the restriction to functions

106: whose models are closed under intersection.

107: %

108: This note makes two contributions:

109: \begin{enumerate}

110: \item It demonstrates that the worst-case behavior of a $\Def$-based

111:   analysis is (at least) as bad as that described in

112:   \cite{codish-worstcase} for $\Pos$-based analyses; and

113: \item It demonstrates that the worst-case behavior of a $\Pos$-based

114:   analysis is exponential in the size of the input.

115: \end{enumerate}

116: %

117: We have not succeeded to demonstrate a worst-case analysis for $\Def$

118: for which the number of iterations is exponential in the size of the

119: input, nor to prove that $\Def$-based groundness analysis has

120: sub-exponential worst-case behaviour. This remains an open problem.

121:

122: \section{A potential worst-case for $\Def$}\label{s2}

123:

124: Consider an $n$-ary Boolean function $f$. A model $M$ of $f$ can be

125: viewed as a sequence $(b_1,\ldots,b_n)$ of zero's and one's such that

126: $f(b_1,\ldots,b_n)=1$.

127: %

128: For the sake of our construction, we order $n$-ary models according to

129: their value as $n$-digit binary numbers. So a model $M_1$ is smaller

130: or equal to a model $M_2$ if and only if the binary number

131: corresponding to $M_1$ is less or equal to the binary number

132: corresponding to $M_2$.

133: %

134: The intersection of models is defined as usual so that

135: $(a_1,\ldots,a_n) \cap (b_1,\ldots,b_n) = (c_1,\ldots,c_n)$ where

136: $c_i=1$ if and only if $a_i=b_i=1$.

137:

138: Let us first comment on the series of programs which demonstrates the

139: potential worst-case behavior of a $\Pos$-based groundness analysis

140: \cite{codish-worstcase}. The analysis of the predicate $chain/n$

141: enumerates the models of the (constant) $n$-ary Boolean function $1$

142: ($true$) in reverse order. Starting from the initial approximation

143: (which has no models), each consecutive approximation is a function

144: which has one new model that was not in the previous iteration. For

145: example, when $n=3$, the models accumulate in the following order:

146: $(1,1,1),(1,1,0), (1,0,1), (1,0,0),\ldots,(0,0,0)$ and the

147: $\Pos$-based analysis totals 8 iterations.

148: %

149: In contrast the corresponding $\Def$-based analysis totals 4

150: iterations because at each iteration the current set of models is

151: closed under intersection. So for example, in the third iteration, the

152: set $\{(1,1,1),(1,1,0), (1,0,1)\}$ is closed to give

153: $\{(1,1,1),(1,1,0), (1,0,1),(1,0,0)\}$.

154:

155: We now construct a series of programs which demonstrates the potential

156: worst-case behavior of a $\Def$-based groundness analysis. This

157: construction is based on the following observation:

158:

159: \begin{proposition}\label{prop1}

160: Let $M$ be an $n$-ary model. Then the set of $n$-ary models

161: smaller or equal to $M$ is closed under intersection.

162: \end{proposition}

163:

164: \begin{proof}

165:   The result follows from the following observation: If $M_1$ and

166:   $M_2$ are $n$-ary models, then $M_1\cap M_2$ is no larger than $M_1$

167:   (and no larger than $M_2$). This is because $M_1\cap M_2$ is

168:   obtained from $M_1$ (or from $M_2$) by changing some one's to

169:   zero's.

170: \end{proof}

171:

172: A consequence of Proposition \ref{prop1} is that the domain of

173: definite Boolean functions over $n$ variables contains a chain of

174: length $2^n$. To demonstrate such a chain consider an enumeration

175: $M_0,\ldots,M_{2^n-1}$ of the $n$-ary models according to their binary

176: ordering (so $M_0=(0,\ldots,0)$ and $M_{2^n-1}=(1,\ldots,1)$). Observe

177: that $M_i$ is the $n$-ary binary representation of $i$.  Define a

178: sequence $F=(f_0,\ldots,f_{2^n-1})$ as follows: let $f_0$ be the

179: Boolean function with the empty set of models and for $0<i<2^n-1$

180: define $f_i$ to be the Boolean function whose models are

181: $\{M_0,\ldots,M_{i-1}\} \cup \{M_{2^n-1}\}$.

182: %

183: >From the construction it is clear that $F$ forms a chain. Moreover,

184: the elements of $F$ are in $\Def$: They are positive because they have

185: $M_{2^n-1}$ as a model; and from Proposition \ref{prop1}, it follows

186: that they are closed under intersection. The chain $F$ is of length

187: $2^n-1$ because, for $1<i<2^n-1$ $f_i$ has exactly one model more than

188: $f_{i-1}$.

189: %

190: This is the setting for our construction.

191:

192: The ($\Def$-based) groundness analysis of the following predicate

193: ``$p/n$'' iterates through the chain $F$. The arguments typeset in

194: boldface highlight the case for $n=4$.  The program size is quadratic

195: in $n$ and consists of a single predicate of arity $n$ with $n+1$

196: binary clauses. The analysis of the program can be viewed as counting

197: from zero to $2^n-2$ in its arguments.

198:

199: {\small

200: \definecolor{light}{gray}{0.25}

201: %\def\m#1{ \mbox{\small $#1$}}

202: \def\m#1{ \mbox{\textcolor{light}{$#1$}}}

203: \[\begin{array}{lcrclcl}

204: p(\m{X_n},    &\m{\ldots,}& \bf X_4, X_3, X_2, c )&\leftarrow&

205:               p(\m{X_n,}   &\m{\ldots,}& \bf X_4, X_3, X_2, X_1).\\

206: p(\m{X_{n-1},}&\m{\ldots,}& \bf X_3, X_2, c , X_1)&\leftarrow&

207:               p(\m{X_{n-1},} &\m{\ldots,}& \bf X_3, X_2, X_1, c ).\\

208: p(\m{X_{n-2},}&\m{\ldots,}& \bf X_2, c , X_1, X_1)&\leftarrow&

209:               p(\m{X_{n-2},} &\m{\ldots,}& \bf X_2, X_1, c , c ).\\

210: &\vdots\\

211: p(\bf c  , X_1,&\m{\ldots,}& \bf X_1, X_1)&\leftarrow&

212:               p(\bf X_1, c , &\m{\ldots,}& \bf c , c ).\\

213: p(\bf X_1, X_1,&\m{\ldots,}& \bf X_1, X_1)&\hspace{-8mm}.&

214: \end{array}\]

215: }

216:

217: % \section{Discussion}

218:

219: % There is a subtle difference between the series of example programs

220: % given in \cite{codish-worstcase} and the one demonstrated in this

221: % note. Although conceptually, we switch the role of ``true'' and

222: % ``false''.

223:

224:

225: \section{A Challenge}

226:

227: The $\Def$- and $\Pos$-based groundness analyses of the predicate

228: $p/n$ program in the series given in this note involve an exponential

229: number of iterations and compute an $n$-ary Boolean function. The same

230: is true for the $\Pos$-based analysis of the series given in

231: \cite{codish-worstcase}.

232: %

233: However, it is important to note that complexity is typically

234: expressed in terms of the size of the input to a problem and that the

235: size of the program defining $p/n$ in both series is quadratic in $n$

236: ($m=n^2+n$). Hence, formally speaking, we have shown that both $\Def$

237: and $\Pos$-based groundness analyses may potentially involve a number

238: of iterations which is $2^{O(\sqrt{m})}$.  This is bad enough, but

239: sub-exponential.

240:

241: For $\Pos$, we can strengthen the result. The following program is of

242: size linear in $n$ ($m=11\cdot n$) and its $\Pos$-based groundness

243: analysis requires $2^n-2$ iterations.

244:

245: {\small

246: \[\begin{array}{l}

247: \begin{array}{l}

248: p(X_1, \ldots, X_1). \\

249: p(A_1, \ldots, A_n) ~\leftarrow~

250:          p(B_1, \ldots, B_n), ~s(A_1, \ldots, A_n,~B_1, \ldots, B_n).\\

251: \end{array} \\[2ex]

252: %

253: \begin{array}{llcll}

254: s(c~, X_1, \ldots, X_1, & X_1, c~, \ldots,  c~ ).\\

255: s(W, A_1, \ldots,A_{n-1}, & W, B_1, \ldots,B_{n-1}) & \leftarrow &

256:          s(A_1,\ldots,A_n, & B_1,\ldots,B_n).\\

257: \end{array}

258: \end{array}

259: \]

260: }

261:

262: Intuitively, the $2n$ arguments of the predicate $s/2n$ represent two

263: $n$-digit binary numbers (the first is the successor of the second) so

264: that the $n$ recursive clauses from the program in Section \ref{s2}

265: can be simulated by two clauses for $s/2n$. The base case of $s/2n$

266: corresponds to the last recursive clause. However, the analysis of

267: $s/2n$ does not follow an exponential chain so we still need the

268: predicate $p/n$ to get the worst-case behaviour.

269: %

270: This approach does not work for $\Def$ because the result in $\Pos$

271: for $s/2n$ is not closed under intersection.

272:

273:

274:

275: \section{Conclusion}

276:

277: We have demonstrated a $2^{O(m)}$ worst case complexity for $\Pos$ and

278: at least $2^{O(\sqrt{m})}$ for $\Def$ (where $m$ is the size of the

279: program). It remains to be determined if the worst case for $\Def$ is

280: really as bad as that for $\Pos$ or perhaps $\Def$ has better

281: worst-case behaviour.

282:

283: \begin{theorem}

284: Groundness analysis using \Def\ has a potentional

285: worst-case behaviour involving $2^{O(\sqrt{m})}$ iterations, where $m$

286: is the size of the program.

287: \end{theorem}

288:

289: \begin{theorem}

290: Groundness analysis using \Pos\ has a

291:   worst-case behaviour involving $2^{O(m)}$ iterations, where $m$ is

292:   the size of the program.

293: \end{theorem}

294:

295: \bibliographystyle{tlp}

296:

297: %\bibliography{worst}

298:

299: \begin{thebibliography}{}

300:

301: \bibitem[\protect\citename{Bryant, }1992]{Bryant:CS92}

302: Bryant, R. (1992).

303: \newblock Symbolic {Boolean} manipulation with ordered binary-decision

304:   diagrams.

305: \newblock {\em Acm computing surveys}, {\bf 24}(3), 293--318.

306:

307: \bibitem[\protect\citename{Codish, }1999]{codish-worstcase}

308: Codish, Michael. (1999).

309: \newblock Worst-case groundness analysis using positive {Boolean} functions.

310: \newblock {\em The journal of logic programming}, {\bf 41}(1), 125--128.

311:

312: \bibitem[\protect\citename{Howe \& King, }2000]{King:esop00}

313: Howe, J.~M., \& King, A. (2000).

314: \newblock Implementing groundness analysis with definite {B}oolean functions.

315: \newblock {\em Pages  200--214 of:} Smolka, G. (ed), {\em European symposium on

316:   programming}.

317: \newblock Lecture Notes in Computer Science, vol. 1782.

318: \newblock Springer-Verlag.

319:

320: \bibitem[\protect\citename{King {\em et~al.}\relax, }1999]{King:esop99}

321: King, Andy, Smaus, Jan-Georg, \& Hill, Pat. (1999).

322: \newblock Quotienting share for dependency analysis.

323: \newblock {\em Pages  59--73 of:} Swierstra, Doaitse (ed), {\em European

324:   symposium on programming}.

325: \newblock Lecture Notes in Computer Science, vol. 1576.

326: \newblock Springer-Verlag.

327:

328: \bibitem[\protect\citename{{Van Hentenryck} {\em et~al.}\relax, }1995]{CLCVH95}

329: {Van Hentenryck}, P., Cortesi, A., \& {Le Charlier}, B. (1995).

330: \newblock Evaluation of the domain {$Prop$}.

331: \newblock {\em Journal of logic programming}, {\bf 23}(3), 237--278.

332:

333: \end{thebibliography}

334:

335:

336:

337: \end{document}

338:

339:

340:

341:

342:

343:

344:

345: