0207:cs0207049/p08.tex

1: \NeedsTeXFormat{LaTeX2e}

2: \documentclass[runningheads]{llncs}

3:

4: \usepackage{latexsym}

5: \usepackage{pstricks,pst-node,pst-text,pst-3d}

6: \renewcommand{\floatpagefraction}{0.90}

7: %\newcounter{08example}

8: \newenvironment{08example}{\begin{example}}{\end{example}}

9: \newcommand{\reach}{\stackrel{\tiny reach}{\longrightarrow}}

10: \newcommand{\pow}[1]{2^{#1}}

11: \newcommand{\ssubseteq}{\widehat{\sqsubseteq}}

12: \newcommand{\scup}{\widehat{\sqcup}}

13: \newcommand{\scap}{\widehat{\sqcap}}

14: \newcommand{\lcons}{\mbox{\tt '.'}}

15: \newcommand{\abigcup}[1]{\stackrel{#1}{\bigcup}}

16:

17: \begin{document}

18: \setcounter{page}{63}

19: \title{More Precise Yet Efficient Type Inference for Logic Programs}

20: \titlerunning{More Precise Yet Efficient Type Inference for Logic Programs}

21: \author{Claudio Vaucheret \and Francisco Bueno}

22: \authorrunning{C. Vaucheret, F. Bueno}

23: \institute{Technical University of Madrid (UPM), Spain\\

24:            \email{\{claudio,bueno\}@fi.upm.es}}

25:

26: \maketitle

27:

28: \addtocounter{footnote}{1}

29: \footnotetext{In Alexandre Tessier (Ed), proceedings of the 12th International Workshop on Logic Programming Environments (WLPE 2002), July 2002, Copenhagen, Denmark.\\Proceedings of WLPE 2002: \texttt{http://xxx.lanl.gov/html/cs/0207052} (CoRR)}

30:

31: \begin{abstract}

32: Type analyses of logic programs which aim at inferring the types of

33: the program being analyzed are presented in a unified abstract

34: interpretation-based framework. This covers most classical abstract

35: interpretation-based type analyzers for logic programs, built on

36: either top-down or bottom-up

37: interpretation of the program. In this setting, we discuss the

38: widening operator, arguably a crucial one. We present a new widening

39: which is more precise than those previously proposed. Practical

40: results with our analysis domain are also presented, showing that it

41: also allows for efficient analysis.

42: \end{abstract}

43:

44: \section{Introduction}

45:

46: In type analyses, the widening operation has much influence in the

47: results. If the widening is too aggressive in making approximations

48: then the analysis results may be too imprecise. On the other hand,

49: if it is not sufficiently aggressive then the analysis may become

50: too inefficient.

51:

52: Widening operators are aimed at identifying the recursive structure

53: of the types being inferred. All widenings already proposed

54: in the literature are based on locating type nodes with the same

55: functors, which are possible sources of recursion. However, they

56: disregard whether such nodes come in fact from a recursive structure

57: in the program or not. This may originate an unnecessary loss of

58: precision, since the widening result may then impose a recursive

59: structure on the resulting type in argument positions where the

60: concrete program is in fact not recursive. We propose a widening

61: operator to try to remedy this problem.

62:

63: We present our widening operator for regular type inference in an

64: analysis framework based on abstract interpretation of the program.

65: In order for the paper to be self contained, we first revisit

66: regular types (Section~\ref{sec:regtypes}) and, in particular,

67: deterministic ones. We focus on deterministic types for ease of

68: presentation; however, there is nothing in our widening which prevents

69: it to be applicable also to non-deterministic types. The abstract

70: interpretation framework is set up in

71: Section~\ref{sec:domain}. Section~\ref{sec:widenings}

72: reviews previous widenings in the literature, and

73: Section~\ref{sec:grammarwithnames} presents ours. In

74: Section~\ref{sec:results} experimental results are presented, and

75: Section~\ref{sec:conclusions} concludes and discusses future work.

76:

77: \section{Regular Types}

78: \label{sec:regtypes}

79:

80: A {\em regular type}~\cite{Dart-Zobel} is a type representing a class

81: of terms that can be described by a regular term grammar.

82: A \emph{regular term grammar}, or grammar for short, describes a set of

83: finite terms constructed from a finite alphabet ${\cal F}$ of

84: \emph{ranked function symbols} or \emph{functors}.

85: %

86: A grammar $G = (S,{\cal T},{\cal F},{\cal R})$ consists of a

87: set of non-terminal symbols ${\cal T}$, one distinguished

88: symbol $S\in{\cal T}$, and a finite set ${\cal R}$ of

89: productions $T\longrightarrow rhs$, where $T \in {\cal T}$ is a

90: non-terminal and the right hand side $rhs$ is either a non-terminal or

91: a term $f(T_1,\ldots,T_n)$ constructed from an $n$-ary function symbol

92: $f \in {\cal F}$ and $n$ non-terminals.

93:

94: The non-terminals ${\cal T}$ are {\em types} describing (ground) terms

95: built from the functors in ${\cal F}$. The concretization $\gamma(T)$

96: of a non-terminal $T$ is the set of terms derivable from its productions,

97: that is,

98: \begin{eqnarray*}

99: \gamma(T) & = & \bigcup_{(T \longrightarrow rhs) \in {\cal R}}

100:       \gamma(rhs) \\

101: \gamma(f(T_1,\ldots,T_n)) & = & \{ f(t_1,\ldots,t_n) ~|~

102:                                      t_i \in \gamma(T_i) \}

103: \end{eqnarray*}

104:

105: The types of interest are each defined by one grammar: each $T_i$ is

106: defined by a grammar $(T_i,{\cal T}_i,{\cal F},{\cal R}_i)$, so that for

107: any two types of interest $T_1$ and $T_2$ on ${\cal F}$,

108: ${\cal T}_1\cap{\cal T}_2=\emptyset$. Sometimes, we will be interested

109: in types defined by non-terminals of a grammar

110: $(T,{\cal T},{\cal F},{\cal R})$ other than the distinguished

111: non-terminal $T$. This is formalized by defining a type $T_i\in{\cal T}$

112: as the grammar

113: %

114: \begin{equation}\label{eq:restrict}

115: (T_i,\{ T\in {\cal T} ~|~ T_i \reach^*_{\cal R} T \},

116:                {\cal F},\{ (T \longrightarrow rhs)\in {\cal R}

117:                         ~|~ T_i \reach^*_{\cal R} T \})

118: \end{equation}

119: where all the non-terminals are renamed apart, $\reach^*_{\cal R}$ is

120: the reflexive and transitive closure of $\reach_{\cal R}$ and

121:  $T_i \reach_{\cal R} T_j

122:    \mbox{ iff } T_i \longrightarrow_{\cal R} T_j

123:    \mbox{ or }  T_i \longrightarrow_{\cal R} f(\ldots,T_j,\ldots)$.

124:

125:

126: A grammar is in \emph{normal form} if none of the right hand sides are

127: non-terminals.  A particular class of grammars are deterministic ones.

128: A grammar is \emph{deterministic} if it is in normal form

129: and for each non-terminal $T$ the function symbols are all distinct in

130: the right hand sides of the productions for $T$.

131:

132: Deterministic grammars are less expressive than non-deterministic ones.

133: Deterministic grammars can only express sets of terms which are

134: \emph{tuple-distributive}; informally speaking, which are ``closed

135: under exchange of arguments''. I.e., if the set contains two terms of

136: the same functor, then it also contains terms with the same principal

137: functor obtained by exchanging subterms of the previous two terms in

138: the same argument positions. Basically, no dependencies between

139: arguments of a term can be expressed with deterministic grammars.

140:

141: \begin{08example}

142: Consider the type $T$ denoting the set $\{f(a,b),f(c,d)\}$, which is

143: non-deterministic,

144: $$\begin{array}{lllllllllll}

145:   T & \longrightarrow & f(A,B) && A & \longrightarrow & a && C & \longrightarrow & c \\

146:   T & \longrightarrow & f(C,D) && B & \longrightarrow & b && D & \longrightarrow & d

147:   \end{array}$$

148: A deterministic type $T'$ with a concretization which included

149: $\gamma(T)$ would also have to include $\{f(c,b),f(a,d)\}$, that is,

150: $$\begin{array}{lllllllllll}

151:   T' & \longrightarrow & f(AC,BD) && AC & \longrightarrow & a && BD & \longrightarrow & b \\

152:      &                 &          && AC & \longrightarrow & c && BD & \longrightarrow & d

153:   \end{array}$$

154: To facilitate the presentation non-terminals with a single production

155: will often be ``inlined'' and multiple right hand sides combined so

156: that $T$ above will be written

157: $T \longrightarrow f(a,b) \ | \ f(c,d)$ and $T'$ as

158: $$\begin{array}{lllllllllll}

159:   T' & \longrightarrow & f(AC,BD) && AC & \longrightarrow & a \  | \ c &&

160:   BD & \longrightarrow & b \  | \ d

161:   \end{array}$$

162: \vspace{-2\baselineskip}

163: \end{08example}

164:

165: To be able to describe terms containing numbers and variables we

166: introduce two distinguished symbols \textbf{num} and \textbf{any},

167: plus an additional $\bot$. The concretization of \textbf{num} is the

168: set of all numbers, the concretization of \textbf{any} is the set

169: of all terms (including variables), and the concretization of $\bot$ is

170: the empty set of terms. These symbols are non-terminals but they are

171: considered terminals to the effect of regarding a grammar as

172: deterministic.

173:

174: Let $\mathcal{G}$ be the set of all grammars, if $T_1$, $T_2$ belong to

175: $\mathcal{G}$, the relation $T_1 \equiv T_2 \Leftrightarrow \gamma(T_1) =

176: \gamma(T_2)$ is an equivalence relation. The quotient set

177: $\mathcal{G}/\equiv$ is a complete lattice with top element \textbf{any}

178: and bottom element $\bot$ based on the relation of {\em containment}, or

179: type {\em inclusion}: for every

180: $\overline{T_1},\overline{T_2} \in \mathcal{G}/\equiv$,

181: $\overline{T_1} \sqsubseteq \overline{T_2}

182: \Leftrightarrow \gamma(T_1)\subseteq\gamma(T_2)$.

183: % where

184: % and $T_i$ is the representative of the equivalence class $\overline{T_i}$

185: %

186: % Non-terminals, and therefore,

187: % grammars are ordered by the relation of {\em containment}, or type {\em

188: %   inclusion}, $(T_1 \sqsubseteq T_2) \Leftrightarrow

189: % \gamma(T_1)\subseteq\gamma(T_2)$.  Based on this, the set of all grammars

190: % $\mathcal{G}$ is a complete lattice with top element \textbf{any} and

191: % bottom element $\bot$.

192: %

193: We will denote $\overline{T_i}$ simply by $T_i$.

194:

195: The least upper bound is given by type {\em union},

196: $({T_1} \sqcup {T_2})$, and the greatest lower bound by

197: type {\em intersection},

198: $({T_1} \sqcap {T_2})$~\cite{Dart-Zobel}. It can be shown that

199: intersection describes term unification:

200: \[ t_1^*\subseteq\gamma(T_1)\wedge t_2^*\subseteq\gamma(T_2)\wedge

201:    t_1\theta=t_2\theta \Rightarrow

202:    (t_1\theta)^*\subseteq\gamma(T_1 \sqcap T_2)

203: \]

204: where $t^*$ denotes the set of ground terms which are instances of the

205: term $t$.

206:

207: \section{Abstract Domain for Type Inference}

208: \label{sec:domain}

209:

210: In an abstract interpretation-based type analysis, a type is

211: used as an abstract description of a set of terms. Given variables of

212: interest $\{x_1,\ldots,x_n\}$, any substitution

213: $\theta = \{x_1 \leftarrow t_1,\ldots,x_n \leftarrow t_n\}$

214: can be approximated by an {\em abstract substitution}

215: $\{x_1 \leftarrow T_{x_1},\ldots,x_n \leftarrow T_{x_n}\}$

216: where $t_i \in \gamma(T_{x_i})$ and each type $T_{x_i} \in \mathcal{G/\equiv}$.

217: %

218: We will write abstract substitutions as tuples

219: $\langle T_1,\ldots,T_n \rangle$, and sometimes also abbreviate a

220: tuple simply as $T^n$.

221:

222: Concretization is lifted up to abstract substitutions

223: straightforwardly,

224: $$\begin{array}{rcl}

225: \gamma(\langle T_1,\ldots,T_n \rangle) & = &

226: \{~\{x_1 \leftarrow t_1,\ldots,x_n \leftarrow t_n\}~|~t_i \in \gamma(T_i)~\}

227: \end{array}$$

228: as well as the equivalence relation $\equiv$. Additionally,

229:  we consider a distinguished abstract

230: substitution {\boldmath$\bot$} as a representative of any

231: $\langle T_1,\ldots,T_n\rangle$ such that there is $T_i=\bot$.

232: Of course, $\gamma(${\boldmath$\bot$}$)=\emptyset$.

233:

234: An ordering on the domain is obtained as the natural element-wise

235: extension of the ordering on types:

236: $$\begin{array}{rcl}

237:  \begin{array}[b]{rcl}

238:     \mbox{\boldmath $\bot$} & \sqsubseteq & T^n \\

239:     \langle T_1,\ldots,T_n \rangle & \not \sqsubseteq & \mbox{\boldmath $\bot$} \\

240:     \langle T_1,\ldots,T_n \rangle & \sqsubseteq & \langle T'_1,\ldots,T'_n \rangle \\

241:  \end{array} & \Longleftrightarrow & \forall_{1 \leq i \leq n}T_i

242:     \sqsubseteq T'_i

243: %%  \\

244: %% T^n  \equiv  T'^n & \Longleftrightarrow & T^n \sqsubseteq T'^n \wedge

245: %% T'^n \sqsubseteq T^n

246: \end{array}$$

247: The domain is a lattice with bottom element {\boldmath $\bot$} and top element

248: $\langle T_1,\ldots,T_n \rangle$ such that $T_1=\ldots=T_n=\textbf{any}$.

249: %

250: The greatest lower bound and least upper bound domain operations are

251: lifted also element-wise, as follows,

252: $$\begin{array}{rcl}

253: \mbox{\boldmath $\bot$} \sqcup T^n  =  T^n \sqcup \mbox{\boldmath $\bot$} & = & T^n \\

254: \langle T_1,\ldots,T_n \rangle \sqcup \langle T'_1,\ldots,T'_n \rangle

255: & = & \langle T_1 \sqcup T'_1 ,\ldots,T_n \sqcup T'_n \rangle \\

256: \mbox{\boldmath $\bot$} ~ \sqcap ~ T^n  =  T^n ~ \sqcap ~ \mbox{\boldmath $\bot$}

257: & = & \mbox{\boldmath $\bot$} \\

258: \langle T_1,\ldots,T_n \rangle ~ \sqcap ~ \langle T'_1,\ldots,T'_n \rangle

259: & = & \langle T_1 \sqcap T'_1 ,\ldots,T_n \sqcap T'_n \rangle

260: %% \bot \bigtriangledown  \langle T'_1,\ldots,T'_n \rangle

261: %% & = & \langle \bot \bigtriangledown T'_1,\ldots,\bot \bigtriangledown T'_n \rangle

262: %% \\

263: %% \langle T_1,\ldots,T_n \rangle \bigtriangledown \langle T'_1,\ldots,T'_n \rangle

264: %% & = & \langle T_1 \bigtriangledown T'_1 ,\ldots,T_n \bigtriangledown T'_n \rangle

265: \end{array}$$

266:

267: Using the adjoint $\alpha$ of $\gamma$ as abstraction function, it can

268: be shown that $(\pow{\Theta},\alpha,\Omega,\gamma)$ is a Galois

269: insertion, where $\Theta$ is the domain of concrete % substitutions

270: and $\Omega$ that of abstract substitutions.

271: %

272: The following abstract unification operator can be shown to approximate the

273: concrete one. Let $x=t$ be a concrete unification equation, with $x$ a

274: variable,$t$ any term, and $T^n$ the current abstract substitution, and

275: let $y_j$, $j=1,\ldots,m$ be the variables of $t$,

276: the new abstract substitution is:

277: \begin{equation}\label{eq:amgu}

278: amgu(T^n,x=t)=

279: T^n[T_x/T'_x,T_{y_1}/T'_{y_1},\ldots,T_{y_m}/T'_{y_m}]

280: \end{equation}

281: with each $T$ replaced by $T'$ in the

282: tuple, $T'_x = T_x\sqcap t\mu$,

283: $\mu=\{y_1\leftarrow T_{y_1},\ldots,y_m\leftarrow T_{y_m}\}$, and

284: $solve(t,T'_x) = \{ y_1=T'_{y_1},\ldots,y_m=T'_{y_m} \}$, a set of

285: equations that define the

286: types of the variables of a term $t\in\gamma(T'_x)$, obtained as:

287: %\marginpar{revise solve}

288: $$solve(t,T)=\left\{

289: %\mbox{\begin{eqnarray*}

290: \begin{array}{lll}

291: \{t=T\}             & \mbox{if} & t \mbox{ is a variable} \\

292: {\displaystyle \bigcup_{T \longrightarrow f(T_1,\ldots,T_n)}\ \bigcup_{i=1,\ldots,n}}

293: solve(t_i,T_i)      & \mbox{if} & t \mbox{ is } f(t_1,\ldots,t_n)

294: \end{array}

295: %\end{eqnarray*}}

296: \right.$$

297:

298: In this abstract interpretation-based setting, analysis with a

299: monotonic semantic function can be easily

300: shown correct. However, it is not guaranteed to terminate, since

301: $\Omega$ has infinite ascending chains. To guarantee termination, a

302: widening operator is required.

303:

304: \begin{08example}

305: \label{ex:listoflists}

306: %% Consider the following program which defines the regular type lists of lists of

307: %% numbers:

308: The following program defines the type lists of lists of numbers:

309: \begin{verbatim}

310: list_of_lists([]).                  num_list([]).

311: list_of_lists([L|Ls]):-             num_list([N|Xs]):-

312:         num_list(L),                        number(N),

313:         list_of_lists(Ls).                  num_list(Xs).

314: \end{verbatim}

315: For the argument of \texttt{num\_list}, without a widening operator,

316: an analysis would obtain the following first three approximations:

317: $$\begin{array}{rclccrclccrclccrcl}

318: T_0 & \longrightarrow & [] & & &

319: T_1 & \longrightarrow & [] \ | \ .(\mathbf{num},T_0) & & &

320: T_2 & \longrightarrow & [] \ | \ .(\mathbf{num},T_1)

321: \end{array}$$

322: where each $T_i$ represents a list of $i$ numbers. Analysis will

323: never terminate, since it would keep on obtaining a new type

324: representing a list with one more number. A widening operator would be

325: required that over-approximates some type $T_l$ to something like

326: $T_l \longrightarrow [] \ | \ .(\mathbf{num},T_l)$,

327: %\centerline{$T_l \longrightarrow [] \ | \ .(\mathbf{num},T_l)$}

328: %\noindent

329: which is the expected type, and allows termination of the analysis.

330: \end{08example}

331:

332:

333: \section{Widenings}

334: \label{sec:widenings}

335:

336: %% A widening is required to guarantee that an analysis

337: %% terminates when the abstract domain has infinite ascending chains as

338: %% is the case of regular types.

339: %% We now revisit widening operators for types proposed in the literature.

340:

341: %% \begin{definition}[widening operator] \label{def:widening}

342: %% An operator $\bigtriangledown$ is a widening iff it is:

343: %%   \begin{description}

344: %%   \item[correct] $a_1 \sqsubseteq a_1 \bigtriangledown a_1 \wedge

345: %%     a_2 \sqsubseteq a_1 \bigtriangledown a_2$

346: %%   \item[stationary] for every increasing chain $a_0 \sqsubset a_1

347: %%     \sqsubset a_2 \sqsubset \ldots$, the chain $b_0 = a_0, b_1 = b_0

348: %%     \bigtriangledown a_1, \ldots ,b_{i+1} = b_i \bigtriangledown

349: %%     a_{i+1}, \ldots$ is not strictly increasing for $\sqsubseteq$,

350: %%     i.e., there is an $n$ such that $b_n=b_{n+1}$.

351: %%   \end{description}

352: %% \end{definition}

353: %%

354: %% For the purposes of termination of the analysis it is enough to

355: %% require that the widening is stationary for the ascending chains that

356: %% may occur during analysis of a program.

357:

358: \paragraph{Functor Widening} \label{sec:widen-de-funct}

359: This is probably the simplest widening operator which still

360: keeps information from the recursive structure of the program that

361: ``produces'' the corresponding terms. The idea behind it is to create

362: a type and a production for each functor symbol in the original type.

363: All arguments of the function symbols are replaced with the new types~\cite{mildnerthesis}.

364:

365: \begin{08example} \label{ex:widen-de-funct}

366: Consider predicate \texttt{list\_of\_lists} of

367: Example~\ref{sec:domain}.\ref{ex:listoflists}, its argument should

368: ideally have the following type:

369: %% \begin{eqnarray*}

370: %%     T_{ll} & \longrightarrow & [] \  | \ .(T_l,T_{ll})  \\

371: %%     T_l & \longrightarrow & [] \  | \ .(\mathbf{num},T_l)

372: %% \end{eqnarray*}

373: $\begin{array}{lclcclcl}

374:     T_{ll} & \longrightarrow & [] \  | \ .(T_l,T_{ll}) & & &

375:     T_l & \longrightarrow & [] \  | \ .(\mathbf{num},T_l)

376: \end{array}$

377: but the functor widening will yield:

378: $T \longrightarrow [] \ | \ \mathbf{num} \ | \ .(T,T) $.

379: %% \centerline{$T \longrightarrow [] \ | \ \mathbf{num} \ | \ .(T,T) $}

380: %\vspace{-2.5\baselineskip}

381: \end{08example}

382:

383: \paragraph{Type Jungle Widening} \label{sec:widening-jungla-de}

384: A type jungle is a grammar where each functor always has the same

385: arguments. It was originally proposed as a finite type

386: domain~\cite{lindgrenmildner} ,

387: since in a domain where all grammars are of the type jungle class

388: all ascending chains are finite. However, it can be used as a

389: subdomain to provide a widening %operator.

390:

391: \begin{08example}

392: Applying this widening to the previous type $T_{ll}$, the following

393: will be obtained:

394: %% \begin{eqnarray*}

395: %%   T & \longrightarrow & [] \ | \ .(T_1,T)  \\

396: %%   T_1 & \longrightarrow & [] \ | \ \mathbf{num} \ | \ .(T_1,T)

397: %% \end{eqnarray*}

398: \vspace{-\baselineskip}

399: $$\begin{array}{lclcclcl}

400:   T & \longrightarrow & [] \ | \ .(T_1,T) & & &

401:   T_1 & \longrightarrow & [] \ | \ \mathbf{num} \ | \ .(T_1,T)

402: \end{array}$$

403: %\vspace{-2.5\baselineskip}

404: \end{08example}

405:

406: Note that this widening is strictly more precise than the functor

407: widening. In the example, the new type captures the upper level of

408: lists, but it loses precision when describing the type of the list

409: elements. This is due to the restriction of forcing functors to always

410: have the same arguments.

411:

412: \paragraph{Shortening} \label{sec:shortening}

413: A grammar can be seen as a graph where the nodes correspond to the

414: non-terminals (or-nodes) and to the right hand sides of productions

415: (and-nodes), and the edges correspond to the production relation or

416: the relation between a functor and its arguments in a right hand side

417: of a production.

418: %

419: Given an or-node, its {\em principal functors} are the functors

420: appearing in its children nodes.

421:

422: \begin{08example}

423: The type $T_{ll}$ of the previous examples can be seen as the graph:

424:

425: \begin{small}

426: %\vspace{-0.5\baselineskip}

427: \begin{center}

428: $

429: \psmatrix[colsep=9ex,rowsep=1ex]

430: &                & \mbox{ [ ] }       &             & \mbox{ [ ] } \\

431: & [mnode=circle] T_{ll} &             & [mnode=circle] T_l & & \mathbf{num} \\

432: &                & \psframebox{ ./2 } &             & \psframebox{ ./2 } \\

433: \endpsmatrix

434: \psset{nodesep=1pt,arrows=->}

435: \ncline{2,2}{1,3}

436: \ncline{2,2}{3,3}

437: \ncline{3,3}{2,4}

438: \ncline{2,4}{1,5}

439: \ncline{2,4}{3,5}

440: \ncline{3,5}{2,6}

441: \ncarc[arcangle=110]{3,3}{2,2}

442: \ncarc[arcangle=110]{3,5}{2,4}

443: $

444: \end{center}

445: \end{small}

446: \end{08example}

447:

448: Gallagher and de Waal~\cite{gallagher-types-iclp94} defined a widening

449: which avoids having two or-nodes, which have the same principal functors,

450: connected by a path. If two such nodes exist, they are replaced

451: by their least upper bound.

452:

453: \begin{08example}

454: In the above example graph, nodes $T_{ll}$ and $T_l$ have the same

455: principal functors ([] and .) so that they are replaced, yielding:

456: %% \begin{eqnarray*}

457: %%   T & \longrightarrow & [] \ | \ .(T_1,T)  \\

458: %%   T_1 & \longrightarrow & [] \ | \ \mathbf{num} \ | \ .(\mathbf{num},T)

459: %% \end{eqnarray*}

460: $$\begin{array}{lclcclcl}

461:   T & \longrightarrow & [] \ | \ .(T_1,T) & & &

462:   T_1 & \longrightarrow & [] \ | \ \mathbf{num} \ | \ .(\mathbf{num},T)

463: \end{array}$$

464: %\vspace{-2.5\baselineskip}

465: \end{08example}

466:

467: Note the precision improvement with respect to the result in the

468: previous example. Note also that still the result is imprecise.

469:

470: \paragraph{Restricted Shortening} \label{sec:short-restr}

471: Saglam and Gallagher~\cite{Saglam-Gallagher-94} propose a more precise

472: variant of the previous widening. Shortening is restricted so that two

473: or-nodes $T$ and $T'$ which are connected by a path from $T$ to $T'$

474: and have the same principal functors are replaced only if

475: $T'\sqsubseteq T$. If this is the case, only $T'$ needs be replaced,

476: since the least upper bound is $T$.

477:

478: \begin{08example} \label{ex:short-restr}

479: Continuing previous examples, since nodes $T_{ll}$ and $T_l$ have

480: the same principal functors but $T_l \not \sqsubseteq T_{ll}$,

481: the widening operation will make no change. In this case, the most

482: precise type is achieved.

483: \end{08example}

484:

485: Note, however, that restricted shortening does not guarantee

486: termination in general (and thus, it is not, strictly speaking, a

487: widening). There are cases in which analysis may not

488: terminate using only this widening operator~\cite{mildnerthesis}.

489:

490: \paragraph{Depth Widening}

491: Janssens and Bruynooghe~\cite{janss92} proposed a type analysis in

492: which the widening effect is achieved by a ``pruning'' of the type

493: depth up to a certain bound. A parameter k establishes the maximum

494: number of occurrences of a functor in-depth in a type. The idea is

495: similar to the well-known depth-k abstraction for term structure

496: analysis. The resulting type analysis uses normal restricted type

497: graphs, which are basically deterministic types satisfying the depth

498: limit. Obviously, the precision % of this analysis

499: depends on the value of the parameter k.

500:

501: \begin{08example}

502: The widening of our previous type $T_{ll}$ with k=1 will yield the

503: same result than the functor widening

504: (Example~\ref{sec:widenings}.\ref{ex:widen-de-funct}), whereas with

505: k=2 will yield the same result as restricted shortening

506: (Example~\ref{sec:widenings}.\ref{ex:short-restr}).

507: \vspace{-\baselineskip}

508: \end{08example}

509:

510:

511: \paragraph{Topological Clash Widening}

512: Van Hentenryck et al.~\cite{VanHentenryckCortesiLeCharlier95} proposed

513: the first widening operator that takes  into account two consecutive

514: approximations to the type being inferred. After merging the two

515: ---i.e., calculating their least upper bound, the result is compared

516: with the previous approximation to try to ``guess'' where the type is

517: growing. This is done by locating {\em topological clashes}: functors

518: that differ or appear at different depth in each type graph.

519: %\marginpar{revise!}

520: The clashes are resolved by replacing them with the recently

521: calculated least upper bound.

522:

523: \begin{08example}

524: \label{ex:sorted}

525: Consider the program:

526: \begin{verbatim}

527: sorted([]).

528: sorted([_X]).

529: sorted([X,Y|L]):- X =< Y, sorted([Y|L]).

530: \end{verbatim}

531: and the moment during analysis when the final widening is performed.

532: The resulting type for the argument of \texttt{sorted/1}

533: is the one on the left below for the first two clauses, and the one on

534: the right for the last one:

535: $$\begin{array}{lclcclcl}

536:     T_0 & \longrightarrow & [] \  | \ .(\mathbf{any},[]) & & &

537:     T_1 & \longrightarrow & .(\mathbf{num},.(\mathbf{num},T_l))  \\

538:         &                 &                               & & &

539:     T_l & \longrightarrow & [] \  | \ .(\mathbf{num},T_l)

540: \end{array}$$

541: Their least upper bound is $T_u$ on the left below, which

542: exhibits a clash with $T_0$ in the second argument of functor $./2$.

543: Thus, the result of widening is $T_s$: % on the right:

544: $$\begin{array}{lclcclcl}

545:   T_u & \longrightarrow & [] \ | \ .(\mathbf{any},T_l) & & &

546:   T_s & \longrightarrow & [] \ | \ .(\mathbf{any},T_s)

547: \end{array}$$

548: %% \begin{eqnarray*}

549: %%   T_s & \longrightarrow & [] \ | \ .(\mathbf{any},T_s)

550: %% \end{eqnarray*}

551: %\vspace{-1.5\baselineskip}

552: \end{08example}

553:

554: All widening operators are based on locating recursive

555: structures in the type definitions where there are nodes with the same

556: functors. This may originate an unnecessary loss of precision, since

557: the widening may impose a recursive structure on the

558: resulting type in argument positions where the concrete program is in

559: fact not recursive. In the following section we present a new widening

560: operator that tries to remedy this problem.

561:

562: \section{Structural Type Widening}

563: \label{sec:grammarwithnames}

564:

565: In this section we define an extended domain for type analysis which

566: incorporates a widening operator aimed at improving the precision of

567: the analysis. The domain is defined so as to keep track of information

568: on the program structure, so that recursion on the types produced by

569: the analysis is imposed by the widening operator only in the cases

570: where it corresponds to a recursive structure in the program being

571: analyzed. To this end, type names will be used.

572:

573: A \emph{type name} is roughly a (distinguished) non-terminal that

574: represents a

575: type produced during the analysis. Type names are created for each

576: variable in each argument of each variant of each program atom for

577: each predicate (note how this is different from, for example,

578: set-based analyses~\cite{set-based-popl}, where variants are not taken into

579: account).

580:

581: Type names provide information on how types are being formed from

582: other types during analysis. This makes it possible to precisely

583: identify places where to impose recursion on the types: in a

584: subterm of the type which happens to refer to the name of that

585: type. To this end, type names contain

586: references to the position of its constituent types. To determine

587: positions, selectors are used, as defined below.

588:

589: \begin{definition}[selector]

590:     Define $t/s$, the subterm of a concrete term $t$ referenced by a

591:     \emph{selector} $s$, inductively as follows. The empty selector

592:     $\epsilon$ refers to the term $t$, that is, $t/\epsilon = t$.

593:     If $t/s = t'$, $t'$ is a compound term

594:     $f(t_1,\ldots,t_i,\ldots,t_n)$ (where $f$ is an $n$-ary function

595:     symbol) then $t/s\cdot(f.i) = t_i$, $1 \leq i \leq n$.

596: \end{definition}

597:

598: For every two selectors $s$, $p$, if $t/s=t'$ and if $t'/p$ exists

599: then $t/s\cdot p=t'/p$. The initial $\epsilon$ of a non-empty selector

600: will often be omitted, so $\epsilon\cdot p$ will be written simply as

601: $p$.

602:

603: We define a set of type names $\mathcal{N}$ such that

604: $\mathcal{N}\cap\mathcal{G}=\emptyset$ and a set

605: $\pow{\mathcal{N}\times\mathcal{G}}$ of relations

606: $\mathcal{X}\in\pow{\mathcal{N}\times\mathcal{G}}$

607: between type names and types, of the form

608: $\mathcal{X}\subseteq\mathcal{N}\times\mathcal{G}$.

609:

610: \begin{definition}[label]

611:   Let $\mathcal{X}$ a relation between type names and types.

612:   Given a type name $N$, a \emph{label} of $N$ is a tuple

613:   $\langle s,N' \rangle$, where $s$ is a selector and $N'$ is a

614:   type name, iff $(N,T)\in\mathcal{X}$, $(N',T')\in\mathcal{X}$, and

615:   $T' \sqsubseteq T/s$.

616: %\marginpar{$T/s \sqsubseteq T'$ ??}

617: \end{definition}

618:

619: Labels of a type name $N$ indicate subterms of the type $T$ defining

620: $N$ where other type names occur.

621:

622: \begin{08example}

623: Let a relation $\mathcal{X}$ such that $\{ (A,T_1), (B,T_2) \}

624: \subseteq \mathcal{X}$, and let grammars

625: $(T_1,{\cal T}_1,{\cal F},{\cal R}_1)$ and

626: $(T_2,{\cal T}_2,{\cal F},{\cal R}_2)$, such that

627: the only rule for $T_1$ is

628: $(T_1 \longrightarrow f(b))\in {\cal R}_1$ and

629: $(T_2 \longrightarrow g(c,T_3))\in {\cal R}_2$,

630: $(T_3 \longrightarrow b ~|~ f(b))\in {\cal R}_2$.

631: Consider a label $\langle (g.2),A \rangle$ of $B$.

632: We have that $T_1 \sqsubseteq T_2/(g.2) = T_3$.

633: \end{08example}

634: %\vspace{-0.5\baselineskip}

635:

636: \begin{definition}[type descriptor]

637:   Let $\mathcal{G}$ a set of types (regular term grammars),

638:   $\mathcal{N}$ a set of type names, and

639:   $\mathcal{X}\subseteq\mathcal{N}\times\mathcal{G}$. A

640:   \emph{type descriptor} is a tuple $(N,E,T)$ where $N\in\mathcal{N}$,

641:   $T\in\mathcal{G}$, $(N,T)\in\mathcal{X}$, and $E$ is a set of labels

642:   of $N$.

643: \end{definition}

644:

645: In the new domain, type descriptors will be used instead of types. Let

646: $\mathcal{D}$ be the set of all type descriptors from given sets of

647: types $\mathcal{G}$ and of type names $\mathcal{N}$. Concretization is

648: defined as $\gamma((N,E,T))=\gamma(T)$. The domain ordering and

649: operations on $\mathcal{D}$ are the same as on $\mathcal{G}$ except

650: for type names. In this case, they have to take into account the

651: possible labels of the type name.

652:

653: %\vspace*{\baselineskip}

654: %\begin{nondefinition}[Inclusion]

655: \paragraph{Inclusion}

656: %\noindent{\bf Inclusion}

657: $(N_1,E_1,T_1) \sqsubseteq (N_2,E_2,T_2) \Leftrightarrow

658: T_1 \sqsubseteq T_2 \wedge E_1 \subseteq E_2$.

659:

660: \paragraph{Union}

661: %\noindent{\bf Union}

662: $(N,E,T) = (N_1,E_1,T_1) \sqcup (N_2,E_2,T_2) \Leftrightarrow

663: T = T_1 \sqcup T_2 \wedge E = E_1 \cup E_2$.

664:

665: \paragraph{Intersection}

666: %\noindent{\bf Intersection}

667: $(N,E,T) = (N_1,E_1,T_1) \sqcap (N_2,E_2,T_2) \Leftrightarrow

668: T = T_1 \sqcap T_2 \wedge E = E_1 \cup E_2$.

669: %\end{nondefinition}

670: %\vspace*{\baselineskip}

671:

672: \paragraph{}

673: Again, we may be interested in types defined by non-terminals other

674: than the distinguished non-terminal $T$ of a grammar

675: $(T,{\cal T},{\cal F},{\cal R})$. A type descriptor $(N_i,E_i,T_i)$,

676: where $T_i\in{\cal T}$, is formally defined from $(N,E,T)$ as

677: follows: $T_i$ is the grammar of Equation~\ref{eq:restrict}, $N_i$ is

678: a new type name, and

679: $$E_i = \{ \langle p,N' \rangle ~|~ \langle s\cdot p,N' \rangle \in E

680:                 \wedge T/s = T_i \}.$$

681:

682: Abstract substitutions for variables of interest $\{x_1,\ldots,x_n\}$

683: are now defined as tuples of the form

684: $\langle (N_1,E_1,T_{x_1}),\ldots,(N_n,E_n,T_{x_n})\rangle$.

685: Concretization and the domain ordering and operations are lifted to

686: abstract substitutions element-wise, in the same way as in

687: Section~\ref{sec:domain}, including the widening operator defined

688: below.

689: %

690: If now $\Omega$ is the domain of type descriptors, it can be shown

691: that $(\pow{\Theta},\alpha,\Omega,\gamma)$ is a Galois insertion,

692: where $\alpha$ is the adjoin of $\gamma$. Abstract unification is

693: defined as in Equation~\ref{eq:amgu}, but using type descriptors

694: instead of types (and preserving all type names in the ``input''

695: abstract substitution $T^n$ to $amgu$).

696:

697: \begin{definition}[structural widening]

698: The widening between an approximation $T_2$ to type name $N$ and a

699: previous approximation $T_1$ to $N$ is

700: $(N,E_1,T_1) \bigtriangledown (N,E_2,T_2) = (N,E_1 \cup E_2,T)$, such

701: that $T$ is defined by $(T,{\cal T},{\cal F},{\cal R})$ where

702: ${\cal T}=\{ T_i ~|~ T \longrightarrow^*_{\cal R} T_i \}$,

703: and ${\cal R}$ is obtained by the following algorithm:

704: \end{definition}

705:

706: \begin{tabbing}

707: $T'$ := $T_1 \sqcup T_2$

708:         defined by $(T',{\cal T}',{\cal F},{\cal R}')$ \\

709: $\mathcal{S}$ := $\{ s ~|~ (s,N) \in E_1\cup E_2 \}$ \\

710: $Seen := \emptyset$ \\

711: \texttt{for }\=\texttt{each}

712:         $(T' \longrightarrow f(A_1,\ldots,A_n))\in {\cal R}'$

713:         \texttt{add to ${\cal R}$ production} \\

714:  \> $T \longrightarrow

715:        f(\texttt{widen}(A_1,{\cal R}',(f.1)),\ldots,

716:          \texttt{widen}(A_n,{\cal R}',(f.n)))$ \\

717: \\

718: \texttt{widen}(\=$N,{\cal R}',Sel) :$  \\

719: \> \texttt{if} $N$ = \textbf{any} \texttt{return} \textbf{any} \\

720: \> \texttt{if} $\exists M \langle N,M \rangle \in Seen$

721:   \texttt{return} $M$ \\

722: \> \texttt{let} $M$ \texttt{a new non-terminal} \\

723: \> $Seen$ \texttt{:=} $Seen \cup \{ \langle N,M \rangle \}$ \\

724: \> \texttt{for }\=\texttt{each}

725:       $(N \longrightarrow f(A_1,\ldots,A_n))\in {\cal R}'$

726:       \texttt{add to ${\cal R}$ production} \\

727: \> \> $M \longrightarrow

728:        f(\texttt{widen}(A_1,{\cal R}',Sel\cdot(f.1)),\ldots,

729:          \texttt{widen}(A_n,{\cal R}',Sel\cdot(f.n)))$ \\

730: \> \texttt{if} \= Sel $\in \mathcal{S}$ \texttt{then} \\

731: \> \> \texttt{add to ${\cal R}$ production} $M \longrightarrow T$ \\

732: \> \texttt{return} $M$

733: \end{tabbing}

734:

735: Structural widening basically identifies subterms of the new type

736: $T_1 \sqcup T_2$ where a reference to the type $N$ being widened

737: appear, and makes this ``self-reference'' explicit in the definition

738: of the new type.

739: %

740: % Note that the result of the widening, $T$ is a correct over-aproximation of

741: % $T_2$ since each subterm where a reference to the type $N$ appear is

742: % equivalent to a previous approximation, which is in turn included in $T_1

743: % \sqcup T_2$, therefore $T_2 \sqsubseteq (T_1 \sqcup T_2) \sqsubseteq T$ .

744: %

745: Note that the widening operation starts with the least upper bound and,

746: basically, adds new grammar rules to that type. Therefore, the result

747: is always a correct approximation of such an upper bound. This

748: justifies its correctness.

749: %

750: Moreover, this approach based on type names is potentially more precise

751: than any of the  previous widening operators discussed, as the

752: following examples show:

753:

754: \begin{08example}

755: Consider program \texttt{sorted} in Example~\ref{sec:widenings}.\ref{ex:sorted}.

756: A top-down analysis with topological clash was roughly described there.

757: Let us now look at analysis using restricted shortening. The resulting

758: type happens to be the same one.

759:

760: Analysis of program atom \texttt{sorted([Y|L])} approximates variable

761: \texttt{Y} always as $\mathbf{num}$, both in the calls and in the

762: successes. The first two success approximations for variable

763: \texttt{L} are $[]$ and $.(\mathbf{num},[])$. Their lub (and

764: widening) is:

765: \begin{eqnarray*}

766:   T_1 & \longrightarrow & [] \ | \ .(\mathbf{num},[])

767: \end{eqnarray*}

768: The next approximation to the type of \texttt{L} is

769: $.(\mathbf{num},T_1)$. Its lub with $T_1$ is

770: $T_2 \longrightarrow [] ~|~ .(\mathbf{num},T_1)$, and since $T_2$ and

771: $T_1$ have the same functors, and $T_1$ is included in $T_2$, the

772: widening of $T_2$ is:

773: \begin{eqnarray*}

774: T_3 \longrightarrow [] \ | \ .(\mathbf{num},T_3)

775: \end{eqnarray*}

776: i.e., list of numbers. The next approximation to the type of \texttt{L}

777: is $.(\mathbf{num},T_3)$ (i.e., a list with at least one number). It

778: is included in $T_3$, so fixpoint is reached.

779:

780: The success of principal goal \texttt{sorted(X)} is approximated

781: after analyzing the two non-recursive clauses by

782: $T_4 \longrightarrow [] \ | \ .(\mathbf{any},[])$. Analysis of the

783: third clause yields $.(\mathbf{num},.(\mathbf{num},T_3))$. Its lub

784: with $T_4$ is $T_5 \longrightarrow [] \ | \ .(\mathbf{any},T_3)$. The

785: widening of $T_5$ finds that $T_5$ and $T_3$ have the same functors and

786: $T_3 \sqsubseteq T_5$, since \textbf{num} $\sqsubseteq$ \textbf{any}.

787: Thus, the result of widening is:

788: \begin{eqnarray*}

789: T_6 \longrightarrow [] \ | \ .(\mathbf{any},T_6)

790: \end{eqnarray*}

791: i.e., list of terms. This is the final result after one more

792: iteration. Note that the information about successes where the tail of

793: lists of length greater than one is a list of numbers is lost.

794:

795: Let us now consider structural widening. Analysis of atom

796: \texttt{sorted([Y|L])} always approximates the type of \texttt{Y} by

797: $(N_{13},\emptyset,\mathbf{num})$. For variable \texttt{L} the two first

798: approximations are $(N_{14},\emptyset,[])$ and

799: $(N_{14},E_{14},.(\mathbf{num},[]))$, where the set of labels is

800: $E_{14} = \{\ (\lcons.1,N_{13}),\ (\lcons.2,N_{14})\ \}$.

801: The result of widening is $(N_{14},E_{14},T_1)$ where $T_1$ is defined

802: as:

803: \begin{eqnarray*}

804:   T_1 & \longrightarrow & [] \ | \ .(\mathbf{num},T_1)

805: \end{eqnarray*}

806: i.e., list of numbers. This is the final result after one more

807: iteration.

808:

809: The success of principal goal \texttt{sorted(X)} is approximated

810: after analyzing the two non-recursive clauses by $(N_3,\emptyset,T_2)$

811: where $T_2 \longrightarrow [] \ | \ .(\mathbf{any},[])$. Analysis of

812: the third clause yields

813: $(N_{3},E_3,.(\mathbf{num},.(\mathbf{num},T_1)))$, where

814: \begin{eqnarray*}

815:   E_3 & = & \{\ (\lcons.2\cdot \lcons.1,N_{13}),\

816:                 (\lcons.2\cdot \lcons.2,N_{14})\ \}

817: \end{eqnarray*}

818: %\marginpar{$(\lcons.1,N_6)$, $(\lcons.2,N_2)$ ??}

819: Its widening with the previous approximation $T_2$ is $(N_3,E_3,T_3)$,

820: where

821: \begin{eqnarray*}

822:   T_3    & \longrightarrow & [] \ | \ .(\mathbf{any},T_1)

823: %%   E(N_3) & = & \{ (\lcons.1,N_6) , (\lcons.2,N_2) ,

824: %%                   (\lcons.2\cdot \lcons.1,N_{13}) ,

825: %%                   (\lcons.2\cdot \lcons.2,N_{14}) \}

826: \end{eqnarray*}

827: which amounts to their lub, since the widening operator does not

828: produce any change, because $N_3$ is not among its own labels.

829: Therefore, the final result, after one more iteration, is $T_3$,

830: where indeed lists of length greater than one have a tail which is

831: a list of numbers.

832: \end{08example}

833:

834: However, structural widening does not guarantee termination. It is

835: effective as long as the new approximation is built from the previous

836: approximation of the type being inferred. This case is identified, in

837: essence, by locating a reference to the type name of the previous

838: approximation within the definition of the new one. However, there are

839: contrived

840: cases in which a type is constructed during analysis which loses the

841: reference to the previous approximation. In these cases, a more

842: restrictive widening has to be applied to guarantee termination.

843:

844: \begin{08example}

845: \label{ex:nontermination}

846: Consider the program:

847: \begin{verbatim}

848: main:- p(a).     p(a).                    q(a,f(a)).

849:                  p(X):- q(X,Y), p(Y).     q(f(Z),f(L)):- q(Z,L).

850: \end{verbatim}

851: The calling substitution for atom \texttt{p(Y)} is the sequence

852: $$\begin{array}{lclcclclcclclccl}

853:  T_1 & \longrightarrow & f(a) & & &

854:  T_2 & \longrightarrow & f(f(a)) & & &

855:  T_3 & \longrightarrow & f(f(f(a))) & & &

856:  \ldots

857: \end{array}$$

858: whereas the type $T \longrightarrow f(a) \ | \ f(T)$ correctly

859: describes such calls. However, the analysis is not able to infer such

860: a type.

861: \end{08example}

862:

863: The problem in the above example is that none of the approximations

864: $T_i$ contains a reference to the previous approximation. This is

865: originated in the program fact for predicate \texttt{q/2} which

866: causes the loss of the reference to the previous approximation because

867: of the double occurrence of constant \texttt{a}.

868:

869: In our analysis, termination is guaranteed by a bound on the number of

870: times the widening operation can be applied to a type name. A counter

871: is associated to each type name, so that when the bound is reached

872: a more restrictive widening that guarantees termination is applied.

873:

874: \section{Analysis Results}

875: \label{sec:results}

876:

877: We have implemented analyses based on most of the widenings discussed

878: in this paper, including structural widening. The implementation is in

879: Prolog and has been incorporated to the CiaoPP

880: system~\cite{ciaopp-iclp99-tut},

881: which uses the top-down analysis algorithm of PLAI. The analysis

882: of~\cite{gallagher-types-iclp94}, based on regular approximations,

883: which uses a bottom-up algorithm, is also incorporated into the

884: system. This analysis uses shortening. We want to compare the top-down

885: and bottom-up approaches with the same widening and similar

886: implementation technology,\footnote{Similar in the programming

887:   technique. Of course, the regular approximation method is rather

888:   different from the method of program interpretation on an abstract

889:   domain: Evaluating this difference is part of the aim of the

890:   comparison.}

891: as well as the precision and efficiency,

892: within the same analysis framework, of the widening operators

893: previously discussed.

894:

895: We have used two sets of benchmark programs: the one used in the PLAI

896: framework and that used in the GAIA~\cite{LeCharlier94:toplas} framework.

897: A summary of the benchmarking follows. The analysis times in

898: miliseconds are shown in Table~\ref{table:bothfixp} (left). The first column

899: (\texttt{rul}) is for the regular approximation analysis and the other

900: three for the PLAI-based analyses: column \texttt{short} for

901: shortening, column \texttt{clash} for topological clash, and

902: column \texttt{struct} for structural widening.

903:

904: \newcommand{\mejor}{$^*$}

905:

906: \begin{table}[htbp]

907:   \begin{center}

908:     \begin{tabular}{||l||r|r|r|r||}

909: \hline \hline

910: Program & \multicolumn{1}{|c|}{\texttt{rul}} &

911:           \multicolumn{1}{|c|}{\texttt{short}} &

912:           \multicolumn{1}{|c|}{\texttt{clash}} &

913:           \multicolumn{1}{|c||}{\texttt{struct}} \\

914: \hline \hline

915: aiakl       &  568&        469&        529&      900 \\

916: bid         & 1480&       2209&       2529&     4730 \\

917: boyer       & 3450&       3890&       4989&     9629 \\

918: browse      &  758&        380&        389&      539 \\

919: cs\_o       & 3840&       1889&       2689&     2580 \\

920: cs\_r       &18549&      10720&      24479&    19560 \\

921: disj\_r     & 4468&       1819&       6399&     2440 \\

922: gabriel     & 1549&       1430&       1870&     1760 \\

923: grammar     &  330&        160&        160&      190 \\

924: hanoiapp    &  620&        719&       1889&     1150 \\

925: kalah\_r    & 1520&         79&         79&       89 \\

926: mmatrix     &  310&        190&        209&      119 \\

927: occur       &  380&        219&        330&      289 \\

928: palin       &  590&        840&        980&      850 \\

929: pg          &  839&       2020&       2980&     3990 \\

930: plan        & 1138&        819&        960&     1009 \\

931: progeom     &  979&       1840&       2530&     3640 \\

932: qsort       &  310&        590&        659&      680 \\

933: qsortapp    &  369&       1000&       2898&     1210 \\

934: queens      &  329&        179&        190&      180 \\

935: query       &  720&        360&        370&      410 \\

936: serialize   &  478&        810&        969&      899 \\

937: witt        & 2929&       4890&       1399&     1169 \\

938: zebra       &  560&       3490&      14958&    12830 \\

939: \hline \hline

940: \multicolumn{5}{c}{(excluding simplification times)}

941:     \end{tabular}\hfill

942:     \begin{tabular}{||l||r|r|r|r||}

943: \hline \hline

944: Program & \multicolumn{1}{|c|}{\texttt{rul}} &

945:           \multicolumn{1}{|c|}{\texttt{short}} &

946:           \multicolumn{1}{|c|}{\texttt{clash}} &

947:           \multicolumn{1}{|c||}{\texttt{struct}} \\

948: \hline \hline

949: aiakl       &        697&       3009&       3738&     1409 \\

950: bid         &       2899&      31278&      35949&    15259 \\

951: boyer       &      19620&     201169&     206917&    92117 \\

952: browse      &        987&       2848&       2987&     1698 \\

953: cs\_o       &      11958&      17389&      32959&     4878 \\

954: cs\_r       &      50760&     303430&     238788&    30169 \\

955: disj\_r     &       6508&      18598&      26077&     6408 \\

956: gabriel     &       2098&      13388&      22379&     5208 \\

957: grammar     &        759&       3169&       3169&     1279 \\

958: hanoiapp    &        840&       3988&      13738&     3378 \\

959: kalah\_r    &       2069&       1187&       1188&      888 \\

960: mmatrix     &        757&       1769&       2078&      488 \\

961: occur       &        530&       1647&       2628&      767 \\

962: palin       &        997&       8520&      11878&     2180 \\

963: pg          &       1349&      15380&      22870&     7370 \\

964: plan        &       1587&       6167&       6559&     2288 \\

965: progeom     &       1358&      12800&      17598&     6679 \\

966: qsort       &        520&       3439&       4168&     1409 \\

967: qsortapp    &        569&       7789&       9669&     2900 \\

968: queens      &        457&       1128&       1138&      429 \\

969: query       &       1627&      22458&      22788&    11818 \\

970: serialize   &        937&       8429&      11957&     2217 \\

971: witt        &       3438&     188419&      42699&    25709 \\

972: zebra       &        717&      55100&     189949&    44540 \\

973: \hline \hline

974: \multicolumn{5}{c}{(including simplification times)}

975:     \end{tabular}

976:     \caption{Timing results}

977:     \label{table:bothfixp}

978:     \label{table:bothsimp}

979:   \end{center}

980: \vspace{-2.5\baselineskip}

981: \end{table}

982:

983: Table~\ref{table:precision} shows results in terms of precision.

984: The precision of \texttt{struct} is never improved by any of the

985: others. The improved precision of \texttt{struct} has been measured

986: as follows. The left subcolumns under \texttt{rul}, \texttt{short}, and

987: \texttt{clash} show the number of types with a more precise definition

988: inferred by \texttt{struct}. The right subcolumns show the number of types

989: where the previous ones appear (and are thus, also, more precise).

990: The former are types directly inferred from program predicates; the

991: latter are types which are defined from the former, due to the data

992: flow in the program.

993:

994: \begin{table}[htbp]

995:   \begin{center}

996:     \begin{tabular}{||l||r|r||r|r||r|r||r||}

997: \hline \hline

998: Program & \multicolumn{2}{|c||}{\texttt{rul}} &

999:           \multicolumn{2}{|c||}{\texttt{short}} &

1000:           \multicolumn{2}{|c||}{\texttt{clash}} \\

1001: \hline \hline

1002: aiakl          & 1 & 1 & 1 & 1  &   &    \\

1003: bid            & 9 & 12& 9 & 12 &   &    \\

1004: cs\_o          & 4 & 18& 4 & 18 & 2 & 9  \\

1005: cs\_r          & 4 & 28& 4 & 28 & 2 & 19 \\

1006: disj\_r        & 6 & 13& 6 & 13 &   &    \\

1007: mmatrix        & 2 & 2 & 2 & 2  &   &    \\

1008: occur          & 1 & 1 & 1 & 1  &   &    \\

1009: palin          & 2 & 4 & 2 & 4  &   &    \\

1010: pg             & 1 & 1 & 1 & 1  &   &    \\

1011: qsort          & 1 & 1 & 1 & 1  &   &    \\

1012: serialize      & 2 & 4 & 2 & 4  &   &    \\

1013: zebra          & 3 & 3 & 3 & 3  & 1 & 1  \\

1014: \hline \hline

1015:     \end{tabular}

1016:     \caption{Precision results}

1017:     \label{table:precision}

1018:   \end{center}

1019: \vspace{-2\baselineskip}

1020: \end{table}

1021:

1022: %% Also, mention the times the bound on the number of widenings is

1023: %% reached...

1024:

1025: The following conclusions can be drawn from the tables. First, the

1026: regular approximation approach seems to behave better in terms of

1027: efficiency than the program interpretation approach, at least for the

1028: bigger programs. This conclusion, however, has to be taken with some

1029: care, since the current implementation of \texttt{rul} performs some

1030: caching of the type grammars that the PLAI-based analysis does

1031: not. % The effect of

1032: This should be subject of a more thorough

1033: evaluation, which is out of the scope of this paper. The fact that it

1034: improves in bigger programs seems to suggest that the effect of this

1035: caching is most surely not negligible.

1036:

1037: Regarding the analyses based on program interpretation, it can be

1038: concluded that the better the precision the worse the efficiency:

1039: \texttt{short} takes less than \texttt{clash}, and this one takes less

1040: than \texttt{struct}; this one is more precise than \texttt{clash},

1041: which is more precise than \texttt{short}. This conclusion seems

1042: evident at first sight, but it is not: in analysis, an improvement in

1043: precision can very well trigger an improvement in efficiency. This can

1044: also be seen in the tables in some cases, the most significant

1045: probably being \texttt{zebra}. % in table~\ref{table:ciaoppfixp}.

1046: Overall, one can arguably conclude that the efficiency loss found is

1047: not a high price in exchange for the gain in precision.

1048: %\marginpar{discuss results!}

1049:

1050: We have also carried out another test.

1051: For practical purposes, the CiaoPP system includes a back-end to the

1052: analysis that simplifies the types inferred, in the sense that

1053: equivalent types are identified, so that they are then reduced to a

1054: single type. This facilitates the interpretation of the output.

1055: %

1056: It is the case that the structural widening includes certain amount of

1057: type simplification, so that the analysis creates less different types

1058: which are in fact equivalent. For this reason, we have included the

1059: same tests as above, but adding now the times taken in the back-end

1060: simplification phase.

1061:

1062: The times including the simplification are shown in

1063: table~\ref{table:bothsimp}. The columns

1064: read as before. It can be seen that in this case structural widening

1065: outperforms all of the other analyses, except, in some cases,

1066: \texttt{rul}.

1067: %

1068: It also can be observed that \texttt{rul} behaves usually better than

1069: \texttt{short} also when simplification is included. This seems to

1070: suggest that incorporating our widening into the regular approximation

1071: approach would probably give the best results in practice.\footnote{This,

1072:   however, may not be trivial. It is subject for future work.}

1073:

1074:

1075: \section{Conclusions}

1076: \label{sec:conclusions}

1077:

1078: We have presented a new widening operator on regular types within

1079: an abstract interpretation-based characterization of type inference.

1080: The idea behind it is similar to set-based

1081: analyses~\cite{FruewirthShapiroVardiYardeni91,set-based-popl} in that

1082: we assign and fix type names, but it is applied here with more generality.

1083: It can be seen as a generalization of the idea of ``guessing'' the

1084: growth of the types during analysis which is

1085: behind~\cite{VanHentenryckCortesiLeCharlier95}. Instead of guessing,

1086: our technique determines exactly where the type is growing.

1087: %

1088: The resulting widening operator has been presented on deterministic

1089: regular types. However, its extension to non-deterministic regular

1090: types should be straightforward.

1091:

1092: Our operator is more precise than previous approaches,

1093: but it is still efficient. This has been shown with (preliminary)

1094: practical results. However, it does not guarantee termination. We are

1095: currently working on the non-termination problem. A moded type domain

1096: will help in this. The idea is to enhance abstract unification so that

1097: it is able to identify the ``transference'' of type names from the

1098: input to the output types, so that the names are not dropped. This

1099: will remedy the problem of Example~\ref{ex:nontermination} and,

1100: hopefully, allow us to prove termination of analyses with the proposed

1101: widening operator.

1102:

1103: Finally, this work has revealed two issues that may be worth

1104: investigating for practical purposes: the impact on the

1105: efficiency of analysis of the different implementation techniques

1106: for different analysis methods,

1107: %(e.g., regular approximations versus program interpretation),

1108: on one hand, and of the simplification of

1109: types, on the other hand.

1110:

1111: \paragraph{\bf Acknowledgements}

1112: %

1113: We would like to thank John Gallagher for very useful discussions, and

1114: Pedro L\'{o}pez for his help with the implementation and the

1115: availability of his library on type manipulation.

1116: This work has been partially supported by Spanish MCYT project EDIPIA

1117: TIC99-1151.

1118:

1119: \begin{thebibliography}{10}

1120:

1121: \bibitem{set-based-popl}

1122: W.~Charatonik, A.~Podelski, and J.-M. Talbot.

1123: \newblock {P}aths vs. {T}ress in {S}et-based {P}rogram {A}nalysis.

1124: %\newblock In {\em Principles of Programming Languages}, pages 330--338. {ACM}

1125: \newblock In {\em POPL00}, pages 330--338. {ACM}

1126:   {P}ress, January 2000.

1127:

1128: \bibitem{LeCharlier94:toplas}

1129: B.~Le Charlier and P.~{Van Hentenryck}.

1130: \newblock {E}xperimental {E}valuation of a {G}eneric {A}bstract

1131:   {I}nterpretation {A}lgorithm for {P}rolog.

1132: \newblock {\em ACM Transactions on Programming Languages and Systems},

1133:   16(1):35--101, 1994.

1134:

1135: \bibitem{Dart-Zobel}

1136: P.W. Dart and J.~Zobel.

1137: \newblock {A} {R}egular {T}ype {L}anguage for {L}ogic {P}rograms.

1138: \newblock In F.~Pfenning, editor, {\em {T}ypes in {L}ogic {P}rogramming}, pages

1139:   157--187. MIT Press, 1992.

1140:

1141: \bibitem{FruewirthShapiroVardiYardeni91}

1142: T.~Fr{\"u}wirth, E.~Shapiro, M.Y. Vardi, and E.~Yardeni.

1143: \newblock Logic programs as types for logic programs.

1144: \newblock In {\em Proc. LICS'91}, pages 300--309, 1991.

1145:

1146: \bibitem{gallagher-types-iclp94}

1147: J.P. Gallagher and D.A. de~Waal.

1148: \newblock Fast and precise regular approximations of logic programs.

1149: \newblock In Pascal {Van~Hentenryck}, editor, {\em Proc.~of the 11th

1150:   International Conference on Logic Programming}, pages 599--613. MIT Press,

1151:   1994.

1152:

1153: \bibitem{ciaopp-iclp99-tut}

1154: M.~Hermenegildo, F.~Bueno, G.~Puebla, and P.~L\'{o}pez-Garc\'{\i}a.

1155: \newblock {P}rogram {A}nalysis, {De}bugging and {O}ptimization {U}sing the

1156:   {C}iao {S}ystem {P}reprocessor.

1157: %\newblock In {\em 1999 International Conference on Logic Programming}, pages

1158: \newblock In {\em ICLP99}, pages

1159:   52--66, Cambridge, MA, November 1999. MIT Press.

1160:

1161: \bibitem{janss92}

1162: G.~Janssens and M.~Bruynooghe.

1163: \newblock Deriving {D}escriptions of {P}ossible {V}alues of {P}rogram

1164:   {V}ariables by means of {A}bstract {I}nterpretation.

1165: \newblock {\em Journal of Logic Programming}, 13(2 and 3):205--258, July 1992.

1166:

1167: \bibitem{lindgrenmildner}

1168: T.~Lindgren and P.~Mildner.

1169: \newblock The impact of structure analysis on prolog compilation.

1170: \newblock Tech.\ Rep.\ 140, Computing Science Dept., Uppsala

1171:   University, 1997.

1172:

1173: \bibitem{mildnerthesis}

1174: P.~Mildner.

1175: \newblock {\em Type Domains for Abstract Interpretation: A Critical Study}.

1176: \newblock PhD thesis, Computing Science Department - Uppsala University, 1999.

1177:

1178: \bibitem{Saglam-Gallagher-94}

1179: H.~Saglam and J.~Gallagher.

1180: \newblock Approximating logic programs using types and regular descriptions.

1181: \newblock Tech.\ Rep.\ CSTR-94-19, Dept.\ of C.\ Science,

1182:   U.\ of Bristol, 1994.

1183:

1184: \bibitem{VanHentenryckCortesiLeCharlier95}

1185: P.~Van~Hentenryck, A.~Cortesi, and B.~Le~Charlier.

1186: \newblock Type analysis of prolog using type graphs.

1187: \newblock {\em Journal of Logic Programming}, 22(3):179--209, 1995.

1188:

1189: \end{thebibliography}

1190:

1191: \end{document}

1192: