0409:cs0409009/main.tex

1: \documentclass{llncs}

2: \bibliographystyle{alpha}

3:

4: \usepackage{cs_techrpt_cover}

5: \usepackage{graphicx}

6: \usepackage{amssymb}

7: \usepackage{eucal}

8: \usepackage[pdfpagemode=None]{hyperref}

9:

10: \newcounter{verbline}

11: \makeatletter

12: \def\verblineinput#1{

13: \begingroup

14: \setcounter{verbline}{0} \fontsize{10pt}{11pt}

15: \def\par{\leavevmode\null\@@par}

16: \obeylines \frenchspacing\@vobeyspaces \@makeother\$\@makeother\&\@makeother\#

17: \@makeother\^\@makeother\_\@makeother\~ \@makeother\{\@makeother\}\@makeother\\

18: \@makeother\%\tt\@noligs\input{#1}\endgroup} \makeatother

19:

20: % Seiten in Hoehe vergroessern

21: \addtolength{\topmargin}{-1.5cm}

22: \addtolength{\textheight}{3cm}

23:

24: % Seiten in Breite vergroessern

25: \addtolength{\oddsidemargin}{-2.2cm}

26: \addtolength{\evensidemargin}{-2.2cm}

27: \addtolength{\textwidth}{3.7cm}

28:

29:

30: % Zus"atzliche Definitionen.

31: \def\R{\mathop{\rm I\kern-1.5pt R}}

32: \def\N{\mathop{\rm I\kern-1.5pt N}}

33: \def\NP{\mathop{\cal NP}}

34: \def\bbbkl{{[\![}}

35: \def\bbbkr{{]\!]}}

36: \def\cc{{\cal U}}

37: \def\crel{{\cal R}}

38: \def\ct{{\cal T}}

39: \def\cx{{\cal X}}

40:

41: \pagestyle{empty}

42: \setcounter{page}{0}

43:

44: \title{CrocoPat 2.1 Introduction and Reference Manual\thanks{%

45:   This research was supported in part by

46:   the NSF grants CCR-0234690 and ITR-0326577,

47:   and the DFG grant BE~1761/3-1.

48:        }}

49:

50: \institute{

51:    Electronics Research Laboratory \\

52:    Department of Electrical Engineering and Computer Sciences \\

53:    College of Engineering \\

54:    University of California at Berkeley \\

55:    Berkeley, CA 94720-1770, U.S.A. \\

56:    \email{beyer@eecs.berkeley.edu}\\[5mm]

57: \and

58:    Software Systems Engineering Research Group\\

59:    Department of Computer Science\\

60:    Brandenburg University of Technology at Cottbus\\

61:    PO Box 10 13 44, 03013 Cottbus, Germany\\

62:    \email{an@informatik.tu-cottbus.de}

63: }

64:

65: %\reportmonth{July}

66: %\reportyear{2004}

67: \date{July 2004}

68: \reportnumber{04-1338}

69: \useuglyformat

70:

71: \begin{document}

72:

73: \author{Dirk Beyer \and Andreas Noack}

74: \makecover

75:

76: % Just to get an additional blank page.

77: \phantom{x}

78:

79: \author{Dirk Beyer\inst{1} \and Andreas Noack\inst{2}}

80: \maketitle

81:

82: % Seitennummern unten mittig.

83: \pagestyle{plain}

84:

85: \begin{abstract}

86: CrocoPat is an efficient, powerful and easy-to-use tool for manipulating

87: relations of arbitrary arity, including directed graphs.  This manual provides

88: an introduction to and a reference for CrocoPat and its programming

89: language~RML. It includes several application examples, in particular from the

90: analysis of structural models of software systems.

91: \end{abstract}

92:

93: \sloppy

94:

95:

96:

97:

98: %RML

99: %operator complexity

100:

101:

102: \section{Introduction} \label{s:intro}

103:

104: CrocoPat is a tool for manipulating relations, including directed

105: graphs (binary relations).  CrocoPat is powerful, because it

106: manipulates relations of arbitrary arity; it is efficient in terms

107: of time and memory, because it uses the data structure binary

108: decision diagram (BDD,~\cite{Bryant:1986,Bryant:1992}) for the

109: internal representation of relations; it is fairly easy to use,

110: because its language is simple and based on the well-known

111: predicate calculus; and it is easy to integrate with other tools,

112: because it uses the simple and popular Rigi Standard Format (RSF)

113: as input and output format for relations. CrocoPat is free

114: software (released under LGPL) and can be obtained from

115: \href{http://www.software-systemtechnik.de/CrocoPat}{\tt

116: http://www.software-systemtechnik.de/CrocoPat}.

117:

118: {\bf Overview.} CrocoPat is a command line tool which interprets programs

119: written in the Relation Manipulation Language (RML).  Its inputs are an RML

120: program and relations in the Rigi Standard Format (RSF), and its outputs are

121: relations in RSF and other text produced by the RML program.  The programming

122: language RML, and the input and output of relations from and to RSF files are

123: introduced with the help of many examples in Section~\ref{s:lanintro}.

124: Section~\ref{s:advanced} describes advanced programming techniques, in

125: particular for improving the performance of RML programs and for circumventing

126: limitations of RML. The manual concludes with references of CrocoPat's command

127: line options in Section~\ref{s:pat}, of RSF in Section~\ref{s:rsf}, and of RML

128: in Section~\ref{s:lan}.  The RML reference includes a concise informal

129: description of the semantics, and a formal description of the syntax and the

130: core semantics.

131:

132: {\bf Applications.} CrocoPat was originally developed for analyzing graph

133: models of software systems, and in particular for finding patterns in such

134: graphs~\cite{BeyerEtAl:2003}. Existing tools were not appropriate for this

135: task, because they were limited to binary relations (e.g.

136: Grok~\cite{Holt:1998}, RPA \cite{FeijsEtAl:1998}, and

137: RelView~\cite{BerghammerEtAl:2002}), or consumed too much time or memory (e.g.

138: relational database management systems and Prolog interpreters). Applications

139: of graph pattern detection include

140: \begin{itemize}

141: \item the detection of implementation

142: patterns~\cite{RichWills:1990,HarandiNing:1990,Hartman:1991,Quilici:1994},

143: object-oriented design patterns (Section~\ref{s:pattern},

144: \cite{MendelzonSametinger:1995,KraemerPrechelt:1996,AntoniolEtAl:1998,KellerEtAl:1999,NiereEtAl:2002}),

145: and architectural styles~\cite{Holt:1996},

146: \item the detection of potential design problems (Section~\ref{s:pattern},

147: \cite{MendelzonSametinger:1995,SefikaEtAl:1996a,FeijsEtAl:1998,KazmanBurth:1998,Ciupke:1999,FahmyHolt:2000}),

148: \item the inductive inference of design

149: patterns~\cite{ShullEtAl:1996,TonellaAntoniol:1999},

150: \item the identification of code clones~\cite{Krinke:2001},

151: \item the extraction of scenarios from models of source

152: code~\cite{WuEtAl:2002}, and

153: \item the detection of design problems in databases~\cite{Blaha:2004}.

154: \end{itemize}

155:

156: The computation of transitive closures of graphs -- another particular strength

157: of CrocoPat -- is not only needed for the detection of some of the above

158: patterns, but has also been applied for dead code detection and change impact

159: analysis~\cite{ChenEtAl:1998,FeijsEtAl:1998}.  Computing and analyzing the

160: difference between two graphs supports checking the conformance of the as-built

161: architecture to the as-designed architecture

162: \cite{SefikaEtAl:1996a,FeijsEtAl:1998,MensWuyts:1999,FahmyHolt:2000,MurphyEtAl:2001},

163: and studying the evolution of software systems between different versions.

164: Calculators for relations have also been used to compute views of systems on

165: different levels of abstraction by lifting and lowering

166: relations~\cite{FeijsEtAl:1998,FahmyHolt:2000}, and to calculate software

167: metrics (Section~\ref{s:numerical},

168: \cite{MendelzonSametinger:1995,KullbachWinter:1999}).

169:

170: Although we are most familiar with potential applications in the analysis of

171: software designs, we are confident that CrocoPat can be beneficial in many

172: other areas. For example, calculators for relations have been used for program

173: analyses like points-to analysis~\cite{BerndlEtAl:2003}, and for the

174: implementation of graph algorithms~(Section~\ref{s:control},

175: \cite{BehnkeEtAl:1997}).

176:

177:

178: \section{RML Tutorial} \label{s:lanintro}

179:

180: This section introduces RML, the programming language of CrocoPat, on examples.

181: The core of RML are relational expressions based on first-order predicate

182: calculus, a well-known, reasonably simple, precise and powerful language.

183: Relational expressions are explained in Subsection~\ref{s:re}, and additional

184: examples involving relations of arity greater than two are given in

185: Subsection~\ref{s:pattern}.  Besides relational expressions, the language

186: includes control structures, described in Subsection~\ref{s:control}, and

187: numerical expressions, described in Subsection~\ref{s:numerical}.  The input

188: and output of relations is described in Subsection~\ref{s:output}.  A more

189: concise and a more formal specification of the language can be found in

190: Section~\ref{s:lan}.

191:

192: Although the main purpose of this section is the introduction of the language,

193: some of the application examples may be of interest by themselves. In

194: Subsection~\ref{s:control}, simple graph algorithms are implemented, and in the

195: Subsections~\ref{s:pattern} and~\ref{s:numerical} the design of object-oriented

196: software systems is analyzed.

197:

198:

199: \subsection{Relational Expressions} \label{s:re}

200:

201: This subsection introduces relational expressions using relationships between

202: people as example. Remember that $n$-ary relations are sets of ordered

203: $n$-tuples.  In this subsection, we will only consider the cases $n = 1$ (unary

204: relations) and $n = 2$ (binary relations, directed graphs).  CrocoPat

205: manipulates tuples of strings, thus unary relations in CrocoPat are sets of

206: strings, and binary relations in CrocoPat are sets of ordered pairs of strings.

207:

208: {\bf Adding Elements.} The statement

209: \begin{verbatim}

210: Male("John");

211: \end{verbatim}

212: expresses that {\tt John} is male. (In some languages, e.g. the logic

213: programming language Prolog~\cite{ClocksinMellish:2003}, such statements are

214: called facts.) It adds the string {\tt John} to the unary relation {\tt Male}.

215: Because each relation variable initially contains the empty relation, {\tt

216: John} is so far the only element of the set {\tt Male}. An explicit declaration

217: of variables is not necessary. However, variables should be defined (i.e.,

218: assigned a value) before they are first used, otherwise CrocoPat prints a

219: warning.

220: \begin{verbatim}

221: Male("Joe");

222: \end{verbatim}

223: adds the string {\tt Joe} to the set {\tt Male}, such that it now has two

224: elements. Similarly, we can initialize the variable {\tt Female}:

225: \begin{verbatim}

226: Female("Alice");

227: Female("Jane");

228: Female("Mary");

229: \end{verbatim}

230:

231: To express that the {\tt John} and {\tt Mary} are the

232: parents of {\tt Alice} and {\tt Joe}, and {\tt Joe} is the father of {\tt

233: Jane}, we create a binary relation variable {\tt ParentOf} which contains the

234: five parent-child pairs:

235: \begin{verbatim}

236: ParentOf("John", "Alice");

237: ParentOf("John", "Joe");

238: ParentOf("Mary", "Alice");

239: ParentOf("Mary", "Joe");

240: ParentOf("Joe", "Jane");

241: \end{verbatim}

242:

243: {\bf Assignments.} The following statement uses an {\em attribute}~{\tt x} to

244: assign the set of {\tt Joe}'s parents to the set {\tt JoesParent}:

245: \begin{verbatim}

246: JoesParent(x) := ParentOf(x, "Joe");

247: \end{verbatim}

248: Now {\tt JoesParent} contains the two elements {\tt John} and {\tt Mary}. As

249: another example, the following assignment says that {\tt x} is a child of {\tt

250: y} if and only if {\tt y} is a parent of {\tt x}:

251: \begin{verbatim}

252: ChildOf(x,y) := ParentOf(y,x);

253: \end{verbatim}

254: {\tt John} is the father of a person if and only if he is the parent of this

255: person.  The same is true for {\tt Joe}:

256: \begin{verbatim}

257: FatherOf("John", x) := ParentOf("John", x);

258: FatherOf("Joe", x) := ParentOf("Joe", x);

259: \end{verbatim}

260: Because the scope of each attribute is limited to one statement, the attribute

261: in the first statement and the attribute in the second statement are different,

262: despite of their equal name~{\tt x}.

263:

264: {\bf Basic Relational Operators.} The relation {\tt FatherOf} can be described

265: more concisely: {\tt x} is father of~{\tt y} if and only if {\tt x} is a parent

266: of {\tt y} and {\tt x} is male:

267: \begin{verbatim}

268: FatherOf(x,y) := ParentOf(x,y) & Male(x);

269: \end{verbatim}

270: Of course, we can define a similar relation for female parents:

271: \begin{verbatim}

272: MotherOf(x,y) := ParentOf(x,y) & Female(x);

273: \end{verbatim}

274:

275: Besides the operator {\em and} ({\tt \&}), another important operator is {\em

276: or} ({\tt |}). For example, we can define the {\tt ParentOf} relation in terms

277: of the relations {\tt MotherOf} and {\tt FatherOf}: {\tt x}~is a parent of~{\tt

278: y} if and only if {\tt x}~is the mother or the father of~{\tt y}:

279: \begin{verbatim}

280: ParentOf(x,y) := MotherOf(x,y) | FatherOf(x,y);

281: \end{verbatim}

282:

283: {\bf Quantification.} Parents are people who are a parent of another person.

284: More precisely, {\tt x} is a parent if and only if there {\em exists} ({\tt

285: EX}) a {\tt y} such that {\tt x} is a parent of {\tt y}.

286: \begin{verbatim}

287: Parent(x) := EX(y, ParentOf(x,y));

288: \end{verbatim}

289: Now the set {\tt Parent} consists of {\tt John}, {\tt Mary}, and {\tt Joe}.

290: There is also an abbreviated notation for existential quantification which is

291: similar to anonymous variables in Prolog and functional programming languages:

292: \begin{verbatim}

293: Parent(x) := ParentOf(x,_);

294: \end{verbatim}

295: With the operator {\em not} ({\tt !}), we can compute who has no children:

296: \begin{verbatim}

297: Childless(x) := !EX(y, ParentOf(x,y));

298: \end{verbatim}

299: Equivalently, {\tt x} childless if {\em for all} ({\tt FA}) {\tt y} holds that

300: {\tt x} is not a parent of {\tt y}:

301: \begin{verbatim}

302: Childless(x) := FA(y, !ParentOf(x,y));

303: \end{verbatim}

304: In both cases, the set {\tt Childless} contains {\tt Alice} and {\tt Jane}.

305:

306: {\bf Transitive Closure.} To compute the grandparents of a person we have to

307: determine the parents of his or her parents:

308: \begin{verbatim}

309: GrandparentOf(x,z) := EX(y, ParentOf(x,y) & ParentOf(y,z));

310: \end{verbatim}

311: Now {\tt GrandparentOf} contains the two pairs ({\tt John}, {\tt Jane}) and

312: ({\tt Mary}, {\tt Jane}).  To find out all ancestors of a person, i.e. parents,

313: parents of parents, parents of parents of parents, etc., we have to apply the

314: above operation (which is also called composition) repeatedly until the fixed

315: point is reached, and unite the results.  The transitive closure operator {\tt

316: TC} does exactly this:

317: \begin{verbatim}

318: AncestorOf(x,z) := TC(ParentOf(x,z));

319: \end{verbatim}

320: The resulting relation {\tt AncestorOf} contains any pair from {\tt ParentOf}

321: and {\tt GrandparentOf}. (It also contains grand-grandparents etc., but there

322: are none in this example.) The transitive closure operator {\tt TC} can only be

323: applied to binary relations.

324:

325: {\bf Predefined Relations, the Universe.} The relations {\tt FALSE} and {\tt

326: TRUE} are predefined.  {\tt FALSE} is the empty relation, and {\tt TRUE} is the

327: full relation. More precisely, there is one predefined relation {\tt FALSE} and

328: one predefined relation {\tt TRUE} for every arity.  In particular, there is

329: also a 0-ary relation {\tt FALSE()}, which is the empty set, and a 0-ary

330: relation {\tt TRUE()}, which contains only~$()$ (the tuple of length~$0$).

331: Intuitively, these 0-ary relations can be used like Boolean literals.  By the

332: way, the statement

333: \begin{verbatim}

334: Male("John");

335: \end{verbatim}

336: is an abbreviation of the assignment

337: \begin{verbatim}

338: Male("John") := TRUE();

339: \end{verbatim}

340:

341: The result of {\tt TRUE(x)} is the so-called {\em universe}.  The universe

342: contains all string literals that appear in the input RSF stream (if there is

343: one, see Subsection~\ref{s:output}) and on the left hand side of assignments in

344: the present RML program.  For example, the string literals used on the left

345: hand side of the assignments in the examples of this subsection are {\tt

346: Alice}, {\tt Jane}, {\tt Joe}, {\tt John}, and {\tt Mary}, so the set {\tt

347: TRUE(x)} contains these five elements. See Subsection~\ref{s:universe} for more

348: information on the universe.

349:

350: The binary relations {\tt =}, {\tt !=}, {\tt <}, {\tt <=}, {\tt >}, and {\tt

351: >=} for the lexicographical order of the strings in the universe are also predefined.

352: For example, siblings are two {\em different} people who have a common parent:

353: \begin{verbatim}

354: SiblingOf(x,y) := EX(z, ParentOf(z,x) & ParentOf(z,y)) & !=(x,y);

355: \end{verbatim}

356: The infix notation is also available for binary relations, so the expression

357: {\tt !=(x,y)} can also be written as~{\tt x!=y}.  Note that the predefined

358: relations, like any other relation, are restricted to the universe. Thus the

359: expression {\tt "A"="A"} yields {\tt FALSE()} if (and only if) the string {\tt

360: A} is not in the universe.

361:

362: Further relational expressions are provided to match POSIX extended regular

363: expressions~\cite[Section~9.4]{IEEE:2001}.  These relational expressions start

364: with the character {\tt @}, followed by the string for the regular expression.

365: For example,

366: \begin{verbatim}

367: StartsWithJ(x) := @"^J"(x);

368: \end{verbatim}

369: assigns to the set {\tt StartsWithJ} the set of all strings in the universe

370: that start with the letter {\tt J}, namely {\tt Jane}, {\tt Joe}, and {\tt

371: John}. A short overview of the syntax of regular expressions is given in

372: Subsection~\ref{s:syntax}.

373:

374: {\bf Boolean Operators.} Two relations can be compared with the operators {\tt

375: =}, {\tt !=}, {\tt <}, {\tt <=}, {\tt >}, or {\tt >=}.  Because such

376: comparisons evaluate to either {\tt TRUE()} or {\tt FALSE()}, they are called

377: Boolean expressions. For example,

378: \begin{verbatim}

379: GrandparentOf(x,y) < AncestorOf(x,y)

380: \end{verbatim}

381: yields {\tt TRUE()}, because {\tt GrandparentOf} is a proper subset of {\tt

382: AncestorOf}.  However,

383: \begin{verbatim}

384: GrandparentOf(x,y) = AncestorOf(x,y)

385: \end{verbatim}

386: yields {\tt FALSE()}, because the two relations are not equal.

387:

388: The six comparison operators should not be confused with the six predefined

389: relations for the lexicographical order. The operators take two relations as

390: parameters, while the predefined relations take strings or attributes as

391: parameters.

392:

393:

394:

395: \subsection{Input and Output of Relations} \label{s:output}

396:

397: {\bf File Format RSF.} CrocoPat reads and writes relations in Rigi Standard

398: Format (RSF,~\cite[Section~4.7.1]{Wong:1998}). Files in RSF are human-readable,

399: can be loaded into and saved from many reverse engineering tools, and are

400: easily processed by scripts in common scripting languages.

401:

402: In an RSF file, a tuple of an $n$-ary relation is represented as a line of the

403: form

404: \begin{verbatim}

405: RelationName element1 element2 ... elementn

406: \end{verbatim}

407: The elements may be enclosed by double quotes. Because white space serves as

408: delimiter of the elements, elements that contain white space {\em must} be

409: enclosed by double quotes. A relation is represented as a sequence of such

410: lines.  The order of the lines is arbitrary.  An RSF file may contain several

411: relations.

412:

413: As an example, the relation {\tt ParentOf} from the previous subsection can be

414: represented in RSF format as follows:

415: \begin{verbatim}

416: ParentOf John Alice

417: ParentOf John Joe

418: ParentOf Mary Alice

419: ParentOf Mary Joe

420: ParentOf Joe  Jane

421: \end{verbatim}

422:

423: {\bf Input.} RML has no input statements.  When CrocoPat is started, it first

424: reads input relations in RSF from the standard input before it parses and

425: executes the RML program.  RSF reading can be skipped with the {\tt -e} command

426: line option. If the input relations are available as files, they can be feeded

427: into CrocoPat's standard input using the shell operator {\tt <}, as the

428: following examples shows for the file {\tt ParentOf.rsf}:

429: \begin{verbatim}

430: crocopat Prog.rml < ParentOf.rsf

431: \end{verbatim}

432: The end of the input data is recognized either from the end of file character

433: or from a line that starts with the dot ({\tt .}) character.  The latter is

434: sometimes useful if RSF input is feeded interactively.

435:

436: If the above RSF data is used as input, then at the start of the program the

437: binary relation variable {\tt ParentOf} contains the five pairs, and the

438: universe contains the five string literals {\tt Alice}, {\tt Jane}, {\tt Joe},

439: {\tt John}, and {\tt Mary} (and additionally all string literals that appear on

440: the left hand side of assignments in the program.)

441:

442: {\bf Output.} The {\tt PRINT} statement outputs relations in RSF format to the

443: standard output. For example, running the program

444: \begin{verbatim}

445: ParentOf("Joe",x) := FALSE(x);

446: ParentOf(x,"Joe") := FALSE(x);

447: PRINT ParentOf(x,y);

448: \end{verbatim}

449: with the above input data prints to the standard output

450: \begin{verbatim}

451: John Alice

452: Mary Alice

453: \end{verbatim} The statement

454: \begin{verbatim}

455: PRINT ["ParentOf"] ParentOf(x,y);

456: \end{verbatim}

457: writes the string {\tt ParentOf} before each tuple, and thus outputs

458: \begin{verbatim}

459: ParentOf John Alice

460: ParentOf Mary Alice

461: \end{verbatim}

462: The output can also be appended to a file {\tt ParentOf2.rsf} (which is created

463: if it does not exist) with

464: \begin{verbatim}

465: PRINT ["ParentOf"] ParentOf(x,y) TO "ParentOf2.rsf";

466: \end{verbatim}

467: or to stderr with

468: \begin{verbatim}

469: PRINT ["ParentOf"] ParentOf(x,y) TO STDERR;

470: \end{verbatim}

471: \pagebreak

472:

473: {\bf Command Line Arguments.} It is sometimes convenient to specify the names

474: of output files at the command line and not in the RML program.  If there is

475: only one output file, the standard output can be simply redirected to a file

476: using the shell operator {\tt >}:

477: \begin{verbatim}

478: crocopat Prog.rml < ParentOf.rsf > ParentOf2.rsf

479: \end{verbatim}

480:

481: An alternative solution (which also works with more than one file) is to pass

482: command line arguments to the program.  Command line arguments can be accessed

483: in RML as {\tt \$1}, {\tt \$2}, etc.  For example, when the program

484: \begin{verbatim}

485: ChildOf(x,y) := ParentOf(y,x);

486: PRINT ["Child"] ChildOf(x,$1) TO $1 + ".rsf";

487: PRINT ["Child"] ChildOf(x,$2) TO $2 + ".rsf";

488: \end{verbatim}

489: is executed with

490: \begin{verbatim}

491: crocopat IO.rml Joe Mary < ParentOf.rsf

492: \end{verbatim}

493: then the first {\tt PRINT} statement writes to the file {\tt Joe.rsf}, and

494: the second {\tt PRINT} statement writes to {\tt Mary.rsf}.

495:

496: Command line arguments are not restricted to specifying file names, but can be

497: used like string literals.  However, in contrast to string literals, command

498: line arguments are never added to the universe, and thus cannot be used on the

499: left hand side of relational assignments.

500:

501:

502: \subsection{Control Structures} \label{s:control}

503:

504: This subsection introduces the control structures of RML, using algorithms for

505: computing the transitive closure of a binary relation~{\tt R} as examples.

506:

507: {\bfseries {\ttfamily WHILE} Statement.} As a first algorithm, the relation

508: {\tt R} is composed with itself until the fixed point is reached.

509: %Intro_TC-Exp.rml

510: \begin{verbatim}

511: Result(x,y) := R(x,y);

512: PrevResult(x,y) := FALSE(x,y);

513: WHILE (PrevResult(x,y) != Result(x,y)) {

514:     PrevResult(x,y) := Result(x,y);

515:     Result(x,z) := Result(x,z) | EX(y, Result(x,y) & Result(y,z));

516: }

517: \end{verbatim}

518: The program illustrates the use of the {\tt WHILE} loop, which has the usual

519: meaning: The body of the loop is executed repeatedly as long as the condition

520: after {\tt WHILE} evaluates to~{\tt TRUE()}.

521:

522: {\bfseries {\ttfamily FOR} Statement.} The second program computes the

523: transitive closure of the relation~{\tt R} using the Warshall algorithm.  This

524: algorithm successively adds arcs. In the first iteration, an arc $(u,v)$ is

525: added if the input graph contains the arcs $(u,$~node$_0)$ and $($node$_0, v)$.

526: In the second iteration, an arc $(u, v)$ is added if the graph that results

527: from the first iteration contains the arcs $(u,$~node$_1)$ and $($node$_1, v)$.

528: And so on, for all nodes of the graph (in arbitrary order.)

529: %Intro_TC-Warshall.rml

530: \begin{verbatim}

531: Result(x,y) := R(x,y);

532: Node(x) := Result(x,_) & Result(_,x);

533: FOR node IN Node(x) {

534:   Result(x,y) := Result(x,y) | (Result(x,node) & Result(node,y));

535: }

536: \end{verbatim}

537: The program illustrates the use of the {\tt FOR} loop.  The relation after {\tt

538: IN} must be a unary relation.  The iterator after {\tt FOR} is a string

539: variable and takes as values the elements of the unary relation in

540: lexicographical order. Thus, the number of iterations equals the number of

541: elements of the unary relation.

542:

543: For the implementation of the transitive closure operator of RML, we

544: experimented with several algorithms.  An interesting observation in these

545: experiments was that the empirical complexity of some algorithms for practical

546: graphs deviated strongly from their theoretical worst case complexity, thus

547: some algorithms with a relatively bad worst-case complexity were very

548: competitive in practice.  In our experiments, the first of the above algorithms

549: was very fast, thus we made it available as operator {\tt TCFAST}.  The

550: implementation of the {\tt TC} operator of RML is a variant of the Warshall

551: algorithm.  It is somewhat slower than {\tt TCFAST} (typically about 20 percent

552: in our experiments), but often needs much less memory because it uses no

553: ternary relations.

554:

555: {\bfseries {\ttfamily IF} Statement.} The following example program determines

556: if the input graph {\tt R} is acyclic, by checking if its transitive closure

557: contains loops (i.e. arcs from a node to itself):

558: %Intro_Cyclic.rml

559: \begin{verbatim}

560: SelfArcs(x,y) := TC(R(x,y)) & (x = y);

561: IF (SelfArcs(_,_)) {

562:     PRINT "R is not acyclic", ENDL;

563: } ELSE {

564:     PRINT "R is acyclic", ENDL;

565: }

566: \end{verbatim}

567:

568:

569:

570: \subsection{Relations of Higher Arity}\label{s:pattern}

571:

572: In this subsection, relations of arity greater than two are used for finding

573: potential design patterns and design problems in structural models of

574: object-oriented programs.  The examples are taken from~\cite{BeyerEtAl:2003}.

575:

576: The models of object-oriented programs contain the call, containment, and

577: inheritance relationships between classes. Here containment means that a class

578: has an attribute whose type is another class. The direction of inheritance

579: relationships is from the subclass to the superclass. As an example, the source

580: code

581: \begin{verbatim}

582: class ContainedClass {}

583: class SuperClass {}

584: class SubClass extends SuperClass {

585:     ContainedClass c;

586: }

587: \end{verbatim}

588: corresponds to the following RSF file:

589: \begin{verbatim}

590: Inherit   SubClass  SuperClass

591: Contain   SubClass  ContainedClass

592: \end{verbatim}

593:

594: \begin{figure}[ht]

595:    \centering \includegraphics{compositemg}

596:    {\caption{Composite design pattern

597:    \label{fig:composite}}}

598: \end{figure}

599:

600: {\bf Composite Design Pattern.} Figure~\ref{fig:composite} shows the class

601: diagram of the Composite design pattern~\cite{GammaEtAl:1995}. To identify

602: possible instances of this pattern, we compute all triples of a Component

603: class, a Composite class, and a Leaf class, such that (1) Composite and Leaf

604: are subclasses of Component, (2) Composite contains an instance of Component,

605: and (3) Leaf does not contain an instance of Component. The translation of

606: these conditions to an RML statement is straightforward:

607: \begin{verbatim}

608: CompPat(component, composite, leaf) :=   Inherit(composite, component)

609:                                        & Contain(composite, component)

610:                                        & Inherit(leaf, component)

611:                                        & !Contain(leaf, component);

612: \end{verbatim}

613:

614: {\bf Degenerate Inheritance.} When a class {\tt C} inherits from another class

615: {\tt A} directly and indirectly via a class {\tt B}, the direct inheritance is

616: probably redundant or even misleading.  The following statement detects such

617: patterns:

618: \begin{verbatim}

619: DegInh(a,b,c) :=   Inherit(c,b)

620:                  & Inherit(c,a)

621:                  & TC(Inherit(b,a));

622: \end{verbatim}

623:

624: {\bf Cycles.} To understand an undocumented class, one has to understand all

625: classes it uses. If one of the (directly or indirectly) used classes is the

626: class itself, understanding this class is difficult.  All classes that

627: participate in cycles can be found using the transitive closure operator, as

628: shown in Subsection~\ref{s:control}.  However, in many large software systems

629: hundreds of classes participate in cycles, and it is tedious for a human

630: analyst to find the actual cycles in the list of these classes. In our

631: experience, it is often more useful to detect cycles in the order of ascending

632: length.  As a part of such a program, the following statements detects all

633: cycles of length~$3$.

634: \begin{verbatim}

635: Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);

636: Cycle3(x,y,z) := Use(x,y) & Use(y,z) & Use(z,x);

637: Cycle3(x,y,z) := Cycle3(x,y,z) & (x <= y) & (x <= z);

638: \end{verbatim}

639: To see the purpose of the third statement, consider three classes {\tt A}, {\tt

640: B}, and {\tt C} that form a cycle.  After the second statement, the relation

641: variable {\tt Cycle3} contains three representatives of this cycle: $(${\tt

642: A},~{\tt B},~{\tt C}$)$, $(${\tt B},~{\tt C},~{\tt A}$)$ and $(${\tt C},~{\tt

643: A},~{\tt B}$)$. The third statement removes two of these representatives from

644: {\tt Cycle3}, and keeps only the tuple with the lexicographically smallest

645: class at the first position, namely $(${\tt A},~{\tt B},~{\tt C}$)$.

646:

647:

648:

649: \subsection{Numerical Expressions} \label{s:numerical}

650:

651: In this subsection, a software metric is calculated as example for the use of

652: numerical expressions in RML programs.  Therefore, we extend the structural

653: model of object-oriented programs introduced in the previous subsection with a

654: binary relation {\tt PackageOf}.  This relation assigns to each package the

655: classes that it contains. (Packages are high-level entities in object-oriented

656: software systems that can be considered as sets of classes.)

657:

658: Robert Martin's metric for the instability of a package is defined as $ce /

659: (ca+ce)$, where $ca$ is the number of classes outside the package that use

660: classes inside the package, and $ce$ is the number of classes inside the

661: package that use classes outside the package~\cite{Martin:1997}.

662: \begin{verbatim}

663: Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);

664: Package(x) := PackageOf(x,_);

665: FOR p IN Package(x) {

666:     CaClass(x) := !PackageOf(p,x) & EX(y, Use(x,y) & PackageOf(p,y));

667:     ca := #(CaClass(x));

668:     CeClass(x) := PackageOf(p,x) & EX(y, Use(x,y) & !PackageOf(p,y));

669:     ce := #(CeClass(x));

670:     IF (ca + ce > 0) {

671:         PRINT p, " ", ce / (ca+ce), ENDL;

672:     }

673: }

674: \end{verbatim}

675:

676:

677:

678: \section{Advanced Programming Techniques} \label{s:advanced}

679:

680: This section describes advanced programming techniques, in particular for

681: improving efficiency and circumventing language limitations.  The first

682: subsection explains how to control the memory usage of CrocoPat.  The second

683: and third subsection describe how relational expressions are evaluated in

684: CrocoPat, and how to assess and improve the efficiency of their evaluation. The

685: fourth subsection explains why the universe is immutable during the execution

686: of an RML program and how to work around this limitation.

687:

688: \subsection{Controlling the Memory Usage} \label{ss:memory}

689:

690: CrocoPat represents relations using the data structure binary decision diagram

691: (BDD,~\cite{Bryant:1986}). When CrocoPat is started, it reserves a fixed amount

692: of memory for BDDs, which is not changed during the execution of the RML

693: program. If the available memory is insufficient, CrocoPat exits with the error

694: message

695: \begin{verbatim}

696: Error: BDD package out of memory.

697: \end{verbatim}

698:

699: The BDD memory can be controlled with the command line option~{\tt -m},

700: followed by an integer number giving the amount of memory in MByte.  The

701: default value is~50.  The actual amount of memory reserved for BDDs is not

702: infinitely variable, so the specified value is only a rough upper bound of the

703: amount of memory used.

704:

705: It can also be beneficial to reserve less memory, because the time used for

706: allocating memory increases with the amount of memory.  When the manipulated

707: relations are small or the algorithms are computationally inexpensive, memory

708: allocation can dominate the overall runtime.

709:

710: \subsection{Speeding up the Evaluation of Relational Expressions}

711:

712: This subsection explains how CrocoPat evaluates relational expressions.  Based

713: on this information, hints for performance improvement are given. Understanding

714: the subsection requires basic knowledge about BDDs and the impacts of the

715: variable order on the size of BDDs. An introduction to BDDs is beyond the scope

716: of this manual, we refer the reader to~\cite{Bryant:1992}.

717:

718: The attributes in an RML program are called {\em user attributes} in the

719: following.  For example, the expression {\tt R(x,y)} contains the user

720: attributes {\tt x} and~{\tt y}.  For the internal representation of relations,

721: CrocoPat uses a sequence of {\em internal attributes}, which are distinct from

722: the user attributes.  We call these internal attributes {\tt i1}, {\tt i2},

723: {\tt i3}, ... For example, the binary relation {\tt R} is internally

724: represented as a set of assignments to the internal attributes {\tt i1}

725: and~{\tt i2}.

726:

727: When the expression {\tt R(x,y)} is evaluated, the internal attributes {\tt i1}

728: and {\tt i2} are renamed to the user attributes {\tt x} and {\tt y}. Therefore

729: all BDD nodes of the representation of {\tt R} have to be traversed. Thus, the

730: time for evaluating the expression {\tt R(x,y)} is at least linear in the

731: number of BDD nodes of~{\tt R}'s representation.

732:

733: The order of the internal attributes in the BDD is always {\tt i1}, {\tt i2},

734: ... The order of the user attributes in the BDD may be different in the

735: evaluation of each statement, because the scope of user attributes is restricted

736: to one statement. The order of the user attributes in the BDD in the evaluation

737: of a statement is the order in which CrocoPat encounters the user attributes in

738: the execution of the statement.  In the example statement

739: \begin{verbatim}

740: R(x,z) := EX(y, R(x,y) & R(y,z));

741: \end{verbatim}

742: CrocoPat evaluates first {\tt R(x,y)}, then {\tt R(y,z)}, then the conjunction,

743: then the existential quantification, and finally the assignment.  Therefore,

744: the order of the user attributes in the BDD is {\tt x}, {\tt y}, {\tt z}.

745:

746: {\bf Avoid renaming large relations.} The time for the evaluation of the

747: expression {\tt R(x,y)} is at least linear in the number of BDD nodes in the

748: representation of~{\tt R}, because all BDD nodes have to be renamed from

749: internal attributes to user attributes. Usually this effort for renaming does

750: not dominate the overall runtime, but in the following we give an example where

751: it does. \pagebreak

752:

753: Let {\tt R(x,y)} be a directed graph with $n$~nodes. Let the BDD representation

754: of {\tt R} have $\Theta(n^2)$ BDD nodes (which is the worst case). The

755: assignment

756: \begin{verbatim}

757: Outneighbor(y) := EX(x, R(x,y) & x="node1");

758: \end{verbatim}

759: assigns the outneighbors of the graph node {\tt node1} to the set {\tt

760: Outneighbor}.  The evaluation of {\tt R(x,y)} costs $\Theta(n^2)$ time in this

761: example, because of the renaming of all nodes.  The ``real computation'',

762: namely the conjunction and the existential quantification, can be done in

763: $O(\log n)$ time.  So the renaming dominates the overall time.

764:

765: The equivalent statement

766: \begin{verbatim}

767: Outneighbor(y) := R("node1",y);

768: \end{verbatim}

769: is executed in only $O(n)$ time, because the set {\tt R("node1",y)} has $O(n)$

770: elements, and its BDD representation has $O(n)$ nodes.

771:

772: {\bf Avoid swapping attributes.} Renaming the nodes of a BDD costs at least

773: linear time, but can be much more expensive when attributes have to be swapped.

774: In the statement

775: \begin{verbatim}

776: S(x,y,z) := R(y,z) & R(x,y);

777: \end{verbatim}

778: the BDD attribute order on the right hand side of the assignment is {\tt y},

779: {\tt z}, {\tt x}, while the BDD attribute order on the left hand side is {\tt

780: x}, {\tt y}, {\tt z}. Because the two orders are different, attributes have to

781: be swapped to execute the assignment. This can be easily avoided by using the

782: equivalent statement

783: \begin{verbatim}

784: S(x,y,z) := R(x,y) & R(y,z);

785: \end{verbatim}

786:

787: Of course, swapping attributes can not always be avoided.  However, developers

788: of RML programs should know that swapping attributes can be expensive, and

789: should minimize it when performance is critical.

790:

791: {\bf Ensure good attribute orders.} A detailed discussion of BDD attribute

792: orders is beyond the scope of this manual (see

793: e.g.~\cite[Section~1.3]{Bryant:1992} for details), but the basic rule is that

794: related attributes should be grouped together.  In the two assignment

795: statements

796: \begin{verbatim}

797: S1(v,w,x,y) := R(v,w) & R(x,y);

798: S2(v,x,w,y) := R(v,w) & R(x,y);

799: \end{verbatim}

800: the attributes {\tt v} and {\tt w} are related, and the attributes {\tt x} and

801: {\tt y} are related, while {\tt v} and {\tt w} are unrelated to {\tt x}

802: and~{\tt y}.  In {\tt S1}, related attributes are grouped together, but not

803: in~{\tt S2}.  For many relations~{\tt R}, the BDD representation of~{\tt S1}

804: will be drastically smaller than the BDD representation of~{\tt S2}.

805:

806: {\bf Profile.} Information about the number of BDD nodes and the BDD attribute

807: order of an expression can be printed with {\tt PRINT RELINFO}. For example,

808: \begin{verbatim}

809: PRINT RELINFO(R(y,z) & R(x,y));

810: \end{verbatim}

811: may output

812: \begin{verbatim}

813: Number of tuples in the relation: 461705

814: Number of values (universe): 6218

815: Number of BDD nodes: 246986

816: Percentage of free nodes in BDD package: 1614430 / 1966102 = 82 %

817: Attribute order: y z x

818: \end{verbatim}

819: The first line gives the cardinality of the relation, the second line the

820: cardinality of the universe, the third line the size of the BDD that represents

821: the result of the expression, and the fifth line the attribute order in this

822: BDD.

823:

824: \subsection{Estimating the Evaluation Time of Relational Expressions} \label{ss:estimation}

825:

826: Knowledge of the computational complexity of RML's operators is useful to

827: optimize the performance of RML programs. This subsection gives theoretical

828: complexity results, but also discusses the limits of their practical

829: application.

830:

831: Table~\ref{t:complexity} shows the asymptotic worst case time complexity for

832: the evaluation of RML's relational operators.  The times do not include the

833: renaming of internal attributes discussed in the previous subsection, and

834: the evaluation of subexpressions.  It is assumed that the caches of the BDD

835: package are sufficiently large. This assumption is closely approximated in

836: practice when the manipulated BDDs only occupy a small fraction of the

837: available nodes in the BDD package. Otherwise, performance may be improved by

838: increasing the BDD memory (see Subsection~\ref{ss:memory}).

839:

840: When the operands of an expression are relations, the computation time is given

841: as function of the sizes of their BDD representation. (The only exception are

842: the transitive closure operators, where a function of the size of the universe

843: gives a more useful bound.) This raises the problem of how to estimate these

844: BDD sizes. Many practical relations have regularities that enable an (often

845: dramatically) compressed BDD representation, but the analytical derivation of

846: the typical compression rate for relations from a particular application domain

847: is generally difficult.  Our advice is to choose some representative examples

848: and measure the BDD sizes with the {\tt PRINT RELINFO} statement.

849:

850: It is important to note that Table~\ref{t:complexity} gives {\em worst-case}

851: computation times.  In many cases, the typical practical performance is much

852: better than the worst case.  For example, the relational comparison operators

853: ({\tt <=}, {\tt <}, {\tt >=}, {\tt >}) and the binary logic operators ({\tt

854: \&}, {\tt |}, {\tt ->}, {\tt <->}) are very common in RML programs. Their

855: worst-case complexity is the product of the sizes of their operand BDDs, which

856: is alarmingly high. However, in practice the performance is often much closer

857: to the sum of the operand BDD sizes. Similarly, the quantification operators

858: are often efficient despite their prohibitive worst case runtime (which is

859: difficult to derive because quantification is implemented as a series of

860: several bit-level operations).

861:

862: Another practically important example for the gap between average-case and

863: worst-case runtime are the transitive closure operators.  The worst case

864: complexity of their BDD-based implementation is the same as for implementations

865: with conventional data structures. However, the BDD-based implementations are

866: much more efficient for many practical graphs~\cite{BeyerEtAl:2003}.  Even in

867: the comparison of different BDD-based implementations, a better worst-case

868: complexity does not imply a better performance in practice.  We conclude from

869: our experience that  knowledge of the theoretical complexity complements but

870: cannot replace experimentation in the development of highly optimized RML

871: programs.

872:

873: \begin{table}

874: \vspace{-1ex}\centering

875:

876: \normalsize

877: \begin{tabular}{l@{~~~~}l}

878: {\tt ! re}

879:     & $O(${\em bddsize}$(${\tt re}$))$ \\

880: {\tt re1 \& re2}, {\tt ~re1 | re2}

881:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\

882: {\tt re1 -> re2}, {\tt ~re1 <-> re2}

883:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\

884: {\tt EX(x, re)}, {\tt ~FA(x, re)}

885:     & $\ge O(${\em bddsize}$(${\tt re}$)^2)$ \\

886: {\tt TC(re)} & $O(n^3)$ \\

887: {\tt TCFAST(re)} & $O(n^3 \log n)$ \\

888: {\tt FALSE(x1, x2, ...)} & $O(1)$ \\

889: {\tt TRUE(x1, x2, ...)} & $O(1)$ \\

890: {\tt @s(x)} & $O(n \log n)$ \\

891: {\tt \symbol{126}(x1, x2)}

892:     & $O(n \log n)$ ({\tt \symbol{126}} can be

893:       {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >})\\

894: {\tt \symbol{126}(ne1, ne2)}

895:     & $O(1)$ ({\tt \symbol{126}} can be

896:       {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >})\\

897: {\tt re1 = re2}, {\tt ~re1 != re2} & $O(1)$ \\

898: {\tt re1 < re2}, {\tt ~re1 <= re2}

899:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\

900: {\tt re1 > re2}, {\tt ~re1 >= re2}

901:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\[2ex]

902: \end{tabular}

903:

904: \caption{Worst case time complexity of the evaluation of relational

905: expressions. {\tt re}, {\tt re1}, {\tt re2} are relational expressions, {\tt

906: x}, {\tt x1}, {\tt x2} are attributes, and {\tt ne1}, {\tt ne2} are numerical

907: expressions. {\em bddsize}$(${\tt re}$)$ is the number of BDD nodes of the

908: result of the expression {\tt re}, and $n$~is the cardinality of the universe.}

909: \label{t:complexity}

910: \end{table}

911:

912:

913:

914:

915: \subsection{Extending the Universe} \label{s:universe}

916:

917: The set of all strings that may be tuple elements of relations in an RML

918: program is called the {\em universe}.  The universe contains all tuple elements

919: of the input relations (from the input RSF data), and all string literals that

920: appear on the left hand side of assignments in the RML program.  The universe

921: is immutable in the sense that it can be determined before the interpretation

922: of the RML program starts, and is not changed during the interpretation of the

923: RML program.

924:

925: Sometimes the immutability of the universe is inconvenient for the developer of

926: RML programs. Consider, for example, a program that takes as input the nodes

927: and arcs of a graph, and computes the binary relation {\tt OutneighborCnt}

928: which contains for each node the number of outneighbors:

929: \begin{verbatim}

930: FOR n IN Node(x) {

931:     OutneighborCnt(n, #(Arc(n,x))) := TRUE();

932: }

933: \end{verbatim}

934: This is not a syntactically correct RML program, because {\tt \#(Arc(n,x))} is

935: not a string, but a number. However, RML has an operator {\tt STRING} that

936: converts a number into a string.  But

937: \begin{verbatim}

938: OutneighborCnt(n, STRING( #(Arc(n,x)) )) := TRUE();

939: \end{verbatim}

940: is still not syntactically correct, because such a conversion is not allowed at

941: the left hand side of assignment statements. The reason is that the string that

942: results from such a conversion is generally not known before the execution of

943: the RML program, can therefore not be added to the universe before the

944: execution, and is thus not allowed as tuple element of a relation.

945:

946: The immutability of the universe during the execution of an RML program is

947: necessary because constant relations like {\tt TRUE(x)} (the universe) and {\tt

948: =(x,y)} (string equality for all strings in the universe) are only defined for

949: a given universe.  Also, the complement of a relation depends on the universe:

950: The complement of a set contains all strings of the universe that are not in

951: the given set, and thus clearly changes when the universe changes.

952:

953: However, there is a way to work around this limitation: Writing an RSF file,

954: and restarting CrocoPat with this RSF file as input, which adds all tuple

955: elements in the RSF file to the universe.  For example, the above incorrect

956: program can be replaced by the following correct program:

957: \begin{verbatim}

958: FOR n IN Node(x) {

959:     PRINT "OutneighborCnt ", n, " ", #(Arc(n,x)) TO "OutneighborCnt.rsf";

960: }

961: \end{verbatim}

962: When CrocoPat is restarted with the resulting RSF file {\tt OutneighborCnt.rsf}

963: as input, the binary relation {\tt OutneighborCnt} is available for further

964: processing.

965:

966:

967: \section{CrocoPat Reference} \label{s:pat}

968:

969: CrocoPat is executed with

970: \begin{verbatim}

971: crocopat [OPTION]... FILE [ARGUMENT]...

972: \end{verbatim}

973: It first reads relations in RSF (see Section~\ref{s:rsf}) from stdin (unless

974: the option {\tt -e} is given) and then executes the RML program {\tt FILE} (see

975: Section~\ref{s:lan}).  The {\tt ARGUMENT}s are passed

976: to the RML program.  The {\tt OPTION}s are \\[1ex]

977: \begin{tabular}{l@{\hspace{3mm}}l}

978: {\tt -e}        &  Do not read RSF data from stdin. \\

979: {\tt -m NUMBER} &  Approximate memory for BDD package in MB. The default is~50. See Subsection~\ref{ss:memory}.\\

980: {\tt -q}        &  Suppress warnings. \\

981: {\tt -h}        &  Display help message and exit. \\

982: {\tt -v}        &  Print version information and exit. \\

983: \end{tabular}\\[1ex]

984: \pagebreak

985:

986: The output of the RML program can be written to files, stdout, or stderr, as

987: specified in the RML program.  Error messages and warnings of CrocoPat are always written

988: to stderr.

989:

990: The exit status of CrocoPat is~1 if it terminates abnormally and~0 otherwise.

991: CrocoPat always outputs an error message to stderr before it terminates with exit

992: status~1.

993:

994:

995: \section{RSF Reference} \label{s:rsf}

996:

997: Rigi Standard Format (RSF) is CrocoPat's input and output format for relations.

998: It is an extension of the format for binary relations defined

999: in~\cite[Section~4.7.1]{Wong:1998}.  For examples of its use see

1000: Subsection~\ref{s:output}.

1001:

1002: An RSF stream is a sequence of lines.  The order of the lines is arbitrary. The

1003: repeated occurrence of a line is permissable and has the same meaning as a

1004: single occurrence. The end of an RSF stream is indicated by the end of the file

1005: or by a line that starts with a dot ({\tt .}). Lines starting with a sharp

1006: ({\tt \#}) are comment lines.

1007:

1008: All lines that do not start with a dot or a sharp specify a tuple in a

1009: relation. They consist of the name of the relation followed by a sequence of

1010: (an arbitrary number of) tuple elements, separated by at least one whitespace

1011: character (i.e., space or horizontal tab).

1012:

1013: Relation names must be RML identifiers (see Subsection~\ref{ss:lexical}). Tuple

1014: elements are sequences of arbitrary characters except line breaks and

1015: whitespace characters.  A tuple element may be optionally enclosed by double

1016: quotes ({\tt "}), in which case it may also contain whitespace characters.

1017: Tuple elements that are enclosed by double quotes in the RSF input of an RML

1018: program are also enclosed by double quotes in its output.

1019:

1020:

1021:

1022: \section{RML Reference} \label{s:lan}

1023:

1024: Relation Manipulation Language (RML) is CrocoPat's programming language for

1025: manipulating relations. This section defines the lexical structure, the syntax,

1026: and the semantics of RML. Nonterminals are printed in italics and terminals in

1027: typewriter.

1028:

1029: \subsection{Lexical Structure} \label{ss:lexical}

1030:

1031: Identifiers are sequences of Latin letters ({\tt a-zA-z}), digits ({\tt 0-9})

1032: and underscores ({\tt \_}), the first of which must be a letter or underscore.

1033: RML has four types of identifiers: attributes ({\em attribute}), relational

1034: variables ({\em rel\_var}), string variables ({\em str\_var}), and numerical

1035: variables ({\em num\_var}).  Every identifier of an RML program belongs to

1036: exactly one of these types.  The type is determined at the first occurrence of

1037: the identifier in the input RSF file (only possible for relational variables)

1038: or in the RML program. Explicit declaration of identifiers is not necessary.

1039:

1040: The following strings are reserved as keywords and therefore cannot be

1041: used as identifiers:

1042: \begin{verbatim}

1043: AVG DIV ELSE ENDL EX EXEC EXIT FA FOR IF IN MAX MIN MOD NUMBER PRINT RELINFO

1044: STDERR STRING SUM TC TCFAST TO WHILE

1045: \end{verbatim}

1046:

1047: RML has two types of literals: string literals ({\em str\_literal}) and

1048: numerical literals ({\em num\_literal}).  String literals are delimited by

1049: double quotes ({\tt "}) and can contain arbitrary characters except double

1050: quotes.  A numerical literal consists of an integer part, a fractional part

1051: indicated by a decimal point ({\tt .}), and an exponent indicated by the letter

1052: {\tt e} or {\tt E} followed by an optionally signed integer.  All three parts

1053: are optional, but at least one digit in the integer part or the fractional part

1054: is required. Examples of numerical literals are {\tt 1}, {\tt .2}, {\tt 3.},

1055: {\tt 4.5}, and {\tt 6e-7}.

1056:

1057: There are two kinds of comments: Text starting with {\tt /*} and ending with

1058: {\tt */}, and text from {\tt //} to the end of the line. \pagebreak

1059:

1060: \subsection{Syntax and Informal Semantics}\label{s:syntax}

1061:

1062: \footnotetext[1]{{\em stmt} ... denotes a sequence of one or more {\em stmt}s.}

1063: \footnotetext[2]{Context conditions are marked with~$\to$.}

1064: \footnotetext[3]{{\tt \symbol{126}} can be

1065:        {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >}.}

1066:

1067: \begin{tabbing}

1068: \hspace*{52mm}\=\hspace{5mm}\= \kill

1069: \noindent{\bfseries \em program} ::=

1070:     \>\> {\bf RML program.}\\

1071: {\em stmt} ...\footnotemark[1]

1072:     \>\> Executes the {\em stmt}s in the given order.\\[1.5ex]

1073:

1074: \noindent{\bfseries \em stmt} ::=

1075:     \>\> {\bf Statement.}\\

1076: {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt )} {\tt := }{\em rel\_expr}{\tt ;}

1077:     \>\> Assigns the result of {\em rel\_expr} to {\em rel\_var}.\\

1078:     \>$\to$\footnotemark[2]\> The {\em term}s must be {\em attribute}s or {\em str\_literal}s. \\

1079:     \>$\to$\> The set of {\em attribute}s among the {\em term}s on the left hand side \\

1080:     \>\> must equal the set of free attributes in {\em rel\_expr}. \\

1081: {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt );}

1082:     \>\> Shortcut for {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt )} {\tt := TRUE(}{\em term}{\tt ,}...{\tt )}. \\

1083: {\em str\_var}{\tt ~:= }{\em str\_expr}{\tt ;}

1084:     \>\> Assigns the result of {\em str\_expr} to {\em str\_var}.\\

1085: {\em num\_var}{\tt ~:= }{\em num\_expr}{\tt ;}

1086:     \>\> Assigns the result of {\em num\_expr} to {\em num\_var}.\\

1087: {\tt IF}~{\em rel\_expr}~{\tt \{}{\em stmt} ...{\tt\}}~{\tt ELSE}~{\tt \{}{\em stmt} ...{\tt\}}

1088:     \>\> Executes the {\em stmt}s before {\tt ELSE} if the result of {\em rel\_expr} is {\tt TRUE()}, \\

1089: {\tt IF}~{\em rel\_expr}~{\tt \{}{\em stmt} ...{\tt\}}

1090:     \>\> and the {\em stmt}s after {\tt ELSE} (if present) otherwise.\\

1091:     \>$\to$\> {\em rel\_expr} must not have free attributes.\\

1092: {\tt WHILE }{\em rel\_expr}{\tt~\{}{\em stmt} ...{\tt\}}

1093:     \>\> Exec. the {\em stmt}s repeatedly as long as {\em rel\_expr} evaluates to {\tt TRUE()}. \\

1094:     \>$\to$\> {\em rel\_expr} must not have free attributes.\\

1095: {\tt FOR~}{\em str\_var}{\tt~IN~}{\em rel\_expr}{\tt~\{}{\em stmt} ...{\tt\}}

1096:     \>\> Executes the {\em stmt}s once for each element in the result of {\em rel\_expr}. \\

1097:     \>$\to$\> {\em rel\_expr} must have exactly one free attribute.\\

1098: {\tt PRINT }{\em print\_expr}{\tt ,}...{\tt ;}

1099:     \>\> Writes the results of the {\em print\_expr}s to stdout. \\

1100: {\tt PRINT }{\em print\_expr}{\tt ,}...{\tt~TO STDERR;}

1101:     \>\> Writes the results of the {\em print\_expr}s to stderr. \\

1102: {\tt PRINT }{\em print\_expr}{\tt ,}...{\tt~TO }{\em str\_expr}{\tt ;}

1103:     \>\> Appends the results of the {\em print\_expr}s to the specified file.\\

1104: {\tt EXEC }{\em str\_expr}{\tt ;}

1105:     \>\> Executes the shell command given by {\em str\_expr}. \\

1106:     \>\> The exit status is available as numerical constant {\tt exitStatus}.\\

1107: {\tt EXIT }{\em num\_expr}{\tt ;}

1108:     \>\> Exits CrocoPat with the given exit status. \\

1109: {\tt \{ }{\em stmt} ...{\tt~\}}

1110:     \>\> Executes the {\em stmt}s in the given order. \\[1.5ex]

1111:

1112:

1113: \noindent{\bfseries \em rel\_expr} ::=

1114:     \>\> {\bf Relational Expression. The result is a relation.} \\*

1115: {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt)}

1116:     \>\> Atomic relational expression. \\*

1117: {\em term~~rel\_var~~term}

1118:     \>\> Same as {\em rel\_var}{\tt (}{\em term}{\tt ,} {\em term}{\tt)}. \\*

1119: {\tt ! }{\em rel\_expr}

1120:     \>\> Negation (not). \\*

1121: {\em rel\_expr}{\tt~\& }{\em rel\_expr}

1122:     \>\> Conjunction (and). \\*

1123: {\em rel\_expr}{\tt~| }{\em rel\_expr}

1124:     \>\> Disjunction (or). \\*

1125: {\em rel\_expr}{\tt~-> }{\em rel\_expr}

1126:     \>\> Implication (if). {\tt r1 -> r2 }is equivalent to{\tt~!(r1) | (r2)}.\\*

1127: {\em rel\_expr}{\tt~<-> }{\em rel\_expr}

1128:     \>\> Equivalence (if and only if). \\*

1129:     \>\> {\tt r1 <-> r2 }is equivalent to{\tt~(r1 -> r2) \& (r2 -> r1)}.\\*

1130: {\tt EX(}{\em attribute}{\tt ,}...{\tt , }{\em rel\_expr}{\tt )}

1131:     \>\> Existential quantification of the {\em attribute}s. \\*

1132: {\tt FA(}{\em attribute}{\tt ,}...{\tt , }{\em rel\_expr}{\tt )}

1133:     \>\> Universal quantification of the {\em attribute}s. \\*

1134: {\tt TC(}{\em rel\_expr}{\tt )}

1135:     \>\> Transitive closure. \\*

1136:     \>$\to$\> {\em rel\_expr} must have exactly two free attributes. \\*

1137: {\tt TCFAST(}{\em rel\_expr}{\tt )}

1138:     \>\> Same as {\tt TC}, but with an alternative algorithm (see Section~\ref{s:control}). \\*

1139: {\tt FALSE(}{\em term}{\tt ,}...{\tt )}

1140:     \>\> Empty relation. \\*

1141: {\tt TRUE(}{\em term}{\tt ,}...{\tt )}

1142:     \>\> Relation containing all tuples of strings in the universe. \\*

1143: {\tt @}{\em str\_expr}{\tt (}{\em term}{\tt )}

1144:     \>\> Strings in the universe that match the regular expression {\em str\_expr}. \\*

1145: {\tt \symbol{126}(}{\em term}{\tt , }{\em term}{\tt)}

1146:     \>\> Lexicographical order of all strings in the universe.\footnotemark[3] \\*

1147: {\em term}{\tt~\symbol{126} }{\em term}

1148:     \>\> Same as {\tt \symbol{126}(}{\em term}{\tt , }{\em term}{\tt)}.\\*

1149: {\tt \symbol{126}(}{\em num\_expr}{\tt , }{\em num\_expr}{\tt)}

1150:     \>\> Numerical comparison. The result is either {\tt TRUE()} or {\tt FALSE()}.\footnotemark[3] \\*

1151: {\em num\_expr}{\tt~\symbol{126} }{\em num\_expr}

1152:     \>\> Same as {\tt \symbol{126}(}{\em num\_expr}{\tt , }{\em num\_expr}{\tt)}.\\*

1153: {\em rel\_expr}{\tt~\symbol{126} }{\em rel\_expr}

1154:     \>\> Relational comparison. The result is either {\tt TRUE()} or {\tt FALSE()}.\footnotemark[3] \\*

1155: {\tt (}{\em rel\_expr}{\tt )}

1156: \end{tabbing}

1157:

1158: \begin{tabbing}

1159: \hspace*{45mm}\=\hspace{5mm}\= \kill

1160: \noindent{\bfseries \em term} ::=

1161:     \>\> {\bf Term.} \\

1162: {\em attribute}

1163:     \>\> Attribute. \\

1164: {\tt \_}

1165:     \>\> Anonymous attribute. E.g. {\tt R(\_)} is equivalent to {\tt EX(x, R(x))}.\\

1166: {\em str\_expr}

1167:     \>\> String expression. \\[2ex]

1168:

1169:

1170: \noindent{\bfseries \em str\_expr} ::=

1171:     \>\> {\bf String Expression. The result is a string.} \\

1172: {\em str\_literal}

1173:     \>\> String literal.\\

1174: {\em str\_var}

1175:     \>\> String variable.\\

1176: {\tt STRING(}{\em num\_expr}{\tt )}

1177:     \>\> Converts the result of {\em num\_expr} into a string. \\

1178: {\tt \$ }{\em num\_expr}

1179:     \>\> Command line argument. The first argument has the number~$1$. \\

1180:     \>\> The constant {\tt argCount} contains the number of arguments. \\

1181: {\em str\_expr}{\tt~+ }{\em str\_expr}

1182:     \>\> Concatenation. \\

1183: {\tt (}{\em str\_expr}{\tt )}\\[2ex]

1184:

1185:

1186: \noindent{\bfseries \em num\_expr} ::=

1187:     \>\> {\bf Numerical Expression. The result is a floating point number.} \\

1188: {\em num\_literal}

1189:     \>\> Numerical literal.\\

1190: {\em num\_var}

1191:     \>\> Numerical variable.\\

1192: {\tt NUMBER(}{\em str\_expr}{\tt )}

1193:     \>\> Converts the result of {\em str\_expr} into a number. Yields~$0.0$ if \\

1194:     \>\> the result of {\em str\_expr} is not the string representation of a number.\\

1195: {\tt \#(}{\em rel\_expr}{\tt )}

1196:     \>\> Cardinality (number of elements) of the result of {\em rel\_expr}. \\

1197: {\tt MIN(}{\em rel\_expr}{\tt )}, {\tt MAX(}{\em rel\_expr}{\tt )}, {\tt SUM(}{\em rel\_expr}{\tt )}, {\tt AVG(}{\em rel\_expr}{\tt )} \\

1198:     \>\> Minimum, maximum, sum, and arithmetic mean of {\tt NUMBER(s)} \\

1199:     \>\> over all strings {\tt s} in the result of {\em rel\_expr}. \\

1200:     \>$\to$\> {\em rel\_expr} must have one free attribute, its result must be non-empty. \\

1201: {\em num\_expr}{\tt~+ }{\em num\_expr}

1202:     \>\> Addition. \\

1203: {\em num\_expr}{\tt~- }{\em num\_expr}

1204:     \>\> Subtraction. \\

1205: {\em num\_expr}{\tt~* }{\em num\_expr}

1206:     \>\> Multiplication. \\

1207: {\em num\_expr}{\tt~/ }{\em num\_expr}

1208:     \>\> Real division. \\

1209: {\em num\_expr}{\tt~DIV }{\em num\_expr}

1210:     \>\> Integer division (truncating). \\

1211: {\em num\_expr}{\tt~MOD }{\em num\_expr}

1212:     \>\> Modulo. \\

1213: {\em num\_expr}{\tt~\symbol{94} }{\em num\_expr}

1214:     \>\> The first {\em num\_expr} raised to the power given by the second {\em num\_expr}. \\

1215: {\tt argCount}

1216:     \>\> Number of command line arguments. \\

1217: {\tt exitStatus}

1218:     \>\> Exit status of the last executed {\tt EXEC} statement. \\

1219: {\tt (}{\em num\_expr}{\tt )}\\[2ex]

1220:

1221:

1222: \noindent{\bfseries \em print\_expr} ::=

1223:     \>\> {\bf Print Expression.} \\

1224: {\em rel\_expr}

1225:     \>\> Prints the tuples in the result of {\em rel\_expr}, one tuple per line.\\

1226: {\tt [}{\em str\_expr}{\tt ] }{\em rel\_expr}

1227:     \>\> Prints the result of {\em str\_expr} before each tuple of {\em rel\_expr}. \\

1228: {\em str\_expr}

1229:     \>\> Prints the result of {\em str\_expr}.\\

1230: {\em num\_expr}

1231:     \>\> Prints the result of {\em num\_expr}.\\

1232: {\tt ENDL}

1233:     \>\> Prints a line break. \\

1234: {\tt RELINFO(}{\em rel\_expr}{\tt )}

1235:     \>\> Prints information about the BDD representation of {\em rel\_expr}.

1236: \end{tabbing}

1237:

1238:

1239: \noindent{\bf Levels of Precedence (from Low to High)}\\

1240: {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >} \\

1241: {\tt ->}, {\tt <->}\\

1242: {\tt |}\\

1243: {\tt \&}\\

1244: {\tt !}\\

1245: {\tt +}, {\tt -} (binary) \\

1246: {\tt *}, {\tt /}, {\tt DIV}, {\tt MOD} \\

1247: {\tt \symbol{94}}\\

1248: {\tt -} (unary)\\

1249: {\tt \$}

1250:

1251:

1252: \subsubsection{Free Attributes in Relational Expressions.}

1253: Several context conditions of RML refer to the free attributes in relational

1254: expressions.  The number of free attributes in a relational expression equals

1255: the arity of the resulting relation.  Informally, the set of free attributes of

1256: an expression is the set of its contained attributes that are not in the scope

1257: of a quantifier (i.e., {\tt EX} or {\tt FA}).  Exceptions are the numerical and

1258: relational comparison, which have Boolean results and therefore no free

1259: attributes. Formally, the function~$f$ that assigns to each relational

1260: expression the set of its free attributes is inductively defined as

1261: follows:\\[1ex]

1262: \begin{tabular}{l@{\hspace{2mm}}l@{\hspace{2mm}}l}

1263: $f($~{\em rel\_var}{\tt (}{\em term}$_1${\tt ,} {\em term}$_2${\tt ,}...{\tt)}$~)$

1264:     & $:=$ & $\{$ {\em term}$_i~|~${\em term}$_i$ is an attribute $\}$ \\

1265: $f($~{\tt !}{\em rel\_expr}~$)$

1266:     & $:=$ & $f($~{\em rel\_expr}~$)$\\

1267: $f($~{\em rel\_expr}$_1${\tt~\& }{\em rel\_expr}$_2$~$)$

1268:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\

1269: $f($~{\em rel\_expr}$_1${\tt~| }{\em rel\_expr}$_2$~$)$

1270:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\

1271: $f($~{\em rel\_expr}$_1${\tt~-> }{\em rel\_expr}$_2$~$)$

1272:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\

1273: $f($~{\em rel\_expr}$_1${\tt~<-> }{\em rel\_expr}$_2$~$)$

1274:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\

1275: $f($~{\tt EX(}{\em attribute}{\tt , }{\em rel\_expr}{\tt )}~$)$

1276:     & $:=$ & $f($ {\em rel\_expr} $) ~\setminus~ \{$ {\em attribute} $\}$\\

1277: $f($~{\tt FA(}{\em attribute}{\tt , }{\em rel\_expr}{\tt )}~$)$

1278:     & $:=$ & $f($ {\em rel\_expr} $) ~\setminus~ \{$ {\em attribute} $\}$\\

1279: $f($~{\tt TC(}{\em rel\_expr}{\tt )}~$)$

1280:     & $:=$ & $f($~{\em rel\_expr}~$)$\\

1281: $f($~{\tt TCFAST(}{\em rel\_expr}{\tt )}~$)$

1282:     & $:=$ & $f($~{\em rel\_expr}~$)$\\

1283: $f($~{\tt (}{\em num\_expr$_1$}{\tt~\symbol{126} }{\em num\_expr$_2$}{\tt )}~$)$

1284:     & $:=$ & $\emptyset$ \\

1285: $f($~{\tt (}{\em rel\_expr$_1$}{\tt~\symbol{126} }{\em rel\_expr$_2$}{\tt )}~$)$

1286:     & $:=$ & $\emptyset$ \\

1287: $f($~{\tt (}{\em rel\_expr}{\tt )} $)$

1288:     & $:=$ & $f($~{\em rel\_expr}~$)$

1289: \end{tabular} \\[1ex]

1290: As before, {\tt\symbol{126}} can be {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt

1291: >=}, or {\tt >}. The relational constants {\tt\symbol{126}}, {\tt FALSE},

1292: {\tt TRUE}, and {\tt @}{\em str\_expr} are equivalent to {\em rel\_var} with

1293: respect to the definition of free attributes.

1294:

1295:

1296: \subsubsection{Regular Expressions.} In the relational expression

1297: {\tt @}{\em str\_expr}{\tt (}{\em term}{\tt )}, the result of {\em str\_expr}

1298: can be any POSIX extended regular expression~\cite[Section~9.4]{IEEE:2001}.  A

1299: full description is beyond the scope, we only give a short overview.

1300:

1301: Most characters in a regular expression only match themselves.  The following

1302: special characters match themselves only when they are preceded by a

1303: backslash~({\tt \symbol{92}}), and otherwise have special meanings:\\[1ex]

1304: \begin{tabular}{ll}

1305: {\tt .}  & Matches any single character.\\

1306: $\mathtt{[~]}$ & Matches any single character contained within the brackets.\\

1307: $\mathtt{[}${\tt \symbol{94}}~$\mathtt{]}$ & Matches any single character not contained within the

1308: brackets.\\

1309: {\tt \symbol{94}} & Matches the start of the string.\\

1310: {\tt \$} & Matches the end of the string.\\

1311: {\tt \{$x$,$y$\}}~~ & Matches the last character (or regular expression enclosed by

1312: parentheses) \\

1313:                     & at least $x$ and at most $y$ times.\\

1314: {\tt +}  & Matches the last character (or regular expression enclosed by

1315: parentheses) one or more times.\\

1316: {\tt *}  & Matches the last character (or regular expression enclosed by

1317: parentheses) zero or more times.\\

1318: {\tt ?}  & Matches the last character (or regular expression enclosed by

1319: parentheses) zero or one times.\\

1320: {\tt |}  & Matches either the expression before or the expression after the

1321: operator.

1322: \end{tabular}\\[1ex]

1323: Regular expressions can be grouped by enclosing them with

1324: parentheses~({\tt(}...{\tt)}).

1325:

1326:

1327:

1328:

1329: \subsection{Formal Semantics}\label{s:semantics}

1330:

1331: This subsection formally defines the semantics of the part of RML that deals

1332: with relations, namely of relational expressions and the relational assignment

1333: statement.

1334:

1335: The {\em universe} $\cc$ is the finite set of all string literals that appear

1336: in the input RSF file, or on the left side of a relational assignment. The

1337: finite set of {\em attributes} of the RML program is denoted by $\cx$ ($\cc

1338: \cap \cx = \emptyset$). An {\em attribute assignment} is a total function $v:

1339: \cx \cup \cc \to \cc$ which maps each attribute to its value and (for

1340: notational convenience) each string literal to itself.  The set of all

1341: attribute assignments is denoted by $\mathit{Val(\cx)}$.

1342:

1343: The finite set of {\em relation variables} in the RML program is denoted by

1344: $\crel$.  A {\em relation assignment} is a total function $s: \crel \to

1345: \bigcup_{n \in \mathbb{N}} 2^{\prod_{k=1}^n \cc}$, which maps each relation

1346: variable to a relation of arbitrary arity. The set of all relation assignments

1347: is denoted by~$Rel(\crel)$.

1348:

1349: The semantics of relational expressions and statements are given

1350: by the following interpretation functions:

1351: \[

1352: \begin{array}{lcll}

1353: \bbbkl.\bbbkr_e &:& \mbox{\em rel\_expr}s \to

1354:                   ( Rel(\crel) \to 2^{\mathit{Val}(\cx)} ) \\[1ex]

1355: \bbbkl.\bbbkr_s &:& \mbox{\em stmt}s \to

1356:                   ( Rel(\crel) \to Rel(\crel) ) \\

1357: \end{array}

1358: \]

1359: So we define the semantics of an expression as the set of attribute assignments

1360: that satisfy the expression, and the semantics of a statement as a

1361: transformation of the relation assignment. The interpretation functions are

1362: defined inductively in Figure~\ref{fig:sem}.

1363:

1364: \begin{figure*}%[htb]

1365: \[

1366: \hspace{-5mm}  \begin{array}{rcll}

1367:     \bbbkl \mbox{{\em rel\_var}{\tt (}{\em term}$_1${\tt ,}~$\dots${\tt ,}~{\em term}$_n${\tt )}} \bbbkr_e (s) &=&

1368:       \big\{ v \in Val(\cx) ~\big|~

1369:         \big( v(term_1), \dots, v(term_n) \big) \in s(\mbox{\em rel\_var})

1370:       \big\}\\[1ex]

1371:

1372:     \bbbkl \mbox{{\tt !}~{\em rel\_expr}} \bbbkr_e (s) &=&

1373:       \mathit{Val}(\cx) \setminus \bbbkl \mbox{{\em rel\_expr}} \bbbkr_e (s) \\[1ex]

1374:

1375:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt \&}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&

1376:       \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) \cap

1377:       \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\[1ex]

1378:

1379:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt |}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&

1380:       \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) \cup

1381:       \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\[1ex]

1382:

1383:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt ->}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&

1384:       \bbbkl \mbox{{\tt !}~{\tt (}{\em rel\_expr$_1$}{\tt )}~{\tt |\,(}{\em rel\_expr$_2$}{\tt )}} \bbbkr_e (s) \\[1ex]

1385:

1386:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt <->}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&

1387:       \bbbkl \mbox{{\tt (}{\em rel\_expr$_1$}~{\tt ->}~{\em rel\_expr$_2$}{\tt )}~{\tt \&\,}{\tt (}{\em rel\_expr$_2$}~{\tt ->}~{\em rel\_expr$_1$}{\tt )}} \bbbkr_e (s) \\[1ex]

1388:

1389:     \bbbkl \mbox{{\tt EX(}{\em attribute}{\tt ,}~{\em rel\_expr}{\tt )}} \bbbkr_e (s) &=&

1390:       \big\{ v \in Val(\cx) ~\big|~ \exists v' \in \bbbkl \mbox{{\em rel\_expr}} \bbbkr_e(s)

1391:                                    ~\forall x \in \cx \setminus \{\mbox{{\em attribute}}\}: v(x) = v'(x) \big\}\\[1ex]

1392:

1393:     \bbbkl \mbox{{\tt FA(}{\em attribute}{\tt ,}~{\em rel\_expr}{\tt )}} \bbbkr_e (s) &=&

1394:       \bbbkl \mbox{{\tt !}~{\tt EX(}{\em attribute}{\tt ,}~{\tt !}~{\tt (}{\em rel\_expr}{\tt ))}} \bbbkr_e (s) \\[1ex]

1395:

1396:     \bbbkl \mbox{{\tt TC(}{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt ))}} \bbbkr_e (s)

1397:                 &\stackrel{\mbox{\em \footnotesize lfp}}{=}&

1398:       \bbbkl \mbox{~~~~{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt )}} \\

1399:          &&\mbox{~~{\tt | EX(}{\em attribute}$_3${\tt ,}~~~

1400:            {\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_3${\tt )}}\\

1401:          &&\mbox{\hspace{27mm}{\tt \& TC(}{\em rel\_var}{\tt (}{\em attribute}$_3${\tt ,}~{\em attribute}$_2${\tt )))}} \bbbkr_e (s) \\[1ex]

1402:

1403:     \bbbkl \mbox{{\tt TCFAST(}{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt ))}} \bbbkr_e (s) &=&

1404:       \bbbkl \mbox{{\tt TC(}{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt ))}} \bbbkr_e (s) \\[3ex]

1405:

1406:     \bbbkl \mbox{{\tt TRUE(}{\em term}$_1${\tt ,}~$\dots${\tt ,}~{\em term}$_n${\tt )}} \bbbkr_e (s) &=&

1407:       \mathit{Val}(\cx)\\[1ex]

1408:

1409:     \bbbkl \mbox{{\tt FALSE(}{\em term}$_1${\tt ,}~$\dots${\tt ,}~{\em term}$_n${\tt )}} \bbbkr_e (s) &=&

1410:       \emptyset \\[1ex]

1411:

1412:     \bbbkl \mbox{{\tt =(}{\em term}$_1${\tt ,}~{\em term}$_2${\tt )}} \bbbkr_e (s) &=&

1413:       \big\{ v \in Val(\cx) ~|~ v(term_1) = v(term_2) \big\}\\[1ex]

1414:

1415:     \bbbkl \mbox{{\tt <(}{\em term}$_1${\tt ,}~{\em term}$_2${\tt )}} \bbbkr_e (s) &=&

1416:       \big\{ v \in Val(\cx) ~|~ v(term_1) <_{\mathrm{lexicographically}} v(term_2) \big\}\\[3ex]

1417:

1418:     \bbbkl \mbox{{\tt =(}{\em rel\_expr}$_1${\tt ,}~{\em rel\_expr}$_2${\tt )}} \bbbkr_e (s) &=&

1419:       \mathit{Val}(\cx) \mbox{,~~~if }

1420:         \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) = \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\

1421:       && \emptyset \mbox{,~~~~~~~~~~~otherwise}\\[1ex]

1422:

1423:     \bbbkl \mbox{{\tt <(}{\em rel\_expr}$_1${\tt ,}~{\em rel\_expr}$_2${\tt )}} \bbbkr_e (s) &=&

1424:       \mathit{Val}(\cx) \mbox{,~~~if }

1425:         \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) \subsetneq \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\

1426:       && \emptyset \mbox{,~~~~~~~~~~~otherwise}\\[1ex]

1427:

1428:     \bbbkl \mbox{{\tt (}{\em rel\_expr}{\tt )}} \bbbkr_e(s) &=&

1429:       \bbbkl \mbox{\em rel\_expr} \bbbkr_e(s)\\[3ex]

1430:

1431:     \big( \bbbkl \mbox{{\em rel\_var}{\tt (}{\em term}$_1${\tt ,}$\dots${\tt ,}{\em term}$_n${\tt )}{\tt :=}~{\em rel\_expr}} \bbbkr_s (s)\big) (r) &=&

1432:       s(r), \hspace{64mm}\mbox{if } r \not= \mbox{\em rel\_var}\\[1.5ex]

1433:       && \quad \big\{ \big( v(term_1), \dots, v(term_n) \big)~\big|~v \in \bbbkl \mbox{\em rel\_expr} \bbbkr_e(s) \big\} \\[1ex]

1434:       && \cup~\big\{ t \in s(\mbox{\em r})~|~ \exists i : term_i \in \cc \wedge term_i \not= t_i \big\},

1435:          \qquad\quad \mbox{if } r = \mbox{\em rel\_var}

1436:   \end{array}

1437: \]

1438: with $\mbox{\em attribute, attribute$_1$, attribute$_2$, attribute$_3$} \in \cx

1439: \mbox{, \em term$_1$, ..., term$_n$} \in \cx \cup \cc \mbox{, and \em rel\_var}

1440: \in \crel$.  The symbol $\stackrel{\mbox{\em \footnotesize lfp}}{=}$ denotes

1441: the least fixed point.

1442:

1443: \caption{RML semantics} \label{fig:sem}

1444: \end{figure*}

1445:

1446:

1447:

1448:

1449:

1450: \newcommand{\etalchar}[1]{$^{#1}$}

1451: \begin{thebibliography}{NSW{\etalchar{+}}02}

1452:

1453: \bibitem[AFC98]{AntoniolEtAl:1998}

1454: G.~Antoniol, R.~Fiutem, and L.~Cristoforetti.

1455: \newblock Design pattern recovery in object-oriented software.

1456: \newblock In {\em Proceedings of the 6th IEEE International Workshop on Program

1457:   Comprehension (IWPC 1998)}, pages 153--160. IEEE Computer Society, 1998.

1458:

1459: \bibitem[BBS97]{BehnkeEtAl:1997}

1460: Ralf Behnke, Rudolf Berghammer, and Peter Schneider.

1461: \newblock Machine support of relational computations: The {Kiel RELVIEW}

1462:   system.

1463: \newblock Technical Report 9711, Institut f\"ur Informatik und Praktische

1464:   Mathematik, Christian-Albrechts-Universit\"at Kiel, 1997.

1465:

1466: \bibitem[Bla04]{Blaha:2004}

1467: Michael Blaha.

1468: \newblock A copper bullet for software quality improvement.

1469: \newblock {\em Computer}, 37(2):21--25, 2004.

1470:

1471: \bibitem[BLM02]{BerghammerEtAl:2002}

1472: Rudolf Berghammer, Barbara Leoniuk, and Ulf Milanese.

1473: \newblock Implementation of relational algebra using binary decision diagrams.

1474: \newblock In H.~de~Swart, editor, {\em Proceedings of the 6th International

1475:   Conference on Relational Methods in Computer Science (RelMICS 2001)}, LNCS

1476:   2561, pages 241--257, Berlin, 2002. Springer-Verlag.

1477:

1478: \bibitem[BLQ{\etalchar{+}}03]{BerndlEtAl:2003}

1479: Marc Berndl, Ondrej Lhot{\'a}k, Feng Qian, Laurie~J. Hendren, and Navindra

1480:   Umanee.

1481: \newblock Points-to analysis using {BDDs}.

1482: \newblock In {\em Proceedings of the ACM SIGPLAN Conference on Programming

1483:   Language Design and Implementation (PLDI 2003)}, pages 103--114. ACM, 2003.

1484:

1485: \bibitem[BNL03]{BeyerEtAl:2003}

1486: Dirk Beyer, Andreas Noack, and Claus Lewerentz.

1487: \newblock Simple and efficient relational querying of software structures.

1488: \newblock In {\em Proceedings of the 10th Working Conference on Reverse

1489:   Engineering (WCRE 2003)}, pages 216--225. IEEE Computer Society, 2003.

1490:

1491: \bibitem[Bry86]{Bryant:1986}

1492: Randal~E. Bryant.

1493: \newblock Graph-based algorithms for boolean function manipulation.

1494: \newblock {\em IEEE Transaction on Computers}, C-35(8):677--691, 1986.

1495:

1496: \bibitem[Bry92]{Bryant:1992}

1497: Randal~E. Bryant.

1498: \newblock Symbolic boolean manipulation with ordered binary decision diagrams.

1499: \newblock {\em ACM Computing Surveys}, 24(3):293--318, 1992.

1500:

1501: \bibitem[CGK98]{ChenEtAl:1998}

1502: Yih-Farn Chen, Emden~R. Gansner, and Elftherios Koutsofios.

1503: \newblock A {C++} data model supporting reachability analysis and dead code

1504:   detection.

1505: \newblock {\em IEEE Transactions on Software Engineering}, 24(9):682--694,

1506:   1998.

1507:

1508: \bibitem[Ciu99]{Ciupke:1999}

1509: Oliver Ciupke.

1510: \newblock Automatic detection of design problems in object-oriented

1511:   reengineering.

1512: \newblock In {\em Proceedings of Technology of Object-Oriented Languages and

1513:   Systems (TOOLS 1999)}, pages 18--32. IEEE Computer Society, 1999.

1514:

1515: \bibitem[CM03]{ClocksinMellish:2003}

1516: William~F. Clocksin and Cristopher~S. Mellish.

1517: \newblock {\em Programming in {Prolog}}.

1518: \newblock Springer-Verlag, Berlin, 5th edition, 2003.

1519:

1520: \bibitem[FH00]{FahmyHolt:2000}

1521: Hoda Fahmy and Richard~C. Holt.

1522: \newblock Software architecture transformations.

1523: \newblock In {\em Proceedings of the International Conference on Software

1524:   Maintenance (ICSM 2000)}, pages 88--96. IEEE Computer Society, 2000.

1525:

1526: \bibitem[FKvO98]{FeijsEtAl:1998}

1527: Loe M.~G. Feijs, Ren\'e~L. Krikhaar, and Rob~C. van Ommering.

1528: \newblock A relational approach to support software architecture analysis.

1529: \newblock {\em Software: Practice \& Experience}, 28(4):371--400, 1998.

1530:

1531: \bibitem[GHJV95]{GammaEtAl:1995}

1532: Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides.

1533: \newblock {\em Design Patterns}.

1534: \newblock Addison-Wesley, Reading, MA, 1995.

1535:

1536: \bibitem[Har91]{Hartman:1991}

1537: John Hartman.

1538: \newblock Understanding natural programs using proper decomposition.

1539: \newblock In {\em Proceedings of the 13th International Conference on Software

1540:   Engineering (ICSE 1991)}, pages 62--73, 1991.

1541:

1542: \bibitem[HN90]{HarandiNing:1990}

1543: Mehdi~T. Harandi and Jim~Q. Ning.

1544: \newblock Knowledge-based program analysis.

1545: \newblock {\em IEEE Software}, 7(1):74--81, 1990.

1546:

1547: \bibitem[Hol96]{Holt:1996}

1548: Richard~C. Holt.

1549: \newblock Binary relational algebra applied to software architecture.

1550: \newblock Technical Report CSRI 345, University of Toronto, 1996.

1551:

1552: \bibitem[Hol98]{Holt:1998}

1553: Richard~C. Holt.

1554: \newblock Structural manipulations of software architecture using {Tarski}

1555:   relational algebra.

1556: \newblock In {\em Proceedings of the 5th Working Conference on Reverse

1557:   Engineering (WCRE 1998)}, pages 210--219. IEEE Computer Society, 1998.

1558:

1559: \bibitem[IEE01]{IEEE:2001}

1560: IEEE.

1561: \newblock {\em IEEE Standard for Information Technology -- Portable Operating

1562:   System Interface (POSIX) (IEEE Std 1003.1-2001)}, 2001.

1563:

1564: \bibitem[KB98]{KazmanBurth:1998}

1565: Rick Kazman and Marcus Burth.

1566: \newblock Assessing architectural complexity.

1567: \newblock In {\em Proceedings of the 2nd Euromicro Conference on Software

1568:   Maintenance and Reengineering (CSMR 1998)}, pages 104--112, 1998.

1569:

1570: \bibitem[KP96]{KraemerPrechelt:1996}

1571: Christian Kr\"amer and Lutz Prechelt.

1572: \newblock Design recovery by automated search for structural design patterns in

1573:   object-oriented software.

1574: \newblock In {\em Proceedings of the 3rd Working Conference on Reverse

1575:   Engineering (WCRE 1996)}, pages 208--215. IEEE Computer Society, 1996.

1576:

1577: \bibitem[Kri01]{Krinke:2001}

1578: Jens Krinke.

1579: \newblock Identifying similar code with program dependence graphs.

1580: \newblock In {\em Proceedings of the 8th Working Conference on Reverse

1581:   Engineering (WCRE 2001)}, pages 301--309. IEEE Computer Society, 2001.

1582:

1583: \bibitem[KSRP99]{KellerEtAl:1999}

1584: Rudolf~K. Keller, Reinhard Schauer, S{\'e}bastien Robitaille, and Patrick

1585:   Pag{\'e}.

1586: \newblock Pattern-based reverse-engineering of design components.

1587: \newblock In {\em Proceedings of the 21st International Conference on Software

1588:   Engineering (ICSE 1999)}, pages 226--235. ACM, 1999.

1589:

1590: \bibitem[KW99]{KullbachWinter:1999}

1591: Bernt Kullbach and Andreas Winter.

1592: \newblock Querying as an enabling technology in software reengineering.

1593: \newblock In {\em Proceedings of the 3rd European Conference on Software

1594:   Maintenance and Reengineering (CSMR 1999)}, pages 42--50, 1999.

1595:

1596: \bibitem[Mar97]{Martin:1997}

1597: Robert~C. Martin.

1598: \newblock Engineering notebook: Stability.

1599: \newblock {\em C++ Report}, February 1997.

1600:

1601: \bibitem[MNS01]{MurphyEtAl:2001}

1602: Gail~C. Murphy, David Notkin, and Kevin~J. Sullivan.

1603: \newblock Software reflexion models: Bridging the gap between design and

1604:   implementation.

1605: \newblock {\em IEEE Transactions on Software Engineering}, 27(4):364--380,

1606:   2001.

1607:

1608: \bibitem[MS95]{MendelzonSametinger:1995}

1609: Alberto~O. Mendelzon and Johannes Sametinger.

1610: \newblock Reverse engineering by visualizing and querying.

1611: \newblock {\em Software -- Concepts \& Tools}, 16(4):170--182, 1995.

1612:

1613: \bibitem[MWD99]{MensWuyts:1999}

1614: Kim Mens, Roel Wuyts, and Theo D'Hondt.

1615: \newblock Declarative codifying software architectures using virtual software

1616:   classifications.

1617: \newblock In {\em Proceedings of the 29th International Conference on

1618:   Technology of Object-Oriented Languages and Systems - Europe (TOOLS 1999)},

1619:   pages 33--45. IEEE Computer Society, 1999.

1620:

1621: \bibitem[NSW{\etalchar{+}}02]{NiereEtAl:2002}

1622: J\"org Niere, Wilhelm Sch\"afer, J\"org~P. Wadsack, Lothar Wendehals, and Jim

1623:   Welsh.

1624: \newblock Towards pattern-based design recovery.

1625: \newblock In {\em Proceedings of the 24th International Conference on Software

1626:   Engineering (ICSE 2002)}, pages 338--348. ACM, 2002.

1627:

1628: \bibitem[Qui94]{Quilici:1994}

1629: Alex Quilici.

1630: \newblock A memory-based approach to recognizing programming plans.

1631: \newblock {\em Communications of the ACM}, 37(5):84--93, 1994.

1632:

1633: \bibitem[RW90]{RichWills:1990}

1634: Charles Rich and Linda~M. Wills.

1635: \newblock Recognizing a program's design: A graph-parsing approach.

1636: \newblock {\em IEEE Software}, 7(1):82--89, 1990.

1637:

1638: \bibitem[SMB96]{ShullEtAl:1996}

1639: Forrest Shull, Walc{\'e}lio~L. Melo, and Victor~R. Basili.

1640: \newblock An inductive method for discovering design patterns from

1641:   object-oriented software systems.

1642: \newblock Technical Report CS-TR-3597, Computer Science Department, University

1643:   of Maryland, 1996.

1644:

1645: \bibitem[SSC96]{SefikaEtAl:1996a}

1646: Mohlalefi Sefika, Aamod Sane, and Roy~H. Campbell.

1647: \newblock Monitoring compliance of a software system with its high-level design

1648:   models.

1649: \newblock In {\em Proceedings of the 18th International Conference on Software

1650:   Engineering (ICSE 1996)}, pages 387--396. IEEE Computer Society, 1996.

1651:

1652: \bibitem[TA99]{TonellaAntoniol:1999}

1653: Paolo Tonella and Giulio Antoniol.

1654: \newblock Object oriented design pattern inference.

1655: \newblock In {\em Proceedings of the International Conference on Software

1656:   Maintenance (ICSM 1999)}, pages 230--238. IEEE Computer Society, 1999.

1657:

1658: \bibitem[WHH02]{WuEtAl:2002}

1659: Jingwei Wu, Ahmed~E. Hassan, and Richard~C. Holt.

1660: \newblock Using graph patterns to extract scenarios.

1661: \newblock In {\em Proceedings of the 10th International Workshop on Program

1662:   Comprehension (IWPC 2002)}, pages 239--247. IEEE Computer Society, 2002.

1663:

1664: \bibitem[Won98]{Wong:1998}

1665: Kenny Wong.

1666: \newblock {\em Rigi User's Manual, Version 5.4.4}, 1998.

1667: \newblock http://ftp.rigi.csc.uvic.ca/pub/rigi/doc/.

1668:

1669: \end{thebibliography}

1670:

1671: \end{document}

1672: