cs0409009/main.tex
1: \documentclass{llncs}
2: \bibliographystyle{alpha}
3: 
4: \usepackage{cs_techrpt_cover}
5: \usepackage{graphicx}
6: \usepackage{amssymb}
7: \usepackage{eucal}
8: \usepackage[pdfpagemode=None]{hyperref}
9: 
10: \newcounter{verbline}
11: \makeatletter
12: \def\verblineinput#1{
13: \begingroup
14: \setcounter{verbline}{0} \fontsize{10pt}{11pt}
15: \def\par{\leavevmode\null\@@par}
16: \obeylines \frenchspacing\@vobeyspaces \@makeother\$\@makeother\&\@makeother\#
17: \@makeother\^\@makeother\_\@makeother\~ \@makeother\{\@makeother\}\@makeother\\
18: \@makeother\%\tt\@noligs\input{#1}\endgroup} \makeatother
19: 
20: % Seiten in Hoehe vergroessern
21: \addtolength{\topmargin}{-1.5cm}
22: \addtolength{\textheight}{3cm}
23: 
24: % Seiten in Breite vergroessern
25: \addtolength{\oddsidemargin}{-2.2cm}
26: \addtolength{\evensidemargin}{-2.2cm}
27: \addtolength{\textwidth}{3.7cm}
28: 
29: 
30: % Zus"atzliche Definitionen.
31: \def\R{\mathop{\rm I\kern-1.5pt R}}
32: \def\N{\mathop{\rm I\kern-1.5pt N}}
33: \def\NP{\mathop{\cal NP}}
34: \def\bbbkl{{[\![}}
35: \def\bbbkr{{]\!]}}
36: \def\cc{{\cal U}}
37: \def\crel{{\cal R}}
38: \def\ct{{\cal T}}
39: \def\cx{{\cal X}}
40: 
41: \pagestyle{empty}
42: \setcounter{page}{0}
43: 
44: \title{CrocoPat 2.1 Introduction and Reference Manual\thanks{%
45:   This research was supported in part by
46:   the NSF grants CCR-0234690 and ITR-0326577,
47:   and the DFG grant BE~1761/3-1.
48:        }}
49: 
50: \institute{
51:    Electronics Research Laboratory \\
52:    Department of Electrical Engineering and Computer Sciences \\
53:    College of Engineering \\
54:    University of California at Berkeley \\
55:    Berkeley, CA 94720-1770, U.S.A. \\
56:    \email{beyer@eecs.berkeley.edu}\\[5mm]
57: \and
58:    Software Systems Engineering Research Group\\
59:    Department of Computer Science\\
60:    Brandenburg University of Technology at Cottbus\\
61:    PO Box 10 13 44, 03013 Cottbus, Germany\\
62:    \email{an@informatik.tu-cottbus.de}
63: }
64: 
65: %\reportmonth{July}
66: %\reportyear{2004}
67: \date{July 2004}
68: \reportnumber{04-1338}
69: \useuglyformat
70: 
71: \begin{document}
72: 
73: \author{Dirk Beyer \and Andreas Noack}
74: \makecover
75: 
76: % Just to get an additional blank page.
77: \phantom{x}
78: 
79: \author{Dirk Beyer\inst{1} \and Andreas Noack\inst{2}}
80: \maketitle
81: 
82: % Seitennummern unten mittig.
83: \pagestyle{plain}
84: 
85: \begin{abstract}
86: CrocoPat is an efficient, powerful and easy-to-use tool for manipulating
87: relations of arbitrary arity, including directed graphs.  This manual provides
88: an introduction to and a reference for CrocoPat and its programming
89: language~RML. It includes several application examples, in particular from the
90: analysis of structural models of software systems.
91: \end{abstract}
92: 
93: \sloppy
94: 
95: 
96: 
97: 
98: %RML
99: %operator complexity
100: 
101: 
102: \section{Introduction} \label{s:intro}
103: 
104: CrocoPat is a tool for manipulating relations, including directed
105: graphs (binary relations).  CrocoPat is powerful, because it
106: manipulates relations of arbitrary arity; it is efficient in terms
107: of time and memory, because it uses the data structure binary
108: decision diagram (BDD,~\cite{Bryant:1986,Bryant:1992}) for the
109: internal representation of relations; it is fairly easy to use,
110: because its language is simple and based on the well-known
111: predicate calculus; and it is easy to integrate with other tools,
112: because it uses the simple and popular Rigi Standard Format (RSF)
113: as input and output format for relations. CrocoPat is free
114: software (released under LGPL) and can be obtained from
115: \href{http://www.software-systemtechnik.de/CrocoPat}{\tt
116: http://www.software-systemtechnik.de/CrocoPat}.
117: 
118: {\bf Overview.} CrocoPat is a command line tool which interprets programs
119: written in the Relation Manipulation Language (RML).  Its inputs are an RML
120: program and relations in the Rigi Standard Format (RSF), and its outputs are
121: relations in RSF and other text produced by the RML program.  The programming
122: language RML, and the input and output of relations from and to RSF files are
123: introduced with the help of many examples in Section~\ref{s:lanintro}.
124: Section~\ref{s:advanced} describes advanced programming techniques, in
125: particular for improving the performance of RML programs and for circumventing
126: limitations of RML. The manual concludes with references of CrocoPat's command
127: line options in Section~\ref{s:pat}, of RSF in Section~\ref{s:rsf}, and of RML
128: in Section~\ref{s:lan}.  The RML reference includes a concise informal
129: description of the semantics, and a formal description of the syntax and the
130: core semantics.
131: 
132: {\bf Applications.} CrocoPat was originally developed for analyzing graph
133: models of software systems, and in particular for finding patterns in such
134: graphs~\cite{BeyerEtAl:2003}. Existing tools were not appropriate for this
135: task, because they were limited to binary relations (e.g.
136: Grok~\cite{Holt:1998}, RPA \cite{FeijsEtAl:1998}, and
137: RelView~\cite{BerghammerEtAl:2002}), or consumed too much time or memory (e.g.
138: relational database management systems and Prolog interpreters). Applications
139: of graph pattern detection include
140: \begin{itemize}
141: \item the detection of implementation
142: patterns~\cite{RichWills:1990,HarandiNing:1990,Hartman:1991,Quilici:1994},
143: object-oriented design patterns (Section~\ref{s:pattern},
144: \cite{MendelzonSametinger:1995,KraemerPrechelt:1996,AntoniolEtAl:1998,KellerEtAl:1999,NiereEtAl:2002}),
145: and architectural styles~\cite{Holt:1996},
146: \item the detection of potential design problems (Section~\ref{s:pattern},
147: \cite{MendelzonSametinger:1995,SefikaEtAl:1996a,FeijsEtAl:1998,KazmanBurth:1998,Ciupke:1999,FahmyHolt:2000}),
148: \item the inductive inference of design
149: patterns~\cite{ShullEtAl:1996,TonellaAntoniol:1999},
150: \item the identification of code clones~\cite{Krinke:2001},
151: \item the extraction of scenarios from models of source
152: code~\cite{WuEtAl:2002}, and
153: \item the detection of design problems in databases~\cite{Blaha:2004}.
154: \end{itemize}
155: 
156: The computation of transitive closures of graphs -- another particular strength
157: of CrocoPat -- is not only needed for the detection of some of the above
158: patterns, but has also been applied for dead code detection and change impact
159: analysis~\cite{ChenEtAl:1998,FeijsEtAl:1998}.  Computing and analyzing the
160: difference between two graphs supports checking the conformance of the as-built
161: architecture to the as-designed architecture
162: \cite{SefikaEtAl:1996a,FeijsEtAl:1998,MensWuyts:1999,FahmyHolt:2000,MurphyEtAl:2001},
163: and studying the evolution of software systems between different versions.
164: Calculators for relations have also been used to compute views of systems on
165: different levels of abstraction by lifting and lowering
166: relations~\cite{FeijsEtAl:1998,FahmyHolt:2000}, and to calculate software
167: metrics (Section~\ref{s:numerical},
168: \cite{MendelzonSametinger:1995,KullbachWinter:1999}).
169: 
170: Although we are most familiar with potential applications in the analysis of
171: software designs, we are confident that CrocoPat can be beneficial in many
172: other areas. For example, calculators for relations have been used for program
173: analyses like points-to analysis~\cite{BerndlEtAl:2003}, and for the
174: implementation of graph algorithms~(Section~\ref{s:control},
175: \cite{BehnkeEtAl:1997}).
176: 
177: 
178: \section{RML Tutorial} \label{s:lanintro}
179: 
180: This section introduces RML, the programming language of CrocoPat, on examples.
181: The core of RML are relational expressions based on first-order predicate
182: calculus, a well-known, reasonably simple, precise and powerful language.
183: Relational expressions are explained in Subsection~\ref{s:re}, and additional
184: examples involving relations of arity greater than two are given in
185: Subsection~\ref{s:pattern}.  Besides relational expressions, the language
186: includes control structures, described in Subsection~\ref{s:control}, and
187: numerical expressions, described in Subsection~\ref{s:numerical}.  The input
188: and output of relations is described in Subsection~\ref{s:output}.  A more
189: concise and a more formal specification of the language can be found in
190: Section~\ref{s:lan}.
191: 
192: Although the main purpose of this section is the introduction of the language,
193: some of the application examples may be of interest by themselves. In
194: Subsection~\ref{s:control}, simple graph algorithms are implemented, and in the
195: Subsections~\ref{s:pattern} and~\ref{s:numerical} the design of object-oriented
196: software systems is analyzed.
197: 
198: 
199: \subsection{Relational Expressions} \label{s:re}
200: 
201: This subsection introduces relational expressions using relationships between
202: people as example. Remember that $n$-ary relations are sets of ordered
203: $n$-tuples.  In this subsection, we will only consider the cases $n = 1$ (unary
204: relations) and $n = 2$ (binary relations, directed graphs).  CrocoPat
205: manipulates tuples of strings, thus unary relations in CrocoPat are sets of
206: strings, and binary relations in CrocoPat are sets of ordered pairs of strings.
207: 
208: {\bf Adding Elements.} The statement
209: \begin{verbatim}
210: Male("John");
211: \end{verbatim}
212: expresses that {\tt John} is male. (In some languages, e.g. the logic
213: programming language Prolog~\cite{ClocksinMellish:2003}, such statements are
214: called facts.) It adds the string {\tt John} to the unary relation {\tt Male}.
215: Because each relation variable initially contains the empty relation, {\tt
216: John} is so far the only element of the set {\tt Male}. An explicit declaration
217: of variables is not necessary. However, variables should be defined (i.e.,
218: assigned a value) before they are first used, otherwise CrocoPat prints a
219: warning.
220: \begin{verbatim}
221: Male("Joe");
222: \end{verbatim}
223: adds the string {\tt Joe} to the set {\tt Male}, such that it now has two
224: elements. Similarly, we can initialize the variable {\tt Female}:
225: \begin{verbatim}
226: Female("Alice");
227: Female("Jane");
228: Female("Mary");
229: \end{verbatim}
230: 
231: To express that the {\tt John} and {\tt Mary} are the
232: parents of {\tt Alice} and {\tt Joe}, and {\tt Joe} is the father of {\tt
233: Jane}, we create a binary relation variable {\tt ParentOf} which contains the
234: five parent-child pairs:
235: \begin{verbatim}
236: ParentOf("John", "Alice");
237: ParentOf("John", "Joe");
238: ParentOf("Mary", "Alice");
239: ParentOf("Mary", "Joe");
240: ParentOf("Joe", "Jane");
241: \end{verbatim}
242: 
243: {\bf Assignments.} The following statement uses an {\em attribute}~{\tt x} to
244: assign the set of {\tt Joe}'s parents to the set {\tt JoesParent}:
245: \begin{verbatim}
246: JoesParent(x) := ParentOf(x, "Joe");
247: \end{verbatim}
248: Now {\tt JoesParent} contains the two elements {\tt John} and {\tt Mary}. As
249: another example, the following assignment says that {\tt x} is a child of {\tt
250: y} if and only if {\tt y} is a parent of {\tt x}:
251: \begin{verbatim}
252: ChildOf(x,y) := ParentOf(y,x);
253: \end{verbatim}
254: {\tt John} is the father of a person if and only if he is the parent of this
255: person.  The same is true for {\tt Joe}:
256: \begin{verbatim}
257: FatherOf("John", x) := ParentOf("John", x);
258: FatherOf("Joe", x) := ParentOf("Joe", x);
259: \end{verbatim}
260: Because the scope of each attribute is limited to one statement, the attribute
261: in the first statement and the attribute in the second statement are different,
262: despite of their equal name~{\tt x}.
263: 
264: {\bf Basic Relational Operators.} The relation {\tt FatherOf} can be described
265: more concisely: {\tt x} is father of~{\tt y} if and only if {\tt x} is a parent
266: of {\tt y} and {\tt x} is male:
267: \begin{verbatim}
268: FatherOf(x,y) := ParentOf(x,y) & Male(x);
269: \end{verbatim}
270: Of course, we can define a similar relation for female parents:
271: \begin{verbatim}
272: MotherOf(x,y) := ParentOf(x,y) & Female(x);
273: \end{verbatim}
274: 
275: Besides the operator {\em and} ({\tt \&}), another important operator is {\em
276: or} ({\tt |}). For example, we can define the {\tt ParentOf} relation in terms
277: of the relations {\tt MotherOf} and {\tt FatherOf}: {\tt x}~is a parent of~{\tt
278: y} if and only if {\tt x}~is the mother or the father of~{\tt y}:
279: \begin{verbatim}
280: ParentOf(x,y) := MotherOf(x,y) | FatherOf(x,y);
281: \end{verbatim}
282: 
283: {\bf Quantification.} Parents are people who are a parent of another person.
284: More precisely, {\tt x} is a parent if and only if there {\em exists} ({\tt
285: EX}) a {\tt y} such that {\tt x} is a parent of {\tt y}.
286: \begin{verbatim}
287: Parent(x) := EX(y, ParentOf(x,y));
288: \end{verbatim}
289: Now the set {\tt Parent} consists of {\tt John}, {\tt Mary}, and {\tt Joe}.
290: There is also an abbreviated notation for existential quantification which is
291: similar to anonymous variables in Prolog and functional programming languages:
292: \begin{verbatim}
293: Parent(x) := ParentOf(x,_);
294: \end{verbatim}
295: With the operator {\em not} ({\tt !}), we can compute who has no children:
296: \begin{verbatim}
297: Childless(x) := !EX(y, ParentOf(x,y));
298: \end{verbatim}
299: Equivalently, {\tt x} childless if {\em for all} ({\tt FA}) {\tt y} holds that
300: {\tt x} is not a parent of {\tt y}:
301: \begin{verbatim}
302: Childless(x) := FA(y, !ParentOf(x,y));
303: \end{verbatim}
304: In both cases, the set {\tt Childless} contains {\tt Alice} and {\tt Jane}.
305: 
306: {\bf Transitive Closure.} To compute the grandparents of a person we have to
307: determine the parents of his or her parents:
308: \begin{verbatim}
309: GrandparentOf(x,z) := EX(y, ParentOf(x,y) & ParentOf(y,z));
310: \end{verbatim}
311: Now {\tt GrandparentOf} contains the two pairs ({\tt John}, {\tt Jane}) and
312: ({\tt Mary}, {\tt Jane}).  To find out all ancestors of a person, i.e. parents,
313: parents of parents, parents of parents of parents, etc., we have to apply the
314: above operation (which is also called composition) repeatedly until the fixed
315: point is reached, and unite the results.  The transitive closure operator {\tt
316: TC} does exactly this:
317: \begin{verbatim}
318: AncestorOf(x,z) := TC(ParentOf(x,z));
319: \end{verbatim}
320: The resulting relation {\tt AncestorOf} contains any pair from {\tt ParentOf}
321: and {\tt GrandparentOf}. (It also contains grand-grandparents etc., but there
322: are none in this example.) The transitive closure operator {\tt TC} can only be
323: applied to binary relations.
324: 
325: {\bf Predefined Relations, the Universe.} The relations {\tt FALSE} and {\tt
326: TRUE} are predefined.  {\tt FALSE} is the empty relation, and {\tt TRUE} is the
327: full relation. More precisely, there is one predefined relation {\tt FALSE} and
328: one predefined relation {\tt TRUE} for every arity.  In particular, there is
329: also a 0-ary relation {\tt FALSE()}, which is the empty set, and a 0-ary
330: relation {\tt TRUE()}, which contains only~$()$ (the tuple of length~$0$).
331: Intuitively, these 0-ary relations can be used like Boolean literals.  By the
332: way, the statement
333: \begin{verbatim}
334: Male("John");
335: \end{verbatim}
336: is an abbreviation of the assignment
337: \begin{verbatim}
338: Male("John") := TRUE();
339: \end{verbatim}
340: 
341: The result of {\tt TRUE(x)} is the so-called {\em universe}.  The universe
342: contains all string literals that appear in the input RSF stream (if there is
343: one, see Subsection~\ref{s:output}) and on the left hand side of assignments in
344: the present RML program.  For example, the string literals used on the left
345: hand side of the assignments in the examples of this subsection are {\tt
346: Alice}, {\tt Jane}, {\tt Joe}, {\tt John}, and {\tt Mary}, so the set {\tt
347: TRUE(x)} contains these five elements. See Subsection~\ref{s:universe} for more
348: information on the universe.
349: 
350: The binary relations {\tt =}, {\tt !=}, {\tt <}, {\tt <=}, {\tt >}, and {\tt
351: >=} for the lexicographical order of the strings in the universe are also predefined.
352: For example, siblings are two {\em different} people who have a common parent:
353: \begin{verbatim}
354: SiblingOf(x,y) := EX(z, ParentOf(z,x) & ParentOf(z,y)) & !=(x,y);
355: \end{verbatim}
356: The infix notation is also available for binary relations, so the expression
357: {\tt !=(x,y)} can also be written as~{\tt x!=y}.  Note that the predefined
358: relations, like any other relation, are restricted to the universe. Thus the
359: expression {\tt "A"="A"} yields {\tt FALSE()} if (and only if) the string {\tt
360: A} is not in the universe.
361: 
362: Further relational expressions are provided to match POSIX extended regular
363: expressions~\cite[Section~9.4]{IEEE:2001}.  These relational expressions start
364: with the character {\tt @}, followed by the string for the regular expression.
365: For example,
366: \begin{verbatim}
367: StartsWithJ(x) := @"^J"(x);
368: \end{verbatim}
369: assigns to the set {\tt StartsWithJ} the set of all strings in the universe
370: that start with the letter {\tt J}, namely {\tt Jane}, {\tt Joe}, and {\tt
371: John}. A short overview of the syntax of regular expressions is given in
372: Subsection~\ref{s:syntax}.
373: 
374: {\bf Boolean Operators.} Two relations can be compared with the operators {\tt
375: =}, {\tt !=}, {\tt <}, {\tt <=}, {\tt >}, or {\tt >=}.  Because such
376: comparisons evaluate to either {\tt TRUE()} or {\tt FALSE()}, they are called
377: Boolean expressions. For example,
378: \begin{verbatim}
379: GrandparentOf(x,y) < AncestorOf(x,y)
380: \end{verbatim}
381: yields {\tt TRUE()}, because {\tt GrandparentOf} is a proper subset of {\tt
382: AncestorOf}.  However,
383: \begin{verbatim}
384: GrandparentOf(x,y) = AncestorOf(x,y)
385: \end{verbatim}
386: yields {\tt FALSE()}, because the two relations are not equal.
387: 
388: The six comparison operators should not be confused with the six predefined
389: relations for the lexicographical order. The operators take two relations as
390: parameters, while the predefined relations take strings or attributes as
391: parameters.
392: 
393: 
394: 
395: \subsection{Input and Output of Relations} \label{s:output}
396: 
397: {\bf File Format RSF.} CrocoPat reads and writes relations in Rigi Standard
398: Format (RSF,~\cite[Section~4.7.1]{Wong:1998}). Files in RSF are human-readable,
399: can be loaded into and saved from many reverse engineering tools, and are
400: easily processed by scripts in common scripting languages.
401: 
402: In an RSF file, a tuple of an $n$-ary relation is represented as a line of the
403: form
404: \begin{verbatim}
405: RelationName element1 element2 ... elementn
406: \end{verbatim}
407: The elements may be enclosed by double quotes. Because white space serves as
408: delimiter of the elements, elements that contain white space {\em must} be
409: enclosed by double quotes. A relation is represented as a sequence of such
410: lines.  The order of the lines is arbitrary.  An RSF file may contain several
411: relations.
412: 
413: As an example, the relation {\tt ParentOf} from the previous subsection can be
414: represented in RSF format as follows:
415: \begin{verbatim}
416: ParentOf John Alice
417: ParentOf John Joe
418: ParentOf Mary Alice
419: ParentOf Mary Joe
420: ParentOf Joe  Jane
421: \end{verbatim}
422: 
423: {\bf Input.} RML has no input statements.  When CrocoPat is started, it first
424: reads input relations in RSF from the standard input before it parses and
425: executes the RML program.  RSF reading can be skipped with the {\tt -e} command
426: line option. If the input relations are available as files, they can be feeded
427: into CrocoPat's standard input using the shell operator {\tt <}, as the
428: following examples shows for the file {\tt ParentOf.rsf}:
429: \begin{verbatim}
430: crocopat Prog.rml < ParentOf.rsf
431: \end{verbatim}
432: The end of the input data is recognized either from the end of file character
433: or from a line that starts with the dot ({\tt .}) character.  The latter is
434: sometimes useful if RSF input is feeded interactively.
435: 
436: If the above RSF data is used as input, then at the start of the program the
437: binary relation variable {\tt ParentOf} contains the five pairs, and the
438: universe contains the five string literals {\tt Alice}, {\tt Jane}, {\tt Joe},
439: {\tt John}, and {\tt Mary} (and additionally all string literals that appear on
440: the left hand side of assignments in the program.)
441: 
442: {\bf Output.} The {\tt PRINT} statement outputs relations in RSF format to the
443: standard output. For example, running the program
444: \begin{verbatim}
445: ParentOf("Joe",x) := FALSE(x);
446: ParentOf(x,"Joe") := FALSE(x);
447: PRINT ParentOf(x,y);
448: \end{verbatim}
449: with the above input data prints to the standard output
450: \begin{verbatim}
451: John Alice
452: Mary Alice
453: \end{verbatim} The statement
454: \begin{verbatim}
455: PRINT ["ParentOf"] ParentOf(x,y);
456: \end{verbatim}
457: writes the string {\tt ParentOf} before each tuple, and thus outputs
458: \begin{verbatim}
459: ParentOf John Alice
460: ParentOf Mary Alice
461: \end{verbatim}
462: The output can also be appended to a file {\tt ParentOf2.rsf} (which is created
463: if it does not exist) with
464: \begin{verbatim}
465: PRINT ["ParentOf"] ParentOf(x,y) TO "ParentOf2.rsf";
466: \end{verbatim}
467: or to stderr with
468: \begin{verbatim}
469: PRINT ["ParentOf"] ParentOf(x,y) TO STDERR;
470: \end{verbatim}
471: \pagebreak
472: 
473: {\bf Command Line Arguments.} It is sometimes convenient to specify the names
474: of output files at the command line and not in the RML program.  If there is
475: only one output file, the standard output can be simply redirected to a file
476: using the shell operator {\tt >}:
477: \begin{verbatim}
478: crocopat Prog.rml < ParentOf.rsf > ParentOf2.rsf
479: \end{verbatim}
480: 
481: An alternative solution (which also works with more than one file) is to pass
482: command line arguments to the program.  Command line arguments can be accessed
483: in RML as {\tt \$1}, {\tt \$2}, etc.  For example, when the program
484: \begin{verbatim}
485: ChildOf(x,y) := ParentOf(y,x);
486: PRINT ["Child"] ChildOf(x,$1) TO $1 + ".rsf";
487: PRINT ["Child"] ChildOf(x,$2) TO $2 + ".rsf";
488: \end{verbatim}
489: is executed with
490: \begin{verbatim}
491: crocopat IO.rml Joe Mary < ParentOf.rsf
492: \end{verbatim}
493: then the first {\tt PRINT} statement writes to the file {\tt Joe.rsf}, and
494: the second {\tt PRINT} statement writes to {\tt Mary.rsf}.
495: 
496: Command line arguments are not restricted to specifying file names, but can be
497: used like string literals.  However, in contrast to string literals, command
498: line arguments are never added to the universe, and thus cannot be used on the
499: left hand side of relational assignments.
500: 
501: 
502: \subsection{Control Structures} \label{s:control}
503: 
504: This subsection introduces the control structures of RML, using algorithms for
505: computing the transitive closure of a binary relation~{\tt R} as examples.
506: 
507: {\bfseries {\ttfamily WHILE} Statement.} As a first algorithm, the relation
508: {\tt R} is composed with itself until the fixed point is reached.
509: %Intro_TC-Exp.rml
510: \begin{verbatim}
511: Result(x,y) := R(x,y);
512: PrevResult(x,y) := FALSE(x,y);
513: WHILE (PrevResult(x,y) != Result(x,y)) {
514:     PrevResult(x,y) := Result(x,y);
515:     Result(x,z) := Result(x,z) | EX(y, Result(x,y) & Result(y,z));
516: }
517: \end{verbatim}
518: The program illustrates the use of the {\tt WHILE} loop, which has the usual
519: meaning: The body of the loop is executed repeatedly as long as the condition
520: after {\tt WHILE} evaluates to~{\tt TRUE()}.
521: 
522: {\bfseries {\ttfamily FOR} Statement.} The second program computes the
523: transitive closure of the relation~{\tt R} using the Warshall algorithm.  This
524: algorithm successively adds arcs. In the first iteration, an arc $(u,v)$ is
525: added if the input graph contains the arcs $(u,$~node$_0)$ and $($node$_0, v)$.
526: In the second iteration, an arc $(u, v)$ is added if the graph that results
527: from the first iteration contains the arcs $(u,$~node$_1)$ and $($node$_1, v)$.
528: And so on, for all nodes of the graph (in arbitrary order.)
529: %Intro_TC-Warshall.rml
530: \begin{verbatim}
531: Result(x,y) := R(x,y);
532: Node(x) := Result(x,_) & Result(_,x);
533: FOR node IN Node(x) {
534:   Result(x,y) := Result(x,y) | (Result(x,node) & Result(node,y));
535: }
536: \end{verbatim}
537: The program illustrates the use of the {\tt FOR} loop.  The relation after {\tt
538: IN} must be a unary relation.  The iterator after {\tt FOR} is a string
539: variable and takes as values the elements of the unary relation in
540: lexicographical order. Thus, the number of iterations equals the number of
541: elements of the unary relation.
542: 
543: For the implementation of the transitive closure operator of RML, we
544: experimented with several algorithms.  An interesting observation in these
545: experiments was that the empirical complexity of some algorithms for practical
546: graphs deviated strongly from their theoretical worst case complexity, thus
547: some algorithms with a relatively bad worst-case complexity were very
548: competitive in practice.  In our experiments, the first of the above algorithms
549: was very fast, thus we made it available as operator {\tt TCFAST}.  The
550: implementation of the {\tt TC} operator of RML is a variant of the Warshall
551: algorithm.  It is somewhat slower than {\tt TCFAST} (typically about 20 percent
552: in our experiments), but often needs much less memory because it uses no
553: ternary relations.
554: 
555: {\bfseries {\ttfamily IF} Statement.} The following example program determines
556: if the input graph {\tt R} is acyclic, by checking if its transitive closure
557: contains loops (i.e. arcs from a node to itself):
558: %Intro_Cyclic.rml
559: \begin{verbatim}
560: SelfArcs(x,y) := TC(R(x,y)) & (x = y);
561: IF (SelfArcs(_,_)) {
562:     PRINT "R is not acyclic", ENDL;
563: } ELSE {
564:     PRINT "R is acyclic", ENDL;
565: }
566: \end{verbatim}
567: 
568: 
569: 
570: \subsection{Relations of Higher Arity}\label{s:pattern}
571: 
572: In this subsection, relations of arity greater than two are used for finding
573: potential design patterns and design problems in structural models of
574: object-oriented programs.  The examples are taken from~\cite{BeyerEtAl:2003}.
575: 
576: The models of object-oriented programs contain the call, containment, and
577: inheritance relationships between classes. Here containment means that a class
578: has an attribute whose type is another class. The direction of inheritance
579: relationships is from the subclass to the superclass. As an example, the source
580: code
581: \begin{verbatim}
582: class ContainedClass {}
583: class SuperClass {}
584: class SubClass extends SuperClass {
585:     ContainedClass c;
586: }
587: \end{verbatim}
588: corresponds to the following RSF file:
589: \begin{verbatim}
590: Inherit   SubClass  SuperClass
591: Contain   SubClass  ContainedClass
592: \end{verbatim}
593: 
594: \begin{figure}[ht]
595:    \centering \includegraphics{compositemg}
596:    {\caption{Composite design pattern
597:    \label{fig:composite}}}
598: \end{figure}
599: 
600: {\bf Composite Design Pattern.} Figure~\ref{fig:composite} shows the class
601: diagram of the Composite design pattern~\cite{GammaEtAl:1995}. To identify
602: possible instances of this pattern, we compute all triples of a Component
603: class, a Composite class, and a Leaf class, such that (1) Composite and Leaf
604: are subclasses of Component, (2) Composite contains an instance of Component,
605: and (3) Leaf does not contain an instance of Component. The translation of
606: these conditions to an RML statement is straightforward:
607: \begin{verbatim}
608: CompPat(component, composite, leaf) :=   Inherit(composite, component)
609:                                        & Contain(composite, component)
610:                                        & Inherit(leaf, component)
611:                                        & !Contain(leaf, component);
612: \end{verbatim}
613: 
614: {\bf Degenerate Inheritance.} When a class {\tt C} inherits from another class
615: {\tt A} directly and indirectly via a class {\tt B}, the direct inheritance is
616: probably redundant or even misleading.  The following statement detects such
617: patterns:
618: \begin{verbatim}
619: DegInh(a,b,c) :=   Inherit(c,b)
620:                  & Inherit(c,a)
621:                  & TC(Inherit(b,a));
622: \end{verbatim}
623: 
624: {\bf Cycles.} To understand an undocumented class, one has to understand all
625: classes it uses. If one of the (directly or indirectly) used classes is the
626: class itself, understanding this class is difficult.  All classes that
627: participate in cycles can be found using the transitive closure operator, as
628: shown in Subsection~\ref{s:control}.  However, in many large software systems
629: hundreds of classes participate in cycles, and it is tedious for a human
630: analyst to find the actual cycles in the list of these classes. In our
631: experience, it is often more useful to detect cycles in the order of ascending
632: length.  As a part of such a program, the following statements detects all
633: cycles of length~$3$.
634: \begin{verbatim}
635: Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);
636: Cycle3(x,y,z) := Use(x,y) & Use(y,z) & Use(z,x);
637: Cycle3(x,y,z) := Cycle3(x,y,z) & (x <= y) & (x <= z);
638: \end{verbatim}
639: To see the purpose of the third statement, consider three classes {\tt A}, {\tt
640: B}, and {\tt C} that form a cycle.  After the second statement, the relation
641: variable {\tt Cycle3} contains three representatives of this cycle: $(${\tt
642: A},~{\tt B},~{\tt C}$)$, $(${\tt B},~{\tt C},~{\tt A}$)$ and $(${\tt C},~{\tt
643: A},~{\tt B}$)$. The third statement removes two of these representatives from
644: {\tt Cycle3}, and keeps only the tuple with the lexicographically smallest
645: class at the first position, namely $(${\tt A},~{\tt B},~{\tt C}$)$.
646: 
647: 
648: 
649: \subsection{Numerical Expressions} \label{s:numerical}
650: 
651: In this subsection, a software metric is calculated as example for the use of
652: numerical expressions in RML programs.  Therefore, we extend the structural
653: model of object-oriented programs introduced in the previous subsection with a
654: binary relation {\tt PackageOf}.  This relation assigns to each package the
655: classes that it contains. (Packages are high-level entities in object-oriented
656: software systems that can be considered as sets of classes.)
657: 
658: Robert Martin's metric for the instability of a package is defined as $ce /
659: (ca+ce)$, where $ca$ is the number of classes outside the package that use
660: classes inside the package, and $ce$ is the number of classes inside the
661: package that use classes outside the package~\cite{Martin:1997}.
662: \begin{verbatim}
663: Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);
664: Package(x) := PackageOf(x,_);
665: FOR p IN Package(x) {
666:     CaClass(x) := !PackageOf(p,x) & EX(y, Use(x,y) & PackageOf(p,y));
667:     ca := #(CaClass(x));
668:     CeClass(x) := PackageOf(p,x) & EX(y, Use(x,y) & !PackageOf(p,y));
669:     ce := #(CeClass(x));
670:     IF (ca + ce > 0) {
671:         PRINT p, " ", ce / (ca+ce), ENDL;
672:     }
673: }
674: \end{verbatim}
675: 
676: 
677: 
678: \section{Advanced Programming Techniques} \label{s:advanced}
679: 
680: This section describes advanced programming techniques, in particular for
681: improving efficiency and circumventing language limitations.  The first
682: subsection explains how to control the memory usage of CrocoPat.  The second
683: and third subsection describe how relational expressions are evaluated in
684: CrocoPat, and how to assess and improve the efficiency of their evaluation. The
685: fourth subsection explains why the universe is immutable during the execution
686: of an RML program and how to work around this limitation.
687: 
688: \subsection{Controlling the Memory Usage} \label{ss:memory}
689: 
690: CrocoPat represents relations using the data structure binary decision diagram
691: (BDD,~\cite{Bryant:1986}). When CrocoPat is started, it reserves a fixed amount
692: of memory for BDDs, which is not changed during the execution of the RML
693: program. If the available memory is insufficient, CrocoPat exits with the error
694: message
695: \begin{verbatim}
696: Error: BDD package out of memory.
697: \end{verbatim}
698: 
699: The BDD memory can be controlled with the command line option~{\tt -m},
700: followed by an integer number giving the amount of memory in MByte.  The
701: default value is~50.  The actual amount of memory reserved for BDDs is not
702: infinitely variable, so the specified value is only a rough upper bound of the
703: amount of memory used.
704: 
705: It can also be beneficial to reserve less memory, because the time used for
706: allocating memory increases with the amount of memory.  When the manipulated
707: relations are small or the algorithms are computationally inexpensive, memory
708: allocation can dominate the overall runtime.
709: 
710: \subsection{Speeding up the Evaluation of Relational Expressions}
711: 
712: This subsection explains how CrocoPat evaluates relational expressions.  Based
713: on this information, hints for performance improvement are given. Understanding
714: the subsection requires basic knowledge about BDDs and the impacts of the
715: variable order on the size of BDDs. An introduction to BDDs is beyond the scope
716: of this manual, we refer the reader to~\cite{Bryant:1992}.
717: 
718: The attributes in an RML program are called {\em user attributes} in the
719: following.  For example, the expression {\tt R(x,y)} contains the user
720: attributes {\tt x} and~{\tt y}.  For the internal representation of relations,
721: CrocoPat uses a sequence of {\em internal attributes}, which are distinct from
722: the user attributes.  We call these internal attributes {\tt i1}, {\tt i2},
723: {\tt i3}, ... For example, the binary relation {\tt R} is internally
724: represented as a set of assignments to the internal attributes {\tt i1}
725: and~{\tt i2}.
726: 
727: When the expression {\tt R(x,y)} is evaluated, the internal attributes {\tt i1}
728: and {\tt i2} are renamed to the user attributes {\tt x} and {\tt y}. Therefore
729: all BDD nodes of the representation of {\tt R} have to be traversed. Thus, the
730: time for evaluating the expression {\tt R(x,y)} is at least linear in the
731: number of BDD nodes of~{\tt R}'s representation.
732: 
733: The order of the internal attributes in the BDD is always {\tt i1}, {\tt i2},
734: ... The order of the user attributes in the BDD may be different in the
735: evaluation of each statement, because the scope of user attributes is restricted
736: to one statement. The order of the user attributes in the BDD in the evaluation
737: of a statement is the order in which CrocoPat encounters the user attributes in
738: the execution of the statement.  In the example statement
739: \begin{verbatim}
740: R(x,z) := EX(y, R(x,y) & R(y,z));
741: \end{verbatim}
742: CrocoPat evaluates first {\tt R(x,y)}, then {\tt R(y,z)}, then the conjunction,
743: then the existential quantification, and finally the assignment.  Therefore,
744: the order of the user attributes in the BDD is {\tt x}, {\tt y}, {\tt z}.
745: 
746: {\bf Avoid renaming large relations.} The time for the evaluation of the
747: expression {\tt R(x,y)} is at least linear in the number of BDD nodes in the
748: representation of~{\tt R}, because all BDD nodes have to be renamed from
749: internal attributes to user attributes. Usually this effort for renaming does
750: not dominate the overall runtime, but in the following we give an example where
751: it does. \pagebreak
752: 
753: Let {\tt R(x,y)} be a directed graph with $n$~nodes. Let the BDD representation
754: of {\tt R} have $\Theta(n^2)$ BDD nodes (which is the worst case). The
755: assignment
756: \begin{verbatim}
757: Outneighbor(y) := EX(x, R(x,y) & x="node1");
758: \end{verbatim}
759: assigns the outneighbors of the graph node {\tt node1} to the set {\tt
760: Outneighbor}.  The evaluation of {\tt R(x,y)} costs $\Theta(n^2)$ time in this
761: example, because of the renaming of all nodes.  The ``real computation'',
762: namely the conjunction and the existential quantification, can be done in
763: $O(\log n)$ time.  So the renaming dominates the overall time.
764: 
765: The equivalent statement
766: \begin{verbatim}
767: Outneighbor(y) := R("node1",y);
768: \end{verbatim}
769: is executed in only $O(n)$ time, because the set {\tt R("node1",y)} has $O(n)$
770: elements, and its BDD representation has $O(n)$ nodes.
771: 
772: {\bf Avoid swapping attributes.} Renaming the nodes of a BDD costs at least
773: linear time, but can be much more expensive when attributes have to be swapped.
774: In the statement
775: \begin{verbatim}
776: S(x,y,z) := R(y,z) & R(x,y);
777: \end{verbatim}
778: the BDD attribute order on the right hand side of the assignment is {\tt y},
779: {\tt z}, {\tt x}, while the BDD attribute order on the left hand side is {\tt
780: x}, {\tt y}, {\tt z}. Because the two orders are different, attributes have to
781: be swapped to execute the assignment. This can be easily avoided by using the
782: equivalent statement
783: \begin{verbatim}
784: S(x,y,z) := R(x,y) & R(y,z);
785: \end{verbatim}
786: 
787: Of course, swapping attributes can not always be avoided.  However, developers
788: of RML programs should know that swapping attributes can be expensive, and
789: should minimize it when performance is critical.
790: 
791: {\bf Ensure good attribute orders.} A detailed discussion of BDD attribute
792: orders is beyond the scope of this manual (see
793: e.g.~\cite[Section~1.3]{Bryant:1992} for details), but the basic rule is that
794: related attributes should be grouped together.  In the two assignment
795: statements
796: \begin{verbatim}
797: S1(v,w,x,y) := R(v,w) & R(x,y);
798: S2(v,x,w,y) := R(v,w) & R(x,y);
799: \end{verbatim}
800: the attributes {\tt v} and {\tt w} are related, and the attributes {\tt x} and
801: {\tt y} are related, while {\tt v} and {\tt w} are unrelated to {\tt x}
802: and~{\tt y}.  In {\tt S1}, related attributes are grouped together, but not
803: in~{\tt S2}.  For many relations~{\tt R}, the BDD representation of~{\tt S1}
804: will be drastically smaller than the BDD representation of~{\tt S2}.
805: 
806: {\bf Profile.} Information about the number of BDD nodes and the BDD attribute
807: order of an expression can be printed with {\tt PRINT RELINFO}. For example,
808: \begin{verbatim}
809: PRINT RELINFO(R(y,z) & R(x,y));
810: \end{verbatim}
811: may output
812: \begin{verbatim}
813: Number of tuples in the relation: 461705
814: Number of values (universe): 6218
815: Number of BDD nodes: 246986
816: Percentage of free nodes in BDD package: 1614430 / 1966102 = 82 %
817: Attribute order: y z x
818: \end{verbatim}
819: The first line gives the cardinality of the relation, the second line the
820: cardinality of the universe, the third line the size of the BDD that represents
821: the result of the expression, and the fifth line the attribute order in this
822: BDD.
823: 
824: \subsection{Estimating the Evaluation Time of Relational Expressions} \label{ss:estimation}
825: 
826: Knowledge of the computational complexity of RML's operators is useful to
827: optimize the performance of RML programs. This subsection gives theoretical
828: complexity results, but also discusses the limits of their practical
829: application.
830: 
831: Table~\ref{t:complexity} shows the asymptotic worst case time complexity for
832: the evaluation of RML's relational operators.  The times do not include the
833: renaming of internal attributes discussed in the previous subsection, and
834: the evaluation of subexpressions.  It is assumed that the caches of the BDD
835: package are sufficiently large. This assumption is closely approximated in
836: practice when the manipulated BDDs only occupy a small fraction of the
837: available nodes in the BDD package. Otherwise, performance may be improved by
838: increasing the BDD memory (see Subsection~\ref{ss:memory}).
839: 
840: When the operands of an expression are relations, the computation time is given
841: as function of the sizes of their BDD representation. (The only exception are
842: the transitive closure operators, where a function of the size of the universe
843: gives a more useful bound.) This raises the problem of how to estimate these
844: BDD sizes. Many practical relations have regularities that enable an (often
845: dramatically) compressed BDD representation, but the analytical derivation of
846: the typical compression rate for relations from a particular application domain
847: is generally difficult.  Our advice is to choose some representative examples
848: and measure the BDD sizes with the {\tt PRINT RELINFO} statement.
849: 
850: It is important to note that Table~\ref{t:complexity} gives {\em worst-case}
851: computation times.  In many cases, the typical practical performance is much
852: better than the worst case.  For example, the relational comparison operators
853: ({\tt <=}, {\tt <}, {\tt >=}, {\tt >}) and the binary logic operators ({\tt
854: \&}, {\tt |}, {\tt ->}, {\tt <->}) are very common in RML programs. Their
855: worst-case complexity is the product of the sizes of their operand BDDs, which
856: is alarmingly high. However, in practice the performance is often much closer
857: to the sum of the operand BDD sizes. Similarly, the quantification operators
858: are often efficient despite their prohibitive worst case runtime (which is
859: difficult to derive because quantification is implemented as a series of
860: several bit-level operations).
861: 
862: Another practically important example for the gap between average-case and
863: worst-case runtime are the transitive closure operators.  The worst case
864: complexity of their BDD-based implementation is the same as for implementations
865: with conventional data structures. However, the BDD-based implementations are
866: much more efficient for many practical graphs~\cite{BeyerEtAl:2003}.  Even in
867: the comparison of different BDD-based implementations, a better worst-case
868: complexity does not imply a better performance in practice.  We conclude from
869: our experience that  knowledge of the theoretical complexity complements but
870: cannot replace experimentation in the development of highly optimized RML
871: programs.
872: 
873: \begin{table}
874: \vspace{-1ex}\centering
875: 
876: \normalsize
877: \begin{tabular}{l@{~~~~}l}
878: {\tt ! re}
879:     & $O(${\em bddsize}$(${\tt re}$))$ \\
880: {\tt re1 \& re2}, {\tt ~re1 | re2}
881:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\
882: {\tt re1 -> re2}, {\tt ~re1 <-> re2}
883:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\
884: {\tt EX(x, re)}, {\tt ~FA(x, re)}
885:     & $\ge O(${\em bddsize}$(${\tt re}$)^2)$ \\
886: {\tt TC(re)} & $O(n^3)$ \\
887: {\tt TCFAST(re)} & $O(n^3 \log n)$ \\
888: {\tt FALSE(x1, x2, ...)} & $O(1)$ \\
889: {\tt TRUE(x1, x2, ...)} & $O(1)$ \\
890: {\tt @s(x)} & $O(n \log n)$ \\
891: {\tt \symbol{126}(x1, x2)}
892:     & $O(n \log n)$ ({\tt \symbol{126}} can be
893:       {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >})\\
894: {\tt \symbol{126}(ne1, ne2)}
895:     & $O(1)$ ({\tt \symbol{126}} can be
896:       {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >})\\
897: {\tt re1 = re2}, {\tt ~re1 != re2} & $O(1)$ \\
898: {\tt re1 < re2}, {\tt ~re1 <= re2}
899:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\
900: {\tt re1 > re2}, {\tt ~re1 >= re2}
901:     & $O(${\em bddsize}$(${\tt re1}$)$ $\cdot$ {\em bddsize}$(${\tt re2}$))$\\[2ex]
902: \end{tabular}
903: 
904: \caption{Worst case time complexity of the evaluation of relational
905: expressions. {\tt re}, {\tt re1}, {\tt re2} are relational expressions, {\tt
906: x}, {\tt x1}, {\tt x2} are attributes, and {\tt ne1}, {\tt ne2} are numerical
907: expressions. {\em bddsize}$(${\tt re}$)$ is the number of BDD nodes of the
908: result of the expression {\tt re}, and $n$~is the cardinality of the universe.}
909: \label{t:complexity}
910: \end{table}
911: 
912: 
913: 
914: 
915: \subsection{Extending the Universe} \label{s:universe}
916: 
917: The set of all strings that may be tuple elements of relations in an RML
918: program is called the {\em universe}.  The universe contains all tuple elements
919: of the input relations (from the input RSF data), and all string literals that
920: appear on the left hand side of assignments in the RML program.  The universe
921: is immutable in the sense that it can be determined before the interpretation
922: of the RML program starts, and is not changed during the interpretation of the
923: RML program.
924: 
925: Sometimes the immutability of the universe is inconvenient for the developer of
926: RML programs. Consider, for example, a program that takes as input the nodes
927: and arcs of a graph, and computes the binary relation {\tt OutneighborCnt}
928: which contains for each node the number of outneighbors:
929: \begin{verbatim}
930: FOR n IN Node(x) {
931:     OutneighborCnt(n, #(Arc(n,x))) := TRUE();
932: }
933: \end{verbatim}
934: This is not a syntactically correct RML program, because {\tt \#(Arc(n,x))} is
935: not a string, but a number. However, RML has an operator {\tt STRING} that
936: converts a number into a string.  But
937: \begin{verbatim}
938: OutneighborCnt(n, STRING( #(Arc(n,x)) )) := TRUE();
939: \end{verbatim}
940: is still not syntactically correct, because such a conversion is not allowed at
941: the left hand side of assignment statements. The reason is that the string that
942: results from such a conversion is generally not known before the execution of
943: the RML program, can therefore not be added to the universe before the
944: execution, and is thus not allowed as tuple element of a relation.
945: 
946: The immutability of the universe during the execution of an RML program is
947: necessary because constant relations like {\tt TRUE(x)} (the universe) and {\tt
948: =(x,y)} (string equality for all strings in the universe) are only defined for
949: a given universe.  Also, the complement of a relation depends on the universe:
950: The complement of a set contains all strings of the universe that are not in
951: the given set, and thus clearly changes when the universe changes.
952: 
953: However, there is a way to work around this limitation: Writing an RSF file,
954: and restarting CrocoPat with this RSF file as input, which adds all tuple
955: elements in the RSF file to the universe.  For example, the above incorrect
956: program can be replaced by the following correct program:
957: \begin{verbatim}
958: FOR n IN Node(x) {
959:     PRINT "OutneighborCnt ", n, " ", #(Arc(n,x)) TO "OutneighborCnt.rsf";
960: }
961: \end{verbatim}
962: When CrocoPat is restarted with the resulting RSF file {\tt OutneighborCnt.rsf}
963: as input, the binary relation {\tt OutneighborCnt} is available for further
964: processing.
965: 
966: 
967: \section{CrocoPat Reference} \label{s:pat}
968: 
969: CrocoPat is executed with
970: \begin{verbatim}
971: crocopat [OPTION]... FILE [ARGUMENT]...
972: \end{verbatim}
973: It first reads relations in RSF (see Section~\ref{s:rsf}) from stdin (unless
974: the option {\tt -e} is given) and then executes the RML program {\tt FILE} (see
975: Section~\ref{s:lan}).  The {\tt ARGUMENT}s are passed
976: to the RML program.  The {\tt OPTION}s are \\[1ex]
977: \begin{tabular}{l@{\hspace{3mm}}l}
978: {\tt -e}        &  Do not read RSF data from stdin. \\
979: {\tt -m NUMBER} &  Approximate memory for BDD package in MB. The default is~50. See Subsection~\ref{ss:memory}.\\
980: {\tt -q}        &  Suppress warnings. \\
981: {\tt -h}        &  Display help message and exit. \\
982: {\tt -v}        &  Print version information and exit. \\
983: \end{tabular}\\[1ex]
984: \pagebreak
985: 
986: The output of the RML program can be written to files, stdout, or stderr, as
987: specified in the RML program.  Error messages and warnings of CrocoPat are always written
988: to stderr.
989: 
990: The exit status of CrocoPat is~1 if it terminates abnormally and~0 otherwise.
991: CrocoPat always outputs an error message to stderr before it terminates with exit
992: status~1.
993: 
994: 
995: \section{RSF Reference} \label{s:rsf}
996: 
997: Rigi Standard Format (RSF) is CrocoPat's input and output format for relations.
998: It is an extension of the format for binary relations defined
999: in~\cite[Section~4.7.1]{Wong:1998}.  For examples of its use see
1000: Subsection~\ref{s:output}.
1001: 
1002: An RSF stream is a sequence of lines.  The order of the lines is arbitrary. The
1003: repeated occurrence of a line is permissable and has the same meaning as a
1004: single occurrence. The end of an RSF stream is indicated by the end of the file
1005: or by a line that starts with a dot ({\tt .}). Lines starting with a sharp
1006: ({\tt \#}) are comment lines.
1007: 
1008: All lines that do not start with a dot or a sharp specify a tuple in a
1009: relation. They consist of the name of the relation followed by a sequence of
1010: (an arbitrary number of) tuple elements, separated by at least one whitespace
1011: character (i.e., space or horizontal tab).
1012: 
1013: Relation names must be RML identifiers (see Subsection~\ref{ss:lexical}). Tuple
1014: elements are sequences of arbitrary characters except line breaks and
1015: whitespace characters.  A tuple element may be optionally enclosed by double
1016: quotes ({\tt "}), in which case it may also contain whitespace characters.
1017: Tuple elements that are enclosed by double quotes in the RSF input of an RML
1018: program are also enclosed by double quotes in its output.
1019: 
1020: 
1021: 
1022: \section{RML Reference} \label{s:lan}
1023: 
1024: Relation Manipulation Language (RML) is CrocoPat's programming language for
1025: manipulating relations. This section defines the lexical structure, the syntax,
1026: and the semantics of RML. Nonterminals are printed in italics and terminals in
1027: typewriter.
1028: 
1029: \subsection{Lexical Structure} \label{ss:lexical}
1030: 
1031: Identifiers are sequences of Latin letters ({\tt a-zA-z}), digits ({\tt 0-9})
1032: and underscores ({\tt \_}), the first of which must be a letter or underscore.
1033: RML has four types of identifiers: attributes ({\em attribute}), relational
1034: variables ({\em rel\_var}), string variables ({\em str\_var}), and numerical
1035: variables ({\em num\_var}).  Every identifier of an RML program belongs to
1036: exactly one of these types.  The type is determined at the first occurrence of
1037: the identifier in the input RSF file (only possible for relational variables)
1038: or in the RML program. Explicit declaration of identifiers is not necessary.
1039: 
1040: The following strings are reserved as keywords and therefore cannot be
1041: used as identifiers:
1042: \begin{verbatim}
1043: AVG DIV ELSE ENDL EX EXEC EXIT FA FOR IF IN MAX MIN MOD NUMBER PRINT RELINFO
1044: STDERR STRING SUM TC TCFAST TO WHILE
1045: \end{verbatim}
1046: 
1047: RML has two types of literals: string literals ({\em str\_literal}) and
1048: numerical literals ({\em num\_literal}).  String literals are delimited by
1049: double quotes ({\tt "}) and can contain arbitrary characters except double
1050: quotes.  A numerical literal consists of an integer part, a fractional part
1051: indicated by a decimal point ({\tt .}), and an exponent indicated by the letter
1052: {\tt e} or {\tt E} followed by an optionally signed integer.  All three parts
1053: are optional, but at least one digit in the integer part or the fractional part
1054: is required. Examples of numerical literals are {\tt 1}, {\tt .2}, {\tt 3.},
1055: {\tt 4.5}, and {\tt 6e-7}.
1056: 
1057: There are two kinds of comments: Text starting with {\tt /*} and ending with
1058: {\tt */}, and text from {\tt //} to the end of the line. \pagebreak
1059: 
1060: \subsection{Syntax and Informal Semantics}\label{s:syntax}
1061: 
1062: \footnotetext[1]{{\em stmt} ... denotes a sequence of one or more {\em stmt}s.}
1063: \footnotetext[2]{Context conditions are marked with~$\to$.}
1064: \footnotetext[3]{{\tt \symbol{126}} can be
1065:        {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >}.}
1066: 
1067: \begin{tabbing}
1068: \hspace*{52mm}\=\hspace{5mm}\= \kill
1069: \noindent{\bfseries \em program} ::=
1070:     \>\> {\bf RML program.}\\
1071: {\em stmt} ...\footnotemark[1]
1072:     \>\> Executes the {\em stmt}s in the given order.\\[1.5ex]
1073: 
1074: \noindent{\bfseries \em stmt} ::=
1075:     \>\> {\bf Statement.}\\
1076: {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt )} {\tt := }{\em rel\_expr}{\tt ;}
1077:     \>\> Assigns the result of {\em rel\_expr} to {\em rel\_var}.\\
1078:     \>$\to$\footnotemark[2]\> The {\em term}s must be {\em attribute}s or {\em str\_literal}s. \\
1079:     \>$\to$\> The set of {\em attribute}s among the {\em term}s on the left hand side \\
1080:     \>\> must equal the set of free attributes in {\em rel\_expr}. \\
1081: {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt );}
1082:     \>\> Shortcut for {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt )} {\tt := TRUE(}{\em term}{\tt ,}...{\tt )}. \\
1083: {\em str\_var}{\tt ~:= }{\em str\_expr}{\tt ;}
1084:     \>\> Assigns the result of {\em str\_expr} to {\em str\_var}.\\
1085: {\em num\_var}{\tt ~:= }{\em num\_expr}{\tt ;}
1086:     \>\> Assigns the result of {\em num\_expr} to {\em num\_var}.\\
1087: {\tt IF}~{\em rel\_expr}~{\tt \{}{\em stmt} ...{\tt\}}~{\tt ELSE}~{\tt \{}{\em stmt} ...{\tt\}}
1088:     \>\> Executes the {\em stmt}s before {\tt ELSE} if the result of {\em rel\_expr} is {\tt TRUE()}, \\
1089: {\tt IF}~{\em rel\_expr}~{\tt \{}{\em stmt} ...{\tt\}}
1090:     \>\> and the {\em stmt}s after {\tt ELSE} (if present) otherwise.\\
1091:     \>$\to$\> {\em rel\_expr} must not have free attributes.\\
1092: {\tt WHILE }{\em rel\_expr}{\tt~\{}{\em stmt} ...{\tt\}}
1093:     \>\> Exec. the {\em stmt}s repeatedly as long as {\em rel\_expr} evaluates to {\tt TRUE()}. \\
1094:     \>$\to$\> {\em rel\_expr} must not have free attributes.\\
1095: {\tt FOR~}{\em str\_var}{\tt~IN~}{\em rel\_expr}{\tt~\{}{\em stmt} ...{\tt\}}
1096:     \>\> Executes the {\em stmt}s once for each element in the result of {\em rel\_expr}. \\
1097:     \>$\to$\> {\em rel\_expr} must have exactly one free attribute.\\
1098: {\tt PRINT }{\em print\_expr}{\tt ,}...{\tt ;}
1099:     \>\> Writes the results of the {\em print\_expr}s to stdout. \\
1100: {\tt PRINT }{\em print\_expr}{\tt ,}...{\tt~TO STDERR;}
1101:     \>\> Writes the results of the {\em print\_expr}s to stderr. \\
1102: {\tt PRINT }{\em print\_expr}{\tt ,}...{\tt~TO }{\em str\_expr}{\tt ;}
1103:     \>\> Appends the results of the {\em print\_expr}s to the specified file.\\
1104: {\tt EXEC }{\em str\_expr}{\tt ;}
1105:     \>\> Executes the shell command given by {\em str\_expr}. \\
1106:     \>\> The exit status is available as numerical constant {\tt exitStatus}.\\
1107: {\tt EXIT }{\em num\_expr}{\tt ;}
1108:     \>\> Exits CrocoPat with the given exit status. \\
1109: {\tt \{ }{\em stmt} ...{\tt~\}}
1110:     \>\> Executes the {\em stmt}s in the given order. \\[1.5ex]
1111: 
1112: 
1113: \noindent{\bfseries \em rel\_expr} ::=
1114:     \>\> {\bf Relational Expression. The result is a relation.} \\*
1115: {\em rel\_var}{\tt (}{\em term}{\tt ,}...{\tt)}
1116:     \>\> Atomic relational expression. \\*
1117: {\em term~~rel\_var~~term}
1118:     \>\> Same as {\em rel\_var}{\tt (}{\em term}{\tt ,} {\em term}{\tt)}. \\*
1119: {\tt ! }{\em rel\_expr}
1120:     \>\> Negation (not). \\*
1121: {\em rel\_expr}{\tt~\& }{\em rel\_expr}
1122:     \>\> Conjunction (and). \\*
1123: {\em rel\_expr}{\tt~| }{\em rel\_expr}
1124:     \>\> Disjunction (or). \\*
1125: {\em rel\_expr}{\tt~-> }{\em rel\_expr}
1126:     \>\> Implication (if). {\tt r1 -> r2 }is equivalent to{\tt~!(r1) | (r2)}.\\*
1127: {\em rel\_expr}{\tt~<-> }{\em rel\_expr}
1128:     \>\> Equivalence (if and only if). \\*
1129:     \>\> {\tt r1 <-> r2 }is equivalent to{\tt~(r1 -> r2) \& (r2 -> r1)}.\\*
1130: {\tt EX(}{\em attribute}{\tt ,}...{\tt , }{\em rel\_expr}{\tt )}
1131:     \>\> Existential quantification of the {\em attribute}s. \\*
1132: {\tt FA(}{\em attribute}{\tt ,}...{\tt , }{\em rel\_expr}{\tt )}
1133:     \>\> Universal quantification of the {\em attribute}s. \\*
1134: {\tt TC(}{\em rel\_expr}{\tt )}
1135:     \>\> Transitive closure. \\*
1136:     \>$\to$\> {\em rel\_expr} must have exactly two free attributes. \\*
1137: {\tt TCFAST(}{\em rel\_expr}{\tt )}
1138:     \>\> Same as {\tt TC}, but with an alternative algorithm (see Section~\ref{s:control}). \\*
1139: {\tt FALSE(}{\em term}{\tt ,}...{\tt )}
1140:     \>\> Empty relation. \\*
1141: {\tt TRUE(}{\em term}{\tt ,}...{\tt )}
1142:     \>\> Relation containing all tuples of strings in the universe. \\*
1143: {\tt @}{\em str\_expr}{\tt (}{\em term}{\tt )}
1144:     \>\> Strings in the universe that match the regular expression {\em str\_expr}. \\*
1145: {\tt \symbol{126}(}{\em term}{\tt , }{\em term}{\tt)}
1146:     \>\> Lexicographical order of all strings in the universe.\footnotemark[3] \\*
1147: {\em term}{\tt~\symbol{126} }{\em term}
1148:     \>\> Same as {\tt \symbol{126}(}{\em term}{\tt , }{\em term}{\tt)}.\\*
1149: {\tt \symbol{126}(}{\em num\_expr}{\tt , }{\em num\_expr}{\tt)}
1150:     \>\> Numerical comparison. The result is either {\tt TRUE()} or {\tt FALSE()}.\footnotemark[3] \\*
1151: {\em num\_expr}{\tt~\symbol{126} }{\em num\_expr}
1152:     \>\> Same as {\tt \symbol{126}(}{\em num\_expr}{\tt , }{\em num\_expr}{\tt)}.\\*
1153: {\em rel\_expr}{\tt~\symbol{126} }{\em rel\_expr}
1154:     \>\> Relational comparison. The result is either {\tt TRUE()} or {\tt FALSE()}.\footnotemark[3] \\*
1155: {\tt (}{\em rel\_expr}{\tt )}
1156: \end{tabbing}
1157: 
1158: \begin{tabbing}
1159: \hspace*{45mm}\=\hspace{5mm}\= \kill
1160: \noindent{\bfseries \em term} ::=
1161:     \>\> {\bf Term.} \\
1162: {\em attribute}
1163:     \>\> Attribute. \\
1164: {\tt \_}
1165:     \>\> Anonymous attribute. E.g. {\tt R(\_)} is equivalent to {\tt EX(x, R(x))}.\\
1166: {\em str\_expr}
1167:     \>\> String expression. \\[2ex]
1168: 
1169: 
1170: \noindent{\bfseries \em str\_expr} ::=
1171:     \>\> {\bf String Expression. The result is a string.} \\
1172: {\em str\_literal}
1173:     \>\> String literal.\\
1174: {\em str\_var}
1175:     \>\> String variable.\\
1176: {\tt STRING(}{\em num\_expr}{\tt )}
1177:     \>\> Converts the result of {\em num\_expr} into a string. \\
1178: {\tt \$ }{\em num\_expr}
1179:     \>\> Command line argument. The first argument has the number~$1$. \\
1180:     \>\> The constant {\tt argCount} contains the number of arguments. \\
1181: {\em str\_expr}{\tt~+ }{\em str\_expr}
1182:     \>\> Concatenation. \\
1183: {\tt (}{\em str\_expr}{\tt )}\\[2ex]
1184: 
1185: 
1186: \noindent{\bfseries \em num\_expr} ::=
1187:     \>\> {\bf Numerical Expression. The result is a floating point number.} \\
1188: {\em num\_literal}
1189:     \>\> Numerical literal.\\
1190: {\em num\_var}
1191:     \>\> Numerical variable.\\
1192: {\tt NUMBER(}{\em str\_expr}{\tt )}
1193:     \>\> Converts the result of {\em str\_expr} into a number. Yields~$0.0$ if \\
1194:     \>\> the result of {\em str\_expr} is not the string representation of a number.\\
1195: {\tt \#(}{\em rel\_expr}{\tt )}
1196:     \>\> Cardinality (number of elements) of the result of {\em rel\_expr}. \\
1197: {\tt MIN(}{\em rel\_expr}{\tt )}, {\tt MAX(}{\em rel\_expr}{\tt )}, {\tt SUM(}{\em rel\_expr}{\tt )}, {\tt AVG(}{\em rel\_expr}{\tt )} \\
1198:     \>\> Minimum, maximum, sum, and arithmetic mean of {\tt NUMBER(s)} \\
1199:     \>\> over all strings {\tt s} in the result of {\em rel\_expr}. \\
1200:     \>$\to$\> {\em rel\_expr} must have one free attribute, its result must be non-empty. \\
1201: {\em num\_expr}{\tt~+ }{\em num\_expr}
1202:     \>\> Addition. \\
1203: {\em num\_expr}{\tt~- }{\em num\_expr}
1204:     \>\> Subtraction. \\
1205: {\em num\_expr}{\tt~* }{\em num\_expr}
1206:     \>\> Multiplication. \\
1207: {\em num\_expr}{\tt~/ }{\em num\_expr}
1208:     \>\> Real division. \\
1209: {\em num\_expr}{\tt~DIV }{\em num\_expr}
1210:     \>\> Integer division (truncating). \\
1211: {\em num\_expr}{\tt~MOD }{\em num\_expr}
1212:     \>\> Modulo. \\
1213: {\em num\_expr}{\tt~\symbol{94} }{\em num\_expr}
1214:     \>\> The first {\em num\_expr} raised to the power given by the second {\em num\_expr}. \\
1215: {\tt argCount}
1216:     \>\> Number of command line arguments. \\
1217: {\tt exitStatus}
1218:     \>\> Exit status of the last executed {\tt EXEC} statement. \\
1219: {\tt (}{\em num\_expr}{\tt )}\\[2ex]
1220: 
1221: 
1222: \noindent{\bfseries \em print\_expr} ::=
1223:     \>\> {\bf Print Expression.} \\
1224: {\em rel\_expr}
1225:     \>\> Prints the tuples in the result of {\em rel\_expr}, one tuple per line.\\
1226: {\tt [}{\em str\_expr}{\tt ] }{\em rel\_expr}
1227:     \>\> Prints the result of {\em str\_expr} before each tuple of {\em rel\_expr}. \\
1228: {\em str\_expr}
1229:     \>\> Prints the result of {\em str\_expr}.\\
1230: {\em num\_expr}
1231:     \>\> Prints the result of {\em num\_expr}.\\
1232: {\tt ENDL}
1233:     \>\> Prints a line break. \\
1234: {\tt RELINFO(}{\em rel\_expr}{\tt )}
1235:     \>\> Prints information about the BDD representation of {\em rel\_expr}.
1236: \end{tabbing}
1237: 
1238: 
1239: \noindent{\bf Levels of Precedence (from Low to High)}\\
1240: {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt >=}, {\tt >} \\
1241: {\tt ->}, {\tt <->}\\
1242: {\tt |}\\
1243: {\tt \&}\\
1244: {\tt !}\\
1245: {\tt +}, {\tt -} (binary) \\
1246: {\tt *}, {\tt /}, {\tt DIV}, {\tt MOD} \\
1247: {\tt \symbol{94}}\\
1248: {\tt -} (unary)\\
1249: {\tt \$}
1250: 
1251: 
1252: \subsubsection{Free Attributes in Relational Expressions.}
1253: Several context conditions of RML refer to the free attributes in relational
1254: expressions.  The number of free attributes in a relational expression equals
1255: the arity of the resulting relation.  Informally, the set of free attributes of
1256: an expression is the set of its contained attributes that are not in the scope
1257: of a quantifier (i.e., {\tt EX} or {\tt FA}).  Exceptions are the numerical and
1258: relational comparison, which have Boolean results and therefore no free
1259: attributes. Formally, the function~$f$ that assigns to each relational
1260: expression the set of its free attributes is inductively defined as
1261: follows:\\[1ex]
1262: \begin{tabular}{l@{\hspace{2mm}}l@{\hspace{2mm}}l}
1263: $f($~{\em rel\_var}{\tt (}{\em term}$_1${\tt ,} {\em term}$_2${\tt ,}...{\tt)}$~)$
1264:     & $:=$ & $\{$ {\em term}$_i~|~${\em term}$_i$ is an attribute $\}$ \\
1265: $f($~{\tt !}{\em rel\_expr}~$)$
1266:     & $:=$ & $f($~{\em rel\_expr}~$)$\\
1267: $f($~{\em rel\_expr}$_1${\tt~\& }{\em rel\_expr}$_2$~$)$
1268:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\
1269: $f($~{\em rel\_expr}$_1${\tt~| }{\em rel\_expr}$_2$~$)$
1270:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\
1271: $f($~{\em rel\_expr}$_1${\tt~-> }{\em rel\_expr}$_2$~$)$
1272:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\
1273: $f($~{\em rel\_expr}$_1${\tt~<-> }{\em rel\_expr}$_2$~$)$
1274:     & $:=$ & $f($ {\em rel\_expr}$_1$~$) ~\cup~ f($~{\em rel\_expr}$_2$~$)$ \\
1275: $f($~{\tt EX(}{\em attribute}{\tt , }{\em rel\_expr}{\tt )}~$)$
1276:     & $:=$ & $f($ {\em rel\_expr} $) ~\setminus~ \{$ {\em attribute} $\}$\\
1277: $f($~{\tt FA(}{\em attribute}{\tt , }{\em rel\_expr}{\tt )}~$)$
1278:     & $:=$ & $f($ {\em rel\_expr} $) ~\setminus~ \{$ {\em attribute} $\}$\\
1279: $f($~{\tt TC(}{\em rel\_expr}{\tt )}~$)$
1280:     & $:=$ & $f($~{\em rel\_expr}~$)$\\
1281: $f($~{\tt TCFAST(}{\em rel\_expr}{\tt )}~$)$
1282:     & $:=$ & $f($~{\em rel\_expr}~$)$\\
1283: $f($~{\tt (}{\em num\_expr$_1$}{\tt~\symbol{126} }{\em num\_expr$_2$}{\tt )}~$)$
1284:     & $:=$ & $\emptyset$ \\
1285: $f($~{\tt (}{\em rel\_expr$_1$}{\tt~\symbol{126} }{\em rel\_expr$_2$}{\tt )}~$)$
1286:     & $:=$ & $\emptyset$ \\
1287: $f($~{\tt (}{\em rel\_expr}{\tt )} $)$
1288:     & $:=$ & $f($~{\em rel\_expr}~$)$
1289: \end{tabular} \\[1ex]
1290: As before, {\tt\symbol{126}} can be {\tt =}, {\tt !=}, {\tt <=}, {\tt <}, {\tt
1291: >=}, or {\tt >}. The relational constants {\tt\symbol{126}}, {\tt FALSE},
1292: {\tt TRUE}, and {\tt @}{\em str\_expr} are equivalent to {\em rel\_var} with
1293: respect to the definition of free attributes.
1294: 
1295: 
1296: \subsubsection{Regular Expressions.} In the relational expression
1297: {\tt @}{\em str\_expr}{\tt (}{\em term}{\tt )}, the result of {\em str\_expr}
1298: can be any POSIX extended regular expression~\cite[Section~9.4]{IEEE:2001}.  A
1299: full description is beyond the scope, we only give a short overview.
1300: 
1301: Most characters in a regular expression only match themselves.  The following
1302: special characters match themselves only when they are preceded by a
1303: backslash~({\tt \symbol{92}}), and otherwise have special meanings:\\[1ex]
1304: \begin{tabular}{ll}
1305: {\tt .}  & Matches any single character.\\
1306: $\mathtt{[~]}$ & Matches any single character contained within the brackets.\\
1307: $\mathtt{[}${\tt \symbol{94}}~$\mathtt{]}$ & Matches any single character not contained within the
1308: brackets.\\
1309: {\tt \symbol{94}} & Matches the start of the string.\\
1310: {\tt \$} & Matches the end of the string.\\
1311: {\tt \{$x$,$y$\}}~~ & Matches the last character (or regular expression enclosed by
1312: parentheses) \\
1313:                     & at least $x$ and at most $y$ times.\\
1314: {\tt +}  & Matches the last character (or regular expression enclosed by
1315: parentheses) one or more times.\\
1316: {\tt *}  & Matches the last character (or regular expression enclosed by
1317: parentheses) zero or more times.\\
1318: {\tt ?}  & Matches the last character (or regular expression enclosed by
1319: parentheses) zero or one times.\\
1320: {\tt |}  & Matches either the expression before or the expression after the
1321: operator.
1322: \end{tabular}\\[1ex]
1323: Regular expressions can be grouped by enclosing them with
1324: parentheses~({\tt(}...{\tt)}).
1325: 
1326: 
1327: 
1328: 
1329: \subsection{Formal Semantics}\label{s:semantics}
1330: 
1331: This subsection formally defines the semantics of the part of RML that deals
1332: with relations, namely of relational expressions and the relational assignment
1333: statement.
1334: 
1335: The {\em universe} $\cc$ is the finite set of all string literals that appear
1336: in the input RSF file, or on the left side of a relational assignment. The
1337: finite set of {\em attributes} of the RML program is denoted by $\cx$ ($\cc
1338: \cap \cx = \emptyset$). An {\em attribute assignment} is a total function $v:
1339: \cx \cup \cc \to \cc$ which maps each attribute to its value and (for
1340: notational convenience) each string literal to itself.  The set of all
1341: attribute assignments is denoted by $\mathit{Val(\cx)}$.
1342: 
1343: The finite set of {\em relation variables} in the RML program is denoted by
1344: $\crel$.  A {\em relation assignment} is a total function $s: \crel \to
1345: \bigcup_{n \in \mathbb{N}} 2^{\prod_{k=1}^n \cc}$, which maps each relation
1346: variable to a relation of arbitrary arity. The set of all relation assignments
1347: is denoted by~$Rel(\crel)$.
1348: 
1349: The semantics of relational expressions and statements are given
1350: by the following interpretation functions:
1351: \[
1352: \begin{array}{lcll}
1353: \bbbkl.\bbbkr_e &:& \mbox{\em rel\_expr}s \to
1354:                   ( Rel(\crel) \to 2^{\mathit{Val}(\cx)} ) \\[1ex]
1355: \bbbkl.\bbbkr_s &:& \mbox{\em stmt}s \to
1356:                   ( Rel(\crel) \to Rel(\crel) ) \\
1357: \end{array}
1358: \]
1359: So we define the semantics of an expression as the set of attribute assignments
1360: that satisfy the expression, and the semantics of a statement as a
1361: transformation of the relation assignment. The interpretation functions are
1362: defined inductively in Figure~\ref{fig:sem}.
1363: 
1364: \begin{figure*}%[htb]
1365: \[
1366: \hspace{-5mm}  \begin{array}{rcll}
1367:     \bbbkl \mbox{{\em rel\_var}{\tt (}{\em term}$_1${\tt ,}~$\dots${\tt ,}~{\em term}$_n${\tt )}} \bbbkr_e (s) &=&
1368:       \big\{ v \in Val(\cx) ~\big|~
1369:         \big( v(term_1), \dots, v(term_n) \big) \in s(\mbox{\em rel\_var})
1370:       \big\}\\[1ex]
1371: 
1372:     \bbbkl \mbox{{\tt !}~{\em rel\_expr}} \bbbkr_e (s) &=&
1373:       \mathit{Val}(\cx) \setminus \bbbkl \mbox{{\em rel\_expr}} \bbbkr_e (s) \\[1ex]
1374: 
1375:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt \&}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&
1376:       \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) \cap
1377:       \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\[1ex]
1378: 
1379:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt |}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&
1380:       \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) \cup
1381:       \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\[1ex]
1382: 
1383:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt ->}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&
1384:       \bbbkl \mbox{{\tt !}~{\tt (}{\em rel\_expr$_1$}{\tt )}~{\tt |\,(}{\em rel\_expr$_2$}{\tt )}} \bbbkr_e (s) \\[1ex]
1385: 
1386:     \bbbkl \mbox{{\em rel\_expr$_1$}~{\tt <->}~{\em rel\_expr$_2$}} \bbbkr_e (s) &=&
1387:       \bbbkl \mbox{{\tt (}{\em rel\_expr$_1$}~{\tt ->}~{\em rel\_expr$_2$}{\tt )}~{\tt \&\,}{\tt (}{\em rel\_expr$_2$}~{\tt ->}~{\em rel\_expr$_1$}{\tt )}} \bbbkr_e (s) \\[1ex]
1388: 
1389:     \bbbkl \mbox{{\tt EX(}{\em attribute}{\tt ,}~{\em rel\_expr}{\tt )}} \bbbkr_e (s) &=&
1390:       \big\{ v \in Val(\cx) ~\big|~ \exists v' \in \bbbkl \mbox{{\em rel\_expr}} \bbbkr_e(s)
1391:                                    ~\forall x \in \cx \setminus \{\mbox{{\em attribute}}\}: v(x) = v'(x) \big\}\\[1ex]
1392: 
1393:     \bbbkl \mbox{{\tt FA(}{\em attribute}{\tt ,}~{\em rel\_expr}{\tt )}} \bbbkr_e (s) &=&
1394:       \bbbkl \mbox{{\tt !}~{\tt EX(}{\em attribute}{\tt ,}~{\tt !}~{\tt (}{\em rel\_expr}{\tt ))}} \bbbkr_e (s) \\[1ex]
1395: 
1396:     \bbbkl \mbox{{\tt TC(}{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt ))}} \bbbkr_e (s)
1397:                 &\stackrel{\mbox{\em \footnotesize lfp}}{=}&
1398:       \bbbkl \mbox{~~~~{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt )}} \\
1399:          &&\mbox{~~{\tt | EX(}{\em attribute}$_3${\tt ,}~~~
1400:            {\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_3${\tt )}}\\
1401:          &&\mbox{\hspace{27mm}{\tt \& TC(}{\em rel\_var}{\tt (}{\em attribute}$_3${\tt ,}~{\em attribute}$_2${\tt )))}} \bbbkr_e (s) \\[1ex]
1402: 
1403:     \bbbkl \mbox{{\tt TCFAST(}{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt ))}} \bbbkr_e (s) &=&
1404:       \bbbkl \mbox{{\tt TC(}{\em rel\_var}{\tt (}{\em attribute}$_1${\tt ,}~{\em attribute}$_2${\tt ))}} \bbbkr_e (s) \\[3ex]
1405: 
1406:     \bbbkl \mbox{{\tt TRUE(}{\em term}$_1${\tt ,}~$\dots${\tt ,}~{\em term}$_n${\tt )}} \bbbkr_e (s) &=&
1407:       \mathit{Val}(\cx)\\[1ex]
1408: 
1409:     \bbbkl \mbox{{\tt FALSE(}{\em term}$_1${\tt ,}~$\dots${\tt ,}~{\em term}$_n${\tt )}} \bbbkr_e (s) &=&
1410:       \emptyset \\[1ex]
1411: 
1412:     \bbbkl \mbox{{\tt =(}{\em term}$_1${\tt ,}~{\em term}$_2${\tt )}} \bbbkr_e (s) &=&
1413:       \big\{ v \in Val(\cx) ~|~ v(term_1) = v(term_2) \big\}\\[1ex]
1414: 
1415:     \bbbkl \mbox{{\tt <(}{\em term}$_1${\tt ,}~{\em term}$_2${\tt )}} \bbbkr_e (s) &=&
1416:       \big\{ v \in Val(\cx) ~|~ v(term_1) <_{\mathrm{lexicographically}} v(term_2) \big\}\\[3ex]
1417: 
1418:     \bbbkl \mbox{{\tt =(}{\em rel\_expr}$_1${\tt ,}~{\em rel\_expr}$_2${\tt )}} \bbbkr_e (s) &=&
1419:       \mathit{Val}(\cx) \mbox{,~~~if }
1420:         \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) = \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\
1421:       && \emptyset \mbox{,~~~~~~~~~~~otherwise}\\[1ex]
1422: 
1423:     \bbbkl \mbox{{\tt <(}{\em rel\_expr}$_1${\tt ,}~{\em rel\_expr}$_2${\tt )}} \bbbkr_e (s) &=&
1424:       \mathit{Val}(\cx) \mbox{,~~~if }
1425:         \bbbkl \mbox{{\em rel\_expr$_1$}} \bbbkr_e(s) \subsetneq \bbbkl \mbox{{\em rel\_expr$_2$}} \bbbkr_e(s) \\
1426:       && \emptyset \mbox{,~~~~~~~~~~~otherwise}\\[1ex]
1427: 
1428:     \bbbkl \mbox{{\tt (}{\em rel\_expr}{\tt )}} \bbbkr_e(s) &=&
1429:       \bbbkl \mbox{\em rel\_expr} \bbbkr_e(s)\\[3ex]
1430: 
1431:     \big( \bbbkl \mbox{{\em rel\_var}{\tt (}{\em term}$_1${\tt ,}$\dots${\tt ,}{\em term}$_n${\tt )}{\tt :=}~{\em rel\_expr}} \bbbkr_s (s)\big) (r) &=&
1432:       s(r), \hspace{64mm}\mbox{if } r \not= \mbox{\em rel\_var}\\[1.5ex]
1433:       && \quad \big\{ \big( v(term_1), \dots, v(term_n) \big)~\big|~v \in \bbbkl \mbox{\em rel\_expr} \bbbkr_e(s) \big\} \\[1ex]
1434:       && \cup~\big\{ t \in s(\mbox{\em r})~|~ \exists i : term_i \in \cc \wedge term_i \not= t_i \big\},
1435:          \qquad\quad \mbox{if } r = \mbox{\em rel\_var}
1436:   \end{array}
1437: \]
1438: with $\mbox{\em attribute, attribute$_1$, attribute$_2$, attribute$_3$} \in \cx
1439: \mbox{, \em term$_1$, ..., term$_n$} \in \cx \cup \cc \mbox{, and \em rel\_var}
1440: \in \crel$.  The symbol $\stackrel{\mbox{\em \footnotesize lfp}}{=}$ denotes
1441: the least fixed point.
1442: 
1443: \caption{RML semantics} \label{fig:sem}
1444: \end{figure*}
1445: 
1446: 
1447: 
1448: 
1449: 
1450: \newcommand{\etalchar}[1]{$^{#1}$}
1451: \begin{thebibliography}{NSW{\etalchar{+}}02}
1452: 
1453: \bibitem[AFC98]{AntoniolEtAl:1998}
1454: G.~Antoniol, R.~Fiutem, and L.~Cristoforetti.
1455: \newblock Design pattern recovery in object-oriented software.
1456: \newblock In {\em Proceedings of the 6th IEEE International Workshop on Program
1457:   Comprehension (IWPC 1998)}, pages 153--160. IEEE Computer Society, 1998.
1458: 
1459: \bibitem[BBS97]{BehnkeEtAl:1997}
1460: Ralf Behnke, Rudolf Berghammer, and Peter Schneider.
1461: \newblock Machine support of relational computations: The {Kiel RELVIEW}
1462:   system.
1463: \newblock Technical Report 9711, Institut f\"ur Informatik und Praktische
1464:   Mathematik, Christian-Albrechts-Universit\"at Kiel, 1997.
1465: 
1466: \bibitem[Bla04]{Blaha:2004}
1467: Michael Blaha.
1468: \newblock A copper bullet for software quality improvement.
1469: \newblock {\em Computer}, 37(2):21--25, 2004.
1470: 
1471: \bibitem[BLM02]{BerghammerEtAl:2002}
1472: Rudolf Berghammer, Barbara Leoniuk, and Ulf Milanese.
1473: \newblock Implementation of relational algebra using binary decision diagrams.
1474: \newblock In H.~de~Swart, editor, {\em Proceedings of the 6th International
1475:   Conference on Relational Methods in Computer Science (RelMICS 2001)}, LNCS
1476:   2561, pages 241--257, Berlin, 2002. Springer-Verlag.
1477: 
1478: \bibitem[BLQ{\etalchar{+}}03]{BerndlEtAl:2003}
1479: Marc Berndl, Ondrej Lhot{\'a}k, Feng Qian, Laurie~J. Hendren, and Navindra
1480:   Umanee.
1481: \newblock Points-to analysis using {BDDs}.
1482: \newblock In {\em Proceedings of the ACM SIGPLAN Conference on Programming
1483:   Language Design and Implementation (PLDI 2003)}, pages 103--114. ACM, 2003.
1484: 
1485: \bibitem[BNL03]{BeyerEtAl:2003}
1486: Dirk Beyer, Andreas Noack, and Claus Lewerentz.
1487: \newblock Simple and efficient relational querying of software structures.
1488: \newblock In {\em Proceedings of the 10th Working Conference on Reverse
1489:   Engineering (WCRE 2003)}, pages 216--225. IEEE Computer Society, 2003.
1490: 
1491: \bibitem[Bry86]{Bryant:1986}
1492: Randal~E. Bryant.
1493: \newblock Graph-based algorithms for boolean function manipulation.
1494: \newblock {\em IEEE Transaction on Computers}, C-35(8):677--691, 1986.
1495: 
1496: \bibitem[Bry92]{Bryant:1992}
1497: Randal~E. Bryant.
1498: \newblock Symbolic boolean manipulation with ordered binary decision diagrams.
1499: \newblock {\em ACM Computing Surveys}, 24(3):293--318, 1992.
1500: 
1501: \bibitem[CGK98]{ChenEtAl:1998}
1502: Yih-Farn Chen, Emden~R. Gansner, and Elftherios Koutsofios.
1503: \newblock A {C++} data model supporting reachability analysis and dead code
1504:   detection.
1505: \newblock {\em IEEE Transactions on Software Engineering}, 24(9):682--694,
1506:   1998.
1507: 
1508: \bibitem[Ciu99]{Ciupke:1999}
1509: Oliver Ciupke.
1510: \newblock Automatic detection of design problems in object-oriented
1511:   reengineering.
1512: \newblock In {\em Proceedings of Technology of Object-Oriented Languages and
1513:   Systems (TOOLS 1999)}, pages 18--32. IEEE Computer Society, 1999.
1514: 
1515: \bibitem[CM03]{ClocksinMellish:2003}
1516: William~F. Clocksin and Cristopher~S. Mellish.
1517: \newblock {\em Programming in {Prolog}}.
1518: \newblock Springer-Verlag, Berlin, 5th edition, 2003.
1519: 
1520: \bibitem[FH00]{FahmyHolt:2000}
1521: Hoda Fahmy and Richard~C. Holt.
1522: \newblock Software architecture transformations.
1523: \newblock In {\em Proceedings of the International Conference on Software
1524:   Maintenance (ICSM 2000)}, pages 88--96. IEEE Computer Society, 2000.
1525: 
1526: \bibitem[FKvO98]{FeijsEtAl:1998}
1527: Loe M.~G. Feijs, Ren\'e~L. Krikhaar, and Rob~C. van Ommering.
1528: \newblock A relational approach to support software architecture analysis.
1529: \newblock {\em Software: Practice \& Experience}, 28(4):371--400, 1998.
1530: 
1531: \bibitem[GHJV95]{GammaEtAl:1995}
1532: Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides.
1533: \newblock {\em Design Patterns}.
1534: \newblock Addison-Wesley, Reading, MA, 1995.
1535: 
1536: \bibitem[Har91]{Hartman:1991}
1537: John Hartman.
1538: \newblock Understanding natural programs using proper decomposition.
1539: \newblock In {\em Proceedings of the 13th International Conference on Software
1540:   Engineering (ICSE 1991)}, pages 62--73, 1991.
1541: 
1542: \bibitem[HN90]{HarandiNing:1990}
1543: Mehdi~T. Harandi and Jim~Q. Ning.
1544: \newblock Knowledge-based program analysis.
1545: \newblock {\em IEEE Software}, 7(1):74--81, 1990.
1546: 
1547: \bibitem[Hol96]{Holt:1996}
1548: Richard~C. Holt.
1549: \newblock Binary relational algebra applied to software architecture.
1550: \newblock Technical Report CSRI 345, University of Toronto, 1996.
1551: 
1552: \bibitem[Hol98]{Holt:1998}
1553: Richard~C. Holt.
1554: \newblock Structural manipulations of software architecture using {Tarski}
1555:   relational algebra.
1556: \newblock In {\em Proceedings of the 5th Working Conference on Reverse
1557:   Engineering (WCRE 1998)}, pages 210--219. IEEE Computer Society, 1998.
1558: 
1559: \bibitem[IEE01]{IEEE:2001}
1560: IEEE.
1561: \newblock {\em IEEE Standard for Information Technology -- Portable Operating
1562:   System Interface (POSIX) (IEEE Std 1003.1-2001)}, 2001.
1563: 
1564: \bibitem[KB98]{KazmanBurth:1998}
1565: Rick Kazman and Marcus Burth.
1566: \newblock Assessing architectural complexity.
1567: \newblock In {\em Proceedings of the 2nd Euromicro Conference on Software
1568:   Maintenance and Reengineering (CSMR 1998)}, pages 104--112, 1998.
1569: 
1570: \bibitem[KP96]{KraemerPrechelt:1996}
1571: Christian Kr\"amer and Lutz Prechelt.
1572: \newblock Design recovery by automated search for structural design patterns in
1573:   object-oriented software.
1574: \newblock In {\em Proceedings of the 3rd Working Conference on Reverse
1575:   Engineering (WCRE 1996)}, pages 208--215. IEEE Computer Society, 1996.
1576: 
1577: \bibitem[Kri01]{Krinke:2001}
1578: Jens Krinke.
1579: \newblock Identifying similar code with program dependence graphs.
1580: \newblock In {\em Proceedings of the 8th Working Conference on Reverse
1581:   Engineering (WCRE 2001)}, pages 301--309. IEEE Computer Society, 2001.
1582: 
1583: \bibitem[KSRP99]{KellerEtAl:1999}
1584: Rudolf~K. Keller, Reinhard Schauer, S{\'e}bastien Robitaille, and Patrick
1585:   Pag{\'e}.
1586: \newblock Pattern-based reverse-engineering of design components.
1587: \newblock In {\em Proceedings of the 21st International Conference on Software
1588:   Engineering (ICSE 1999)}, pages 226--235. ACM, 1999.
1589: 
1590: \bibitem[KW99]{KullbachWinter:1999}
1591: Bernt Kullbach and Andreas Winter.
1592: \newblock Querying as an enabling technology in software reengineering.
1593: \newblock In {\em Proceedings of the 3rd European Conference on Software
1594:   Maintenance and Reengineering (CSMR 1999)}, pages 42--50, 1999.
1595: 
1596: \bibitem[Mar97]{Martin:1997}
1597: Robert~C. Martin.
1598: \newblock Engineering notebook: Stability.
1599: \newblock {\em C++ Report}, February 1997.
1600: 
1601: \bibitem[MNS01]{MurphyEtAl:2001}
1602: Gail~C. Murphy, David Notkin, and Kevin~J. Sullivan.
1603: \newblock Software reflexion models: Bridging the gap between design and
1604:   implementation.
1605: \newblock {\em IEEE Transactions on Software Engineering}, 27(4):364--380,
1606:   2001.
1607: 
1608: \bibitem[MS95]{MendelzonSametinger:1995}
1609: Alberto~O. Mendelzon and Johannes Sametinger.
1610: \newblock Reverse engineering by visualizing and querying.
1611: \newblock {\em Software -- Concepts \& Tools}, 16(4):170--182, 1995.
1612: 
1613: \bibitem[MWD99]{MensWuyts:1999}
1614: Kim Mens, Roel Wuyts, and Theo D'Hondt.
1615: \newblock Declarative codifying software architectures using virtual software
1616:   classifications.
1617: \newblock In {\em Proceedings of the 29th International Conference on
1618:   Technology of Object-Oriented Languages and Systems - Europe (TOOLS 1999)},
1619:   pages 33--45. IEEE Computer Society, 1999.
1620: 
1621: \bibitem[NSW{\etalchar{+}}02]{NiereEtAl:2002}
1622: J\"org Niere, Wilhelm Sch\"afer, J\"org~P. Wadsack, Lothar Wendehals, and Jim
1623:   Welsh.
1624: \newblock Towards pattern-based design recovery.
1625: \newblock In {\em Proceedings of the 24th International Conference on Software
1626:   Engineering (ICSE 2002)}, pages 338--348. ACM, 2002.
1627: 
1628: \bibitem[Qui94]{Quilici:1994}
1629: Alex Quilici.
1630: \newblock A memory-based approach to recognizing programming plans.
1631: \newblock {\em Communications of the ACM}, 37(5):84--93, 1994.
1632: 
1633: \bibitem[RW90]{RichWills:1990}
1634: Charles Rich and Linda~M. Wills.
1635: \newblock Recognizing a program's design: A graph-parsing approach.
1636: \newblock {\em IEEE Software}, 7(1):82--89, 1990.
1637: 
1638: \bibitem[SMB96]{ShullEtAl:1996}
1639: Forrest Shull, Walc{\'e}lio~L. Melo, and Victor~R. Basili.
1640: \newblock An inductive method for discovering design patterns from
1641:   object-oriented software systems.
1642: \newblock Technical Report CS-TR-3597, Computer Science Department, University
1643:   of Maryland, 1996.
1644: 
1645: \bibitem[SSC96]{SefikaEtAl:1996a}
1646: Mohlalefi Sefika, Aamod Sane, and Roy~H. Campbell.
1647: \newblock Monitoring compliance of a software system with its high-level design
1648:   models.
1649: \newblock In {\em Proceedings of the 18th International Conference on Software
1650:   Engineering (ICSE 1996)}, pages 387--396. IEEE Computer Society, 1996.
1651: 
1652: \bibitem[TA99]{TonellaAntoniol:1999}
1653: Paolo Tonella and Giulio Antoniol.
1654: \newblock Object oriented design pattern inference.
1655: \newblock In {\em Proceedings of the International Conference on Software
1656:   Maintenance (ICSM 1999)}, pages 230--238. IEEE Computer Society, 1999.
1657: 
1658: \bibitem[WHH02]{WuEtAl:2002}
1659: Jingwei Wu, Ahmed~E. Hassan, and Richard~C. Holt.
1660: \newblock Using graph patterns to extract scenarios.
1661: \newblock In {\em Proceedings of the 10th International Workshop on Program
1662:   Comprehension (IWPC 2002)}, pages 239--247. IEEE Computer Society, 2002.
1663: 
1664: \bibitem[Won98]{Wong:1998}
1665: Kenny Wong.
1666: \newblock {\em Rigi User's Manual, Version 5.4.4}, 1998.
1667: \newblock http://ftp.rigi.csc.uvic.ca/pub/rigi/doc/.
1668: 
1669: \end{thebibliography}
1670: 
1671: \end{document}
1672: