cs0603028/AI.tex
1: \documentclass[11pt]{article}
2: 
3: \usepackage{alltt}
4: \usepackage{graphics}
5: \usepackage{latexsym}
6: \usepackage{textcomp}
7: \usepackage{amssymb}
8: \usepackage{amsmath}
9: \usepackage{amsthm}
10: \usepackage{pst-tree}
11: \psset{levelsep=5ex,nodesep=3pt,radius=1ex,fillcolor=black,hatchsep=1.5pt}
12: \psset{tnpos=r}
13: 
14: \renewcommand{\leq}{\leqslant}
15: \renewcommand{\geq}{\geqslant}
16: \renewcommand{\emptyset}{\varnothing}
17: \newcommand{\aco}{\texttt{\symbol{'173}}}
18: \newcommand{\act}{\texttt{\symbol{'175}}}
19: 
20: \newtheorem{theorem}{Theorem}
21: \theoremstyle{definition}
22: \newtheorem{definition}{Definition}
23: \theoremstyle{remark}
24: \newtheorem{remark}{Remark}[section]
25: 
26: \title{On the tree-transformation power of XSLT}
27: 
28: \author{Wim Janssen \and
29: Alexandr Korlyukov$^{\text{\textdied}}$
30: \and
31: Jan Van den Bussche\thanks{Wim Janssen and Jan Van den Bussche are with the University of Hasselt, Belgium.}}
32: 
33: \date{}
34: 
35: \begin{document}
36: 	\maketitle
37: 	\renewcommand{\thefootnote}{}
38: 		\footnotetext{Alexandr Korlyukov, who was with Grodno State University,
39: Belarus, sadly passed away shortly after we agreed to write a joint paper.}
40: 	\renewcommand{\thefootnote}{\arabic{footnote}}
41: 	
42: 		
43: 		\begin{abstract}
44: 			XSLT is a standard rule-based programming language for expressing
45: 			transformations of XML data.  The language is currently in transition from
46: 			version 1.0 to 2.0.  In order to understand the computational consequences of
47: 			this transition, we restrict XSLT to its pure tree-transformation
48: 			capabilities.  Under this focus, we observe that XSLT~1.0 was not yet a
49: 			computationally complete tree-transformation language: every 1.0 program can
50: 			be implemented in exponential time.
51: 			A crucial new feature of version~2.0,
52: 			however, which allows node sets over temporary trees, yields completeness.  We
53: 			provide a formal operational semantics for XSLT programs, and establish
54: 			confluence for this semantics.		
55: 		\end{abstract}		
56: 	
57: 	
58: 	
59: 
60: 
61: 		\section{Introduction}
62: 		
63: 			XSLT is a powerful rule-based programming language, relatively widely used,
64: 			for expressing transformations of XML data, and is developed by the W3C (World
65: 			Wide Web Consortium) \cite{xslt,xslt2,xslt_kay}.  An XSLT program is run on
66: 			an XML document as input, and produces another XML document as output.  (XSLT
67: 			programs are actually called ``stylesheets'', as one of their main uses is to
68: 			produce stylised renderings of the input data, but we will continue to call
69: 			them programs here.)
70: 			
71: 			The language is actually in a transition period: the current standard,
72: 			version~1.0, is being replaced by version~2.0.  It is important to understand
73: 			what the new features of 2.0 really add.  In the present paper, we focus on
74: 			the tree-transformation capabilities of XSLT\@.  Indeed, XML documents are
75: 			essentially ordered, node-labeled trees.
76: 			
77: 			From the perspective of tree-transformation capabilities, the most important
78: 			new feature is that of ``node sets over temporary trees''.  We will show that
79: 			this feature turns XSLT into a computationally complete tree-transformation
80: 			language.  Indeed, as we will also show, XSLT~1.0 was \emph{not} yet complete
81: 			in this sense.  Specifically, any 1.0 program can be implemented within
82: 			exponential time in the worst case.  Some programs actually express
83: 			PSPACE-complete problems, because we will show that any linear-space turing
84: 			machine can be simulated by an XSLT~1.0 program.
85: 			
86: 			To put our results in context, we note that the designers of XSLT will most
87: 			probably regard the incompleteness of their language as a feature, rather than
88: 			a defect.  Indeed, in the requirements document for 2.0, turning XSLT into a
89: 			general-purpose programming language is explicitly stated as a ``non-goal''
90: 			\cite{xsltreq}.  In that respect, our result on the completeness of 2.0
91: 			exposes (albeit in a narrow sense) a failure to meet the requirements!
92: 			
93: 			At this point we should be a little clearer on what we mean by ``focusing on
94: 			the tree-transformation capabilities of XSLT''\@.  As already mentioned, XML
95: 			documents are essentially trees where the nodes are labeled by arbitrary
96: 			strings.  We make abstraction of this string content by regarding the node
97: 			labels as coming from some finite alphabet.  Accordingly, we strip XSLT of its
98: 			string-manipulation functions, and restrict its arithmetic to arbitrary
99: 			polynomial-time functions on counters, i.e., integers in the range
100: 			$\{1,2,\dots,n\}$ with $n$ the number of nodes in the input tree.  It is,
101: 			incidentally, quite easy to see that XSLT~1.0
102: 			\emph{without} these restrictions
103: 			can express all computable functions on strings (or integers).  Indeed, rules
104: 			in XSLT can be called recursively, and we all know that arbitrary recursion
105: 			over the strings or the integers gives us completeness.
106: 			
107: 			We will provide a formal operational semantics for the substantial fragment of
108: 			XSLT discussed in this paper.  A formal semantics has not been available,
109: 			although the W3C specifications represent a fine effort in defining it
110: 			informally.  Of course we have tried to make our formalisation faithful to
111: 			those specifications.  Our semantics does not impose an order on operations
112: 			when there is no need to, and as a result the resulting transition relation is
113: 			non-deterministic.  We establish, however, a confluence property, so that any
114: 			two terminating runs on the same input yield the same final result.
115: 			Confluence was not yet proven rigorously for XSLT, and can help in providing a
116: 			formal justification for alternative processing strategies that XSLT
117: 			implementations may follow for the sake of optimisation.
118: 
119: 
120: 			\section{Data model}
121: 			
122: 			\newcommand{\X}{\mathcal{X}}
123: 			\newcommand{\T}{\mathcal{T}}
124: 			\newcommand{\V}{\mathcal{V}}
125: 			\renewcommand{\t}{\mathbf{t}}
126: 			\renewcommand{\S}{\mathbf{S}}
127: 			\newcommand{\E}{\mathbf{E}}
128: 			\newcommand{\n}{\mathbf{n}}
129: 			\newcommand{\Input}{\textsf{Input}}
130: 			\newcommand{\nodes}{\textsf{nodes}}
131: 			\newcommand{\mixed}{\textsf{mixed}}
132: 			\newcommand{\eval}{\mathit{eval}}
133: 			\newcommand{\tstring}{\mathit{string}}
134: 			\newcommand{\maketree}{\mathit{maketree}}
135: 			\newcommand{\tta}{\texttt{a}}
136: 			\newcommand{\ttb}{\texttt{b}}
137: 			\newcommand{\ttc}{\texttt{c}}
138: 			\newcommand{\ttdoc}{\texttt{doc}}
139: 			
140: 			\subsection{Data trees} \label{secdatatrees}
141: 				
142: 				Let $\Sigma$ be a finite alphabet, including the special label \ttdoc.
143: 				By a \emph{data tree} we simply mean a
144: 				finite ordered tree, in which the nodes are labeled by elements of $\Sigma$.
145: 				Up to isomorphism, we can describe a data tree
146: 				$\t$ by a string $\tstring(\t)$
147: 				over the alphabet $\Sigma$ extended with the two symbols \aco\ and \act: if
148: 				the root of $\t$ is labeled \tta\
149: 				and its sequence of top-level subtrees is
150: 				$\t_1,\dots,\t_k$, then $$ \tstring(\t) = \tta \aco\tstring(\t_1)\dots
151: 				\tstring(\t_k) \act $$
152: 				Thus, for the data tree shown in Figure~\ref{figdatatree}, the string
153: 				representation equals
154: 				$$ \tta \aco \ttb \aco \act \ttc \aco \tta \aco \act \ttb \aco \act \act \ttc
155: 				\aco \act \act. $$
156: 				
157: 				\begin{figure}
158: 					\begin{center}
159: 						\pstree{\TR{$\tta$}}
160: 						{
161: 						  \TR{$\ttb$}
162: 						  \pstree{\TR{$\ttc$}}
163: 						  {
164: 						    \TR{$\tta$}
165: 						    \TR{$\ttb$}
166: 						  }
167: 						  \TR{$\ttc$}
168: 						}
169: 					\end{center}
170: 			%		\nocaptionrule
171: 					\caption{A data tree.}
172: 					\label{figdatatree}
173: 				\end{figure}
174: 				
175: 				A \emph{data forest} is a finite sequence of data trees.  Forests arise
176: 				naturally in XSLT, and for uniformity reasons we need to be able to
177: 				present them as data trees.  This can easily be done as follows:
178: 				
179: 				\begin{definition}[maketree] \label{defmaketree}
180: 					Let $F$ be a data forest.  Then $\maketree(F)$ is
181: 					the data tree obtained by affixing a root node on top of $F$,
182: 					and labeling this root node with \ttdoc.\footnote{The root node added by
183: 					$\maketree$ models what is called the ``document root'' in the XPath data
184: 					model \cite{xquerypath_datamodel}, although we do not model it entirely
185: 					faithfully, as we do not formally distinguish ``document nodes'' from
186: 					``element nodes''.  This is only for simplicity; it is no problem to
187: 					incorporate this distinction in our formalism, and our technical results do
188: 					not depend on our simplification.}
189: 				\end{definition}
190: 				
191: 			\subsection{Stores and values}
192: 				
193: 				Let $\T$ be a supply of \emph{tree variables},
194: 				including the special tree variable \Input.  We define:
195: 				
196: 				\begin{definition}
197: 					A \emph{store} is a finite set $\S$ of pairs of the form
198: 					$(x,\t)$, where $x \in \T$ and $\t$ is a data tree, such that
199: 					(1)~\Input\ occurs in $\S$;
200: 					(2)~no tree variable occurs twice in $\S$; and
201: 					(3)~all data trees occurring in $\S$ have disjoint sets of nodes.
202: 					
203: 					The tree assigned to \Input\ is called the \emph{input tree}; the
204: 					other trees are called the \emph{temporary trees}.
205: 				\end{definition}
206: 				
207: 				\begin{definition}
208: 					A \emph{value over $\S$} is a finite sequence consisting
209: 					of nodes from trees in $\S$, and counters
210: 					over $\S$.  Here, a \emph{counter over $\S$} is an integer in the range
211: 					$\{1,2,\dots,n\}$, where $n$ is the total number of nodes in $\S$.
212: 				\end{definition}
213: 				
214: 				Values as defined above formalise the kind of values that can be returned by
215: 				XPath expressions.  XPath \cite{xpath,xpath2} is a language that is used as a
216: 				sublanguage in XSLT for the purpose of selecting nodes from trees.  But XPath
217: 				expressions can also return numbers, which is useful as an aid in making
218: 				node selections (e.g., the $i$-th child of a node, or the $i$-th
219: 				node of the tree in preorder).  We limit these numbers to counters, in order
220: 				to concentrate on pure tree transformations.
221: 
222: 		
223: 		\section{XPath abstraction}
224: 			
225: 			\newcommand{\rootexp}{\texttt{/*}}
226: 			\newcommand{\childexp}{\texttt{child::*}}
227: 			
228: 			Since the language XPath is already well understood
229: 			\cite{wadler_xpath,xpath_nutshell,xquery_formal,hidders_light}, and its study
230: 			in itself is not our focus, we will work with an abstraction of XPath, which
231: 			we denote by $\X$.  For our purposes it will suffice to divide the
232: 			$\X$-expressions in only two different types, which we denote by \nodes\ and
233: 			\mixed.  A value is of type \nodes\ if it consists exclusively of
234: 			nodes; otherwise it is of type \mixed.
235: 			
236: 			In order to define the semantics of $\X$, we need some definitions, which
237: 			reflect those from the XPath specification.
238: 			Let $\V$ be a supply of \emph{value variables}, disjoint from $\T$.
239: 			
240: 			\begin{definition}
241: 				An \emph{environment over $\S$}
242: 				is a finite set $\E$ of pairs of the form
243: 				$(x,v)$, where $x \in \V$ and $v$ is a value over $\S$, such that no value
244: 				variable occurs twice in $\E$.
245: 			\end{definition}
246: 			
247: 			\begin{definition}
248: 				A \emph{context triple over $\S$} is a triple $(z,i,k)$ where $z$ is a node
249: 				from $\S$ or a counter over $\S$,
250: 				and $i$ and $k$ are counters over $\S$ such that $i \leq k$.
251: 				We call $z$ the \emph{context item}, $i$ the \emph{context position}, and $k$
252: 				the \emph{context size}.
253: 			\end{definition}
254: 			
255: 			\begin{definition}
256: 				A \emph{context} is a triple $(\S,\E,c)$ where $\S$ is a store, $\E$ is an
257: 				environment over $\S$, and $c$ is a context triple over $\S$.
258: 			\end{definition}
259: 			
260: 			If we denote the universe of all possible contexts by \textit{Contexts}, the
261: 			semantics of $\X$ is now given by a partial function $\eval$ on $\X \times
262: 			\mathit{Contexts}$, such that whenever defined, $\eval(e,C)$ is a value over
263: 			$C$'s store, and this value has the same type as $e$.
264: 			
265: 			\begin{remark} \label{xmlschema}
266: 				A static type
267: 				system, based on XML Schema \cite{xmlschema,essencexml}, can be put on
268: 				contexts to ensure definedness of expressions \cite{xquery_formal}, but we
269: 				omit that as safety is not the focus of the present paper.
270: 				\qed
271: 			\end{remark}
272: 			
273: 			In general we do not assume much from $\X$, except for the
274: 			availability of the
275: 			following basic expressions, also present in real XPath:
276: 			\begin{itemize}
277: 				\item
278: 					An expression `\rootexp', such that $\eval(\rootexp,C)$ equals
279: 					the root node of the input tree in $C$'s store.
280: 				\item
281: 					An expression `\childexp', such that $\eval(\childexp,\allowbreak C)$
282: 					is defined whenever $C$'s context item is a node $\n$, and then
283: 					equals the list of children of $\n$.
284: 			\end{itemize}
285: 			
286: 			
287: 		
288: 		\section{Syntax} \label{secsyntax}
289: 			
290: 			In this section, we define the syntax of a sizeable fragment of XSLT~2.0.  The
291: 			reader familiar with XSLT will notice that we have simplified and cleaned up
292: 			the language in a few places.  These modifications are only for the sake of
293: 			simplicity of exposition, and our technical results do not depend on them.  We
294: 			discuss our deviations from the real language further in
295: 			Section~\ref{deviations}.
296: 			
297: 			Also, the concrete syntax of real
298: 			XSLT is XML-based and rather unwieldy.  For the sake of presentation, we
299: 			therefore give a syntax of our own, which is non-XML, but otherwise follows the
300: 			same lines as the real syntax.
301: 			
302: 			The grammar is shown in Figure~\ref{figgrammar}.  The only typing condition we
303: 			need is that in an apply-statement or in a vcopy-statement, \textit{expr} must
304: 			be of type \nodes.  Also, no two different rules can have the same
305: 			\textit{name}, and the \textit{name} in a call-statement must be the
306: 			\textit{name} of some rule.
307: 						
308: 			\begin{figure}
309: 				\begin{tabbing}
310: 					Program $\to$ Rule* \\
311: 					Rule $\to$ \=\textsf{template} \textit{name} \textsf{match} \textit{expr} (\textsf{mode} \textit{name})? 
312: 																\aco\ Template \act \\
313: 					Template $\to$ Statement* \\
314: 					Statement $\to$ \=$|$ \=\textsf{cons}
315:                                         \textit{label} \aco\ Template
316:                                         \act\kill
317: 					Statement $\to$ \>\> \textsf{cons} \textit{label} \aco\ Template \act\\
318: 						\> $|$ \textsf{apply} \textit{expr} (\textsf{mode} \textit{name})?\\
319: 						\> $|$ \textsf{call} \textit{name}\\
320: 						\> $|$ \textsf{foreach} \textit{expr} \aco\ Template \act\\
321: 						\> $|$ \textsf{val} \textit{value\_variable} \textit{expr}\\
322: 						\> $|$ \textsf{tree} \textit{tree\_variable} \aco\ Template \act\\
323: 						\> $|$ \textsf{vcopy} \textit{expr}\\
324: 						\> $|$ \textsf{tcopy} \textit{tree\_variable}\\
325: 						\> $|$ \textsf{if} \textit{expr} \aco\ Template \act\ \textsf{else} \aco\ Template \act 
326: 					
327: 				\end{tabbing}
328: %				\nocaptionrule
329: 				\caption{Our syntax.
330: 				The terminal symbol \textit{expr} stands for an
331: 				$\X$-expression; \textit{label} stands for an element of our alphabet
332: 				$\Sigma$; \textit{value\_variable} and \textit{tree\_variable} stand for
333: 				elements of $\V$ and $\T$, respectively; and \textit{name} is
334: 				self-explanatory.  As usual we use * to denote repetition, ? to denote
335: 				optionality, and use (~and ) for lexical grouping.}
336: 				\label{figgrammar}
337: 			\end{figure}
338: 			
339: 			We will often identify a template $M$ with its \emph{syntax tree}.  This tree
340: 			consists of all occurrences of statements in $M$ and represents how they
341: 			follow each other and how they are nested in each other; we omit the formal
342: 			definition.  Observe that only cons-, foreach-, tree-, and if-statements can
343: 			have children.
344: 			Note also that, since a
345: 			template is a sequence of statements, the syntax ``tree'' is actually a
346: 			forest, i.e., a sequence of trees, but we will still call it a tree.
347: 			
348: 			Variable definitions happen through val- and tree-statements.
349: 			We will need the notion of a statement being in the scope of some variable
350: 			definition; this is defined in the standard way as follows.
351: 			
352: 			\begin{definition}
353: 				Let $M$ be a template, and let $S_1$ and $S_2$ be two
354: 				statements occurring in $M$.  We say that $S_2$ is \emph{in the scope of}
355: 				$S_1$ if $S_2$ is a right sibling of $S_1$
356: 				in the syntax tree of $M$, or a descendant of such a right sibling.
357: 				An illustration is in Figure~\ref{figscope}.
358: 			\end{definition}
359: 			
360: 			\begin{figure}
361: 				\begin{center}
362: 					\pstree{\TC}
363: 					{
364: 					  \pstree{\TC}
365: 					  {
366: 					    \TC
367: 					    \pstree{\TC[fillstyle=solid]}
368: 					    {
369: 					      \TC
370: 					      \TC
371: 					    }
372: 					    \TC[fillstyle=vlines]
373: 					    \pstree{\TC[fillstyle=vlines]}
374: 					    {
375: 					      \TC[fillstyle=vlines]
376: 					      \TC[fillstyle=vlines]
377: 					    }
378: 					  }
379: 					  \TC
380: 					  \pstree{\TC}
381: 					  {
382: 					    \TC
383: 					    \TC
384: 					  }
385: 					}
386: 				\end{center}
387: 	%			\nocaptionrule
388: 				\caption{Depiction of a syntax tree.  The nodes in the scope of the black node
389: 				are those that are striped.}
390: 				\label{figscope}
391: 			\end{figure}
392: 			
393: 			One final definition:
394: 			
395: 			\begin{definition}
396: 				Template $M'$ is called a \emph{subtemplate} of template $M$ if $M'$ consists
397: 				of a sequence of consecutive sibling statements occurring in  $M$.
398: 			\end{definition}
399: 
400: 
401: \section{Operational semantics} \label{secsemantics}
402: 
403: \newcommand{\context}{\textit{context}}
404: \newcommand{\confrepl}[2]{(#1\gets #2)}
405: 
406: Fix a program $P$ and a data tree $\t$.  We will describe the semantics of $P$
407: on input $\t$ as a rewrite relation $\Rightarrow$ among configurations.
408: 
409: \begin{definition} \label{defconfig}
410: A \emph{configuration} consists of a template $M$
411: together with a partial function that assigns a context to some of
412: the statements of $M$ (more precisely, the nodes of its syntax tree).
413: The statements that have a context are called \emph{active}; we require that
414: the descendants of an inactive node are inactive too.
415: Cons-statements are never active.
416: 
417: We use the following notation concerning configurations:
418: \begin{itemize}
419: \item
420: $S \lhd \gamma$ denotes that $S$ is a statement occurring in the template of
421: configuration $\gamma$.
422: \item
423: If $S \lhd \gamma$, then
424: $\gamma(S) = C$ denotes that $S$ is active in $\gamma$, having context $C$.
425: \item
426: If $M$ is a subtemplate of a configuration $\gamma$, then $M$
427: itself can be taken as a configuration by inheriting all the context
428: assignments done by $\gamma$.  We call such a configuration a
429: \emph{subconfiguration}.
430: \item
431: If $M$ is a subconfiguration of $\gamma$, and $\gamma'$ is another
432: configuration, then
433: $\gamma\confrepl{M}{\gamma'}$
434: denotes the configuration obtained from $\gamma$ by
435: replacing $M$ by $\gamma'$.
436: \end{itemize}
437: \end{definition}
438: 
439: The initial configuration is defined as follows.
440: 
441: \begin{definition}
442: \begin{enumerate}
443: \item
444: The \emph{initial context} equals
445: $$ \bigl ( 
446: \{(\Input,\t)\},\,\emptyset,\,(\mathbf{r},1,1)
447: \bigr ) $$
448: where $\mathbf{r}$ is the root of $\t$.
449: \item
450: The \emph{initial template} equals the single statement `\textsf{apply}
451: \rootexp'.
452: \item
453: The \emph{initial configuration} consists of the initial template, whose
454: single statement is assigned the initial context.
455: \end{enumerate}
456: \end{definition}
457: 
458: The goal will be to rewrite the initial configuration into a \emph{terminal
459: template}; this is a configuration consisting exclusively of cons-statements.
460: Observe that terminal templates can be viewed as data forests; indeed, simply
461: by removing the \textsf{cons}'s from a terminal template, we obtain the string
462: representation of a data forest.  
463: 
464: For the rewrite relation $\Rightarrow$ we are going to define, terminal
465: configurations will be normal forms, i.e., cannot be rewritten further.  If,
466: for two configurations $\gamma_0$ and $\gamma_1$, we have $\gamma_0
467: \Rightarrow \cdots \Rightarrow \gamma_1$ and $\gamma_1$ is a normal form, we
468: denote that by $\gamma_0 \Rightarrow^! \gamma_1$.  The relation $\Rightarrow$
469: will be defined in such a way that if $\gamma_0$ is the initial configuration
470: and $\gamma_0 \Rightarrow^! \gamma_1$, then $\gamma_1$ will be terminal.
471: Moreover, we will prove in Theorem~\ref{confluence} that each configuration
472: $\gamma_0$ has at most one such normal form $\gamma_1$.  We thus define:
473: 
474: \begin{definition} \label{deffinal}
475: Given $P$ and $\t$, let $\gamma_0$ be the initial configuration and let
476: $\gamma_0 \Rightarrow^! \gamma_1$.  Then the \emph{final result
477: tree of applying $P$ to $\t$} is defined to be $\maketree(\gamma_1)$.
478: \end{definition}
479: 
480: In the above definition, we can indeed apply $\maketree$, defined on data
481: forests (Definition~\ref{defmaketree}),
482: to $\gamma_1$, since $\gamma_1$ is terminal and we just observed that
483: terminal templates describe forests.
484: Note that the final result tree is only determined up to isomorphism.
485: 
486: \subsection{If-statements}
487: 
488: \newcommand{\transif}{\stackrel{\mathsf{if}}{\Rightarrow}}
489: \newcommand{\normif}{\mathrel{\transif{}^!}}
490: 
491: If-statements are the only ones that generate control flow, so we treat
492: them by a separate rewrite relation $\transif$, defined
493: by the semantic rules shown in Figure~\ref{figif}.
494: 
495: \begin{figure}
496: \begin{itemize}
497: \item
498: $
499: \begin{array}[t]{l}
500: S = \text{\textsf{if} $e$ \aco\ $M_{\rm true}$ \act\
501: \textsf{else} \aco\ $M_{\rm false}$ \act} \lhd \gamma \\
502: \gamma(S)=C \\
503: \eval(e,C) \neq \emptyset \\
504: \hline
505: \gamma \transif \gamma\confrepl{S}{M_{\rm true}}
506: \end{array}
507: $
508: \medskip
509: \item
510: $
511: \begin{array}[t]{l}
512: S = \text{\textsf{if} $e$ \aco\ $M_{\rm true}$ \act\
513: \textsf{else} \aco\ $M_{\rm false}$ \act} \lhd \gamma \\
514: \gamma(S)=C \\
515: \eval(e,C) = \emptyset \\
516: \hline
517: \gamma \transif \gamma\confrepl{S}{M_{\rm false}}
518: \end{array}
519: $
520: \end{itemize}
521: %\nocaptionrule
522: \caption{Semantics of if-statements; $\emptyset$ denotes the empty sequence.}
523: \label{figif}
524: \end{figure}
525: 
526: It is not difficult to show that $\transif$ is terminating and
527: locally confluent, whence confluent,
528: so that every configuration has a unique normal form w.r.t.\
529: $\transif$ \cite{terese}.  This normal form no longer contains any active
530: if-statements.  (Quite obviously, the most efficient way to get to this normal 
531: form is to work out the if-statements top-down.)
532: We write $\gamma \normif \gamma'$ to denote that $\gamma'$ is the
533: normal form of $\gamma$ w.r.t.\ $\transif$.
534: 
535: \begin{remark}
536: Our main rewrite relation $\Rightarrow$ is not terminating in general.
537: The reason why we treat if-statements separately is to avoid 
538: nonsensical rewritings such as
539: where we execute a non-terminating statement
540: in the else-branch of an if-statement whose test 
541: evaluates to true.
542: \end{remark}
543: 
544: \subsection{Apply-, call-, and foreach-statements}
545: 
546: \newcommand{\init}{\mathit{init}}
547: 
548: For the semantics of apply-statements, we need the following definitions.
549: 
550: \begin{definition}[ruletoapply] \label{defruletoapply}
551: Let $C$ be a context, let
552: $\n$ be a node, and let $m$ be a name.
553: Then $\mathit{ruletoapply}(C,\n)$ (respectively,
554: $\mathit{ruletoapply}(C,\n,m)$)
555: equals the template belonging to the
556: first rule in $P$ (respectively, with \textsf{mode} name equal to $m$)
557: whose \textit{expr} satisfies 
558: $\n \in \eval(\textit{expr},C)$.
559: 
560: If no such rule exists, both $\mathit{ruletoapply}(C,\n)$
561: and $\mathit{ruletoapply}(C,\n,m)$ default to the
562: single-statement template `\textsf{apply} \childexp'.
563: \end{definition}
564: 
565: \begin{definition}[init] \label{definit}
566: Let $M$ be a template, and let $C$ be a context.  Then $\init(M,C)$
567: equals the configuration obtained from $M$ by assigning context $C$ to every
568: statement in $M$, except for all statements in the scope of any variable
569: definition, and all statements that are below a foreach-statement;
570: all those statements remain inactive.
571: \end{definition}
572: 
573: We are now ready for the semantic rule for apply-statements, shown in
574: Figure~\ref{figapply}.  We omit the rule for an apply-statement with a
575: \textsf{mode $m$}: the only difference with the rule shown
576: is that we use $\mathit{ruletoapply}(\n_i,C,m)$.
577: 
578: \begin{figure}
579: \begin{itemize}
580: \item
581: $
582: \begin{array}[t]{l}
583: S = \text{\textsf{apply $e$}} \lhd \gamma \\
584: \gamma(S) = C = (\S,\E,c) \\
585: \eval(e,C) = (\n_1,\dots,\n_k) \\
586: \mathit{ruletoapply}(\n_i,C) = M_i \quad \text{for $i=1,\dots,k$} \\
587: \init(M_i,(\S,\E,(\n_i,i,k))) = \gamma_i \quad \text{for $i=1,\dots,k$} \\
588: \gamma\confrepl{S}{\gamma_1\dots\gamma_k} \normif \gamma' \\
589: \hline
590: \gamma \Rightarrow \gamma'
591: \end{array}
592: $
593: \medskip
594: \item
595: $
596: \begin{array}[t]{l}
597: S = \text{\textsf{foreach} $e$ \aco\ $M$ \act} \lhd \gamma \\
598: \gamma(S) = C = (\S,\E,c) \\
599: \eval(e,C) = (z_1,\dots,z_k) \\
600: \init(M,(\S,\E,(z_i,i,k))) = \gamma_i \quad \text{for $i=1,\dots,k$} \\
601: \gamma\confrepl{S}{\gamma_1\dots\gamma_k} \normif \gamma' \\
602: \hline
603: \gamma \Rightarrow \gamma'
604: \end{array}
605: $
606: \medskip
607: \item
608: $
609: \begin{array}[t]{l}
610: S = \text{\textsf{call} \textit{name}} \lhd \gamma \\
611: \gamma(S) = C \\
612: \mathit{rulewithname}(\mathit{name}) = M \\
613: \init(M,C) = \gamma_1 \\
614: \gamma\confrepl{S}{\gamma_1} \normif \gamma' \\
615: \hline
616: \gamma \Rightarrow \gamma'
617: \end{array}
618: $
619: \end{itemize}
620: %\nocaptionrule
621: \caption{Semantics of apply-, call-, and foreach-statements.}
622: \label{figapply}
623: \end{figure}
624: 
625: The semantic rule for foreach-statements is very similar to that for
626: apply-statements, and is also shown in Figure~\ref{figapply}.
627: 
628: For call-statements, we need the following definition.
629: 
630: \begin{definition}[rulewithname] \label{defrulewithname}
631: 	For any \textit{name}, let $\mathit{rulewithname}(\mathit{name})$ denote the
632: template of the rule in $P$ with that \textit{name}.
633: \end{definition}
634: 
635: The semantic rule for a call-statement is then again shown in Figure~\ref{figapply}.
636: 
637: \subsection{Variable definitions}
638: 
639: \newcommand{\updateset}{\mathit{updateset}}
640: \newcommand{\contupd}[2]{(#1\colon #2)}
641: 
642: For a context $C=(\S,\E,c)$, a value variable $x$, a value $v$, a tree
643: variable $y$, and a data tree $\t$, we denote by
644: \begin{itemize}
645: \item
646: $C\contupd{x}{v}$
647: the context obtained from $C$ by updating $\E$ with the pair $(x,v)$;
648: and by
649: \item
650: $C\contupd{y}{\t}$
651: the context obtained from $C$ by updating $\S$ with the pair
652: $(y,\t)$.
653: \end{itemize}
654: 
655: We also define:
656: 
657: \begin{definition}[updateset] \label{defupdateset}
658: Let $\gamma$ be a configuration and let $S \lhd \gamma$.
659: Let $M$ be the template underlying $\gamma$.
660: Let $S_1,\dots,S_k$ be the right siblings of $S$ in $M$, in that
661: order.  Let $j$ be the smallest index for which $S_j$ is active in $\gamma$;
662: if all the $S_i$ are inactive, put $j=k+1$.  Then the template $S_1 \dots
663: S_{j-1}$ is denoted by $\updateset(\gamma,S)$.  If $j=1$ then this is the
664: empty template.
665: \end{definition}
666: 
667: We are now ready for the semantic rules for variable definitions,
668: shown in Figure~\ref{figvar}.
669: 
670: \begin{figure}
671: \begin{itemize}
672: \item
673: $\begin{array}[t]{l}
674: S = \text{\textsf{val} $x$ $e$} \lhd \gamma \\
675: \gamma(S) =  C \\
676: C\contupd{x}{\eval(e,C)} = C' \\
677: \updateset(\gamma,S) = M \\
678: \init(M,C') = \gamma_1 \\
679: \gamma\confrepl{SM}{\gamma_1} \normif \gamma' \\
680: \hline
681: \gamma \Rightarrow \gamma'
682: \end{array}$
683: \medskip
684: \item
685: $\begin{array}[t]{l}
686: S = \text{\textsf{tree} $y$ \aco\ $M$ \act} \lhd \gamma \\
687: \text{$M$ is terminal} \\
688: \gamma(S) =  C \\
689: C\contupd{y}{\maketree(M)} = C' \\
690: \updateset(\gamma,S)=M' \\
691: \init(M',C') = \gamma_3 \\
692: \gamma\confrepl{SM'}{\gamma_3} \normif \gamma' \\
693: \hline
694: \gamma \Rightarrow \gamma'
695: \end{array}$
696: \end{itemize}
697: %\nocaptionrule
698: \caption{Semantics of variable definitions.}
699: \label{figvar}
700: \end{figure}
701: 
702: \subsection{Copy-statements}
703: 
704: \newcommand{\forest}{\mathit{forest}}
705: \newcommand{\ttemp}{\mathit{ttemp}}
706: \newcommand{\choproot}{\mathit{choproot}}
707: 
708: The following definitions are illustrated in Figure~\ref{figforest}.
709: 
710: \begin{definition}[forest] \label{defforest}
711: Let $\S$ be a store, and let $(\n_1,\dots,\n_k)$ be a sequence of nodes from
712: $\S$.  For $i=1,\dots,k$, let $\t_i$ be the data subtree rooted at $\n_i$.
713: Then $\forest((\n_1,\dots,\n_k),\S)$ equals the data forest
714: $(\t_1,\dots,\t_n)$.
715: \end{definition}
716: 
717: \begin{definition}[ttemp] \label{defttemp}
718: Let $F$ be a data forest.  Then $\ttemp(F)$ equals the
719: terminal template describing $F$.
720: \end{definition}
721: 
722: \begin{figure}
723: \begin{center}
724: \pstree{\TR{$\tta$}}
725: {
726:   \TR{$\ttb$}~{$\n_1$}
727:   \pstree{\TR{$\ttc$}~{$\n_2$}}
728:   {
729:     \TR{$\tta$}~{$\n_3$}
730:     \TR{$\ttb$}
731:   }
732:   \TR{$\ttc$}
733: }
734: \qquad
735: \pstree{\TR{$\ttc$}~{$\n_4$}}
736: {
737:   \TR{$\tta$}
738:   \TR{$\ttb$}
739: }
740: \end{center}
741: 
742: \begin{tabbing}
743: 	$ \ttemp \bigl ( \forest((\n_4,\n_1,\n_2,\n_3,\n_1),\S) \bigr ) = {}$\=	
744: 	\textsf{cons $\ttc$ \aco\ cons $\tta$ \aco \act\ cons $\ttb$ \aco \act\ \act}\\
745: 	\>\textsf{cons $\ttb$ \aco \act}\\
746: 	\>\textsf{cons $\ttc$ \aco\ cons $\tta$ \aco \act\ cons $\ttb$ \aco \act\ \act}\\
747: 	\>\textsf{cons $\tta$ \aco \act}\\
748: 	\>\textsf{cons $\ttb$ \aco \act}
749: \end{tabbing}
750: %\nocaptionrule
751: \caption{Illustration of Definitions \ref{defforest} and \ref{defttemp}.}
752: \label{figforest}
753: \end{figure}
754: 
755: We also need:
756: 
757: \begin{definition}[choproot]
758: Let $\t$ be a data tree with top-level subtrees $\t_1,\dots,\t_k$, in that
759: order.  Then $\choproot(\t)$ equals the data forest $(\t_1,\dots,\t_k)$.
760: \end{definition}
761: 
762: The semantic rules for copy-statements are now shown in Figure~\ref{figcopy}.
763: 
764: \begin{figure}
765: \begin{itemize}
766: \item
767: $
768: \begin{array}[t]{l}
769: S = \text{\textsf{vcopy} $e$} \lhd \gamma \\
770: \gamma(S) = C = (\S,\E,c) \\
771: \eval(e,C) = (\n_1,\dots,\n_k) \\
772: \ttemp \bigl ( \forest((\n_1,\dots,\n_k),\S) \bigr ) = M \\
773: \hline
774: \gamma \Rightarrow \gamma\confrepl{S}{M}
775: \end{array}
776: $
777: \medskip
778: \item
779: $
780: \begin{array}[t]{l}
781: S = \text{\textsf{tcopy} $y$} \lhd \gamma \\
782: \gamma(S) = (\S,\E,c) \\
783: (y,\t) \in \S \\
784: \ttemp(\choproot(\t)) = M \\
785: \hline
786: \gamma \Rightarrow \gamma\confrepl{S}{M}
787: \end{array}
788: $
789: \end{itemize}
790: %\nocaptionrule
791: \caption{Semantics of copy-statements.}
792: \label{figcopy}
793: \end{figure}
794: 
795: \subsection{Discussion} \label{deviations}
796: 
797: The final result of applying $P$ to $\t$ (Definition~\ref{deffinal}) may be
798: undefined for two very different reasons.  The first, fundamental, reason is
799: that the rewriting may be nonterminating.  The second reason is that the
800: rewriting may abort because the evaluation of an $\X$-expression is undefined,
801: or the tree variable in a tcopy-statement is not defined in the store.  This
802: second reason can easily be avoided by a type system on $\X$, as already
803: mentioned in Remark~\ref{xmlschema}, together with scoping rules to keep track
804: of which variables are visible in the XSLT program and which variables are
805: used in the $\X$-expressions.  Such scoping rules are entirely standard, and
806: indeed are implemented in the XSLT processor SAXON \cite{saxon}.
807: 
808: In the same vein, we have simplified the parameter passing mechanism of XSLT,
809: and have omitted the feature of global variables.  On the other hand, our
810: mechanism for choosing the rule to apply (Definition~\ref{defruletoapply}) is
811: more powerful than the one provided by XSLT, as ours is context-dependent.
812: It is actually easier to define that way.  As already
813: mentioned at the beginning of Section~\ref{secsyntax}, none of our technical
814: results depend on the modifications we have made.
815: 
816: Finally, we note that the XSLT processor SAXON evaluates variable definitions
817: lazily, whereas we simply evaluate them eagerly.  Again, lazy evaluation could
818: have been easily incorporated in our formalism.  Some programs may terminate
819: on some inputs lazily, while they do not terminate eagerly, but for programs
820: that use all the variables they define there is no difference.
821: 
822: \subsection{Confluence}
823: 
824: \newcommand{\R}{\mathcal{R}}
825: 
826: Recall that we call a rewrite relation \emph{confluent} if, whenever we can
827: rewrite a configuration $\gamma_1$ to $\gamma_2$ as well as to $\gamma_3$,
828: then there exists $\gamma_4$ such that we can further rewrite both $\gamma_2$
829: and $\gamma_3$ into $\gamma_4$.  Confluence guarantees that all terminating
830: runs from a common configuration also end in a common configuration
831: \cite{terese}.  Since, for our rewrite relation $\Rightarrow$,
832: either all runs on some input are nonterminating, or none is, the
833: following theorem implies that the same final result of a program $P$ on an
834: input $\t$, if defined at all, will be obtained regardless of the order in
835: which we process active statements.
836: 
837: \begin{theorem} \label{confluence}
838: Our rewrite relation $\Rightarrow$ is confluent.
839: \end{theorem}
840: 
841: \begin{proof}
842: 
843: The proof is a very easy application of
844: a basic theorem of Rosen about subtree replacement systems
845: \cite{rosen_cr}.  A subtree replacement system $\R$ is a (typically infinite)
846: set of pairs of the form $\phi \to \psi$, where $\phi$ and $\psi$ are
847: descriptions up to isomorphism of ordered, node-labeled trees, where the node
848: labels come from some (again typically infinite) set $V$.  Let us refer to
849: such trees as $V$-trees.  Such a system $\R$ naturally induces a rewrite
850: system $\Rightarrow_{\R}$ on $V$-trees: we have $\t \Rightarrow_{\R} \t'$ if
851: there exists a node $\n$ of $\t$ and a pair $\phi \to \psi$ in $\R$ such that
852: the subtree $\t/\n$ is isomorphic to $\phi$, and $\t' =
853: \t\confrepl{\n}{\psi}$.  Here, we use the notation $\t/\n$ for the subtree of
854: $\t$ rooted at $\n$, and the notation $\t\confrepl{\n}{\psi}$ for the tree
855: obtained from $\t$ by replacing $\t/\n$ by a fresh copy of $\psi$.  Rosen's
856: theorem states that if $\R$ is ``unequivocal'' and ``closed'', then
857: $\Rightarrow_{\R}$ is confluent.
858: 
859: ``Unequivocal'' means that for each $\phi$ there is at most one $\psi$ such
860: that $\phi\to \psi$ is in $\R$.  The definition of $\R$ being ``closed'' is a
861: bit more complicated.  To state it, we need the notion of a \emph{residue map}
862: from $\phi$ to $\psi$.  This is a mapping $r$
863: from the nonroot nodes of $\phi$ to
864: sets of nonroot nodes of $\psi$, such that for $m \in r(n)$ the subtrees
865: $\phi/n$ and $\psi/m$ are isomorphic.  Moreover, if $n_1$ and $n_2$ are
866: independent (no descendants of each other), then all nodes in $r(n_1)$ must
867: also be independent of all nodes in $r(n_2)$.
868: 
869: Now $\R$ being closed means that we can assign a residue map $r[\phi,\psi]$
870: to every $\phi\to\psi$ in $\R$ in such a way that for any $\phi_0 \to \psi_0$
871: in $\R$, and any node $n$ of $\phi_0$, if there exists a pair $\phi_0/n \to
872: \psi$ in $\R$, then the pair $\phi_0\confrepl{n}{\psi} \to
873: \psi_0\confrepl{r[\phi_0,\psi_0](n)}{\psi}$ is also in $\R$.  Denoting the
874: latter pair by $\phi_1 \to \psi_1$, we must moreover have for each node $p$
875: of $\phi_0$ that is independent of $n$, that $r[\phi_1,\psi_1](p) =
876: r[\phi_0,\psi_0](p)$.
877: 
878: To apply Rosen's theorem, we view configurations (Definition~\ref{defconfig})
879: as $V$-trees, where $V = \mathit{Statements} \cup (\mathit{Statements} \times
880: \mathit{Contexts})$.  Here, $\mathit{Statements}$ is the set of all possible
881: syntactic forms of statements.  So, given a configuration, we take the syntax
882: tree of the underlying template, and label every inactive node by its
883: corresponding statement, and every active node by its corresponding statement
884: and its context in the configuration.  (Since templates are sequences, we
885: actually get $V$-forests rather than $V$-trees, but that is a minor fuss.)
886: 
887: Now consider the subtree replacement system $\R$ consisting of all pairs
888: $\gamma \to \gamma'$ for which $\gamma \Rightarrow \gamma'$ as defined by our
889: semantics, where $\gamma$ consists of a single statement $S_0$, and the active
890: statement being processed to get $\gamma'$ is a direct child of $S_0$.  Since
891: our semantics always substitutes siblings for siblings, it is clear that
892: $\Rightarrow_{\R}$ then coincides with our rewrite relation $\Rightarrow$.
893: Since the processing of every individual statement is always deterministic (up
894: to isomorphism of trees), $\R$ as just defined is clearly unequivocal.
895: 
896: 
897: We want to show that $\R$ is closed.  Thereto, we define residue
898: maps $r[\gamma,\gamma']$ as follows.
899: 
900: The case where $\gamma\to\gamma'$ is the processing of an apply- or
901: call-statement, is depicted in Figure~\ref{figrosen} (top).  The node being
902: processed is shown in black.  The subtemplates to the left and right are left
903: untouched.  Referring to the notation used in Figure~\ref{figapply}, the newly
904: substituted subtemplate $\gamma_{\mathrm{new}}$ is such that $\gamma_1 \dots
905: \gamma_k \normif \gamma_{\mathrm{new}}$ (for apply) or $\gamma_1 \normif
906: \gamma_{\mathrm{new}}$ (for call).  Indeed, since we apply $\normif$ at the
907: end of every processing step, $\gamma$ itself does not contain any active
908: if-statements.  We define $r = r[\gamma,\gamma']$ as follows:
909: \begin{itemize}
910: \item
911: For nodes $n$ in $\gamma_{\mathrm{left}}$ or $\gamma_{\mathrm{right}}$, we put
912: $r(n) := \{n'\}$, where $n'$ is the corresponding node in $\gamma'$.
913: \item
914: For the black node $b$, we put $r(b) := \emptyset$.
915: \end{itemize}
916: The main condition for closedness is clearly satisfied, because statements can
917: be processed independently.  Note that the black node has no children, let
918: alone active children, which allows us to put $r(b)=\emptyset$.  The condition
919: on $p$'s is also satisfied, because both $r[\phi_0,\psi_0]$ and
920: $r[\phi_1,\psi_1]$ will set $r(p)$ to $\{p'\}$.
921: 
922: The case where $\gamma\to\gamma'$ is the processing of a foreach-statement is
923: depicted in Figure~\ref{figrosen} (middle).  This case is analogous to the
924: previous one.  The only difference is that the black node now has descendants
925: ($M$ in the figure).  Because the $\init$ function (Definition~\ref{definit})
926: always leaves descendants of a foreach node inactive, however, the nodes in
927: $M$ are inactive at this time, and we can
928: put $r(n) := \emptyset$ for all of them.
929: 
930: The case where $\gamma\to\gamma'$ is the processing of a val-statement is
931: depicted in Figure~\ref{figrosen} (bottom).  Since all
932: nodes in the update set are inactive by definition
933: (Definition~\ref{defupdateset}), we can again put $r(n) :=
934: \emptyset$ for all nodes in the update set.
935: The case of a tree-statement is similar; now the black node again
936: has descendants, but again these are all inactive (they are all
937: cons-statements).
938: \begin{figure}
939: \begin{center}
940: \psset{tnpos=a}
941: \pstree{\TC~{$\gamma$}}
942: {
943:   \TR{$\gamma_{\mathrm{left}}$}
944:   \TC[fillstyle=solid]~[tnpos=b]{\textsf{apply/call}}
945:   \TR{$\gamma_{\mathrm{right}}$}
946: }
947: \quad
948: $\to$
949: \quad
950: \pstree{\TC~{$\gamma'$}}
951: {
952:   \TR{$\gamma_{\mathrm{left}}$}
953:   \TR{$\gamma_{\mathrm{new}}$}
954:   \TR{$\gamma_{\mathrm{right}}$}
955: }
956: \bigskip
957: 
958: \pstree{\TC~{$\gamma$}}
959: {
960:   \TR{$\gamma_{\mathrm{left}}$}
961:   \pstree{\TC[fillstyle=solid]~[tnpos=r]{\textsf{foreach}}}
962:   {
963:     \TR{$M$}
964:   }
965:   \TR{$\gamma_{\mathrm{right}}$}
966: }
967: \quad
968: $\to$
969: \quad
970: \pstree{\TC~{$\gamma'$}}
971: {
972:   \TR{$\gamma_{\mathrm{left}}$}
973:   \TR{$\gamma_{\mathrm{new}}$}
974:   \TR{$\gamma_{\mathrm{right}}$}
975: }
976: \bigskip
977: 
978: \pstree{\TC~{$\gamma$}}
979: {
980:   \TR{$\gamma_{\mathrm{left}}$}
981:   \TC[fillstyle=solid]~[tnpos=b]{\textsf{val}}
982:   \TR{\textit{updateset}}
983:   \TR{$\gamma_{\mathrm{right}}$}
984: }
985: \quad $\to$ \quad
986: \pstree{\TC~{$\gamma'$}}
987: {
988:   \TR{$\gamma_{\mathrm{left}}$}
989:   \TR{$\gamma_{\mathrm{new}}$}
990:   \TR{$\gamma_{\mathrm{right}}$}
991: }
992: \end{center}
993: %\nocaptionrule
994: \caption{Illustration to the proof of Theorem~\ref{confluence}.}
995: \label{figrosen}
996: \end{figure}
997: The case where $\gamma\to\gamma'$ is the processing of a
998: copy-statement, finally, is again analogous.
999: \end{proof}
1000: 
1001: \section{Computational completeness}
1002: 
1003: \newcommand{\allexp}{\texttt{//*}}
1004: \newcommand{\firstchild}{\texttt{child::*[1]}}
1005: \newcommand{\nextexp}{\texttt{following-sibling::*[1]}}
1006: \newcommand{\flatree}{\mathit{flattree}}
1007: 
1008: As defined in Definition~\ref{deffinal},
1009: an XSLT program $P$ expresses a partial
1010: function from data trees to data forests, where the output forest is
1011: represented by a tree by affixing a root node labeled $\ttdoc$ on top
1012: (Definition~\ref{defmaketree}).  The output
1013: is defined up to isomorphism only, and $P$ does not distinguish between
1014: isomorphic inputs.  This leads us to the following definition:
1015: 
1016: \begin{definition}
1017: A \emph{tree transformation}
1018: is a partial function from data trees to data
1019: trees with root labeled $\ttdoc$,
1020: mapping isomorphic trees to isomorphic trees.
1021: \end{definition}
1022: 
1023: Using the string representation of data trees defined in
1024: Section~\ref{secdatatrees}, we further define:
1025: 
1026: \begin{definition}
1027: A tree transformation $f$ is called \emph{computable} if the string function
1028: $\tilde f\colon \tstring(\t) \mapsto \tstring(f(\t))$ is computable in the
1029: classical sense.
1030: \end{definition}
1031: 
1032: Up to now, we have assumed from our XPath abstraction $\X$ only the
1033: availability of the expressions `\rootexp' and `\childexp'. For
1034: our proof of the following
1035: theorem, we need to assume the availability of a few more very simple
1036: expressions, also present in real XPath:
1037: 
1038: \begin{itemize}
1039: 
1040: \item
1041: 
1042: $y$\texttt{/*},
1043: for any tree variable $y$,
1044: evaluates to the
1045: root of the tree assigned to $y$.
1046: 
1047: \item
1048: 
1049: \allexp\ evaluates to the sequence of all nodes in the store (it does not
1050: matter in which order).
1051: 
1052: \item
1053: 
1054: \firstchild\ evaluates to the first child of the context item (which should
1055: be a node).
1056: 
1057: \item
1058: 
1059: \nextexp\ evaluates to the immediate right sibling of the context node, or
1060: the empty sequence if the context node has no right siblings.
1061: 
1062: \item
1063: 
1064: Increment, decrement, and test on counters:
1065: the constant expression
1066: `\texttt{1}', and the expressions
1067: `\texttt{$x$+1}', `\texttt{$x$-1}', and `\texttt{$x$=1}' for any value
1068: variable $x$, which should consist of a single counter.  If
1069: $x$ has the maximal counter value,  then
1070: \texttt{$x$+1} need not be defined, and if 
1071: $x$ has value 1, then
1072: \texttt{$x$-1} need not be defined.  The test
1073: \texttt{$x$=1} yields any nonempty sequence for true and the empty sequence
1074: for false.
1075: 
1076: \item
1077: 
1078: \texttt{name()='a'}, for any $\tta \in \Sigma$, returning any nonempty
1079: sequence if the label of the context node is \tta, and the empty sequence
1080: otherwise.
1081: 
1082: \item
1083: 
1084: \texttt{()} evaluates to the empty sequence.
1085: 
1086: \end{itemize}
1087: 
1088: We establish:
1089: 
1090: \begin{theorem} \label{completeness}
1091: Every computable tree transformation $f$ can be realised by a program.
1092: \end{theorem}
1093: 
1094: \begin{proof}
1095: We can naturally represent
1096: any string $s$ over some finite alphabet as a flat data tree
1097: over the same alphabet.  We denote this flat tree by $\flatree(s)$.
1098: Its root is
1099: labeled \ttdoc, and has $k$
1100: children, where $k$ is the length of $s$, such that the labels of the children
1101: spell out the string $s$.  There are no other nodes.
1102: 
1103: The proof now consists of three parts:
1104: \begin{enumerate}
1105: \item
1106: Program the transformation $\t \mapsto \flatree(\tstring(\t))$.
1107: \item
1108: Show that every turing machine (working on strings)
1109: can be simulated by some program working on the
1110: $\flatree$ representation of strings.
1111: \item
1112: Program the transformation $\flatree(\tstring(\t)) \mapsto \t$.
1113: \end{enumerate}
1114: The theorem then follows by composing these three steps, where we simulate a
1115: turing machine for $\tilde f$ in step~2. Note that the composition of three
1116: programs can be written as a single program, using a temporary tree to pass
1117: the intermediate results, and using modes to keep the rules from the different
1118: programs separate.
1119: 
1120: The programs for steps 1 and 3 are shown in Figures \ref{figt2s} and
1121: \ref{figs2t}. For simplicity, they are for an alphabet consisting of a single
1122: letter \tta, but it is obvious how to generalise the programs.  The real XSLT
1123: versions are given in the Appendix.  We point out that these programs are
1124: actually 1.0 programs, so it is only for step~2 of the proof that we need
1125: XSLT~2.0.
1126: 
1127: \begin{figure}
1128: \begin{tabbing}
1129: \quad\=\kill
1130: \textsf{template \texttt{tree2string} match (\allexp)} \\
1131: \aco \+\\
1132: \textsf{cons \texttt{a} \aco\ \act} \\
1133: \textsf{cons \texttt{lbrace} \aco\ \act} \\
1134: \textsf{apply (\childexp)} \\
1135: \textsf{cons \texttt{rbrace} \aco\ \act} \- \\
1136: \act
1137: \end{tabbing}
1138: %\nocaptionrule
1139: \caption{From $\t$ to $\flatree(\tstring(\t))$.}
1140: \label{figt2s}
1141: \end{figure}
1142: 
1143: \begin{figure}
1144: \small
1145: \begin{tabbing}
1146: \quad\=\quad\=\quad\=\quad\=\quad\=\quad\=\kill
1147: \textsf{template \texttt{doc} match (\rootexp)} \\
1148: \aco \\
1149: \>\textsf{apply (\firstchild)} \\
1150: \act \\[\medskipamount]
1151: \textsf{template \texttt{string2tree} match (\allexp)} \\
1152: \aco \+ \\
1153: \textsf{cons \texttt{a}} \\
1154: \aco\ \textsf{apply (\nextexp) mode \texttt{dochildren}} \act \\
1155: \textsf{val \texttt{counter} (\texttt{1})} \\
1156: \textsf{call \texttt{searchnextsibling}} \-\\
1157: \act \\[\medskipamount]
1158: \textsf{template \texttt{dochildren} match (\allexp)
1159: mode \texttt{dochildren}} \\
1160: \aco \+ \\
1161: \textsf{if \texttt{name()='lbrace'}} \\
1162: \aco\ \textsf{apply (\nextexp) mode \texttt{dochildren}} \act \\
1163: \textsf{else} \aco \+\\
1164: \textsf{if \texttt{name()='a'}} \\
1165: \aco\ \textsf{call \texttt{string2tree}} \act \\
1166: \textsf{else} \aco\ \act \-\\
1167: \act \-\\
1168: \act \\[\medskipamount]
1169: \textsf{template \texttt{searchnextsibling} match (\allexp) mode
1170: \texttt{search}} \\
1171: \aco \+ \\
1172: \textsf{if \texttt{name()='lbrace'}} \aco\+\\
1173: \textsf{val \texttt{counter} (\texttt{counter + 1})} \\
1174: \textsf{apply (\nextexp) mode \texttt{search}} \-\\
1175: \act \\
1176: \textsf{else} \aco\+\\
1177: \textsf{if \texttt{name()='a'}} \\
1178: \aco\ \textsf{apply (\nextexp) mode \texttt{search}} \act\\
1179: \textsf{else} \aco\+\\
1180: \textsf{val \texttt{counter} (\texttt{counter - 1})} \\
1181: \textsf{if \texttt{counter = 1}} \\
1182: \aco\ \textsf{apply }\=\textsf{(\nextexp)} \\
1183: \>\textsf{mode \texttt{dochildren}} \act \\
1184: \textsf{else} \\
1185: \aco\ \textsf{apply (\nextexp) mode \texttt{search}} \act \-\\
1186: \act \-\\
1187: \act \-\\
1188: \act
1189: \end{tabbing}
1190: %\nocaptionrule
1191: \caption{From $\flatree(\tstring(\t))$ to $\t$.}
1192: \label{figs2t}
1193: \end{figure}
1194: 
1195: For step~2, we can represent a configuration of a turing machine $A$ by two
1196: temporary trees \texttt{left} and \texttt{right}.  At each step, variable
1197: \texttt{right} holds (as a flat tree)
1198: the content of the tape starting at the
1199: head position and ending in the last tape cell;
1200: variable \texttt{left} holds the reverse of the tape portion left of the head
1201: position.  To keep track of the current state of the machine, we use value
1202: variables $q$ for each state $q$ of $A$, such that at each step precisely one
1203: of these is nonempty.  (This is why we need the $\X$-expression \texttt{()}.)
1204: Changing the symbol under the head to an \texttt{a} amounts to assigning a new
1205: content to \texttt{right} by putting in \textsf{cons \tta\ \aco\act},
1206: followed by copies of the nodes in the current content of
1207: \texttt{right}, where we skip the first one.
1208: Moving the head a cell
1209: to the right amounts to assigning a new content to \texttt{left} by putting
1210: in a node labeled with the current symbol, followed by copies of the nodes in
1211: the current content of \texttt{left}. We also assign a new content to
1212: \texttt{right} in the now obvious way; if we were at the end of the
1213: tape we add a new node labeled \texttt{blank}.
1214: Moving the head a cell to the left is
1215: simulated analogously. The only $\X$-expressions we need here are the ones we
1216: have assumed to be available.
1217: 
1218: The simulation thus consists of repeatedly calling a big if-then-else that
1219: tests for the transition to be performed, and performs that transition.  We
1220: may assume $A$ is programmed in such a way that the final output is produced
1221: starting from a designated state.  In this way we can build up the final
1222: output string in a fresh temporary tree and pass it to step~3.
1223: \end{proof}
1224: 
1225: 
1226: 
1227: \section{XSLT 1.0}
1228: 
1229: \newcommand{\G}{\mathcal{G}}
1230: 
1231: In this section we will show that every XSLT~1.0 program can be implemented in
1232: exponential time, in sharp contrast to the computational completeness result
1233: of the previous section.
1234: 
1235: A fundamental difference between XSLT 1.0 and 2.0 is that in 1.0,
1236: $\X$-expressions are ``input-only'', defined as follows.
1237: 
1238: \begin{definition} \begin{enumerate} \item Let $C=(\S,\E,(z,i,k))$ be a
1239: context. Let the input tree in $\S$ be $\t$. Then we call $C$
1240: \emph{input-only} if every value appearing in $\E$ is already a value over the
1241: store $\{(\Input,\t)\}$, and also $(z,i,k)$ is like that.
1242: \item
1243: By $\hat C$, we mean
1244: the context $(\{(\Input,\t)\},\E,(z,i,k))$.  So, $\hat C$ equals $C$ where we
1245: have removed all temporary trees.
1246: \item
1247: Now an $\X$-expression $e$ is called input-only if for any input-only context
1248: $C$ for which $\eval(e,C)$ is defined, we have $\eval(e,C) = \eval(e,\hat C)$,
1249: and this must be a value over $C$'s input tree only.
1250: \end{enumerate} \end{definition}
1251: 
1252: In other words, input-only expressions are oblivious to the temporary trees in
1253: the store; they only see the input tree.
1254: 
1255: We further define:
1256: 
1257: \begin{definition} An input-only $\X$-expression $e$ is called
1258: \emph{polynomial} if for each input-only context $C$, the computation of
1259: $\eval(e,C)$ can be done in time polynomial in the size of $C$'s input tree.
1260: \end{definition}
1261: 
1262: We now define:
1263: 
1264: \begin{definition}
1265: A program is called 1.0 if it only uses input-only, polynomial
1266: $\X$-expressions.
1267: \end{definition}
1268: 
1269: Essentially, 1.0 programs cannot do anything with temporary trees except copy
1270: them using \textsf{tcopy} statements.  We note that real XPath~1.0 expressions
1271: are indeed input-only and polynomial; actually, real XPath~1.0 is much more
1272: restricted than that, but for our purpose we do not need to assume anything
1273: more.
1274: 
1275: In order to establish an exponential upper bound on the time-complexity of 1.0
1276: programs, we cannot use an explicit representation of the output tree.
1277: Indeed, 1.0 programs can produce result trees of size \emph{doubly}
1278: exponential in the size of the input tree.  For example, using subsets of
1279: input nodes, ordered lexicographically, as depth counters, we can produce a
1280: full binary tree of depth $2^n$ from an input tree with $n$ nodes.  Obviously
1281: a doubly exponentially long output could never be computed in singly
1282: exponential time.
1283: 
1284: We therefore use a \emph{DAG representation} of trees: an old and well-known
1285: trick \cite{linearunification} that is also used in tree transduction
1286: \cite{maneth_macromplexity}, and that has recently found new
1287: applications in XML \cite{bgk_xmlcompressed}.
1288: Formally, a DAG representation is a collection $\G$ of trees, where trees in
1289: $\G$ can have special leafs which are not labeled, and from which a pointer
1290: departs to the root of another tree in $\G$. On condition that the resulting
1291: pointer graph is acyclic, starting from a designated ``root tree'' in $\G$ we
1292: can naturally obtain a tree by unfolding along the pointers.  An illustration
1293: is shown in Figure~\ref{figdag}.
1294: 
1295: \begin{figure}
1296: \begin{center}
1297: \includegraphics{dag.eps}
1298: \end{center}
1299: %\nocaptionrule
1300: \caption{Left, a data tree, and right, a DAG representation of it.}
1301: \label{figdag}
1302: \end{figure}
1303: 
1304: We establish:
1305: 
1306: \begin{theorem} \label{complexity}
1307: Let $P$ be an 1.0 program. Then the following problem is solvable in
1308: exponential, i.e., $2^{n^{O(1)}}$ time:
1309: \begin{description}
1310: \item[Input:] a data tree $\t$
1311: \item[Output:] a DAG representation of the final result tree of applying $P$
1312: to $\t$, or a message signaling non-termination if $P$ does not terminate on
1313: $\t$.
1314: \end{description}
1315: \end{theorem}
1316: 
1317: \begin{proof}
1318: We will generate a DAG representation $\G$ by applying modified versions of
1319: the semantic rules from Section~\ref{secsemantics}.  We initialise $\G$ with
1320: all the subtrees of $\t$.  These trees have no pointers.  Each tree
1321: that will be added to $\G$ will be a configuration, which still has to be
1322: developed further into a final data tree with pointers, using the same
1323: modified rules.  Because we will have to point to the newly added
1324: configurations later, we identify each added configuration by a pair
1325: $(\mathit{name},C)$ where $\mathit{name}$ is the name of a template rule in
1326: $P$ and $C$ is a context.  In the description below, whenever we say that we
1327: ``add'' a configuration to $\G$, identified by some pair $(\mathit{name},C)$,
1328: we really mean that we add it unless a configuration identified by that same
1329: pair already exists in $\G$.
1330: 
1331: The modifications are now the following.
1332: 
1333: \begin{enumerate}
1334: 
1335: \item
1336: 
1337: When executing an apply-statement, we do not directly insert copies of the
1338: templates belonging to the rules that must be applied (the $\gamma_i$'s in
1339: Figure~\ref{figapply}). Rather, we add, for $i=1,\dots,k$, the configuration
1340: $\gamma_i'$ to $\G$, where $\gamma_i \normif \gamma_i'$.  We identify
1341: $\gamma_i'$ by the pair $(\mathit{name}_i,C_i)$, with $\mathit{name}_i$ the
1342: name of the rule $\gamma_i$ comes from, and $C_i = (\S,\E,(\n_i,i,k))$ using
1343: the notation of Figure~\ref{figapply}.  Moreover, in place of the
1344: apply-statement we insert a sequence of
1345: $k$ pointer nodes pointing to $(\mathit{name}_1,C_1)$, \dots,
1346: $(\mathit{name}_k,C_k)$, respectively.
1347: 
1348: \item
1349: 
1350: When executing a call-statement \textsf{call $\mathit{name}$} under context
1351: $C$, we again do not insert $\gamma_1$ (compare Figure~\ref{figapply}), but
1352: add the configuration $\gamma_1'$ to $\G$, where $\gamma_1 \normif \gamma_1'$,
1353: and identify it by the pair $(\mathit{name},C)$. We then replace the
1354: statement by a pointer node pointing to that pair.
1355: 
1356: \item
1357: 
1358: By making template rules from the bodies of all foreach-statements in $P$, we
1359: may assume without loss of generality that the body of every foreach-statement
1360: is a single call-statement.  A foreach-statement is then processed analogously
1361: to apply- and call-statements.
1362: 
1363: \item
1364: 
1365: As we did with foreach-statements, we may assume that the body of each
1366: tree-statement is a single call-statement.  When executing a tree-statement,
1367: we may assume that the call-statement has already been turned into a pointer
1368: to some pair $(\mathit{name}_0,C_0)$. We then assign that pair directly to $y$
1369: in the new context $C'$ (compare Figure~\ref{figvar}); we no longer apply
1370: $\maketree$.
1371: 
1372: So, in the modified kind of store we use, we assign name--context pairs,
1373: rather than fully specified temporary trees, to tree variables.
1374: 
1375: \item
1376: 
1377: Correspondingly, when executing a statement \textsf{tcopy $y$}, we now
1378: directly turn it into a pointer to the pair assigned to $y$.
1379: 
1380: \item
1381: 
1382: Finally, when executing a vcopy-statement, we do not insert the whole
1383: forest generated by $(\n_1,\dots,\n_k)$
1384: in the configuration (compare
1385: Figure~\ref{figcopy}), but merely insert a sequence of
1386: $k$ pointers to the input subtrees rooted at $\n_1$, \dots, $\n_k$,
1387: respectively.
1388: 
1389: \end{enumerate}
1390: 
1391: We initiate the generation of $\G$ by starting with the initial configuration
1392: as always.  Processing that configuration will add the first tree to $\G$,
1393: which serves as the root tree of the DAG representation.  When all trees in
1394: $\G$ have been fully developed into data trees with pointer nodes, the
1395: algorithm terminates.  In case $P$ does not terminate on $\t$, however, that
1396: will never happen, and we need a way to detect nontermination.
1397: 
1398: Thereto, recall that every context consists of an environment $\E$ and a
1399: context triple $c$ on the one hand, and a store $\S$ on the other hand.  Since
1400: all $\X$-expressions used are input-only, and thus oblivious to the store-part
1401: of a context (except for the input tree, which does not change), we are in an
1402: infinite loop from the moment that there is a cycle in $\G$'s pointer graph
1403: \emph{where we ignore the store-part of the contexts}.  More precisely, this
1404: happens when from a pointer node in a tree identified by $(\mathit{name},C_1)$
1405: we can follow pointers and reach a pointer to a pair $(\mathit{name},C_2)$
1406: with the same $\mathit{name}$ and where $C_1$ and $C_2$ are equal in their
1407: $(\E,c)$-parts.  As soon as we detect such a cycle, we terminate the algorithm
1408: and signal nontermination.  Note that thus the algorithm always terminates.
1409: Indeed, since only input-only $\X$-expressions are used, all contexts that
1410: appear in the computation are input-only, and there are only a finite number
1411: of possible $(\E,c)$-parts of input-only configuration over a fixed input
1412: tree.
1413: 
1414: Let us analyse the complexity of this algorithm.  Since all $\X$-expressions
1415: used are polynomial, there is a natural number $K$ such that each value that
1416: appears in a context is at most $n^K$ long, where $n$ equals the number of
1417: nodes in $\t$.  Each element of such a length-$n^K$ sequence is a node or a
1418: counter over $\t$, so there are at most $(2n)^{n^K}$ different values.  There
1419: are a constant $c_1$ number of different value variables in $P$, so there are
1420: at most $((2n)^{n^K})^{c_1}$ different environments.  Likewise, the number of
1421: different context triples is $(2n)^3$, so, ignoring the stores, there are in
1422: total at most $(2n)^3 \cdot (2n)^{c_1 n^K} \leq 2^{n^{K'}}$ different
1423: contexts, for some natural number $K' \geq K$.
1424: With a constant $c_2$ number of different template names in $P$, we get a
1425: maximal number of $c_2 2^{n^{K'}}$ different configurations that can be added
1426: to $\G$ before the algorithm will surely terminate.
1427: 
1428: It remains to see how long it takes to fully rewrite each of those
1429: configurations into a data tree with pointers.  A configuration initially
1430: consists of at most a constant $c_3$ number of statements.  The evaluation of
1431: $\X$-expressions, which are polynomial, takes at most $c_3 n^K$ time in total.
1432: Processing an apply- or a foreach-statement takes at most $c_3n^K$
1433: modifications to the configuration and to $\G$; for the other statements this
1434: takes at most $c_3$ such operations.  Each such operation, however, involves
1435: the handling of contexts, whose stores can become quite large if treated
1436: naively.  Indeed, tree-statements assign a context to a tree variable,
1437: yielding a new context which may then again be assigned to a tree variable,
1438: and so on.  To keep this under control, we do not copy the contexts literally,
1439: but number them consecutively in the order they are introduced in $\G$.  A map
1440: data structure keeps track of this numbering.  The stores then consist of an
1441: at most constant $c_4$ number of assignments of pairs (name, context number)
1442: to tree variables.  As there are at most $2^{n^{K'}}$ different contexts, each
1443: number is at most $n^{K'}$ bits long.  Looking up whether a given context is
1444: already in $\G$, and if so, finding its number, takes $O(\log 2^{n^{K'}}) =
1445: O(n^{K'})$ time using a suitable map data structure.
1446: 
1447: We conclude that the processing of $\G$ takes a total time of $c_2 2^{n^{K'}}
1448: \cdot O(n^{K'}) = 2^{n^{O(1)}}$, as had to be proven.
1449: \end{proof}
1450: 
1451: A legitimate question is whether the complexity bound given by
1452: Theorem~\ref{complexity} can still be improved.  In this respect we can show
1453: that, even within the limits of real XSLT~1.0, any linear-space turing machine
1454: can be simulated by a 1.0 program.  Note that some PSPACE-complete problems,
1455: such as QBF-SAT \cite{papadimitriou}, are solvable in linear space. This shows
1456: that the time complexity upper bound of Theorem~\ref{complexity} cannot be
1457: improved without showing that PSPACE is properly included in EXPTIME (a famous
1458: open problem).
1459: 
1460: The simulation gets as input a flat tree representing an input string, and
1461: uses the $n$ child nodes to simulate the $n$ tape cells.  For each letter
1462: \tta\ of the tape alphabet, a value variable $\mathit{cell}_{\text{\tta}}$
1463: holds the nodes representing the tape cells that have an \tta.  A value
1464: variable $\it head$ holds the node representing the cell seen by the machine's
1465: head.  The machine's state is kept by additional value variables ${\it
1466: state}_q$ for each state $q$, such that ${\it state}_q$ is nonempty iff the
1467: machine is in state $q$.  Writing a letter in a cell, moving the head left or
1468: right, or changing state, are accomplished by easy updates on the
1469: value-variables, which can be expressed by real XPath~1.0 expressions.
1470: Choosing the right transition is done by a big if-then-else statement.
1471: Successive transitions are performed by recursively applying the simulating
1472: template rule until a halting state is reached.
1473: 
1474: \begin{remark}
1475: A final remark is that our results imply that XSLT~1.0 is not closed under
1476: composition.  Indeed, building up a tree of
1477: doubly exponential size (as we already remarked is possible in XSLT~1.0),
1478: followed by the building up of a tree of exponential
1479: size, amounts to building up a tree of triply exponential size.  If that would
1480: be possible by a single program, then a DAG representation of a triply
1481: exponentially large tree would be computable
1482: in singly exponential time.  It is well known, however, that a DAG
1483: representation cannot be more than singly
1484: exponentially smaller than the tree it
1485: represents.  Closure under composition is another sharp contrast
1486: between XSLT~1.0 and 2.0, as the latter is indeed
1487: closed under composition as already noted in the proof of
1488: Theorem~\ref{completeness}.
1489: \end{remark}
1490: 
1491: \section{Conclusions}
1492: 
1493: W3C recommendations such as the XSLT specifications are no Holy scriptures.
1494: Theoretical scrutinising of W3C work, which is what we have done here, can
1495: help in better understanding the possibilities and limitations of various
1496: newly proposed programming languages related to the Web, eventually leading to
1497: better proposals.
1498: 
1499: A formalisation of the full XSLT~2.0 language, with all the dirty details both
1500: concerning the language itself as concerning the XPath~2.0 data model, is
1501: probably something that should be done.  We believe our work gives a clear
1502: direction how this could be done.
1503: 
1504: Note also that XSLT contains a lot of redundancies.  For example,
1505: foreach-statements are eliminable, as are call-statements, and the match
1506: attribute of template rules.  A formalisation such as ours can provide a
1507: rigorous foundation to prove such redundancies, or to prove correct various
1508: processing strategies or optimisation techniques XSLT implementations may use.
1509: 
1510: A formal tree transformation model denoted by TL, in part inspired by XSLT,
1511: but still omitting many of its features, has already been studied by Maneth
1512: and his collaborators \cite{frank_xsltformal,maneth_pods}.  The TL model can
1513: be compiled into the earlier formalism of ``macro tree transducers''
1514: \cite{mtt,ps_macroforest}.  It is certainly an interesting topic for further
1515: research to similarly translate our XSLT formalisation (even partially) into
1516: macro tree transducers, so that techniques already developed for these
1517: transducers can be applied.  For example, under regular expression types
1518: \cite{xduce} (known much earlier under the name of ``recognisable tree
1519: languages''), exact automated typechecking is possible for compositions of
1520: macro tree transducers, using the method of ``inverse type inference''
1521: \cite{msv_typecheckxml}.  This method has various other applications, such as
1522: deciding termination on all possible inputs \cite{maneth_pods}.  Being able to
1523: apply this method to our XSLT~1.0 formalism would improve the analysis
1524: techniques of Dong and Bailey \cite{bailey_xslt}, which are not complete.
1525: 
1526: \section*{Acknowledgment}
1527: 
1528: We are indebted to Frank Neven for his initial participation in this research.
1529: 
1530: \bibliographystyle{plain}
1531: \begin{thebibliography}{10}
1532: 
1533: \bibitem{xpath}
1534: {XML} path language ({XPath}) version 1.0.
1535: \newblock W3C Recommendation, November 1999.
1536: 
1537: \bibitem{xslt}
1538: {XSL} transformations ({XSLT}) version 1.0.
1539: \newblock W3C Recommendation, November 1999.
1540: 
1541: \bibitem{xsltreq}
1542: {XSLT} requirements version 2.0.
1543: \newblock W3C Working Draft, February 2001.
1544: 
1545: \bibitem{xmlschema}
1546: {XML} schema.
1547: \newblock W3C Recommendation, October 2004.
1548: 
1549: \bibitem{xpath2}
1550: {XML} path language ({XPath}) version 2.0.
1551: \newblock W3C Working Draft, April 2005.
1552: 
1553: \bibitem{xquerypath_datamodel}
1554: {XQuery} 1.0 and {XPath} 2.0 data model.
1555: \newblock W3C Working Draft, April 2005.
1556: 
1557: \bibitem{xquery_formal}
1558: {XQuery} 1.0 and {XPath} 2.0 formal semantics.
1559: \newblock W3C Working Draft, June 2005.
1560: 
1561: \bibitem{xslt2}
1562: {XSL} transformations ({XSLT}) version 2.0.
1563: \newblock W3C Working Draft, April 2005.
1564: 
1565: \bibitem{frank_xsltformal}
1566: G.J. Bex, S.~Maneth, and F.~Neven.
1567: \newblock A formal model for an expressive fragment of {XSLT}.
1568: \newblock {\em Information Systems}, 27(1):21--39, 2002.
1569: 
1570: \bibitem{bgk_xmlcompressed}
1571: P.~Buneman, M.~Grohe, and C.~Koch.
1572: \newblock Path queries on compressed {XML}.
1573: \newblock In J.C. Freytag, P.C. Lockemann, et~al., editors, {\em Proceedings
1574:   29th International Conference on Very Large Data Bases}, pages 141--152.
1575:   Morgan Kaufmann, 2003.
1576: 
1577: \bibitem{bailey_xslt}
1578: C.~Dong and J.~Bailey.
1579: \newblock Static analysis of {XSLT} programs.
1580: \newblock In K.D. Schewe and H.E. Williams, editors, {\em Database
1581:   technologies---Proceedings ADC 2004}, pages 151--160. Australian Computer
1582:   Society, 2004.
1583: 
1584: \bibitem{mtt}
1585: J.~Engelfriet and H.~Vogler.
1586: \newblock Macro tree transducers.
1587: \newblock {\em Journal of Computer and System Sciences}, 31(1):71--146, 1985.
1588: 
1589: \bibitem{xpath_nutshell}
1590: G.~Gottlob, C.~Koch, and R.~Pichler.
1591: \newblock {XPath} processing in a nutshell.
1592: \newblock {\em SIGMOD Record}, 32(2):21--27, 2003.
1593: 
1594: \bibitem{hidders_light}
1595: J.~Hidders, J.~Paredaens, R.~Vercammen, et~al.
1596: \newblock A light but formal introduction to {XQuery}.
1597: \newblock In Z.~Bellahs\`ene, T.~Milo, M.~Rys, et~al., editors, {\em Database
1598:   and XML Technologies---Proceedings XSym}, volume 3186 of {\em Lecture Notes
1599:   in Computer Science}, pages 5--20. Springer, 2004.
1600: 
1601: \bibitem{xduce}
1602: H.~Hosoya and B.C. Pierce.
1603: \newblock {XDuce}: A statically typed {XML} processing language.
1604: \newblock {\em ACM Transactions on Internet Technology}, 3(2):117--148, 2003.
1605: 
1606: \bibitem{saxon}
1607: M.~Kay.
1608: \newblock {SAXON}: The {XSLT} and {XQuery} processor.
1609: \newblock http://saxon.sourceforge.net.
1610: 
1611: \bibitem{xslt_kay}
1612: M.~Kay.
1613: \newblock {\em {XSLT} 2.0 Programmer's Reference}.
1614: \newblock Wrox, 3rd edition, 2004.
1615: 
1616: \bibitem{maneth_macromplexity}
1617: S.~Maneth.
1618: \newblock The complexity of compositions of deterministic tree transducers.
1619: \newblock In M.~Agrawal and A.~Seth, editors, {\em FST TCS 2002 Proceedings},
1620:   volume 2556 of {\em Lecture Notes in Computer Science}, pages 265--276.
1621:   Springer, 2002.
1622: 
1623: \bibitem{maneth_pods}
1624: S.~Maneth, A.~Berlea, T.~Perst, and H.~Seidl.
1625: \newblock {XML} type checking with macro tree transducers.
1626: \newblock In {\em Proceedings 24th ACM Symposium on Principles of Database
1627:   Systems}, pages 283--294. ACM Press, 2005.
1628: 
1629: \bibitem{msv_typecheckxml}
1630: T.~Milo, D.~Suciu, and V.~Vianu.
1631: \newblock Typechecking for {XML} transformers.
1632: \newblock {\em Journal of Computer and System Sciences}, 66(1):66--97, 2003.
1633: 
1634: \bibitem{papadimitriou}
1635: C.H. Papadimitriou.
1636: \newblock {\em Computational Complexity}.
1637: \newblock Addison-Wesley, 1994.
1638: 
1639: \bibitem{linearunification}
1640: M.S. Paterson and M.N. Wegman.
1641: \newblock Linear unification.
1642: \newblock {\em Journal of Computer and System Sciences}, 16(2):158--167, 1978.
1643: 
1644: \bibitem{ps_macroforest}
1645: T.~Perst and H.~Seidl.
1646: \newblock Macro forest transducers.
1647: \newblock {\em Information Processing Letters}, 89(3):141--149, 2004.
1648: 
1649: \bibitem{rosen_cr}
1650: B.K. Rosen.
1651: \newblock Tree-manipulating systems and {Church}-{Rosser} theorems.
1652: \newblock {\em Journal of the ACM}, 20(1):160--187, 1973.
1653: 
1654: \bibitem{essencexml}
1655: J.~Sim\'eon and P.~Wadler.
1656: \newblock The essence of {XML}.
1657: \newblock In {\em Proceedings 30th ACM Symposium on Principles of Programming
1658:   Languages}, pages 1--13. ACM Press, 2003.
1659: 
1660: \bibitem{terese}
1661: Terese.
1662: \newblock {\em Term Rewriting Systems}.
1663: \newblock Cambridge University Press, 2003.
1664: 
1665: \bibitem{wadler_xpath}
1666: P.~Wadler.
1667: \newblock A formal semantics of patterns in {XSLT} and {XPath}.
1668: \newblock {\em Markup Languages: Theory and Practice}, 2(2):183--202, 2000.
1669: 
1670: \end{thebibliography}
1671: 
1672: \clearpage
1673: \appendix
1674: 
1675: \section{Real XSLT programs}
1676: 	\subsection{Figure~\ref{figt2s} in real XSLT}
1677: 	
1678: 	
1679: 	
1680: 	\begin{alltt}
1681: 
1682: 
1683: 	<xsl:transform 
1684: 	  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
1685: 	  version="1.0">
1686: 	
1687: 	  <xsl:template name="tree2string" match="//*">
1688: 	    <a/>
1689: 	    <lbrace/>
1690: 	    <xsl:apply-templates select="child::*"/>
1691: 	    <rbrace/>
1692: 	  </xsl:template>
1693: 	
1694: 	</xsl:transform>
1695: 
1696: 	\end{alltt}
1697: 
1698: 	\subsection{Figure~\ref{figs2t} in real XSLT}
1699: 
1700: \begin{alltt}
1701: 
1702: <xsl:transform 
1703:   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
1704:   version="1.0">
1705: 
1706: <xsl:template match="/doc">
1707:   <xsl:apply-templates select="child::*[1]"/>
1708: </xsl:template>
1709: 
1710: <xsl:template name="string2tree" match="/doc//*">
1711:   <a>
1712:     <xsl:apply-templates select="following-sibling::*[1]" mode="dochildren"/>
1713:   </a>
1714:   <xsl:call-template name="searchnextsibling">
1715:     <xsl:with-param name="counter" select="1"/>
1716:   </xsl:call-template>
1717: </xsl:template>
1718: 
1719: <xsl:template match="//*" mode="dochildren">
1720:   <xsl:if test="name()='lbrace'">
1721:     <xsl:apply-templates select="following-sibling::*[1]" mode="dochildren"/>
1722:   </xsl:if>
1723:   <xsl:if test="name()='a'">
1724:     <xsl:call-template name="string2tree"/>
1725:   </xsl:if>
1726: </xsl:template>
1727: 
1728: <xsl:template name="searchnextsibling" match="//*" mode="search">
1729:   <xsl:param name="counter"/>
1730:   <xsl:if test="name()='lbrace'">
1731:     <xsl:apply-templates select="following-sibling::*[1]" mode="search">
1732:       <xsl:with-param name="counter" select="$counter + 1"/>
1733:     </xsl:apply-templates>
1734:   </xsl:if>
1735:   <xsl:if test="name()='a'">
1736:     <xsl:apply-templates select="following-sibling::*[1]" mode="search">
1737:       <xsl:with-param name="counter" select="$counter"/>
1738:     </xsl:apply-templates>
1739:   </xsl:if>
1740:   <xsl:if test="name()='rbrace'">
1741:     <xsl:if test="$counter=2">
1742:       <xsl:apply-templates select="following-sibling::*[1]" mode="dochildren"/>
1743:     </xsl:if>
1744:     <xsl:if test="$counter>2">
1745:       <xsl:apply-templates select="following-sibling::*[1]" mode="search">
1746:         <xsl:with-param name="counter" select="$counter - 1"/>
1747:       </xsl:apply-templates>
1748:     </xsl:if>
1749:   </xsl:if>
1750: </xsl:template>
1751: 
1752: </xsl:transform>
1753: 
1754: \end{alltt}
1755: 
1756: \end{document}
1757: 
1758: