1: \documentclass[11pt]{article}
2:
3: %\usepackage{ijcai01}
4: %\usepackage{fullpage,palatino}
5: \usepackage{fullpage,url}
6: \setlength{\oddsidemargin}{-0.25in}
7: \setlength{\evensidemargin}{-0.25in}
8: \setlength{\topmargin}{0.5in}
9: \setlength{\headheight}{0pt}
10: \setlength{\headsep}{0pt}
11: \setlength{\footskip}{0.35in}
12: \setlength{\textheight}{8.75in}
13: \setlength{\textwidth}{7in}
14: \setlength{\itemindent}{-0.5cm}
15: \setlength{\marginparwidth}{0in}
16: \setlength{\marginparsep}{0in}
17: \hyphenation{inform-ation-seeking inform-ation}
18: \newenvironment{descit}[1]{\begin{quote} \textit{#1}}{\end{quote}}
19:
20: \input{psfig-dvips}
21:
22: \newif\ifpdf
23: \ifx\pdfoutput\undefined
24: \pdffalse
25: \else
26: \pdfoutput=1
27: \pdftrue
28: \fi
29:
30: \ifpdf
31: \usepackage[pdftex]{graphicx}
32: \usepackage[pdftex]{color}
33: \DeclareGraphicsExtensions{.pdf,.png,.jpg}
34: \else
35: \usepackage[dvips]{graphicx}
36: \usepackage[dvips]{color}
37: \DeclareGraphicsExtensions{.eps,.epsi,.ps}
38: \fi
39:
40: \usepackage{times}
41: %\usepackage{fancyheadings}
42:
43: \pagestyle{plain}
44: %\thispagestyle{empty}
45: %\pagestyle{empty}
46:
47: \def\midv{\mathop{\,|\,}}
48: \newtheorem{defn}{Definition}
49: \long\def\cbk#1{{\color{red}[CBK: #1]}}
50: \newlength\colwidth \setlength\colwidth{3.25in}
51:
52: \title{The Partial Evaluation Approach to \\
53: Information Personalization}
54:
55: \author{Naren Ramakrishnan and Saverio Perugini\\
56: Department of Computer Science\\
57: Virginia Tech, Blacksburg, VA 24061\\
58: Email:\{naren,sperugin\}@cs.vt.edu}
59:
60: \begin{document}
61: %\renewcommand{\baselinestretch}{2}
62:
63: \maketitle
64: %\thispagestyle{empty}
65: %\pagestyle{empty}
66:
67: \begin{abstract}
68: \noindent
69: Information personalization refers to the automatic adjustment of information content,
70: structure, and presentation tailored to an individual user. By reducing information
71: overload and customizing information access, personalization systems have emerged as an
72: important segment of the Internet economy. This paper presents a systematic modeling methodology ---
73: PIPE (`Personalization is Partial Evaluation') --- for personalization.
74: Personalization systems are designed and implemented in PIPE by modeling
75: an information-seeking interaction in a programmatic representation. The representation
76: supports the description of information-seeking activities as partial information and their
77: subsequent realization by {\it partial
78: evaluation}, a technique for specializing programs.
79: We describe the modeling methodology at a conceptual level and outline representational choices.
80: We present two application case studies that use PIPE for personalizing web sites and describe
81: how PIPE suggests a novel evaluation criterion for information system designs. Finally,
82: we mention several fundamental implications of adopting the PIPE model for personalization and
83: when it is (and is not) applicable.
84: \end{abstract}
85: \newpage
86: \tableofcontents
87: \newpage
88: %\begin{descit}{}
89: %
90: %\noindent
91: %I have observed that many good ideas start out by claiming too much territory for themselves,
92: %and eventually, when they have received their fair share of attention and respect, the air
93: %clears and it emerges that, though still grand, they are not quite so grand and all-encompassing
94: %as their proponents first thought. But that's all right. $\ldots$ That would be a fine start.
95: %\flushright{Douglas R. Hofstadter, in {\it Analogy as the Core of Cognition}}
96: %\end{descit}
97: \section{Introduction}
98: %\renewcommand{\baselinestretch}{2}
99:
100: One of the main contributions of information systems research is
101: the development of models that allow the
102: specification and realization of information-seeking activities.
103: Besides formalizing important operations, such
104: models provide a vocabulary with which to reason about the information-seeking activity.
105: For instance, if an information space is modeled as a term-document matrix, then the
106: vector-space model permits the view of retrieval as measuring similarities between document
107: vectors. Similarly, the modeling of data as a set of relations in a database system
108: affords expressive
109: query languages such as SQL. Other models and modeling methodologies can be found
110: in interactive information retrieval applications~\cite{scatter-gather,tkde-navigation,rabbit}.
111: Our goal in this paper is to present a modeling methodology for information personalization.
112:
113: Personalization constitutes the mechanisms and technologies required to
114: customize information access to the end-user. It can be defined as
115: the automatic adjustment of information content, structure, and presentation
116: tailored to an individual user. The reader will be familiar with instances of
117: personalization such as web sites that welcome a returning user and
118: recommender systems~\cite{adomavicius01expert-driven,specissue}
119: at sites such as {\tt amazon.com}. The scope of
120: personalization today extends beyond web pages and web sites~\cite{terveen} to many different
121: forms of information content and
122: delivery~\cite{cacm-broader,cacm-kantor,cacm-streams}.
123: The underlying algorithms and techniques range from simple keyword matching
124: of consumer profiles, to explicit~\cite{ira,grouplens,phoaks} or
125: implicit~\cite{cacm-jaideep,cacm-myra} capture of user interaction.
126:
127: Despite its apparent popularity in reducing
128: information overload on the Internet, personalization suffers from a lack of any rigorous model
129: or modeling methodology. One of the main reasons is that there are `personal
130: views of personalization~\cite{cacm-personal}.' There are hence as many ways
131: to design and build a personalization system as there are interpretations for what
personalization means. Such diversity makes it difficult to study conceptual
models of personalization in general.
134:
135: %Consequently, a large number of dichotomies have been proposed
136: %that classify personalization research according to the
137: %philosophies of the underlying domains, the proposers, and their
138: %parent communities. Thus, categorizations such as
139: %content-based versus collaborative~\cite{adomavicius01expert-driven,specissue},
140: %`customization versus transformation' \cite{adaptive-sites}, non-destructive
141: %versus destructive, `public transportation versus hot-rods' \cite{rus} have
142: %become widely accepted. These dichotomies are based on the form of personalization,
143: %the level at which it is targeted, and the types of information used in
144: %delivering the personalization.
145:
146: We present the first
(to the best of our knowledge)
148: systematic modeling methodology for information personalization.
149: Termed PIPE (`Personalization is Partial Evaluation')~\cite{naren-ic}, our methodology makes no commitments to a
150: particular
151: algorithm, format for information resources,
152: type of information-seeking activities or, more basically, the nature
153: of personalization delivered. Instead, it emphasizes the modeling of an
information space in such a way that descriptions of
155: information-seeking activities can be
156: represented as partial information. Such
157: partial information is then exploited (in the model) by
158: {\it partial evaluation}, a technique popular in the programming languages
159: community~\cite{jones}.
160:
161: %In important ways,
162: %PIPE advances personalization research as well as our understanding of how to build information
163: %systems. PIPE enables the view of personalization as
164: %specializing representations.
165: %%personalization, not in terms of particular algorithms or targeting goals, but in
166: %%terms of representations (of information systems).
167: %%This viewpoint provides a framework to study questions
168: %%such as `how should I design my information system so that it is personalized for my users?'
169: %It requires the modeling of {\it interaction} with an information
170: %system in a suitable representation. Once we conduct the modeling, personalization is achieved
171: %quite literally, `for free.' Our modeling methodology also allows
172: %personalization to be provided in conjunction with
173: %other information-seeking activities such as browsing.
174: %Finally, PIPE contributes a novel evaluation criterion
175: %for information system designs. It relates
176: %personalization to the way an information system design is {\it factored}.
177: %The evaluation criterion allows us to qualify the {\it personability} of an
178: %information system for a particular information-seeking activity, for example
179: %`web site X is 60\% more personalized for activity Y than web site Z.'
180: %
181:
182: While our ideas and results apply to many forms of computerized
183: information systems (e.g., web-based, voice-activated),
184: we restrict our attention to web sites in this paper.
185: Later in our discussion, we
186: qualify the range of information systems technologies to which PIPE can be applied.
187:
188: \subsection*{Reader's Guide}
189: Section~\ref{example} introduces the basic concepts of PIPE with the example of personalizing
190: a browsing hierarchy on the web. Section~\ref{basics} outlines the
191: PIPE modeling
192: methodology and how it can be used
193: for representing a variety of situations.
194: Section~\ref{studies} describes two
195: application studies that use PIPE for personalizing web sites.
196: Evaluation aspects implied by PIPE as
197: a modeling methodology are also described here.
198: Section~\ref{discuss}
199: describes connections between PIPE and other approaches, and carefully
200: qualifies situations where PIPE is
201: (and is not) applicable.
202: %Emerging scenarios that can be realized with PIPE and PIPE-like methodologies are also presented here.
203: Finally, Section~\ref{conclu} summarizes the major
204: contributions of this work.
205: \section{Motivating Example}
206: \label{example}
207:
208: Consider a consumer visiting an automobile dealership
209: to purchase a vehicle. Here are two possible scenarios.
210:
211: \newpage
212: \begin{descit}{Scenario 1}
213: \vspace{-0.1in}
214: \begin{tabbing}
215: {\bf Dealer:} \= Madam, are you looking to purchase a passenger vehicle?\\
216: {\bf Buyer:} \> Yes.\\
217: {\bf Dealer:} \> Do you have a particular manufacturer in mind?\\
218: {\bf Buyer:} \> I know that cars made by Honda have the highest safety approval rating.\\
219: {\bf Dealer:} \> That is true. Honda comes in seven colors. Do you have a preference for color?\\
220: {\bf Buyer:} \> The `cyclone blue' looks pleasing.\\
221: (conversation continues to ascertain further details of the vehicle)
222: \end{tabbing}
223: \end{descit}
224:
225: \begin{descit}{Scenario 2}
226: \vspace{-0.1in}
227: \begin{tabbing}
228: {\bf Dealer:} \= Sir, may I interest you in anything?\\
229: {\bf Buyer:} \> I am looking for a sport utility vehicle.\\
230: {\bf Dealer:} \> Sure, do you have a particular manufacturer in mind?\\
231: {\bf Buyer:} \> Not really, but the vehicle should be Red and made in 2001.\\
232: {\bf Dealer:} \> I see. \\
233: {\bf Buyer:} \> And by the way, I don't care for the fancy doormats and fittings.\\
234: {\bf Dealer:} \> Of course.\\
235: (conversation continues)
236: \end{tabbing}
237: \end{descit}
238: \noindent
239: %The sophistication in these scenarios arises from the {\it mixed-initiative} nature of human
240: %conversations.
241: In the first scenario, the conversation is directed by the dealer,
242: and the buyer merely answers questions posed by the dealer. The second scenario
resembles the first up to a point, after which the buyer takes the initiative and
244: provides answers `out of turn.' When queried about manufacturer, the buyer responds
245: with information about color and year of manufacture instead. Nevertheless,
246: the conversation is not stalled and both parties continue the dialog
247: to (eventually) complete the information assessment task. At each stage in the
248: above conversations, the buyer has the choice of proceeding along the lines of inquiry initiated
249: by the dealer
250: or can shift gears and address a different aspect of information assessment. Scenarios that
251: `mix' these two modes of inquiry in such arbitrary ways constitute the
252: scope of {\it mixed-initiative} interaction~\cite{mixed-notkin}.
253:
254: \begin{figure}
255: \centering
256: \begin{tabular}{cc}
257: \includegraphics[width=8.25cm,height=6cm]{auto1.eps}
258: \includegraphics[width=7.4cm,height=6cm]{auto2.eps}
259: \end{tabular}
260: \begin{tabular}{cc}
261: \includegraphics[width=8.25cm,height=9.89cm]{auto4.eps}
262: \includegraphics[width=7.4cm,height=9.89cm]{auto3.eps}
263: \end{tabular}
264: %\begin{tabular}{cc}
265: %& \mbox{\psfig{figure=stupidautos.eps,width=5.5in}}
266: %\end{tabular}
267: \caption{Four typical solutions to organizing web catalogs. (top left)
268: A hardwired scenario.
269: (top right) A choice of two hardwired scenarios.
270: (bottom left) Complete enumeration involving all possible scenarios of interaction.
271: (bottom right) A `power-search' form that hides details of enumeration.}
272: \label{auto-solutions}
273: \end{figure}
274: Can we support a similar diversity of interaction in an online information system?
275: In other words, the system should have a default mode of interaction where a user
276: would fill in forms (or click on choices) in a specified order. A more
277: enterprising user should be able to supply any piece of information out
278: of turn. Finally, it should be possible to
279: mix these two modes of interaction in any order.
280: %While the term `mixed-initiative' affords many interpretations~\cite{mixed-notkin},
281: %we adopt an operational definition from our view of a dialog as
282: %an information assessment activity. By mixed-initiative, we emphasize the ability at any
283: %point to invoke (and later, return from) a subdialog whenever requested or specified by the
284: %user. This particular aspect of dialog structure in the context of information-seeking
285: %has been termed `goal-specification subdialogs' in~\cite{mixed-hci}.
286: %
287: At each stage of the interaction (whether system-initiated or user-requested), the system should
288: respond with the appropriate set of choices available. For instance, notice the
289: restriction to
290: seven colors once the decision on Honda is made in {\it Scenario 1}. If the choice
291: of color was made at the outset, presumably more selections would have been available.
292: A system that supports such a diversity of interaction would be personalized to
293: a user's individual preference(s) for information-seeking.
294:
295: The typical solution involves
296: anticipating the forms of interactions that have to be supported and designing interfaces
297: to support the implied scenarios (in this paper, we use the term `scenarios' to mean scenarios
298: of interaction).
299: Fig.~\ref{auto-solutions} describes four typical solutions that make various assumptions
300: on the scenarios that will be supported.
301: Fig.~\ref{auto-solutions} (top left)
302: can only support situations such as {\it Scenario 1} above, in that the user is forced to make
303: a choice of manufacturer at the outset (and all remaining levels are similarly fixed).
304: We refer to this as a design that hardwires scenarios.
305: Fig.~\ref{auto-solutions} (top right) also hardwires scenarios,
306: but provides a choice of two such hardwired scenarios (i.e., search by model
307: or search by price). Fig.~\ref{auto-solutions}
308: (bottom left) is what we refer to as {\it complete enumeration}, which involves
309: enumerating all possible scenarios and providing interfaces to all of them~\cite{hearst-setting}.
310: While the interface in Fig.~\ref{auto-solutions} (bottom left) only depicts the top-level choice, we
could imagine that such a multiplicity of choices is duplicated at all lower levels.
312: It is clear that enumeration could involve an exponential number of possibilities and correspondingly
313: cumbersome site designs.
314: And finally, Fig.~\ref{auto-solutions} (bottom right) provides the same
315: functionality as Fig.~\ref{auto-solutions} (bottom left)
316: but masks the details of enumeration in a convenient `power-search' form.
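
To see how quickly enumeration grows, consider a rough count under simplifying assumptions
that are ours and not a property of any particular site: the catalog has $n$ attributes, and
the user supplies each attribute exactly once, in any order. Complete enumeration must then
provide for every ordering of the attributes, and needs a distinct kind of page for every set
of attributes that may already have been supplied:
\[
\mbox{orderings of $n$ attributes} = n!, \qquad
\mbox{sets of attributes already supplied} = 2^{n}.
\]
With three attributes this gives $3! = 6$ orderings and $2^{3} = 8$ kinds of pages; with
ten attributes it gives $10! = 3{,}628{,}800$ orderings and $2^{10} = 1024$ kinds of pages.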
317:
All of these solutions rely on anticipating the points where an out-of-turn interaction
can occur and providing mechanisms to support it. When opportunities
for out-of-turn interaction are too restrictive, information systems frustrate their users.
321: The basic problem is the representational mismatch between the user's mental model of the
322: information-seeking activity and the facilities that are available for describing the activity.
323:
324: In Fig.~\ref{stupidyellow}, the user is attempting to decide on an automotive retailer
325: based on the services offered. He is open to
the possibility of traveling to a different city in order to make his purchase. He is thus
not ready to commit to a retailer location, but the system insists that he
make this choice first. The reader will likely recognize similar situations from personal
experience.
330:
331: \begin{figure}
332: \centering
333: \begin{tabular}{cc}
334: \includegraphics[width=8.25cm,height=6.8cm]{yellow}
335: %& \mbox{\psfig{figure=yellowstupid.eps,width=5.5in}}
336: \end{tabular}
337: \caption{An interface that prohibits certain information-seeking activities from
being described.}
339: \label{stupidyellow}
340: \end{figure}
341:
342: \subsection{The PIPE Approach}
343: We present an alternative design approach, one that promotes out-of-turn interaction without
344: predefining the points where such interaction can take place.
345: %and forms in which the information-seeker (buyer, in the above example)
346: %can barge in and alter the flow of interaction (conversation, in the above example).
347: Consequently, the
348: interfaces produced by our approach are, at once, both more expressive and simpler than the ones
349: in Fig.~\ref{auto-solutions}.
350:
351: \begin{figure}
352: \centering
353: \begin{tabular}{|l|l|} \hline
354: {\tt int pow(int base, int exponent) \{} & {\tt int pow2(int base) \{} \\
\,\,\,\,\,{\tt int prod = 1;} & \,\,\,\,\,{\tt return (base * base);} \\
356: \,\,\,\,\,{\tt for (int i=0;i<exponent;i++)} & \} \\
357: \,\,\,\,\,\,\,\,\,\,{\tt prod = prod * base;} & \\
358: \,\,\,\,\,{\tt return (prod);} & \\
359: \} & \\
360: \hline
361: \end{tabular}
362: \caption{Illustration of the partial evaluation technique.
363: A general purpose {\tt pow}er function written in C (left) and
364: its specialized version (with {\tt exponent} statically set to 2) to handle squares
365: (right). Such specializations are performed automatically by partial evaluators such as C-Mix.}
366: \label{pe}
367: \end{figure}
368:
369: \begin{figure}
370: \centering
371: \begin{tabular}{cc}
372: & \mbox{\psfig{figure=autohierarchy2.eps,width=5in}}
373: \end{tabular}
374: \caption{Personalizing a browsing hierarchy. (left)
375: Original information resource. (right) Personalized hierarchy with respect to vehicles
376: made in 2001. Notice that not only the pages, but also their structure is customized for (further
377: browsing by) the user.}
378: \label{pipe-illustrate}
379: \end{figure}
380:
381: \begin{figure}
382: \centering
383: \begin{tabular}{|l|l|} \hline
384: {\tt if (Blue)} & \\
385: \,\,\,\,{\tt if (2001)} & \\
386: \,\,\,\,\,\,\,\,{\tt if (Honda)} & \\
387: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & {\tt if (Blue)}\\
388: \,\,\,\,\,\,\,\,{\tt else if (Toyota)} & \,\,\,\,{\tt if (Honda)} \\
389: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$}\\
390: \,\,\,\,{\tt else if (2000)} & \,\,\,\,{\tt else if (Toyota)}\\
391: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
392: {\tt else if (Red)} & {\tt else if(Red)} \\
393: \,\,\,\,{\tt if (2001)} & \,\,\,\,{$\cdots \cdots \cdots$} \\
394: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \\
395: \,\,\,\,{\tt else if (2000)} & \\
396: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \\
397: \hline
398: \end{tabular}
399: \caption{Using partial evaluation for personalization. (left) Programmatic input to
400: partial evaluator, reflecting the organization of information in Fig.~\ref{pipe-illustrate} (left).
401: (right) Specialized program from the partial evaluator, used to create the personalized
402: information space shown in Fig.~\ref{pipe-illustrate} (right).}
403: \label{pe2}
404: \end{figure}
405:
406: Let us begin by considering the scenario where a user obediently supplies information attributes
407: in the order requested. For ease of presentation, we assume that there are three attributes
408: --- color, year of manufacture, and manufacturer --- and that the information system ascertains
409: values for them in this order. The key contribution of PIPE is to cast this seemingly
410: inflexible and hardwired scenario in a representation that allows its automatic transformation
411: into other scenarios. In particular, PIPE represents an information space
412: as a program, partially evaluates the program with respect to (any) user input, and
413: recreates a personalized information space from the specialized program.
414:
415: %\subsubsection*{Partial Evaluation}
416: The input to a partial evaluator
417: is a program and (some) static information about its arguments. Its
418: output is a specialized version of this program (typically in the same
419: language),
420: that uses the static information to `pre-compile' as many operations
421: as possible. A simple example is how the C function {\tt pow}
422: can be specialized to create a new function, say
423: {\tt pow2}, that computes the square of an integer.
Consider, for example,
425: the definition of a {\tt pow}er function shown in the left part of Fig.~\ref{pe}.
426: If we knew that a particular user will utilize it
427: only for computing squares of
428: integers, we could specialize it (for that user) to produce the {\tt pow2}
429: function.
430: Thus, {\tt pow2} is obtained automatically (not by a human programmer)
431: from {\tt pow} by precomputing all expressions that involve {\tt exponent},
432: unfolding the for-loop, and by various other compiler transformations such as
433: {\it copy propagation} and {\it forward substitution}.
434: Automatic program specializers are available for C, FORTRAN, PROLOG, LISP, and several other
435: important
436: languages. The interested reader is referred to \cite{jones} for a good introduction.
437: While the traditional motivation for using partial evaluation is to achieve speedup
438: and/or remove interpretation overhead \cite{jones}, it can also be viewed as a technique
439: for simplifying program presentation, by removing inapplicable, unnecessary,
440: and `uninteresting' information (based on user criteria) from a program.
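
The individual transformations can be traced by hand for a slightly larger case.
The following C fragment is a minimal sketch written for illustration; it is not the output
of C-Mix or of any other specializer. It assumes that {\tt exponent} is statically known to
be 3. The function is named {\tt ipow} (our choice) to avoid a clash with the C library's
{\tt pow}, and the names {\tt ipow3\_unrolled}, {\tt ipow3}, and {\tt check\_ipow3} are
likewise hypothetical.

\begin{verbatim}
#include <assert.h>

/* General-purpose power function, as in the figure.
   Assumes exponent >= 0. */
int ipow(int base, int exponent) {
    int prod = 1;
    for (int i = 0; i < exponent; i++)
        prod = prod * base;
    return prod;
}

/* Stage 1: exponent is known to be 3, so every loop test can be
   decided statically and the loop is unrolled into three copies
   of its body. */
int ipow3_unrolled(int base) {
    int prod = 1;
    prod = prod * base;
    prod = prod * base;
    prod = prod * base;
    return prod;
}

/* Stage 2: substituting each value of prod forward and simplifying
   1 * base leaves a single expression. */
int ipow3(int base) {
    return base * base * base;
}

/* Checks that the three versions agree on a few sample bases. */
void check_ipow3(void) {
    for (int b = -4; b <= 4; b++) {
        assert(ipow(b, 3) == ipow3_unrolled(b));
        assert(ipow(b, 3) == ipow3(b));
    }
}
\end{verbatim}

Calling {\tt check\_ipow3()} confirms that the three versions agree on the sampled inputs.
The specialized {\tt ipow3} no longer mentions {\tt exponent} or the loop counter at all.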
441:
442: %\subsubsection*{Using Partial Evaluation for Personalization}
443: Consider the hardwired scenario depicted in Fig.~\ref{pipe-illustrate} (left).
444: We can abstract this hierarchy by the program
445: in Fig.~\ref{pe2} (left) whose structure models the information resource (in
446: this case, a hierarchy of web pages) and whose control-flow models
447: the information-seeking activity within it (in
448: this case, browsing through the hierarchy by making individual selections). The link
449: labels are represented as program variables and semantic dependencies between links
450: are captured by the mutually-exclusive {\tt if..else} dichotomies. As it is modeled in
451: Fig.~\ref{pe2} (left), the program reflects the assumption that
452: the choice of year is usually made at the second level, after a color selection has been
453: made. However, to personalize for the user who says `2001' at the outset, we partially
454: evaluate the program with respect to the variable {\tt 2001} (setting it to one and all
455: conflicting variables such as {\tt 2000} to zero). This produces the simplified
456: program in Fig.~\ref{pe2} (right), which can
457: be used to recreate web pages with personalized web content (shown in Fig.~\ref{pipe-illustrate},
right). The year level of the hierarchy is simplified away, so that what was
originally the third level (manufacturer) becomes the new second level. In this way, the
user is able to provide the value of any deeply
nested variable out of turn, thus achieving mixed-initiative interaction.
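
The same effect can be reproduced by hand for the running example. The following C fragment
is a sketch under our own assumptions, not the output of an actual partial evaluator:
the link labels become integer flags, the elided subtrees of Fig.~\ref{pe2} are collapsed
into leaf pages, and the page names and function names ({\tt browse}, {\tt browse\_2001},
{\tt check\_residual}) are hypothetical.

\begin{verbatim}
#include <assert.h>
#include <string.h>

/* Original hierarchy: color first, then year, then manufacturer.
   Each flag is 1 if the user has selected that link, 0 otherwise.
   Returns the page reached (page names are made up for this sketch). */
const char *browse(int blue, int red, int y2001, int y2000,
                   int honda, int toyota) {
    if (blue) {
        if (y2001) {
            if (honda) return "blue/2001/honda";
            else if (toyota) return "blue/2001/toyota";
        } else if (y2000) {
            return "blue/2000";
        }
    } else if (red) {
        if (y2001) return "red/2001";
        else if (y2000) return "red/2000";
    }
    return "menu";   /* not enough input yet: show remaining choices */
}

/* Residual program for y2001 = 1 and y2000 = 0: the tests on the
   year have been decided statically, so the year level disappears. */
const char *browse_2001(int blue, int red, int honda, int toyota) {
    if (blue) {
        if (honda) return "blue/2001/honda";
        else if (toyota) return "blue/2001/toyota";
    } else if (red) {
        return "red/2001";
    }
    return "menu";
}

/* Checks that the residual program agrees with the original on all
   16 combinations of the remaining flags. */
void check_residual(void) {
    for (int bits = 0; bits < 16; bits++) {
        int blue = bits & 1, red = (bits >> 1) & 1;
        int honda = (bits >> 2) & 1, toyota = (bits >> 3) & 1;
        assert(strcmp(browse(blue, red, 1, 0, honda, toyota),
                      browse_2001(blue, red, honda, toyota)) == 0);
    }
}
\end{verbatim}

Running {\tt check\_residual()} exercises all 16 settings of the remaining flags. In
{\tt browse\_2001}, the manufacturer is tested immediately after the color, which is the
restructuring shown in Fig.~\ref{pipe-illustrate} (right).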
462:
463:
464: %Executing the program in the
465: %order and form in which it was modeled amounts to the system-initiated mode of `browse as I say.'
466: %`Jumping ahead' to nested program segments by partially evaluating the program amounts to
467: %the user-directed mode of personalization.
468:
469: \subsection{Some Preliminary Observations}
470: \label{observe}
471:
472: Personalization systems are thus designed and implemented in PIPE by modeling
473: an information-seeking activity in a programmatic representation. The above
474: example has been carefully constructed to highlight the many advantages and
475: opportunities provided by PIPE. Before we describe PIPE in detail, it will be
476: helpful to summarize the lessons from the above example.
477:
478: \begin{enumerate}
479: \setlength{\leftmargin}{0in}
480: \item {\it PIPE equates personalization to specializing representations.}
481: %\vspace{-0.1in}
482: As a methodology, PIPE asserts that if interaction in an information space can
483: be represented as a program, then a personalized information space can be automatically
generated by partial evaluation. It is up to the designer to supply the representation
485: as a program and reinterpret the program in information systems terms.
486: The meaning of the programmatic representation is thus
487: external to the basis for personalization (partial evaluation).
488:
489: For instance,
490: the act of clicking on the `Honda' hyperlink to browse through Honda cars is captured
491: in Fig.~\ref{pe2} by just the expression {\tt if (Honda)}. Clicking on the link amounts
492: to evaluating this conditional to be true.
493: %A partial evaluator has no semantic
494: %understanding of `clicking.' It is only the designer's creativity (or imagination)
495: %that allows him to assign such an understanding. It is thus imperative that we
496: %have a meaningful mapping
497: %of programming constructs to aspects of
498: %interaction with an information system.
499: %
500: %In some cases, these mappings are obvious. A
501: The conditional construct {\tt if} is thus used as a logical
502: point where the state of
503: information is tested before proceeding any further. It could model either
504: a hyperlink that has to be clicked or a free-form text box whose entries are evaluated.
505: %Assigning an interpretation to constructs such as {\tt goto}
506: %is more tricky (but see Section~\ref{methods}).
507:
508: \item {\it The effectiveness of PIPE depends on what is modeled (and how).}
The effectiveness of a PIPE implementation depends on
the particular modeling choices made {\it within} the programmatic
511: representation (akin
512: to~\cite{rabbit}).
513: We cannot overemphasize this aspect --- the example
514: in Fig.~\ref{pe2} can be made `more personalized' by conducting
515: a more sophisticated modeling of the underlying domain. For instance, information such
516: as vehicle VIN numbers, history of ownership,
517: mileage on the vehicle, and photos of the car can be further modeled as a browsable
518: hierarchy and `attached' (functionally invoked) at various places
in the program of Fig.~\ref{pe2} (left). Conversely, the example in Fig.~\ref{pe2} (left)
520: can be made `less personalized' by, for instance, requiring categorical information along with user input.
521: Replacing {\tt if (2001)} in Fig.~\ref{pe2} (left)
522: with {\tt if (Year=2001)} implies that the specification of the type of
523: input (namely that `2001' refers to the year of manufacture) is required
524: in order for the statement to be partially evaluated.
525: Personalization systems built with PIPE can thus be distinguished by what
526: they model and the forms of customization enabled by applying
527: partial evaluation to such a modeling.
528:
529: Similarly, the way in which program variables are associated with user input
530: can influence the effectiveness of a PIPE implementation.
531: Values for program variables
532: could come from a content-based technique or a so-called
533: collaborative technique. For instance, the variable {\tt Honda} could be set to true, either
534: because the user explicitly said so, or because `Honda' was recommended to the user by an automatic
535: recommender system. In addition, different variables could afford different
536: interpretations.
537:
538: Sometimes we can take advantage of a domain semantics when associating values
539: with program variables or in modeling the program.
540: Fig.~\ref{pe2} models a `strict' semantics of variable assignment
541: by the {\tt if..else} dichotomies. If {\tt Blue} is evaluated to true, then every other option
542: qualified by the {\tt else} constructs (such as {\tt Red}) would be automatically removed from
543: further consideration. This is due to our assumption that if the user declares `Blue' as his
544: preference, then he would not be interested in Red cars. If such a semantics
545: is not appropriate, then we would not have {\tt else} clauses in our conditionals.
546: Thus, PIPE doesn't dictate what the domain semantics (for assigning program variables) should be
547: or even that it should be available. But it can take advantage of a domain semantics, if one exists.
548:
549: %Consider again Fig.~\ref{pe2}, where the user
550: %declares `2001' as his preference and we partially evaluate
551: %w.r.t. {\tt 2001}. What is being simplified (actually, removed) in this example is
552: %the test {\tt if (2001)} and along with it,
553: %a web page that requires the user to click on `2001.'
554: %The reader might notice that we also set the program variable {\tt 2000} to be false. This
555: %is due to our understanding of how the personalization scenario is expressible as partial inputs.
556: %When the user declares `2001,' we can reason that she will not be interested in vehicles made
557: %in other years, and set the corresponding variables to zero. We reiterate that PIPE doesn't
558: %require that we set {\tt 2000} to be false; it comes from our understanding of what the user's
559: %interests are.
560: %
561: %Likewise, if the user declares `Blue,' we can set the {\tt Blue} variable to one and all
562: %other conflicting variables such as {\tt Red} to zero. Once again, this doesn't mean that `Blue'
563: %implies `not Red'; it just means that we are using a domain semantics that enables us
564: %to make simplifications. If the user could still be interested in Red cars, then we
565: %do not set {\tt Red} to zero.
566: %
567: Finally, the translation of the program from and back to
568: the information space could be done in different ways. In Fig.~\ref{pe2} (left) we modeled the
569: program by abstracting hyperlinks across pages as conditionals. When we recreate personalized
570: pages from
Fig.~\ref{pe2} (right) we are not bound to this design choice. We could cascade all the
572: interactions to within a single page, for instance. PIPE only requires that the designer of
573: the information system has a way of going from an information space to a programmatic
574: representation, and back again. Section~\ref{basics} covers modeling options
575: in detail.
576:
577: % how to set variables, how to employ a domain theory
578: % goal-oriented
579: % interaction sequences are bounded
580: % common criticism of our work is..
581: \item {\it PIPE separates modeling for a personalization system
582: from the operational aspect of personalization.}
583: Personalization systems are usually described in terms of the techniques
584: that provide personalization or the level at which the information is tailored. Due to the
585: variety possible, comparisons of personalization systems have been
586: difficult to make. PIPE, on the other hand, shifts the focus to modeling for
587: a personalization system. Any form of personalization is possible if the modeled
program allows the pertinent scenarios to be expressed as partial inputs. In
589: Fig.~\ref{pe2}
590: we cannot personalize cars with respect to occupancy, not because of
591: any fundamental limitation in our personalization methodology, but because
592: {\tt occupancy} is not available as a program variable. Similarly, we cannot
593: personalize cars with respect to the {\it Edmund's Car Guide} recommendations,
594: because the latter information resource has not been modeled.
595: The separation of modeling
596: from the operational aspect of conducting personalization means that we can
597: devote our attention to modeling the interaction in as sophisticated a manner
as required. It also means that we have to distinguish between evaluating an
implementation of the PIPE
methodology and evaluating the methodology itself.
601:
602: \begin{figure}
603: \centering
604: \begin{tabular}{cc}
605: \includegraphics[width=3in]{sketchy}
606: %& \mbox{\psfig{figure=yellowstupid.eps,width=5.5in}}
607: \end{tabular}
608: \caption{Sketch of a PIPE interface to a traditional browser. The interface retains the
existing browsing functionality at all times. In addition, at any point in the
interaction, the user has
the option of supplying personalization parameters and conducting personalization
612: (bottom two windows). Such an interface can be implemented as a toolbar option in existing
613: systems.}
614: \label{toolbar}
615: \end{figure}
616:
617: %To our surprise, we have experienced remarkable resistance in propagating the view that
618: %PIPE is a modeling methodology, as different from a `system.' A frequent comment from visitors
619: %to our example applications at {\tt pipe.cs.vt.edu} is:
620: %\begin{descit}{}
621: %``PIPE is unimpressive since only subgraphs of the original site are being presented
622: %back to the user. No restructuring of the document content is provided.''
623: %\end{descit}
624: %\noindent
625: %This is a limitation of the {\it particular application} of PIPE (which does not model document
626: %content) than the methodology. Only
627: %the site graph (as in Fig.~\ref{pe2}) was modeled and hence only personalizations pertaining
628: %to manipulating the site graph are possible.
629: %In Section~\ref{options} we outline a variety of ways to conduct more sophisticated modeling,
630: %including modeling document content. In Section~\ref{factor}, we address
631: %limitations of the PIPE modeling methodology itself.
632:
633: % it is interaction that is personalized. not content-based collaborative, etc.
634: % it is upto us to interpret what the represntation means
635: % sometimes easy: if
636: % sammy: sometimes difficult: goto
637: % marcos: what is removed is the "link," We can go further.
638:
639: % say dfs
640: % what is the representation: a compaction of interaction sequences
641: % define interaction sequence at this point
642: % once we know how to model interaction, we get P for free.
643: \item {\it The PIPE personalization operator is closed.}
644: Since the partial evaluation of a program results in another program, the PIPE
645: personalization operator is closed. In terms of interaction, this means that
646: any modes of information-seeking (such as browsing, in Fig.~\ref{pe2})
647: originally modeled in the program are preserved. In the above example, personalizing a browsable
648: hierarchy returns another browsable hierarchy. The closure property also means that the
649: original information-seeking activity (browsing) and personalization can be interleaved in
650: any order. Executing the program in the
651: order and form in which it was modeled amounts to the system-initiated mode of `browse as I say.'
652: `Jumping ahead' to nested program segments by partially evaluating the program amounts to
653: the user-directed mode of personalization.
654: In Fig.~\ref{pe2}, the simplified program can be browsed in the traditional sense,
655: or partially evaluated further with additional user inputs.
656: PIPE's use of partial evaluation is thus central to realizing a mixed-initiative mode of
657: information-seeking, without explicitly hardwiring all possible scenarios
658: of interaction (including out-of-turn interactions). A sketch of an interface design for
659: such mixed-initiative interaction is provided in Fig.~\ref{toolbar}.
660:
661: \item {\it PIPE is most advantageous in information spaces that afford nested representations
662: of interactions and where information-seeking activities can involve out-of-turn interactions.}
663: For browsing hierarchies, a nested programmatic model can be trivially built by a depth-first
664: crawl of the site (as in Fig.~\ref{pe2}). Not only is this modeling appropriate, it is
665: also concise and makes the advantages of partial evaluation obvious.
666:
667: \begin{figure}
668: \centering
669: \begin{tabular}{|l|} \hline
670: {\tt if (Returning Customer)} \\
671: \,\,\,\,{\tt /* be nice to her */} \\
672: {\tt else} \\
673: \,\,\,\,{\tt /* just show usual catalog */} \\
674: \hline
675: \end{tabular}
676: \caption{A modeling of an information space that involves only one level of interaction.}
677: \label{pe4}
678: \end{figure}
679:
680: %Even though the hierarchy is designed in a color-year-model order, PIPE
681: %can accept values for program variables in any order.
682: %Further, the size of the program in Fig.~\ref{pe2}
683: %(left) is a small fraction of what it would be if we were to enumerate scenarios in an exhaustive
684: %fashion. Recall that enumerating scenarios to support all forms of barge in leads to
685: %cumbersome designs such as Fig.~\ref{auto-solutions} (bottom left).
686: %In addition to mirroring the browsing hierarchy, our nested representation makes the advantages
687: %of partial evaluation obvious.
688: %
689: On the other hand,
690: consider a web site
691: that determines (perhaps by a cookie~\cite{cookies}) if a user is a
692: returning customer and does something different based on this information.
This interaction alone can be modeled by the program in Fig.~\ref{pe4}.
While partial evaluation is still applicable, it offers little benefit since
there is only one variable ({\tt Returning Customer}) to specify values for. There is
no deeply nested variable whose value can be supplied out of turn.
697: %In addition
698: %A nested representation is possible only if the space of information-seeking
699: %activities is structured for us to take advantage of commonality across different
700: %scenarios of interaction.
701:
702: Similarly, if all users would like to browse through the catalog in Fig.~\ref{pe2} by a
703: color-year-model motif, then there is really only one way in which the catalog is being used.
704: This usage mirrors the way in which the catalog is modeled, without any out-of-turn interactions.
705: Partial evaluation is thus not necessary to support the information-seeking goals of any user.
706:
707: The presence of out-of-turn interactions implies different rates of specification for
708: different aspects of information seeking, causing a rich variety of possible interactions.
709: In such a case, PIPE can be viewed as a technique that realizes a particular interaction sequence
710: by combinations of simplification and normal execution.
711: In Section~\ref{factor}, we show more formally which representations (and
712: which information spaces) are best suited for personalization by partial evaluation.
713:
714: \end{enumerate}
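
The closure and out-of-turn properties discussed above can be illustrated with a minimal sketch in C.
It is not the program of Fig.~\ref{pe2}: the catalog entries and the names {\tt browse} and
{\tt browse\_2001} are hypothetical, and the residual function is written by hand to show what a
partial evaluator could plausibly produce when {\tt year} is fixed to 2001.

\begin{verbatim}
#include <assert.h>
#include <string.h>

/* Hypothetical catalog modeled as nested conditionals in a
   color-year-model order. Returns the page reached, or NULL. */
const char *browse(const char *color, int year, const char *model)
{
    assert(color != NULL && model != NULL);
    if (strcmp(color, "blue") == 0) {
        if (year == 2000) {
            if (strcmp(model, "sedan") == 0) return "blue-2000-sedan";
        }
        if (year == 2001) {
            if (strcmp(model, "sedan") == 0) return "blue-2001-sedan";
            if (strcmp(model, "coupe") == 0) return "blue-2001-coupe";
        }
    }
    if (strcmp(color, "red") == 0) {
        if (year == 2001) {
            if (strcmp(model, "coupe") == 0) return "red-2001-coupe";
        }
    }
    return NULL;
}

/* Hand-written residual for year == 2001: the year tests are gone,
   but color and model can still be browsed in the original order. */
const char *browse_2001(const char *color, const char *model)
{
    assert(color != NULL && model != NULL);
    if (strcmp(color, "blue") == 0) {
        if (strcmp(model, "sedan") == 0) return "blue-2001-sedan";
        if (strcmp(model, "coupe") == 0) return "blue-2001-coupe";
    }
    if (strcmp(color, "red") == 0) {
        if (strcmp(model, "coupe") == 0) return "red-2001-coupe";
    }
    return NULL;
}
\end{verbatim}

The residual is itself a nested program over the remaining variables, so it can be browsed
as before or specialized again, for example with respect to {\tt color}.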
715: \section{Essential Aspects of PIPE}
716: \label{basics}
717: We now describe the PIPE methodology in more detail and outline choices available for modeling
718: typical situations.
719: While partial evaluation permits formal specification with mathematical notation~\cite{jones-book}, we do not
720: take this approach here. Instead, for the {\it ACM TOIS} audience,
721: we aim to emphasize the larger context in which partial evaluation is used in PIPE
722: and describe its advantages for information systems. We intend to present the formal aspects
723: of the PIPE methodology in a second paper.
724:
725: \subsection{Modeling Methodology}
726: \label{methods}
727:
728: As a modeling methodology, PIPE only makes the weak assumption that information is
729: organized along a motif of interaction sequences. For our purposes, an interaction sequence is
730: a list of primitive inputs used to describe the information-seeking
731: activity. For instance in Fig.~\ref{pe2}, information about vehicles is organized along
732: a color-year-model motif with the primitive inputs corresponding to specific choices
of color, year, or model. The interaction sequence in this example involves
the choice of {\tt 2001} for {\tt year}, in support of the user's goals.
735:
736: Information is embodied in an interaction sequence in two forms --- {\it structural}
737: and {\it terminal}. Structural information is what helps us refer to an interaction sequence;
738: it is explicitly represented in PIPE and specified via program variables.
739: In Fig.~\ref{pe2}, the structural information corresponds to choices of color, year, and model.
740: This form of information thus captures the partial information supplied by the user
741: by instantiating parts of the motif. When the user specifies `2001' in Fig.~\ref{pe2}, the
742: {\tt year} part of the motif is turned on and set to this value.
743:
744: Terminal information is also represented in PIPE, but is not directly manipulatable or even
745: directly addressable. Programs in PIPE are not explicitly parameterized by
746: this information and so the user cannot specify personalization in these terms.
747: In Fig.~\ref{pe2}, terminal information corresponds to the leaves, which would be information
748: about particular vehicles. In a different application, terminal information could reside at
749: every step in the interaction sequence.
750:
751: Structural information provides the `backbone' that strings together terminal information.
752: However, it is important to note that
753: structural information is considered first-class information in PIPE and not merely `features' with
754: which we index the `real information' (although it is tempting to view it this way).
755: To see why, observe that partial evaluation does not provide a mapping
from structural to terminal information (unless it is a complete evaluation that specifies all
program variables).
758: After a partial evaluation (e.g., Fig.~\ref{pe2} (right)) the specialized program might still
759: contain structural information. This does not necessarily mean
760: that the user's information-seeking activity is incomplete. The residual structural
761: information contributes to the programmatic modeling of interaction, {\it which is} the
762: personalized information space in PIPE. Another way to see this is to note
763: that PIPE simplifies {\it interaction} with an information space. Thus interaction
764: can be seen to be the determiner of information (both structural and terminal). The view
765: of structural information as first-class information is also natural if we think of the program
766: in logic programming terms, rather than imperative programming.
767:
768: %The issue of whether structural information is first-class is actually very fundamental to
769: %information system design. In Section~\ref{factor} we show the widespread ramifications of this idea.
770:
771: Since information can be organized all along the interaction sequence, in both structural and terminal
772: forms, we need a way to define the state of information described by the sequence as a whole.
773: It is useful to assume a `combining function' for defining the state of
774: information at the end of the sequence. A simple example of a combining function is the additive
775: operator which mirrors the accumulation of information by following an interaction sequence.
776: In Fig.~\ref{pe2}, if the color and model parts of the motif are turned on, then the state
777: of information known about that sequence is a set of values for \{{\tt color,model}\}.
778: Another example is to just retain information from the most recent step(s) in the sequence. This
779: would be appropriate when information-seeking has an exploratory nature to it and we wish to
780: discount some earlier steps in an interaction sequence as being `tentative' (the applications
781: presented in this paper do not have this flavor). Combining functions for terminal information
782: can be defined similarly.
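
The two combining functions mentioned above can be sketched as follows. This is only an
illustration under stated assumptions: the bit-set encoding of motif parts and the names
{\tt combine\_additive} and {\tt combine\_recent} are hypothetical and are not part of PIPE.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Parts of a hypothetical color-year-model motif, one bit each. */
enum part { COLOR = 1 << 0, YEAR = 1 << 1, MODEL = 1 << 2 };

/* Additive combining function: the state of information is the set
   of all motif parts turned on anywhere in the sequence. */
unsigned combine_additive(const enum part *seq, size_t n)
{
    assert(seq != NULL);
    unsigned state = 0;
    for (size_t i = 0; i < n; i++)
        state |= (unsigned)seq[i];
    return state;
}

/* Recency-based combining function: retain only the last k steps
   and discount the earlier ones as tentative. */
unsigned combine_recent(const enum part *seq, size_t n, size_t k)
{
    assert(seq != NULL);
    size_t start = (k < n) ? n - k : 0;
    return combine_additive(seq + start, n - start);
}
\end{verbatim}

For a sequence that turns on {\tt COLOR} and then {\tt MODEL}, the additive function yields
both parts, whereas the recency-based function with $k=1$ yields only {\tt MODEL}.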
783:
784: Since PIPE only emphasizes the design and implementation of personalization systems, it doesn't
785: pay any attention to how the interaction sequences are obtained and how the choice between
786: terminal and structural parts is made. In particular, PIPE is not a
787: complete lifecycle model for personalization system design and doesn't address issues such
788: as requirements gathering. Interaction sequences could come from explaining users'
789: behavior~\cite{pipe-tochi,footprints}, by identifying all possible paths through a given
790: site, or from our conceptual understanding of the information-seeking activity. They also depend
791: on the targeting goals of the personalization system. In~\cite{pipe-tochi}, we have
792: presented a systematic methodology for obtaining interaction sequences and identifying
793: structural and terminal parts, by `operationalizing' scenarios of interaction; we refer the
794: reader to this reference for details. In this paper, we assume that they are available
795: and proceed to further characterize and represent them.
796:
797: \subsubsection*{Characterizing Interaction Sequences}
798: Information seekers forage in different ways~\cite{pirolli-chapter} and the existing
799: design of the information system also influences their interaction sequences.
800: An important aspect of an interaction sequence is its length, which affects
801: its subsequent representation in PIPE.
802:
803: In many applications, interaction sequences are bounded. For instance,
804: in Fig.~\ref{pe2} an interaction sequence of length at most 3 describes
805: the information-seeking activity. Such sites and applications are characterized by their support
806: for a goal-oriented, opportunistic view of information-seeking. Hierarchies, recommender systems,
807: and scrolling to a specific location on a page are examples.
808: In general, any information-seeking
809: activity that has clear start and end states and which relies on perceptual, display-driven
810: clues that focus attention can be represented as a bounded sequence.
811:
812: In other important cases, interaction sequences can be unbounded. The trivial example is when
813: we allow the possibility that a user
814: may click `back buttons.' If we undo these steps before representation, we can proceed as if
815: they never happened. Alternatively, we can model back buttons using a finite-state machine (FSM),
816: but we have to find a characterization of applications where modeling at this level
817: of detail would be useful. A more interesting example of unbounded sequences involves browsing at a
818: site based on social network navigation, such as
819: {\tt www.\hskip0ex imdb.\hskip0ex com}. There are no leaves in this site and the site graph
820: resembles a social network.
821: Users are encouraged to systematically explore relationships between actors, movies, and directors
822: by `jumping connections.' Such
823: a site is characterized by an exploratory nature of information-seeking, akin to data mining. Goals are
824: articulated less clearly and cognitive knowledge is used from various resources to decide on how to
825: conduct information-seeking. In fact, there is no distinction between structural and terminal
826: information in this site! Any particular web page could be used to address other items or thought
827: of as the result of an information-seeking activity.
828:
829: Both bounded and unbounded interaction sequences can be described using constructs such
as regular expressions, grammars, FSMs, and programs; unbounded interaction sequences require
special handling, for the reasons mentioned above. In this paper, we concentrate on personalization
832: applications describable by bounded interaction sequences and which have a clear separation between
833: structural and terminal parts.
834:
835: \subsubsection*{Representing Interaction Sequences in PIPE}
836: Given that we can represent information-seeking activities as interaction sequences,
837: the set of scenarios that are likely to be encountered (over all users, perhaps)
838: can be represented by a corresponding set of interaction sequences. Representing this latter set
839: faithfully and compactly as a program is key to the application of PIPE. Once again, PIPE doesn't
840: indicate what this set should be: whether it is across all users~\cite{footprints}, whether it is for a
841: group of users~\cite{adaptive-sites}, or whether it comes from our conceptual understanding
842: of information-seeking.
843:
844: For instance, Fig.~\ref{pe2} uses a nested representation to form the program for subsequent
845: partial evaluation. Not only does it model the color-year-model motif (as it would have
846: been observed), it also allows us to model the year-color-model motif (by one
847: partial evaluation). Since PIPE provides out-of-turn personalization, it is not necessary to
848: represent every interaction sequence explicitly in the program.
849:
850: Compaction of interaction sequences is important for two reasons. The first is that it
851: preserves the inherent structure of the (unpersonalized) information-seeking activity (such as browsing,
852: in Fig.~\ref{pe2}). This is useful in realizing mixed-initiative interaction with PIPE. Another reason is that
853: compaction permits scalable personalization solutions.
854:
855: Structural parts of interaction sequences can be represented using constructs in
856: a full-fledged programming language, such as C (as done in Fig.~\ref{pe2}) or LISP.
857: A programming language provides many facilities that can help in compaction of interaction sequences. For example,
858: if we notice that all interaction sequences at a site require registration at some point in the interaction,
859: then the steps associated with registration could be factored out and procedurally invoked from various
860: other locations.
861: Off-the-shelf
862: partial evaluators (such as C-Mix) can then be used for specializing the representations.
863:
It is important that we also model terminal parts of interaction sequences. In the example
of Fig.~\ref{pe2}, if there is text anchoring every hyperlink, then we can define
a program variable that accumulates this text each time a conditional evaluates to true. This
could be achieved using associative arrays or dynamic memory allocation constructs (e.g., pointers).
After partial evaluation, we can inspect the contents of this data structure at every stage
to present personalized (terminal) content. Inspecting the contents of the sequence as a whole will
provide an overall summary of the terminal information. Inspecting the contents of subsequences
will provide
finer-grained summaries of terminal information.
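
One way to realize such accumulation of terminal information is sketched below. The fixed-size
buffer, the {\tt struct trail} type, the anchor texts, and the functions {\tt visit} and
{\tt browse} are assumptions made for this illustration; an actual modeling might use
associative arrays or dynamically allocated storage instead.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TEXT 256

/* Terminal content gathered along one interaction sequence. */
struct trail { char text[MAX_TEXT]; };

/* Append the anchor text of a hyperlink that was followed. */
static void visit(struct trail *t, const char *anchor)
{
    assert(t != NULL && anchor != NULL);
    size_t used = strlen(t->text);
    snprintf(t->text + used, MAX_TEXT - used, "%s%s",
             used ? " > " : "", anchor);
}

/* Hypothetical two-level fragment of a catalog: each conditional
   that evaluates to true records the text anchoring its link. */
void browse(struct trail *t, int blue, int y2001)
{
    assert(t != NULL);
    t->text[0] = '\0';
    if (blue) {
        visit(t, "Blue cars");
        if (y2001)
            visit(t, "2001 models");
    }
}
\end{verbatim}

Inspecting {\tt text} after the outer conditional gives a summary of a subsequence; inspecting
it at the end gives the summary for the sequence as a whole.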
873:
874: \begin{figure}
875: \centering
876: \begin{tabular}{|ll|}
877: \hline
878: \includegraphics[width=2.5cm,height=0.45cm]{display3.epsi} &
879: \begin{tabular}{l}
{\tt if (Drink==Coffee)}\\
881: \,\,\,\,\,\,\,\,\,{\tt /* */}
882: \end{tabular}\\
883: & \\
884: \includegraphics[width=1.75cm,height=1.25cm]{display2.epsi} &
885: \begin{tabular}{l}
886: {\tt switch (topping) \{}\\
887: \,\,\,\,\,{\tt case onions: /* */}\\
888: \,\,\,\,\,{\tt case mushrooms: /* */}\\
889: \,\,\,\,\,{\tt case olives: /* */}\\
890: {\tt \}}
891: \end{tabular} \\
892: & \\
893: \includegraphics[width=3.1cm,height=1cm]{display4.epsi} &
894: \begin{tabular}{l}
{\tt double f2c(double x) \{}\\
\,\,\,\,\,{\tt return (5.0*(x-32.0)/9.0);}\\
897: {\tt \}}
898: \end{tabular} \\ \hline
899: \end{tabular}
900:
901: \caption{Choices for representing aspects of interaction in PIPE.}
902: \label{cute-diagram}
903: \end{figure}
904:
905: \subsubsection*{Creating a Personalization System}
906: To effect the creation of a personalization system, we define ways for the user to
907: specify values for program variables and a procedure by which personalized information
908: content is presented back to the user. Every construct used in the programmatic modeling (terminal or
909: structural) should be translatable into information systems terms, and vice versa.
910:
Typically, there is a one-to-one mapping between interactions and programming
constructs. In Fig.~\ref{cute-diagram},
the textbox corresponds to a conditional, the listbox to a {\tt switch} construct, and
the unit converter to a function in a PIPE modeling.
915:
916: Such mappings have to be revisited after partial evaluation.
917: For instance, the {\tt if} construct in Fig.~\ref{cute-diagram}
918: will either be removed or left as-is by a partial evaluation. This will just correspond
919: to removing or retaining the textbox in the personalized web site. The {\tt switch} construct
920: in Fig.~\ref{cute-diagram} corresponding to a listbox is more interesting. After partial evaluation,
it might be the case that only one of the three topping options is left. Perhaps the person is allergic
922: to mushrooms and olives and we set those variables to zero. In this case, the
923: partial evaluator might remove the {\tt switch} altogether and replace it with a simple
924: {\tt if}. We can view this as a hint to render the listbox as a hyperlink in the personalized site.
925: Finally, the unit conversion utility in Fig.~\ref{cute-diagram} can be modeled in several ways.
926: We can view it as a functional black-box and model in PIPE the act of getting a value and
927: passing it to, say, a server-side script that performs the conversion. If we take this approach, we
928: should ensure that partial evaluation either retains the black-box representation or removes it;
929: it shouldn't `open' it up. Alternatively, we can explicitly
930: open up this black-box and model its contents as a function in a PIPE modeling (as done in Fig.~\ref{cute-diagram}).
931: As a functional modeling, PIPE thus enables the view of information systems as transducers.
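
The listbox and unit-converter cases above can be made concrete with a small sketch.
The enumerations and the functions {\tt topping\_page} and {\tt topping\_page\_residual} are
hypothetical; the residual is written by hand to indicate what a partial evaluator might
leave once mushrooms and olives are ruled out, and is not the output of any particular tool.

\begin{verbatim}
#include <assert.h>

enum topping { ONIONS, MUSHROOMS, OLIVES };
enum page { PAGE_NONE, PAGE_ONIONS, PAGE_MUSHROOMS, PAGE_OLIVES };

/* Listbox modeled as a switch: each case leads to one page. */
enum page topping_page(enum topping t)
{
    switch (t) {
    case ONIONS:    return PAGE_ONIONS;
    case MUSHROOMS: return PAGE_MUSHROOMS;
    case OLIVES:    return PAGE_OLIVES;
    }
    return PAGE_NONE;
}

/* Hand-written residual after ruling out mushrooms and olives:
   the switch collapses to a single conditional. */
enum page topping_page_residual(enum topping t)
{
    if (t == ONIONS)
        return PAGE_ONIONS;
    return PAGE_NONE;
}

/* Unit converter opened up and modeled as a function. */
double f2c(double fahrenheit)
{
    assert(fahrenheit >= -459.67);   /* not below absolute zero */
    return 5.0 * (fahrenheit - 32.0) / 9.0;
}
\end{verbatim}

The single remaining branch in the residual is the hint that the listbox could be rendered
as a hyperlink in the personalized site.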
932:
933: In some cases partial evaluators, by their sophisticated support for program specialization,
934: cause difficulties. For instance,
935: the technique of {\it program-point specialization}~\cite{jones} introduces copies
936: of functions at various places in the specialized program, tailored to specific situations. In
937: information systems terms, this amounts to creating content (structural as
938: well as terminal) that didn't exist before. In such a
939: case, we need to carefully interpret the meaning of the specialized representation.
940:
941: Another caveat is that partial evaluation can sometimes induce {\tt goto}s in the specialized
942: program. We can view {\tt goto}s as
943: suggesting means by which the site design could be structured. If there is a {\tt goto}
from a point $A$ in the program to another point $B$, it just means that the part of the information system
corresponding to point $B$ can be reached via many interaction sequences and is hence
worth factoring out.
947:
948: Finally, a semantics of values for program variables has to be defined.
949: In partial evaluation, values may be either specified or left unspecified.
950: By default, variable values cannot be weighted unless explicitly
951: modeled in the PIPE program. However, techniques
952: such as query expansion can be employed to obtain values for other program variables. For instance, if
953: a user says `Honda' and a PIPE program models Honda cars under `Japanese automakers,'
954: then we can turn both these variables on
955: for the purposes of personalization. Semantics for program variables can also
956: be defined to take advantage of other taxonomical relationships in hierarchies~\cite{taxo-assign}.
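
A minimal sketch of such a semantics, assuming a small hand-built taxonomy, is given below.
The table contents and the functions {\tt find}, {\tt turn\_on}, and {\tt is\_on} are
hypothetical; they only illustrate how specifying one variable can turn on its ancestors.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical taxonomy: each variable names its parent (or NULL). */
struct term { const char *name; const char *parent; int on; };

static struct term vars[] = {
    { "Japanese automakers", NULL,                  0 },
    { "Honda",               "Japanese automakers", 0 },
    { "Toyota",              "Japanese automakers", 0 },
    { "Ford",                NULL,                  0 },
};
#define NVARS (sizeof vars / sizeof vars[0])

static struct term *find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < NVARS; i++)
        if (strcmp(vars[i].name, name) == 0)
            return &vars[i];
    return NULL;
}

/* Turn on `name` and, by expansion, every ancestor in the taxonomy.
   Returns the number of variables newly turned on. */
int turn_on(const char *name)
{
    int newly_on = 0;
    struct term *t = find(name);
    while (t != NULL) {
        if (!t->on) { t->on = 1; newly_on++; }
        t = (t->parent != NULL) ? find(t->parent) : NULL;
    }
    return newly_on;
}

int is_on(const char *name)
{
    struct term *t = find(name);
    return t != NULL && t->on;
}
\end{verbatim}

Variables that are not reached by the expansion, such as {\tt Toyota} here, remain
unspecified and are left for later interaction.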
957:
958: \subsubsection*{A Salient Feature of PIPE}
An important advantage of PIPE is that while we provide options for modeling, there
is no explicit step for describing how to implement personalization. Due
961: to the sophistication of our representation, personalization will be
962: achieved if program variables (which correspond
963: to structural information) are available for partial evaluation.
964: This is in contrast to other modeling methodologies~\cite{statecharts,autoweb,rus} where personalization has to be provided
965: as an explicit function from the conceptual design stage.
966:
967: % 1. Modeling site structure: dfs, site graph, from navigational schema (see program compaction below)
968: % 2. information integration: composing subsystems, e.g., multiple sites.
969: %To do this effectively, we need wrappers (for handling syn. poly. problems)
970: % 3. crawling clickable maps:
971: % 4. interacting with recommender systems: rec. sys. could be a f() within the model. Or, the aspect of recommendation could
972: %be "external" to PIPE, in that collaborative features correspond to program variables. "x" means.
973: % 5. browsing computed information: hooks,
974: % 6. modeling within a web page: virtual nodes, DTDs, naveen ashish, XTRACT,
975: % 7. program compaction: put 4 pictures
976:
977: %senators: 1,6
978: %gams: 1,2,4,6,7
979: %pigments: 1,2,4,5,6
980:
981: % generators of hierarchies (the one you found with Sammy)
982: \subsection{Representational Choices}
983: \label{options}
984:
985: Our primary example of modeling thus far addressed
986: navigation down a hierarchy via nested conditionals (see Fig.~\ref{pe2}).
987: This is one of the most common sources of bounded sequences; it can be obtained either
988: by explicit crawling or as graph representations of site structure from website
management tools~\cite{webSiteManagement,webstrudel}. In the former case,
extra care should be taken to address purely navigational links (like a `Go Back'
button) and irregularities in web page authoring. Representations obtained from the latter
992: case are more robust since they directly enable the modeling of interaction sequences in terms of
993: directed labeled graphs~\cite{dataontheweb} or web schema~\cite{autoweb}.
994:
995: In this section, we present a number of other modeling options for personalization
996: applications described by bounded interaction sequences.
997:
998: \subsubsection*{Interacting with Recommender Systems}
999: A recommender system can be viewed in PIPE as a way to set values for program variables or as a
1000: function to be modeled. In the first case, the recommender is abstracted as a black-box and is external
1001: to the program. Consider a recommender system at a third-party site that suggests
1002: automobile dealers based on
1003: experiences of its users. In such a case, we can invoke the facility to obtain values for program variables
1004: which are then subsequently used for personalization. Alternatively, the functioning of the recommender
1005: can be explicitly modeled in PIPE. This allows the possibility that even its operation could
1006: be personalized. For instance, if the recommender system can suggest dealers all across the United States,
1007: we can personalize its operation to only recommend dealers in a particular geographical region. This will
1008: not be possible in the black-box modeling unless the recommender allows such explicit specification.
1009:
1010: \subsubsection*{Information Integration}
Effective personalization scenarios often require the integration of information from multiple sites.
1012: Consider personalizing stock quotes for potential investors. The Yahoo!
1013: Finance Cross-Index at {\tt quote.yahoo.com} provides a ticker symbol lookup for stock charts,
1014: financial statistics, and links to company profiles. It is easy to model and personalize
1015: this site by the methods described above.
1016: However, what if the user desires to browse this site based on recommendations from an
1017: online brokerage? Besides support for cascading information flows, care should be taken
1018: to ensure that structural information across multiple sites is correctly
1019: cross-referenced. The online brokerage might refer to its
1020: recommendations by company name (e.g., `Microsoft'), while the Yahoo! cross-index
1021: uses the ticker symbol (`MSFT'). Standard solutions based on wrappers~\cite{kush-ai} and mediators can
1022: be employed here~\cite{db-www,Ariadne}.
1023: In PIPE, the individual interaction sequences from multiple sites can be cascaded in
1024: sequence to provide support for such integration scenarios, as shown in Fig.~\ref{ii}.
1025:
1026: \begin{figure}
1027: \centering
1028: \begin{tabular}{|l|} \hline
1029: {\tt main() \{} \\
1030: \,\,\,\,{\tt /* invoke online brokerage */} \\
\,\,\,\,{\tt /* transform company name to ticker symbol */} \\
1032: \,\,\,\,{\tt /* modeling of yahoo cross-index */} \\
1033: {\tt \}} \\
1034: \hline
1035: \end{tabular}
1036: \caption{Modeling information integration in PIPE.}
1037: \label{ii}
1038: \end{figure}
1039:
1040: \subsubsection*{Modeling Clickable Maps}
Many web sites provide clickable image maps (e.g., Java- or GIF-based) as
1042: interfaces to information. This is especially true
1043: for weather sites, bioinformatics resources, and sites that involve modeling spatial information. Interpretation
1044: is attached to clicking on particular locations of the map (for instance, `click on the state for which
1045: you would like the weather').
1046: Using data mining techniques~\cite{fukuda} and
1047: by sampling clicks on the map (and determining which pages they lead to), we can functionally model
1048: a clickable map in PIPE to arrive
1049: at constructs such as: `Choosing Wyoming on the United States map corresponds to clicking
within $[a,b] \times [c,d]$.' Non-rectangular areas are described as unions of isothetic regions, using
the data-mining technique described in~\cite{fukuda}.
1052: Given such a representation, partial evaluation can remove portions of
1053: the image map based on user preferences. At this stage, we can reconstruct a personalized clickable
1054: map by reversing the mapping or use attributes such as color and shade to highlight the selected
1055: regions (for instance, to show only those regions on the map where air travel is delayed). We can also represent
1056: the personalized information in non-graphical terms. This option is useful not just for personalization
1057: but for improving the accessibility of information systems. A mobile handheld device incapable
1058: of presenting graphical content can take advantage of such modeling.
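
A functional model of a clickable map along these lines might look as follows. The pixel
coordinates, the region table, and the functions {\tt region\_at} and {\tt click\_selects} are
invented for illustration; in practice the rectangles would be derived by sampling clicks
on the actual image.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One isothetic (axis-parallel) rectangle [x0,x1] x [y0,y1]. */
struct rect { int x0, x1, y0, y1; const char *region; };

/* Hypothetical coordinates; a non-rectangular region is described
   as a union of several rectangles. */
static const struct rect map[] = {
    { 100, 180,  60, 120, "Wyoming"  },
    { 100, 200, 121, 180, "Colorado" },
    { 181, 200, 100, 120, "Colorado" },
};

/* Region containing the click at (x, y), or NULL if none. */
const char *region_at(int x, int y)
{
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        const struct rect *r = &map[i];
        if (x >= r->x0 && x <= r->x1 && y >= r->y0 && y <= r->y1)
            return r->region;
    }
    return NULL;
}

/* Does clicking at (x, y) amount to choosing `region`? */
int click_selects(int x, int y, const char *region)
{
    assert(region != NULL);
    const char *hit = region_at(x, y);
    return hit != NULL && strcmp(hit, region) == 0;
}
\end{verbatim}

Removing the rows of the table for regions the user is not interested in corresponds to
removing portions of the image map; the remaining rows can be rendered graphically or as a
plain list of region names.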
1059:
1060:
1061: %\begin{figure}
1062: %\centering
1063: %\begin{tabular}{cc}
1064: %\includegraphics[width=4in,angle=-90]{SQL}
1065: %\end{tabular}
1066: %\caption{Browsing computed information using PIPE. $\cup$ denotes the union operation which serves as
1067: %the combining function for an interaction sequence. After personalizing with `female voters for
1068: %Gore,' the user has drilled down to states California (CA) and New York (NY) to inspect the values
1069: %more closely.}
1070: %\label{browse-compute}
1071: %\end{figure}
1072: %
1073: %\subsubsection*{Browsing Computed Information}
1074: %Much of the content on the web is dynamically generated via interfaces to databases~\cite{lawrence}.
1075: %In such a case, terminal information in interaction sequences need not be web pages, but rather
1076: %SQL queries. In Fig.~\ref{browse-compute}, the designer has modeled an artificial hierarchy as an interface to a
1077: %traditional database system. The hierarchy is based on the attributes of the relations in the database
1078: %schema (state, candidate, and sex). Represented in PIPE, such a model resembles Fig.~\ref{pe2}, except that the leaves
1079: %denote hooks to database queries instead of web pages. The internal nodes model
1080: %information obtained by set-theoretic operations conducted on the results of the database
1081: %queries (which are computed, but terminal information nevertheless). In other words, the
1082: %%internal nodes are the combining functions (see Section~\ref{methods})
1083: %for summarizing terminal information for an entire interaction sequence.
1084: %Fig.~\ref{browse-compute} describes personalization for the criteria
1085: %`female voters for Gore' where the combining function is the set-theoretic union
1086: %operation. After partial evaluation, the specialized program has one level of hierarchy
1087: %left for browsing (by state). In addition to
1088: %obtaining the aggregate number of female votes for Gore, we are able to drill down
1089: %this number (by browsing) to obtain votes per state. This is similar to a GROUP BY operation, but doesn't
1090: %require the user to know a query language such as SQL. PIPE hence provides a novel way to
1091: %combine querying of databases and the Web, one that is complementary to other
1092: %projects~\cite{wsq-dsq}. It is also ideal for interactive information
1093: %exploration applications~\cite{jmh-control}.
1094: %
1095: \subsubsection*{Modeling within a Page}
1096: In some cases, it is necessary to model interaction sequences within a web page.
1097: For instance, if a user is eyeballing a web page to look for telephone numbers of
1098: an individual, then modeling the web page at this level of granularity and providing a program variable
1099: for telephone number would be useful. Algorithms for mining structure within a
web page (e.g., DTDs)~\cite{naveen,craven-aij,xtract}
1101: and for document segmentation~\cite{rus} can be used to arrive at compact
1102: representations of within-page interaction sequences. This
1103: provides a richer set of features with which to conduct personalization. For instance,
1104: partial evaluation can be used to remove complete sections of documents (e.g., intrusive
1105: advertisement banners) when rendering the personalization.
1106:
1107: \begin{figure}
1108: \centering
1109: \begin{tabular}{cc}
1110: \includegraphics[width=2.3in]{ex1}
1111: \hspace{0.2in}
1112: \includegraphics[width=2.3in]{ex2}
1113: \end{tabular}
1114: \begin{tabular}{cc}
1115: \includegraphics[width=2.3in]{ex3}
1116: \hspace{0.2in}
1117: \includegraphics[width=2.3in]{ex4}
1118: \end{tabular}
1119: \caption{Four stages in extracting structure from a semistructured data source, by
1120: the algorithm of~\cite{nestorov}. (top left) Original semistructured resource with
1121: labeled and directed edges modeling interaction sequences. (top right)
1122: Factorization of commonalities encountered in crawling. (bottom left)
1123: A `minimal perfect typing' of the data. (bottom right)
Final output of the data mining algorithm, after modeling `multiple roles'~\cite{nestorov}.}
1125: \label{nestorov}
1126: \end{figure}
1127:
1128: \subsubsection*{Program Compaction}
1129: The naive rendition of a PIPE model by the above mechanisms might result in
1130: lengthy programs, with duplications of interaction sequences. Techniques for program
1131: compaction are hence important. This topic has been studied extensively in the data mining
1132: and semistructured modeling communities~\cite{dataontheweb,nestorov,tkde-structure}.
1133: Of particular relevance to PIPE is the algorithm
1134: of Nestorov et al.~\cite{nestorov} whose modeling of semistructure closely resembles
1135: our representation of an interaction sequence in terms of program variables. This algorithm
1136: works by identifying graph constructs that could be factored, simplified, or approximated.
1137: Fig.~\ref{nestorov} describes four stages in a procedure for program compaction. The starting
1138: point is the schema in Fig.~\ref{nestorov} (top left) obtained by a naive crawl of a site.
1139: Fig.~\ref{nestorov} (top right) factors commonalities encountered in crawling.
1140: There are only three leaf nodes and the internal nodes {\tt P3} and {\tt P4} are collapsed because
1141: they are really the same page.
Fig.~\ref{nestorov} (bottom left) is a `minimal perfect typing'~\cite{nestorov} of the data, which means that
1143: the fewest internal nodes needed to describe the schema are used. In this example, {\tt P1} and
1144: {\tt P2} are collapsed, not because they are the same but because they exhibit the same schema.
1145: Both have an incoming edge labeled {\tt e} from the same type of page ({\tt S2}) and display an
1146: outgoing edge labeled {\tt i} to the same type of page ({\tt M1}). While their contents
1147: may not be the same, interaction sequences involving them can be compacted.
Care must be taken to ensure that any text accompanying these nodes is not lost.
1149: And finally, Fig.~\ref{nestorov} (bottom right) casts {\tt P6} as redundant for the purpose
1150: of modeling
1151: interaction sequences. The role of {\tt P6} in Fig.~\ref{nestorov} (bottom right) is to establish
1152: connections from {\tt S5} to {\tt M2} and {\tt M3}, which are already embodied in
1153: {\tt P5} and {\tt P7} respectively. Thus, we can remove {\tt P6}, once again after ensuring that
1154: any contents of that node are suitably represented elsewhere. In~\cite{nestorov}, {\tt P6} is referred
1155: to as a node that exhibits `multiple roles.'
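
The notion of two pages `exhibiting the same schema' can be illustrated with a small C sketch.
It is a deliberate simplification and not the algorithm of~\cite{nestorov}: each page is assumed
to have exactly one incoming and one outgoing edge, and the names {\tt Page}, {\tt same\_schema},
and {\tt assign\_types} are hypothetical.
\begin{verbatim}
#include <string.h>

/* Schema-level description of a page: the label and page type of
   its incoming edge and of its outgoing edge. */
typedef struct {
    const char *name;
    const char *in_label,  *in_type;
    const char *out_label, *out_type;
} Page;

/* Two pages can share a type when their edge signatures coincide,
   regardless of what the pages contain. */
int same_schema(const Page *p, const Page *q) {
    return strcmp(p->in_label,  q->in_label)  == 0 &&
           strcmp(p->in_type,   q->in_type)   == 0 &&
           strcmp(p->out_label, q->out_label) == 0 &&
           strcmp(p->out_type,  q->out_type)  == 0;
}

/* Give pages with equal signatures the same type id; returns the
   number of distinct types found. */
int assign_types(const Page *pages, int n, int *type_of) {
    int n_types = 0;
    for (int i = 0; i < n; i++) {
        type_of[i] = -1;
        for (int j = 0; j < i; j++)
            if (same_schema(&pages[i], &pages[j])) {
                type_of[i] = type_of[j];
                break;
            }
        if (type_of[i] < 0)
            type_of[i] = n_types++;
    }
    return n_types;
}
\end{verbatim}
With {\tt P1} and {\tt P2} both described as (incoming {\tt e} from {\tt S2}, outgoing {\tt i} to
{\tt M1}), the two pages receive the same type id and their interaction sequences can be shared.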
1156: %and that the termination of the partial evaluator is not compromised by such approximations~\cite{jones-book}.
1157:
1158: \subsubsection*{Miscellaneous Optimizations}
1159: Finally, the success of a personalization system relies on those finer touches that deliver
1160: a compelling experience to the user. Options in this category are ad-hoc by nature and are not technically
1161: modeling choices since they involve post-processing of the specialized
1162: program. For instance, assume
1163: that we personalize the automobile example in Fig.~\ref{pe2} with respect to the variables {\tt Honda} and {\tt 2001}.
1164: This might produce a construct such as:
1165: \begin{verbatim}
1166: if (Green) {
1167: /* two empty code blocks */
1168:
1169: /* the first is empty because Honda and 2001 evaluated to true,
1170: but there were no green Honda cars made in 2001 */
1171:
1172: /* the second is empty because other models and other years were set
1173: to be evaluated to false */
1174: }
1175: \end{verbatim}
1176: %This is due to the fact that in the interaction sequence {\tt make} and {\tt year} are modeled beneath
1177: %{\tt color}.
While semantically correct, such code blocks are useless for information presentation.
A user who clicks on `Green' would receive nothing (or an empty page) in return, which is confusing.
Such blocks are dead-ends and can be safely omitted during web page reconstruction.
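
One way to remove such dead-ends is a bottom-up pass over the specialized program, viewed as a
tree of conditionals. The C sketch below assumes a hypothetical {\tt Node} type in which
{\tt n\_pages} counts the items of terminal information held directly in a block; it is an
illustration rather than a description of an existing post-processor.
\begin{verbatim}
#include <stddef.h>

#define MAX_CHILDREN 8

/* A conditional in the specialized program: its body holds
   `n_pages` items of terminal information plus nested conditionals. */
typedef struct Node {
    const char *label;
    int n_pages;
    size_t n_children;
    struct Node *children[MAX_CHILDREN];
} Node;

/* Drop every branch that contains no terminal information at any
   depth. Returns 1 if `node` should be kept, 0 if it is a dead end. */
int prune_dead_ends(Node *node) {
    size_t kept = 0;
    for (size_t i = 0; i < node->n_children; i++)
        if (prune_dead_ends(node->children[i]))
            node->children[kept++] = node->children[i];
    node->n_children = kept;
    return node->n_pages > 0 || kept > 0;
}
\end{verbatim}
Applied to the construct above, the {\tt Green} block has no pages and no surviving children, so
its parent no longer offers `Green' as a choice.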
1181:
1182: A second form of
1183: optimization arises when partial evaluation results in a nested conditional with no {\tt else}
1184: clauses:
1185: \begin{verbatim}
1186: if (Blue) {
1187: if (2001) {
1188: if (Honda) {
1189: /* something here */
1190: }
1191: }
1192: }
1193: /* nothing here */
1194: \end{verbatim}
1195: In such a case, we need to pay attention to how the simplified program is presented back to the user.
1196: Forcing the user to continue clicking on items when there is only one choice at every level is undesirable.
Rather, we could reveal to the user that, according to his personalization criteria,
the only type of car remaining is `Blue Honda 2001' and link directly to the items of information.
1199: This example reinforces our idea that structural information is first-class information.
1200: We are working on a customized partial evaluator that can perform such optimizations.
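
As a rough sketch of the second optimization, a chain of single-choice levels can be collapsed by
walking down the specialized program until a real choice or some terminal information is reached.
The {\tt Node} type and the function {\tt collapse\_chain} are hypothetical, and {\tt cap} is
assumed to be at least 1.
\begin{verbatim}
#include <string.h>

typedef struct Node {
    const char *label;      /* program variable tested at this level */
    int n_pages;            /* terminal information held directly */
    int n_children;
    struct Node *children[8];
} Node;

/* Follow single-choice levels starting at `node`, writing the labels
   met along the way into `path` (space separated). Returns the first
   node that offers a real choice or holds terminal information. */
Node *collapse_chain(Node *node, char *path, size_t cap) {
    path[0] = '\0';
    for (;;) {
        if (path[0] != '\0')
            strncat(path, " ", cap - strlen(path) - 1);
        strncat(path, node->label, cap - strlen(path) - 1);
        if (node->n_pages > 0 || node->n_children != 1)
            return node;
        node = node->children[0];
    }
}
\end{verbatim}
For the nested conditional above, the walk yields the path {\tt Blue 2001 Honda} and returns the
innermost block, which can be linked to directly instead of asking for three clicks.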
1201:
1202: % 3. crawling clickable maps:
1203: % 4. interacting with recommender systems: rec. sys. could be a f() within the model. Or, the aspect of recommendation could
1204: %be "external" to PIPE, in that collaborative features correspond to program variables. "x" means.
1205: % 5. browsing computed information: hooks,
1206: % 6. modeling within a web page: virtual nodes, DTDs, naveen ashish, XTRACT,
1207: % 7. program compaction: put 4 pictures
1208:
1209: % generators of hierarchies (the one you found with Sammy)
1210:
1211: \section{Application Case Studies}
1212: \label{studies}
1213:
1214: We now describe two applications that use PIPE to personalize collections
1215: of web sites. They are presented in increasing order of complexity, as evidenced
1216: by the forms of modeling they conduct (Table~\ref{whatismodeled}). In each
1217: of these applications, we state the conceptual model of interaction sequences and
1218: the specific choices made in modeling. Evaluation methodologies
1219: are outlined after the descriptions. Since PIPE only specializes representations,
1220: we are able to personalize even third-party sites by forming suitable representations.
1221: More personalization systems designed with PIPE are described in~\cite{naren-ic,pipe-tochi}; we present only
1222: two here for space considerations.
1223:
1224: \begin{table}
1225: \centering
1226: \begin{tabular}{|lcl|} \hline
1227: Congressional Officials & & Modeling Site Structure \\
1228: & & Modeling within a Page\\
1229: & & \\
1230: % Pigment Composition and Analysis & & Modeling Site Structure\\
1231: % & & Information Integration \\
1232: % & & Browsing Computed Information\\
1233: % & & \\
1234: Mathematical and Scientific Software & & Modeling Site Structure\\
1235: & & Interacting with Recommender Systems \\
1236: & & Information Integration \\
1237: & & Modeling within a Page\\
1238: & & Program Compaction\\
1239: \hline
1240: \end{tabular}
1241: \caption{Modeling options used in the application case studies.}
1242: \label{whatismodeled}
1243: \end{table}
1244:
1245: \subsection{Congressional Officials}
1246: \label{politics}
1247: Our first application customizes access to the
1248: Project Vote Smart website (\url{http://www.vote-smart.org}), an independent resource
1249: for information about United States governmental officials.
1250: The site caters to people interested in
1251: politicians' backgrounds, committee memberships, and positions
1252: on major political issues.
1253: While Project Vote Smart reports on state and local governments as
1254: well as the federal government, we focused only on the
1255: congressional subsection of the site in our experiments.
1256:
1257: The conceptual model of information-seeking involves browsing through the congressional
1258: subsection to retrieve individual web pages of politicians. Interaction sequences at this
1259: site consist of choices of state (e.g., California, Virginia, etc.), branch of
1260: congress (House or Senate), party (Democrat, Republican, or Independent), and
district information (numbers of districts). The terminal information comprises 540
home pages (for 100 Senate members and 440 House members) and resides at the
ends of interaction sequences.
1264:
1265: Fig.~\ref{politicians1} describes a typical interaction sequence.
1266: At the root congressional page (Fig.~\ref{politicians1} (top)),
1267: users are directed to select a state of interest.
1268: Selection of state transfers the
1269: user to that particular state's web page
1270: (Fig.~\ref{politicians1} (bottom left)).
1271: A state web page is semistructured, listing
1272: both senators and representatives as well as their party, district
1273: affiliations, and other associated information. Finally, a user
1274: arrives at a politician's webpage (Fig.~\ref{politicians1} (bottom right))
1275: by making a selection at the state page. Thus, the congressional
1276: section of Project Vote Smart is three levels deep (with a two-step interaction
1277: sequence).
1278:
1279: \begin{figure}
1280: \centering
1281: \begin{tabular}{c}
1282: \includegraphics[width=12.4cm,height=12.6cm]{politiciansRoot}
1283: \end{tabular}
1284: \begin{tabular}{cc}
1285: \includegraphics[width=6.2cm,height=6.84cm]{VA}
1286: \includegraphics[width=6.2cm,height=6.84cm]{Allen}
1287: \end{tabular}
1288: \caption{A typical interaction sequence at the Project Vote Smart web site. (top) Start page
1289: for congressional officials. Making a selection of state at this level reaches a state-level
1290: page (bottom left). Finally, individual politicians' web pages are accessed by making selections
1291: at the state-level page (bottom right).}
1292: \label{politicians1}
1293: \end{figure}
1294:
1295: Since many of the choices made
1296: by the user in browsing through Project Vote Smart are independent
1297: of each other (e.g., selecting Virginia as state does not imply a particular political party),
1298: the site is highly amenable to personalization by partial evaluation.
1299: Currently the site hardwires interaction sequences in the order shown in Fig.~\ref{politicians1}.
We remodeled the two-step interaction sequence of Fig.~\ref{politicians1} as
a four-step interaction sequence by modeling the state-level page in more
detail. In particular, the semistructure on state-level pages was abstracted
1303: to yield independently addressable information about branch of congress, party, and district.
1304:
1305: The site graph is not a balanced tree. For instance, every state has exactly two senators
1306: but the number of representatives varies from 1 in South Dakota to 52 in California (this
1307: is dependent on state population). Our modeling of data at state pages
1308: expanded the original 3-level tree shown in Fig.~\ref{politicians1}
1309: consisting of 596 nodes (1 root page + 55 state pages + the
1310: previously mentioned 540 leaves of the tree) to 5 levels comprising
857 nodes (317 internal nodes + 540 leaf nodes). This amounts to
an approximately 44\% expansion of the site schema.
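
These counts can be checked directly, with the growth measured relative to the original 596 nodes:
\[
1 + 55 + 540 = 596, \qquad 317 + 540 = 857, \qquad
\frac{857 - 596}{596} = \frac{261}{596} \approx 0.44 .
\]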
1313:
1314: %3111 lines of C code and 129 program variables that are addressable.
1315: The programmatic representation of the new site schema was in C and it
1316: captured miscellaneous domain semantics about interaction at the site (e.g.,
1317: if the user says `District 21,' he is referring to a Representative, not
1318: a Senator). The partial evaluator C-Mix was used for this study.
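
The kind of domain semantics mentioned above can be sketched as follows. The fragment is
illustrative only: the structure {\tt Choices}, its field names, and the function
{\tt apply\_domain\_semantics} are hypothetical and are not taken from the actual representation.
\begin{verbatim}
/* Program variables for the state-level choices; 0 means "off". */
typedef struct {
    int senate, house;        /* branch of congress */
    int democrat, republican; /* party (Independent omitted) */
    int district;             /* district number, 0 if unspecified */
} Choices;

/* Domain rule from the text: naming a district refers to a
   Representative, so it implies the House and excludes the Senate. */
void apply_domain_semantics(Choices *c) {
    if (c->district > 0) {
        c->house = 1;
        c->senate = 0;
    }
}
\end{verbatim}
A request for `District 21' thus fixes the branch variable as well, leaving only state and party
as open choices for the partial evaluator.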
1319:
1320: \subsection{Mathematical and Scientific Software}
1321: Our second application is a personalization system for recommending mathematical
1322: software on the web for scientists and engineers. Consider a scientist studying stress
1323: in a helical spring; he formulates the problem mathematically in terms of
1324: a partial differential equation (PDE) and proceeds to find software that can help
1325: in solving his PDE. He uses a collection of three web sites to conduct his information-seeking
1326: activity.
1327:
1328: First, he accesses the GAMS (Guide to Available Mathematical Software)
1329: cross-index of mathematical software ({\tt http://\hskip0ex gams.\hskip0ex nist.\hskip0ex gov}), a
1330: tree-structured taxonomy that covers nearly 10,000 algorithms (from
1331: over 100 software packages) for most areas of scientific software.
1332: GAMS functions in an interactive fashion, guiding the user
1333: from the top of a classification tree to specific modules as the
1334: user describes his problem in increasing detail. During this process,
many important features of the software (e.g., `are you looking for software to solve elliptic
problems?') are elicited from the user.
However, at the ends of the interaction sequences at GAMS, there still exist several choices
1338: of algorithms for a specific problem.
1339: Now, the scientist consults a recommender system or a performance database server
1340: (for his category of scientific software) to pick an appropriate algorithm for his problem.
1341: An example is the PYTHIA recommender system for selecting solvers for PDEs~\cite{pythiaii}.
1342: At this point, the scientist supplies additional information to the recommender such as
1343: his performance constraints (on the time to solve his PDE).
1344: Systems like PYTHIA use previously archived performance data to arrive at recommendations such as
1345: `Use the second-order 9-point finite differences code from the ELLPACK module.'
1346: After such a recommendation, the scientist conducts the final step of downloading the
1347: recommended software module from repositories such as Netlib ({\tt http://\hskip0ex www.\hskip0ex netlib.\hskip0ex org})
1348: housed at the Oak Ridge National Laboratory (ORNL) or other packages at the National Institute
of Standards and Technology (NIST). The conceptual model involves the information flow from the GAMS
site, through a recommender such as PYTHIA, to a repository such as Netlib.
1351:
The choices made in GAMS affect the choice of recommender, which in turn affects the choice
1353: of repository. This application thus presents an interesting information flow for modeling.
1354: Since PIPE permits partial instantiation of the information flow, the scientist can
1355: directly access a repository such as Netlib if he is sure of the specific software he needs.
1356:
1357: We modeled the entire GAMS web site, used the PYTHIA recommender (that addresses
1358: software for the domain of PDEs), and established connections with individual software modules at the
1359: various repositories. After an initial expansion of GAMS (e.g., by within-page modeling),
1360: we applied the program compaction algorithm described in Section~\ref{options}.
1361: Cross-references in GAMS and duplication of common module sets (which are now revealed
1362: by our initial expansion) helped compress the site schema to 60\% of its original size.
In particular, the GAMS subtree relevant to describing PDEs provided for an 11\% compression.
1364: There was no terminal information alongside intermediate nodes, and hence there was no
1365: need for any special handling. PYTHIA's details are described in~\cite{pythiaii} and we conducted
1366: a white-box modeling in PIPE to better associate program variables from GAMS with variables in
1367: PYTHIA (one of the authors of this paper was also the co-designer of the PYTHIA recommender).
1368: Finally, the step to reach individual software modules was a simple one-step interaction sequence
1369: leading to terminal information about the code (in FORTRAN) and its documentation.
1370: The entire composite program was represented in the CLIPS programming language~\cite{clips} and
1371: we employed its rule-based interface for partial evaluation. More modeling details on this case study
1372: can be found in~\cite{pipe-gams}.
1373:
1374: \subsection{Evaluation}
1375: \label{eval}
1376: We now describe procedures for evaluation. There are three possible types of evaluation:
1377:
1378: \begin{enumerate}
1379: \item Evaluating PIPE applications
1380: \item Evaluating our modeling of information-seeking activities in PIPE
1381: \item Evaluating PIPE
1382: \end{enumerate}
1383:
1384: The first type of evaluation is what is usually described in the literature
1385: and there are many ways of conducting it.
1386: The accepted practice is to measure improvements in revenues,
1387: site visits, and user satisfaction (e.g., via surveys). In~\cite{naren-ic},
1388: we have described the evaluation of PIPE applications using traditional user
1389: interviews followed by statistical validation (they have yielded good results). Commercial
1390: ventures such as {\it NetPerceptions} emphasize the scalability and speed-of-response of
1391: personalization systems. The second and third types of evaluation criteria highlight the role
1392: of PIPE as a modeling methodology.
1393: We concentrate on them since we have already described traditional user-response
1394: evaluation of PIPE applications in~\cite{naren-ic}.
1395: This section covers the evaluation
1396: of modeling and Section~\ref{factor} helps identify shortcomings of the PIPE methodology
1397: itself.
1398:
1399: We evaluate a PIPE modeling by the extent to which
1400: it allows users' information-seeking activities to be described as partial inputs.
1401: This is in keeping with the view that PIPE's services are only as good as the
1402: modeling conducted in it. If a faulty recommender system is modeled in PIPE, then no
1403: amount of partial evaluation can provide satisfactory results.
1404:
1405: Recall that our modeling was conducted with respect to a set of interaction sequences.
1406: For evaluation purposes, we identified an independent `external examiner'
1407: model, which was also a set of interaction sequences.
1408: We then evaluated our PIPE modeling
1409: by the fraction of interaction sequences in the external examiner model that can
1410: be realized by an appropriate partial evaluation operation. We discounted optimizations such
1411: as described in Section~\ref{options} when determining the `unrealizable'
1412: interaction sequences.
1413:
1414: In the first study, the
1415: examiner model was obtained from users. They were provided
1416: knowledge of the functional specification of our original conceptual
1417: modeling, not its details. For instance, they
1418: were told about the nature of structural and terminal information (and
1419: any functional dependencies among them), but not the exact interaction sequences that
1420: constitute the conceptual model. Formal methodologies for this activity are described
1421: in~\cite{gannon-verify}.
1422:
1423: We identified 25 user subjects who were predominantly graduate students from Virginia
1424: Tech (but not necessarily computer science majors). The ages of the subjects ranged from
1425: 19 to 49, with the average age being 26. A majority of the subjects rated their
1426: computer and web familiarity as above average. All subjects acquainted themselves with the
1427: Project Vote Smart site by browsing for about ten minutes. Each subject was then asked to
describe 1--2 personalization scenarios. Notice that these are different from `queries,'
as they specify constraints on interaction, e.g., `I would like to browse by state,
1430: and then I will make a choice of party, and then I would click any remaining hyperlinks
1431: to browse the site.'
1432:
1433: In total, 32 interaction sequences were identified, of which 25 were
1434: realizable in our modeling. One of the unmodelable
1435: scenarios was `I would like to see all politicians who represent Los Angeles,' a request that
1436: was not faithful to our conceptual model. We do not discuss this further. The other six
1437: unmodelable scenarios are not shortcomings of our modeling, but rather shortcomings of the
1438: PIPE personalization methodology itself. They involved restructuring operations on interaction
1439: sequences that are not describable as partial evaluations. Section~\ref{factor} analyzes
1440: these in detail.
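
Writing $E$ for the set of interaction sequences in the examiner model and $R \subseteq E$ for
those realizable by a partial evaluation, the score for the first study is
\[
\frac{|R|}{|E|} = \frac{25}{32} \approx 0.78, \qquad
|E \setminus R| = 32 - 25 = 7 = 1 + 6 ,
\]
where the seven unrealizable sequences split into the one request outside the conceptual model
and the six that require restructuring.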
1441:
1442: \begin {figure}
1443: \centering
1444: \begin {tabular}{|lp{3.5in}|}\hline
1445: {\sc Problem \#28} & ${\left(w\,u_x\right)}_x + {\left(w\,u_y\right)}_y = 1,$\\
1446: & where $w = \left\{ \begin {array}{l}
1447: \alpha,\,\,\, {\rm if}\,\,\, 0 \leq x,y \leq 1\\
1448: 1,\,\,\, {\rm otherwise.}
1449: \end {array}
1450: \right.$\\
1451: {\sc Domain} & $\left[-1,1\right] \times \left[-1,1\right]$\\
1452: {\sc BC} & $u = 0$\\
1453: {\sc True} & unknown\\
1454: {\sc Operator} & Self--adjoint, discontinuous coefficients\\
1455: {\sc Right Side} & Constant\\
1456: {\sc Boundary Conditions} & Dirichlet, homogeneous\\
1457: {\sc Error Constraint} & 1.0E-05 \\
1458: {\sc Time Constraint} & 60s
1459: \\\hline
1460: \end {tabular}
1461: \caption {A problem from the examiner model for the second case study.}
1462: \label{pde-scenario}
1463: \end {figure}
1464:
1465: For the second study, the examiner model was derived from a benchmark set of problems that are used
1466: in mathematical software evaluation (the set is described in~\cite{pythiaii}). Each of these problems
describes a scenario in terms of features of the PDE problem (e.g., is it Laplace? is it Helmholtz?),
any constraints on its solution (e.g., relative error should be $< 10^{-9}$), and any restrictions
on software modules (e.g., `I would like to use the package NAG' or `ELLPACK modules are preferred').
1470: Fig.~\ref{pde-scenario} describes an example scenario that places constraints on the type of software
1471: to be used (for instance, it should be applicable to `Dirichlet' problems) and the basis for
1472: recommendation (namely, that it should satisfy the time and error constraints specified). This scenario does
1473: not give any preferences for software modules or packages. Such mathematical descriptions are translated
into parameters for personalization (the process is described in~\cite{pythiaii}).
The examiner model comprised 35 such interaction sequences, all of which are modelable. More details on this
case study can be obtained from~\cite{pipe-gams}.
1477: \section{Discussion}
1478: \label{discuss}
1479: \subsection{Related Research}
1480:
1481: As a systematic methodology for personalization, PIPE is a unique research project.
1482: Most research on personalization emphasizes the nature of information being
1483: modeled~\cite{specissue,phoaks} (content-based~\cite{fab} versus
1484: collaborative~\cite{adomavicius01expert-driven,
1485: ira,grouplens,Siteseer}), the level at
1486: which the
1487: personalized information is targeted (is it by user~\cite{manber}, by topic~\cite{adaptive-sites}
1488: or for everybody~\cite{holland-hill,footprints}), or the specific algorithms that are involved in
1489: making recommendations.
1490: %Evaluation metrics have also been influenced by these viewpoints.
1491:
1492: In contrast, PIPE models interaction with an information system as the basis for personalization.
Most recommender systems research can be viewed as providing modeling options for PIPE. Systems that
differ in their level of targeting make different assumptions about the possible set of
interaction sequences. They can hence be tied to requirements analysis, as described
1496: in~\cite{pipe-tochi}. Systems that conduct web usage mining~\cite{cacm-jaideep,cacm-mulvenna} also
1497: address the earlier parts (and sometimes, later parts~\cite{cacm-myra})
1498: of the personalization system design lifecycle, and can be viewed as methodologies
1499: to suggest and refine interaction sequences.
1500:
1501: Other connections to information systems research can be made by observing that PIPE
contributes both a way to model information-seeking activities and a closed transformation
operator for personalization, i.e.,
partial evaluation. RABBIT~\cite{rabbit} is an early interactive
1505: information retrieval methodology that resembles PIPE in this respect. It proposes
1506: the model of `retrieval by reformulation' to address the mismatch between
1507: how an information space is organized and how a particular user forages in it. Several closed transformation
1508: operators are provided in RABBIT to enable the user to specify and realize information-seeking goals.
1509: Like RABBIT, PIPE
1510: assumes that `the user knows more about the generic structure of the [information space] than [PIPE]
1511: does, although [PIPE] knows more about the particulars ([terminal information])~\cite{rabbit}.' For
1512: instance, personalization by partial evaluation is only as effective as the ease with
1513: which program variables could be set (on or off) based on information supplied by the user.
1514: Unlike RABBIT, PIPE
1515: emphasizes the modeling of an information
1516: space as well as an information-seeking activity in a unified programmatic representation.
1517: Its single transformation operator is expressive enough to simplify a variety of
1518: interaction sequences.
1519:
1520: %In addition, PIPE achieves mixed-initiative interaction without providing it as a specific
1521: %function or requiring any explicit effort on the part of the user.
1522:
1523: %The Scatter-Gather~\cite{scatter-gather} and Dynamic Taxonomies~\cite{tkde-navigation} projects
1524: %resemble PIPE in that they contribute closed transformation operators for retrieval and navigation,
1525: %respectively. However, they do not emphasize representation as much as PIPE (or RABBIT)
1526: %and adopt traditional modeling methodologies. Scatter-Gather introduces two operations
1527: %(Scatter and Gather) that are used for clustering and declustering documents.
1528: %The Dynamic Taxonomies project assumes a conceptual model of `selective thinning of an infobase,'
1529: %and provides several set-oriented operations for navigation.
1530:
1531: The closed nature of transformation operators is central to interactive modes of information seeking,
1532: as shown in projects such as Scatter-Gather~\cite{scatter-gather} and Dynamic Taxonomies~\cite{tkde-navigation}.
1533: PIPE is novel in that it contributes a transformation operator for {\it representations of interactions} in
1534: information spaces, and does not transform documents or web pages directly.
1535:
1536: The `larger' approach to personalization taken in this paper is reminiscent of the integration
1537: of task models in software design~\cite{intTaskObj}. Typically such integration has utilized
object-oriented methodologies and symbolic modeling approaches (e.g., UML). This idea has been used for designing
personalization systems as well~\cite{li-catalog,schwabe2,human1,schwabe1}. However, in all of these projects,
personalization is introduced as a function from the conceptual design stage. PIPE's
1541: support for personalization, on the other hand, is built into the programmatic model of the information
1542: space and doesn't require any special handling.
1543:
1544: \subsection{When PIPE does not Work: Reasoning about Representations}
1545: \label{factor}
1546:
1547: We now address limitations and some fundamental implications of the PIPE
methodology. We will explain why the six unmodelable interaction sequences reported in Section~\ref{eval}
(for the study of Section~\ref{politics}) are shortcomings of the PIPE methodology itself.
1550: Let us first recall why examples such as Fig.~\ref{pe2} and the other application
1551: study in Section~\ref{studies} work so well: Information-seeking activities in
1552: these scenarios were describable as partial inputs in the modeling. Since the modeling
1553: was parameterized in terms of program variables, another way to explain the
1554: success of these applications is to say that `the representation of the
1555: information space is factored in terms of structural information.'
1556:
1557: This suggests that it will be useful to understand how information spaces are factored,
1558: in general. If the representation of the information space is not factored at all, it means that no program
1559: variables are available to be turned on or off and hence the space is not
1560: personalizable by PIPE. What is counterintuitive is that `too much factoring' could
1561: also render PIPE inapplicable or useless.
1562:
1563: Consider our automobile example from Fig.~\ref{pe2} in Section~\ref{example}.
1564: It is reproduced in Fig.~\ref{newfactor} (right) with the addition of
1565: some line numbers (to denote particular points in the program).
1566: We can think of this as a factorization in terms of variables such as
1567: {\tt Blue} and {\tt Honda}, which in turn allow us to describe user requests.
1568: The left part of Fig.~\ref{newfactor} describes an alternative
1569: factorization of the same information space. In this case, the program variables and their
1570: connections are stored in a `structure table' and an explicit generator is used to construct
the information space in Fig.~\ref{newfactor} (right). For instance, the structure table records
the {\tt Blue} program variable as the condition that takes us from line 1 to line 2 in the
modeling. We can think of the structure table as modeling the site graph and the generator
1574: as a depth-first search (DFS) algorithm that walks the site graph to construct the information space.
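
As a minimal sketch of this alternative factorization, the following Python fragment stores the four rows of the structure table that are shown in Fig.~\ref{newfactor} as data and uses a generic depth-first walk to emit the nested conditionals. The names {\tt STRUCTURE\_TABLE} and {\tt generate} are hypothetical, introduced here only for illustration, and the rows elided in the figure are left out.

\begin{verbatim}
# Hypothetical encoding of the structure table:
# (from_line, to_line, program_variable).
STRUCTURE_TABLE = [
    (1, 2, "Blue"),
    (1, 3, "Red"),
    (2, 4, "2001"),
    (2, 5, "2000"),
]

def generate(table, line=1, depth=0):
    """Depth-first walk of the site graph, emitting nested conditionals."""
    out = []
    # Outgoing edges of the current line, in table order.
    children = [(to, var) for (frm, to, var) in table if frm == line]
    for i, (to, var) in enumerate(children):
        keyword = "if" if i == 0 else "else if"
        out.append("    " * depth + f"{keyword} ({var})")
        out.append(f"L{to}:")
        out.extend(generate(table, to, depth + 1))
    return out

program = ["L1:"] + generate(STRUCTURE_TABLE)
print("\n".join(program))
\end{verbatim}

\noindent
Note that {\tt generate} has no parameter that names a particular program variable; it would walk any table of this shape.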
1575:
1576: \begin{figure}
1577: \centering
1578: \begin{tabular}{lll}
1579: \begin{tabular}{|l|l|l|} \hline
1580: \multicolumn{3}{|c|}{\bf Structure Table} \\ \hline
1581: From & To & Program \\
1582: Line & Line & Variable \\ \hline
1583: 1 & 2 & Blue \\
1584: 1 & 3 & Red \\
1585: 2 & 4 & 2001 \\
1586: 2 & 5 & 2000 \\
1587: $\cdots$ & $\cdots$ & $\cdots$ \\
1588: \hline
\end{tabular} & {\large $\times$ Site Generator (e.g., DFS) $=$} &
1590: \begin{tabular}{|l|} \hline
1591: {\tt L1:}\\
1592: {\tt if (Blue)} \\
1593: {\tt L2:}\\
1594: \,\,\,\,{\tt if (2001)} \\
1595: {\tt L4:}\\
1596: \,\,\,\,\,\,\,\,{\tt if (Honda)} \\
1597: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
1598: \,\,\,\,\,\,\,\,{\tt else if (Toyota)} \\
1599: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
1600: \,\,\,\,{\tt else if (2000)} \\
1601: {\tt L5:}\\
1602: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
1603: {\tt else if (Red)} \\
1604: {\tt L3:}\\
1605: \,\,\,\,{\tt if (2001)} \\
1606: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
1607: \,\,\,\,{\tt else if (2000)} \\
1608: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
1609: \hline
1610: \end{tabular}
1611: \\
1612: \end{tabular}
\caption{An example of an over-factored information space for personalization by partial evaluation. (left)
Modeling the generation of an information space. (right) Modeling the interaction in an information
space.}
1616: \label{newfactor}
1617: \end{figure}
1618:
1619: Rather than think of the left part of Fig.~\ref{newfactor} as the {\it generator of an information space} and contrast
1620: it with the right side (which describes it directly), let us temporarily think of both the left and
1621: right sides of Fig.~\ref{newfactor} as {\it alternative representations} of the same
1622: information space. The word `representation' does not
1623: imply the mechanical aspect of constructing the information space (left of Fig.~\ref{newfactor})
1624: or the interaction with the information space (right of Fig.~\ref{newfactor}). Since partial evaluation
1625: merely specializes programs, it doesn't pay any attention to whether the program is meant to represent
1626: interaction or generation. By losing this distinction (temporarily), we will be able to reason
1627: about representations in general.
1628:
1629: In Fig.~\ref{pe2}, we personalized the representation w.r.t. `2001'; the result
1630: was shown in Fig.~\ref{pe2} (right).
Let us reconsider how we would address this request with the new design
shown in Fig.~\ref{newfactor} (left).
We cannot specify this input to the DFS algorithm since it is not
parameterized in terms of specific variables like {\tt 2001}. The DFS is meant to work for all types of
trees and graphs, not just an automobile browsing hierarchy. We also cannot specify {\tt 2001} in terms of
the structure table, since we would have to manually readjust the line numbers to conform to the request.
1637: The only way we can obtain the same result as in
1638: Fig.~\ref{pe2} is to change the structure table in Fig.~\ref{newfactor} completely to reflect the
1639: tree shown in Fig.~\ref{pipe-illustrate} (right). But by then, we have done most of the work needed
1640: for personalization! In fact, the personalization request is no longer describable as partial evaluation,
1641: but as a {\it complete evaluation} (specifying all arguments).
1642: We say that such a design is {\it over-factored}, for the given
1643: information-seeking activity.
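
The contrast can be sketched in Python under simplifying assumptions: the interaction representation is encoded as a nested dictionary (a hypothetical stand-in for the nested conditionals), the variables tested at one level are taken to be mutually exclusive, and the leaf page names are invented placeholders. The function {\tt specialize} is an illustrative simplification of what a partial evaluator does for this restricted program shape, not a general partial evaluator.

\begin{verbatim}
# Hypothetical encoding: each dict is an if / else-if chain keyed by the
# program variable it tests; strings are placeholder terminal pages.
SPACE = {
    "Blue": {
        "2001": {"Honda": "page1", "Toyota": "page2"},
        "2000": {"Honda": "page3"},
    },
    "Red": {
        "2001": {"Honda": "page4"},
        "2000": {"Toyota": "page5"},
    },
}

def specialize(tree, chosen):
    """Simplify the chains, treating every variable in `chosen` as true."""
    if not isinstance(tree, dict):
        return tree  # terminal information is left untouched
    for var, subtree in tree.items():
        if var in chosen:
            # The test is known to succeed, so it and its sibling
            # branches (assumed mutually exclusive) disappear.
            return specialize(subtree, chosen)
    # Nothing known at this level: keep the tests, specialize below.
    return {var: specialize(sub, chosen) for var, sub in tree.items()}

personalized = specialize(SPACE, {"2001"})
\end{verbatim}

\noindent
Here the request `2001' is a partial input to the interaction representation. No analogous call exists for the generic DFS generator, whose inputs are the structure table itself rather than variables such as {\tt 2001}.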
1644:
1645: Attempting to use an over-factored representation (for the type of information-seeking
1646: activities in Fig.~\ref{pe2}) appears fruitless.
The reason is that over-factorization separates two crucial elements
that must interact for partial evaluation to be beneficial. Fig.~\ref{newfactor} (left) is like
the two sides of the PIPE coin pulled apart: the structure table contains the structural information (with which
we connect user requests) and the DFS contains the logic flow (which is simplified by partial
evaluation for the user). Neither is useful in PIPE without the other, so the two cannot be represented
separately. {\it This is why over-factorization is not desirable.}
1653:
It is important to note that an information system design is not simply over-factored; it is over-factored
for a particular information-seeking activity. For instance, we can give an example of an information-seeking
1656: activity for which the design in Fig.~\ref{newfactor} (left) is factored `just right.' Consider the following
1657: user who walks into the automobile dealership:
1658:
1659: \begin{descit}
1660: \noindent
1661: {\bf Buyer:} I am here to buy a car. Ask me the questions for year, model, and color, in that order.
1662: \end{descit}
\noindent
1664: In this case, the user does not want a personalized information space for browsing. Rather, he is seeking
1665: to personalize the {\it generation} of an information space. Our original modeling in Fig.~\ref{newfactor} (right) cannot
1666: handle this situation. It can let the user give values out of turn, but it can't change the default order in which the
1667: questions are asked. We say that the design in Fig.~\ref{newfactor} (right) is {\it under-factored} (for this activity).
1668: However, the design in
1669: Fig.~\ref{newfactor} (left) can accommodate it, if the site generator
1670: can take arguments such as what the first level of the hierarchy should be, what the
1671: second level should be, and so on. Presumably such a generator would walk the tree described by the structure table
1672: and restructure it based on the arguments. In this case, we can still use partial evaluation for requests such as:
1673:
1674: \begin{descit}
1675: \noindent
1676: {\bf Buyer:} I am here to buy a car. I don't care in what order you ask the questions, but the second
1677: question should be about year.
1678: \end{descit}
1679: \noindent
(Whether such scenarios are likely is a different issue. For now, we are only exploring the PIPE concept theoretically.)
1681: After this information space is generated, we still have the option of re-representing the generated information space
1682: in our usual manner and conducting personalization by partial evaluation.
1683: We can thus state the following three definitions:
1684: \begin{descit}{}
1685: \noindent
1686: A representation $\mathcal{I}$ of an information space is well-factored for an information-seeking
1687: activity $\mathcal{G}$ if all interaction sequences in $\mathcal{G}$ can be realized by
1688: partial evaluations of $\mathcal{I}$. In this case, we also say that $\mathcal{I}$ is personable
1689: for $\mathcal{G}$.
1690:
A representation $\mathcal{I}$ of an information space is over-factored for an information-seeking
activity $\mathcal{G}$ if all interaction sequences in $\mathcal{G}$ can be realized only by
complete evaluations of $\mathcal{I}$. In this case, we also say that $\mathcal{I}$ is not personable
for $\mathcal{G}$.
1695:
1696: A representation $\mathcal{I}$ of an information space is under-factored for an information-seeking
1697: activity $\mathcal{G}$ if no interaction sequences in $\mathcal{G}$ can be realized by partial (or complete)
1698: evaluations of $\mathcal{I}$. In this case, we also say that $\mathcal{I}$ is not
1699: personable for $\mathcal{G}$.
1700: \end{descit}
1701:
1702: %\begin{figure}
1703: %\centering
1704: %\begin{tabular}{cc}
1705: %\includegraphics[width=3in]{venndiagram}
1706: %\end{tabular}
1707: %\caption{Overlap between interactions sequences that are describable by complete evaluation and those
1708: %that are describable by partial evaluation.}
1709: %\label{venn}
1710: %\end{figure}
1711:
1712: Thus, a given representation could be well-factored for one information-seeking activity but over-factored
1713: for another. Fig.~\ref{newfactor} (left) is well-factored for generation but over-factored for interaction.
1714: Fig.~\ref{newfactor} (right) is well-factored for interaction but over-factored for
1715: users who employ the color-year-model motif diligently (and completely).
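
A site generator that takes the level order as arguments, as posited for the buyer who dictates the order of questions, might look like the following Python sketch. The records in {\tt CARS}, the function {\tt build\_hierarchy}, and its parameter names are hypothetical. Note also that {\tt functools.partial} only fixes an argument (partial application); it is used here as a simple stand-in for partial evaluation, which would additionally simplify the generator's code.

\begin{verbatim}
from functools import partial

# Hypothetical catalog records; attribute names are assumptions.
CARS = [
    {"color": "Blue", "year": "2001", "model": "Honda"},
    {"color": "Blue", "year": "2000", "model": "Toyota"},
    {"color": "Red", "year": "2001", "model": "Honda"},
]

def build_hierarchy(records, first, second, third):
    """Nest the records by three attributes in the requested order."""
    tree = {}
    for rec in records:
        level1 = tree.setdefault(rec[first], {})
        level2 = level1.setdefault(rec[second], {})
        level2.setdefault(rec[third], []).append(rec)
    return tree

# "The second question should be about year": one argument is fixed,
# the remaining levels stay open until they are supplied.
year_second = partial(build_hierarchy, CARS, second="year")
tree = year_second(first="model", third="color")
\end{verbatim}

\noindent
In this sketch the buyer's request is a partial input to the generator rather than to the generated information space.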
1716:
The six unmodelable scenarios in Section~\ref{politics} involved requests such as `I would like to have the choice
of party as the first level of the hierarchy, the choice of state as the second level.' Our design was obviously
under-factored for such interaction sequences. We can define the {\it personability} of a representation as the
fraction of interaction sequences in an (external examiner) model that are describable as partial evaluations. For the
1721: external examiner model described in Section~\ref{eval}, the personability of the PIPE modeling
1722: (presented in Section~\ref{politics}) is thus 25/32.
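
Stated as a formula, with $P$ a symbol introduced here only for illustration, the personability of a representation $\mathcal{I}$ with respect to a set of interaction sequences $\mathcal{G}$ is
\[
P(\mathcal{I},\mathcal{G}) \; = \; \frac{|\{\, s \in \mathcal{G} : s \mbox{ is describable as a partial evaluation of } \mathcal{I} \,\}|}{|\mathcal{G}|},
\]
so that the case study yields $P = 25/32 \approx 0.78$.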
1723:
1724: Notice that all of these statements assume that the model for transforming representations is {\it partial
1725: evaluation.} There are other program-transformation techniques which might be able to address the
1726: unmodelable requests above, but PIPE only provides partial evaluation as the operator for personalization.
1727: Our statements should only be interpreted in the context of personalization by partial evaluation.
1728:
In practice, the choice of a factoring will depend on which situations are more
likely and on the composition of the space of interaction sequences $\mathcal{G}$. It is acceptable to have
some interaction sequences that involve complete evaluation, as long as they are
a small fraction of the total number of interaction sequences.
1733:
1734: Thus far, we have fixed the representation and analyzed the information-seeking activities for which it
1735: was over-factored, the ones for which it was under-factored, and so on. This is the designer's viewpoint.
1736: For a given site design, it allows the designer to pose questions such as `What are the information-seeking activities
1737: for which my site is personable?'
1738:
An alternative viewpoint is user-driven. Given an information-seeking activity, the user asks `What sites are
1740: most personable for my activity?' This allows the user to take different site designs (along with representations),
1741: analyze them w.r.t. a conceptual model of information seeking, and rank them in order of personability. For instance,
1742: consider again the external examiner model described in Section~\ref{eval} for the politicians case study. One information
1743: system design was described in Section~\ref{politics}. The personability of this design is, as stated earlier, 25/32.
1744: Seven interaction sequences were not modelable.
1745: Another information system design is the representation in Fig.~\ref{newfactor} (left). The personability of this
1746: design is 6/32. While it accommodates six of the seven sequences, it is no longer personable for the original
1747: 25 sequences! This is because those 25 sequences are now describable as complete evaluations, which also violate
1748: the partial evaluation model! Thus, both over-factorization and under-factorization lead to unpersonable
1749: information spaces. We hypothesize that the most interesting representations are in between.
1750:
An open research issue is whether we have to cross the barrier
from interaction to generation to arrive at over-factored representations.
1753: \section{Concluding Remarks}
1754: \label{conclu}
1755: This paper makes several major contributions. We have presented a novel modeling
1756: methodology for information personalization. PIPE enables the view of personalization
1757: as specializing representations. It models interactions with information systems
1758: and uses partial evaluation to simplify the interactions. PIPE also contributes a novel
1759: evaluation criterion for information system designs. It relates personalization to
1760: the way an information system design is factored. This has implications for how web
1761: applications are developed and deployed~\cite{autoweb}. Many web sites today are based on
1762: the generator model; the results in this paper indicate that they might not be
1763: directly personable for interaction scenarios (under partial evaluation).
1764:
Our modeling makes very weak assumptions about the nature of interactions with
information systems. While we have covered only web sites (and collections of web sites)
in this paper, any information system technology that affords the notion of interaction
sequence or the idea of factorization
can be studied along similar lines. This especially applies to designs for voice-activated
1770: systems (e.g., VoiceXML), directory access protocols (e.g., LDAP), information systems
1771: that provide a dialog model of interaction, and models for organizing digital libraries (e.g.,
1772: 5S).
1773:
1774: We plan to extend the PIPE methodology in several directions. We would like to extend the
1775: modeling methodology to address earlier aspects of the personalization system design life cycle, such
1776: as requirements gathering, verification, and validation. First steps toward this goal are
1777: described in a companion paper~\cite{pipe-tochi}. Another important direction of
1778: future work involves modeling {\it context} in personalization systems. The programmatic modeling
1779: provided in PIPE suggests that context can be usefully viewed as partial information. We believe
1780: that more sophisticated forms of modeling partial information will be needed for describing context, besides
1781: values for program variables.
1782: We are also interested in relaxing our
1783: assumptions of bounded sequences that have separable structural and terminal parts. This will
1784: allow us to address other information-seeking activities such as social network navigation.
Finally, we are investigating program transformation techniques
(e.g., program slicing~\cite{slicing})
that can help reason about terminal information,
in addition to structural information.
1789:
1790: Our long-term goal is to develop a theory of reasoning about representations of information
1791: spaces. This will allow us to formally study the design and implementation of information systems
1792: in terms of the representations they employ.
1793:
1794: \section*{Acknowledgements}
1795: Many of the ideas in this paper were developed during the
1796: Spring 2001 offering of the CS 6604 course (on recommender systems and personalization) at
1797: Virginia Tech. We acknowledge helpful discussions with
1798: Jack Carroll, Marcos Gon\c{c}alves, Dennis Kafura, Priya Lakshminarayanan, Dick Nance,
1799: Manuel Perez, and Mary Beth Rosson.
1800: Rob Capra helped establish connections between PIPE and mixed-initiative interaction and provided
1801: ideas for evaluating the modeling of personalization systems.
1802: Ed Fox suggested the usage of `structural' and `terminal' information to qualify interaction
1803: sequences. Comments from several anonymous referees helped improve the presentation of the article.
1804:
1805: \bibliographystyle{plain}
1806: %\bibliographystyle{named}
1807: \bibliography{paper}
1808:
1809: \end{document}
1810:
1811: