0108:cs0108003/paper.tex

1: \documentclass[11pt]{article}

2:

3: %\usepackage{ijcai01}

4: %\usepackage{fullpage,palatino}

5: \usepackage{fullpage,url}

6: \setlength{\oddsidemargin}{-0.25in}

7: \setlength{\evensidemargin}{-0.25in}

8: \setlength{\topmargin}{0.5in}

9: \setlength{\headheight}{0pt}

10: \setlength{\headsep}{0pt}

11: \setlength{\footskip}{0.35in}

12: \setlength{\textheight}{8.75in}

13: \setlength{\textwidth}{7in}

14: \setlength{\itemindent}{-0.5cm}

15: \setlength{\marginparwidth}{0in}

16: \setlength{\marginparsep}{0in}

17: \hyphenation{inform-ation-seeking inform-ation}

18: \newenvironment{descit}[1]{\begin{quote} \textit{#1}}{\end{quote}}

19:

20: \input{psfig-dvips}

21:

22: \newif\ifpdf

23: \ifx\pdfoutput\undefined

24:   \pdffalse

25: \else

26:   \pdfoutput=1

27:   \pdftrue

28: \fi

29:

30: \ifpdf

31:   \usepackage[pdftex]{graphicx}

32:   \usepackage[pdftex]{color}

33:   \DeclareGraphicsExtensions{.pdf,.png,.jpg}

34: \else

35:   \usepackage[dvips]{graphicx}

36:   \usepackage[dvips]{color}

37:   \DeclareGraphicsExtensions{.eps,.epsi,.ps}

38: \fi

39:

40: \usepackage{times}

41: %\usepackage{fancyheadings}

42:

43: \pagestyle{plain}

44: %\thispagestyle{empty}

45: %\pagestyle{empty}

46:

47: \def\midv{\mathop{\,|\,}}

48: \newtheorem{defn}{Definition}

49: \long\def\cbk#1{{\color{red}[CBK: #1]}}

50: \newlength\colwidth \setlength\colwidth{3.25in}

51:

52: \title{The Partial Evaluation Approach to \\

53: Information Personalization}

54:

55: \author{Naren Ramakrishnan and Saverio Perugini\\

56: Department of Computer Science\\

57: Virginia Tech, Blacksburg, VA 24061\\

58: Email:\{naren,sperugin\}@cs.vt.edu}

59:

60: \begin{document}

61: %\renewcommand{\baselinestretch}{2}

62:

63: \maketitle

64: %\thispagestyle{empty}

65: %\pagestyle{empty}

66:

67: \begin{abstract}

68: \noindent

69: Information personalization refers to the automatic adjustment of information content,

70: structure, and presentation tailored to an individual user. By reducing information

71: overload and customizing information access, personalization systems have emerged as an

72: important segment of the Internet economy. This paper presents a systematic modeling methodology  ---

73: PIPE (`Personalization is Partial Evaluation')  --- for personalization.

74: Personalization systems are designed and implemented in PIPE by modeling

75: an information-seeking interaction in a programmatic representation. The representation

76: supports the description of information-seeking activities as partial information and their

77: subsequent realization by {\it partial

78: evaluation}, a technique for specializing programs.

79: We describe the modeling methodology at a conceptual level and outline representational choices.

80: We present two application case studies that use PIPE for personalizing web sites and describe

81: how PIPE suggests a novel evaluation criterion for information system designs. Finally,

82: we mention several fundamental implications of adopting the PIPE model for personalization and

83: when it is (and is not) applicable.

84: \end{abstract}

85: \newpage

86: \tableofcontents

87: \newpage

88: %\begin{descit}{}

89: %

90: %\noindent

91: %I have observed that many good ideas start out by claiming too much territory for themselves,

92: %and eventually, when they have received their fair share of attention and respect, the air

93: %clears and it emerges that, though still grand, they are not quite so grand and all-encompassing

94: %as their proponents first thought. But that's all right. $\ldots$ That would be a fine start.

95: %\flushright{Douglas R. Hofstadter, in {\it Analogy as the Core of Cognition}}

96: %\end{descit}

97: \section{Introduction}

98: %\renewcommand{\baselinestretch}{2}

99:

100: One of the main contributions of information systems research is

101: the development of models that allow the

102: specification and realization of information-seeking activities.

103: Besides formalizing important operations, such

104: models provide a vocabulary with which to reason about the information-seeking activity.

105: For instance, if an information space is modeled as a term-document matrix, then the

106: vector-space model permits the view of retrieval as measuring similarities between document

107: vectors. Similarly, the modeling of data as a set of relations in a database system

108: affords expressive

109: query languages such as SQL. Other models and modeling methodologies can be found

110: in interactive information retrieval applications~\cite{scatter-gather,tkde-navigation,rabbit}.

111: Our goal in this paper is to present a modeling methodology for information personalization.

112:

113: Personalization constitutes the mechanisms and technologies required to

114: customize information access to the end-user.  It can be defined as

115: the automatic adjustment of information content, structure, and presentation

116: tailored to an individual user. The reader will be familiar with instances of

117: personalization such as web sites that welcome a returning user and

118: recommender systems~\cite{adomavicius01expert-driven,specissue}

119: at sites such as {\tt amazon.com}. The scope of

120: personalization today extends beyond web pages and web sites~\cite{terveen} to many different

121: forms of information content and

122: delivery~\cite{cacm-broader,cacm-kantor,cacm-streams}.

123: The underlying algorithms and techniques range from simple keyword matching

124: of consumer profiles, to explicit~\cite{ira,grouplens,phoaks} or

125: implicit~\cite{cacm-jaideep,cacm-myra} capture of user interaction.

126:

127: Despite its apparent popularity in reducing

128: information overload on the Internet, personalization suffers from a lack of any rigorous model

129: or modeling methodology. One of the main reasons is that there are `personal

130: views of personalization~\cite{cacm-personal}.' There are hence as many ways

131: to design and build a personalization system as there are interpretations for what

132: personalization means. Such a diversity presents a difficulty when studying conceptual

133: models of personalization, in general.

134:

135: %Consequently, a large number of dichotomies have been proposed

136: %that classify personalization research according to the

137: %philosophies of the underlying domains, the proposers, and their

138: %parent communities. Thus, categorizations such as

139: %content-based versus collaborative~\cite{adomavicius01expert-driven,specissue},

140: %`customization versus transformation' \cite{adaptive-sites}, non-destructive

141: %versus destructive, `public transportation versus hot-rods' \cite{rus} have

142: %become widely accepted. These dichotomies are based on the form of personalization,

143: %the level at which it is targeted, and the types of information used in

144: %delivering the personalization.

145:

146: We present the first

147: (to the best of our knowledge)

148: systematic modeling methodology for information personalization.

149: Termed PIPE (`Personalization is Partial Evaluation')~\cite{naren-ic}, our methodology makes no commitments to a

150: particular

151: algorithm, format for information resources,

152: type of information-seeking activities or, more basically, the nature

153: of personalization delivered. Instead, it emphasizes the modeling of an

154: information space in a way where descriptions of

155: information-seeking activities can be

156: represented as partial information. Such

157: partial information is then exploited (in the model) by

158: {\it partial evaluation}, a technique popular in the programming languages

159: community~\cite{jones}.

160:

161: %In important ways,

162: %PIPE advances personalization research as well as our understanding of how to build information

163: %systems. PIPE enables the view of personalization as

164: %specializing representations.

165: %%personalization, not in terms of particular algorithms or targeting goals, but in

166: %%terms of representations (of information systems).

167: %%This viewpoint provides a framework to study questions

168: %%such as `how should I design my information system so that it is personalized for my users?'

169: %It requires the modeling of {\it interaction} with an information

170: %system in a suitable representation. Once we conduct the modeling, personalization is achieved

171: %quite literally, `for free.' Our modeling methodology also allows

172: %personalization to be provided in conjunction with

173: %other information-seeking activities such as browsing.

174: %Finally, PIPE contributes a novel evaluation criterion

175: %for information system designs. It relates

176: %personalization to the way an information system design is {\it factored}.

177: %The evaluation criterion allows us to qualify the {\it personability} of an

178: %information system for a particular information-seeking activity, for example

179: %`web site X is 60\% more personalized for activity Y than web site Z.'

180: %

181:

182: While our ideas and results apply to many forms of computerized

183: information systems (e.g., web-based, voice-activated),

184: we restrict our attention to web sites in this paper.

185: Later in our discussion, we

186: qualify the range of information systems technologies to which PIPE can be applied.

187:

188: \subsection*{Reader's Guide}

189: Section~\ref{example} introduces the basic concepts of PIPE with the example of personalizing

190: a browsing hierarchy on the web. Section~\ref{basics} outlines the

191: PIPE modeling

192: methodology and how it can be used

193: for representing a variety of situations.

194: Section~\ref{studies} describes two

195: application studies that use PIPE for personalizing web sites.

196: Evaluation aspects implied by PIPE as

197: a modeling methodology are also described here.

198: Section~\ref{discuss}

199: describes connections between PIPE and other approaches, and carefully

200: qualifies situations where PIPE is

201: (and is not) applicable.

202: %Emerging scenarios that can be realized with PIPE and PIPE-like methodologies are also presented here.

203: Finally, Section~\ref{conclu} summarizes the major

204: contributions of this work.

205: \section{Motivating Example}

206: \label{example}

207:

208: Consider a consumer visiting an automobile dealership

209: to purchase a vehicle. Here are two possible scenarios.

210:

211: \newpage

212: \begin{descit}{Scenario 1}

213: \vspace{-0.1in}

214: \begin{tabbing}

215: {\bf Dealer:} \= Madam, are you looking to purchase a passenger vehicle?\\

216: {\bf Buyer:} \> Yes.\\

217: {\bf Dealer:} \> Do you have a particular manufacturer in mind?\\

218: {\bf Buyer:} \> I know that cars made by Honda have the highest safety approval rating.\\

219: {\bf Dealer:} \> That is true. Honda comes in seven colors. Do you have a preference for color?\\

220: {\bf Buyer:} \> The `cyclone blue' looks pleasing.\\

221: (conversation continues to ascertain further details of the vehicle)

222: \end{tabbing}

223: \end{descit}

224:

225: \begin{descit}{Scenario 2}

226: \vspace{-0.1in}

227: \begin{tabbing}

228: {\bf Dealer:} \= Sir, may I interest you in anything?\\

229: {\bf Buyer:} \> I am looking for a sport utility vehicle.\\

230: {\bf Dealer:} \> Sure, do you have a particular manufacturer in mind?\\

231: {\bf Buyer:} \> Not really, but the vehicle should be Red and made in 2001.\\

232: {\bf Dealer:} \> I see. \\

233: {\bf Buyer:} \> And by the way, I don't care for the fancy doormats and fittings.\\

234: {\bf Dealer:} \> Of course.\\

235: (conversation continues)

236: \end{tabbing}

237: \end{descit}

238: \noindent

239: %The sophistication in these scenarios arises from the {\it mixed-initiative} nature of human

240: %conversations.

241: In the first scenario, the conversation is directed by the dealer,

242: and the buyer merely answers questions posed by the dealer. The second scenario

243: resembles the first up to a point, after which the buyer takes the initiative and

244: provides answers `out of turn.' When queried about manufacturer, the buyer responds

245: with information about color and year of manufacture instead. Nevertheless,

246: the conversation is not stalled and both parties continue the dialog

247: to (eventually) complete the information assessment task. At each stage in the

248: above conversations, the buyer has the choice of proceeding along the lines of inquiry initiated

249:  by the dealer

250: or can shift gears and address a different aspect of information assessment. Scenarios that

251: `mix' these two modes of inquiry in such arbitrary ways constitute the

252: scope of {\it mixed-initiative} interaction~\cite{mixed-notkin}.

253:

254: \begin{figure}

255: \centering

256: \begin{tabular}{cc}

257: \includegraphics[width=8.25cm,height=6cm]{auto1.eps}

258: \includegraphics[width=7.4cm,height=6cm]{auto2.eps}

259: \end{tabular}

260: \begin{tabular}{cc}

261: \includegraphics[width=8.25cm,height=9.89cm]{auto4.eps}

262: \includegraphics[width=7.4cm,height=9.89cm]{auto3.eps}

263: \end{tabular}

264: %\begin{tabular}{cc}

265: %& \mbox{\psfig{figure=stupidautos.eps,width=5.5in}}

266: %\end{tabular}

267: \caption{Four typical solutions to organizing web catalogs. (top left)

268: A hardwired scenario.

269: (top right) A choice of two hardwired scenarios.

270: (bottom left) Complete enumeration involving all possible scenarios of interaction.

271: (bottom right) A `power-search' form that hides details of enumeration.}

272: \label{auto-solutions}

273: \end{figure}

274: Can we support a similar diversity of interaction in an online information system?

275: In other words, the system should have a default mode of interaction where a user

276: would fill in forms (or click on choices) in a specified order. A more

277: enterprising user should be able to supply any piece of information out

278: of turn. Finally, it should be possible to

279: mix these two modes of interaction in any order.

280: %While the term `mixed-initiative' affords many interpretations~\cite{mixed-notkin},

281: %we adopt an operational definition from our view of a dialog as

282: %an information assessment activity. By mixed-initiative, we emphasize the ability at any

283: %point to invoke (and later, return from) a subdialog whenever requested or specified by the

284: %user. This particular aspect of dialog structure in the context of information-seeking

285: %has been termed `goal-specification subdialogs' in~\cite{mixed-hci}.

286: %

287: At each stage of the interaction (whether system-initiated or user-requested), the system should

288: respond with the appropriate set of choices available. For instance, notice the

289: restriction to

290: seven colors once the decision on Honda is made in {\it Scenario 1}. If the choice

291: of color was made at the outset, presumably more selections would have been available.

292: A system that supports such a diversity of interaction would be personalized to

293: a user's individual preference(s) for information-seeking.

294:

295: The typical solution involves

296: anticipating the forms of interactions that have to be supported and designing interfaces

297: to support the implied scenarios (in this paper, we use the term `scenarios' to mean scenarios

298: of interaction).

299: Fig.~\ref{auto-solutions} describes four typical solutions that make various assumptions

300: on the scenarios that will be supported.

301: Fig.~\ref{auto-solutions} (top left)

302: can only support situations such as {\it Scenario 1} above, in that the user is forced to make

303: a choice of manufacturer at the outset (and all remaining levels are similarly fixed).

304: We refer to this as a design that hardwires scenarios.

305: Fig.~\ref{auto-solutions} (top right) also hardwires scenarios,

306: but provides a choice of two such hardwired scenarios (i.e., search by model

307: or search by price). Fig.~\ref{auto-solutions}

308: (bottom left) is what we refer to as {\it complete enumeration}, which involves

309: enumerating all possible scenarios and providing interfaces to all of them~\cite{hearst-setting}.

310: While the interface in Fig.~\ref{auto-solutions} (bottom left) only depicts the top-level choice, we

311: could imagine that such a multiplicity of choices is duplicated at all lower levels.

312: It is clear that enumeration could involve an exponential number of possibilities and correspondingly

313: cumbersome site designs.

314: And finally, Fig.~\ref{auto-solutions} (bottom right) provides the same

315: functionality as Fig.~\ref{auto-solutions} (bottom left)

316: but masks the details of enumeration in a convenient `power-search' form.

317:

318: All of these solutions rely on anticipating the points where an out-of-turn interaction

319: can occur and provide mechanisms to support it.  When opportunities

320: for out-of-turn interaction are too restrictive, information systems cause major frustrations to users.

321: The basic problem is the representational mismatch between the user's mental model of the

322: information-seeking activity and the facilities that are available for describing the activity.

323:

324: In Fig.~\ref{stupidyellow}, the user is attempting to decide on an automotive retailer

325: based on the services offered. He is open to

326: the possibility of traveling to a different city in order to make his purchase. He is thus

327: not yet ready to provide information about the location of the retailer, but the system insists that he

328: make this choice first. The reader can identify with examples such as these from other personal

329: experiences.

330:

331: \begin{figure}

332: \centering

333: \begin{tabular}{cc}

334: \includegraphics[width=8.25cm,height=6.8cm]{yellow}

335: %& \mbox{\psfig{figure=yellowstupid.eps,width=5.5in}}

336: \end{tabular}

337: \caption{An interface that prohibits certain information-seeking activities from

338: being described.}

339: \label{stupidyellow}

340: \end{figure}

341:

342: \subsection{The PIPE Approach}

343: We present an alternative design approach, one that promotes out-of-turn interaction without

344: predefining the points where such interaction can take place.

345: %and forms in which the information-seeker (buyer, in the above example)

346: %can barge in and alter the flow of interaction (conversation, in the above example).

347: Consequently, the

348: interfaces produced by our approach are, at once, both more expressive and simpler than the ones

349: in Fig.~\ref{auto-solutions}.

350:

351: \begin{figure}

352: \centering

353: \begin{tabular}{|l|l|} \hline

354: {\tt int pow(int base, int exponent) \{} & {\tt int pow2(int base) \{} \\

355: \,\,\,\,\,{\tt int prod = 1;} & \,\,\,\,\,{\tt return (base * base);} \\

356: \,\,\,\,\,{\tt for (int i=0;i<exponent;i++)} &  \} \\

357: \,\,\,\,\,\,\,\,\,\,{\tt prod = prod * base;} & \\

358: \,\,\,\,\,{\tt return (prod);} & \\

359: \} & \\

360: \hline

361: \end{tabular}

362: \caption{Illustration of the partial evaluation technique.

363: A general-purpose {\tt pow}er function written in C (left) and

364: its specialized version (with {\tt exponent} statically set to 2) to handle squares

365: (right). Such specializations are performed automatically by partial evaluators such as C-Mix.}

366: \label{pe}

367: \end{figure}

368:

369: \begin{figure}

370: \centering

371: \begin{tabular}{cc}

372: & \mbox{\psfig{figure=autohierarchy2.eps,width=5in}}

373: \end{tabular}

374: \caption{Personalizing a browsing hierarchy. (left)

375: Original information resource. (right) Personalized hierarchy with respect to vehicles

376: made in 2001. Notice that not only the pages, but also their structure is customized for (further

377: browsing by) the user.}

378: \label{pipe-illustrate}

379: \end{figure}

380:

381: \begin{figure}

382: \centering

383: \begin{tabular}{|l|l|} \hline

384: {\tt if (Blue)} & \\

385: \,\,\,\,{\tt if (2001)} & \\

386: \,\,\,\,\,\,\,\,{\tt if (Honda)} & \\

387: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & {\tt if (Blue)}\\

388: \,\,\,\,\,\,\,\,{\tt else if (Toyota)} & \,\,\,\,{\tt if (Honda)} \\

389: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$}\\

390: \,\,\,\,{\tt else if (2000)} & \,\,\,\,{\tt else if (Toyota)}\\

391: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\

392: {\tt else if (Red)} & {\tt else if (Red)} \\

393: \,\,\,\,{\tt if (2001)} &  \,\,\,\,{$\cdots \cdots \cdots$} \\

394: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \\

395: \,\,\,\,{\tt else if (2000)} & \\

396: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} & \\

397: \hline

398: \end{tabular}

399: \caption{Using partial evaluation for personalization. (left) Programmatic input to

400: partial evaluator, reflecting the organization of information in Fig.~\ref{pipe-illustrate} (left).

401: (right) Specialized program from the partial evaluator, used to create the personalized

402: information space shown in Fig.~\ref{pipe-illustrate} (right).}

403: \label{pe2}

404: \end{figure}

405:

406: Let us begin by considering the scenario where a user obediently supplies information attributes

407: in the order requested. For ease of presentation, we assume that there are three attributes

408: --- color, year of manufacture, and manufacturer --- and that the information system ascertains

409: values for them in this order. The key contribution of PIPE is to cast this seemingly

410: inflexible and hardwired scenario in a representation that allows its automatic transformation

411: into other scenarios. In particular, PIPE represents an information space

412: as a program, partially evaluates the program with respect to (any) user input, and

413: recreates a personalized information space from the specialized program.

414:

415: %\subsubsection*{Partial Evaluation}

416: The input to a partial evaluator

417: is a program and (some) static information about its arguments. Its

418: output is a specialized version of this program (typically in the same

419: language),

420: that uses the static information to `pre-compile' as many operations

421: as possible. A simple example is how the C function {\tt pow}

422: can be specialized to create a new function, say

423: {\tt pow2}, that computes the square of an integer.

424: Consider, for example,

425: the definition of a {\tt pow}er function shown in the left part of Fig.~\ref{pe}.

426: If we knew that a particular user will utilize it

427: only for computing squares of

428: integers, we could specialize it (for that user) to produce the {\tt pow2}

429: function.

430: Thus, {\tt pow2} is obtained automatically (not by a human programmer)

431: from {\tt pow} by precomputing all expressions that involve {\tt exponent},

432: unfolding the for-loop, and by various other compiler transformations such as

433: {\it copy propagation} and {\it forward substitution}.

434: Automatic program specializers are available for C, FORTRAN, PROLOG, LISP, and several other

435: important

436: languages. The interested reader is referred to \cite{jones} for a good introduction.

437: While the traditional motivation for using partial evaluation is to achieve speedup

438: and/or remove interpretation overhead \cite{jones}, it can also be viewed as a technique

439: for simplifying program presentation, by removing inapplicable, unnecessary,

440: and `uninteresting' information (based on user criteria) from a program.

441:
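
To make the specialization concrete, the following C sketch places the general function of Fig.~\ref{pe} next to a hand-written residual program. The names {\tt ipow}, {\tt ipow2unfolded}, and {\tt ipow2} are introduced here for illustration (they avoid clashing with the C library's {\tt pow}), and the intermediate unfolded form is only an assumption about the kind of steps involved; it is not the literal output of C-Mix or of any other specializer.

```c
#include <assert.h>

/* General-purpose power function, as in the left part of the figure.
   This sketch assumes exponent >= 0. */
int ipow(int base, int exponent) {
    assert(exponent >= 0);
    int prod = 1;
    for (int i = 0; i < exponent; i++)
        prod = prod * base;
    return prod;
}

/* Step 1 (illustrative): once exponent is statically known to be 2,
   the loop test no longer depends on run-time data, so the loop can
   be unfolded into two copies of its body. */
int ipow2unfolded(int base) {
    int prod = 1;
    prod = prod * base;   /* iteration i = 0 */
    prod = prod * base;   /* iteration i = 1 */
    return prod;
}

/* Step 2 (illustrative): propagating the constant 1 and substituting
   prod forward leaves the residual program shown in the right part
   of the figure. */
int ipow2(int base) {
    return base * base;
}
```

For any integer {\tt b} whose square fits in an {\tt int}, the three functions agree: {\tt ipow(b, 2)}, {\tt ipow2unfolded(b)}, and {\tt ipow2(b)} return the same value. Only {\tt base} remains as a run-time (dynamic) input in the specialized versions.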

442: %\subsubsection*{Using Partial Evaluation for Personalization}

443: Consider the hardwired scenario depicted in Fig.~\ref{pipe-illustrate} (left).

444: We can abstract this hierarchy by the program

445: in Fig.~\ref{pe2} (left) whose structure models the information resource (in

446: this case, a hierarchy of web pages) and whose control-flow models

447: the information-seeking activity within it (in

448: this case, browsing through the hierarchy by making individual selections). The link

449: labels are represented as program variables and semantic dependencies between links

450: are captured by the mutually-exclusive {\tt if..else} dichotomies. As it is modeled in

451: Fig.~\ref{pe2} (left), the program reflects the assumption that

452: the choice of year is usually made at the second level, after a color selection has been

453: made. However, to personalize for the user who says `2001' at the outset, we partially

454: evaluate the program with respect to the variable {\tt 2001} (setting it to one and all

455: conflicting variables such as {\tt 2000} to zero). This produces the simplified

456: program in Fig.~\ref{pe2} (right), which can

457: be used to recreate web pages with personalized web content (shown in Fig.~\ref{pipe-illustrate},

458: right). The second level of the hierarchy is simplified,

459: promoting what was originally the third level to the new second level. The

460: user is able to provide the value of any deeply

461: nested variable out of turn, thus achieving mixed-initiative interaction.

462:
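
The transformation from the left to the right part of Fig.~\ref{pe2} can be sketched in C. The sketch is illustrative only: the types {\tt Node} and {\tt Binding} and the functions {\tt new\_node}, {\tt add\_kid}, {\tt lookup}, {\tt specialize}, {\tt build\_catalog}, and {\tt print\_chain} are hypothetical names introduced here, and the catalog is reduced to two colors, two years, and two manufacturers. A real partial evaluator works on program text; this sketch instead specializes a hand-built tree of {\tt if..else} chains, which is enough to show how a statically true test disappears and its body moves up one level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KIDS 4
#define POOL_SIZE 64

/* One branch of an if / else-if chain: a test on a link label, plus the
   nested chain that forms its body. */
typedef struct Node {
    const char *label;
    int nkids;
    struct Node *kids[MAX_KIDS];
} Node;

typedef enum { UNKNOWN, IS_FALSE, IS_TRUE } Value;

/* Static (partial) input: what the user has said so far. */
typedef struct { const char *name; Value value; } Binding;

static Node pool[POOL_SIZE];   /* fixed pool keeps the sketch malloc-free */
static int pool_used = 0;

Node *new_node(const char *label) {
    assert(pool_used < POOL_SIZE);
    Node *n = &pool[pool_used++];
    n->label = label;
    n->nkids = 0;
    return n;
}

void add_kid(Node *parent, Node *kid) {
    assert(parent->nkids < MAX_KIDS);
    parent->kids[parent->nkids++] = kid;
}

Value lookup(const Binding *env, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(env[i].name, name) == 0) return env[i].value;
    return UNKNOWN;
}

/* Write the residual of the chain under src into dst. */
void specialize(const Node *src, Node *dst, const Binding *env, int n) {
    /* A statically true test reduces the whole chain to that branch's body. */
    for (int i = 0; i < src->nkids; i++) {
        if (lookup(env, n, src->kids[i]->label) == IS_TRUE) {
            specialize(src->kids[i], dst, env, n);
            return;
        }
    }
    /* Otherwise drop statically false branches and keep the unknown ones. */
    for (int i = 0; i < src->nkids; i++) {
        const Node *k = src->kids[i];
        if (lookup(env, n, k->label) == IS_FALSE) continue;
        Node *copy = new_node(k->label);
        specialize(k, copy, env, n);
        add_kid(dst, copy);
    }
}

/* Color -> year -> manufacturer, mirroring the original hierarchy. */
Node *build_catalog(void) {
    const char *colors[] = {"Blue", "Red"};
    const char *years[]  = {"2001", "2000"};
    const char *makes[]  = {"Honda", "Toyota"};
    Node *root = new_node("root");
    for (int c = 0; c < 2; c++) {
        Node *color = new_node(colors[c]);
        add_kid(root, color);
        for (int y = 0; y < 2; y++) {
            Node *year = new_node(years[y]);
            add_kid(color, year);
            for (int m = 0; m < 2; m++)
                add_kid(year, new_node(makes[m]));
        }
    }
    return root;
}

/* Print a chain as nested if / else-if tests, four spaces per level. */
void print_chain(const Node *n, int depth) {
    for (int i = 0; i < n->nkids; i++) {
        printf("%*s%s (%s)\n", depth * 4, "",
               i == 0 ? "if" : "else if", n->kids[i]->label);
        print_chain(n->kids[i], depth + 1);
    }
}
```

Calling {\tt specialize} on the tree returned by {\tt build\_catalog}, with {\tt 2001} bound to {\tt IS\_TRUE} and {\tt 2000} bound to {\tt IS\_FALSE}, yields a residual tree in which each color leads directly to the manufacturer tests; {\tt print\_chain} then shows the same shape as Fig.~\ref{pe2} (right). Binding {\tt 2000} to false is a domain-semantics decision made by the caller, not something the specializer infers.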

463:

464: %Executing the program in the

465: %order and form in which it was modeled amounts to the system-initiated mode of `browse as I say.'

466: %`Jumping ahead' to nested program segments by partially evaluating the program amounts to

467: %the user-directed mode of personalization.

468:

469: \subsection{Some Preliminary Observations}

470: \label{observe}

471:

472: Personalization systems are thus designed and implemented in PIPE by modeling

473: an information-seeking activity in a programmatic representation. The above

474: example has been carefully constructed to highlight the many advantages and

475: opportunities provided by PIPE. Before we describe PIPE in detail, it will be

476: helpful to summarize the lessons from the above example.

477:

478: \begin{enumerate}

479: \setlength{\leftmargin}{0in}

480: \item {\it PIPE equates personalization to specializing representations.}

481: %\vspace{-0.1in}

482: As a methodology, PIPE asserts that if interaction in an information space can

483: be represented as a program, then a personalized information space can be automatically

484: generated by partial evaluation. It is up to the designer to supply the representation

485: as a program and reinterpret the program in information systems terms.

486: The meaning of the programmatic representation is thus

487: external to the basis for personalization (partial evaluation).

488:

489: For instance,

490: the act of clicking on the `Honda' hyperlink to browse through Honda cars is captured

491: in Fig.~\ref{pe2} by just the expression {\tt if (Honda)}. Clicking on the link amounts

492: to evaluating this conditional to be true.

493: %A partial evaluator has no semantic

494: %understanding of `clicking.' It is only the designer's creativity (or imagination)

495: %that allows him to assign such an understanding. It is thus imperative that we

496: %have a meaningful mapping

497: %of programming constructs to aspects of

498: %interaction with an information system.

499: %

500: %In some cases, these mappings are obvious. A

501: The conditional construct {\tt if} is thus used as a logical

502: point where the state of

503: information is tested before proceeding any further. It could model either

504: a hyperlink that has to be clicked or a free-form text box whose entries are evaluated.

505: %Assigning an interpretation to constructs such as {\tt goto}

506: %is more tricky (but see Section~\ref{methods}).

507:

508: \item {\it The effectiveness of PIPE depends on what is modeled (and how).}

509: The effectiveness of a PIPE implementation depends on the

510: particular modeling choices made {\it within} the programmatic

511: representation (akin

512: to~\cite{rabbit}).

513: We cannot overemphasize this aspect --- the example

514: in Fig.~\ref{pe2} can be made `more personalized' by conducting

515: a more sophisticated modeling of the underlying domain. For instance, information such

516: as vehicle identification numbers (VINs), history of ownership,

517: mileage on the vehicle, and photos of the car can be further modeled as a browsable

518: hierarchy and `attached' (functionally invoked)  at various places

519: in the program of Fig.~\ref{pe2} (left). Conversely, the example in Fig.~\ref{pe2} (left)

520: can be made `less personalized' by, for instance, requiring categorical information along with user input.

521: Replacing {\tt if (2001)} in Fig.~\ref{pe2} (left)

522: with {\tt if (Year=2001)} implies that the specification of the type of

523: input (namely that `2001' refers to the year of manufacture) is required

524: in order for the statement to be partially evaluated.

525: Personalization systems built with PIPE can thus be distinguished by what

526: they model and the forms of customization enabled by applying

527: partial evaluation to such a modeling.

528:

529: Similarly, the way in which program variables are associated with user input

530: can influence the effectiveness of a PIPE implementation.

531: Values for program variables

532: could come from a content-based technique or a so-called

533: collaborative technique. For instance, the variable {\tt Honda} could be set to true, either

534: because the user explicitly said so, or because `Honda' was recommended to the user by an automatic

535: recommender system. In addition, different variables could afford different

536: interpretations.

537:

538: Sometimes we can take advantage of a domain semantics when associating values

539: with program variables or in modeling the program.

540: Fig.~\ref{pe2} models a `strict' semantics of variable assignment

541: by the {\tt if..else} dichotomies. If {\tt Blue} is evaluated to true, then every other option

542: qualified by the {\tt else} constructs (such as {\tt Red}) would be automatically removed from

543: further consideration. This is due to our assumption that if the user declares `Blue' as his

544: preference, then he would not be interested in Red cars. If such a semantics

545: is not appropriate, then we would not have {\tt else} clauses in our conditionals.

546: Thus, PIPE doesn't dictate what the domain semantics (for assigning program variables) should be

547: or even that it should be available. But it can take advantage of a domain semantics, if one exists.

548:

549: %Consider again Fig.~\ref{pe2}, where the user

550: %declares `2001' as his preference and we partially evaluate

551: %w.r.t. {\tt 2001}. What is being simplified (actually, removed) in this example is

552: %the test {\tt if (2001)} and along with it,

553: %a web page that requires the user to click on `2001.'

554: %The reader might notice that we also set the program variable {\tt 2000} to be false. This

555: %is due to our understanding of how the personalization scenario is expressible as partial inputs.

556: %When the user declares `2001,' we can reason that she will not be interested in vehicles made

557: %in other years, and set the corresponding variables to zero. We reiterate that PIPE doesn't

558: %require that we set {\tt 2000} to be false; it comes from our understanding of what the user's

559: %interests are.

560: %

561: %Likewise, if the user declares `Blue,' we can set the {\tt Blue} variable to one and all

562: %other conflicting variables such as {\tt Red} to zero. Once again, this doesn't mean that `Blue'

563: %implies `not Red'; it just means that we are using a domain semantics that enables us

564: %to make simplifications. If the user could still be interested in Red cars, then we

565: %do not set {\tt Red} to zero.

566: %

567: Finally, the translation of the program from and back to

568: the information space could be done in different ways. In Fig.~\ref{pe2} (left) we modeled the

569: program by abstracting hyperlinks across pages as conditionals. When we recreate personalized

570: pages from

571: Fig.~\ref{pe2} (right) we are not obliged to this design choice. We could cascade all the

572: interactions to within a single page, for instance. PIPE only requires that the designer of

573: the information system has a way of going from an information space to a programmatic

574: representation, and back again. Section~\ref{basics} covers modeling options

575: in detail.

576:

577: % how to set variables, how to employ a domain theory

578: % goal-oriented

579: % interaction sequences are bounded

580: % common criticism of our work is..

581: \item {\it PIPE separates modeling for a personalization system

582: from the operational aspect of personalization.}

583: Personalization systems are usually described in terms of the techniques

584: that provide personalization or the level at which the information is tailored. Due to the

585: variety possible, comparisons of personalization systems have been

586: difficult to make.  PIPE, on the other hand, shifts the focus to modeling for

587: a personalization system. Any form of personalization is possible if the modeled

588: program allows the pertinent scenarios to be expressible as partial inputs. In

589: Fig.~\ref{pe2}

590: we cannot personalize cars with respect to occupancy, not because of

591: any fundamental limitation in our personalization methodology, but because

592: {\tt occupancy} is not available as a program variable. Similarly, we cannot

593: personalize cars with respect to the {\it Edmund's Car Guide} recommendations,

594: because the latter information resource has not been modeled.

595: The separation of modeling

596: from the operational aspect of conducting personalization means that we can

597: devote our attention to modeling the interaction in as sophisticated a manner

598: as required. It also means that we have to distinguish an evaluation of an

599: implementation of the PIPE

600: methodology from an evaluation of the methodology itself.

601:

602: \begin{figure}

603: \centering

604: \begin{tabular}{cc}

605: \includegraphics[width=3in]{sketchy}

606: %& \mbox{\psfig{figure=yellowstupid.eps,width=5.5in}}

607: \end{tabular}

608: \caption{Sketch of a PIPE interface to a traditional browser. The interface retains the

609: existing browsing functionality at all times. At any point in the

610: interaction, in addition, the user has

611: the option of supplying personalization parameters and conducting personalization

612: (bottom two windows). Such an interface can be implemented as a toolbar option in existing

613: systems.}

614: \label{toolbar}

615: \end{figure}

616:

617: %To our surprise, we have experienced remarkable resistance in propagating the view that

618: %PIPE is a modeling methodology, as different from a `system.' A frequent comment from visitors

619: %to our example applications at {\tt pipe.cs.vt.edu} is:

620: %\begin{descit}{}

621: %``PIPE is unimpressive since only subgraphs of the original site are being presented

622: %back to the user. No restructuring of the document content is provided.''

623: %\end{descit}

624: %\noindent

625: %This is a limitation of the {\it particular application} of PIPE (which does not model document

626: %content) than the methodology. Only

627: %the site graph (as in Fig.~\ref{pe2}) was modeled and hence only personalizations pertaining

628: %to manipulating the site graph are possible.

629: %In Section~\ref{options} we outline a variety of ways to conduct more sophisticated modeling,

630: %including modeling document content. In Section~\ref{factor}, we address

631: %limitations of the PIPE modeling methodology itself.

632:

633: % it is interaction that is personalized. not content-based collaborative, etc.

634: % it is upto us to interpret what the represntation means

635: % sometimes easy: if

636: % sammy: sometimes difficult: goto

637: % marcos: what is removed is the "link," We can go further.

638:

639: % say dfs

640: % what is the representation: a compaction of interaction sequences

641: % define interaction sequence at this point

642: % once we know how to model interaction, we get P for free.

643: \item {\it The PIPE personalization operator is closed.}

644: Since the partial evaluation of a program results in another program, the PIPE

645: personalization operator is closed. In terms of interaction, this means that

646: any modes of information-seeking (such as browsing, in Fig.~\ref{pe2})

647: originally modeled in the program are preserved. In the above example, personalizing a browsable

648: hierarchy returns another browsable hierarchy.  The closure property also means that the

649: original information-seeking activity (browsing) and personalization can be interleaved in

650: any order. Executing the program in the

651: order and form in which it was modeled amounts to the system-initiated mode of `browse as I say.'

652: `Jumping ahead' to nested program segments by partially evaluating the program amounts to

653: the user-directed mode of personalization.

654: In Fig.~\ref{pe2}, the simplified program can be browsed in the traditional sense,

655: or partially evaluated further with additional user inputs.

656: PIPE's use of partial evaluation is thus central to realizing a mixed-initiative mode of

657: information-seeking, without explicitly hardwiring all possible scenarios

658: of interaction (including out-of-turn interactions). A sketch of an interface design for

659: such mixed-initiative interaction is provided in Fig.~\ref{toolbar}.

660:

661: \item {\it PIPE is most advantageous in information spaces that afford nested representations

662: of interactions and where information-seeking activities can involve out-of-turn interactions.}

663: For browsing hierarchies, a nested programmatic model can be trivially built by a depth-first

664: crawl of the site (as in Fig.~\ref{pe2}). Not only is this modeling appropriate, it is

665: also concise and makes the advantages of partial evaluation obvious.

666:

667: \begin{figure}

668: \centering

669: \begin{tabular}{|l|} \hline

670: {\tt if (Returning Customer)} \\

671: \,\,\,\,{\tt /* be nice to her */} \\

672: {\tt else} \\

673: \,\,\,\,{\tt /* just show usual catalog */} \\

674: \hline

675: \end{tabular}

676: \caption{A modeling of an information space that involves only one level of interaction.}

677: \label{pe4}

678: \end{figure}

679:

680: %Even though the hierarchy is designed in a color-year-model order, PIPE

681: %can accept values for program variables in any order.

682: %Further, the size of the program in Fig.~\ref{pe2}

683: %(left) is a small fraction of what it would be if we were to enumerate scenarios in an exhaustive

684: %fashion. Recall that enumerating scenarios to support all forms of barge in leads to

685: %cumbersome designs such as Fig.~\ref{auto-solutions} (bottom left).

686: %In addition to mirroring the browsing hierarchy, our nested representation makes the advantages

687: %of partial evaluation obvious.

688: %

689: On the other hand,

690: consider a web site

691: that determines (perhaps by a cookie~\cite{cookies}) if a user is a

692: returning customer and does something different based on this information.

693: Modeling (only this) interaction can be done by the program in Fig.~\ref{pe4}.

694: While partial evaluation is still applicable, it cannot do anything fancy since

695: there is only one variable ({\tt Returning Customer}) to specify values for. There is

696: no deeply nested variable whose value can be supplied out of turn.

697: %In addition

698: %A nested representation is possible only if the space of information-seeking

699: %activities is structured for us to take advantage of commonality across different

700: %scenarios of interaction.

701:

702: Similarly, if all users would like to browse through the catalog in Fig.~\ref{pe2} by a

703: color-year-model motif, then there is really only one way in which the catalog is being used.

704: This usage mirrors the way in which the catalog is modeled, without any out-of-turn interactions.

705: Partial evaluation is thus not necessary to support the information-seeking goals of any user.

706:

707: The presence of out-of-turn interactions implies different rates of specification for

708: different aspects of information seeking, causing a rich variety of possible interactions.

709: In such a case, PIPE can be viewed as a technique that realizes a particular interaction sequence

710: by combinations of simplification and normal execution.

711: In Section~\ref{factor}, we show more formally which representations (and

712: which information spaces) are best suited for personalization by partial evaluation.

713:

714: \end{enumerate}

715: \section{Essential Aspects of PIPE}

716: \label{basics}

717: We now describe the PIPE methodology in more detail and outline choices available for modeling

718: typical situations.

719: While partial evaluation permits formal specification with mathematical notation~\cite{jones-book}, we do not

720: take this approach here. Instead, for the {\it ACM TOIS} audience,

721: we aim to emphasize the larger context in which partial evaluation is used  in PIPE

722: and describe its advantages for information systems. We intend to present the formal aspects

723: of the PIPE methodology in a second paper.

724:

725: \subsection{Modeling Methodology}

726: \label{methods}

727:

728: As a modeling methodology, PIPE only makes the weak assumption that information is

729: organized along a motif of interaction sequences. For our purposes, an interaction sequence is

730: a list of primitive inputs used to describe the information-seeking

731: activity.  For instance in Fig.~\ref{pe2}, information about vehicles is organized along

732: a color-year-model motif with the primitive inputs corresponding to specific choices

733: of color, year, or model. The interaction sequence in this example involves the

734: choice of {\tt 2001} for {\tt year}, in support of the user's goals.

735:

736: Information is embodied in an interaction sequence in two forms --- {\it structural}

737: and {\it terminal}. Structural information is what helps us refer to an interaction sequence;

738: it is explicitly represented in PIPE and specified via program variables.

739: In Fig.~\ref{pe2}, the structural information corresponds to choices of color, year, and model.

740: This form of information thus captures the partial information supplied by the user

741: by instantiating parts of the motif. When the user specifies `2001' in Fig.~\ref{pe2}, the

742: {\tt year} part of the motif is turned on and set to this value.

743:

744: Terminal information is also represented in PIPE, but is not directly manipulatable or even

745: directly addressable. Programs in PIPE are not explicitly parameterized by

746: this information and so the user cannot specify personalization in these terms.

747: In Fig.~\ref{pe2}, terminal information corresponds to the leaves, which would be information

748: about particular vehicles. In a different application, terminal information could reside at

749: every step in the interaction sequence.

750:

751: Structural information provides the `backbone' that strings together terminal information.

752: However, it is important to note that

753: structural information is considered first-class information in PIPE and not merely `features' with

754: which we index the `real information' (although it is tempting to view it this way).

755: To see why, observe that partial evaluation does not provide a mapping

756: from structural to terminal information (unless it was a complete evaluation specifying all

757: program variables).

758: After a partial evaluation (e.g., Fig.~\ref{pe2} (right)) the specialized program might still

759: contain structural information. This does not necessarily mean

760: that the user's information-seeking activity is incomplete. The residual structural

761: information contributes to the programmatic modeling of interaction, {\it which is} the

762: personalized information space in PIPE. Another way to see this is to note

763: that PIPE simplifies {\it interaction} with an information space. Thus interaction

764: can be seen to be the determiner of information (both structural and terminal). The view

765: of structural information as first-class information is also natural if we think of the program

766: in logic programming terms, rather than imperative programming.

767:

768: %The issue of whether structural information is first-class is actually very fundamental to

769: %information system design. In Section~\ref{factor} we show the widespread ramifications of this idea.

770:

771: Since information can be organized all along the interaction sequence, in both structural and terminal

772: forms, we need a way to define the state of information described by the sequence as a whole.

773: It is useful to assume a `combining function' for defining the state of

774: information at the end of the sequence. A simple example of a combining function is the additive

775: operator which mirrors the accumulation of information by following an interaction sequence.

776: In Fig.~\ref{pe2}, if the color and model parts of the motif are turned on, then the state

777: of information known about that sequence is a set of values for \{{\tt color,model}\}.

778: Another example is to just retain information from the most recent step(s) in the sequence. This

779: would be appropriate when information-seeking has an exploratory nature to it and we wish to

780: discount some earlier steps in an interaction sequence as being `tentative' (the applications

781: presented in this paper do not have this flavor). Combining functions for terminal information

782: can be defined similarly.
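
As a minimal sketch of these two combining functions (not taken from any PIPE implementation), the following C fragment records which parts of the color-year-model motif have been turned on. The names {\tt state}, {\tt combine\_additive}, and {\tt combine\_recent} are hypothetical and are introduced only for illustration.

\begin{verbatim}
#include <stdbool.h>
#include <stddef.h>

/* Structural parts of the color-year-model motif. */
enum part { COLOR, YEAR, MODEL, NUM_PARTS };

/* State of information: which parts are set, and to what value. */
struct state {
    bool set[NUM_PARTS];
    const char *value[NUM_PARTS];
};

/* Additive combining function: every step contributes to the state. */
struct state combine_additive(const enum part *parts,
                              const char *const *values, size_t n)
{
    struct state s = {{false}, {NULL}};
    for (size_t i = 0; i < n; i++) {
        s.set[parts[i]] = true;
        s.value[parts[i]] = values[i];
    }
    return s;
}

/* Recency combining function: only the last k steps contribute;
   earlier steps are discounted as tentative. */
struct state combine_recent(const enum part *parts,
                            const char *const *values, size_t n, size_t k)
{
    size_t skip = (n > k) ? n - k : 0;
    return combine_additive(parts + skip, values + skip, n - skip);
}
\end{verbatim}

For a sequence that sets {\tt color} and then {\tt model}, the additive function reports both parts as set, whereas the recency function with {\tt k} equal to 1 reports only {\tt model}.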

783:

784: Since PIPE only emphasizes the design and implementation of personalization systems, it doesn't

785: pay any attention to how the interaction sequences are obtained and how the choice between

786: terminal and structural parts is made. In particular, PIPE is not a

787: complete lifecycle model for personalization system design and doesn't address issues such

788: as requirements gathering. Interaction sequences could come from explaining users'

789: behavior~\cite{pipe-tochi,footprints}, from identifying all possible paths through a given

790: site, or from our conceptual understanding of the information-seeking activity. They also depend

791: on the targeting goals of the personalization system. In~\cite{pipe-tochi}, we have

792: presented a systematic methodology for obtaining interaction sequences and identifying

793: structural and terminal parts, by `operationalizing' scenarios of interaction; we refer the

794: reader to this reference for details. In this paper, we assume that they are available

795: and proceed to further characterize and represent them.

796:

797: \subsubsection*{Characterizing Interaction Sequences}

798: Information seekers forage in different ways~\cite{pirolli-chapter} and the existing

799: design of the information system also influences their interaction sequences.

800: An important aspect of an interaction sequence is its length, which affects

801: its subsequent representation in PIPE.

802:

803: In many applications, interaction sequences are bounded. For instance,

804: in Fig.~\ref{pe2} an interaction sequence of length at most 3 describes

805: the information-seeking activity. Such sites and applications are characterized by their support

806: for a goal-oriented, opportunistic view of information-seeking. Hierarchies, recommender systems,

807: and scrolling to a specific location on a page are examples.

808: In general, any information-seeking

809: activity that has clear start and end states and which relies on perceptual, display-driven

810:  clues that focus attention can be represented as a bounded sequence.

811:

812: In other important cases, interaction sequences can be unbounded. The trivial example is when

813: we allow the possibility that a user

814: may click `back buttons.' If we undo these steps before representation, we can proceed as if

815: they never happened. Alternatively, we can model back buttons using a finite-state machine (FSM),

816: but we have to find a characterization of applications where modeling at this level

817: of detail would be useful. A more interesting example of unbounded sequences involves browsing at a

818: site based on social network navigation, such as

819: {\tt www.\hskip0ex imdb.\hskip0ex com}. There are no leaves in this site and the site graph

820: resembles a social network.

821: Users are encouraged to systematically explore relationships between actors, movies, and directors

822: by `jumping connections.' Such

823: a site is characterized by an exploratory nature of information-seeking, akin to data mining. Goals are

824: articulated less clearly and cognitive knowledge is used from various resources to decide on how to

825: conduct information-seeking. In fact, there is no distinction between structural and terminal

826: information in this site! Any particular web page could be used to address other items or thought

827: of as the result of an information-seeking activity.

828:

829: Both bounded and unbounded interaction sequences can be described using constructs such

830: as regular expressions, grammars, FSMs, and programs; unbounded interaction sequences require

831: special handling, for the reasons mentioned above. In this paper, we concentrate on personalization

832: applications describable by bounded interaction sequences and which have a clear separation between

833: structural and terminal parts.

834:

835: \subsubsection*{Representing Interaction Sequences in PIPE}

836: Given that we can represent information-seeking activities as interaction sequences,

837: the set of scenarios that are likely to be encountered (over all users, perhaps)

838: can be represented by a corresponding set of interaction sequences. Representing this latter set

839: faithfully and compactly as a program is key to the application of PIPE.  Once again, PIPE doesn't

840: indicate what this set should be: whether it is across all users~\cite{footprints}, whether it is for a

841: group of users~\cite{adaptive-sites}, or whether it comes from our conceptual understanding

842: of information-seeking.

843:

844: For instance, Fig.~\ref{pe2} uses a nested representation to form the program for subsequent

845: partial evaluation. Not only does it model the color-year-model motif (as it would have

846: been observed), it also allows us to model the year-color-model motif (by one

847: partial evaluation). Since PIPE provides out-of-turn personalization, it is not necessary to

848: represent every interaction sequence explicitly in the program.

849:

850: Compaction of interaction sequences is important for two reasons. The first is that it

851:  preserves the inherent structure of the (unpersonalized) information-seeking activity (such as browsing,

852: in Fig.~\ref{pe2}). This is useful in realizing mixed-initiative interaction with PIPE. Another reason is that

853: compaction permits scalable personalization solutions.

854:

855: Structural parts of interaction sequences can be represented using constructs in

856: a full-fledged programming language, such as C (as done in Fig.~\ref{pe2}) or LISP.

857: A programming language provides many facilities that can help in compaction of interaction sequences. For example,

858: if we notice that all interaction sequences at a site require registration at some point in the interaction,

859: then the steps associated with registration could be factored out and procedurally invoked from various

860: other locations.

861: Off-the-shelf

862: partial evaluators (such as C-Mix) can then be used for specializing the representations.

863:

864: It is important that we also model terminal parts of interaction sequences. In the example

865: of Fig.~\ref{pe2}, if there is text anchoring every hyperlink, then we can define

866: a program variable to start accumulating text once every conditional is evaluated to be true. This

867: could be achieved using associative arrays or by dynamic memory allocation constructs (e.g., pointers).

868: After partial evaluation, we can inspect the contents of this data structure at every stage

869: to present personalized (terminal) content. Inspecting the contents of the sequence as a whole will

870: provide an overall summary of the terminal information. Inspecting the contents of subsequences

871: will provide

872: more fine-grain summaries of terminal information.
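
One way to picture this accumulation is the sketch below, which assumes a fixed-size array in place of an associative array or dynamically allocated storage. The anchor strings, the {\tt trail} structure, and the functions {\tt note} and {\tt browse} are hypothetical; only the nesting of conditionals follows Fig.~\ref{pe2}.

\begin{verbatim}
#include <stdbool.h>
#include <stddef.h>

#define MAX_STEPS 8

/* Terminal information gathered along one interaction sequence. */
struct trail {
    const char *text[MAX_STEPS];
    size_t len;
};

/* Append the anchor text of a hyperlink that was followed. */
static void note(struct trail *t, const char *anchor_text)
{
    if (t->len < MAX_STEPS)
        t->text[t->len++] = anchor_text;
}

/* Fragment of a color-year hierarchy; text accumulates each time
   a conditional evaluates to true. */
void browse(bool blue, bool y2001, struct trail *t)
{
    t->len = 0;
    if (blue) {
        note(t, "Blue cars");
        if (y2001) {
            note(t, "2001 models");
        }
    }
}
\end{verbatim}

Reading {\tt text[0]} through {\tt text[len-1]} gives a summary for the whole sequence, while any prefix of the array gives the summary for the corresponding subsequence.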

873:

874: \begin{figure}

875: \centering

876: \begin{tabular}{|ll|}

877: \hline

878: \includegraphics[width=2.5cm,height=0.45cm]{display3.epsi} &

879: \begin{tabular}{l}

880: {\tt if (Drink==Coffee)}\\

881: \,\,\,\,\,\,\,\,\,{\tt /* */}

882: \end{tabular}\\

883:  & \\

884: \includegraphics[width=1.75cm,height=1.25cm]{display2.epsi} &

885: \begin{tabular}{l}

886: {\tt switch (topping) \{}\\

887: \,\,\,\,\,{\tt case onions: /* */}\\

888: \,\,\,\,\,{\tt case mushrooms: /* */}\\

889: \,\,\,\,\,{\tt case olives: /* */}\\

890: {\tt \}}

891: \end{tabular} \\

892:  & \\

893: \includegraphics[width=3.1cm,height=1cm]{display4.epsi} &

894: \begin{tabular}{l}

895: {\tt f2c(x) \{}\\

896: \,\,\,\,\,{\tt return (5*(x-32)/9);}\\

897: {\tt \}}

898: \end{tabular} \\ \hline

899: \end{tabular}

900:

901: \caption{Choices for representing aspects of interaction in PIPE.}

902: \label{cute-diagram}

903: \end{figure}

904:

905: \subsubsection*{Creating a Personalization System}

906: To effect the creation of a personalization system, we define ways for the user to

907: specify values for program variables and a procedure by which personalized information

908: content is presented back to the user. Every construct used in the programmatic modeling (terminal or

909: structural) should be translatable into information systems terms, and vice versa.

910:

911: Typically, there is a one-one mapping between interactions and programming

912: constructs. In Fig.~\ref{cute-diagram},

913: the textbox corresponds to a conditional, the listbox to a {\tt switch} construct, and

914: the unit converter to a function in a PIPE modeling.

915:

916: Such mappings have to be revisited after partial evaluation.

917: For instance, the {\tt if} construct in Fig.~\ref{cute-diagram}

918: will either be removed or left as-is by a partial evaluation. This will just correspond

919: to removing or retaining the textbox in the personalized web site.  The {\tt switch} construct

920: in Fig.~\ref{cute-diagram} corresponding to a listbox is more interesting. After partial evaluation,

921: it might be the case that only one of the three topping options is left. Perhaps the person is allergic

922: to mushrooms and olives and we set those variables to zero. In this case, the

923: partial evaluator might remove the {\tt switch} altogether and replace it with a simple

924: {\tt if}. We can view this as a hint to render the listbox as a hyperlink in the personalized site.

925: Finally, the unit conversion utility in Fig.~\ref{cute-diagram} can be modeled in several ways.

926: We can view it as a functional black-box and model in PIPE the act of getting a value and

927: passing it to, say, a server-side script that performs the conversion. If we take this approach, we

928: should ensure that partial evaluation either retains the black-box representation or removes it;

929: it shouldn't `open' it up. Alternatively, we can explicitly

930: open up this black-box and model its contents as a function in a PIPE modeling (as done in Fig.~\ref{cute-diagram}).

931: As a functional modeling, PIPE thus enables the view of information systems as transducers.
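
To make the listbox example concrete, the sketch below contrasts a modeling of the {\tt switch} in Fig.~\ref{cute-diagram} with a residual program of the kind a partial evaluator could produce once mushrooms and olives are ruled out. The residual is written by hand for illustration and is not the output of any particular tool; the page identifiers and function names are hypothetical.

\begin{verbatim}
enum topping { ONIONS, MUSHROOMS, OLIVES };

/* Original modeling of the listbox: returns a page identifier. */
int order(enum topping t)
{
    switch (t) {
    case ONIONS:    return 1;
    case MUSHROOMS: return 2;
    case OLIVES:    return 3;
    }
    return 0;
}

/* Residual program after mushrooms and olives are ruled out:
   the three-way switch collapses to a single test. */
int order_onions_only(enum topping t)
{
    if (t == ONIONS)
        return 1;
    return 0;
}
\end{verbatim}

The two functions agree on the one remaining admissible input ({\tt ONIONS}), which is what licenses rendering the listbox as a single hyperlink in the personalized site.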

932:

933: In some cases partial evaluators, by their sophisticated support for program specialization,

934: cause difficulties. For instance,

935: the technique of {\it program-point specialization}~\cite{jones} introduces copies

936: of functions at various places in the specialized program, tailored to specific situations. In

937: information systems terms, this amounts to creating content (structural as

938: well as terminal) that didn't exist before. In such a

939: case, we need to carefully interpret the meaning of the specialized representation.

940:

941: Another caveat is that partial evaluation can sometimes induce {\tt goto}s in the specialized

942: program. We can view {\tt goto}s as

943: suggesting means by which the site design could be structured. If there is a {\tt goto}

944: from a point $A$ in the program to another point $B$, it just means that the information system

945: corresponding to point $B$ can be arrived at in many ways via interaction sequences and hence is

946: best factored out.

947:

948: Finally, a semantics of values for program variables has to be defined.

949: In partial evaluation, values may be either specified or left unspecified.

950: By default, variable values cannot be weighted unless explicitly

951: modeled in the PIPE program. However, techniques

952: such as query expansion can be employed to obtain values for other program variables. For instance, if

953: a user says `Honda' and a PIPE program models Honda cars under `Japanese automakers,'

954: then we can turn both these variables on

955: for the purposes of personalization. Semantics for program variables can also

956: be defined to take advantage of other taxonomical relationships in hierarchies~\cite{taxo-assign}.
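
The `Honda' example can be sketched as a walk up a small taxonomy table. This is only an illustration under stated assumptions: the table contents (including the `Toyota' entry), the {\tt entry} structure, and the functions {\tt parent\_of} and {\tt expand} are hypothetical, and a real system would draw the taxonomy from the modeled hierarchy.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Each term and the category it is modeled under (NULL at the top). */
struct entry { const char *term; const char *parent; };

static const struct entry taxonomy[] = {
    { "Honda",               "Japanese automakers" },
    { "Toyota",              "Japanese automakers" },
    { "Japanese automakers", NULL },
};

static const char *parent_of(const char *term)
{
    for (size_t i = 0; i < sizeof taxonomy / sizeof taxonomy[0]; i++)
        if (strcmp(taxonomy[i].term, term) == 0)
            return taxonomy[i].parent;
    return NULL;
}

/* Write the user's term and all of its ancestors into out[];
   return how many program variables should be turned on. */
size_t expand(const char *term, const char **out, size_t max)
{
    size_t n = 0;
    while (term != NULL && n < max) {
        out[n++] = term;
        term = parent_of(term);
    }
    return n;
}
\end{verbatim}

Calling {\tt expand} with `Honda' yields two variables to turn on, `Honda' and `Japanese automakers'; a term absent from the table yields only itself.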

957:

958: \subsubsection*{A Salient Feature of PIPE}

959: An important advantage of PIPE is that while we provide options for modeling, there is

960: no explicit step for describing how to implement personalization. Due

961: to the sophistication of our representation, personalization will be

962: achieved if program variables (which correspond

963: to structural information) are available for partial evaluation.

964: This is in contrast to other modeling methodologies~\cite{statecharts,autoweb,rus} where personalization has to be provided

965: as an explicit function from the conceptual design stage.

966:

967: % 1. Modeling site structure: dfs, site graph, from navigational schema (see program compaction below)

968: % 2. information integration: composing subsystems, e.g., multiple sites.

969: %To do this effectively, we need wrappers (for handling syn. poly. problems)

970: % 3. crawling clickable maps:

971: % 4. interacting with recommender systems: rec. sys. could be a f() within the model. Or, the aspect of recommendation could

972: %be "external" to PIPE, in that collaborative features correspond to program variables. "x" means.

973: % 5. browsing computed information: hooks,

974: % 6. modeling within a web page: virtual nodes, DTDs, naveen ashish, XTRACT,

975: % 7. program compaction: put 4 pictures

976:

977: %senators: 1,6

978: %gams: 1,2,4,6,7

979: %pigments: 1,2,4,5,6

980:

981: % generators of hierarchies (the one you found with Sammy)

982: \subsection{Representational Choices}

983: \label{options}

984:

985: Our primary example of modeling thus far addressed

986: navigation down a hierarchy via nested conditionals (see Fig.~\ref{pe2}).

987: This is one of the most common sources of bounded sequences; it can be obtained either

988: by explicit crawling or as graph representations of site structure from website

989: management tools~\cite{webSiteManagement,webstrudel}. In the former,

990: extra care should be used to address purely navigational links (like a `Go Back'

991: button) and irregularities in web page authoring. Representations obtained from the latter

992: case are more robust since they directly enable the modeling of interaction sequences in terms of

993: directed labeled graphs~\cite{dataontheweb} or web schema~\cite{autoweb}.

994:

995: In this section, we present a number of other modeling options for personalization

996: applications described by bounded interaction sequences.

997:

998: \subsubsection*{Interacting with Recommender Systems}

999: A recommender system can be viewed in PIPE as a way to set values for program variables or as a

1000: function to be modeled. In the first case, the recommender is abstracted as a black-box and is external

1001: to the program. Consider a recommender system at a third-party site that suggests

1002: automobile dealers based on

1003: experiences of its users. In such a case, we can invoke the facility to obtain values for program variables

1004: which are then subsequently used for personalization. Alternatively, the functioning of the recommender

1005: can be explicitly modeled in PIPE. This allows the possibility that even its operation could

1006: be personalized. For instance, if the recommender system can suggest dealers all across the United States,

1007: we can personalize its operation to only recommend dealers in a particular geographical region. This will

1008: not be possible in the black-box modeling unless the recommender allows such explicit specification.

1009:

1010: \subsubsection*{Information Integration}

1011: Effective personalization scenarios often require the integration of information from multiple sites.

1012: Consider personalizing stock quotes for potential investors. The Yahoo!

1013: Finance Cross-Index at {\tt quote.yahoo.com} provides a ticker symbol lookup for stock charts,

1014: financial statistics, and links to company profiles. It is easy to model and personalize

1015: this site by the methods described above.

1016: However, what if the user desires to browse this site based on recommendations from an

1017: online brokerage? Besides support for cascading information flows, care should be taken

1018: to ensure that structural information across multiple sites is correctly

1019: cross-referenced. The online brokerage might refer to its

1020: recommendations by company name (e.g., `Microsoft'), while the Yahoo! cross-index

1021: uses the ticker symbol (`MSFT'). Standard solutions based on wrappers~\cite{kush-ai} and mediators can

1022: be employed here~\cite{db-www,Ariadne}.

1023: In PIPE, the individual interaction sequences from multiple sites can be cascaded in

1024: sequence to provide support for such integration scenarios, as shown in Fig.~\ref{ii}.

1025:

1026: \begin{figure}

1027: \centering

1028: \begin{tabular}{|l|} \hline

1029: {\tt main() \{} \\

1030: \,\,\,\,{\tt /* invoke online brokerage */} \\

1031: \,\,\,\,{\tt /* transforms from company name to ticker symbol */} \\

1032: \,\,\,\,{\tt /* modeling of yahoo cross-index */} \\

1033: {\tt \}} \\

1034: \hline

1035: \end{tabular}

1036: \caption{Modeling information integration in PIPE.}

1037: \label{ii}

1038: \end{figure}
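
The three comments in Fig.~\ref{ii} can be fleshed out as the following sketch. All function names are hypothetical stand-ins: {\tt brokerage\_recommendation} replaces a call to the online brokerage, {\tt ticker\_for} plays the role of a wrapper between the two vocabularies, and {\tt cross\_index} abstracts the modeling of the cross-index down to whether a page is reached.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Stand-in for invoking the online brokerage. */
const char *brokerage_recommendation(void)
{
    return "Microsoft";
}

/* Wrapper step: company name to ticker symbol, or NULL if unknown. */
const char *ticker_for(const char *company)
{
    static const struct { const char *name; const char *ticker; } map[] = {
        { "Microsoft", "MSFT" },
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(map[i].name, company) == 0)
            return map[i].ticker;
    return NULL;
}

/* Stand-in for the cross-index modeling: 1 if a page is reached. */
int cross_index(const char *ticker)
{
    return ticker != NULL && strcmp(ticker, "MSFT") == 0;
}

/* The cascade: brokerage, then translation, then cross-index. */
int integrated(void)
{
    const char *company = brokerage_recommendation();
    const char *ticker = ticker_for(company);
    return cross_index(ticker);
}
\end{verbatim}

Without the translation step, passing the company name directly to {\tt cross\_index} reaches no page, which is the cross-referencing problem described above.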

1039:

1040: \subsubsection*{Modeling Clickable Maps}

1041: Many web sites provide clickable image maps (e.g., Java/GIF) as

1042: interfaces to information. This is especially true

1043: for weather sites, bioinformatics resources, and sites that involve modeling spatial information. Interpretation

1044: is attached to clicking on particular locations of the map (for instance, `click on the state for which

1045: you would like the weather').

1046: Using data mining techniques~\cite{fukuda} and

1047: by sampling clicks on the map (and determining which pages they lead to), we can functionally model

1048: a clickable map in PIPE to arrive

1049: at constructs such as: `Choosing Wyoming on the United States map corresponds to clicking

1050: within $[a,b] \times [c,d]$.' Non-rectangular areas are described by unions of isothetic regions by

1051: the data-mining technique described in~\cite{fukuda}.

1052: Given such a representation, partial evaluation can remove portions of

1053: the image map based on user preferences. At this stage, we can reconstruct a personalized clickable

1054: map by reversing the mapping or use attributes such as color and shade to highlight the selected

1055: regions (for instance, to show only those regions on the map where air travel is delayed). We can also represent

1056: the personalized information in non-graphical terms. This option is useful not just for personalization

1057: but for improving the accessibility of information systems. A mobile handheld device incapable

1058: of presenting graphical content can take advantage of such modeling.
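
A functional model of a clickable map along these lines might look like the following C sketch. The coordinates, the region names, and the function {\tt region\_at} are invented for illustration; in practice the rectangles would come from sampling clicks and from the data-mining step.

```c
#include <stddef.h>

/* Axis-parallel rectangle [x0,x1] x [y0,y1] in image coordinates. */
struct rect { int x0, x1, y0, y1; };

/* A map region is a labeled union of rectangles. */
struct region { const char *label; const struct rect *parts; size_t nparts; };

/* Made-up coordinates: one rectangular and one L-shaped region. */
static const struct rect WYOMING_PARTS[] = { { 100, 180, 60, 120 } };
static const struct rect LSHAPE_PARTS[]  = { { 200, 260, 60, 100 },
                                             { 200, 230, 100, 140 } };

static const struct region MAP[] = {
    { "Wyoming",         WYOMING_PARTS, 1 },
    { "L-shaped region", LSHAPE_PARTS,  2 },
};

static int in_rect(const struct rect *r, int x, int y) {
    return x >= r->x0 && x <= r->x1 && y >= r->y0 && y <= r->y1;
}

/* Return the label of the region containing the click, or NULL. */
const char *region_at(int x, int y) {
    size_t i, j;
    for (i = 0; i < sizeof MAP / sizeof MAP[0]; i++)
        for (j = 0; j < MAP[i].nparts; j++)
            if (in_rect(&MAP[i].parts[j], x, y))
                return MAP[i].label;
    return NULL;
}
```

In this representation, removing a region from the personalized map amounts to dropping its entry from the table, and the labels alone suffice for a non-graphical rendering.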

1059:

1060:

1061: %\begin{figure}

1062: %\centering

1063: %\begin{tabular}{cc}

1064: %\includegraphics[width=4in,angle=-90]{SQL}

1065: %\end{tabular}

1066: %\caption{Browsing computed information using PIPE. $\cup$ denotes the union operation which serves as

1067: %the combining function for an interaction sequence. After personalizing with `female voters for

1068: %Gore,' the user has drilled down to states California (CA) and New York (NY) to inspect the values

1069: %more closely.}

1070: %\label{browse-compute}

1071: %\end{figure}

1072: %

1073: %\subsubsection*{Browsing Computed Information}

1074: %Much of the content on the web is dynamically generated via interfaces to databases~\cite{lawrence}.

1075: %In such a case, terminal information in interaction sequences need not be web pages, but rather

1076: %SQL queries. In Fig.~\ref{browse-compute}, the designer has modeled an artificial hierarchy as an interface to a

1077: %traditional database system. The hierarchy is based on the attributes of the relations in the database

1078: %schema (state, candidate, and sex). Represented in PIPE, such a model resembles Fig.~\ref{pe2}, except that the leaves

1079: %denote hooks to database queries instead of web pages. The internal nodes model

1080: %information obtained by set-theoretic operations conducted on the results of the database

1081: %queries (which are computed, but terminal information nevertheless). In other words, the

1082: %%internal nodes are the combining functions (see Section~\ref{methods})

1083: %for summarizing terminal information for an entire interaction sequence.

1084: %Fig.~\ref{browse-compute} describes personalization for the criteria

1085: %`female voters for Gore' where the combining function is the set-theoretic union

1086: %operation. After partial evaluation, the specialized program has one level of hierarchy

1087: %left for browsing (by state). In addition to

1088: %obtaining the aggregate number of female votes for Gore, we are able to drill down

1089: %this number (by browsing) to obtain votes per state. This is similar to a GROUP BY operation, but doesn't

1090: %require the user to know a query language such as SQL. PIPE hence provides a novel way to

1091: %combine querying of databases and the Web, one that is complementary to other

1092: %projects~\cite{wsq-dsq}. It is also ideal for interactive information

1093: %exploration applications~\cite{jmh-control}.

1094: %

1095: \subsubsection*{Modeling within a Page}

1096: In some cases, it is necessary to model interaction sequences within a web page.

1097: For instance, if a user is eyeballing a web page to look for telephone numbers of

1098: an individual, then modeling the web page at this level of granularity and providing a program variable

1099: for telephone number would be useful. Algorithms for mining structure within a

1100: web page (e.g., DTDs)~\cite{naveen,craven-aij,xtract}

1101: and for document segmentation~\cite{rus} can be used to arrive at compact

1102: representations of within-page interaction sequences. This

1103: provides a richer set of features with which to conduct personalization. For instance,

1104: partial evaluation can be used to remove complete sections of documents (e.g., intrusive

1105: advertisement banners) when rendering the personalization.

1106:

1107: \begin{figure}

1108: \centering

1109: \begin{tabular}{cc}

1110: \includegraphics[width=2.3in]{ex1}

1111: \hspace{0.2in}

1112: \includegraphics[width=2.3in]{ex2}

1113: \end{tabular}

1114: \begin{tabular}{cc}

1115: \includegraphics[width=2.3in]{ex3}

1116: \hspace{0.2in}

1117: \includegraphics[width=2.3in]{ex4}

1118: \end{tabular}

1119: \caption{Four stages in extracting structure from a semistructured data source, by

1120: the algorithm of~\cite{nestorov}. (top left) Original semistructured resource with

1121: labeled and directed edges modeling interaction sequences. (top right)

1122: Factorization of commonalities encountered in crawling. (bottom left)

1123: A `minimal perfect typing' of the data. (bottom right)

1124: Final output of data mining algorithm, after modeling `multiple roles'~\cite{nestorov}.}

1125: \label{nestorov}

1126: \end{figure}

1127:

1128: \subsubsection*{Program Compaction}

1129: The naive rendition of a PIPE model by the above mechanisms might result in

1130: lengthy programs, with duplications of interaction sequences. Techniques for program

1131: compaction are hence important. This topic has been studied extensively in the data mining

1132: and semistructured modeling communities~\cite{dataontheweb,nestorov,tkde-structure}.

1133: Of particular relevance to PIPE is the algorithm

1134: of Nestorov et al.~\cite{nestorov} whose modeling of semistructure closely resembles

1135: our representation of an interaction sequence in terms of program variables. This algorithm

1136: works by identifying graph constructs that could be factored, simplified, or approximated.

1137: Fig.~\ref{nestorov} describes four stages in a procedure for program compaction. The starting

1138: point is the schema in Fig.~\ref{nestorov} (top left) obtained by a naive crawl of a site.

1139: Fig.~\ref{nestorov} (top right) factors commonalities encountered in crawling.

1140: There are only three leaf nodes and the internal nodes {\tt P3} and {\tt P4} are collapsed because

1141: they are really the same page.

1142: Fig.~\ref{nestorov} (bottom left) is a `minimal perfect typing~\cite{nestorov}' of the data, which means that

1143: the fewest internal nodes needed to describe the schema are used. In this example, {\tt P1} and

1144: {\tt P2} are collapsed, not because they are the same but because they exhibit the same schema.

1145: Both have an incoming edge labeled {\tt e} from the same type of page ({\tt S2}) and display an

1146: outgoing edge labeled {\tt i} to the same type of page ({\tt M1}). While their contents

1147: may not be the same, interaction sequences involving them can be compacted.

1148: Care must be taken to ensure that any text accompanying these nodes is not lost.

1149: And finally, Fig.~\ref{nestorov} (bottom right) casts {\tt P6} as redundant for the purpose

1150: of modeling

1151: interaction sequences. The role of {\tt P6} in Fig.~\ref{nestorov} (bottom right) is to establish

1152: connections from {\tt S5} to {\tt M2} and {\tt M3}, which are already embodied in

1153: {\tt P5} and {\tt P7} respectively. Thus, we can remove {\tt P6}, once again after ensuring that

1154: any contents of that node are suitably represented elsewhere. In~\cite{nestorov}, {\tt P6} is referred

1155: to as a node that exhibits `multiple roles.'

1156: %and that the termination of the partial evaluator is not compromised by such approximations~\cite{jones-book}.
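
The test that justifies collapsing {\tt P1} and {\tt P2} can be illustrated with a small C sketch. It is a deliberate simplification and not the algorithm of~\cite{nestorov}: each page is assumed to have a single incoming and a single outgoing edge, and the structure and function names are hypothetical.

```c
#include <string.h>

/* Simplified schema signature of a page: one incoming and one outgoing
   edge, each described by its label and the type of page at the other end. */
struct page {
    const char *id;
    const char *in_label;   /* label of the incoming edge   */
    const char *in_type;    /* type of the source page      */
    const char *out_label;  /* label of the outgoing edge   */
    const char *out_type;   /* type of the destination page */
};

/* Two pages exhibit the same schema if their signatures agree,
   regardless of their identity or contents. */
int same_schema(const struct page *a, const struct page *b) {
    return strcmp(a->in_label,  b->in_label)  == 0 &&
           strcmp(a->in_type,   b->in_type)   == 0 &&
           strcmp(a->out_label, b->out_label) == 0 &&
           strcmp(a->out_type,  b->out_type)  == 0;
}
```

Pages for which {\tt same\_schema} holds are candidates for merging; as noted above, their accompanying contents still have to be preserved separately.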

1157:

1158: \subsubsection*{Miscellaneous Optimizations}

1159: Finally, the success of a personalization system relies on those finer touches that deliver

1160: a compelling experience to the user. Options in this category are ad-hoc by nature and are not technically

1161: modeling choices since they involve post-processing of the specialized

1162: program. For instance, assume

1163: that we personalize the automobile example in Fig.~\ref{pe2} with respect to the variables {\tt Honda} and {\tt 2001}.

1164: This might produce a construct such as:

1165: \begin{verbatim}

1166: if (Green) {

1167:         /* two empty code blocks */

1168:

1169:         /* the first is empty because Honda and 2001 evaluated to true,

1170:            but there were no green Honda cars made in 2001 */

1171:

1172:         /* the second is empty because other models and other years were set

1173:            to be evaluated to false */

1174: }

1175: \end{verbatim}

1176: %This is due to the fact that in the interaction sequence {\tt make} and {\tt year} are modeled beneath

1177: %{\tt color}.

1178: While semantically correct, such code blocks are useless for information presentation.

1179: They can be perceived as dead-ends and safely omitted during web page reconstruction. Retaining them would

1180: confuse a user who clicks on `Green' and receives nothing (or an empty page) in return!

1181:

1182: A second form of

1183: optimization arises when partial evaluation results in a nested conditional with no {\tt else}

1184: clauses:

1185: \begin{verbatim}

1186: if (Blue) {

1187:    if (2001) {

1188:      if (Honda) {

1189:          /* something here */

1190:      }

1191:    }

1192: }

1193: /* nothing here */

1194: \end{verbatim}

1195: In such a case, we need to pay attention to how the simplified program is presented back to the user.

1196: Forcing the user to continue clicking on items when there is only one choice at every level is undesirable.

1197: Rather, we could just reveal to the user that, according to his personalization criteria,

1198: the only type of car remaining is `Blue Honda 2001' and link directly to the items of information.

1199: This example reinforces our idea that structural information is first-class information.

1200: We are working on a customized partial evaluator that can perform such optimizations.
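
Both post-processing steps can be sketched on a tree representation of the specialized program. The following C fragment is a minimal illustration under assumed data structures ({\tt struct node}, {\tt prune}, and {\tt skip\_chain} are hypothetical names); it is not the customized partial evaluator mentioned above.

```c
#include <stddef.h>

#define MAX_KIDS 4

/* One conditional of the specialized program. */
struct node {
    const char *label;            /* program variable, e.g. "Blue"  */
    const char *info;             /* terminal information, or NULL  */
    struct node *kids[MAX_KIDS];  /* nested conditionals            */
    size_t nkids;
};

/* Remove dead ends in place. Returns 1 if the subtree rooted at n
   still leads to some terminal information, 0 otherwise. */
int prune(struct node *n) {
    size_t i, kept = 0;
    for (i = 0; i < n->nkids; i++)
        if (prune(n->kids[i]))
            n->kids[kept++] = n->kids[i];
    n->nkids = kept;
    return n->info != NULL || kept > 0;
}

/* Follow a chain of conditionals that offer only one choice per level
   and return the first node that carries information or a real choice. */
const struct node *skip_chain(const struct node *n) {
    while (n->info == NULL && n->nkids == 1)
        n = n->kids[0];
    return n;
}
```

Applied to a program in which `Green' is empty and `Blue' leads only to `2001' and then `Honda', {\tt prune} drops the `Green' branch and {\tt skip\_chain} lets the presentation link directly to the remaining item.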

1201:

1202: % 3. crawling clickable maps:

1203: % 4. interacting with recommender systems: rec. sys. could be a f() within the model. Or, the aspect of recommendation could

1204: %be "external" to PIPE, in that collaborative features correspond to program variables. "x" means.

1205: % 5. browsing computed information: hooks,

1206: % 6. modeling within a web page: virtual nodes, DTDs, naveen ashish, XTRACT,

1207: % 7. program compaction: put 4 pictures

1208:

1209: % generators of hierarchies (the one you found with Sammy)

1210:

1211: \section{Application Case Studies}

1212: \label{studies}

1213:

1214: We now describe two applications that use PIPE to personalize collections

1215: of web sites. They are presented in increasing order of complexity, as evidenced

1216: by the forms of modeling they conduct (Table~\ref{whatismodeled}). In each

1217: of these applications, we state the conceptual model of interaction sequences and

1218: the specific choices made in modeling. Evaluation methodologies

1219: are outlined after the descriptions. Since PIPE only specializes representations,

1220: we are able to personalize even third-party sites by forming suitable representations.

1221: More personalization systems designed with PIPE are described in~\cite{naren-ic,pipe-tochi}; we present only

1222: two here for space considerations.

1223:

1224: \begin{table}

1225: \centering

1226: \begin{tabular}{|lcl|} \hline

1227: Congressional Officials &   & Modeling Site Structure \\

1228:                         &   & Modeling within a Page\\

1229: 			& & \\

1230: % Pigment Composition and Analysis & &  Modeling Site Structure\\

1231: % 			&   & Information Integration \\

1232: %                         &   & Browsing Computed Information\\

1233: %			& & \\

1234: Mathematical and Scientific Software & &  Modeling Site Structure\\

1235: 			&   & Interacting with Recommender Systems \\

1236:  			&   & Information Integration \\

1237:                          &   & Modeling within a Page\\

1238:                          &   & Program Compaction\\

1239: \hline

1240: \end{tabular}

1241: \caption{Modeling options used in the application case studies.}

1242: \label{whatismodeled}

1243: \end{table}

1244:

1245: \subsection{Congressional Officials}

1246: \label{politics}

1247: Our first application customizes access to the

1248: Project Vote Smart website (\url{http://www.vote-smart.org}), an independent resource

1249: for information about United States governmental officials.

1250: The site caters to people interested in

1251: politicians' backgrounds, committee memberships, and positions

1252: on major political issues.

1253: While Project Vote Smart reports on state and local governments as

1254: well as the federal government, we focused only on the

1255: congressional subsection of the site in our experiments.

1256:

1257: The conceptual model of information-seeking involves browsing through the congressional

1258: subsection to retrieve individual web pages of politicians. Interaction sequences at this

1259: site consist of choices of state (e.g., California, Virginia, etc.), branch of

1260: congress (House or Senate), party (Democrat, Republican, or Independent), and

1261: district information (numbers of districts). The terminal information involved 540

1262: home pages (for 100 Senate members and 440 House members) and resides at the

1263: ends of interaction sequences.

1264:

1265: Fig.~\ref{politicians1} describes a typical interaction sequence.

1266: At the root congressional page (Fig.~\ref{politicians1} (top)),

1267: users are directed to select a state of interest.

1268: Selection of state transfers the

1269: user to that particular state's web page

1270: (Fig.~\ref{politicians1} (bottom left)).

1271: A state web page is semistructured, listing

1272: both senators and representatives as well as their party, district

1273: affiliations, and other associated information.  Finally, a user

1274: arrives at a politician's webpage (Fig.~\ref{politicians1} (bottom right))

1275: by making a selection at the state page.  Thus, the congressional

1276: section of Project Vote Smart is three levels deep (with a two-step interaction

1277: sequence).

1278:

1279: \begin{figure}

1280: \centering

1281: \begin{tabular}{c}

1282: \includegraphics[width=12.4cm,height=12.6cm]{politiciansRoot}

1283: \end{tabular}

1284: \begin{tabular}{cc}

1285: \includegraphics[width=6.2cm,height=6.84cm]{VA}

1286: \includegraphics[width=6.2cm,height=6.84cm]{Allen}

1287: \end{tabular}

1288: \caption{A typical interaction sequence at the Project Vote Smart web site. (top) Start page

1289: for congressional officials. Making a selection of state at this level reaches a state-level

1290: page (bottom left). Finally, individual politicians' web pages are accessed by making selections

1291: at the state-level page (bottom right).}

1292: \label{politicians1}

1293: \end{figure}

1294:

1295: Since many of the choices made

1296: by the user in browsing through Project Vote Smart are independent

1297: of each other (e.g., selecting Virginia as state does not imply a particular political party),

1298: the site is highly amenable to personalization by partial evaluation.

1299: Currently the site hardwires interaction sequences in the order shown in Fig.~\ref{politicians1}.

1300: We modeled the two-step interaction sequence (as shown in Fig.~\ref{politicians1}) as

1301: actually a four-step interaction sequence by conducting a more detailed modeling of the

1302: state-level page. In particular, the semistructure on state-level pages was abstracted

1303: to yield independently addressable information about branch of congress, party, and district.

1304:

1305: The site graph is not a balanced tree. For instance, every state has exactly two senators

1306: but the number of representatives varies from 1 in South Dakota to 52 in California (this

1307: is dependent on state population).  Our modeling of data at state pages

1308: expanded the original 3-level tree shown in Fig.~\ref{politicians1}

1309: consisting of 596 nodes (1 root page + 55 state pages + the

1310: previously mentioned 540 leaves of the tree) to 5 levels comprising

1311: 857 nodes (317 internal nodes + 540 leaf nodes).  This amounts to

1312: an approximately 44\% expansion of the site schema.
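
The figure follows directly from the node counts quoted above:
\[
1 + 55 + 540 = 596, \qquad 317 + 540 = 857, \qquad
\frac{857 - 596}{596} = \frac{261}{596} \approx 0.438 .
\]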

1313:

1314: %3111 lines of C code and 129 program variables that are addressable.

1315: The programmatic representation of the new site schema was in C and it

1316: captured miscellaneous domain semantics about interaction at the site (e.g.,

1317: if the user says `District 21,' he is referring to a Representative, not

1318: a Senator). The partial evaluator C-Mix was used for this study.
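
To give a flavor of how such domain semantics can be encoded, the following C sketch resolves the branch of congress from a district number. It is not an excerpt from our representation; the type and function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Branch of congress as a three-valued program variable. */
enum branch { BRANCH_UNKNOWN, HOUSE, SENATE };

/* Partial input supplied by the user; unset fields are NULL or 0. */
struct choices {
    const char *state;
    enum branch branch;
    const char *party;
    int district;        /* 0 if not supplied */
};

/* Domain semantics: a district number implies a Representative.
   Returns 0 if the input is contradictory (a Senator with a district),
   1 otherwise; fills in the branch when it can be inferred. */
int apply_domain_semantics(struct choices *c) {
    if (c->district > 0) {
        if (c->branch == SENATE)
            return 0;
        c->branch = HOUSE;
    }
    return 1;
}
```

With such a rule in place, a user who supplies only `District 21' has implicitly also fixed the branch variable, so the corresponding conditional can be specialized away.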

1319:

1320: \subsection{Mathematical and Scientific Software}

1321: Our second application is a personalization system for recommending mathematical

1322: software on the web for scientists and engineers. Consider a scientist studying stress

1323: in a helical spring; he formulates the problem mathematically in terms of

1324: a partial differential equation (PDE) and proceeds to find software that can help

1325: in solving his PDE. He uses a collection of three web sites to conduct his information-seeking

1326: activity.

1327:

1328: First, he accesses the GAMS (Guide to Available Mathematical Software)

1329: cross-index of mathematical software ({\tt http://\hskip0ex gams.\hskip0ex nist.\hskip0ex gov}), a

1330: tree-structured taxonomy that covers nearly 10,000 algorithms (from

1331: over 100 software packages) for most areas of scientific software.

1332: GAMS functions in an interactive fashion, guiding the user

1333: from the top of a classification tree to specific modules as the

1334: user describes his problem in increasing detail. During this process,

1335: many important features of the software (e.g., `are you looking for a software to solve elliptic

1336: problems?') are determined from the user.

1337: However, at the ends of the interaction sequences at GAMS, there still exist several choices

1338: of algorithms for a specific problem.

1339: Now, the scientist consults a recommender system or a performance database server

1340: (for his category of scientific software) to pick an appropriate algorithm for his problem.

1341: An example is the PYTHIA recommender system for selecting solvers for PDEs~\cite{pythiaii}.

1342: At this point, the scientist supplies additional information to the recommender such as

1343: his performance constraints (on the time to solve his PDE).

1344: Systems like PYTHIA use previously archived performance data to arrive at recommendations such as

1345: `Use the second-order 9-point finite differences code from the ELLPACK module.'

1346: After such a recommendation, the scientist conducts the final step of downloading the

1347: recommended software module from repositories such as Netlib ({\tt http://\hskip0ex www.\hskip0ex netlib.\hskip0ex org})

1348: housed at the Oak Ridge National Laboratory (ORNL) or other packages at the National Institute

1349: of Standards and Technology (NIST). The conceptual model involved the information flow from the GAMS

1350: site, through a recommender such as PYTHIA, to a repository such as Netlib.

1351:

1352: The choices made in GAMS will affect the choice of recommender, which in turn affects the choice

1353: of repository. This application thus presents an interesting information flow for modeling.

1354: Since PIPE permits partial instantiation of the information flow, the scientist can

1355: directly access a repository such as Netlib if he is sure of the specific software he needs.

1356:

1357: We modeled the entire GAMS web site, used the PYTHIA recommender (that addresses

1358: software for the domain of PDEs), and established connections with individual software modules at the

1359: various repositories. After an initial expansion of GAMS (e.g., by within-page modeling),

1360: we applied the program compaction algorithm described in Section~\ref{options}.

1361: Cross-references in GAMS and duplication of common module sets (which are now revealed

1362: by our initial expansion) helped compress the site schema to 60\% of its original size.

1363: In particular, the GAMS subtree relevant to describing PDEs provided for an 11\% compression.

1364: There was no terminal information alongside intermediate nodes, and hence there was no

1365: need for any special handling. PYTHIA's details are described in~\cite{pythiaii} and we conducted

1366: a white-box modeling in PIPE to better associate program variables from GAMS with variables in

1367: PYTHIA (one of the authors of this paper was also the co-designer of the PYTHIA recommender).

1368: Finally, the step to reach individual software modules was a simple one-step interaction sequence

1369: leading to terminal information about the code (in FORTRAN) and its documentation.

1370: The entire composite program was represented in the CLIPS programming language~\cite{clips} and

1371: we employed its rule-based interface for partial evaluation. More modeling details on this case study

1372: can be found in~\cite{pipe-gams}.

1373:

1374: \subsection{Evaluation}

1375: \label{eval}

1376: We now describe procedures for evaluation. There are three possible types of evaluation:

1377:

1378: \begin{enumerate}

1379: \item Evaluating PIPE applications

1380: \item Evaluating our modeling of information-seeking activities in PIPE

1381: \item Evaluating PIPE

1382: \end{enumerate}

1383:

1384: The first type of evaluation is what is usually described in the literature

1385: and there are many ways of conducting it.

1386: The accepted practice is to measure improvements in revenues,

1387: site visits, and user satisfaction (e.g., via surveys). In~\cite{naren-ic},

1388: we have described the evaluation of PIPE applications using traditional user

1389: interviews followed by statistical validation (they have yielded good results). Commercial

1390: ventures such as {\it NetPerceptions} emphasize the scalability and speed-of-response of

1391: personalization systems. The second and third types of evaluation criteria highlight the role

1392: of PIPE as a modeling methodology.

1393: We concentrate on them since we have already described traditional user-response

1394: evaluation of PIPE applications in~\cite{naren-ic}.

1395: This section covers the evaluation

1396: of modeling and Section~\ref{factor} helps identify shortcomings of the PIPE methodology

1397: itself.

1398:

1399: We evaluate a PIPE modeling by the extent to which

1400: it allows users' information-seeking activities to be described as partial inputs.

1401: This is in keeping with the view that PIPE's services are only as good as the

1402: modeling conducted in it. If a faulty recommender system is modeled in PIPE, then no

1403: amount of partial evaluation can provide satisfactory results.

1404:

1405: Recall that our modeling was conducted with respect to a set of interaction sequences.

1406: For evaluation purposes, we identified an independent `external examiner'

1407: model, which was also a set of interaction sequences.

1408: We then evaluated our PIPE modeling

1409: by the fraction of interaction sequences in the external examiner model that can

1410: be realized by an appropriate partial evaluation operation. We discounted optimizations such

1411: as described in Section~\ref{options} when determining the `unrealizable'

1412: interaction sequences.

1413:

1414: In the first study, the

1415: examiner model was obtained from users. They were provided

1416: knowledge of the functional specification of our original conceptual

1417: modeling, not its details. For instance, they

1418: were told about the nature of structural and terminal information (and

1419: any functional dependencies among them), but not the exact interaction sequences that

1420: constitute the conceptual model. Formal methodologies for this activity are described

1421: in~\cite{gannon-verify}.

1422:

1423: We identified 25 user subjects who were predominantly graduate students from Virginia

1424: Tech (but not necessarily computer science majors). The ages of the subjects ranged from

1425: 19 to 49, with the average age being 26. A majority of the subjects rated their

1426: computer and web familiarity as above average. All subjects acquainted themselves with the

1427: Project Vote Smart site by browsing for about ten minutes. Each subject was then asked to

1428: describe 1-2 personalization scenarios. Notice that these are different from `queries,'

1429: as they specified constraints on interaction, e.g., `I would like to browse by state,

1430: and then I will make a choice of party, and then I would click any remaining hyperlinks

1431: to browse the site.'

1432:

1433: In total, 32 interaction sequences were identified, of which 25 were

1434: realizable in our modeling. One of the unmodelable

1435: scenarios was `I would like to see all politicians who represent Los Angeles,' a request that

1436: was not faithful to our conceptual model. We do not discuss this further. The other six

1437: unmodelable scenarios are not shortcomings of our modeling, but rather shortcomings of the

1438: PIPE personalization methodology itself. They involved restructuring operations on interaction

1439: sequences that are not describable as partial evaluations. Section~\ref{factor} analyzes

1440: these in detail.
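
In terms of the evaluation measure defined above, the counts for this study break down as
\[
32 = 25 + 1 + 6, \qquad \frac{25}{32} \approx 0.78,
\]
that is, about 78\% of the interaction sequences in the examiner model were realizable by partial evaluation.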

1441:

1442: \begin {figure}

1443: \centering

1444: \begin {tabular}{|lp{3.5in}|}\hline

1445: {\sc Problem \#28} & ${\left(w\,u_x\right)}_x + {\left(w\,u_y\right)}_y = 1,$\\

1446:  & where $w = \left\{ \begin {array}{l}

1447:                                \alpha,\,\,\, {\rm if}\,\,\, 0 \leq x,y \leq 1\\

1448:                                1,\,\,\, {\rm otherwise.}

1449:                               \end {array}

1450:                       \right.$\\

1451: {\sc Domain} & $\left[-1,1\right] \times \left[-1,1\right]$\\

1452: {\sc BC} & $u = 0$\\

1453: {\sc True} & unknown\\

1454: {\sc Operator} & Self--adjoint, discontinuous coefficients\\

1455: {\sc Right Side} & Constant\\

1456: {\sc Boundary Conditions} & Dirichlet, homogeneous\\

1457: {\sc Error Constraint} & 1.0E-05 \\

1458: {\sc Time Constraint} & 60s

1459: \\\hline

1460: \end {tabular}

1461: \caption {A problem from the examiner model for the second case study.}

1462: \label{pde-scenario}

1463: \end {figure}

1464:

1465: For the second study, the examiner model was derived from a benchmark set of problems that are used

1466: in mathematical software evaluation (the set is described in~\cite{pythiaii}). Each of these problems

1467: describes scenarios in terms of features of the PDE problem (e.g., is it Laplace?, is it Helmholtz?),

1468: any constraints on its solution (e.g., relative error should be $< 10^{-9}$), and any restrictions

1469: on software modules (e.g., `I would like to use the package NAG' or `ELLPACK modules are preferred.').

1470: Fig.~\ref{pde-scenario} describes an example scenario that places constraints on the type of software

1471: to be used (for instance, it should be applicable to `Dirichlet' problems) and the basis for

1472: recommendation (namely, that it should satisfy the time and error constraints specified). This scenario does

1473: not give any preferences for software modules or packages. Such mathematical descriptions are translated

1474: into parameters for personalization (the process is described in~\cite{pythiaii}).

1475: The examiner model comprised 35 such interaction sequences, all of which are modelable. More details on this

1476: case study can be obtained from~\cite{pipe-gams}.

1477: \section{Discussion}

1478: \label{discuss}

1479: \subsection{Related Research}

1480:

1481: As a systematic methodology for personalization, PIPE is a unique research project.

1482: Most research on personalization emphasizes the nature of information being

1483: modeled~\cite{specissue,phoaks} (content-based~\cite{fab} versus

1484: collaborative~\cite{adomavicius01expert-driven,

1485: ira,grouplens,Siteseer}), the level at

1486: which the

1487: personalized information is targeted (is it by user~\cite{manber}, by topic~\cite{adaptive-sites}

1488: or for everybody~\cite{holland-hill,footprints}), or the specific algorithms that are involved in

1489: making recommendations.

1490: %Evaluation metrics have also been influenced by these viewpoints.

1491:

1492: In contrast, PIPE models interaction with an information system as the basis for personalization.

1493: Most recommender systems research can be viewed as modeling options for PIPE. Systems that differ in

1494: the level of targeting effectively make different assumptions about the possible set of

1495: interaction sequences. They can hence be tied to requirements analysis, as described

1496: in~\cite{pipe-tochi}. Systems that conduct web usage mining~\cite{cacm-jaideep,cacm-mulvenna} also

1497: address the earlier parts (and sometimes, later parts~\cite{cacm-myra})

1498: of the personalization system design lifecycle, and can be viewed as methodologies

1499: to suggest and refine interaction sequences.

1500:

1501: Other connections to information systems research can be made by observing that PIPE

1502: contributes both a way to model information-seeking activities and a closed transformation

1503: operator for personalization, i.e.,

1504: partial evaluation. RABBIT~\cite{rabbit} is an early interactive

1505: information retrieval methodology that resembles PIPE in this respect. It proposes

1506: the model of `retrieval by reformulation' to address the mismatch between

1507: how an information space is organized and how a particular user forages in it. Several closed transformation

1508: operators are provided in RABBIT to enable the user to specify and realize information-seeking goals.

1509: Like RABBIT, PIPE

1510: assumes that `the user knows more about the generic structure of the [information space] than [PIPE]

1511: does, although [PIPE] knows more about the particulars ([terminal information])~\cite{rabbit}.' For

1512: instance, personalization by partial evaluation is only as effective as the ease with

1513: which program variables could be set (on or off) based on information supplied by the user.

1514: Unlike RABBIT, PIPE

1515: emphasizes the modeling of an information

1516: space as well as an information-seeking activity in a unified programmatic representation.

1517: Its single transformation operator is expressive enough to simplify a variety of

1518: interaction sequences.

1519:

1520: %In addition, PIPE achieves mixed-initiative interaction without providing it as a specific

1521: %function or requiring any explicit effort on the part of the user.

1522:

1523: %The Scatter-Gather~\cite{scatter-gather} and Dynamic Taxonomies~\cite{tkde-navigation} projects

1524: %resemble PIPE in that they contribute closed transformation operators for retrieval and navigation,

1525: %respectively. However, they do not emphasize representation as much as PIPE (or RABBIT)

1526: %and adopt traditional modeling methodologies. Scatter-Gather introduces two operations

1527: %(Scatter and Gather) that are used for clustering and declustering documents.

1528: %The Dynamic Taxonomies project assumes a conceptual model of `selective thinning of an infobase,'

1529: %and provides several set-oriented operations for navigation.

1530:

1531: The closed nature of transformation operators is central to interactive modes of information seeking,

1532: as shown in projects such as Scatter-Gather~\cite{scatter-gather} and Dynamic Taxonomies~\cite{tkde-navigation}.

1533: PIPE is novel in that it contributes a transformation operator for {\it representations of interactions} in

1534: information spaces, and does not transform documents or web pages directly.

1535:

1536: The `larger' approach to personalization taken in this paper is reminiscent of the integration

1537: of task models in software design~\cite{intTaskObj}. Typically such integration has utilized

1538: object-oriented methodologies and symbolic modeling approaches (e.g., UML). This idea has been used for designing

1539: personalization systems as well~\cite{li-catalog,schwabe2,human1,schwabe1}. However, in all of these projects,

1540: personalization is introduced as a function from the conceptual design stage. PIPE's

1541: support for personalization, on the other hand, is built into the programmatic model of the information

1542: space and doesn't require any special handling.

1543:

1544: \subsection{When PIPE Does Not Work: Reasoning about Representations}

1545: \label{factor}

1546:

1547: We now address limitations and some fundamental implications of the PIPE

1548: methodology. We will explain why the six unmodelable interaction sequences in Section~\ref{politics}

1549: are shortcomings of the PIPE methodology itself.

1550: Let us first recall why examples such as Fig.~\ref{pe2} and the other application

1551: study in Section~\ref{studies} work so well: Information-seeking activities in

1552: these scenarios were describable as partial inputs in the modeling. Since the modeling

1553: was parameterized in terms of program variables, another way to explain the

1554: success of these applications is to say that `the representation of the

1555: information space is factored in terms of structural information.'

1556:

1557: This suggests that it will be useful to understand how information spaces are factored,

1558: in general. If the representation of the information space is not factored at all, it means that no program

1559: variables are available to be turned on or off and hence the space is not

1560: personalizable by PIPE. What is counterintuitive is that `too much factoring' could

1561: also render PIPE inapplicable or useless.

1562:

1563: Consider our automobile example from Fig.~\ref{pe2} in Section~\ref{example}.

1564: It is reproduced in Fig.~\ref{newfactor} (right) with the addition of

1565: some line numbers (to denote particular points in the program).

1566: We can think of this as a factorization in terms of variables such as

1567: {\tt Blue} and {\tt Honda}, which in turn allow us to describe user requests.

1568: The left part of Fig.~\ref{newfactor} describes an alternative

1569: factorization of the same information space. In this case, the program variables and their

1570: connections are stored in a `structure table' and an explicit generator is used to construct

1571: the information space in Fig.~\ref{newfactor} (right). For instance, the structure table identifies

1572: the {\tt Blue} program variable as the condition that gets us from line 1 to line 2 in the

1573: modeling. We can think of the structure table as modeling the site graph and the generator

1574: as a depth-first search (DFS) algorithm that walks the site graph to construct the information space.

1575:

1576: \begin{figure}

1577: \centering

1578: \begin{tabular}{lll}

1579: \begin{tabular}{|l|l|l|} \hline

1580: \multicolumn{3}{|c|}{\bf Structure Table} \\ \hline

1581: From & To & Program \\

1582: Line & Line & Variable \\ \hline

1583: 1 & 2 & Blue \\

1584: 1 & 3 & Red \\

1585: 2 & 4 & 2001 \\

1586: 2 & 5 & 2000 \\

1587: $\cdots$ & $\cdots$ & $\cdots$ \\

1588: \hline

1589: \end{tabular} & {\large $\times$ Site Generator (e.g., DFS) =} &

1590: \begin{tabular}{|l|} \hline

1591: {\tt L1:}\\

1592: {\tt if (Blue)} \\

1593: {\tt L2:}\\

1594: \,\,\,\,{\tt if (2001)} \\

1595: {\tt L4:}\\

1596: \,\,\,\,\,\,\,\,{\tt if (Honda)} \\

1597: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\

1598: \,\,\,\,\,\,\,\,{\tt else if (Toyota)} \\

1599: \,\,\,\,\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\

1600: \,\,\,\,{\tt else if (2000)} \\

1601: {\tt L5:}\\

1602: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\

1603: {\tt else if (Red)} \\

1604: {\tt L3:}\\

1605: \,\,\,\,{\tt if (2001)} \\

1606: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\

1607: \,\,\,\,{\tt else if (2000)} \\

1608: \,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\

1609: \hline

1610: \end{tabular}

1611: \\

1612: \end{tabular}

1613: \caption{An example of an over-factored information space for personalization by partial evaluation. (left)

1614: Modeling the generation of an information space. (right) Modeling the interaction in an information

1615: space.}

1616: \label{newfactor}

1617: \end{figure}

1618:

1619: Rather than think of the left part of Fig.~\ref{newfactor} as the {\it generator of an information space} and contrast

1620: it with the right side (which describes it directly), let us temporarily think of both the left and

1621: right sides of Fig.~\ref{newfactor} as {\it alternative representations} of the same

1622: information space. The word `representation' does not

1623: imply the mechanical aspect of constructing the information space (left of Fig.~\ref{newfactor})

1624: or the interaction with the information space (right of Fig.~\ref{newfactor}). Since partial evaluation

1625: merely specializes programs, it doesn't pay any attention to whether the program is meant to represent

1626: interaction or generation. By losing this distinction (temporarily), we will be able to reason

1627: about representations in general.

1628:

1629: In Fig.~\ref{pe2}, we personalized the representation w.r.t. `2001'; the result

1630: was shown in Fig.~\ref{pe2} (right).

1631: Let us reconsider how we will address this request with the new design

1632: shown in Fig.~\ref{newfactor} (left).

1633: We cannot specify this input to the DFS algorithm since it is not

1634: parameterized in terms of specific variables like {\tt 2001}. The DFS is meant to work for all types of

1635: trees and graphs, not just an automobile browsing hierarchy. We also cannot specify {\tt 2001} in terms of

1636: the structure table since we would have to manually readjust the line numbers to conform to the request.

1637: The only way we can obtain the same result as in

1638: Fig.~\ref{pe2} is to change the structure table in Fig.~\ref{newfactor} completely to reflect the

1639: tree shown in Fig.~\ref{pipe-illustrate} (right). But by then, we have done most of the work needed

1640: for personalization! In fact, the personalization request is no longer describable as partial evaluation,

1641: but as a {\it complete evaluation} (specifying all arguments).

1642: We say that such a design is {\it over-factored}, for the given

1643: information-seeking activity.
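
For contrast, the direct representation in Fig.~\ref{newfactor} (right) handles the same request by ordinary specialization. The following is only a sketch, under the assumption that {\tt 2001} is set to on and {\tt 2000} to off; the elided branches are left elided. The conditionals on year are resolved and disappear from the residual program, while {\tt Blue}, {\tt Red}, {\tt Honda}, and {\tt Toyota} remain available as inputs:

\begin{center}
\begin{tabular}{|l|} \hline
{\tt if (Blue)} \\
\,\,\,\,{\tt if (Honda)} \\
\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
\,\,\,\,{\tt else if (Toyota)} \\
\,\,\,\,\,\,\,\,{$\cdots \cdots \cdots$} \\
{\tt else if (Red)} \\
\,\,\,\,{$\cdots \cdots \cdots$} \\
\hline
\end{tabular}
\end{center}

No comparable single assignment exists for the structure table, where dropping the year level would mean readjusting the line numbers by hand, as noted above.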

1644:

1645: Attempting to use an over-factored representation (for the type of information-seeking

1646: activities in Fig.~\ref{pe2}) appears fruitless.

1647: The reason is that over-factorization separates two crucial elements

1648: that must interact for partial evaluation to be beneficial. Fig.~\ref{newfactor} (left) is like

1649: two sides of the PIPE coin separated: the structure table contains the structural information (with which

1650: we connect user requests) and the DFS contains the logic flow (which is simplified by partial

1651: evaluation for the user). Neither is useful in PIPE without the other and yet they cannot be represented

1652: distinctly. {\it This is why over-factorization is not desirable.}

1653:

1654: It is important to note that an information system design is not just over-factored, it is over-factored

1655: for a particular information-seeking activity. For instance, we can give an example of an information-seeking

1656: activity for which the design in Fig.~\ref{newfactor} (left) is factored `just right.' Consider the following

1657: user who walks into the automobile dealership:

1658:

1659: \begin{descit}

1660: \noindent

1661: {\bf Buyer:} I am here to buy a car. Ask me the questions for year, model, and color, in that order.

1662: \end{descit}

1663: \noindent

1664: In this case, the user does not want a personalized information space for browsing. Rather, he is seeking

1665: to personalize the {\it generation} of an information space. Our original modeling in Fig.~\ref{newfactor} (right) cannot

1666: handle this situation. It can let the user give values out of turn, but it can't change the default order in which the

1667: questions are asked. We say that the design in Fig.~\ref{newfactor} (right) is {\it under-factored} (for this activity).

1668: However, the design in

1669: Fig.~\ref{newfactor} (left) can accommodate it, if the site generator

1670: can take arguments such as what the first level of the hierarchy should be, what the

1671: second level should be, and so on. Presumably such a generator would walk the tree described by the structure table

1672: and restructure it based on the arguments. In this case, we can still use partial evaluation for requests such as:

1673:

1674: \begin{descit}

1675: \noindent

1676: {\bf Buyer:} I am here to buy a car. I don't care in what order you ask the questions, but the second

1677: question should be about year.

1678: \end{descit}

1679: \noindent

1680: (Whether such scenarios are likely is a separate issue. For now, we are only exploring the PIPE concept theoretically.)

1681: After this information space is generated, we still have the option of re-representing the generated information space

1682: in our usual manner and conducting personalization by partial evaluation.

1683: We can thus state the following three definitions:

1684: \begin{descit}{}

1685: \noindent

1686: A representation $\mathcal{I}$ of an information space is well-factored for an information-seeking

1687: activity $\mathcal{G}$ if all interaction sequences in $\mathcal{G}$ can be realized by

1688: partial evaluations of $\mathcal{I}$. In this case, we also say that $\mathcal{I}$ is personable

1689: for $\mathcal{G}$.

1690:

1691: A representation $\mathcal{I}$ of an information space is over-factored for an information-seeking

1692: activity $\mathcal{G}$ if all interaction sequences in $\mathcal{G}$ can be realized by

1693: complete evaluations of $\mathcal{I}$. In this case, we also say that $\mathcal{I}$ is not personable

1694: for $\mathcal{G}$.

1695:

1696: A representation $\mathcal{I}$ of an information space is under-factored for an information-seeking

1697: activity $\mathcal{G}$ if no interaction sequences in $\mathcal{G}$ can be realized by partial (or complete)

1698: evaluations of $\mathcal{I}$. In this case, we also say that $\mathcal{I}$ is not

1699: personable for $\mathcal{G}$.

1700: \end{descit}

1701:

1702: %\begin{figure}

1703: %\centering

1704: %\begin{tabular}{cc}

1705: %\includegraphics[width=3in]{venndiagram}

1706: %\end{tabular}

1707: %\caption{Overlap between interactions sequences that are describable by complete evaluation and those

1708: %that are describable by partial evaluation.}

1709: %\label{venn}

1710: %\end{figure}

1711:

1712: Thus, a given representation could be well-factored for one information-seeking activity but over-factored

1713: for another. Fig.~\ref{newfactor} (left) is well-factored for generation but over-factored for interaction.

1714: Fig.~\ref{newfactor} (right) is well-factored for interaction but over-factored for

1715: users who employ the color-year-model motif diligently (and completely).

1716:

1717: The 6 unmodelable scenarios in Section~\ref{politics} involved requests such as `I would like to have the choice

1718: of party as the first level of the hierarchy, the choice of state as the second level.' Our design was obviously

1719: under-factored for such interaction sequences. We can define the {\it personability} of a representation as the

1720: fraction of interaction sequences in an (external examiner) model that are describable as partial evaluations. For the

1721: external examiner model described in Section~\ref{eval}, the personability of the PIPE modeling

1722: (presented in Section~\ref{politics}) is thus 25/32.
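
Written out, and using $\mathcal{I}$ and $\mathcal{G}$ as in the definitions above (the name $\mathrm{personability}(\cdot,\cdot)$ is our shorthand here), this is
\[
\mathrm{personability}(\mathcal{I},\mathcal{G}) \;=\;
\frac{\left|\{\, s \in \mathcal{G} \;:\; s \mbox{ is describable as a partial evaluation of } \mathcal{I} \,\}\right|}{|\mathcal{G}|},
\]
so that $25$ describable sequences out of $32$ give $25/32 \approx 0.78$.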

1723:

1724: Notice that all of these statements assume that the model for transforming representations is {\it partial

1725: evaluation.} There are other program-transformation techniques which might be able to address the

1726: unmodelable requests above, but PIPE only provides partial evaluation as the operator for personalization.

1727: Our statements should only be interpreted in the context of personalization by partial evaluation.

1728:

1729: In practice, the choice of a factoring will depend on which situations are more

1730: likely and on the composition of the space of interaction sequences $\mathcal{G}$. It is acceptable to have

1731: some interaction sequences that involve complete evaluation, as long as they are

1732: a small fraction of the total number of interaction sequences.

1733:

1734: Thus far, we have fixed the representation and analyzed the information-seeking activities for which it

1735: was over-factored, the ones for which it was under-factored, and so on. This is the designer's viewpoint.

1736: For a given site design, it allows the designer to pose questions such as `What are the information-seeking activities

1737: for which my site is personable?'

1738:

1739: An alternative viewpoint is user-driven. Given an information-seeking activity, the user asks `What sites are

1740: most personable for my activity?' This allows the user to take different site designs (along with representations),

1741: analyze them w.r.t. a conceptual model of information seeking, and rank them in order of personability. For instance,

1742: consider again the external examiner model described in Section~\ref{eval} for the politicians case study. One information

1743: system design was described in Section~\ref{politics}. The personability of this design is, as stated earlier, 25/32.

1744: Seven interaction sequences were not modelable.

1745: Another information system design is the representation in Fig.~\ref{newfactor} (left). The personability of this

1746: design is 6/32. While it accommodates six of the seven sequences, it is no longer personable for the original

1747: 25 sequences! This is because those 25 sequences are now describable as complete evaluations, which also violate

1748: the partial evaluation model! Thus, both over-factorization and under-factorization lead to unpersonable

1749: information spaces. We hypothesize that the most interesting representations are in between.
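
The two designs can be tallied side by side. The counts below are read off the discussion above; the zero in the `Complete' column for Section~\ref{politics} and the single sequence in the `Neither' column for Fig.~\ref{newfactor} (left) are inferred by subtraction from the total of $32$ and are not analyzed further here.

\begin{center}
\begin{tabular}{|l|c|c|c|c|} \hline
Design & Partial & Complete & Neither & Personability \\ \hline
Section~\ref{politics} & 25 & 0 & 7 & $25/32 \approx 0.78$ \\
Fig.~\ref{newfactor} (left) & 6 & 25 & 1 & $6/32 \approx 0.19$ \\
\hline
\end{tabular}
\end{center}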

1750:

1751: An open research issue is whether we have to cross the barrier

1752: from interaction to generation to arrive at over-factored representations.

1753: \section{Concluding Remarks}

1754: \label{conclu}

1755: This paper makes several major contributions. We have presented a novel modeling

1756: methodology for information personalization. PIPE enables the view of personalization

1757: as specializing representations. It models interactions with information systems

1758: and uses partial evaluation to simplify the interactions. PIPE also contributes a novel

1759: evaluation criterion for information system designs. It relates personalization to

1760: the way an information system design is factored. This has implications for how web

1761: applications are developed and deployed~\cite{autoweb}. Many web sites today are based on

1762: the generator model; the results in this paper indicate that they might not be

1763: directly personable for interaction scenarios (under partial evaluation).

1764:

1765: Our modeling makes very weak assumptions about the nature of interactions with

1766: information systems. While we have covered only web sites (and collections of web sites)

1767: in this paper, any information system technology that affords the notion of interaction

1768: sequence or the idea of factorization

1769: can be studied along similar lines. This especially applies to designs for voice-activated

1770: systems (e.g., VoiceXML), directory access protocols (e.g., LDAP), information systems

1771: that provide a dialog model of interaction, and models for organizing digital libraries (e.g.,

1772: 5S).

1773:

1774: We plan to extend the PIPE methodology in several directions. We would like to extend the

1775: modeling methodology to address earlier aspects of the personalization system design life cycle, such

1776: as requirements gathering, verification, and validation. First steps toward this goal are

1777: described in a companion paper~\cite{pipe-tochi}. Another important direction of

1778: future work involves modeling {\it context} in personalization systems. The programmatic modeling

1779: provided in PIPE suggests that context can be usefully viewed as partial information. We believe

1780: that more sophisticated forms of modeling partial information will be needed for describing context, besides

1781: values for program variables.

1782: We are also interested in relaxing our

1783: assumptions of bounded sequences that have separable structural and terminal parts. This will

1784: allow us to address other information-seeking activities such as social network navigation.

1785: In addition, we are investigating program transformation techniques

1786: that can help reason about terminal information

1787: (e.g., program slicing~\cite{slicing}),

1788: in addition to structural information.

1789:

1790: Our long-term goal is to develop a theory of reasoning about representations of information

1791: spaces. This will allow us to formally study the design and implementation of information systems

1792: in terms of the representations they employ.

1793:

1794: \section*{Acknowledgements}

1795: Many of the ideas in this paper were developed during the

1796: Spring 2001 offering of the CS 6604 course (on recommender systems and personalization) at

1797: Virginia Tech. We acknowledge helpful discussions with

1798: Jack Carroll, Marcos Gon\c{c}alves, Dennis Kafura, Priya Lakshminarayanan, Dick Nance,

1799: Manuel Perez, and Mary Beth Rosson.

1800: Rob Capra helped establish connections between PIPE and mixed-initiative interaction and provided

1801: ideas for evaluating the modeling of personalization systems.

1802: Ed Fox suggested the usage of `structural' and `terminal' information to qualify interaction

1803: sequences. Comments from several anonymous referees helped improve the presentation of the article.

1804:

1805: \bibliographystyle{plain}

1806: %\bibliographystyle{named}

1807: \bibliography{paper}

1808:

1809: \end{document}

1810:

1811: