\documentclass[12pt]{article}

% %%%%%%% COMPILATION INSTRUCTIONS: %%%%%%%%%
% pdflatex yourname_date_s03pp.tex
% if you cite anything: compile as follows:
% pdflatex yourname_date_s03pp.tex
% bibtex yourname_date_s03pp
% pdflatex yourname_date_s03pp.tex
% pdflatex yourname_date_s03pp.tex
% this document compiles on cirrus.cs.odu.edu
% and should compile well on any odu.cs UNIX account
% including your Linux systems at home
% if your compilation msgs include statements
% like: ``Label(s) may have changed.
% Rerun to get cross-references right.''
% , recompile until they are gone
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% LEAVE THE FOLLOWING COMMANDS IN
%%% DO NOT CHANGE OR DELETE
%%% UNTIL \begin{document} LINE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\usepackage{times}

\usepackage[dvips]{graphicx}
\usepackage[usenames,dvipsnames]{color}

\usepackage{amssymb}

%\renewcommand{\baselinestretch}{0.8}
\setlength{\parindent}{0cm}
\usepackage[T1]{fontenc}
\usepackage{calc}
\usepackage{tabularx}
\usepackage{longtable}
\usepackage{setspace}
\usepackage{color}
\usepackage{bar}
\usepackage{url}

% DOUBLE LINE SPACING
%\renewcommand{\baselinestretch}{2}

\setlength{\oddsidemargin}{2.5cm-2.5cm}
\setlength{\textwidth}{16.3cm}
\setlength{\textheight}{23cm}
\setlength{\topmargin}{-1cm}
\setlength{\topskip}{0cm}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% ACTUAL DOCUMENT STARTS HERE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

\title{Dynamic Linking of Smart Digital Objects Based on User Navigation Patterns}
\author { Aravind Elango, Johan Bollen and Michael L. Nelson\\
Computer Science Department\\
Old Dominion University\\
Norfolk, Virginia 23529, USA\\
\{aelango,jbollen,mln\}@cs.odu.edu
}

\maketitle

\begin{abstract}
We describe a method for dynamically generating links among digital objects by means
of an unsupervised learning mechanism that analyzes user link-traversal patterns.
We performed an experiment with a test bed of 150 complex data objects, referred to as buckets.
Each bucket manages its own content, provides methods for interacting with users and individually
maintains a set of links to other buckets. We show that the buckets were able to adjust
their links to other buckets dynamically according to user link selections, thereby generating a meaningful
network of bucket relations. Our results indicate that such adaptive networks of linked buckets approximate the collective
link preferences of a community of users.
\end{abstract}


\section{Introduction}
Current research on recommender systems has focused on analyzing static representations
of user preferences, e.g.\ a list of purchased items, to generate personalized recommendations \cite{up:reco}.
However, user preferences are not static: they shift as users assume different roles and develop new interests. Furthermore,
digital library applications often deal not with purchasable items but with complex information objects,
and user interests must be inferred from less explicit indications of interest. One way of
inferring user interests is to analyze the links the user has previously traversed. We use buckets, smart
digital objects that each manage a dynamic list of links to other buckets, to generate adaptive recommendations at run time.

\subsection{Smart Objects: Buckets}

Buckets are smart objects for the aggregation of data \cite{bucket:nelson2001}.
Buckets contain mechanisms to aggregate, manage, protect and preserve the data they contain.
A bucket can be thought of as an intelligent, active folder that, among
other functions, provides interface methods for displaying its contents.\\

Buckets are not simply passive, folder-like repositories: they have an internal structure.
A bucket may contain zero or more elements, each of which can in turn contain elements of its own.
An element may be a resource such as a PDF file or a data set, or simply a set of other elements.
An element may also be a ``pointer'' to an arbitrary network object, e.g.\ another bucket,
in the form of a URL. By having elements ``point'' to other buckets,
buckets can logically contain other buckets.\\
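
To make this recursive structure concrete, the following Python sketch models an element as a small data type. It is an illustration only, not the bucket implementation: the class \texttt{Element}, its fields and the file name \texttt{example.pdf} are hypothetical, and a pointer element is assumed to store nothing but the target URL.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical element: a resource, a container or a pointer."""
    name: str
    resource: Optional[str] = None  # e.g. a PDF file or a data set
    pointer: Optional[str] = None   # URL of another object, e.g. a bucket
    children: List["Element"] = field(default_factory=list)

    def pointers(self) -> List[str]:
        """URLs of all pointer elements in this subtree."""
        urls = [self.pointer] if self.pointer else []
        for child in self.children:
            urls.extend(child.pointers())
        return urls


# A bucket modelled as a top-level element that logically contains
# another bucket through a nested pointer element.
nasa = "http://naca.larc.nasa.gov/reports/1951/naca-tn-2509/"
bucket = Element("naca-tn-2509", children=[
    Element("pdf", resource="example.pdf"),
    Element("related", children=[Element("nasa", pointer=nasa)]),
])
print(bucket.pointers())
\end{verbatim}

Collecting the pointer URLs in this way yields the buckets that a bucket logically contains.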

Buckets have no predefined size limits, either in storage capacity
or in number of elements. Authors can model whatever application
domain they desire using this basic element structure. Bucket methods are invoked through user
HTTP requests.\\

As an example of how the methods of a bucket are invoked, consider the bucket identified by the URL:\\\\
\url{http://www.cs.odu.edu/~mln/naca-tn-2509/}\\

When no bucket method is specified, the ``display'' method is assumed. Therefore, the URL above is
equivalent to: \\\\
\url{http://www.cs.odu.edu/~mln/naca-tn-2509/?method=display}\\

Both URLs above cause the bucket to return an overview of the elements it contains.
These elements can themselves be URLs containing requests for bucket methods.
A bucket can also redirect a request for its content to another object,
which could be another bucket. For example, the request:\\

\url{http://www.cs.odu.edu/~mln/naca-tn-2509/?method=display&redirect=http://naca.larc.nasa.gov/reports/1951/naca-tn-2509/}\\

asks the odu.edu bucket to redirect to the nasa.gov bucket.
All requests for external resources are first routed through the bucket that contains the link.
The full bucket API is discussed in \cite{bucket:nelson2000}.\\
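
The default-method rule and the redirect argument can be illustrated with a small request parser. This Python sketch is our own illustration and not part of the bucket API \cite{bucket:nelson2000}; the function \texttt{parse\_bucket\_request} and the constant \texttt{DEFAULT\_METHOD} are hypothetical, and only the Python standard library is used.

\begin{verbatim}
from urllib.parse import parse_qs, urlsplit

DEFAULT_METHOD = "display"  # assumed when no method is given


def parse_bucket_request(url):
    """Split a bucket request URL into its method and other arguments."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    args = {key: values[0] for key, values in query.items()}
    method = args.pop("method", DEFAULT_METHOD)
    return {"method": method, "args": args}


BASE = "http://www.cs.odu.edu/~mln/naca-tn-2509/"
NASA = "http://naca.larc.nasa.gov/reports/1951/naca-tn-2509/"
print(parse_bucket_request(BASE))  # method 'display', no arguments
print(parse_bucket_request(BASE + "?method=display&redirect=" + NASA))
\end{verbatim}

Both the bare bucket URL and the URL with \texttt{method=display} resolve to the same method, which is the equivalence stated above.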

The motivation for buckets came from previous experience in the design, implementation and maintenance
of NASA scientific and technical information digital libraries (DLs), including the NASA Technical Report Server (NTRS)
\cite{ntrs:nelson1995}. Buckets are well suited to distributed applications because they can aggregate
heterogeneous content and remain functional in low-fidelity environments. Since they are self-contained,
independent and mobile, they should be resilient to changing server environments. In addition, buckets
can be adapted to a variety of data types and data formats.

\subsection{Adaptive User Interfaces}

Hebb's law of learning \cite{organi:hebb1949}, an essential component of many unsupervised methods in machine learning,
is the basis of our efforts to generate meaningful and dynamic sets of inter-bucket links.
We take a descriptive approach \cite{role:smith1993} to bucket linking,
in which the user interface changes based on the past
actions of users rather than on predictions of their future actions. An advantage of such adaptive user interfaces
is that they can dynamically take user information needs into account by continuously updating their structure
and presentation.

\subsection{Hebb's Law of Learning}

Hebb's law postulates that the connection between two neurons in the human brain becomes stronger
when the neurons are persistently activated in quick succession. In this way, the brain continuously
adapts the connections between neurons based on previous experience. Although Hebb's law represents a coarse
and incomplete picture of neural plasticity, it has found countless applications in machine learning. Hebb's law
is particularly applicable to situations in which no set of correct or erroneous responses can be defined in advance
and the system needs to gradually acquire information that is only implicitly present in a given data set.\\
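
In its simplest textbook form, Hebb's rule strengthens the connection between two units in proportion to the product of their activations, $\Delta w_{ij} = \eta\, a_i a_j$. The Python sketch below shows only this generic rule; the function \texttt{hebbian\_update}, the learning rate \texttt{eta} and the binary activations are illustrative assumptions, and it is not the variation used for bucket linking.

\begin{verbatim}
def hebbian_update(weights, activations, eta=0.1):
    """One Hebbian step: w_ij += eta * a_i * a_j for all pairs i != j."""
    n = len(activations)
    new = [row[:] for row in weights]  # leave the input matrix intact
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-connections
                new[i][j] += eta * activations[i] * activations[j]
    return new


# Three units: units 0 and 1 are active together, unit 2 is silent.
w = hebbian_update([[0.0] * 3 for _ in range(3)], [1, 1, 0])
print(w[0][1], w[1][0], w[0][2])  # 0.1 0.1 0.0
\end{verbatim}

Only the connection between the two co-active units grows; no correct output is ever supplied, which is what makes the rule unsupervised.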

For this reason, Hebbian learning has been successfully used in adaptive hypertext networks \cite{system:bollen1998},
which learn to reroute hyperlinks according to usage patterns. In analogy to such systems, we use a variation of Hebbian
learning for dynamic inter-bucket linking.


\section{Implementing Hebb's Law in Buckets}
Fig.~\ref{hebbian} gives an overview of how Hebbian learning can be interpreted for inter-bucket
linking. Consider a user traversing three buckets, b1, b2 and b3, where b1 is linked to b2 and
b2 is linked to b3, and assume that initially there are no other links among these three buckets. When the
user traverses from b1 to b2, the weight of the link ($b1 \rightarrow b2$) is increased by a frequency reinforcement. Because b2 was reached
from b1, we conjecture that b2 is related to b1 and increase the weight of the link ($b2 \rightarrow b1$) by a symmetry reinforcement; if the link ($b2 \rightarrow b1$)
is absent, it is created. When the user then traverses from b2 to b3, the weight of the link ($b2 \rightarrow b3$) is
increased by a frequency reinforcement and the weight of the link ($b3 \rightarrow b2$) by a symmetry reinforcement.
Since the user ultimately reached b3 from b1 with b2 as an intermediary, we assume that b1 is to some degree
related to b3 and hence increase the weight of the link ($b1 \rightarrow b3$) by a transitivity reinforcement. If
the link ($b1 \rightarrow b3$) is absent, it is created.\\
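
The three rules can be summarized in a few lines of Python. This is a centralized sketch under our own assumptions: the function \texttt{reinforce\_path} and the weight dictionary are hypothetical, the reinforcement values are those given later in this section, skipping self-links is our own choice, and in the actual system each update is performed by the bucket that owns the link, as described below.

\begin{verbatim}
from collections import defaultdict

# Reinforcement values used in the experiment (frequency, symmetry,
# transitivity).
FREQUENCY, SYMMETRY, TRANSITIVITY = 1.0, 0.5, 0.3


def reinforce_path(weights, path):
    """Apply the three reinforcements along a user's path of buckets.

    weights is a defaultdict(float) mapping (source, target) to a link
    weight, so adding to a missing key creates the link.
    """
    for step in range(1, len(path)):
        prev, cur = path[step - 1], path[step]
        weights[(prev, cur)] += FREQUENCY    # link the user followed
        weights[(cur, prev)] += SYMMETRY     # reverse link
        if step >= 2 and path[step - 2] != cur:
            # cur was reached from path[step - 2] via prev; the
            # self-link check is our own assumption.
            weights[(path[step - 2], cur)] += TRANSITIVITY
    return weights


w = reinforce_path(defaultdict(float), ["b1", "b2", "b3"])
print(dict(w))
\end{verbatim}

For the path b1, b2, b3 this produces exactly the five updates listed above.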

\begin{figure}
\begin{center}
\includegraphics[scale=0.7]{hebbian2}
\end{center}
\caption{\label{hebbian} Implementing Hebbian learning in buckets}
\end{figure}

The approach taken to implement the above procedures was
suggested in \cite{adapti:bollen2002}. When a bucket b1 links to a bucket b2, b1 is called with a redirect argument that
gives the URL of the bucket it is linking to. We also pass a referer argument, which effectively
overrides the HTTP Referer header. The value passed in the referer argument is instrumental in
implementing Hebb's law, as explained below.\\
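
The request URLs used in the walkthrough that follows can be produced by a helper such as the hypothetical \texttt{link\_url} below. The Python sketch assumes, as in the walkthrough, that buckets are addressed as \texttt{http://b1}, \texttt{http://b2} and so on, and that an ampersand inside a nested URL is written as \texttt{\%26}.

\begin{verbatim}
def link_url(source, target, previous=None):
    """URL behind the link source -> target on source's page.

    If the user reached source from previous, the request is routed
    through previous (with no referer) so that previous can apply the
    transitivity reinforcement. '&' inside a nested URL is written as
    %26 so that it stays part of the redirect argument.
    """
    final = f"http://{target}?method=display%26referer=http://{source}"
    if previous is not None:
        final = f"http://{previous}?method=display%26redirect={final}"
    return (f"http://{source}?method=display&referer={source}"
            f"&redirect={final}")


print(link_url("b1", "b2"))
print(link_url("b2", "b3", previous="b1"))
\end{verbatim}

The first call yields the URL for a plain traversal from b1 to b2; the second yields the URL generated when the user, having arrived from b1, follows the link from b2 to b3.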

When the link from b1 to b2 is traversed, b1 is called with the URL: \\
\url{http://b1?method=display&referer=b1&redirect=http://b2?method=display%26referer=http://b1} \\\\
From the referer argument, b1 recognizes itself as the referer, and from the redirect argument it concludes
that it is redirecting to b2. It therefore increments the weight of its link ($b1 \rightarrow b2$) by the frequency
reinforcement. b2 sees that the referer is b1 and increments the weight of its link ($b2 \rightarrow b1$) by the
symmetry reinforcement. When the user next traverses from b2 to b3, the following link is dynamically generated: \\

\url{http://b2?method=display&referer=b2&redirect=http://b1?method=display%26redirect=http://b3?method=display%26referer=http://b2} \\\\
b2 sees itself as the referer and, from the last redirect argument, finds that b3 is the
final destination. b2 increases the weight of its link ($b2 \rightarrow b3$) by the frequency
reinforcement and then redirects to b1. b1 sees that there
is no referer argument addressed to it and therefore increases the weight of the link ($b1 \rightarrow b3$) by the transitivity reinforcement
before redirecting to b3. Finally, when b3 is called, it finds the referer argument to be b2 and increments the weight of
the link ($b3 \rightarrow b2$) by the symmetry reinforcement.\\
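
The walkthrough can be simulated end to end with the following Python sketch, in which each bucket inspects its own arguments and applies at most one reinforcement. The parsing rule is our assumption, chosen to match the behaviour described above: everything after the first \texttt{redirect=} is treated as the redirect target, so the referer intended for b3 is not read by b1. The names \texttt{handle\_request} and \texttt{follow} are hypothetical, and \texttt{follow} stands in for the browser following HTTP redirects.

\begin{verbatim}
from collections import defaultdict
from urllib.parse import unquote

FREQUENCY, SYMMETRY, TRANSITIVITY = 1.0, 0.5, 0.3


def bucket_id(url):
    """Reduce 'http://b2?method=display' (or plain 'b2') to 'b2'."""
    if url.startswith("http://"):
        url = url[len("http://"):]
    return url.split("?", 1)[0]


def split_query(query):
    """Separate this bucket's own arguments from the redirect URL.

    Assumption: everything after the first 'redirect=' belongs to the
    redirect target, so arguments meant for later buckets are not
    read as this bucket's own.
    """
    own, _, rest = query.partition("redirect=")
    args = dict(p.split("=", 1) for p in own.split("&") if "=" in p)
    return args, (unquote(rest) if rest else None)


def handle_request(bucket, query, weights):
    """Apply the reinforcement a request implies; return next URL."""
    args, redirect = split_query(query)
    referer = bucket_id(args["referer"]) if "referer" in args else None
    if redirect is not None:
        # The last redirect argument names the final destination.
        target = bucket_id(redirect.split("redirect=")[-1])
        if referer == bucket:
            weights[(bucket, target)] += FREQUENCY     # followed link
        elif referer is None and target != bucket:
            weights[(bucket, target)] += TRANSITIVITY  # indirect link
    elif referer is not None and referer != bucket:
        weights[(bucket, referer)] += SYMMETRY         # reverse link
    return redirect


def follow(url, weights):
    """Stand-in for a browser following redirects until a page shows."""
    while url is not None:
        bucket, query = bucket_id(url), url.split("?", 1)[1]
        url = handle_request(bucket, query, weights)


w = defaultdict(float)
follow("http://b1?method=display&referer=b1"
       "&redirect=http://b2?method=display%26referer=http://b1", w)
follow("http://b2?method=display&referer=b2"
       "&redirect=http://b1?method=display%26redirect="
       "http://b3?method=display%26referer=http://b2", w)
print(dict(w))
\end{verbatim}

Running the two requests yields the five updates described above: frequency reinforcements on ($b1 \rightarrow b2$) and ($b2 \rightarrow b3$), symmetry reinforcements on ($b2 \rightarrow b1$) and ($b3 \rightarrow b2$), and a transitivity reinforcement on ($b1 \rightarrow b3$).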

The reinforcement values are based on our experience with previous systems: the frequency, symmetry and transitivity reinforcements are set to
1.0, 0.5 and 0.3, respectively. The frequency reinforcement is the largest because the user directly traversed the link, which is positive confirmation
that the user deemed the link relevant.

\section{Experimental Test Bed}

One hundred and fifty buckets were used for this experiment. Each bucket represented a popular
music artist or band and contained a short biography and a dynamic list of links to related buckets in the network.
The list of 150 bands consisted of the top 50 bands of all time as chosen by the experts of Spin Magazine
\cite{url:spin}, plus, for each of these, two similar bands as suggested by \url{www.allmusic.com}.
To provide an initial, unbiased navigation structure, each bucket was randomly linked to 3 other buckets with a weight of 0.5.
Fig.~\ref{display} shows the display of one such bucket.
The bucket displays metadata related to the band and a set of links to other artists and bands. As users traverse the
system, new links are created and the weights of existing links are increased according to the users' link
selections. The links are sorted by weight, so that a more heavily weighted link is shown higher
in the list than a less heavily weighted one.\\
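
The initial random linking and the weight-ordered display can be sketched as follows. The Python functions \texttt{initial\_links} and \texttt{ranked\_links}, the placeholder band names and the fixed random seed are hypothetical; the ordering of equally weighted links is not specified above, and in this sketch ties simply keep their insertion order.

\begin{verbatim}
import random


def initial_links(buckets, k=3, weight=0.5, seed=None):
    """Link every bucket to k distinct, randomly chosen other buckets."""
    rng = random.Random(seed)
    links = {}
    for b in buckets:
        others = [o for o in buckets if o != b]
        links[b] = {t: weight for t in rng.sample(others, k)}
    return links


def ranked_links(outgoing):
    """Targets of one bucket's links, heaviest first (display order)."""
    return sorted(outgoing, key=outgoing.get, reverse=True)


buckets = [f"band{i}" for i in range(150)]   # placeholder names
links = initial_links(buckets, seed=1)
print(ranked_links(links["band0"]))
\end{verbatim}

Because Python's sort is stable even with \texttt{reverse=True}, links of equal weight are listed in the order in which they were created.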

\begin{figure}
\begin{center}
\includegraphics[scale=0.7]{xml2}
\end{center}
\caption{\label{xml} XML representation of an element and the associated weight}
\end{figure}

Every bucket has an XML file that contains all information about the elements
in the bucket and their metadata. The weight of each link is stored as an attribute of the
corresponding element (the URL it points to) in the XML file, as shown in Fig.~\ref{xml}. New elements can be added to
the XML file using the addElement method.\\
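
Reading and updating such weight attributes could look like the following Python sketch, which uses the standard \texttt{xml.etree.ElementTree} module. The tag and attribute names (\texttt{element}, \texttt{url}, \texttt{weight}) and the example URLs are assumptions, since the exact schema is only shown in the figure, and \texttt{ET.SubElement} stands in for the bucket's addElement method.

\begin{verbatim}
import xml.etree.ElementTree as ET

# Hypothetical layout: the actual bucket XML (see the figure) may use
# different tag and attribute names.
BUCKET_XML = """<bucket>
  <element url="http://example.org/clash/" weight="0.5"/>
  <element url="http://example.org/beck/" weight="3.0"/>
</bucket>"""


def reinforce(root, url, amount):
    """Add amount to the weight of the element pointing at url,
    creating a new element (a new link) if none exists."""
    for el in root.iter("element"):
        if el.get("url") == url:
            el.set("weight", str(float(el.get("weight", "0")) + amount))
            return el
    # Stands in for the bucket's addElement method in this sketch.
    return ET.SubElement(root, "element", url=url, weight=str(amount))


def links_by_weight(root):
    """(url, weight) pairs, heaviest first, i.e. the display order."""
    pairs = [(el.get("url"), float(el.get("weight")))
             for el in root.iter("element")]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


root = ET.fromstring(BUCKET_XML)
reinforce(root, "http://example.org/clash/", 1.0)
reinforce(root, "http://example.org/squeeze/", 0.3)
print(links_by_weight(root))
\end{verbatim}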

An invitation to traverse the network was sent to 15 people in June 2003. At the end of the experiment, the total weight
of all links, excluding the initial random links, was 1719 units.
Taking into account the reinforcement values for frequency, symmetry and transitivity, we estimate that the system received approximately
1041 direct traversals.
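
The estimate can be bracketed with a short calculation. Assuming (our assumption) that each direct traversal adds one frequency and one symmetry reinforcement, plus a transitivity reinforcement whenever the traversal is not the first step of a session, each traversal contributes between $1.0+0.5=1.5$ and $1.0+0.5+0.3=1.8$ units of weight, so the number of direct traversals $N$ satisfies
\[
\frac{1719}{1.8} = 955 \;\leq\; N \;\leq\; \frac{1719}{1.5} = 1146 .
\]
The estimate of 1041 traversals lies within this range and corresponds to an average of $1719/1041 \approx 1.65$ units per traversal.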

\begin{figure}
\begin{center}
\includegraphics[scale=0.5]{display2}
\end{center}
\caption{\label{display} An example of bucket display}
\end{figure}

\begin{table}
\begin{center}
\begin{tabular}{|c||c|}
\hline
%(insert horizontal line)
\multicolumn{2}{l}{\textit{Example Bucket: `The Clash'}} \\ \hline
\textbf{Links before traversal} & \textbf{Links after traversal} \\ \hline & \\
\textcolor{red}{The Beatles} & Smashing Pumpkins \\
\textcolor{red}{Glyn Jones } & \textcolor{red}{Beck } \\
\textcolor{red}{Beck } & Fishbone \\
& Nick Lowe \\
& \textcolor{red}{The Beatles}\\
& The Smiths \\
& Replacements \\
& \textcolor{red}{Glyn Jones }\\
& N.W.A \\
& Squeeze \\ & \\
\hline
\end{tabular}
\caption{\label{the_clash} An example of the dynamic links generated for `The Clash'. Links shown in red are the initial random links.}
\end{center}
\end{table}


\begin{table}
\begin{center}
\begin{tabular}{|c||c|}
\hline
%(insert horizontal line)
\multicolumn{2}{l}{\textit{Example Bucket: `The Smiths'}} \\ \hline
\textbf{Links before traversal} & \textbf{Links after traversal} \\ \hline & \\
\textcolor{red}{Elvis Costello} & \textcolor{red}{Replacements }\\
\textcolor{red}{Tool} & \textcolor{red}{Elvis Costello } \\
\textcolor{red}{Replacements} & Pretty Things \\
& Fishbone \\
& Nick Lowe \\
& The Beatles\\
& Fishbone \\
& The Clash \\
& Johnny Thunders\\
& Kiss \\
& \textcolor{red}{Tool } \\ & \\
\hline
\end{tabular}
\caption{\label{the_smiths} An example of the dynamic links generated for `The Smiths'. Links shown in red are the initial random links.}
\end{center}
\end{table}


\section{Results}

Our aim is to show that when users start browsing the collection from a portal node or bucket, a meaningful network
develops in which the content of highly central nodes is similar or related to the content of the portal bucket.

\subsection{Link Structure}
The bucket representing `Public Enemy' was the entry point to the network, a setup similar to a portal
through which users access web services (e.g.\ www.yahoo.com).
The portal bucket, like every heavily traversed bucket, gradually comes to reflect the users' preferences regarding which other
buckets it should be linked to. Table~\ref{the_clash} shows the links associated
with `The Clash' bucket before and after traversal. As users navigate through the bucket, new links are dynamically
created, and the bands that users consider more closely related to `The Clash' rise to the top of the list.\\

Table~\ref{the_smiths} shows a similar example. Users apparently do not associate `Tool'
(an initial random link) with `The Smiths': it has dropped to the bottom of the list of links, whereas
`Replacements' and `Elvis Costello', which are also initial random links, have kept their positions
at the top of the list and do match the nature of `The Smiths' as a band.

\subsection{Bucket Authority}

A highly influential node within a network can be expected to have a relatively high number of outgoing and incoming
connections to and from other nodes in the network, a characteristic referred to as degree centrality.
Investigating the degree centrality of the nodes in a network can therefore reveal the network's most important nodes. Applied
to the generated network of buckets, degree centrality can be used to partially validate the network structure.
Since our buckets represent music bands, degree centrality may reflect the relative importance or influence of
the bands according to the community of users that generated the network.\\

We define the degree centrality of a node as the number of links that originate from or terminate
in that node. The weighted degree centrality of a node is the sum of the weights of all links
that originate from or terminate in that node. \\

Degree centrality $dc_i$ is defined in Eq.~\ref{degree_centrality}, where $n$ is the number of buckets and $l_{ij}$ is 1 if there is a link
from bucket $i$ to bucket $j$ and 0 otherwise. Weighted degree centrality $wc_i$ is defined in Eq.~\ref{wt_degree_centrality},
where $w_{ij}$ is the weight of the link from bucket $i$ to bucket $j$;
$w_{ij}$ is 0 if bucket $i$ is not linked to bucket $j$.

\vspace*{1cm}

\begin{equation}
dc_i = \sum_{j=1}^n l_{ij} + \sum_{j=1}^n l_{ji}
\label{degree_centrality}
\end{equation}

\begin{equation}
wc_i = \sum_{j=1}^n w_{ij} + \sum_{j=1}^n w_{ji}
\label{wt_degree_centrality}
\end{equation}

\vspace*{1cm}
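
Both measures can be computed in a single pass over the weighted links, as in the following Python sketch. The dictionary layout and the function name \texttt{centralities} are hypothetical; each link contributes to the centrality of both its source and its target, matching the two sums in Eqs.~\ref{degree_centrality} and~\ref{wt_degree_centrality}.

\begin{verbatim}
from collections import defaultdict


def centralities(weights):
    """Degree (dc) and weighted degree (wc) centrality of every bucket.

    weights maps (i, j) to w_ij, the weight of the link i -> j. Each
    link counts once for its source and once for its target, like the
    two sums in each equation. Buckets without links are omitted
    (their centrality is 0).
    """
    dc, wc = defaultdict(int), defaultdict(float)
    for (i, j), w in weights.items():
        if w > 0:  # a zero weight means there is no link
            for node in (i, j):
                dc[node] += 1
                wc[node] += w
    return dict(dc), dict(wc)


links = {("A", "B"): 1.5, ("B", "A"): 0.5, ("A", "C"): 0.3}
dc, wc = centralities(links)
print(dc)  # {'A': 3, 'B': 2, 'C': 1}
\end{verbatim}

When every link carries the same weight, as with the initial weight of 0.5, $wc_i$ is simply a constant multiple of $dc_i$, so the two rankings coincide.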

\begin{figure}
\begin{center}
\input{initial_deg_bar.tex}
\end{center}
\caption{\label{initial_deg_bar} Top eight degree centrality rankings based on initial random linking.}
\end{figure}

Fig.~\ref{initial_deg_bar} shows the top 8 buckets ranked by degree centrality when
the network was initially set up. These rankings are purely random, since no user traversals had yet taken
place. Because every initial link has the same weight of 0.5, the weighted degree centrality of each bucket is
half its degree centrality, so the degree and weighted degree centrality rankings are identical.
Figs.~\ref{final_deg_bar} and~\ref{final_wt_bar} show the top 8 rankings based on degree and weighted degree centrality, respectively, after approximately 1041 direct
traversals by 15 users. `Public Enemy' is the most popular band according to both the degree and the weighted
degree centrality measures. This was expected, since `Public Enemy' was the access point to the network for
all users. The top rankings also include influential bands such as ``The Velvet Underground'', ``The Stooges'' and ``L.L. Cool J.''

\begin{figure}
\begin{center}
\input{final_deg_bar.tex}
\end{center}
\caption{\label{final_deg_bar} Top eight degree centrality rankings after users surfed the system.}
\end{figure}


\begin{figure}
\begin{center}
\input{final_wt_bar.tex}
\end{center}
\caption{\label{final_wt_bar} Top eight weighted degree centrality rankings after users surfed the system.}
\end{figure}

\subsection{Hierarchical Ranking}

Since all users entered the network at the ``Public Enemy'' bucket, it makes sense to investigate
this bucket's connections to other buckets as a means of validating the network structure.
Fig.~\ref{hierarchical_graph} shows the hierarchy of the most popular bands starting from ``Public Enemy'',
including secondary, tertiary, and reinforced but initially random links. The weight of the link
connecting each pair of bands is given in parentheses next to the band lower in the hierarchy.
A band marked with * is connected through an initial random link that has since been reinforced.\\

\begin{figure}
\begin{center}
\includegraphics[scale=0.5]{graph1b}
\end{center}
\caption{\label{hierarchical_graph} Hierarchical ranking of bands related to `Public Enemy'.}
\end{figure}

Two music experts graded the 150 bands on a scale from 0 to 10, with 10 signifying a close relationship between the
band and ``Public Enemy'' and 0 signifying no relationship. The grades were then
normalized to the range from 0 to 1. To compare the network weights with the expert opinions, we compute the relationship weight
between each band in Fig.~\ref{hierarchical_graph} and `Public Enemy' as the sum, over all connecting paths, of the products
of the normalized link weights along each path.\\

We can formalize this procedure as follows. Assume that two buckets $b_i$ and $b_j$ are connected in the network shown in Fig.~\ref{hierarchical_graph}
via a path $p$ of $n$ buckets, so that the ordered set $p=\left(b_1, b_2, \cdots, b_n\right)$ represents
the buckets on the path that connects $b_i$ and $b_j$. Since two buckets may be connected by several paths,
we consider the set of $k$ paths $P = \{p_1, p_2, \cdots, p_k\}$, where path $p_g$ contains $n_g$ buckets.\\

Eq.~\ref{link_wt} is used to compute the weight of the relationship between any bucket $b_i$ and bucket $b_j$ in
the generated hierarchical tree, given that $W\left(b_h \in p_g, b_{h-1} \in p_g\right)$ represents the weight of the link between
the bucket $b_h$ and its predecessor $b_{h-1}$ in path $p_g$.


\begin{equation}
W(b_i, b_j) = \sum_{g=1}^{k} \prod_{h=2}^{n_g} W\left(b_h \in p_g, b_{h-1} \in p_g\right)
\label{link_wt}
\end{equation}

We examined the indirect link weights between every bucket and the ``Public Enemy'' bucket, so $b_i$ is
the ``Public Enemy'' bucket in all cases.\\
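
Eq.~\ref{link_wt} can be evaluated by enumerating the paths between two buckets, as in the Python sketch below. It assumes that the link weights have already been normalized and that only paths without repeated buckets are counted; the function \texttt{relationship\_weight} and the toy graph are hypothetical. Enumerating all simple paths is exponential in general, which is acceptable for a small hierarchy such as the one in Fig.~\ref{hierarchical_graph}.

\begin{verbatim}
def relationship_weight(graph, source, target):
    """Sum over all simple paths from source to target of the product
    of the (normalized) link weights along each path.

    graph maps each bucket to a dict {next bucket: link weight}.
    """
    total = 0.0

    def walk(node, product, visited):
        nonlocal total
        if node == target:
            total += product
            return
        for nxt, w in graph.get(node, {}).items():
            if nxt not in visited:  # skip cycles: simple paths only
                walk(nxt, product * w, visited | {nxt})

    walk(source, 1.0, {source})
    return total


# Toy hierarchy with hypothetical normalized weights.
graph = {"PE": {"A": 0.8, "B": 0.5}, "A": {"C": 0.5}, "B": {"C": 0.4}}
print(relationship_weight(graph, "PE", "C"))  # two paths: PE-A-C, PE-B-C
\end{verbatim}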

Fig.~\ref{scatter_jb_net} shows the scatterplot of the expert and network relationship values to `Public Enemy'.
The correlation coefficient between the network and expert evaluations of bucket relations to ``Public Enemy''
was 0.48, indicating a moderate positive correspondence between the relationships in the graph and the judgments of the two experts.
The expert opinions need further validation, and with more usage the network can be expected to reflect
user tastes more closely.
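
The type of correlation coefficient is not specified above; assuming Pearson's product-moment coefficient, it can be computed as in the following Python sketch. The function \texttt{pearson} is hypothetical, and the scores in the example are placeholders, not the data plotted in Fig.~\ref{scatter_jb_net}.

\begin{verbatim}
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical normalized scores for five bands.
expert = [1.0, 0.8, 0.5, 0.2, 0.0]
network = [0.9, 0.3, 0.6, 0.1, 0.2]
print(round(pearson(expert, network), 2))
\end{verbatim}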


\begin{figure}
\begin{center}
\input{jb-mln-netw.tex}
\end{center}
\caption{\label{scatter_jb_net} Comparison of expert and network ranking of band relationships to `Public Enemy'.}
\end{figure}

\section{Future Research}

While the importance of the nodes has been gauged using degree and weighted degree
centrality measures, it would be interesting to analyze the network with principal component analysis
and clustering techniques. \\

Another possible system feature is decrementing the weight of rarely used links,
which would help filter out spurious links created by the initial random linking.
The basis for decrementing link weights needs further study; options include the
time since a link was last accessed and the frequency of access of other links. \\
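
As one possible instance of the first option, the following Python sketch halves a link's weight for every fixed interval during which the link has not been traversed and drops links whose weight falls below a threshold. The exponential decay, the half-life of 30 time units, the threshold of 0.05 and the function name \texttt{decay\_weights} are all our assumptions, not results of this study.

\begin{verbatim}
def decay_weights(weights, last_access, now,
                  half_life=30.0, floor=0.05):
    """Hypothetical time-based decay of link weights.

    A link's weight is halved for every half_life time units since it
    was last traversed; links that fall below floor are dropped. Links
    with no recorded access time are not decayed.
    """
    decayed = {}
    for link, w in weights.items():
        idle = now - last_access.get(link, now)
        new_w = w * 0.5 ** (idle / half_life)
        if new_w >= floor:
            decayed[link] = new_w
    return decayed


weights = {("b1", "b2"): 2.0, ("b1", "b3"): 0.5}
last_access = {("b1", "b2"): 90, ("b1", "b3"): 0}
print(decay_weights(weights, last_access, now=120))
# {('b1', 'b2'): 1.0}
\end{verbatim}

Here the link ($b1 \rightarrow b3$), idle for four half-lives, decays to about 0.03 and is removed, while the recently used link keeps half its weight.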

Finally, Hebbian learning could be applied to a portion of a bucket instead of the entire bucket.
This would allow a bucket to have sections of content that are fixed and sections
that adapt through Hebbian learning. Imagine bucket1 containing two high-level
elements or sections (HLE1 and HLE2), each with a number of leaf elements. When the user follows a link from
bucket1 to bucket2 provided in HLE1, bucket2 would be aware not only of the bucket
it was linked from (bucket1) but also of the section (HLE1) within that bucket.

\section{Conclusion}

We have implemented a system for the automated linking of information using a collection
of smart objects, called buckets, and a set of simple learning rules that change link weights
based on user retrieval patterns. The bucket network gradually changes its structure as users
retrieve one bucket after another via lists of recommended buckets. \\

The results indicate that, although a collection of buckets is initially linked at random,
with adequate user traversal the buckets form a meaningful network that resembles the users' idea of which
buckets should be related to which other buckets. The most central nodes in the network are
either influential or very popular music bands related to ``Public Enemy''; these bands have high
degree and weighted degree centralities. \\

During the analysis we found that rankings based on degree centrality were more susceptible
to change caused by heavy use, even by a single user, whereas weighted degree centrality offers a more
graded and stable measure.\\

The randomly linked collection initially presented to users cannot be expected to satisfy
their information needs. The likelihood that the system returns the ideal answer set for a user's
information need increases with usage of the system. The amount of usage needed to create a well-suited network
depends on the number of buckets in the network and on the diversity of the users' information needs.
When the users' information needs are of limited scope (e.g.\ all users are interested in rock music),
a meaningful network can be expected to form fairly quickly.


\bibliographystyle{plain}
\bibliography{proj}

\end{document}
468: