1: \documentclass[11pt]{article}
2:
3: %\usepackage{ijcai01}
4: %\usepackage{fullpage,palatino}
5: \usepackage{fullpage}
6: \setlength{\oddsidemargin}{-0.25in}
7: \setlength{\evensidemargin}{-0.25in}
8: \setlength{\topmargin}{0.5in}
9: \setlength{\headheight}{0pt}
10: \setlength{\headsep}{0pt}
11: \setlength{\footskip}{0pt}
12: \setlength{\textheight}{8.75in}
13: \setlength{\textwidth}{7in}
14: \setlength{\marginparwidth}{0in}
15: \setlength{\marginparsep}{0in}
16: \newenvironment{descit}[1]{\begin{quote} \textit{#1}}{\end{quote}}
17:
18: \input{psfig-dvips}
19:
20: \newif\ifpdf
21: \ifx\pdfoutput\undefined
22: \pdffalse
23: \else
24: \pdfoutput=1
25: \pdftrue
26: \fi
27:
28: \ifpdf
29: \usepackage[pdftex]{graphicx}
30: \usepackage[pdftex]{color}
31: \DeclareGraphicsExtensions{.pdf,.png,.jpg}
32: \else
33: \usepackage[dvips]{graphicx}
34: \usepackage[dvips]{color}
35: \DeclareGraphicsExtensions{.eps,.epsi,.ps}
36: \fi
37:
38: \usepackage{times}
39: %\usepackage{fancyheadings}
40:
41: %\pagestyle{plain}
42: \thispagestyle{empty}
43: \pagestyle{empty}
44:
45: \def\midv{\mathop{\,|\,}}
46: \newtheorem{defn}{Definition}
47: \long\def\cbk#1{{\color{red}[CBK: #1]}}
48: \newlength\colwidth \setlength\colwidth{3.25in}
49:
50: \title{When being Weak is Brave:\\
51: Privacy Issues in Recommender Systems}
52:
53: \author{Naren Ramakrishnan, Benjamin J. Keller, and Batul J. Mirza\\
54: Department of Computer Science\\
55: Virginia Tech, VA 24061\\
56: Email:\{naren,keller,bmirza\}@cs.vt.edu\\
57: \medskip
58: \\
59: Ananth Y. Grama\\
60: Department of Computer Sciences\\
61: Purdue University, IN 47907\\
62: Email: ayg@cs.purdue.edu\\
63: \medskip
64: \\
65: George Karypis\\
66: Department of Computer Science\\
67: University of Minnesota, MN 55455\\
68: Email: karypis@cs.umn.edu}
69:
70: \begin{document}
71:
72: \maketitle
73: \thispagestyle{empty}
74: \pagestyle{empty}
75:
76: \begin{abstract}
77: \noindent
78: We explore the conflict between personalization and privacy that arises
79: from the existence of weak ties. A weak tie is an unexpected connection that
80: provides serendipitous recommendations. However, information about weak ties
81: could be used in conjunction with
82: other sources of data to uncover identities and reveal other personal
83: information. In this article, we use a graph-theoretic model to study the
84: benefit and risk from weak ties.
85: \end{abstract}
86: \newpage
87:
88: \section{Introduction}
89: Privacy in Internet services is typically thought of in terms of protecting
90: attributes of users (and can thus be related to solutions in database security).
91: However, information provided by a recommender system can also allow the
92: privacy of some of its users to be compromised, when used in conjunction with
93: other information. For example, consider a system
94: that recommends books by finding correlations between
95: a user's ratings and those of other participants for the same books (a
96: nearest neighbor algorithm~\cite{herlocker}). Suppose as a user, person X
97: rates only books on computer networking, but has an interest in Indian
98: classical music. We could
99: imagine the following dialogue between person X and the recommender system:
100:
101: \begin{descit}
102: {\bf person X:} Would I like ``Evolution of Indian Classical Music''?\\
103: {\bf recommender:} Yes.\\
104: {\bf person X:} Why? (surprised)\\
105: {\bf recommender:} People who liked the books you have liked also liked this book.
106: \end{descit}
107:
108: \noindent
109: This is an example of {\it serendipity} in recommendation: person X
110: does not expect
111: to receive a recommendation for a book on a topic outside of those
112: that he/she has rated. But, in fact, it is not by luck that the recommendation
113: was provided --- it means that at least one person has rated the book
114: in addition to the
115: books that person X has rated (and possibly others). The explanation conveys
116: this, but also indicates that the algorithm used is a nearest neighbor
117: algorithm. The serendipity, if person X is malicious, is that he/she has found
118: a candidate
119: {\it weak tie.}
120:
121: In social network theory, a tie is a relationship between people. The strength
122: of a tie is measured in terms of the number of shared associations.
A family is an example of strong ties: a child knows her mother and her father, and her
124: parents know each other. Weak ties, on the other hand,
125: form bridges between two groups of people
126: who would not otherwise interact.
127: Most importantly for us, weak ties allow us to deduce identities. For instance,
128: one may know that a particular computer science
129: department has a networking professor
who has an interest in Indian classical music.
131: This professor thus provides a tie between
132: university faculty and the Indian classical music
133: aficionados. Consequently, upon meeting the {\it only}
134: Indian networking professor in that department,
135: you can safely identify this person as the Indian classical music enthusiast.
136:
137: A similar situation occurs in recommendation where weak ties provide the
138: opportunity for serendipitous recommendations. In our example, a candidate
139: weak tie has been found between networking books and books on Indian
140: classical music. We don't know if this truly is a weak tie, but can test it by
141: varying what we rate (the process would involve masquerading as different users
142: and rating different sets of books to probe around the original ratings).
143: Through this process, if the query fails for most variations on the ratings,
144: then we can be more confident that we have a weak tie. And, in fact, the
145: more restrictive the set of ratings that yield the positive recommendation,
146: the more confidence we have that there is one person who has rated the
147: books.
148:
149: Our ability to find such a minimum set of ratings is where the risk lies
150: --- we can use this rating set to determine what the other person has rated
151: using queries, and perhaps fit this information to our knowledge of people
152: who use the system. In the end, it is conceivable that we could identify
153: our hypothetical Indian music enthusiast/networking professor in the
154: recommendation system, and determine what he may have rated. If the system
155: allows us to vary the ratings, then we might be able to estimate the
156: person's ratings as well.
157:
158: In this example, a person actually forms a tie between books through their
ratings: one book relates to another if both have been rated by the same
person. The alternate view that we will take is that there is a tie between
161: two people if they have rated some number of books in common. So, in the
162: example, the Indian music enthusiast/networking professor has ties to people who
rate networking books and people who rate books on Indian classical music.
164: The ties to people who rate Indian classical music books are weak only
165: if there are relatively few people who have the same tastes.
166:
167: %\subsection{Modeling Risk}
168: Clearly, the task of identifying someone through their ratings is difficult,
169: and is even harder if more people have similar rating patterns. But the
170: above example illustrates the risk that may exist even in simple
171: recommendation. The risk is really to people who would
172: participate in weak ties, because they are the people who could most easily
173: be identified. In some application domains (voting preferences and membership
174: on boards), even knowing that a weak tie
175: exists constitutes a breach of privacy.
176:
177: \subsection*{Our Approach}
178: Our goal is to model the benefit from and risk to users who participate
179: in weak ties. In particular, we
180: would like to characterize benefits and risks on an algorithm-independent
181: basis. To achieve this, we use the model of
182: `jumping connections' \cite{batul-thesis}
183: that casts recommendation as making a series of jumps between people, based on
184: common ratings (in nearest neighbor recommendation, there is only one jump).
185: %The result is a model that posits (indirect)
186: %connections between people.
187: We describe how this model has relevance to
188: conventionally accepted metrics of evaluation (Section~\ref{jc-model}). Using
189: this model we can then identify causes for
190: weak ties in terms of the rating patterns of people (Section~\ref{weak-ties}).
191: Furthermore, the model
192: allows us to qualify benefits in terms of reachability in a graph, and risk
193: in terms of weak ties (Section~\ref{benefits-risks}). Finally
194: (Section~\ref{bigger-pictures}), we look at how policies and social structures
can be designed to support and enable recommender systems.
196:
197: \section{Recommendation: The Jumping Connections Model}
198: \label{jc-model}
199: Recommendation algorithms work in a wide variety of ways, from forms of graph
200: search to learning. This variety presents a difficulty when attempting to
201: study the risks of recommendation in general. However, we can think of
202: recommendation as making connections between people who have rated artifacts
203: in common. We represent people, the artifacts they rated (e.g., movies),
204: and their ratings as a bipartite graph (Fig.~\ref{jc-intro} (a)). An
205: algorithm can then {\it jump} over the common artifacts to form a
206: connection between two people. Fig.~\ref{jc-intro} illustrates a skip
207: jump where two people are brought together if they rate at least
208: one movie in common.
209:
210: A jump induces a {\it social network graph} (Fig.~\ref{jc-intro} (b)),
211: which includes only people and edges between them (in social network theory,
212: such a graph shows direct relationships, but here two connected people
213: need not know one another and their connection depends on the jump). The
214: {\it recommender graph} (Fig.~\ref{jc-intro} (c)) orients the edges in the social
215: network graph and adds back the movies. An algorithm can then find paths from
216: a person making a query to a person who has rated the movie of interest.
217: Note that Fig.~\ref{jc-intro} illustrates only one way of jumping
218: --- other jumps are identified by
219: Mirza \cite{batul-thesis}.
220: %In our analysis below, it is sufficient to
221: %consider only the social network graph.
222:
223: \begin{figure}
224: \centering
225: \begin{tabular}{cc}
226: & \mbox{\psfig{figure=skip.eps,width=5in}}
227: \end{tabular}
228: \caption{Illustration of the \emph{skip} jump. (a) bipartite graph
229: of people and movies. (b) Social network graph, and (c)
230: recommender graph.}
231: \label{jc-intro}
232: \end{figure}
233:
234: In this article, we restrict our attention to {\it hammock} jumps. A hammock
235: jump of width $w$ connects two people if they have rated at least $w$
236: movies in common (a skip is a hammock jump of width one). A hammock
237: path of length $l$ is a sequence of $l$ hammocks, as illustrated in
238: Fig.~\ref{hammock-pic}. Our hypothesis is that hammock jumps underlie most
239: recommendation approaches, and at the very least can be used as the basis to
240: design metrics
241: for studying privacy issues.
242: Note that nearest neighbor algorithms (e.g., GroupLens~\cite{konstan1},
243: LikeMinds, and Firefly) use an implicit hammock sequence of length 1.
244: The `horting' algorithm of Aggarwal et al.~\cite{aggarwal1} uses
245: sequences of explicit hammock-like jumps.
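
As a small worked illustration of the hammock jump, consider three hypothetical
people $A$, $B$, and $C$ and five hypothetical movies $m_1, \ldots, m_5$ (these
names, and the notation $R_X$ for the set of movies rated by person $X$, are
assumptions introduced only for this example). Suppose
$$R_A = \{m_1, m_2, m_3\}, \qquad R_B = \{m_2, m_3, m_4\}, \qquad R_C = \{m_4, m_5\}.$$
The numbers of movies rated in common are then
$$|R_A \cap R_B| = 2, \qquad |R_B \cap R_C| = 1, \qquad |R_A \cap R_C| = 0.$$
With $w = 1$ (a skip), the social network graph has the edges $A$--$B$ and
$B$--$C$, so $m_5$ is reachable from $A$ through a hammock path of length
$l = 2$. With $w = 2$, only the edge $A$--$B$ remains; $C$ is isolated and
$m_5$ is no longer reachable from $A$. In this toy setting, raising $w$ removes
exactly the connection that made the recommendation possible.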
246:
247: \begin{figure}
248: \centering
249: \begin{tabular}{cc}
250: & \mbox{\psfig{figure=hammocks.eps,width=5in}}
251: \end{tabular}
252: \caption{A path of hammock jumps, with a hammock width
253: $w=4$.}
254: \label{hammock-pic}
255: \end{figure}
256:
257: \begin{figure}
258: \centering
259: \begin{tabular}{cc}
260: \mbox{\psfig{figure=likeminds.epsi,width=2.3in}} &
261: \mbox{\psfig{figure=horting.epsi,width=2.3in}}\\
262: \end{tabular}
263: \caption{(left) Influence of hammock width $w$ on quality of recommendation.
264: The annotations denote the fraction of people and movies reachable for
265: different values of the hammock width. (right) Influence of
266: hammock path length $l$ on quality of recommendations. The annotations denote
267: the number of recommendations possible for each value of $l$.}
268: \label{expt1}
269: \end{figure}
270:
271: Our model completely ignores accuracy of predicted values of
272: ratings, and instead focuses on the parameters of hammock width and
273: path length. This is because
274: if recommendation is truly a matter of making (the right) connections, then
275: a recommendation of a particular movie for a given person can be
276: characterized by values for the hammock width $w$ and the hammock path
277: length $l$. Notice that we do
278: not emphasize how individual ratings for the nodes (movies) spanning a
279: hammock are transformed into a prediction.
280: In addition, it is
281: highly likely that there are multiple paths between the same set of
282: nodes, with various constraints on $w$ and $l$. Intuitively, since
283: considering more common ratings can be beneficial (see
284: \cite{herlocker} for approaches) having a wider hammock could be better (this
285: has to be carefully done when correlations between ratings are
286: considered \cite{herlocker}). But if we insist on a wide hammock, we
287: might have to traverse longer paths to reach a particular
288: movie from a given person~\cite{batul-thesis}. However, recommendations
289: involving shorter
290: path lengths are preferred, for reasons of explainability, over longer paths.
291: From a graph-theoretic point of view, $w$ and $l$ thus qualify the
292: reachability of different movies from a given person, and indirectly provide
293: a measure of the expected quality of predictions.
294:
295: Preliminary analysis of the relationship between $w$, $l$, and predictive
296: accuracy supports this intuition. Fig.~\ref{expt1} (left) shows a plot
297: of the average discrepancy between predicted and actual ratings for
298: each hammock width when using the LikeMinds algorithm (as described
299: in~\cite{aggarwal1}). These results were determined by a leave-one-out study,
300: where an available rating was masked, and a prediction was made for that
301: rating (using the remaining data). The number of common ratings between
302: the given person and the person with the highest agreement scalar (and who
303: contributed to the recommendation) was used as the hammock width. The
results indicate that, for the LikeMinds algorithm, wider hammocks
contribute to better predictions.
Notice that LikeMinds's hammocks do not just model commonality; they also
posit agreement between the rating values spanning a hammock.
308: While it is certainly true that we can get a poor quality
309: recommendation even with a wide hammock (involving perhaps noisy ratings or
310: a faulty aggregation procedure),
311: overall quality of predictions is influenced by greater hammock widths.
312: However, increasing the hammock width results in a progressive
313: disconnection of the social network graph into many components. As a result,
314: fewer and fewer connections can be made --- Fig.~\ref{expt1} (left) also
315: lists the fraction of people and movies reachable for various levels of hammock
width. A $w$ of 53, for instance, reaches only 48\% of the
people and 93\% of the movies. By the time the abrupt improvement
in agreement values is observed (after $w \approx 110$), fewer than 25\%
of the people and only about 86\% of the movies are reachable.
320:
321: Fig.~\ref{expt1} (right) describes the results
322: of an experiment where a minimum hammock width constraint was set
323: at $w=113$ (according to the LikeMinds definition)
324: and the resulting recommender graph was analyzed for
325: paths of varying lengths from people to movies.
326: We used the transformation
327: technique described in \cite{aggarwal1} to make predictions of ratings from
328: others' ratings, once again using the leave-one-out method.
329: Paths in the recommender
330: graph involve 1, 2, or 3 hops to the person providing a recommendation
331: and a final hop to the movie being recommended (hence the bucketing of values
332: into 2, 3, and 4 in Fig.~\ref{expt1}, right).
333: As can be seen,
334: greater lengths (for the same $w$) cause a faster-than-linear
335: decay in the quality of predictions. We should caution that
336: horting~\cite{aggarwal1} may exhibit different behavior, though we still
337: expect longer paths to be of lower quality.
338:
339: These results support the intuition that wider hammocks and shorter paths
provide better predictions. Hammock widths are determined by rating patterns
341: that ensure significant overlap. We see this in the
342: MovieLens dataset for which each participant rates a minimum of 20 movies
343: and which has a connected social network graph for all
344: $w \le 17$~\cite{batul-thesis}.
345:
346: The primary cause of shorter paths is having more connections in the graph.
347: In the MovieLens dataset,
348: a recommendation is almost always possible using
349: a path of length no longer than 3. This is due to the power-law degree
350: distribution of the rating patterns. Other graphs, such
351: as {\it small-world networks} \cite{watts1}, have
352: small clusters of vertices that are connected by relatively few edges.
353: `Weak ties' are important in both these situations
354: because they make some recommendations possible, and provide others with
355: shorter paths. Therefore, weak ties are very important to recommendation.
356:
357: \section{Ties: Strong, Weak, and Brave}
358: \label{weak-ties}
359: In contrast to weak ties\footnote{It is important to note that there is
360: nothing fundamentally feeble or fragile about a weak tie; a weak tie
361: creates a powerful and robust link between nodes from different neighborhoods.},
362: a strong tie connects two people who share
363: many associations (like in a family or some other close-knit group).
364: We can think of weak ties as forming bridges between groups of people
365: who would otherwise not interact. Of course, strength and weakness
366: are relative, and there is no agreed definition of what a strong tie is
367: in terms of the number of shared associations.
368:
369: \begin{figure}
370: \centering
371: \begin{tabular}{cc}
372: & \mbox{\psfig{figure=triad.epsi,width=2.5in}}
373: \end{tabular}
374: \caption{Strong ties in a social network graph (right)
375: induced by a hammock jump on a recommendation dataset (left) with $w=2$.}
376: \label{figtriad}
377: \end{figure}
378:
379: In a graph, strong ties are characterized by a triangle of vertices
380: (a \emph{triad} in the social network literature).
381: Fig.~\ref{figtriad}
382: illustrates how these triads can occur in a social
383: network graph induced by a hammock jump.
384: In this case, the width of the hammock jump is 2, and what looks like
385: two relationships becomes three in the social network graph.
386: Notice that, in this example, if the hammock jump width were three,
387: then the resulting social network would not have a triad and so
388: neither edge would represent a strong tie.
389: It is a classical argument in social network theory that
390: no strong tie can be a bridge
391: and that two strong ties would imply a third tie~\cite{weak}.
392:
393: Weak ties are of most interest to us, because they are the foundation
394: for our notion of risk.
395: As discussed earlier,
396: a weak tie in a social setting allows people to identify someone with
397: other information that they've been given.
398: Weak ties occur simply because someone knows
399: someone else outside of their usual circle of friends; or perhaps
400: there is a person (an `outsider') who is friends with a few people who
401: each have strong(er) ties to people in different groups.
402:
403:
404: \begin{figure}
405: \centering
406: \begin{tabular}{cc}
407: \mbox{\psfig{figure=powerlaw.eps,width=2.5in}} &
408: \mbox{\psfig{figure=goodloops.eps,width=2.5in}}
409: \end{tabular}
410: \vspace{0.3in}
411: \begin{tabular}{cc}
412: \mbox{\psfig{figure=componentpl.eps,width=2.5in}} &
413: \mbox{\psfig{figure=badloops.eps,width=2.5in}}
414: \end{tabular}
415: \caption{Two different types of induced social networks that can exhibit weak
ties. (top left) A dataset with a power-law degree distribution induces a low-risk
417: social network (top right) where increasing hammock widths cause
418: a `nested clam shells' picture. Each circle in the social network picture
419: denotes a group of people brought together. Increasing hammock widths cause
420: the circles to get progressively smaller.
421: (bottom left) A dataset with power-laws in only subgraphs and a few weak
422: ties induces a high-risk social network (bottom right) characterized by
423: the breakdown of a connected network into disconnected networks.
424: Some experimental data supporting the
425: diagrams above can be found in~\cite{batul-thesis}.}
426: \label{graphs-risks}
427: \end{figure}
428:
429: In recommendation, weak ties originate from the rating patterns of the
430: participants, but the jump process also plays a crucial role. We
431: hypothesize two fundamental rating patterns. One can be observed
432: in the public movie recommendation datasets (MovieLens and EachMovie), and
433: the other is what we would assume for a domain where people have stronger
434: bias in their tastes (such as books or music).
435:
436: The movie datasets exhibit a power-law degree distribution as illustrated
437: in Fig.~\ref{graphs-risks} (top, left). The power-law rating pattern comes from
438: preferential attachment; for example, some movies (the {\it hits}) are
439: rated by almost everyone, and some people (the {\it buffs}) rate almost
440: all movies. Weak ties are rare in this setting but might occur when a person
441: shows no strong allegiance to any genre and rates relatively few movies in
each (he/she would not be a buff). The real risk to these people is that
they might not have enough ratings in common with anyone to be
given recommendations.
445:
446: The second rating pattern would occur where most people exhibit a preference
447: for a particular kind of artifact. This is illustrated in Fig.~\ref{graphs-risks} (bottom, left) where there are three subgraphs with power-law structures,
448: connected by a relatively small number of ratings. This diagram illustrates
one source of weak ties in this setting: someone who
450: ordinarily only rates artifacts in one domain (e.g., networking books)
451: rates an artifact in another domain (e.g., Indian classical music books).
452: Another
453: possibility is someone with more eclectic tastes who rates artifacts across
many domains and who, unlike in the power-law graph, is truly a weak tie. The risk
455: with weak ties in this rating pattern is that they may allow us to
456: identify a person whose ratings can get us from one domain to another.
457:
458: The jump process, described in Section~\ref{jc-model},
459: can also create weak ties when using common ratings as the basis for making
460: connections between people.
461: Many people might have rated
462: across several domains, but only a few have enough ratings to satisfy the
463: jump being used. A final reason relates to merging of data collected from
464: different settings. For instance, the recent purchase of eToys consumer
465: data by another retail giant signals the possibility of the creation of
466: a social network graph with weak ties.
467:
468: The risk in a weak tie really comes from being the only person with a peculiar
469: rating pattern --- there is safety in numbers, or at least in
470: homogeneous tastes (as in power-law graphs).
471: The more people who rate the same kinds of things, the less likely
472: that any one of them will be identifiable as participating in a weak tie.
473: But notice that if the jump definition weeds some of those people out,
474: the risk is still there (although it is less likely that any additional
475: information could be used to identify a single person).
476:
477: \section{The Benefits and Perils of Personalization}
478: \label{benefits-risks}
Intuitively, a user derives the most benefit from a recommendation that is
480: based on wider hammocks and shorter path lengths. Of course, to get these
481: qualities we have to provide more ratings, and the risk is that we might
introduce a weak tie. The question, then, is whether we can relate how much
we rate to the benefit and risk inherent in recommendation.
484:
485: \begin{table}
486: \caption {Movies used in analyzing the benefits of ratings on personalization.
487: `Star Wars' and `Scream of Stone' had the highest and lowest number of ratings,
488: respectively.}
489: %\hspace{25mm}
490: \centering
491: \vspace{0.07in}
492: \begin{tabular}{|l|r|} \hline\hline
493: \emph{Movie Name} & \emph {Number of Ratings} \\ \hline
494: Star Wars & 583\\ \hline
Tomorrow Never Dies & 180 \\ \hline
496: Robin Hood: Men in Tights & 56 \\ \hline
497: Scream of Stone & 1 \\ \hline
498: \end{tabular}
499: \label{movies-listing}
500: \end{table}
501:
502: When there are multiple recommendation paths between a given combination
503: of person and movie, we would like a benefit formula that captures our
504: preference for wider hammocks and shorter path lengths.
505: By defining the benefit of a recommendation as:
506: $$\mathrm{benefit} = {w \over{l^2}}$$
507: we can give
508: more weight to improvements in path length from $2$ to $1$ than,
509: say, from $3$ to $2$. This non-linear dependence of quality of interaction
510: on the length is supported by research in diffusion processes~\cite{watts1},
511: social networks~\cite{weak} and also our own experiments (see Fig.~\ref{expt1},
512: right).
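
For a concrete sense of this weighting, consider a fixed hammock width of
$w = 4$ (a value chosen here only for illustration). The benefit for path
lengths $l = 1, 2, 3$ is
$$ {4 \over {1^2}} = 4, \qquad {4 \over {2^2}} = 1, \qquad {4 \over {3^2}} \approx 0.44. $$
Shortening a path from $l = 2$ to $l = 1$ raises the benefit by $3$, whereas
shortening it from $l = 3$ to $l = 2$ raises the benefit by only about $0.56$.
Equivalently, under this formula a doubling of $l$ must be offset by a
fourfold increase in $w$ to leave the benefit unchanged.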
513:
514: We can explore benefit in terms of the number of artifacts that are rated.
515: Typically, recommender systems require that users rate a minimum number of
516: artifacts before they can make queries, and so we look at the incremental
517: benefit received by providing additional ratings. For this purpose, we
analyze the MovieLens dataset, where it is required that a user rate at least 20 movies,
and add a new person. The MovieLens dataset consists of 943 people and 1682
movies, and is connected as a graph.
521:
For the experiment, we introduced a 944th person and incrementally added ratings
from the new person to movies (chosen so that movies with a higher rating
frequency were more likely to be rated). After each rating was added, the path
525: lengths $l$ to particular movies (see Table~\ref{movies-listing}) were computed
526: for each hammock width $w$. Twenty repetitions were performed for each
527: additional rating.
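
The exact sampling rule is not spelled out here. One simple rule consistent
with the description, offered only as an assumption, is to choose movie $m$
with probability proportional to its existing number of ratings $n_m$ (the
symbol $n_m$ is introduced for this sketch):
$$ P(m) = {{n_m} \over {\sum_{m'} n_{m'}}} $$
Under this assumed rule, and using the counts in Table~\ref{movies-listing},
the new person would be about $583/180 \approx 3.24$ times as likely to rate
`Star Wars' as `Tomorrow Never Dies' at any given step, and $583$ times as
likely to rate `Star Wars' as `Scream of Stone.'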
528:
529: \begin{figure}
530: \centering
531: \begin{tabular}{cc}
532: & \mbox{\psfig{figure=densityplot.eps,width=2.3in}}
533: \end{tabular}
534: \caption{Benefit vs. number of additional ratings required, for various
535: choices of movie destination nodes. The cells are colored with greater
536: intensities corresponding to movies with fewer ratings.}
537: \label{expt3}
538: \end{figure}
539:
540:
541: The benefit from additional ratings for the movies in
542: Table~\ref{movies-listing} is shown in Fig.~\ref{expt3}. Each colored cell
543: indicates that the particular benefit is possible for the corresponding
number of ratings. The feasible benefit regions grow monotonically
with the popularity of the movie, with more possibilities for
`Star Wars' than for `Tomorrow Never Dies.' The plot shows that if you
547: want a good recommendation for a less popular movie, you need to provide more
548: ratings, but can receive good recommendations for popular movies with
549: fewer ratings. In particular, requesting an improvement in benefit for
550: a `Star Wars' recommendation from
551: 5 to 14 requires no extra ratings!
552:
553: \begin{figure}
554: \centering
555: \begin{tabular}{cc}
556: & \mbox{\psfig{figure=small-world.eps,width=5in}}
557: \end{tabular}
558: \caption{Random rewiring, starting from a regular wreath network, introduces
559: weak ties that help model small-world graphs.
560: Figure adapted from \cite{watts1}.}
561: \label{small}
562: \end{figure}
563:
564: \begin{figure} \centering
565: \begin{tabular}{cc}
566: \mbox{\psfig{figure=small-world-graphs.eps,width=2.6in}} &
567: \mbox{\psfig{figure=expt2.epsi,width=2.6in}}\\
568: \end{tabular}
569: \caption{(left) Average path length and clustering coefficient versus
570: the rewiring probability $p$ (from \cite{watts1}). All measurements are
571: scaled w.r.t. the values at $p = 0$. (right) Quantifying the risk
572: as a function of rewiring probability $p$.}
573: \label{smgs}
574: \end{figure}
575:
576: The danger involved
577: in recommendation relates to the probability that a weak connection is
578: exposed; unfortunately, this is not a static property of a recommendation
579: path and can only be studied in reference
580: to the social network graph {\it in the absence} of the considered
581: connection. This means that we need a more complete understanding of the
582: dynamics by which weak ties are introduced, modeled, and employed in a social
583: network. Such an understanding
584: could take the form of a graph-generation
585: model. Here we use the model of
586: Watts and Strogatz \cite{watts1} as a basis for our study of risk.
587:
588: The intuition is that risk occurs when we have edges that are weak ties between
589: subgraphs that are cliques (or at least nearly so), and the risk decreases
590: as more of these edges are added. In particular, the risk is highest when
a new weak tie occurs and the path lengths between people decrease dramatically.
592: As more weak ties are added, the risk decreases.
593:
594: This idea of risk can be explored in the
595: Watts-Strogatz model for small-world networks. They show how to
596: generate a spectrum of graphs from a regular wreath graph by adjusting
597: the probability $p$ of rewiring an edge
598: (Fig.~\ref{small}). When $p$ is zero, we have the wreath; but when $p$ is
one, we have a random graph. The risk from weak ties is low in both the wreath
and random graphs, but increases when the average path length drops while
the vertices remain clustered. Fig.~\ref{smgs} (left) illustrates
602: the relationship between length and clustering (see~\cite{watts1} for
603: details of the definitions). When $p$ is between $0$ and $0.1$, the
604: graph is a small-world network, and poses the most risk from weak ties.
605:
606: We can express the risk of weak ties in terms of $p$: the risk in having ratings
607: that form weak ties can be quantified as the rate at which $l$ reduces, as
608: a function of $p$:
609: $$\mathrm{risk} = {- {{\partial l} \over {\partial p}}}$$
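
When $l$ is known only at sampled values $p_1 < p_2 < \cdots$ of the rewiring
probability, one way to estimate this derivative is a forward difference (an
assumption made here for illustration; the sample points $p_i$ are
hypothetical and other difference schemes would serve equally well):
$$\mathrm{risk}(p_i) \approx {- {{l(p_{i+1}) - l(p_i)} \over {p_{i+1} - p_i}}}$$
Whenever $l$ decreases as $p$ grows, this estimate is positive, and it is
largest where a small increase in $p$ produces the largest drop in $l$.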
610:
611: The risk for the dynamics described in Fig.~\ref{small} is given in
612: Fig.~\ref{smgs}, right (the length values are
613: scaled with respect to the length at $p=0$ before calculating the risk).
614: Notice that the risk increases
615: rapidly (as weak ties
616: are introduced) and drops down gradually (as more weak ties share
617: responsibility for length reduction). This captures our intuition pertaining
618: to disclosure of sensitive information by ferreting out weak ties.
619: However, our jumping connections model is not directly parameterized by $p$.
620:
621: To be useful, the above formula for risk must relate length reduction to
622: a metric that could be used to balance personalization and privacy. We
623: can illustrate the risk of becoming a weak tie by studying what happens
624: as we decrease the hammock width $w$ in Fig.~\ref{graphs-risks} (bottom).
Consider the situation in which the social network graph consists of three
disconnected components.
Decreasing the hammock width would introduce new edges that
are weak ties, which would contribute to length reduction and thus
to risk. However, recall that increasing the hammock width is
desirable from the viewpoint of benefit. Put another way, benefits improve
monotonically with increasing width $w$, but risk rises rapidly (as
fewer weak ties share responsibility for length reduction) up to a point and
then drops sharply.
634:
635: \begin{figure}
636: \centering
637: \begin{tabular}{cc}
638: & \mbox{\psfig{figure=expt4.ps,width=3in}}
639: \end{tabular}
640: \caption{Risk as a function of hammock width $w$.}
641: \label{expt4}
642: \end{figure}
643:
644: To explore this setting, we created an artificial dataset that
645: consists of three subgraphs with power-law degree distributions, each
with 200 people and 75 artifact vertices. Each person node is linked to
at least 15 artifact nodes within the same subgraph. Specifically, the
648: people and artifacts were ordered, and the $b^{th}$ person rated the first
649: $\lceil 75 b^{-\epsilon} \rceil$ artifacts.
The value of $\epsilon$
was calibrated so that the minimum number of artifacts rated is $15$. Then three
652: extra people were added who rate
653: (at most 15)
654: artifacts in all three connected components, again with a
655: `master' power-law.
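
As a rough check on the calibration of $\epsilon$ within each subgraph, and
assuming that the minimum of $15$ ratings is attained by the last person
($b = 200$; this reading is our assumption), the condition
$\lceil 75 \cdot 200^{-\epsilon} \rceil = 15$ is equivalent to
$14 < 75 \cdot 200^{-\epsilon} \le 15$, which gives
$$\frac{\ln 5}{\ln 200} \le \epsilon < \frac{\ln (75/14)}{\ln 200},$$
or roughly $0.304 \le \epsilon < 0.317$. For any $\epsilon$ in this range, the
first person ($b=1$) rates all $75$ artifacts, and the number of ratings
decays as a power law in $b$ down to $15$.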
656:
657: For a hammock width of 9, the social network of this graph
consists of three disconnected components. Decreasing the hammock
width introduces weak ties into the social network, and the
path lengths decrease. The results are plotted in Fig.~\ref{expt4}
661: (lengths are scaled against the path length for $w=8$). As could
662: be expected, risk is highest when the graph is first connected.
663:
664: It is not possible to provide a traditional benefit-risk profile, as is
665: customary in analysis. This is because recommender systems aggregate the
ratings of many participants when computing a recommendation. A user's
benefit comes from `plugging into' the social network by
668: providing a sufficient number of ratings, but a user's risk depends
669: not only on what is rated but also
670: on what other people rate. Ultimately, the difficulty
671: comes from the fact that risk occurs even if recommendation queries are
672: not made, but benefit requires that the user make queries.
673:
The two qualitative conclusions from our studies are that (i) a few
weak ties pose more risk than many weak
ties, and (ii) this effect is more pronounced in some (induced) social networks than in others.
677:
678: \section{Concluding Remarks}
679: \label{bigger-pictures}
The very factors that make weak ties useful are the ones that
raise the threat to privacy. We have demonstrated that under certain conditions,
recommendations could involve weak ties and could potentially compromise
the privacy of individuals. As with most problems in computer security, the
ideal deterrents are better awareness of the issues and more openness in
how recommender systems operate in the marketplace. In particular, policies
686: and methodologies employed by an individual site should be made clear.
687: Sites that involve multiple homogeneous networks have a crucial responsibility
688: in clarifying the role of weak ties in their system designs and what forms of
689: mechanisms are in place to thwart hackers.
690:
691: Ideally, recommender systems should convey to the user both benefits and risks
692: in an intuitive manner. One possibility is to present the user with plots of
693: benefit and risk versus user-modifiable parameters --- ratings, $w$, and $l$
694: (if the algorithm allows their direct specification). Another possibility is
695: to qualify the risks and benefits associated with rating each individual
artifact (as a function of the previous ratings in the system). Providing
a rating for `Scream of Stone', for instance, would provide more dramatic improvements
in benefit than providing a rating for `Star Wars.' At the same time,
the system should qualify the extent to which a user becomes a weak tie
by such a rating.
701:
Singh and colleagues \cite{singh-cacm} make a thought-provoking observation in
drawing comparisons from community-based networks to recommender systems:
people really want to control to whom they reveal their ratings
but would like to know how recommendations are being made. In a distributed
706: setting, one can imagine a scenario where people specify how data collected
707: from their interactions should be modeled and used. Interfaces for
708: privacy management are woefully inadequate and their role is only now
709: being recognized \cite{etzioni-cacm}. Extending the results here to
710: a distributed setting where people can set arbitrary constraints on their
station in the social network graph (Are they willing to participate
in a path? Are there constraints on such participation? Would they provide
ratings if they knew that it would contribute to a weak tie?) is a possible
714: direction for future research.
715:
One wonders whether weak ties will form at all once concerns are raised about
their compromise. Social network theory postulates that they are
718: the primary mechanisms by which micro-level interactions can manifest at
719: macro levels, and that such ties will be kindled whenever communities have
720: to be mobilized for collective action. It remains to be seen if weak ties
721: induced by jumps in a recommender system also conform to similar
722: distributed organization.
723:
724: \bibliographystyle{plain}
725: %\bibliographystyle{named}
726: \bibliography{ppp}
727:
728: \end{document}
729:
730: