% 0105:cs0105028/ppp.tex
\documentclass[11pt]{article}

%\usepackage{ijcai01}
%\usepackage{fullpage,palatino}
\usepackage{fullpage}
\setlength{\oddsidemargin}{-0.25in}
\setlength{\evensidemargin}{-0.25in}
\setlength{\topmargin}{0.5in}
\setlength{\headheight}{0pt}
\setlength{\headsep}{0pt}
\setlength{\footskip}{0pt}
\setlength{\textheight}{8.75in}
\setlength{\textwidth}{7in}
\setlength{\marginparwidth}{0in}
\setlength{\marginparsep}{0in}
\newenvironment{descit}[1]{\begin{quote} \textit{#1}}{\end{quote}}

\input{psfig-dvips}

\newif\ifpdf
\ifx\pdfoutput\undefined
  \pdffalse
\else
  \pdfoutput=1
  \pdftrue
\fi

\ifpdf
  \usepackage[pdftex]{graphicx}
  \usepackage[pdftex]{color}
  \DeclareGraphicsExtensions{.pdf,.png,.jpg}
\else
  \usepackage[dvips]{graphicx}
  \usepackage[dvips]{color}
  \DeclareGraphicsExtensions{.eps,.epsi,.ps}
\fi

\usepackage{times}
%\usepackage{fancyheadings}

%\pagestyle{plain}
\thispagestyle{empty}
\pagestyle{empty}

\def\midv{\mathop{\,|\,}}
\newtheorem{defn}{Definition}
\long\def\cbk#1{{\color{red}[CBK: #1]}}
\newlength\colwidth \setlength\colwidth{3.25in}

\title{When being Weak is Brave:\\
Privacy Issues in Recommender Systems}

\author{Naren Ramakrishnan, Benjamin J. Keller, and Batul J. Mirza\\
Department of Computer Science\\
Virginia Tech, VA 24061\\
Email: \{naren,keller,bmirza\}@cs.vt.edu\\
\medskip
\\
Ananth Y. Grama\\
Department of Computer Sciences\\
Purdue University, IN 47907\\
Email: ayg@cs.purdue.edu\\
\medskip
\\
George Karypis\\
Department of Computer Science\\
University of Minnesota, MN 55455\\
Email: karypis@cs.umn.edu}

\begin{document}

\maketitle
\thispagestyle{empty}
\pagestyle{empty}

\begin{abstract}
\noindent
We explore the conflict between personalization and privacy that arises
from the existence of weak ties. A weak tie is an unexpected connection that
provides serendipitous recommendations. However, information about weak ties
could be used in conjunction with
other sources of data to uncover identities and reveal other personal
information. In this article, we use a graph-theoretic model to study the
benefit and risk from weak ties.
\end{abstract}
\newpage

\section{Introduction}
Privacy in Internet services is typically thought of in terms of protecting
attributes of users (and can thus be related to solutions in database security).
However, information provided by a recommender system can also allow the
privacy of some of its users to be compromised when it is used in conjunction with
other information. For example, consider a system
that recommends books by finding correlations between
a user's ratings and those of other participants for the same books (a
nearest neighbor algorithm~\cite{herlocker}). Suppose that person X, as a user,
rates only books on computer networking but also has an interest in Indian
classical music. We could
imagine the following dialogue between person X and the recommender system:

\begin{descit}
{\bf person X:} Would I like ``Evolution of Indian Classical Music''?\\
{\bf recommender:} Yes.\\
{\bf person X:} Why? (surprised)\\
{\bf recommender:} People who liked the books you have liked also liked this book.
\end{descit}

\noindent
This is an example of {\it serendipity} in recommendation: person X
does not expect
to receive a recommendation for a book on a topic outside of those
that he/she has rated. In fact, however, the recommendation was not
provided by luck: it means that at least one person has rated this book
in addition to the books that person X has rated (and possibly others).
The explanation conveys this, but it also reveals that the algorithm used
is a nearest neighbor algorithm. The serendipity, if person X is malicious,
is that he/she has found a candidate
{\it weak tie.}

In social network theory, a tie is a relationship between people. The strength
of a tie is measured in terms of the number of shared associations.
A family is an example of strong ties: a child knows her mother and her father, and her
parents know each other. Weak ties, on the other hand,
form bridges between two groups of people
who would not otherwise interact.
Most importantly for us, weak ties allow us to deduce identities. For instance,
one may know that a particular computer science
department has a networking professor
who has an interest in Indian classical music.
This professor thus provides a tie between
university faculty and Indian classical music
aficionados. Consequently, upon meeting the {\it only}
Indian networking professor in that department,
you can safely identify this person as the Indian classical music enthusiast.

A similar situation occurs in recommendation, where weak ties provide the
opportunity for serendipitous recommendations. In our example, a candidate
weak tie has been found between networking books and books on Indian
classical music. We do not know whether this is truly a weak tie, but we can test it by
varying what we rate (the process would involve masquerading as different users
and rating different sets of books to probe around the original ratings).
If the query fails for most of these variations on the ratings,
then we can be more confident that we have a weak tie. In fact, the
more restrictive the set of ratings that yields the positive recommendation,
the more confident we can be that a single person has rated the
books.

Our ability to find such a minimum set of ratings is where the risk lies:
we can use this rating set, together with further queries, to determine what
the other person has rated, and perhaps match this information against our
knowledge of people who use the system. In the end, it is conceivable that we could identify
our hypothetical Indian music enthusiast/networking professor in the
recommender system and determine what he may have rated. If the system
allows us to vary the ratings, then we might be able to estimate the
person's ratings as well.

In this example, a person actually forms a tie between books through his or her
ratings: one book relates to another if both have been rated by
the same person. The alternate view that we will take is that there is a tie between
two people if they have rated some number of books in common. So, in the
example, the Indian music enthusiast/networking professor has ties to people who
rate networking books and to people who rate books on Indian classical music.
The ties to people who rate Indian classical music books are weak only
if there are relatively few people who have the same tastes.

%\subsection{Modeling Risk}
Clearly, the task of identifying someone through their ratings is difficult,
and it is even harder if more people have similar rating patterns. But the
above example illustrates the risk that may exist even in simple
recommendation. The risk falls mainly on people who
participate in weak ties, because they are the people who could most easily
be identified. In some application domains (e.g., voting preferences and membership
on boards), even knowing that a weak tie
exists constitutes a breach of privacy.

\subsection*{Our Approach}
Our goal is to model the benefit to, and the risk for, users who participate
in weak ties. In particular, we
would like to characterize benefits and risks independently of any particular
algorithm. To achieve this, we use the model of
`jumping connections' \cite{batul-thesis},
which casts recommendation as making a series of jumps between people based on
common ratings (in nearest neighbor recommendation, there is only one jump).
%The result is a model that posits (indirect)
%connections between people.
We describe how this model relates to
conventionally accepted evaluation metrics (Section~\ref{jc-model}). Using
this model, we can then identify causes of
weak ties in terms of the rating patterns of people (Section~\ref{weak-ties}).
Furthermore, the model
allows us to qualify benefits in terms of reachability in a graph, and risk
in terms of weak ties (Section~\ref{benefits-risks}). Finally
(Section~\ref{bigger-pictures}), we look at how policies and social structures
can be designed to support and enable recommender systems.

\section{Recommendation: The Jumping Connections Model}
\label{jc-model}
Recommendation algorithms work in a wide variety of ways, from forms of graph
search to learning. This variety makes it difficult to
study the risks of recommendation in general. However, we can think of
recommendation as making connections between people who have rated artifacts
in common. We represent people, the artifacts they rated (e.g., movies),
and their ratings as a bipartite graph (Fig.~\ref{jc-intro} (a)). An
algorithm can then {\it jump} over the common artifacts to form a
connection between two people. Fig.~\ref{jc-intro} illustrates a skip
jump, in which two people are brought together if they rate at least
one movie in common.

A jump induces a {\it social network graph} (Fig.~\ref{jc-intro} (b)),
which includes only people and edges between them (in social network theory,
such a graph shows direct relationships, but here two connected people
need not know one another, and their connection depends on the jump). The
{\it recommender graph} (Fig.~\ref{jc-intro} (c)) orients the edges in the social
network graph and adds back the movies. An algorithm can then find paths from
a person making a query to a person who has rated the movie of interest.
Note that Fig.~\ref{jc-intro} illustrates only one way of jumping;
other jumps are identified by
Mirza \cite{batul-thesis}.
%In our analysis below, it is sufficient to
%consider only the social network graph.

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=skip.eps,width=5in}}
\end{tabular}
\caption{Illustration of the \emph{skip} jump. (a) Bipartite graph
of people and movies, (b) social network graph, and (c)
recommender graph.}
\label{jc-intro}
\end{figure}

In this article, we restrict our attention to {\it hammock} jumps. A hammock
jump of width $w$ connects two people if they have rated at least $w$
movies in common (a skip is a hammock jump of width one). A hammock
path of length $l$ is a sequence of $l$ hammocks, as illustrated in
Fig.~\ref{hammock-pic}. Our hypothesis is that hammock jumps underlie most
recommendation approaches and, at the very least, can be used as the basis
for designing metrics
for studying privacy issues.
Note that nearest neighbor algorithms (e.g., GroupLens~\cite{konstan1},
LikeMinds, and Firefly) use an implicit hammock sequence of length 1.
The `horting' algorithm of Aggarwal et al.~\cite{aggarwal1} uses
sequences of explicit hammock-like jumps.
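
The hammock jump and the hammock path length can be made concrete with a
short program. The Python sketch below is our own illustration under stated
assumptions, not code from any of the systems cited above: ratings are assumed
to be stored as a dictionary mapping each person to the set of movies he or
she rated (rating values are ignored, as in the model), and the function names
\texttt{hammock\_graph} and \texttt{hammock\_path\_length} are hypothetical.
The path length $l$ is taken to be the fewest hammock jumps, found by
breadth-first search, from the querying person to anyone who has rated the
target movie.

\begin{verbatim}
from collections import deque
from itertools import combinations

def hammock_graph(ratings, w):
    """Social network graph induced by hammock jumps of width w.

    ratings: dict person -> set of rated movies (assumed format).
    Two people are joined if they have rated at least w movies in common.
    """
    adj = {p: set() for p in ratings}
    for a, b in combinations(ratings, 2):
        if len(ratings[a] & ratings[b]) >= w:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def hammock_path_length(ratings, w, source, movie):
    """Fewest hammock jumps of width w from `source` to someone who rated
    `movie`; returns None if no such person is reachable."""
    adj = hammock_graph(ratings, w)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        # The querying person does not count as his or her own recommender.
        if person != source and movie in ratings[person]:
            return dist[person]
        for nbr in adj[person]:
            if nbr not in dist:
                dist[nbr] = dist[person] + 1
                queue.append(nbr)
    return None

# Toy data: X rates networking books; A also rates one music book.
ratings = {
    "X": {"net1", "net2", "net3"},
    "A": {"net1", "net2", "net3", "music1"},
    "B": {"music1", "music2", "raga"},
}
print(hammock_path_length(ratings, 1, "X", "raga"))   # 2 (X -> A -> B)
print(hammock_path_length(ratings, 2, "X", "raga"))   # None
\end{verbatim}

Raising $w$ from 1 to 2 in this toy example removes the only bridge between
the networking raters and the music raters, so the music book becomes
unreachable from X: the disconnecting effect of wide hammocks in miniature.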

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=hammocks.eps,width=5in}}
\end{tabular}
\caption{A path of hammock jumps, with a hammock width
$w=4$.}
\label{hammock-pic}
\end{figure}

\begin{figure}
\centering
\begin{tabular}{cc}
\mbox{\psfig{figure=likeminds.epsi,width=2.3in}} &
\mbox{\psfig{figure=horting.epsi,width=2.3in}}\\
\end{tabular}
\caption{(left) Influence of hammock width $w$ on quality of recommendation.
The annotations denote the fraction of people and movies reachable for
different values of the hammock width. (right) Influence of
hammock path length $l$ on quality of recommendations. The annotations denote
the number of recommendations possible for each value of $l$.}
\label{expt1}
\end{figure}

Our model completely ignores the accuracy of predicted rating values
and instead focuses on two parameters: hammock width and
path length. The reason is that
if recommendation is truly a matter of making (the right) connections, then
a recommendation of a particular movie for a given person can be
characterized by values of the hammock width $w$ and the hammock path
length $l$. Notice that we do
not emphasize how individual ratings for the nodes (movies) spanning a
hammock are transformed into a prediction.
In addition, it is
highly likely that there are multiple paths between the same pair of
nodes, with various constraints on $w$ and $l$. Intuitively, since
considering more common ratings can be beneficial (see
\cite{herlocker} for approaches), a wider hammock could be better (although this
has to be done carefully when correlations between ratings are
considered \cite{herlocker}). But if we insist on a wide hammock, we
might have to traverse longer paths to reach a particular
movie from a given person~\cite{batul-thesis}. However, recommendations
involving shorter
paths are preferred over those involving longer paths, for reasons of explainability.
From a graph-theoretic point of view, $w$ and $l$ thus qualify the
reachability of different movies from a given person, and indirectly provide
a measure of the expected quality of predictions.

Preliminary analysis of the relationship between $w$, $l$, and predictive
accuracy supports this intuition. Fig.~\ref{expt1} (left) plots
the average discrepancy between predicted and actual ratings for
each hammock width when using the LikeMinds algorithm (as described
in~\cite{aggarwal1}). These results were obtained from a leave-one-out study,
in which an available rating was masked and a prediction was made for that
rating using the remaining data. The number of common ratings between
the given person and the person with the highest agreement scalar (who also
contributed to the recommendation) was used as the hammock width. The
results indicate that, for the LikeMinds algorithm, wider hammocks
lead to more accurate predictions.
Notice that LikeMinds's hammocks do not just model commonality; they also
posit agreement between the rating values spanning a hammock.
While it is certainly true that we can get a poor-quality
recommendation even with a wide hammock (involving perhaps noisy ratings or
a faulty aggregation procedure),
the overall quality of predictions improves with greater hammock widths.
However, increasing the hammock width progressively disconnects
the social network graph into many components. As a result,
fewer and fewer connections can be made; Fig.~\ref{expt1} (left) also
lists the fraction of people and movies reachable for various
hammock widths. A $w$ of 53, for instance, reaches only 48\% of the
people and 93\% of the movies. By the time the abrupt improvement
in agreement values is observed (after $w \approx 110$), less than 25\%
of the people and only about 86\% of the movies are reachable.
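
The way reachability shrinks as $w$ grows can be sketched as follows. This is
a hypothetical illustration, not the procedure that produced
Fig.~\ref{expt1}: we assume that a person is ``reachable'' when he or she lies
in the largest connected component of the hammock graph, and that a movie is
reachable when someone in that component has rated it. Ratings are assumed to
be a dictionary from each person to the set of movies rated, the function name
\texttt{reachable\_fractions} is our own, and components are found with a
union-find structure over all pairs of people (quadratic in the number of
people).

\begin{verbatim}
from itertools import combinations

def reachable_fractions(ratings, w):
    """Fractions of people and of movies covered by the largest connected
    component of the width-w hammock graph (an assumed reading of
    'reachable').  ratings: dict person -> set of rated movies."""
    parent = {p: p for p in ratings}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(ratings, 2):
        if len(ratings[a] & ratings[b]) >= w:
            parent[find(a)] = find(b)

    components = {}
    for p in ratings:
        components.setdefault(find(p), []).append(p)
    largest = max(components.values(), key=len)

    all_movies = set().union(*ratings.values())
    covered = set().union(*(ratings[p] for p in largest))
    return len(largest) / len(ratings), len(covered) / len(all_movies)

ratings = {
    "X": {"net1", "net2", "net3"},
    "A": {"net1", "net2", "net3", "music1"},
    "B": {"music1", "music2", "raga"},
}
for w in (1, 2, 4):
    print(w, reachable_fractions(ratings, w))
\end{verbatim}

As in the experiment, both fractions can only fall as $w$ increases, because
every edge of the width-$(w+1)$ hammock graph is also an edge of the width-$w$
graph.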

Fig.~\ref{expt1} (right) describes the results
of an experiment in which a minimum hammock width constraint was set
at $w=113$ (according to the LikeMinds definition)
and the resulting recommender graph was analyzed for
paths of varying lengths from people to movies.
We used the transformation
technique described in \cite{aggarwal1} to predict ratings from
others' ratings, once again using the leave-one-out method.
Paths in the recommender
graph involve 1, 2, or 3 hops to the person providing a recommendation
and a final hop to the movie being recommended (hence the bucketing of values
into 2, 3, and 4 in Fig.~\ref{expt1}, right).
As can be seen,
greater lengths (for the same $w$) cause a faster-than-linear
decay in the quality of predictions. We should caution that
horting~\cite{aggarwal1} may exhibit different behavior, though we still
expect longer paths to be of lower quality.

These results support the intuition that wider hammocks and shorter paths
yield better predictions. Wide hammocks arise from rating patterns
with significant overlap. We see this in the
MovieLens dataset, in which each participant rates a minimum of 20 movies
and which has a connected social network graph for all
$w \le 17$~\cite{batul-thesis}.

The primary cause of shorter paths is having more connections in the graph.
In the MovieLens dataset,
a recommendation is almost always possible using
a path of length no longer than 3. This is due to the power-law degree
distribution of the rating patterns. Other graphs, such
as {\it small-world networks} \cite{watts1}, have
small clusters of vertices that are connected by relatively few edges.
Weak ties are therefore very important to recommendation in both of these
situations: they make some recommendations possible, and they provide others with
shorter paths.

\section{Ties: Strong, Weak, and Brave}
\label{weak-ties}
In contrast to weak ties\footnote{It is important to note that there is
nothing fundamentally feeble or fragile about a weak tie; a weak tie
creates a powerful and robust link between nodes from different neighborhoods.},
a strong tie connects two people who share
many associations (as in a family or some other close-knit group).
We can think of weak ties as forming bridges between groups of people
who would otherwise not interact. Of course, strength and weakness
are relative, and there is no agreed definition of a strong tie
in terms of the number of shared associations.

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=triad.epsi,width=2.5in}}
\end{tabular}
\caption{Strong ties in a social network graph (right)
induced by a hammock jump on a recommendation dataset (left) with $w=2$.}
\label{figtriad}
\end{figure}

In a graph, strong ties are characterized by a triangle of vertices
(a \emph{triad} in the social network literature).
Fig.~\ref{figtriad}
illustrates how these triads can occur in a social
network graph induced by a hammock jump.
In this case, the width of the hammock jump is 2, and what looks like
two relationships becomes three in the social network graph.
Notice that, in this example, if the hammock jump width were three,
then the resulting social network would not have a triad, and so
neither edge would represent a strong tie.
It is a classical argument in social network theory that
no strong tie can be a bridge
and that two strong ties imply a third tie (if A has strong ties to both
B and C, then B and C are also tied)~\cite{weak}.
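
The triad criterion can be checked mechanically. The hypothetical Python
sketch below (our own, assuming ratings are stored as a dictionary from each
person to the set of movies rated) builds the hammock graph for width $w$ and
reports the edges that lie on a triangle, which we treat as strong ties. The
toy data are invented, not taken from Fig.~\ref{figtriad}; they are chosen so
that every pair of the three people shares exactly two movies, so a triad
appears for $w=2$ and disappears for $w=3$.

\begin{verbatim}
from itertools import combinations

def strong_ties(ratings, w):
    """Edges of the width-w hammock graph that lie on a triad (triangle).

    ratings: dict person -> set of rated movies (assumed format).
    Returns a set of (person, person) pairs in sorted order.
    """
    people = sorted(ratings)
    adj = {p: set() for p in people}
    for a, b in combinations(people, 2):
        if len(ratings[a] & ratings[b]) >= w:
            adj[a].add(b)
            adj[b].add(a)
    strong = set()
    for a, b in combinations(people, 2):
        # Edge a-b is in a triad if a and b share at least one neighbour.
        if b in adj[a] and adj[a] & adj[b]:
            strong.add((a, b))
    return strong

ratings = {
    "P": {"m1", "m2", "m3"},
    "Q": {"m1", "m2", "m4"},
    "R": {"m1", "m3", "m4"},
}
print(sorted(strong_ties(ratings, 2)))  # [('P', 'Q'), ('P', 'R'), ('Q', 'R')]
print(strong_ties(ratings, 3))          # set()
\end{verbatim}

An edge that survives without any common neighbour is, by this criterion, a
candidate weak tie rather than a strong one.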

Weak ties are of most interest to us because they are the foundation
for our notion of risk.
As discussed earlier,
a weak tie in a social setting allows people to identify someone using
other information they have been given.
Weak ties occur simply because someone knows
someone else outside of his or her usual circle of friends, or perhaps because
there is a person (an `outsider') who is friends with a few people who
each have strong(er) ties to people in different groups.

\begin{figure}
\centering
\begin{tabular}{cc}
\mbox{\psfig{figure=powerlaw.eps,width=2.5in}} &
\mbox{\psfig{figure=goodloops.eps,width=2.5in}}
\end{tabular}
\vspace{0.3in}
\begin{tabular}{cc}
\mbox{\psfig{figure=componentpl.eps,width=2.5in}} &
\mbox{\psfig{figure=badloops.eps,width=2.5in}}
\end{tabular}
\caption{Two different types of induced social networks that can exhibit weak
ties. (top left) A dataset with a power-law degree distribution induces a low-risk
social network (top right) in which increasing hammock widths produce
a `nested clam shells' picture. Each circle in the social network picture
denotes a group of people brought together; increasing hammock widths cause
the circles to get progressively smaller.
(bottom left) A dataset with power laws only in subgraphs and a few weak
ties induces a high-risk social network (bottom right), characterized by
the breakdown of a connected network into disconnected networks.
Some experimental data supporting these
diagrams can be found in~\cite{batul-thesis}.}
\label{graphs-risks}
\end{figure}

In recommendation, weak ties originate from the rating patterns of the
participants, but the jump process also plays a crucial role. We
hypothesize two fundamental rating patterns. One can be observed
in the public movie recommendation datasets (MovieLens and EachMovie), and
the other is what we would expect in a domain where people have stronger
biases in their tastes (such as books or music).

The movie datasets exhibit a power-law degree distribution, as illustrated
in Fig.~\ref{graphs-risks} (top left). The power-law rating pattern comes from
preferential attachment; for example, some movies (the {\it hits}) are
rated by almost everyone, and some people (the {\it buffs}) rate almost
all movies. Weak ties are rare in this setting but might occur when a person
shows no strong allegiance to any genre and rates relatively few movies in
each (he/she would not be a buff). The real risk to these people is that
they might not have enough ratings in common with anyone to be
given recommendations.

The second rating pattern would occur where most people exhibit a preference
for a particular kind of artifact. This is illustrated in Fig.~\ref{graphs-risks} (bottom left), where three subgraphs with power-law structures are
connected by a relatively small number of ratings. This diagram illustrates
one source of weak ties in this setting: someone who
ordinarily rates artifacts in only one domain (e.g., networking books)
rates an artifact in another domain (e.g., Indian classical music books).
Another
possibility is someone with more eclectic tastes who rates artifacts across
many domains and who, unlike in the power-law graph, is truly a weak tie. The risk
with weak ties in this rating pattern is that they may allow us to
identify a person whose ratings can get us from one domain to another.

The jump process, described in Section~\ref{jc-model},
can also create weak ties when common ratings are used as the basis for making
connections between people.
Many people might have rated
across several domains, but only a few might have enough ratings to satisfy the
jump being used. A final source of weak ties is the merging of data collected in
different settings. For instance, the recent purchase of eToys consumer
data by another retail giant signals the possibility of creating
a social network graph with weak ties.

The risk in a weak tie really comes from being the only person with a peculiar
rating pattern: there is safety in numbers, or at least in
homogeneous tastes (as in power-law graphs).
The more people who rate the same kinds of things, the less likely it is
that any one of them will be identifiable as participating in a weak tie.
But notice that if the jump definition weeds some of those people out,
the risk is still there (although it is less likely that any additional
information could be used to identify a single person).

\section{The Benefits and Perils of Personalization}
\label{benefits-risks}
Intuitively, a user derives the most benefit from a recommendation that is
based on wider hammocks and shorter path lengths. Of course, to get these
qualities we have to provide more ratings, and the risk is that we might
introduce a weak tie. The question, then, is whether we can relate how much
we rate to the benefit and risk inherent in recommendation.

\begin{table}
\caption {Movies used in analyzing the benefits of ratings on personalization.
`Star Wars' and `Scream of Stone' had the highest and lowest number of ratings,
respectively.}
%\hspace{25mm}
\centering
\vspace{0.07in}
\begin{tabular}{|l|r|} \hline\hline
\emph{Movie Name} & \emph {Number of Ratings} \\ \hline
Star Wars & 583\\ \hline
Tomorrow Never Dies & 180 \\ \hline
Robin Hood: Men in Tights & 56 \\ \hline
Scream of Stone & 1 \\ \hline
\end{tabular}
\label{movies-listing}
\end{table}

When there are multiple recommendation paths between a given person and
a given movie, we would like a benefit formula that captures our
preference for wider hammocks and shorter path lengths.
By defining the benefit of a recommendation as
$$\mathrm{benefit} = {w \over{l^2}},$$
we can give
more weight to improvements in path length from $2$ to $1$ than,
say, from $3$ to $2$: for a fixed $w$, the first improvement quadruples the
benefit, whereas the second multiplies it by only $9/4$. This non-linear
dependence of the quality of interaction
on path length is supported by research in diffusion processes~\cite{watts1},
social networks~\cite{weak}, and also our own experiments (see Fig.~\ref{expt1},
right).
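
The benefit formula can be applied directly when several paths connect a
person to a movie. In the hypothetical Python sketch below, each candidate
path is summarized as a pair $(w, l)$, and the path with the largest $w/l^2$
is selected; the candidate values are invented for illustration.

\begin{verbatim}
def benefit(w, l):
    """Benefit of a recommendation path with hammock width w and length l."""
    return w / l ** 2

def best_path(paths):
    """Among candidate (w, l) pairs, return the one with the largest benefit."""
    return max(paths, key=lambda wl: benefit(*wl))

# Hypothetical candidate paths from one person to one movie.
candidates = [(4, 2), (2, 1), (9, 3)]
print([benefit(w, l) for w, l in candidates])  # [1.0, 2.0, 1.0]
print(best_path(candidates))                   # (2, 1)
\end{verbatim}

A narrow but direct hammock ($w=2$, $l=1$) thus beats a hammock twice as wide
that needs two jumps, which is exactly the preference for short paths that the
$l^2$ term encodes.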

We can explore benefit in terms of the number of artifacts that are rated.
Typically, recommender systems require users to rate a minimum number of
artifacts before they can make queries, so we look at the incremental
benefit received by providing additional ratings. For this purpose, we
analyze the MovieLens dataset, in which each user is required to rate 20 movies,
and add a new person. The MovieLens dataset consists of 943 people and 1682
movies, and forms a connected graph.

For the experiment, we introduce a 944th person and incrementally add ratings
from the new person to movies, chosen so that movies with a higher rating
frequency were more likely to be rated. After each rating was added, the path
lengths $l$ to particular movies (see Table~\ref{movies-listing}) were computed
for each hammock width $w$. Twenty repetitions were performed for each
additional rating.
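
The popularity-biased choice of movies for the new person can be sketched as
weighted sampling without replacement. The exact sampling scheme is not
specified here, so the following Python code is an assumption: at each step it
draws one not-yet-rated movie with probability proportional to its current
number of ratings, using \texttt{random.Random.choices}. The function name
\texttt{add\_ratings\_by\_popularity} is hypothetical, and all counts are
assumed to be positive (the least-rated movie in Table~\ref{movies-listing}
has one rating).

\begin{verbatim}
import random

def add_ratings_by_popularity(rating_counts, k, seed=None):
    """Choose up to k distinct movies for a new person, one at a time.

    rating_counts: dict movie -> number of existing ratings (assumed > 0).
    At each step a not-yet-chosen movie is drawn with probability
    proportional to its rating count, so popular movies tend to come first.
    """
    rng = random.Random(seed)
    remaining = dict(rating_counts)
    chosen = []
    for _ in range(min(k, len(remaining))):
        movies = list(remaining)
        weights = [remaining[m] for m in movies]
        pick = rng.choices(movies, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen

counts = {"Star Wars": 583, "Tomorrow Never Dies": 180,
          "Robin Hood: Men in Tights": 56, "Scream of Stone": 1}
print(add_ratings_by_popularity(counts, 2, seed=1))
\end{verbatim}

In an experiment of the kind described above, each draw would be followed by
recomputing the path lengths to the target movies, and the whole sequence
would be repeated with different seeds.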

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=densityplot.eps,width=2.3in}}
\end{tabular}
\caption{Benefit vs. number of additional ratings required, for various
choices of movie destination nodes. Cells are shaded with greater
intensity for movies with fewer ratings.}
\label{expt3}
\end{figure}

The benefit from additional ratings for the movies in
Table~\ref{movies-listing} is shown in Fig.~\ref{expt3}. Each colored cell
indicates that the corresponding benefit is achievable with the corresponding
number of ratings. The feasible benefit regions grow monotonically
with the popularity of the movie, with more possibilities for
`Star Wars' than for `Tomorrow Never Dies'. The plot shows that if you
want a good recommendation for a less popular movie, you need to provide more
ratings, whereas you can receive good recommendations for popular movies with
fewer ratings. In particular, improving the benefit of
a `Star Wars' recommendation from
5 to 14 requires no extra ratings!

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=small-world.eps,width=5in}}
\end{tabular}
\caption{Random rewiring, starting from a regular wreath network, introduces
weak ties that help model small-world graphs.
Figure adapted from \cite{watts1}.}
\label{small}
\end{figure}

\begin{figure} \centering
\begin{tabular}{cc}
\mbox{\psfig{figure=small-world-graphs.eps,width=2.6in}} &
\mbox{\psfig{figure=expt2.epsi,width=2.6in}}\\
\end{tabular}
\caption{(left) Average path length and clustering coefficient versus
the rewiring probability $p$ (from \cite{watts1}). All measurements are
scaled w.r.t. the values at $p = 0$. (right) Quantifying the risk
as a function of rewiring probability $p$.}
\label{smgs}
\end{figure}

576: The danger involved

577: in recommendation relates to the probability that a weak connection is

578: exposed; unfortunately, this is not a static property of a recommendation

579: path and can only be studied in reference

580: to the social network graph {\it in the absence} of the considered

581: connection. This means that we need a more complete understanding of the

582: dynamics by which weak ties are introduced, modeled, and employed in a social

583: network. Such an understanding

584: could take the form of a graph-generation

585: model. Here we use the model of

586: Watts and Strogatz \cite{watts1} as a basis for our study of risk.

587:

588: The intuition is that risk occurs when we have edges that are weak ties between

589: subgraphs that are cliques (or at least nearly so), and the risk decreases

590: as more of these edges are added. In particular, the risk is highest when

591: a new weak tie occurs and the lengths between people decrease dramatically.

592: As more weak ties are added, the risk decreases.

593:

594: This idea of risk can be explored in the

595: Watts-Strogatz model for small-world networks. They show how to

596: generate a spectrum of graphs from a regular wreath graph by adjusting

597: the probability $p$ of rewiring an edge

598: (Fig.~\ref{small}). When $p$ is zero, we have the wreath; but when $p$ is

599: one, we have a random graph. The risk from weak ties is low in both the wreath

600: and random graphs, but increases as the average path length drops but

601: the vertices are still clustered. Fig.~\ref{smgs} (left) illustrates

602: the relationship between length and clustering (see~\cite{watts1} for

603: details of the definitions). When $p$ is between $0$ and $0.1$, the

604: graph is a small-world network, and poses the most risk from weak ties.

605:

606: We can express the risk of weak ties in terms of $p$: the risk in having ratings

607: that form weak ties can be quantified as the rate at which $l$ reduces, as

608: a function of $p$:

609: $$\mathrm{risk} = {- {{\partial l} \over {\partial p}}}$$

Fig.~\ref{smgs} (right) shows the risk for the dynamics described in
Fig.~\ref{small}. The length values are scaled by the length at $p=0$ before
the risk is calculated. The risk increases rapidly as the first weak ties are
introduced, then drops gradually as more weak ties share responsibility for
length reduction. This captures our intuition that sensitive information is
disclosed by ferreting out weak ties. However, our jumping connections model
is not directly parameterized by $p$.

To be useful, the above formula for risk must relate length reduction to a
metric that can be used to balance personalization and privacy. We can
illustrate the risk of becoming a weak tie by studying what happens as we
decrease the hammock width $w$ in Fig.~\ref{graphs-risks} (bottom). Consider
the situation in which the social network graph consists of three disconnected
components. Decreasing the hammock width introduces new edges that are weak
ties. These edges contribute to length reduction and hence to risk as
quantified above. Recall, however, that increasing the hammock width is
desirable from the viewpoint of benefit. Put another way, benefit improves
monotonically with increasing width $w$. Risk, in contrast, rises rapidly as
fewer weak ties share responsibility for length reduction, up to a point, and
then drops sharply once the components disconnect.

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=expt4.ps,width=3in}}
\end{tabular}
\caption{Risk as a function of hammock width $w$.}
\label{expt4}
\end{figure}

To explore this setting, we created an artificial dataset that consists of
three subgraphs with power-law degree distributions, each with 200 people and
75 artifact vertices. Each person node is linked to at least 15 artifact
nodes, all within the same subgraph. Specifically, the people and artifacts
were ordered, and the $b^{th}$ person rated the first
$\lceil 75 b^{-\epsilon} \rceil$ artifacts. The value of $\epsilon$ was
calibrated so that the last person rates exactly $15$ artifacts. Then three
extra people were added who rate (at most 15) artifacts in all three connected
components, again following a `master' power law.
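
The construction of the three power-law subgraphs can be sketched as follows. The calibrated value of $\epsilon$ is not given here, so the sketch assumes one. Any $\epsilon$ with $14 < 75 \cdot 200^{-\epsilon} \le 15$ makes the 200th person's count exactly $15$. The sketch takes the midpoint of that interval to stay clear of floating-point rounding at the boundary. Person and artifact identifiers are hypothetical $(\mathit{component}, \mathit{index})$ pairs. The three bridging people are left out because the `master' power law is not specified in enough detail to reproduce.

```python
import math
from typing import Dict, List, Set, Tuple

N_PEOPLE, N_ARTIFACTS, MIN_RATINGS = 200, 75, 15


def calibrate_epsilon(n_people: int = N_PEOPLE, n_art: int = N_ARTIFACTS,
                      min_r: int = MIN_RATINGS) -> float:
    """Assumed calibration: pick eps with ceil(n_art * n_people**-eps) == min_r.

    Any eps in [lo, hi) works; the midpoint avoids rounding trouble at lo.
    """
    lo = math.log(n_art / min_r) / math.log(n_people)
    hi = math.log(n_art / (min_r - 1)) / math.log(n_people)
    return (lo + hi) / 2


def rating_counts(eps: float, n_people: int = N_PEOPLE,
                  n_art: int = N_ARTIFACTS) -> List[int]:
    """Person b (1-based) rates the first ceil(n_art * b**-eps) artifacts."""
    return [math.ceil(n_art * b ** (-eps)) for b in range(1, n_people + 1)]


Person = Tuple[int, int]    # hypothetical id: (component, b)
Artifact = Tuple[int, int]  # hypothetical id: (component, artifact index)


def build_components(n_comp: int = 3) -> Dict[Person, Set[Artifact]]:
    """Ratings for n_comp disjoint power-law subgraphs (no bridging people)."""
    counts = rating_counts(calibrate_epsilon())
    ratings: Dict[Person, Set[Artifact]] = {}
    for c in range(n_comp):
        for b, m in enumerate(counts, start=1):
            ratings[(c, b)] = {(c, a) for a in range(m)}
    return ratings
```

With these defaults, the first person in each subgraph rates all 75 artifacts and the 200th rates exactly 15.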

With a hammock width of 9, the social network induced by this graph consists
of three disconnected components. Decreasing the hammock width introduces weak
ties into the social network, and path lengths decrease. The results are
plotted in Fig.~\ref{expt4} (lengths are scaled against the path length for
$w=8$). As expected, risk is highest when the graph first becomes connected.
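
The social network at a given hammock width can be computed directly from the ratings. The sketch below assumes the jumping-connections rule in which two people are joined when they have rated at least $w$ common artifacts. It counts connected components and averages path lengths over mutually reachable pairs only. How lengths should be aggregated while the network is still disconnected is a modeling choice that this sketch does not settle. The toy ratings are invented for illustration.

```python
from collections import deque
from collections.abc import Hashable
from itertools import combinations
from typing import Dict, Set

Ratings = Dict[Hashable, Set[Hashable]]
Graph = Dict[Hashable, Set[Hashable]]


def hammock_graph(ratings: Ratings, w: int) -> Graph:
    """Join two people when they co-rate at least w artifacts (assumed rule)."""
    adj: Graph = {person: set() for person in ratings}
    for a, b in combinations(ratings, 2):
        if len(ratings[a] & ratings[b]) >= w:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs(adj: Graph, s: Hashable) -> Dict[Hashable, int]:
    """Hop distances from s to every vertex reachable from it."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def n_components(adj: Graph) -> int:
    """Number of connected components."""
    seen: Set[Hashable] = set()
    count = 0
    for v in adj:
        if v not in seen:
            seen |= set(bfs(adj, v))
            count += 1
    return count


def avg_path_length(adj: Graph) -> float:
    """Mean distance over ordered pairs of mutually reachable people."""
    total = pairs = 0
    for s in adj:
        dist = bfs(adj, s)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Toy example: two tight groups plus one person ("x") who rates across both.
toy = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {4, 5, 6}, "d": {4, 5, 6},
       "x": {2, 3, 5, 6}}
for w in (3, 2):
    g = hammock_graph(toy, w)
    print(w, n_components(g), round(avg_path_length(g), 3))
```

On the toy ratings this prints \texttt{3 3 1.0} for $w=3$ and \texttt{2 1 1.4} for $w=2$. Lowering the width lets person \texttt{x} become a weak tie that joins the two groups. Note that the reachable-pairs average rises at the moment the groups join, because the newly reachable pairs are far apart. Any risk computed across the disconnection point therefore depends on how this case is handled.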

It is not possible to provide a traditional benefit-risk profile for an
individual user, as is customary in risk analysis. This is because recommender
systems aggregate the ratings of many participants when computing a
recommendation. A user's benefit comes from `plugging into' the social network
by providing a sufficient number of ratings. A user's risk, however, depends
not only on what the user rates but also on what other people rate.
Ultimately, the difficulty arises because risk is incurred even if no
recommendation queries are made, whereas benefit requires that the user make
queries.

The two qualitative conclusions from our studies are that (i) a few weak ties
are riskier than many weak ties, and (ii) this effect is stronger in some
(induced) social networks than in others.

\section{Concluding Remarks}
\label{bigger-pictures}

The very factors that make weak ties useful are the ones that threaten
privacy. We have demonstrated that, under certain conditions, recommendations
can involve weak ties and may thereby compromise the privacy of individuals.
As with most problems in computer security, the ideal deterrents are better
awareness of the issues and more openness about how recommender systems
operate in the marketplace. In particular, the policies and methodologies
employed by an individual site should be made clear. Sites that involve
multiple homogeneous networks have a crucial responsibility to clarify the
role of weak ties in their system designs and what mechanisms are in place to
thwart hackers.

Ideally, recommender systems should convey both benefits and risks to the user
in an intuitive manner. One possibility is to present the user with plots of
benefit and risk versus user-modifiable parameters: ratings, $w$, and $l$ (if
the algorithm allows their direct specification). Another possibility is to
quantify the risks and benefits associated with rating each individual
artifact, as a function of the ratings already in the system. For instance,
providing a rating for `Scream of Stone' would improve benefit far more than
providing a rating for `Star Wars'. At the same time, the system should
quantify the extent to which such a rating makes the user a weak tie.
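
As a hypothetical illustration of rating-level feedback, the sketch below compares the hammock graph before and after a candidate rating. The number of new neighbours serves as a crude proxy for benefit. A drop in the number of connected components signals that the rating would make the user a weak tie bridging previously separate groups. Both proxies, the function names, and the assumed hammock rule (at least $w$ co-rated artifacts) are illustrative choices rather than a method described in this paper.

```python
from collections import deque
from collections.abc import Hashable
from itertools import combinations
from typing import Dict, Set, Tuple

Ratings = Dict[Hashable, Set[Hashable]]
Graph = Dict[Hashable, Set[Hashable]]


def hammock_graph(ratings: Ratings, w: int) -> Graph:
    """Join two people who co-rate at least w artifacts (assumed rule)."""
    adj: Graph = {p: set() for p in ratings}
    for a, b in combinations(ratings, 2):
        if len(ratings[a] & ratings[b]) >= w:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def n_components(adj: Graph) -> int:
    """Count connected components with an iterative BFS."""
    seen: Set[Hashable] = set()
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x] - seen:
                seen.add(y)
                queue.append(y)
    return count


def rating_impact(ratings: Ratings, user: Hashable, artifact: Hashable,
                  w: int) -> Tuple[int, int]:
    """Hypothetical proxies (not the paper's method) if `user` rated `artifact`.

    Returns (new neighbours gained, change in component count). A negative
    second value means the rating would turn `user` into a bridge (weak tie).
    """
    before = hammock_graph(ratings, w)
    trial = {p: set(items) for p, items in ratings.items()}  # copy, don't mutate
    trial[user].add(artifact)
    after = hammock_graph(trial, w)
    gained = len(after[user] - before[user])
    return gained, n_components(after) - n_components(before)
```

Consider two pairs of people, each sharing two artifacts, and a user who co-rates two artifacts with the first pair and one with the second. At $w=2$, having the user rate the second pair's other artifact gains two neighbours and merges two components into one, returning $(2, -1)$. An artifact nobody else has rated returns $(0, 0)$.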

Singh and colleagues \cite{singh-cacm} make a thought-provoking observation
when comparing community-based networks with recommender systems. People want
to control to whom they reveal their ratings, yet would like to know how
recommendations are being made. In a distributed setting, one can imagine a
scenario in which people specify how data collected from their interactions
should be modeled and used. Interfaces for privacy management are woefully
inadequate, and their role is only now being recognized \cite{etzioni-cacm}.
A possible direction for future research is to extend the results here to a
distributed setting in which people can set arbitrary constraints on their
position in the social network graph. Such constraints could cover whether
they are willing to participate in a path and what limits they place on such
participation. They could also cover whether they would provide ratings if
they knew that doing so would contribute to a weak tie.

One may wonder whether weak ties will form at all once concerns are raised
about the privacy compromises they enable. Social network theory postulates
that weak ties are the primary mechanism by which micro-level interactions
manifest at macro levels. It also holds that such ties will be kindled
whenever communities have to be mobilized for collective action. It remains
to be seen whether weak ties induced by jumps in a recommender system exhibit
a similar distributed organization.

\bibliographystyle{plain}
%\bibliographystyle{named}
\bibliography{ppp}

\end{document}