1: \documentclass{acm_proc_article-sp}
2: %\documentclass{sig-alternate}
3: %\documentclass[letterpaper, 10 pt, conference]{ieeeconf}
4: \usepackage{graphicx}
5: \usepackage{wrapfig}
6: %\usepackage{epsfig}
7: 
8: \long\def\comment#1{}
9: \newcommand{\commentout}[1]{}
10: 
11: \comment{
12: %margins
13: \setlength{\topmargin}{0.0in}
14: 
15: \setlength{\headheight}{0.0in}
16: 
17: \setlength{\headsep}{0.0in}
18: 
19: \setlength{\textheight}{9in}
20: 
21: \setlength{\oddsidemargin}{0in}
22: 
23: \setlength{\textwidth}{6.5in}
24: 
25: \setlength{\parindent}{0in}
26: 
27: \setlength{\parskip}{2ex}
28: 
29: \renewcommand{\baselinestretch}{1}
30: 
31: \newcommand{\mysection}[1]{\vspace{-8pt}\section{\hskip -1em.~~#1}\vspace{-3pt}}
32: \newcommand{\mysubsection}[1]{\vspace{-3pt}\subsection{\hskip -1em.~~#1}
33:         \vspace{-3pt}}
34: \newcommand{\mysubsubsection}[1]{\vspace{-3pt}\subsubsection{\hskip -1em.~~#1}
35:         \vspace{-3pt}}
36: \newcommand{\myparagraph}[1]{\vspace{-3pt}\paragraph{#1}
37:         \vspace{-3pt}}
38: 
39: 
40: }
41: \newcommand{\denselist}{
42:       \setlength{\itemsep}{0pt}
43:       \setlength{\parsep}{1.5pt}
44:       \setlength{\topsep}{1.5pt}
45:       \setlength{\parskip}{2pt}
46:       \setlength{\partopsep}{0pt}
47:       \setlength{\labelwidth}{1em}
48:       \setlength{\labelsep}{0.5em} }
49: 
50: 
51: \newcommand{\bdesc}{\begin{description}\denselist}
52: \newcommand{\edesc}{\end{description}}
53: 
54: 
55: 
56: 
57: \newcommand{\etc}{{\em etc.}}
58: \newcommand{\eg}{{\em e.g.}}
59: \newcommand{\ie}{{\em i.e.}}
60: \newcommand{\noi}{\noindent}
61: \newcommand{\type}[1]{{\sc #1}}
62: \newcommand{\semtype}[1]{\textsf{#1}}
63: \newcommand{\pattern}[1]{[#1]}
64: \newcommand{\source}[1]{\emph{#1}}
65: 
66: \newcommand{\secref}[1]{Section~\ref{#1}}
67: %\newcommand{\eqref}[1]{Equation~\ref{#1}}
68: \newcommand{\figref}[1]{Figure~\ref{#1}}
69: \newcommand{\tabref}[1]{Table~\ref{#1}}
70: 
71: 
72: \begin{document}
73: 
74: %Role of Social Networks in Collaborative Information Filtering
75: %Dynamics of Collaborative Content Ranking
76: \comment{\title{\LARGE \bf Social Networks and Social Information
77: Filtering on Digg}
78: \author{Kristina Lerman\\
79: University of Southern California \\
80: Information Sciences Institute\\
81: 4676 Admiralty Way\\
82: Marina del Rey, California 90292\\
83: lerman@isi.edu }
84: 
85: }
86: 
87: \title{Social Networks and Social Information Filtering on Digg
88: %\titlenote{}
89: }
90: \numberofauthors{1}
91: 
92: \author{
93: \alignauthor
94: Kristina Lerman\\
95:        \affaddr{University of Southern California }\\
96:        \affaddr{Information Sciences Institute} \\
97:        \affaddr{4676 Admiralty Way}\\
98:        \affaddr{Marina del Rey, California 90292}\\
99:        \email{lerman@isi.edu}
100: }
101: 
102: 
103: \maketitle %\pagestyle{empty} \thispagestyle{empty}
104: 
105: 
106: 
107: \begin{abstract}
108: The new social media sites --- blogs, wikis, Flickr and Digg, among
109: others --- underscore the transformation of the Web to a
110: participatory medium in which users are actively creating,
111: evaluating and distributing information. Digg is a social news
112: aggregator which allows users to submit links to, vote on and
113: discuss news stories. Each day Digg selects a handful of stories to
114: feature on its front page. Rather than rely on the opinion of a few
115: editors, Digg aggregates opinions of thousands of its users to
116: decide which stories to promote to the front page.
117: 
Digg users can designate other users as ``friends'' and easily track
friends' activities: what new stories they submitted, commented on
or read. The friends interface acts as a \emph{social filtering}
system, recommending to a user the stories his or her friends liked or
found interesting. By tracking the votes received by newly submitted
stories over time, we show that social filtering is an effective
information filtering approach. Specifically, we show that (a)
users tend to like stories submitted by friends and (b) users tend
to like stories their friends read and liked. As a byproduct of
social filtering, social networks also play a role in promoting
stories to Digg's front page, potentially leading to a ``tyranny of
the minority'' situation where a disproportionate number of front
page stories come from the same small group of interconnected
users. Despite this, social filtering is a promising new technology
that can be used to personalize and tailor information to individual
users, for example, through personal front pages.
134: 
135: 
136: \end{abstract}
137: 
138:  \keywords{ Social Network analysis; collaborative
139: filtering; social filtering}
140: 
141: \comment{\noindent \textbf{Keywords:}Social Network analysis;
142: collaborative filtering; social filtering }
143: 
144: 
145: \section{Introduction} The label ``social media'' has been
146: attached to many Web sites --- blogs, MySpace, Flickr, del.icio.us,
147: Wikipedia --- whose content is primarily user driven. The recent
148: rise of social media sites underscores the transformation of the Web
149: and how it is being used. Rather than searching for and passively
150: consuming information found on Web pages, users are now actively
151: creating, evaluating and distributing information. Newer scripting
152: technologies and software tools allow anyone to seamlessly add
153: content to a Web site --- a new blog entry, or a change to an
154: existing article, an image, a link, a vote or feedback comment ---
155: without being familiar with HTML or the underlying technologies used
156: by that site. Most of the sites also include a social networking
157: component, which enables users to build personal social networks by
158: designating other users as ``friends'' or ``contacts'' in order to
159: gain access to friends' activities. For example,
160: Flickr~\cite{flickrurl} allows users to see in real time new images
161: posted by friends. Another distinctive feature of the social media
162: sites is their transparency. Every username, every descriptive tag
163: is a hyperlink that can be used to navigate the site, and unless it
164: has been designated private, all content is publicly viewable and in
165: some cases, modifiable.
166: 
167: 
Many Web sites that provide information (or sell products or
services) use collaborative filtering technology to suggest relevant
documents (or products and services) to their users. Amazon and
Netflix, for example, use collaborative filtering to recommend new
books or movies to their users. Collaborative filtering-based
recommendation systems~\cite{Konstan97grouplens} try to find users
with similar interests by comparing their opinions about products.
They then suggest new products that were liked by other users
with similar opinions. Recommender systems based on \emph{social
filtering}, on the other hand, suggest new products or documents
178: simply based on whether the user's designated friends found these
179: products or documents interesting. Researchers in the past have
180: recognized that social networks present in the user base of the
181: recommender system can be induced from the explicit and implicit
182: declarations of user interest, and that these social networks can in
183: turn be used to make new recommendations~\cite{perugini04}. To the
184: best of our knowledge, social media sites are the first systems to
185: directly use social networks for social filtering.
186: 
In this paper we show that social filtering on Digg, a social news
aggregator, is an effective recommendation system. Specifically, we
show that Digg users tend to be interested in the news stories their
friends find interesting. We also study the effect social filtering
has on the organization of stories on Digg, including unintended
consequences such as ``tyranny of the minority.'' We compare Digg
with Reddit, another social news aggregator that, unlike Digg, uses
collaborative filtering to recommend news stories to its readers.
Reddit's filtering appears to be much weaker, promoting
stories that users do not find interesting. Although social
filtering, as practiced by Digg, has recently come under fire, we
believe it to be a promising technology that will lead to a new
generation of personalization and recommendation algorithms.
200: 
201: 
202: 
203: %\begin{wrapfigure}{l}{.60\linewidth}
204: \begin{figure*}[tbhp]
205:   \center{\includegraphics[width=6in]{digg}\\}
206:   \caption{Digg.com homepage showing front page technology stories}\label{fig:homepage}
207: \end{figure*}
208: 
209: \section{Structure of Digg}
210: 
211: Digg~\cite{diggurl} is arguably one of the most successful social
212: news aggregators. Its functionality is very simple. Users submit
213: links to stories they find online, and other users vote on these
214: stories.
215: %Voting a story up is called ``digging'' it, voting it down is called
216: %``burrying'' it.
217: When a story gets enough positive votes, or diggs, it is promoted to
218: the front page. The front page is what users see on the Digg home
219: page, while the newly submitted stories are less visible, being
220: ``hidden'' in the Upcoming stories pages.
221: 
A typical Digg page is shown in \figref{fig:homepage}. Each page contains
a list of 15 stories. The stories are listed in reverse chronological order
of being submitted (for the upcoming stories queue) or promoted (for
the front page stories), with the most recent stories appearing at the
top. The story's title is a link to the source, while clicking on
the number of diggs takes one to the page describing the story's
activity on Digg: the discussion around it, the list of people who
dugg it, etc. Digg also allows users to designate other users as
friends and makes it easy to track friends' activities. The left
column on the home page summarizes the number of stories the friends
have submitted, commented on or liked recently. It even has a handy
feature to see the stories at least two friends have liked (``agreed
on''). All these stories are also flagged with a green ribbon
(see the fourth story in \figref{fig:homepage}), making them easy to
spot. Tracking activities of friends is a common feature in many
social media sites and is one of the major draws attracting users to
these sites. It offers a new paradigm for interacting with
information: social filtering. Rather than actively searching for
new interesting content, or subscribing to a set of predefined
topics, users can now put other people to the task of finding and
filtering information for them.
243: 
244: 
245: %Digg's friends interface, in fact, implements social filtering. In
246: %this paper we show that social filtering is an effective
247: %recommendation system on Digg by showing that users tend to be
248: %interested in the stories their friends find interesting. We
249: %decompose this into two claims: (1) users digg stories their friends
250: %submit, and (2) users digg stories their friends digg. In this paper
251: %we present evidence to support these claims, and study the effect
252: %that social filtering has on the organization of stories on Digg,
253: %including unintended consequences such as ``tyranny of the
254: %minority''. We compare Digg with Reddit, another social new
255: %aggregator that, unlike Digg, uses collaborative filtering to
256: %recommend new stories to its users.
257: 
258: 
259: 
260: 
261: Digg selects a handful of stories each day to feature on its front
262: page. Getting to the front page is important to users, because it
263: increases the story's visibility (most people who go to Digg only
264: read the front page stories), as well as the visibility of the user
265: who submitted the story. In fact, Digg ranks users based on how many
266: of their stories made it to the front page, and improving one's rank
267: has become a competitive sport. Although the exact formula for how a
268: story is promoted to the front page is kept secret, so as to prevent
269: users from ``gaming the system'' to promote bogus stories, it
270: appears to take into account the number of diggs a story gets and
the rate at which it gets them. The mechanism by which stories
are promoted, therefore, does not depend on the decision of one or
a few editors, but emerges from the activities of many users. We are
interested in studying the mechanism by which such consensus emerges
and the role social networks play in it.
276: 
277: % rest of the paper
278: 
279: %\section{Dynamics of emergent ratings} \label{sec:dynamics}
280: \section{Dynamics of diggs} \label{sec:dynamics}
281: 
In order to see how consensus emerges from independent decisions
made by many users, we tracked both new and front page stories in
the technology category. We collected data by scraping the Digg site
with the help of Web wrappers, created with tools provided by Fetch
Technologies:
287: 
288: \begin{description}
  \item[digg-frontpage] wrapper extracts a list of stories
from the first 14 pages of the home page. For each story, it
extracts the submitter's name, story title, time submitted, and the
number of diggs and comments the story received.

  \item[digg-all] wrapper extracts a list of stories
from the first 20 pages in the Upcoming stories queue. For each
story, it extracts the submitter's name, story title, time
submitted, and the number of diggs and comments the story received.

  \item[digg-with-history] wrapper extracts the same information as
  the digg-frontpage wrapper, along with the list of the first 216 users
  who dugg the story.

  \item[top-users] wrapper extracts information about the first 1020 recently active users.
  Since Digg ranks users by how many stories they have on the front page,
  this gives us information about 1020 of the top-ranked users.
  For each user, it extracts the number of stories
  that user has submitted, commented on, and dugg; the number of stories that have been promoted to
  the front page; the number of profile views; the time the account was
  established; the user's rank; the list of
  friends (contacts); and the list of reverse friends, or ``people who
  have befriended this user.''
312: \end{description}
313: 
The \emph{digg-frontpage} and \emph{digg-all} wrappers were executed
hourly over a period of a week in May 2006. The \emph{top-users} wrapper
was executed at the same time to gather a snapshot of the social
network of the top Digg users.
318: 
319: %\subsection{Dynamics of ratings} \label{sec:analysis}
320: 
321: \begin{figure*}[tbh]
322: \begin{tabular}{cc}
323:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
324:   \includegraphics[height=1.9in]{diggs-ts}  &
325:   \includegraphics[height=1.9in]{maxdiggs}  \\
326: %    \epsfxsize = 3.0in \epsffile{diggs-ts.eps} &
327:  %   \epsfxsize = 3.0in \epsffile{maxdiggs.eps} \\
328:   (a) & (b) \\
329: \end{tabular}
330: \caption{(a) Dynamics of ratings (diggs) of select stories that have
331: been promoted to the Digg front page. Dashed lines correspond to
332: stories submitted by users whose rank was greater than 1020, while
333: solid lines correspond to stories submitted by users whose rank was
less than 35. (b) Maximum number of diggs attained by a story during
the period of observation vs.\ the submitter's rank. Symbols on the right
axis correspond to low-rated users with rank$>1020$.}
337: \label{fig:diggs}
338: \end{figure*}
339: 
We identified stories that were submitted to Digg over the course of
approximately one day and followed these stories over a period of
six days. Of the $2858$ stories that were submitted by $1570$ users
during this time period, only 98 stories by 60 different users made
it to the front page. \figref{fig:diggs}(a) shows the evolution of the
ratings (number of diggs) of select stories. The basic dynamics of
all the stories appear the same. A story accrues diggs at some
rate. Once it is promoted to the front page, it accumulates diggs at
a much faster rate. \comment{The growth in the number of diggs is
impressive, considering that a front page story gets 300 views for
every digg it receives \cite{blog article}.} As the story ages,
accumulation of new diggs slows down, and the story's rating
saturates at some value. We call the maximum number of diggs a story
accrues its ``interestingness'', as it reflects how interesting the
story is to the general audience.
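To make the definition concrete, the two quantities discussed above (the rate at which a story accrues diggs and its final ``interestingness'') can be computed as in the following Python sketch. It assumes, hypothetically, that the hourly scrapes of a single story are stored as (hour, diggs) pairs; the function names and the toy story are illustrative and are not taken from our dataset.

\begin{verbatim}
```python
# Sketch only: the (hour, diggs) layout is a
# hypothetical way of storing the hourly scrapes.


def interestingness(snapshots):
    """Maximum number of diggs a story accrued."""
    return max(diggs for _, diggs in snapshots)


def digg_rates(snapshots):
    """Diggs gained per hour between consecutive
    scrapes, in time order."""
    pts = sorted(snapshots)
    return [(d1 - d0) / (t1 - t0)
            for (t0, d0), (t1, d1) in zip(pts, pts[1:])]


# Toy story (not from our dataset): slow growth,
# a jump after hour 2, then saturation.
story = [(0, 1), (1, 5), (2, 9), (3, 200),
         (4, 500), (5, 560), (6, 570)]
print(interestingness(story))  # 570
print(digg_rates(story))
# [4.0, 4.0, 191.0, 300.0, 60.0, 10.0]
```
\end{verbatim}

In this toy series the per-hour rate jumps sharply and then decays, mirroring the promotion and saturation pattern described above.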
355: 
It is worth noting that top-rated users are not the ones submitting
the stories that get the most diggs. This is shown graphically in
\figref{fig:diggs}(a), where stories submitted by low-rated users
(with rank$>1020$) are shown as dashed lines, while solid lines
represent stories submitted by top-rated users.
\figref{fig:diggs}(b) shows the maximum diggs attained by stories in
our dataset vs.\ the rank of the submitter (the lower the rank, the more
successful the user). Slightly more than half of the stories came
from 14 top-rated users (rank$<25$), and 48 stories came from 45
low-rated users. The mean ``interestingness'' of the stories
submitted by the top-rated users is $600$, almost half the average
``interestingness'' of the stories submitted by low-rated users. A
second observation is that top-rated users are responsible for
multiple front page stories. A look at the statistics about top
users provided by Digg shows that this is generally the case: of the
more than $15{,}000$ front page stories submitted by the top 1020
users, the top $3\%$ of the users are responsible for $35\%$ of the
stories.
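The concentration statistic above (the share of front page stories contributed by the most successful users) can be computed along the lines of the following Python sketch. It assumes a plain list of per-user front page story counts; the function name, the choice to round the number of top users up, and the toy counts are our own illustrative assumptions.

\begin{verbatim}
```python
import math


def top_share(counts, frac):
    """Fraction of all stories contributed by the top
    `frac` of users (e.g. 0.03 for the top 3 percent).

    counts: front page stories per user (hypothetical
    input). The number of top users is rounded up.
    """
    ranked = sorted(counts, reverse=True)
    k = max(1, math.ceil(frac * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Toy counts for five users (not Digg data).
print(top_share([5, 50, 10, 30, 5], 0.2))  # 0.5
```
\end{verbatim}

With 1020 users and \texttt{frac = 0.03}, this convention counts the top 31 users.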
374: 
\section{Social networks and social filtering} If top-ranked users do not
submit the most interesting stories, why are they so successful? We
believe that social filtering plays a role in promoting stories to
the front page. As we explained above, Digg's interface allows users
to designate others as ``friends'' and easily keep track of friends'
activities: the stories they have submitted, commented on or dugg.
We believe that users use this feature to filter the tremendous
number of new submissions on Digg. We show this by analyzing two
sub-claims: (a) \emph{users digg stories their friends submit}, and
(b) \emph{users digg stories their friends digg}.
385: 
386: 
387: \begin{figure*}[tbh]
388: \begin{tabular}{cc}
389:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
390:   \includegraphics[height=2.0in]{scatterplot}  &
391:   \includegraphics[height=2.0in]{diggs-history}\\
392: %\epsfxsize = 3.0in \epsffile{scatterplot.eps} &
393: %\epsfxsize = 3.0in \epsffile{correlation.eps}\\
394:   (a) & (b) \\
395: \end{tabular}
396:   \caption{(a) Scatter plot of the number of friends vs reverse friends for the top 1020 Digg users.
397:   (b) Number of diggers who are also among the reverse friends of the user who submitted the
398: story}\label{fig:scatterplot}
399: \end{figure*}
400: 
Note that the ``friend'' relationship is not symmetric: if user A
designates user B as a friend, user A can keep track of user B's
activities, but not vice versa. This makes A a \emph{reverse
friend} of B. \figref{fig:scatterplot}(a) shows the scatter plot of
the number of friends vs.\ reverse friends of the top 1020 Digg users
as of May 2006. Black symbols correspond to the top 33 users. For
the most part, users appear to take advantage of Digg's social
networking feature, with the top users having bigger social
networks. Users below the diagonal are watching more people than are
watching them (fans), while users above the diagonal are being
watched by more users than they are watching (celebrities).
Two of the biggest celebrities are the users marked $a$ and $b$ in
\figref{fig:scatterplot}(a). These users are \emph{kevinrose}, one
of the founders of Digg, and \emph{diggnation}, a podcast of popular
Digg stories.
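Because the friend relation is directed, reverse friends are obtained by inverting its edges. The following Python sketch assumes a hypothetical dictionary that maps each user to the set of users he or she lists as friends. It computes reverse friends and labels a user as a fan or a celebrity according to the side of the diagonal in \figref{fig:scatterplot}(a) on which the user falls. The function names, the ``balanced'' label for users on the diagonal, and the toy network are illustrative.

\begin{verbatim}
```python
from collections import defaultdict


def reverse_friends(friends):
    """Invert the directed friend relation.

    friends[a]: users that a lists as friends
    (hypothetical layout). The result maps b to the
    set of users who list b, i.e. b's reverse friends.
    """
    rev = defaultdict(set)
    for a, listed in friends.items():
        for b in listed:
            rev[b].add(a)
    return dict(rev)


def role(user, friends, rev):
    """Side of the diagonal the user falls on."""
    watching = len(friends.get(user, ()))
    watched = len(rev.get(user, ()))
    if watched > watching:
        return "celebrity"
    if watching > watched:
        return "fan"
    return "balanced"


# Toy network: A lists B and C; B lists C.
net = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
rev = reverse_friends(net)
print(role("A", net, rev))  # fan
print(role("C", net, rev))  # celebrity
```
\end{verbatim}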
416: 
417: 
418: \comment{
419: % role of social networks in ratings
420: A user's success rate is defined as the fraction of the stories the
421: user submitted that have been promoted to the front page. We use the
422: statistics about the activities of the top 1020 users. In our
423: analysis, we only include users who have submitted 50 or more
424: stories (514 users). Users' mean success rate vs the size of their
425: social network is shown in \figref{fig:scatterplot}(b). Although the
426: error bars are large, there is a significant correlation between the
427: size of the user's social network (specifically,  number of reverse
428: friends) and user's success rate.
429: 
430: 
431: \begin{figure*}[tbh]
432: \begin{tabular}{cc}
433:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
434:   \includegraphics[width=3.0in]{diggs-history}  &
435: %\epsfxsize = 3.0in \epsffile{diggs-history.eps} &
436: 
437: \\
438:   (a) & (b) \\
439: \end{tabular}
440:   \caption{}\label{fig:diggs-history}
441: 
442: \end{figure*}
443: 
444: }
445: 
\subsection{Users digg stories their friends submit} In order to
show that users digg stories their friends submit, we used the
\emph{digg-with-history} wrapper to collect
195 front page stories, each with a list of the first 216 users who dugg
the story ($15{,}742$ unique users total). The name of the submitter
is first on the list.

We can compare the list of users who dugg the story, or any portion
of this list, with the list of reverse friends of the submitter.
\figref{fig:scatterplot}(b) shows the number of diggers of a story
who are also among the reverse friends of the user who submitted the
story, for all 195 stories. The dashed line shows the size of the social
network (number of reverse friends) of the submitter. More than half
of the stories (102) were submitted by users with one or more
reverse friends, and the rest by unknown users.\footnote{These users
have rank $>1020$ and were not listed as friends of any of the 1020
users in our dataset. It is possible, though unlikely, that they
have reverse friends.} The thin solid line shows how many people who
list the submitter as a friend dugg the story within the first 215
diggs (the 216 names on each list, excluding the submitter). All but
two of these 102 stories were dugg by the submitter's reverse friends;
both exceptions were submitted by SearchEngines, a user with
21 reverse friends. We use
simple combinatorics~\cite{Papoulis} to compute the probability that
$k$ of the submitter's friends could have dugg the story purely by
chance. If $n=215$ users are picked at random from a pool of
$N=15{,}742$, the probability that exactly $k$ of them come from a group
of size $K$ is approximately $P(k,n)={n\choose k}\, p^k (1-p)^{n-k}$, where
$p=K/N$; this binomial approximation is accurate because $n \ll N$.
Using this formula, the probability (averaged over stories
dugg by at least one friend) that the observed numbers of friends
dugg the story by chance is $P=0.005$, so chance is highly
unlikely to account for the observations.\footnote{If we include in the average the two stories that
were not dugg by any of the submitter's friends, we end up with a
higher, but still significant, $P=0.023$.} Moreover, users digg stories
submitted by their friends very quickly. The heavy solid line in
\figref{fig:scatterplot}(b) shows the number of reverse friends who
were among the first 25 diggers. The probability that these numbers
could have been observed by chance is even lower, $P=0.003$. We
conclude that users digg stories their friends submit. A consequence
of this conclusion is that users with active social networks are
more successful in getting their stories promoted to the front page.
We believe that this, coupled with the observation that top-ranked
users have larger social networks, explains their success.
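The probability computation can be sketched in Python as follows. The function names are hypothetical. \texttt{p\_binomial} implements the formula for $P(k,n)$ given above. For comparison, \texttt{p\_hypergeom} gives the exact probability for drawing the $n$ diggers without replacement, which is the quantity the binomial formula approximates. The value $K=21$ is used only as an illustrative social network size.

\begin{verbatim}
```python
from math import comb


def p_binomial(k, n, K, N):
    """P(k, n) as in the text: chance that exactly k
    of n random diggers are among K reverse friends,
    drawn from N users, with p = K / N."""
    p = K / N
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_hypergeom(k, n, K, N):
    """Exact chance when the n diggers are drawn
    without replacement (for comparison)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# n and N are from the text; K = 21 is only an
# illustrative social network size.
n, N, K = 215, 15742, 21
for k in range(4):
    print(k, round(p_binomial(k, n, K, N), 4),
          round(p_hypergeom(k, n, K, N), 4))
```
\end{verbatim}

The two columns agree closely, which is why the simpler binomial form is adequate here.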
487: 
488: % duplicates
489: % ....
490: 
\subsection{Users digg stories their friends digg} In the previous
section we showed that, by enabling users to quickly digg stories
submitted by friends, social networks play an important role in
promoting content to the front page. Do social networks also help
users discover interesting stories that were submitted by unknown
users (users who are not listed as friends by anyone)? Top users are
very active. The top $3\%$ of the 1020 recently active Digg users in
our dataset are not only responsible for a disproportionate share
of front page stories, but they also submit more than $28\%$ of the
stories submitted by the group of 1020 users, digg $11\%$ of the
stories dugg by this group, and comment on $8\%$ of the stories
commented on by it. Once one of these well-connected users diggs a story, others
within his or her social network are more likely to read it,
thanks to Digg's user interface, which lets a user quickly
view stories dugg by friends.
506: 
507: 
508: \comment{
509: % this dataset using week 1 user network data
510: \begin{table}
511:   \centering
512: \begin{tabular}{|ll|c|c|c|c|c|c|}
513:   \hline
514:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
515:   & \textbf{diggers} & \textbf{m=1} & \textbf{m=6} & \textbf{m=16} & \textbf{m=26} & \textbf{m=36} & \textbf{m=46}
516:   \\ \hline
517: (a) & visible to friends & 26  & 60  & 88 & 96  & 100  & 101  \\
518: % dugg by friends (216) & 10 & 20 & 38 & 51 & 55 & 61 \\   \hline
519:  (b) & dugg by friends & 9 & 18 & 29 & 35 & 39 & 47 \\
520:   (c) & probability & 0.002 & 0.017 & 0.047 & 0.069 & 0.061 & 0.071 \\ \hline
521: \end{tabular}
522: }
523: 
524: \begin{figure*}
525:   % Requires \usepackage{graphicx}
526: \begin{tabular}{cc}
527:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
528:   \includegraphics[height=2.0in]{history-socnet}  &
529:   \includegraphics[height=2.0in]{history-diggs}\\
530:   (a) & (b) \\
531: \end{tabular}
532:   \caption{(a) Number of reverse friends of the first $m$ diggers for the stories submitted by unknown users.
533:   (b) Number of friends of the first $m$ diggers who dugg the stories.}\label{fig:history-unknown}
534: \end{figure*}
535: 
536: 
537: 
538: 
539: 
\figref{fig:history-unknown} shows how the digging activities of
well-connected users affect stories submitted by ``unknown'' users.
$m=1$ corresponds to the user who submitted the story, while $m=6$
corresponds to the story's submitter and the first five users to
digg it. Each line is shifted upward with respect to the preceding
line to aid visualization. Social networks increase a story's
visibility. While at the time of submission only 34 of the 96
stories were visible to other users within the submitter's social
network ($m=1$), by the time 25 others had dugg the story ($m=26$),
all the stories were visible to others through the friends
interface.
551: 
552: % dataset using week 5 user network data
553: % 99 stories, submitters had more than 20 friends. All but 2 of the stories were dugg by friends within the first 25 diggs.
554: % $P(m=1)=0.007$ within 25 diggs
555: % 96 stories were submitted by unknown users (fewer than 20 friends)
556: \begin{table*}
557:   \centering
558: \begin{tabular}{|ll|c|c|c|c|c|c|}
559:   \hline
560:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
561:   & \textbf{diggers} & \textbf{m=1} & \textbf{m=6} & \textbf{m=16} & \textbf{m=26} & \textbf{m=36} & \textbf{m=46}
562:   \\ \hline
563: (a) & visible to friends & 34 & 75  & 94 & 96  & 96  & 96  \\
564: % dugg by friends (216) & 10 & 20 & 38 & 51 & 55 & 61 \\   \hline
565:  (b) & dugg by friends & 10 & 23 & 37 & 46 & 49 & 55 \\
566:   (c) & probability & 0.005 & 0.028 & 0.060 & 0.077 & 0.090 & 0.094 \\ \hline
567: \end{tabular}
568: 
569:   \caption{Number of stories posted by ``unknown'' users that were (a) made visible to other users through the
570:   digging activities of well-connected users, (b) dugg by friends of the first $m$ diggers within the next 25 diggs,
571:   and for the stories that were dugg by friends, (c) the average probability that the observed numbers of friends
572:   could have dugg the story by chance }\label{tbl:history-unknown}
573: \end{table*}
574: 
575: 
576: 
577: 
Do users digg stories dugg by friends? To answer this question, we
look at the 25 diggs that come after the first $m$ diggs and see how
many of them come from friends of the $m$ diggers. At the time of
submission ($m=1$), only ten of the stories were dugg by the submitter's
reverse friends. After five more
users dugg the stories ($m=6$), 75 became visible to others through
the friends interface, and of these, 23 were dugg by friends. After
25 users had dugg the story, all 96 stories were visible through
the friends interface, and almost half of these (46) were dugg by
friends. \tabref{tbl:history-unknown} summarizes the observations
and presents the probability that the observed numbers of friends
dugg the story by chance. The probabilities for $m=16$ through $m=46$ are
above the $0.05$ significance level, and possibly reflect the
increased visibility the story receives once it makes it to the
front page. Although the effect is not quite as dramatic as the one in
the previous section, we believe that the data show that users do
use the friends interface to find new interesting stories.
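One way to obtain per-story counts like those in \tabref{tbl:history-unknown} is sketched below in Python. It assumes each story's digger list is available in digg order with the submitter first, along with a map from each user to his or her reverse friends. The function name, the data layout, and the toy example are hypothetical.

\begin{verbatim}
```python
def friend_diggs(diggers, rev, m, window=25):
    """Counts for one story at a given m (sketch).

    diggers: users in digg order, submitter first.
    rev: user -> set of that user's reverse friends
         (hypothetical layout).
    Returns (visible, hits). visible is True if any of
    the first m users has a reverse friend; hits counts
    the next `window` diggers who are reverse friends
    of at least one of the first m users.
    """
    watchers = set()
    for u in diggers[:m]:
        watchers |= rev.get(u, set())
    following = diggers[m:m + window]
    hits = sum(1 for u in following if u in watchers)
    return bool(watchers), hits


# Toy example: s submits; a, b, c, d digg in order.
order = ["s", "a", "b", "c", "d"]
rev = {"s": {"c"}, "a": {"d", "x"}}
print(friend_diggs(order, rev, 1))  # (True, 1)
print(friend_diggs(order, rev, 2))  # (True, 2)
```
\end{verbatim}

Summing \texttt{visible} and counting stories with \texttt{hits > 0} over all stories would give rows like (a) and (b) of the table, under these assumptions.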
594: 
595: 
596: 
597: \section{Comparison with Reddit}
Reddit~\cite{redditurl} is another social news aggregator that
allows users to submit and vote on stories. Stories that get enough
positive votes are then promoted to the ``hot'' page, Reddit's
version of the front page. Unlike Digg, Reddit does not have an
explicit social networking component that allows a user to track
friends' activities or browse another user's network of
friends.\footnote{Reddit added a \emph{friends} feature in the summer of
2006, a month after we collected data from the site. At the time of
writing, this feature is fairly rudimentary: it simply allows
the user to quickly spot stories submitted by friends by
highlighting them.} Instead, Reddit lets users discover new
interesting stories through its recommendation system, which uses
collaborative filtering to suggest stories that were liked by other
users with similar voting patterns. Alternatively, a user can browse
through the newly submitted stories.
613: 
614: 
615: 
616: 
617: \begin{figure*}[tbh]
618: \begin{tabular}{cc}
619:   % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
620:   \includegraphics[width=3.0in]{reddit}  &
621: \includegraphics[width=3.0in]{comparison} \\
622:   (a) & (b) \\
623: \end{tabular}
624:   \caption{(a) Points accumulated by stories on Reddit's hot page over a period of a day. Square markers show
625:   when the story also appeared on the \emph{new} page. (b) Maximum rating (in points) attained by Reddit stories over
626:   the tracking period compared to the rating they had at the end of the period. }\label{fig:reddit}
627: \end{figure*}
628: 
Our dataset consists of statistics extracted from Reddit's new and
hot pages over a period of two days in May 2006. We identified 571
stories submitted by 350 users over the course of approximately a
day. Of these, 260 stories by 192 users also appeared on the hot
(front) page. \figref{fig:reddit}(a) shows how the number of points
accumulated by stories on Reddit's hot page changes with time. Note
that we were only able to track the stories for up to one day past
submission time. At first glance, the dynamics look similar to Digg's.
Unlike on Digg, however, a story often appears on the hot page at the
same time it appears on the new page (squares in
\figref{fig:reddit}(a)). Also unlike Digg, Reddit allows people to
vote stories down. \figref{fig:reddit}(b) shows the maximum number
of points achieved by Reddit stories over a period of about a day
and the points these stories had at the end of the period. The
ratings of a substantial number of stories dropped, in
many cases to zero, while other stories appeared on the front page
with very few points and never went anywhere.
646: 
647: 
648: We were unable to obtain data to measure the effectiveness of
649: Reddit's recommendation algorithm. We can only state that the
650: algorithm Reddit uses to promote stories (which must consider
651: actions of users reading and voting on recommended stories) appears
652: to be less effective than Digg's in that it allows many more
653: ``uninteresting'' stories (whose ratings do not increase) to the
654: front page. This may account for the perception of Reddit as a
655: timelier source of news. On Digg, a story has to accumulate enough
656: votes before it is promoted, which takes time, while on Reddit, many
657: stories appear to be promoted soon after posting, regardless of how
658: many points they have accumulated. Although Reddit does not use the
659: friends system, thus eliminating the possibility of ``bloc voting,''
660: some users appear to be more successful than others in getting their
661: stories promoted. In our dataset, there were an average of 1.4 front
662: page stories per user on Reddit,  compared to 1.6 on Digg.
663: 
664: 
665: \section{Tyranny of the minority?}
666: 
667: The new social media sites offer a glimpse into the future of the
668: Web, where, rather than passively consuming information, users will
669: actively participate in creating, evaluating, and disseminating
670: information. Several such sites, Digg and Flickr, for example, allow
671: users to designate select users as ``friends'' and provide easy
672: interface to track friends' activities. Just as Google
673: revolutionized Web search by exploiting the link structure of the
674: Web --- created independently through the activities of many Web
675: page authors --- to evaluate the contents of information on Web
676: pages, social media sites show that it is possible to personalize
677: search through \emph{social filtering} that exploits the activities
678: of others in the user's social network.
679: 
680: 
681: We studied the role social networks and social filtering play in the
682: collaborative ranking of information. Specifically, we looked at how
683: news stories submitted to Digg are promoted to its front page.
684: Digg's goal is to have only the best of the stories featured on its
685: front page, and it employs aggregated opinion of thousands of its
686: users, rather than a few dedicated editors, to select the best
687: stories. Digg also allows users to create social networks by
688: designating others as friends and provides a seamless interface to
689: track friends' activities: what stories users in their social
690: network submitted, liked, commented on, etc. By tracking stories
691: over time, we showed that social networks play an important role in
692: collaborative information filtering.  Specifically, we showed (a)
693: users tend to like stories submitted by friends and (b) users tend
694: to like stories their friends read and like. This, in a nutshell, is
695: social filtering. Since some users are more active than others,
696: direct implementation of social filtering may lead to ``tyranny of
697: the minority,'' where a lion's share of front page stories come from
698: users with the most active social networks. This appears to be the
699: case for Digg, where visualization of the graph of mutual friends
700: shows a single cluster of composed of users among the 30 top-ranked
701: individuals, giving them an edge in future success. However,
702: precisely because these users are the most active ones, they play an
703: important role in filtering information and bringing to other
704: users's attention stories that would otherwise be buried in the
705: onslaught of new submissions.
706: 
707: \begin{figure}[tbh]
708: \includegraphics[width=3.0in]{maxdiggs_new}
709:   \caption{Maximum number of diggs attained by a front page story
710:   vs submitter's rank. Data was collected from stories submitted to Digg in early November 2006,
711:   after the change in the promotion algorithm. The vertical line divides the set in half.
712:   Symbols on the right hand axis correspond to low-rated users with rank$>1020$.
713:   }\label{fig:maxdiggs_new}
714: \end{figure}
715: 
716: 
717: Recently, a similar finding~\cite{taylorhaywardblog} resulted in a
718: controversy on Digg~\cite{USAToday}, in which users accused a
719: ``cabal'' of top users of automatically digging each other's stories
720: in order to promote them to the front page. The resulting uproar
721: prompted Digg to change the algorithm it uses to promote stories. In
722: order to discourage what was seen as ``gaming'' the system through
723: ``bloc voting,'' the new algorithm ``will look at the unique digging
724: diversity of the individuals digging the story''~\cite{diggblog}.
725: Preliminary results of the stories submitted in early November 2006
726: indicate that algorithm change did achieve the desired effect of
727: reducing the top user dominance on the front page. Our analysis of
728: the November data shows that of the 3015 stories submitted by 1866
729: users over about one day, 77 stories by 63 users were promoted to
730: the front page. \figref{fig:maxdiggs_new} shows the maximum number
731: of diggs received by these stories over a period of six days vs the
732: rank of the submitting user. Compared to \figref{fig:diggs}, front
733: page now has a greater diversity of users, with fewer users
734: responsible for multiple front page stories. In fact, in our data
735: set, there are 1.2 stories per submitting user, compared to 1.6
736: before. Although this may be seen as a positive development, the
737: change in the story promotion algorithm may have some unintended
738: consequences: it may, for example, discourage users from joining
739: social networks because their votes will be discounted. It is too
740: early to see what long term consequences, intended or not, the new
741: algorithm will have.
742: 
743: Rather than being a liability, however, social networks can be used
744: to personalize and tailor information to individual users, and drive
745: the development of new \emph{social search algorithms}. As Digg
746: matures, we expect different sub-communities to arise, each
747: representing users interested in a particular topic or a combination
748: of topics. A single user could belong to several different
749: communities, and use his or her social networks to find and filter
750: interesting new information. For example, Digg can create
751: personalized front pages for every user that are based on his or her
752: friends' readings. This will finally free individuals from ``tyranny
753: of the majority'' which results from viewing a common global front
754: page or best seller list.
755: 
756: In order to be effective for personalizing information, the social
757: networks created by users have to reflect their tastes and
758: interests. Some users appear to accumulate contacts for the sake of
759: having contacts, or reciprocate every request to be added to the
760: contacts list. On Flickr, for example, we have observed some users
761: with over 10,000 contacts. Publicly displaying one's tastes raises
762: many privacy issues, which have yet to be addressed. Promising or
763: perilous, social media appears to be the future of the Web.
764: 
765: 
766: 
767: % Next steps - social search
768: % Issues - privacy
769: % too many links for individuals
770: 
771: 
772: % Adapt2 IIS-0535182
773: % AutoSyn IIS-0413321
774: % Crowds BCS-0527725
775: \paragraph{Acknowledgements} This research is based on work
776: supported in part by the National Science Foundation under Award
777: Nos. IIS-0535182 and IIS-0413321. We are grateful to Dipsy Kapoor
778: for helping with data analysis, and to Fetch Technologies for
779: providing wrapper building and execution tools.
780: 
781: 
782: 
783: \bibliographystyle{plain}
784: \bibliography{../social}
785: 
786: 
787: \end{document}
788: