
\section{Graph Analysis}
\label{jcmodel}

To address the four aspects identified in the previous section, we develop a
novel way to characterize algorithms for recommender systems.
Algorithms are distinguished not by the predicted ratings of
services/artifacts they produce, but by the combinations of people and
artifacts that they bring together. Two algorithms are considered
equivalent if they bring together identical sets of nodes, regardless
of whether they work in qualitatively different ways.
Our emphasis is on the role of a recommender system as a
mechanism for bridging entities in a social network.
We refer to this approach of studying recommendation as {\it jumping
connections}.

Notice that the framework does not emphasize how the recommendation is
actually made, or the information that an algorithm uses to make connections
(e.g., does it rely on others' ratings, on content-based features, or both?).
{\it In addition, we make no claims about the recommendations being `better'
or that they will be `better received.'} Our metrics, hence, will not lead a
designer to directly conclude that an algorithm $A$ is more accurate than
an algorithm $B$; such conclusions can only be made through
a field evaluation (involving feedback and reactions from users) or
via survey/interview procedures. By excluding the actual making of ratings
and predictions from its scope, the jumping connections
framework provides a systematic and rigorous way to study recommender systems.

Of course, the choice of how to jump connections will be driven by two
(often conflicting) goals: the desire to reach almost every node in the graph
(i.e., to recommend every product to somebody, or some product to everybody),
and the desire for strong jumps between the nodes that are brought together.
The conflict between these goals can be explicitly expressed in our
framework.

It should be emphasized that our model
does not imply that algorithms exploit only the
local structure of the recommendation dataset. Any mechanism,
local or global, could be used to jump connections. In fact, algorithms
need not even employ graph-theoretic notions
to make connections. Our framework
only requires a boolean test to see whether two nodes are brought together.

Notice also that when an algorithm brings together person $X$ and artifact
$Y$, it could imply either a positive recommendation or a negative one. Such
differences are, again, not captured by our framework unless the mechanism
for making connections restricts its jumps, for instance, to only those
artifacts whose ratings satisfy some threshold. In other words,
thresholds for making recommendations could be abstracted into the mechanism
for jumping.

Jumping connections satisfies all the aspects outlined in the previous
section. It involves a social-network model and thus emphasizes connections
rather than prediction. The nature of the connections jumped also aids in
explaining the recommendations.
The graph-theoretic nature of jumping connections allows
the use of mathematical models (such as random graphs) to analyze the
properties of the social networks in which recommender algorithms operate.

\subsection{The Jumping Connections Construction}

%\begin{figure}
%\centering
%\begin{tabular}{cc}
%& \mbox{\psfig{figure=SAVE1.epsi,width=2in}}
%\end{tabular}
%\caption{A bipartite graph of people and movies.}
%\label{jc-dataset}
%\end{figure}

We now develop the framework of jumping connections. We use concepts from
a movie recommender system to provide the intuition; this does not
restrict the range of applicability of jumping connections
and is done here only for ease of presentation.

A \emph{recommender dataset} $\mathcal{R}$ consists of the ratings that a
group of people assign to the movies in a collection.
The ratings could in fact be viewings, preferences, or other
constraints on movie recommendations.
Such a dataset can be represented as a bipartite graph
$G=(P\cup M,E)$, where $P$ is the set of people, $M$ is the set of
movies, and the edges in $E$ represent the ratings of movies.
We denote the number of people by $N_P=|P|$ and the number of movies
by $N_M=|M|$.

We can view the set $M$ as a secondary mode that helps make
connections (or jumps) between members of $P$.
A \emph{jump} is a function
$\mathcal{J}: \mathcal{R} \mapsto S$, with $S \subseteq P \times P$, that
takes as input a recommender dataset $\mathcal{R}$ and returns a set of
(unordered) pairs of elements of $P$.
Intuitively, the two nodes in a given pair
can be reached from one another by a single jump.
Notice that this definition does not prescribe how the mapping should
be performed, or whether it should use all the information present in
$\mathcal{R}$.
We also assume that jumps can be composed in
the following sense: if node $B$ can be reached from $A$ in one jump,
and $C$ can be reached from $B$ in one jump, then $C$ is reachable
from $A$ in two jumps.
The simplest jump is the \emph{skip}, which connects two members of
$P$ if they have at least one movie in common.

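A minimal Python sketch of the skip jump follows. It assumes that the dataset
$\mathcal{R}$ is stored as a dictionary mapping each person to the movies they
rated, together with the ratings. The function name \texttt{skip\_jump} and its
optional \texttt{min\_rating} filter are hypothetical. The filter shows how a
rating threshold, as discussed earlier, can be folded into the jump itself.

```python
from itertools import combinations


def skip_jump(ratings, min_rating=None):
    """Skip jump: pairs of people who rated at least one movie in common.

    ratings: dict mapping person -> dict mapping movie -> rating.
    min_rating: hypothetical threshold; when given, only ratings >= min_rating
    count as edges, folding a recommendation threshold into the jump.
    Returns a set of frozensets {p1, p2} (unordered pairs).
    """
    # Invert the bipartite graph: movie -> people with a qualifying rating.
    raters = {}
    for person, movies in ratings.items():
        for movie, value in movies.items():
            if min_rating is None or value >= min_rating:
                raters.setdefault(movie, set()).add(person)
    # Any two people attached to the same movie are one skip apart.
    pairs = set()
    for people in raters.values():
        for p1, p2 in combinations(sorted(people), 2):
            pairs.add(frozenset((p1, p2)))
    return pairs


ratings = {"ann": {"m1": 5, "m2": 3}, "bob": {"m2": 4}, "cat": {"m3": 2}}
print(skip_jump(ratings))                # ann and bob share m2
print(skip_jump(ratings, min_rating=4))  # ann's rating of m2 is too low: set()
```

Returning unordered pairs as frozensets matches the definition of
$\mathcal{J}(\mathcal{R})$ as a set of unordered pairs of people.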

A jump induces a graph called a social network graph.
The \emph{social network graph} of a recommender dataset
$\mathcal{R}$ induced by a given jump $\mathcal{J}$ is a unipartite
undirected graph $G_s=(P,E_s)$, where the edges are given by
$E_s = \mathcal{J} (\mathcal{R})$.
Notice that the induced graph could be disconnected, depending on the
strictness of the jump function.
Figure~\ref{jc-intro}(b) shows the social network graph induced
from the example in Figure~\ref{jc-intro}(a) using a skip jump.

%\begin{figure}
%\centering
%\begin{tabular}{cc}
%& \mbox{\psfig{figure=SAVE2.epsi,width=2in}}
%\end{tabular}
%\caption{Social network graph for the recommender dataset shown in
%Figure~\ref{jc-dataset}.}
%\label{jc-social}
%\end{figure}

We view a recommender system as exploiting the social connections (the
jumps) that bring together a person with other people who have rated
an artifact of (potential) interest.
To model this, we view the unipartite social network of people as a
directed graph and reattach the movies (seen by each person) such that
every movie is a sink, reinforcing its role as a secondary mode.
The shortest paths from a person to a movie in this graph can then
provide the basis for recommendations.
We refer to a graph induced in this fashion as a \emph{recommender
graph} (Figure~\ref{jc-intro}(c)).
Since the outdegree of every movie node is zero, paths through the
graph lead from people to movies (through more people, if necessary).

%\begin{figure}
%\centering
%\begin{tabular}{cc}
%& \mbox{\psfig{figure=SAVE3.epsi,width=3in}}
%\end{tabular}
%\caption{Recommender graph obtained by rendering the social network graph
%with
%bidirectional edges and reattaching the movies.}
%\label{jc-recommender}
%\end{figure}

The recommender graph of a recommender dataset $\mathcal{R}$ induced
by a given jump function $\mathcal{J}$ is a directed graph
$G_r=(P \cup M,E_{sd} \cup E_{md})$, where $E_{sd}$ is a set of ordered pairs
listing every pair from $\mathcal{J}(\mathcal{R})$ in both directions, and
$E_{md}$ is a set of ordered pairs listing every edge from $E$ in
the direction pointing to the movie mode.
%Figure~\ref{jc-intro} illustrates the process of generating the social
%network and recommender graphs for our example recommender dataset using
%the skip jump function.

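The following small Python sketch makes the construction concrete. It builds
$G_r$ from a dataset and a precomputed set of jump pairs, and then finds
shortest person-to-movie distances by breadth-first search. The adjacency
dictionary representation and the names \texttt{recommender\_graph} and
\texttt{movie\_distances} are assumptions for illustration. The sketch also
assumes that person and movie identifiers never collide.

```python
from collections import deque


def recommender_graph(ratings, jump_pairs):
    """Build G_r as a dict mapping each vertex to its set of successors.

    ratings: dict person -> iterable of movies the person rated (edges E).
    jump_pairs: iterable of 2-element pairs of people, i.e. J(R).
    """
    arcs = {p: set() for p in ratings}
    for p1, p2 in jump_pairs:        # E_sd: every jump in both directions
        arcs[p1].add(p2)
        arcs[p2].add(p1)
    for person, movies in ratings.items():
        for movie in movies:         # E_md: person -> movie only
            arcs[person].add(movie)
            arcs.setdefault(movie, set())  # movies stay sinks (outdegree 0)
    return arcs


def movie_distances(arcs, source, movies):
    """Shortest path length (BFS) from `source` to each reachable movie."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in arcs[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {m: d for m, d in dist.items() if m in movies}


ratings = {"ann": {"m1", "m2"}, "bob": {"m2", "m3"}, "cat": {"m4"}}
g = recommender_graph(ratings, [("ann", "bob")])
print(movie_distances(g, "ann", {"m1", "m2", "m3", "m4"}))
```

In this example, \texttt{m3} is two arcs away from \texttt{ann} (through
\texttt{bob}). \texttt{m4} is unreachable, because \texttt{cat} is not jump-connected
to anyone.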

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=skip.eps,width=5in}}
\end{tabular}
\caption{Illustration of the \emph{skip} jump: (a) bipartite graph
of people and movies; (b) social network graph; (c)
recommender graph.}
\label{jc-intro}
\end{figure}

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=NewHalfBow.epsi,width=3in}}
\end{tabular}
\caption{The jumping connections construction produces a half bow-tie graph $G_r$.}
\label{half-bowtie}
\end{figure}

\begin{figure}
\centering
\begin{tabular}{cc}
& \mbox{\psfig{figure=hammocks.eps,width=5in}}
\end{tabular}
\caption{A path of hammock jumps, with a hammock width
$w=4$.}
\label{hammock-pic}
\end{figure}

Assuming that the jump construction does not cause $G_r$ to be disconnected,
the portion of $G_r$ containing only people is a strongly connected component:
every person can reach every other person. The movies are
vertices that can be reached from this strongly connected component, but from
which it is not possible to reach the component (or any other node, for
that matter). Thus, $G_r$ can be viewed as a `half bow-tie'
(Figure~\ref{half-bowtie}), in contrast to the full bow-tie structure of the web
observed by Broder et al.~\cite{bowtie}. The circular portion
in the figure depicts the strongly
connected component derived from $G_s$. Links out of this portion of the graph
go from people nodes to sinks, which are movies.

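On a small recommender graph, the half bow-tie shape can be checked
mechanically. The Python sketch below assumes that $G_r$ is stored as a
dictionary mapping every vertex (person or movie) to its set of successors. The
function names are hypothetical. The sketch tests the two properties stated
above: every movie is a sink, and every person reaches every other person.

```python
from collections import deque


def reachable(arcs, source):
    """All vertices reachable from `source` along directed arcs (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        for nxt in arcs[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_half_bowtie(arcs, people, movies):
    """True if every movie is a sink and the people are mutually reachable.

    arcs: dict vertex -> set of successors in G_r (people and movies).
    people, movies: sets partitioning the vertices of G_r.
    """
    if any(arcs.get(m) for m in movies):  # some movie has an outgoing arc
        return False
    # One BFS per person: quadratic, but fine for illustrative sizes.
    return all(people <= reachable(arcs, p) for p in people)


arcs = {"ann": {"bob", "m1"}, "bob": {"ann", "m2"}, "m1": set(), "m2": set()}
print(is_half_bowtie(arcs, {"ann", "bob"}, {"m1", "m2"}))  # True
```

A person who is isolated by a strict jump breaks the strongly connected core.
In that case, the check fails, which mirrors the connectivity assumption made
above.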

\subsection{Hammocks}
For a given recommender dataset, there are many ways of inducing the social
network graph and the recommender graph.
The simplest, the skip, is illustrated in Figure~\ref{jc-intro}.
Note that jumping connections provides a systematic way to
characterize the recommender system algorithms in the literature.
We focus on one jump, called the hammock jump. A more comprehensive
list of jumps defined by different algorithms is explored by
Mirza~\cite{batul-thesis}; we do not address them here for want of space.

A hammock jump brings two people together in $G_s$
if they have at least $w$ movies in common in $\mathcal{R}$.
Formally, a pair $(p_1, p_2)$ is in $\mathcal{J} (\mathcal{R})$
whenever there is a set $M_{(p_1,p_2)}$ of $w$ movies such that there
is an edge from both $p_1$ and $p_2$ to each element of $M_{(p_1,p_2)}$.
The number $w$ of common artifacts is called the hammock width; the skip
is thus the hammock jump with $w=1$.
Figure~\ref{hammock-pic} illustrates a sequence (or \emph{hammock
path}) of hammocks.

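Below is a direct Python sketch of the hammock jump. It compares every pair of
people, so it runs in time quadratic in $N_P$ and is adequate only for small
illustrative datasets. The function name \texttt{hammock\_jump} and the
representation of each person's ratings as a set of movies are assumptions.

```python
from itertools import combinations


def hammock_jump(ratings, w):
    """Pairs of people with at least `w` movies in common (hammock width w).

    ratings: dict mapping person -> set of movies that person rated.
    With w = 1 this reduces to the skip jump.
    Returns a set of frozensets {p1, p2} (unordered pairs).
    """
    pairs = set()
    for p1, p2 in combinations(sorted(ratings), 2):
        if len(set(ratings[p1]) & set(ratings[p2])) >= w:
            pairs.add(frozenset((p1, p2)))
    return pairs


ratings = {"ann": {"m1", "m2", "m3"}, "bob": {"m2", "m3"}, "cat": {"m3", "m4"}}
for width in (1, 2, 3):
    print(width, len(hammock_jump(ratings, width)))  # 3, then 1, then 0 pairs
```

Widening the hammock can only remove pairs from $\mathcal{J}(\mathcal{R})$,
so $G_s$ becomes sparser as $w$ grows.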

There is some consensus in the community
that hammocks are fundamental in recommender
algorithms, since they represent commonality of ratings.
It is our hypothesis that hammocks are fundamental to all recommender
system jumps. Early recommendation projects such as GroupLens~\cite{konstan1},
LikeMinds~\cite{lminds2}, and Firefly~\cite{ringo1} can be viewed as
employing (simple versions
of) hammock jumps involving at most one intermediate person.

The horting algorithm of
Aggarwal et al.~\cite{aggarwal1} extends this idea to a sequence of hammock
jumps. Two relations, horting and predictability,
are used as the basis for a jump.
A person $p_1$ {\it horts} person $p_2$ if the ratings they have in common
are a sufficiently large subset of the ratings of $p_1$.
A person {\it predicts} another if they have a reverse horting relationship
and if there is a linear transformation between their ratings.
The algorithm first finds shortest paths of hammocks that relate to
predictability and then propagates ratings along these paths using the linear
transformations.
The implementation described by Aggarwal et al.~\cite{aggarwal1} uses a
bound on the length of the path.

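The two relations can be sketched in Python as follows. This is a simplified
illustration, not the implementation of Aggarwal et al. The thresholds
\texttt{min\_fraction} and \texttt{max\_error} are hypothetical stand-ins for
``sufficiently large'' and for the acceptable quality of the linear
transformation. Fitting that transformation by ordinary least squares is our
own choice.

```python
def horts(ratings, p1, p2, min_fraction):
    """p1 horts p2 if the commonly rated movies cover >= min_fraction of p1's ratings.

    ratings: dict person -> dict movie -> numeric rating.
    min_fraction: hypothetical stand-in for "sufficiently large".
    """
    r1 = ratings[p1]
    if not r1:
        return False
    common = r1.keys() & ratings[p2].keys()
    return len(common) / len(r1) >= min_fraction


def predicts(ratings, p1, p2, min_fraction, max_error):
    """Sketch of "p1 predicts p2": p2 horts p1 (the reverse relation) and p2's
    ratings are close to a linear function of p1's on their common movies.

    Returns (slope, intercept) of a least-squares fit r2 ~ slope * r1 + intercept
    when the mean absolute residual is at most max_error, otherwise None.
    """
    if not horts(ratings, p2, p1, min_fraction):  # reverse horting
        return None
    common = sorted(ratings[p1].keys() & ratings[p2].keys())
    if not common:
        return None
    xs = [ratings[p1][m] for m in common]
    ys = [ratings[p2][m] for m in common]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # p1 rated every common movie alike: slope undefined
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    error = sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / n
    return (slope, intercept) if error <= max_error else None


ratings = {"ann": {"m1": 1, "m2": 2, "m3": 3},
           "bob": {"m1": 2, "m2": 4, "m3": 6, "m4": 5}}
print(predicts(ratings, "bob", "ann", 0.8, 0.1))  # (0.5, 0.0): ann = bob / 2
```

Here \texttt{ann} horts \texttt{bob}, because all of her ratings are shared. The
reverse does not hold (3 of 4). So \texttt{bob} predicts \texttt{ann}, but not
vice versa.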

There are a number of interesting algorithmic
questions that can be studied.
First, since considering more common ratings can be beneficial (see
\cite{herlocker} for approaches), a wider hammock could be better, although
this is not strictly true when correlations between ratings are
considered~\cite{herlocker}.
Second, many recommender systems require a minimum number $\kappa$ of
ratings before a user may use the system, to prevent {\it free-riding}
on recommendations~\cite{freeriding}. What is a good value for
$\kappa$? Third, what is the hammock diameter, that is, how far would we have
to traverse to reach everyone in the social network graph?
We begin looking at these questions in the next section.

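For small datasets, the second and third questions can be explored exactly,
without any random graph model. The Python sketch below uses hypothetical
function names. It drops people with fewer than $\kappa$ ratings, builds $G_s$
for a hammock jump of width $w$, and computes the hammock diameter by
breadth-first search from every person. It returns \texttt{None} when $G_s$ is
disconnected.

```python
from collections import deque
from itertools import combinations


def hammock_graph(ratings, w, kappa=0):
    """Adjacency dict of G_s under a hammock jump of width w.

    ratings: dict person -> set of movies rated.
    People with fewer than kappa ratings are dropped first, mimicking a
    minimum-ratings requirement (kappa is a free parameter here).
    """
    people = [p for p in ratings if len(ratings[p]) >= kappa]
    adj = {p: set() for p in people}
    for p1, p2 in combinations(people, 2):
        if len(set(ratings[p1]) & set(ratings[p2])) >= w:
            adj[p1].add(p2)
            adj[p2].add(p1)
    return adj


def eccentricity(adj, source):
    """Largest BFS distance from source, or None if someone is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values()) if len(dist) == len(adj) else None


def hammock_diameter(adj):
    """Exact diameter of G_s in hammock jumps, or None if G_s is disconnected."""
    eccs = [eccentricity(adj, p) for p in adj]
    return None if not eccs or None in eccs else max(eccs)


ratings = {"a": {"m1", "m2"}, "b": {"m1", "m2", "m3"},
           "c": {"m2", "m3"}, "d": {"m3"}}
print(hammock_diameter(hammock_graph(ratings, 1)))           # 2
print(hammock_diameter(hammock_graph(ratings, 2)))           # None: d isolated
print(hammock_diameter(hammock_graph(ratings, 2, kappa=2)))  # 2
```

A person with fewer than $w$ ratings cannot share $w$ movies with anyone, so
that person is always isolated in $G_s$. This is one way in which $\kappa$ and
$w$ interact: choosing $\kappa \ge w$ removes such people up front.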

\subsection{Random Graph Models}

Our goal is to be able to answer questions about hammock width,
minimum ratings, and path length in a typical graph.
The approach we take is to use a model of random graphs adapted from
the work of Newman, Strogatz, and Watts~\cite{newman}.
This model, while having limitations, is the best fit among existing
models and, as we shall see, provides imprecise but descriptive
results.

A recommender dataset $\mathcal{R}$ can be characterized by the number
of ratings that each person makes and the number of ratings that each
artifact receives.
These values correspond to the degree distributions in the bipartite
rating graph for $\mathcal{R}$.
These counts are relatively easy to obtain from a dataset and so could
be used in the analysis of appropriate algorithms.
Therefore, we would like to be able to characterize a random bipartite graph
using particular degree distributions.
This requirement means that the more common
random graph models (e.g.,~\cite{erdos}) are
not appropriate, since they assume
that edges occur with equal probability.
On the other hand, a model recently proposed
by Aiello, Chung, and Lu~\cite{call-graph} is
based on a power-law distribution, similar to characteristics
observed in actual recommendation datasets (see next section).
But again, this model is not directly parameterized by the degree
distribution.
The Newman-Strogatz-Watts model is the only model we know of that
characterizes a family of graphs in terms of degree distributions.

From the original bipartite graph $G = (P \cup M, E)$ for
$\mathcal{R}$, we develop two models, one for the social network graph
$G_s$ and one for the recommender graph $G_r$.

\subsection{Modeling the Social Network Graph}
Recall that the social
network graph $G_s = (P,E_s)$ is undirected and $E_s$ is induced by a jump
function $\mathcal{J}$ on $\mathcal{R}$.
The Newman-Strogatz-Watts model works by characterizing the degree
distribution of the vertices and then using it to compute the
probability of arriving at a node by following an edge.
Together, these describe a random process of following a path through
a graph and allow computation of the length of paths.
Here we only discuss the equations that are used, not
the details of their derivation.
The application of these equations to these graphs is outlined by
Mirza~\cite{batul-thesis} and is based on the derivation by Newman et
al.~\cite{newman}.

We describe the social network graph $G_s$ by the probability that a
vertex has a particular degree.
This is expressed as a generating function $G_0 (x)$
$$G_0 (x) = \sum_{k=0}^{\infty} p_k x^k, $$
where $p_k$ is the probability that a randomly chosen
vertex in $G_s$ has degree $k$.
This function must satisfy the property that
$$G_0 (1) = \sum_{k=0}^{\infty} p_k = 1.$$

To obtain an expression that describes the typical length of a path,
we can consider how many steps we need to take from a node to be able to
reach every other node in the graph.
To do this, we use the number of neighbors $m$ steps away.
For a randomly chosen vertex in this graph, $G_0 (x)$ gives us the
distribution of the number of immediate neighbors of that vertex.
So we can compute the average number $z_1$ of vertices one edge away from a
vertex as the average degree $z$:
$$z_1 = z = \sum_{k} k p_k = G_0^{'} (1). $$
The number of neighbors two steps away is given by
$$z_2 = \sum_{k} k p_k \cdot {1 \over z} \sum_{k} k (k-1) p_k
 = \sum_{k} k (k-1) p_k = G_0^{''} (1).$$
It turns out (see~\cite{newman} for details) that the number of neighbors
$m$ steps away is given in terms of these two quantities:
$$z_m = {\left( {z_2 \over z_1} \right) }^{m-1} z_1.$$
The path length $l_{pp}$ we are interested in is the one that is large enough to
reach all of the $N_P$ elements of $P$, and so $l_{pp}$
should satisfy the equation
$$1 + \sum_{m=1}^{l_{pp}} z_m = N_P, $$
where the constant $1$ counts the initial vertex.
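Because $z_m$ grows geometrically in $m$ with ratio $z_2/z_1$, the sum has a
closed form. Assuming $z_2 \neq z_1$,
\begin{eqnarray*}
1 + z_1 {{(z_2/z_1)^{l_{pp}} - 1} \over {z_2/z_1 - 1}} & = & N_P, \\
\left( {z_2 \over z_1} \right)^{l_{pp}} & = &
 {{(N_P - 1)(z_2 - z_1) + z_1^{2}} \over {z_1^{2}}},
\end{eqnarray*}
and taking logarithms of both sides isolates $l_{pp}$.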

Using this equation, it can be shown that the typical length from one
node to another in $G_s$ is
\begin{equation}
l_{pp} = {{\log[(N_P -1) (z_2 - z_1) + z_1^{2}] - \log[z_1^{2}]} \over
{\log[z_2/z_1]}} \label{length1}
\end{equation}
We use this formula as our primary means of computing the
distances between pairs of people in $G_s$ in the empirical evaluation
in the next section.
Since we use actual datasets, we can compute $p_k$ as the fraction of
vertices in the graph having degree $k$.

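Given the list of vertex degrees of $G_s$ (one entry per person), the Python
sketch below computes the empirical $p_k$, the quantities $z_1$ and $z_2$, and
the estimate of Equation~(\ref{length1}). The function names are hypothetical.
The sketch assumes $z_2 > z_1$, so that the denominator is positive. It also
assumes that $G_s$ is one giant component.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical p_k: fraction of vertices with degree k (so G_0(1) = 1)."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in counts.items()}


def typical_path_length(degrees):
    """Estimate l_pp from Equation (length1), taking N_P = len(degrees).

    Assumes z_2 > z_1 and that the graph is one giant component.
    """
    p = degree_distribution(degrees)
    z1 = sum(k * pk for k, pk in p.items())            # G_0'(1)
    z2 = sum(k * (k - 1) * pk for k, pk in p.items())  # G_0''(1)
    n = len(degrees)
    numerator = math.log((n - 1) * (z2 - z1) + z1 ** 2) - math.log(z1 ** 2)
    return numerator / math.log(z2 / z1)


print(typical_path_length([3] * 22))  # about 3.0
```

For instance, a 3-regular graph on 22 vertices has $z_1 = 3$ and $z_2 = 6$.
Since $1 + 3 + 6 + 12 = 22$, the formula returns $l_{pp} = 3$, up to
floating-point rounding.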

\subsection{Modeling the Recommender Graph}
The recommender graph $G_r = (P \cup M,E_{sd} \cup E_{md})$
is directed, and hence the generating function for vertex degrees
should capture both indegrees and outdegrees:
$$G (x,y) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} p_{jk} x^j y^k,$$
where $p_{jk}$ is the probability that a randomly chosen vertex has
indegree $j$ and outdegree $k$.

From the jumping connections construction, we know that movie vertices
have outdegree $0$. The converse is not true: vertices with outdegree $0$
could be people nodes isolated as a result of a severe jump constraint.
Notice also that using the joint distribution $p_{jk}$ does \emph{not} imply
independence of the indegree and outdegree distributions.
We show in the next section that this feature is very useful.
In addition, since every arc leaves one vertex and enters another, the
average indegree equals the average outdegree; that is, the average net
number of arcs entering (or leaving) a vertex is zero, and so
$$\sum_{jk} (j-k) p_{jk} = \sum_{jk} (k-j) p_{jk} = 0.$$
We arrive at new expressions for $z_1$ and
$z_2$~\cite{batul-thesis}:
\begin{eqnarray*}
z_1 & = & \sum_{jk} k p_{jk}, \\
z_2 & = & \sum_{jk} j k p_{jk}.
\end{eqnarray*}
The average path length $l_r$
can be calculated as before:
\begin{equation}
l_r = {{\log[(N_P + N_M -1) (z_2 - z_1) + z_1^{2}] - \log[z_1^{2}]}
\over {\log[z_2/z_1]}}, \label{length2}
\end{equation}
where $N_P + N_M$ is the size of the recommender graph $G_r$ (assuming that
the graph is one giant component).
%[OK, where was the assumption that the graph is one giant component?]
The length $l_r$ includes paths from people to movies, as
well as paths from people to people.
Writing $l_r$ as the average over the $N_P (N_P - 1)$ ordered pairs of people
and the $N_P N_M$ person-to-movie pairs, and substituting $l_{pp}$ for the
person-to-person distances, we can express the average length $l_{pm}$ of
paths from people to movies alone as:
\begin{equation}
l_{pm} = {{l_r (N_P (N_P -1) + N_P N_M) - l_{pp} N_P (N_P -1)} \over {N_P N_M}} \label{length3}
\end{equation}

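The Python sketch below implements the recommender-graph estimates. It builds
the empirical joint distribution $p_{jk}$ from per-vertex indegrees and
outdegrees, without assuming independence. It then checks that the mean
indegree equals the mean outdegree, and evaluates Equations~(\ref{length2})
and~(\ref{length3}). The function name and argument layout are assumptions.
$l_{pp}$ is taken as an input, for example from Equation~(\ref{length1}).

```python
import math
from collections import Counter


def recommender_graph_lengths(in_deg, out_deg, n_people, l_pp):
    """Estimate l_r (Eq. length2) and l_pm (Eq. length3) for G_r.

    in_deg, out_deg: parallel lists with the indegree and outdegree of every
    vertex of G_r. n_people: N_P; the other len(in_deg) - n_people vertices
    are taken to be movies. l_pp: person-to-person estimate for G_s.
    Assumes G_r is one giant component and z_2 > z_1.
    """
    n = len(in_deg)
    n_movies = n - n_people
    # Empirical joint distribution p_jk; indegree and outdegree are not
    # assumed to be independent.
    joint = Counter(zip(in_deg, out_deg))
    p = {jk: count / n for jk, count in joint.items()}
    mean_in = sum(j * pjk for (j, _), pjk in p.items())
    z1 = sum(k * pjk for (_, k), pjk in p.items())
    # Every arc leaves one vertex and enters another, so the means agree.
    if not math.isclose(mean_in, z1):
        raise ValueError("mean indegree and mean outdegree differ")
    z2 = sum(j * k * pjk for (j, k), pjk in p.items())
    l_r = ((math.log((n - 1) * (z2 - z1) + z1 ** 2) - math.log(z1 ** 2))
           / math.log(z2 / z1))
    pp_pairs = n_people * (n_people - 1)  # ordered person-to-person pairs
    pm_pairs = n_people * n_movies        # person-to-movie pairs
    l_pm = (l_r * (pp_pairs + pm_pairs) - l_pp * pp_pairs) / pm_pairs
    return l_r, l_pm


# Three mutually jump-connected people, each having rated one distinct movie.
print(recommender_graph_lengths([2, 2, 2, 1, 1, 1], [3, 3, 3, 0, 0, 0], 3, 1.0))
```

In this toy example, each person has indegree 2 (from the other two people)
and outdegree 3 (two people plus one movie). Each movie has indegree 1 and
outdegree 0.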

\subsection{Caveats with the Newman-Strogatz-Watts Equations}
There are various problems with using the above formulas in a realistic
setting~\cite{heath1}. First, unlike most results in random graph theory, the
formulas do not include any guarantees or confidence levels. Second, all
the equations above are obtained over the ensemble of random graphs that
have the given degree distribution, and hence assume that all such graphs
are equally likely. The specificity of the jumping connections construction
implies that the $G_s$ and $G_r$ graphs are poor candidates to serve as typical
random instances of a graph.

In addition, the equations utilizing $N_P$ and $N_M$ assume that all nodes
are reachable from any starting vertex (i.e., the graph is one giant
component). This will not be satisfied for very strict jumping constraints.
In such cases, Newman, Strogatz, and Watts suggest substituting these
values with measurements taken from the largest component of the graph.
Expressing the sizes of the components of the graph using generating
functions is also suggested~\cite{newman}. However, the complexity of jumps
such as the hammock can make estimation of the component sizes extremely
difficult, if not impossible, within the Newman-Strogatz-Watts model. We
leave this issue to future research.

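The substitution suggested by Newman, Strogatz, and Watts can be sketched in
Python. The sketch finds the largest connected component of $G_s$ and uses its
size in place of $N_P$, and its degree sequence in place of the full one. The
function names are hypothetical. The sketch does not address the harder
problem of predicting component sizes from generating functions.

```python
from collections import deque


def largest_component(adj):
    """Vertex set of the largest connected component of an undirected graph.

    adj: dict vertex -> set of neighbours (e.g. G_s under a strict jump).
    """
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        queue = deque([start])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in comp:
                    comp.add(nxt)
                    queue.append(nxt)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def restrict(adj, vertices):
    """Subgraph induced by `vertices`; its size and degrees replace N_P and p_k."""
    return {v: adj[v] & vertices for v in vertices}


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}, "f": set()}
core = restrict(adj, largest_component(adj))
print(len(core), sorted(len(nbrs) for nbrs in core.values()))  # 3 [1, 1, 2]
```

The resulting size and degree list can then be fed to the formulas above in
place of the full graph's values.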

Finally, the Newman-Strogatz-Watts model is fundamentally more
complicated than traditional models of random graphs.
It has a potentially infinite set of parameters
(the $p_k$), does not address the possibility of multiple edges or loops, and,
by not fixing the size of the graph, assumes that the same degree distribution
applies to graphs of all sizes.
These observations suggest that we cannot hope for more than a
qualitative indication of the dependence of the average path length on
the jump constraints.
In the next section, we describe how well these formulas perform on
two real-world datasets.