1: \section{Graph Analysis}
2: \label{jcmodel}
3:
4: To address the four aspects identified in the previous section, we develop a
5: novel way to characterize algorithms for recommender systems.
6: Algorithms are distinguished, not by the predicted ratings of
7: services/artifacts they produce, but by the combinations of people and
8: artifacts that they bring together. Two algorithms are considered
9: equivalent if they bring together identical sets of nodes regardless
10: of whether they work in qualitatively different ways.
11: Our emphasis is on the role of a recommender system as a
12: mechanism for bridging entities in a social network.
13: We refer to this approach of studying recommendation as {\it jumping
14: connections}.
15:
16: Notice that the framework does not emphasize how the recommendation is
17: actually made, or the information that an algorithm uses to make connections
18: (e.g., does it rely on
19: others' ratings, on content-based features, or both?). {\it In addition, we make
20: no claims about the recommendations being `better' or that they will be
21: `better received.'} Our metrics, hence, will not lead a designer to
22: directly conclude that an algorithm $A$ is more accurate than
23: an algorithm $B$; such conclusions can only be made through
24: a field evaluation (involving feedback and reactions from users) or
via survey/interview procedures. By deliberately excluding the mechanics of
producing ratings and predictions from its scope, the jumping connections
framework provides a systematic and rigorous way to study recommender systems.
28:
29: Of course, the choice of how to jump connections will be driven by the
30: (often conflicting) desire to reach almost every node in the graph (i.e.,
31: recommend every product for somebody, or
32: recommend some product for everybody)
33: and the strength of the jumps enjoyed when two nodes are brought together.
34: The conflict between these goals can be explicitly expressed in our
35: framework.
36:
It should be emphasized that our model
does not imply that algorithms exploit only
39: local structure of the recommendation dataset. Any mechanism ---
40: local or global --- could be used to jump connections. In fact, it is
41: not even necessary that algorithms employ graph-theoretic notions
42: to make connections. Our framework
43: only requires a boolean test to see if two nodes are brought together.
44:
45: Notice also that when an algorithm brings together person $X$ and artifact
46: $Y$, it could imply either a positive recommendation or a negative one. Such
47: differences are, again, not captured by our framework unless the mechanism
48: for making connections restricts its jumps, for instance, to only those
49: artifacts for which ratings satisfy some threshold. In other words,
50: thresholds for making recommendations could be abstracted into the mechanism
51: for jumping.
52:
53: Jumping connections satisfies all the aspects outlined in the previous
54: section. It involves a social-network model, and thus, emphasizes connections
55: rather than prediction. The nature of connections jumped also aids in
56: explaining the recommendations.
57: The graph-theoretic nature of jumping connections allows
58: the use of mathematical models (such as random graphs) to analyze the
59: properties of the social networks in which recommender algorithms operate.
60:
61: \subsection{The Jumping Connections Construction}
62:
63: %\begin{figure}
64: %\centering
65: %\begin{tabular}{cc}
66: %& \mbox{\psfig{figure=SAVE1.epsi,width=2in}}
67: %\end{tabular}
68: %\caption{A bipartite graph of people and movies.}
69: %\label{jc-dataset}
70: %\end{figure}
71:
72: We now develop the framework of jumping connections. We use concepts from
73: a movie recommender system to provide the intuition; this does not
74: restrict the range of applicability of jumping connections
75: and is introduced here only for ease of presentation.
76:
77: A \emph{recommender dataset} $\mathcal{R}$ consists of the ratings by a
78: group of people of movies in a collection.
79: The ratings could in fact be viewings, preferences, or other
80: constraints on movie recommendations.
81: Such a dataset can be represented as a bipartite graph
82: $G=(P\cup M,E)$ where $P$ is the set of people, $M$ is the set of
83: movies, and the edges in $E$ represent the ratings of movies.
84: We denote the number of people by $N_P=|P|$, and the number of movies
85: as $N_M=|M|$.
86:
87: We can view the set $M$ as a secondary mode that helps make
88: connections --- or jumps --- between members of $P$.
89: A \emph{jump} is a function
90: $\mathcal{J}: \mathcal{R} \mapsto S, S \subseteq P \times P$ that
91: takes as input a recommender dataset $\mathcal{R}$ and returns a set of
92: (unordered) pairs of elements of $P$.
93: Intuitively, this means that the two nodes described in a given pair
94: can be reached from one another by a single jump.
95: Notice that this definition does not prescribe how the mapping should
96: be performed, or whether it should use all the information present in
97: $\mathcal{R}$.
98: We also make the assumption that jumps can be composed in
99: the following sense: if node $B$ can be reached from $A$ in one jump,
100: and $C$ can be reached from $B$ in one jump, then $C$ is reachable
101: from $A$ in two jumps.
102: The simplest jump is the \emph{skip}, which connects two members in
103: $P$ if they have at least one movie in common.
104:
105: A jump induces a graph called a social network graph.
106: The \emph{social network graph} of a recommender dataset
107: $\mathcal{R}$ induced by a given jump $\mathcal{J}$ is a unipartite
108: undirected graph $G_s=(P,E_s)$, where the edges are given by
109: $E_s = \mathcal{J} (\mathcal{R})$.
110: Notice that the induced graph could be disconnected based on the
111: strictness of the jump function.
112: Figure~\ref{jc-intro} (b) shows the social network graph induced
113: from the example in Figure~\ref{jc-intro} (a) using a skip jump.
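
To make the skip concrete, one way to write it set-theoretically is sketched
below. The notation $N(p)$ for the set of movies rated by person $p$ is
introduced here only for illustration and is not part of the original
construction:
$$N(p) = \{ m \in M : (p,m) \in E \}, \qquad
\mathcal{J}_{\rm skip} (\mathcal{R}) = \{ \{p_1, p_2\} : p_1 \neq p_2,\
N(p_1) \cap N(p_2) \neq \emptyset \}.$$
For instance, in a hypothetical dataset with $N(a) = \{m_1, m_2\}$,
$N(b) = \{m_2\}$, and $N(c) = \{m_3\}$, the skip yields
$E_s = \{\{a,b\}\}$, and $c$ is an isolated vertex of $G_s$.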
114:
115: %\begin{figure}
116: %\centering
117: %\begin{tabular}{cc}
118: %& \mbox{\psfig{figure=SAVE2.epsi,width=2in}}
119: %\end{tabular}
120: %\caption{Social network graph for the recommender dataset shown in
121: %Figure~\ref{jc-dataset}.}
122: %\label{jc-social}
123: %\end{figure}
124:
125: We view a recommender system as exploiting the social connections (the
126: jumps) that bring together a person with other people who have rated
127: an artifact of (potential) interest.
128: To model this, we view the unipartite social network of people as a
129: directed graph and reattach movies (seen by each person) such that
130: every movie is a sink (reinforcing its role as a secondary
131: mode).
132: The shortest paths
133: from a person to a movie in
134: this graph can then be used to
135: provide the basis for recommendations.
136: We refer to a graph induced in this fashion as a \emph{recommender
137: graph} (Figure~\ref{jc-intro} (c)).
138: Since the outdegree of every movie node is fixed at zero, paths through the
139: graph are from people to movies (through more people, if necessary).
140:
141: %\begin{figure}
142: %\centering
143: %\begin{tabular}{cc}
144: %& \mbox{\psfig{figure=SAVE3.epsi,width=3in}}
145: %\end{tabular}
146: %\caption{Recommender graph obtained by rendering the social network graph
147: %with
148: %bidirectional edges and reattaching the movies.}
149: %\label{jc-recommender}
150: %\end{figure}
151:
152: The recommender graph of a recommender dataset $\mathcal{R}$ induced
153: by a given jump function $\mathcal{J}$ is a directed graph
$G_r=(P \cup M,E_{sd} \cup E_{md})$, where $E_{sd}$ is a set of ordered pairs,
listing every pair from $\mathcal{J}(\mathcal{R})$ in both directions, and
$E_{md}$ is a set of ordered pairs, listing every pair from $E$ in
the direction pointing to the movie mode.
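
Written out, and purely as a restatement of the definition above (a sketch,
with bipartite edges written as unordered pairs $\{p,m\}$ for this purpose
only), the two edge sets are
$$E_{sd} = \{ (p_1,p_2), (p_2,p_1) : \{p_1,p_2\} \in
\mathcal{J}(\mathcal{R}) \}, \qquad
E_{md} = \{ (p,m) : p \in P,\ m \in M,\ \{p,m\} \in E \}.$$
Consequently $|E_{sd}| = 2 |E_s|$ and $|E_{md}| = |E|$, and no arc
leaves a movie vertex.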
158: %Figure~\ref{jc-intro} illustrates the process of generating the social
159: %network and recommender graphs for our example recommender dataset using
160: %the skip jump function.
161:
162: \begin{figure}
163: \centering
164: \begin{tabular}{cc}
165: & \mbox{\psfig{figure=skip.eps,width=5in}}
166: \end{tabular}
\caption{Illustration of the \emph{skip} jump: (a) bipartite graph
of people and movies, (b) social network graph, and (c)
recommender graph.}
170: \label{jc-intro}
171: \end{figure}
172:
173: \begin{figure}
174: \centering
175: \begin{tabular}{cc}
176: & \mbox{\psfig{figure=NewHalfBow.epsi,width=3in}}
177: \end{tabular}
178: \caption{The jumping connections construction produces a half bow-tie graph $G_r$.}
179: \label{half-bowtie}
180: \end{figure}
181:
182: \begin{figure}
183: \centering
184: \begin{tabular}{cc}
185: & \mbox{\psfig{figure=hammocks.eps,width=5in}}
186: \end{tabular}
187: \caption{A path of hammock jumps, with a hammock width
188: $w=4$.}
189: \label{hammock-pic}
190: \end{figure}
191:
Assuming that the jump construction does not cause $G_r$ to be disconnected,
the portion of $G_r$ containing only people is its strongly connected component:
every person can be reached from every other person. The movies constitute
vertices which can be reached from the strongly connected component, but from which
it is not possible to reach that component (or any other node, for
that matter). Thus, $G_r$ can be viewed as a `half bow-tie'
(Figure~\ref{half-bowtie}), in contrast to the full bow-tie structure of the web
observed by Broder et al.~\cite{bowtie}. The circular portion
in the figure depicts the strongly
connected component derived from $G_s$. Links out of this portion of the graph
are from people nodes and go to sinks, which are movies.
203:
204: \subsection{Hammocks}
205: For a given recommender dataset, there are many ways of inducing the social
206: network graph and the recommender graph.
207: The simplest, the skip, is illustrated in Figure~\ref{jc-intro}.
208: Note that jumping connections provides a systematic way to
209: characterize recommender systems algorithms in the literature.
We will focus on one jump, the hammock jump. A more comprehensive
list of jumps defined by different algorithms is explored by
Mirza~\cite{batul-thesis}; we do not address them here for want of space.
213:
214: A hammock jump brings two people together in $G_s$
215: if they have at least $w$ movies in common in $\mathcal{R}$.
216: Formally, a pair $(p_1, p_2)$ is in $\mathcal{J} (\mathcal{R})$
217: whenever there is a set $M_{(p_1,p_2)}$ of $w$ movies such that there
218: is an edge from $p_1$ and $p_2$ to each element of $M_{(p_1,p_2)}$.
219: The number $w$ of common artifacts is called the hammock width.
220: Figure~\ref{hammock-pic} illustrates a sequence (or \emph{hammock
221: path}) of hammocks.
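
In set notation, and writing $N(p)$ for the set of movies rated by person
$p$ (our shorthand, introduced only for this sketch), the hammock jump of
width $w$ can be expressed as
$$\mathcal{J}_{w} (\mathcal{R}) = \{ \{p_1, p_2\} : p_1 \neq p_2,\
|N(p_1) \cap N(p_2)| \geq w \}.$$
Under this reading the skip is the special case $w=1$, and
$\mathcal{J}_{w+1} (\mathcal{R}) \subseteq \mathcal{J}_{w} (\mathcal{R})$,
so increasing $w$ can only remove edges from $G_s$.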
222:
223: There is some consensus in the community
224: that hammocks are fundamental in recommender
225: algorithms since they represent commonality of ratings.
226: It is our hypothesis that hammocks are fundamental to all recommender
227: system jumps. Early recommendation projects such as GroupLens \cite{konstan1},
228: LikeMinds \cite{lminds2}, and Firefly \cite{ringo1} can be viewed as
229: employing (simple versions
230: of) hammock jumps involving at most one intermediate person.
231:
232: The horting algorithm of
233: Aggarwal et al.~\cite{aggarwal1} extends this idea to a sequence of hammock
234: jumps. Two relations --- horting and predictability ---
235: are used as the basis for a jump.
236: A person $p_1$ {\it horts} person $p_2$ if the ratings they have in common
237: are a sufficiently large subset of the ratings of $p_1$.
238: A person {\it predicts} another if they have a reverse horting relationship,
239: and if there is a linear transformation between their ratings.
240: The algorithm first finds shortest paths of hammocks that relate to
241: predictability and then propagates ratings using the linear
242: transformations.
243: The implementation described by Aggarwal et al.~\cite{aggarwal1} uses a
244: bound on the length of the path.
245:
246: There are a number of interesting algorithmic
247: questions that can be studied.
First, how wide should a hammock be? Since considering more common ratings
can be beneficial (see \cite{herlocker} for approaches), a wider hammock
could be better, although this does not strictly hold when correlations
between ratings are considered \cite{herlocker}.
Second, many recommender systems require a minimum number $\kappa$ of
ratings before the user may use the system, to prevent {\it free-riding}
on recommendations \cite{freeriding}. What is a good value for
$\kappa$? Third, what is the hammock diameter, that is, how far would we have
to traverse to reach everyone in the social network graph?
257: We begin looking at these questions in the next section.
258:
259: \subsection{Random Graph Models}
260:
261: Our goal is to be able to answer questions about hammock width,
262: minimum ratings, and path length in a typical graph.
263: The approach we take is to use a model of random graphs adapted from
264: the work of Newman, Strogatz, and Watts~\cite{newman}.
This model, while having limitations, is the best fit among existing
models and, as we shall see, provides imprecise but descriptive
results.
268:
269: A recommender dataset $\mathcal{R}$ can be characterized by the number
270: of ratings that each person makes, and the number of ratings that each
271: artifact receives.
272: These values correspond to the degree distributions in the bipartite
273: rating graph for $\mathcal{R}$.
274: These counts are relatively easy to obtain from a dataset and so could
275: be used in analysis of appropriate algorithms.
276: Therefore, we would like to be able to characterize a random bipartite graph
277: using particular degree distributions.
278: This requirement means that the more common
random graph models (e.g.,~\cite{erdos}) are
280: not appropriate, since they assume
281: that edges occur with equal probability.
282: On the other hand, a model recently proposed
283: by Aiello, Chung, and Lu~\cite{call-graph} is
284: based on a power-law distribution, similar to characteristics
285: observed of actual recommendation datasets (see next section).
286: But again this model is not directly parameterized by the degree
287: distribution.
288: The Newman-Strogatz-Watts model is the only (known) model that
289: characterizes a family of graphs in terms of degree distributions.
290:
291: From the original bipartite graph $G = (P \cup M, E)$ for
292: $\mathcal{R}$ we develop two models, one for the social network graph
293: $G_s$ and one for the recommender graph $G_r$.
294:
\subsection{Modeling the Social Network Graph}
Recall that the social
296: network graph $G_s = (P,E_s)$ is undirected and $E_s$ is induced by a jump
297: function $\mathcal{J}$ on $\mathcal{R}$.
298: The Newman-Strogatz-Watts model works by characterizing the degree
299: distribution of the vertices, and then using that to compute the
300: probability of arriving at a node.
301: Together they describe a random process of following a path through
302: a graph, and allow computations of the length of paths.
303: Here we only discuss
304: the equations that are used, and not
305: the details of their derivation.
306: The application of these equations to these graphs is outlined by
307: Mirza~\cite{batul-thesis} and is based on the derivation by Newman et
308: al.~\cite{newman}.
309:
310: We describe the social network graph $G_s$ by the probability that a
311: vertex has a particular degree.
312: This is expressed as a generating function $G_0 (x)$
313: $$G_0 (x) = \sum_{k=0}^{\infty} p_k x^k, $$
314: where $p_k$ is the probability that a randomly chosen
315: vertex in $G_s$ has degree $k$.
316: This function must satisfy the property that
317: $$G_0 (1) = \sum_{k=0}^{\infty} p_k = 1.$$
318:
319: To obtain an expression that describes the typical length of a path,
320: we can consider how many steps we need to go from a node to be able to
321: get to every other node in the graph.
To do this we can use the number of neighbors $m$ steps away.
323: For a randomly chosen vertex in this graph, $G_0 (x)$ gives us the
324: distribution of the immediate neighbors of that vertex.
325: So, we can compute the average number of vertices $z_1$ one edge away from a
326: vertex as the average degree $z$:
327: $$z_1 = z = \sum_{k} k p_k = G_0^{'} (1) $$
The average number of neighbors two steps away is given by
$$z_2 = \left( \sum_{k} k p_k \right) \cdot {1 \over z} \sum_{k} k (k-1) p_k
= \sum_{k} k (k-1) p_k = G_0^{''} (1)$$
330: It turns out (see~\cite{newman} for details) that the number of neighbors
331: $m$ steps away is given in terms of these two quantities:
332: $$z_m = {\left( {z_2 \over z_1} \right) }^{m-1} z_1$$
333: The path length $l_{pp}$ we are interested in is the one that is big enough to
334: reach all of the $N_P$ elements of $P$, and so $l_{pp}$
335: should satisfy the equation
336: $$1 + \sum_{m=1}^{l_{pp}} z_m = N_P $$
337: where the constant $1$ counts the initial vertex.
338: Using this equation, it can be shown that the typical length from one
339: node to another in $G_s$ is
340: \begin{equation}
l_{pp} = {{\log[(N_P -1) (z_2 - z_1) + z_1^{2}] - \log[z_1^{2}]} \over
{\log[z_2/z_1]}} \label{length1}
343: \end{equation}
344: We use this formula as our primary means of computing the
345: distances between pairs of people in $G_s$ in the empirical evaluation
346: in the next section.
347: Since we use actual datasets, we can compute $p_k$ as the fraction of
348: vertices in the graph having degree $k$.
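
The step from the sum to Equation~\ref{length1} is a geometric series. The
following sketch fills it in, with $r = z_2/z_1$ used as our shorthand and
assuming $r \neq 1$:
\begin{eqnarray*}
1 + z_1 \sum_{m=1}^{l_{pp}} r^{m-1} & = & N_P, \\
z_1 \, {{r^{l_{pp}} - 1} \over {r - 1}} & = & N_P - 1, \\
r^{l_{pp}} & = & {{(N_P - 1)(z_2 - z_1) + z_1^{2}} \over {z_1^{2}}}.
\end{eqnarray*}
Taking logarithms of both sides and dividing by $\log r$ gives
Equation~\ref{length1}. As a purely illustrative check with made-up values
$N_P = 1000$, $z_1 = 10$, and $z_2 = 100$, the right-hand side is
$90010/100 = 900.1$, so $l_{pp} = \log_{10} 900.1 \approx 2.95$. This agrees
with the direct count $1 + 10 + 100 + 1000 = 1111$, which first exceeds
$N_P$ at the third step.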
349:
350: \subsection{Modeling the Recommender Graph}
351: The recommender graph $G_r = (P \cup M,E_{sd} \cup E_{md})$
352: is directed, and hence the generating function for vertex degrees
353: should capture both indegrees and outdegrees:
$$G (x,y) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} p_{jk} x^j y^k,$$
355: where $p_{jk}$ is the probability that a randomly chosen vertex has
356: indegree $j$ and outdegree $k$.
357:
358: From the jumping connections construction, we know that movie vertices
have outdegree $0$ (the converse is not true: vertices with outdegree $0$
could be people nodes isolated as a result of a severe jump constraint).
Notice also that by using the joint distribution $p_{jk}$, independence of
the indegree and outdegree distributions is \emph{not} implied.
We show in the next section that this feature is very useful.
In addition, every arc that leaves one vertex must enter another, so the
average net number of arcs entering (or leaving) a vertex is zero, and so
$$\sum_{jk} (j-k) p_{jk} = \sum_{jk} (k-j) p_{jk} = 0.$$
368: We arrive at new expressions for $z_1$ and
369: $z_2$~\cite{batul-thesis}:
370: \begin{eqnarray*}
z_1 & = & \sum_{jk} k p_{jk}, \\
z_2 & = & \sum_{jk} j k p_{jk}.
373: \end{eqnarray*}
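
In terms of the generating function, these quantities can be read off as
partial derivatives evaluated at $x=y=1$, following the treatment of directed
graphs by Newman et al.~\cite{newman}:
$$z_1 = {\partial G \over \partial y} \bigg|_{x=y=1} = \sum_{jk} k p_{jk},
\qquad
z_2 = {\partial^2 G \over \partial x \, \partial y} \bigg|_{x=y=1}
= \sum_{jk} j k p_{jk}.$$
The zero-sum constraint on $(j-k) p_{jk}$ guarantees that
$\partial G / \partial x$ and $\partial G / \partial y$ agree at this point,
so $z_1$ can equally be computed from indegrees.
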
374: The average path length $l_r$
375: can be calculated as before:
376: \begin{equation}
l_r = {{\log[(N_P + N_M -1) (z_2 - z_1) + z_1^{2}] - \log[z_1^{2}]}
\over {\log[z_2/z_1]}} \label{length2},
379: \end{equation}
380: where $N_P + N_M$ is the size of the recommender graph $G_r$ (assuming that
381: the graph is one giant component),
382: with $N_M$ denoting the number of movies.
384: The length $l_r$ includes paths from people to movies, as
385: well as paths from people to people.
386: The average length of only reaching movies from people
387: $l_{pm}$ can be expressed as:
388: \begin{equation}
389: l_{pm} = {{(l_r (N_P (N_P -1) + N_P N_M) - l_{pp} N_P (N_P -1))} \over {N_P N_M}} \label{length3}
390: \end{equation}
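
One way to read Equation~\ref{length3} is as a weighted average over ordered
source-target pairs whose source is a person. A sketch of the bookkeeping,
assuming every such pair is connected: there are $N_P (N_P - 1)$
person-to-person pairs with average length $l_{pp}$ and $N_P N_M$
person-to-movie pairs with average length $l_{pm}$, so
$$l_r \left( N_P (N_P - 1) + N_P N_M \right) =
l_{pp} N_P (N_P - 1) + l_{pm} N_P N_M.$$
Solving for $l_{pm}$ and cancelling the common factor $N_P$ gives the
equivalent form
$$l_{pm} = {{l_r (N_P + N_M - 1) - l_{pp} (N_P - 1)} \over {N_M}}.$$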
391:
\subsection{Caveats with the Newman-Strogatz-Watts Equations}
There are various
393: problems with using the above formulas in a realistic setting
394: \cite{heath1}. First, unlike most results in random graph theory, the
395: formulas
396: do not include any guarantees and/or confidence levels. Second, all
397: the equations above are obtained over the ensemble of random graphs that
398: have the given degree distribution, and hence assume that all such graphs
399: are equally likely. The specificity of the jumping connections construction implies that the
400: $G_s$ and $G_r$ graphs are poor candidates to serve as a typical random
401: instance of a graph.
402:
403: In addition, the equations utilizing $N_P$ and $N_M$ assume that all nodes
404: are reachable from any starting vertex (i.e., the graph is one giant
405: component). This will not be satisfied for very strict jumping constraints.
406: In such cases, Newman, Strogatz, and Watts suggest the substitution of these
407: values with measurements taken from the largest component of the graph.
408: Expressing the size of the components of the graph using generating
409: functions is also suggested \cite{newman}. However, the complexity of jumps
410: such as the hammock can make estimation of the cluster sizes extremely
411: difficult, if not impossible (in the Newman-Strogatz-Watts model). We leave this issue to
412: future research.
413:
414: Finally, the Newman-Strogatz-Watts model is fundamentally more
415: complicated than traditional models of random graphs.
It has a potentially infinite set of parameters
($p_k$), does not address the possibility of multiple edges or loops, and, by
not fixing the size of the graph, assumes that the same degree distribution
applies to graphs of all sizes.
420: These observations hint that we cannot hope for more than a
421: qualitative indication of the dependence of the average path length on
422: the jump constraints.
423: In the next section, we describe how well these formulas perform on
424: two real-world datasets.
425: