
1: \section{Evaluating Recommendation Algorithms}

2: \label{back}

3: Most current research efforts cast recommendation

4: as a specialized task of information retrieval/\hskip0ex filtering or

5: as a task of function approximation/\hskip0ex learning mappings

6: \cite{aggarwal1,

7: basu1,billsus,

8: eigentaste,good,herlocker,

9: hill,kitts,konstan1,

10: pennock,sarwar1,sarwar3,

11: schafer1,

12: ringo1,soboroff,

13: terveen}. Even approaches

14: that focus on clustering view clustering primarily as a

15: pre-processing step for functional modeling \cite{kohrs1}, or as a

16: technique to ensure scalability \cite{Conner1,sarwar3}

17: or to overcome sparsity of ratings \cite{ungar1}. This emphasis on functional

18: modeling and retrieval has influenced evaluation criteria for recommender

19: systems.

20:

21: Traditional information retrieval evaluation

22: metrics such as precision and recall have been applied to recommender

23: systems involving content-based design.

24: Ideas such as cross-validation on an unseen test set have been used to

25: evaluate mappings from people to artifacts, especially in

26: collaborative filtering recommender systems. Such approaches miss

27: many desirable aspects of the recommendation process, namely:

28: \begin{itemize}

29: \item {\bf Recommendation is an indirect way

30: of bringing people together.} Social network theory \cite{wasserman-faust}

31: helps model a recommendation system of people versus artifacts as an {\it

32: affiliation network} and distinguishes between a {\it primary mode}

33: (e.g., people) and a {\it secondary mode} (e.g., movies), where a

34: {\it mode} refers to a distinct set of entities that have similar

35: attributes \cite{wasserman-faust}. The

36: secondary mode is viewed as serving to bring entities of the primary

37: mode together (i.e., it is not treated as a {\it first-class} mode).

38: \item {\bf Recommendation, as a process,

39: should emphasize modeling connections from people to

40: artifacts, besides predicting ratings for artifacts.}

41: In many situations, users would like to request recommendations

42: purely based on local and global constraints on the nature of

43: the specific connections explored.

44: Functional modeling techniques are

45: inadequate because they embed the task of learning a mapping from people

46: to predicted values of artifacts in a general-purpose learning system such

47: as neural networks or Bayesian classification \cite{breese}.

48: A notable exception is the work by Hofmann and Puzicha \cite{Hofmann}

49: which allows the incorporation of constraints in the form

50: of aspect models involving a latent variable.

51: \item {\bf Recommendations should be explainable and believable.} The

52: explanations should be made in terms and constructs that are natural to the

53: user/application domain. It is nearly

54: impossible to convince the user of the quality of a recommendation obtained

55: by black-box techniques such as neural networks. Furthermore, it is well

56: recognized that ``users are more satisfied with a system that produces

57: [bad recommendations] for reasons that seem to make sense to them, than they

58: are with a system that produces [bad recommendations] for seemingly stupid

59: reasons'' \cite{riloff}.

60: %Berry and Browne highlight

61: %the need for believability in search engine results

62: %(`Coming back instantaneously with {\it No results found} potentially causes

63: %dissatisfaction for the user').

64: \item {\bf Recommendations are not delivered in isolation, but in the

65: context of an implicit/explicit social network.}

66: In a recommender system, the rating patterns of

67: people on artifacts induce an implicit social network and influence the

68: connectivities in this network.  Little study has been done to understand

69: how such rating patterns influence recommendations and how they

70: can be advantageously exploited.

71: \end{itemize}

72:

73: Our approach in this paper is to evaluate recommendation algorithms using

74: ideas from graph analysis. In the next section,

75: we will show how our viewpoint addresses each of

76: the above aspects by providing novel metrics. The basic idea is to begin

77: with data that can be modeled as a network and attempt to infer useful

78: knowledge from the nodes and links of the graph. Nodes represent entities

79: in the domain (e.g., people, movies), and edges represent the relationships

80: between entities (e.g., the act of a person viewing a particular movie).
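
The modeling step just described can be illustrated with a minimal Python sketch. The toy viewing data and the function names are hypothetical, and the one-mode projection shown (linking two people who share at least one movie) is only one simple assumption about how an implicit social network might be induced; it is not the construction used later in this paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (person, movie) pairs meaning "person viewed movie".
views = [
    ("alice", "Casablanca"),
    ("bob", "Casablanca"),
    ("bob", "Vertigo"),
    ("carol", "Vertigo"),
    ("dave", "Psycho"),
]


def build_bipartite(pairs):
    """Return the two adjacency maps of the people-versus-movies graph."""
    movies_of = defaultdict(set)   # primary mode -> secondary mode
    people_of = defaultdict(set)   # secondary mode -> primary mode
    for person, movie in pairs:
        movies_of[person].add(movie)
        people_of[movie].add(person)
    return movies_of, people_of


def project_onto_people(people_of):
    """Link two people whenever they have viewed at least one common movie."""
    social = defaultdict(set)
    for audience in people_of.values():
        for a, b in combinations(sorted(audience), 2):
            social[a].add(b)
            social[b].add(a)
    return social


movies_of, people_of = build_bipartite(views)
social = project_onto_people(people_of)
print({person: sorted(friends) for person, friends in social.items()})
```

In this toy example the secondary mode (movies) serves only to connect people: bob is linked to both alice and carol, while dave, who shares no movie with anyone, does not appear in the projected network.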

81:

82: \subsection{Related Research}

83: The idea of graph analysis as a basis to study information networks has a long

84: tradition; one of the earliest pertinent studies is Schwartz and

85: Wood \cite{graph-schwartz}. The authors describe the use of graph-theoretic

86: notions such as cliques, connected components, cores, clustering,

87: average path distances, and the inducement of secondary graphs. The focus

88: of the study was to model shared interests among a web of people,

89: using email messages as connections. Such link

90: analysis has been used to extract information in many areas, such as web

91: search engines \cite{klein2}, the exploration of associations among

92: criminals \cite{lee1}, and the field of medicine \cite{swanson1}.

93: With the emergence of the web as a large-scale graph, interest in information

94: networks has recently exploded \cite{adamic1,google,bowtie,

95: clever-journal,flake-kdd,

96: kautz1,klein2,klein1,trawling,payton1,silk,watts1}.

97:

98: Most graph-based algorithms for information networks can be studied in terms

99: of (i) the modeling of the

100: graph (e.g., what are the modes, and how do they

101: relate to the information domain?), and (ii) the structures/operations

102: that are mined/conducted on the graph.

103: One of the most celebrated examples of graph analysis arises in search

104: engines that exploit link information, in addition to textual content.

105: The Google search engine uses the web's link structure,

106: in addition to anchor text, as a factor in ranking pages: a page is ranked based on

107: the pages that (hyper)link to it \cite{google}. Google essentially models

108: a one-mode directed graph (of web pages) and uses the principal eigenvector

109: of a matrix derived from the link structure to ascertain `page ranks.'

110: Jon Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm

111: goes a step further by viewing the one-mode web graph as

112: actually comprising two modes (called

113: hubs and authorities) \cite{klein2}. A hub is a node whose edges point primarily

114: to authorities, so a good hub links to many good authorities. Conversely, a good

115: authority is a page that is linked to by many good hubs.

116: Starting with a specific

117: search query, HITS performs a text-based search to seed an initial set of

118: results. An iterative relaxation algorithm then assigns hub and authority

119: weights using a matrix power iteration. Empirical results show that

120: remarkably authoritative results are obtained for search queries. The CLEVER

121: search engine is built primarily on top of the basic HITS

122: algorithm \cite{clever-journal}. The offline query-independent

123: computation in Google, as opposed to

124: the topic-induced search of CLEVER, is one of the main reasons for the

125: commercial success of the former.
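
The mutually reinforcing hub and authority updates described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kleinberg's full procedure: the text-based seeding and expansion of the result set are omitted, the edge list is a hypothetical toy graph, and the fixed iteration count and Euclidean normalization are choices made here for simplicity.

```python
import math


def hits(edges, iterations=50):
    """Power iteration for hub and authority weights on a directed edge list."""
    nodes = sorted({v for edge in edges for v in edge})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub weights of the pages linking in.
        new_auth = {v: 0.0 for v in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the new authority weights of the pages linked to.
        new_hub = {v: 0.0 for v in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize both vectors to unit Euclidean length.
        a_norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        auth = {v: x / a_norm for v, x in new_auth.items()}
        hub = {v: x / h_norm for v, x in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: three hub-like pages pointing at two authorities.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Here page a1, which is linked to by all three hubs, receives the largest authority weight, and h1 and h2, which point to both authorities, receive equal hub weights that exceed that of h3.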

126:

127: The use of link analysis in recommender systems was highlighted by the

128: ``referral chaining'' technique of the

129: ReferralWeb project \cite{kautz1}. The idea is to

130: use the co-occurrence of names in any

131: of the documents available on the web to detect the existence of direct

132: relationships between people and thus indirectly form social networks. The

133: underlying assumption is that people with similar interests swarm in the

134: same circles to discover collaborators \cite{payton1}.

135:

136: The exploration of link analysis in social structures has led to several

137: new avenues of research, most notably small-world networks. Small-world

138: networks are highly clustered but relatively sparse networks with

139: small average path length. An example is the

140: folklore notion of six degrees of separation between any two people:

141: the phenomenon where a person can discover a link to any

142: other random person through a chain of at most six acquaintances. A small-world

143: network is sufficiently clustered so that most second neighbors of a node

144: $X$ are also neighbors of $X$ (a typical ratio would be $80\%$). On the

145: other hand, the average distance between any two nodes in the graph is

146: comparable to the low characteristic path length of a random graph. Until

147: recently, a mathematical characterization of such small-world networks has

148: proven elusive.  Watts and Strogatz \cite{watts1} provide the first

149: such characterization of small-world networks in the form

150: of a graph generation model.

151:

152: \begin{figure}

153: \centering

154: \begin{tabular}{cc}

155: & \mbox{\psfig{figure=small-world.eps,width=5in}}

156: \end{tabular}

157: \caption{Generation of a small-world network by random rewiring from a

158: regular wreath network. Figure adapted from \cite{watts1}.}

159: \label{small}

160: \end{figure}

161:

162: \begin{figure} \centering

163: \begin{tabular}{cc} &

164: \mbox{\psfig{figure=small-world-graphs.eps,height=3in}}

165: \end{tabular}

166: \caption{Average path length and clustering coefficient versus

167: the rewiring probability $p$ (from \cite{watts1}). All measurements are

168: scaled w.r.t. the values at $p = 0$.}

169: \label{smgs}

170: \end{figure}

171:

172: In this model, Watts and Strogatz

173: use a regular wreath network with $n$ nodes, and $k$ edges per node

174: (to its nearest neighbors) as a starting point for the design. A small

175: fraction of the edges are then randomly rewired to arbitrary points on

176: the network. A full rewiring (probability $p=1$) leads to a completely

177: random graph, while $p=0$ corresponds to the (original) wreath

178: (Fig.~\ref{small}). The starting point in the figure is a regular wreath

179: topology of $12$ nodes with every node connected to its four nearest

180: neighbors. This structure has a high characteristic path length and

181: high clustering coefficient. The characteristic path length (also called the average length) is the mean of the shortest

182: path lengths over all pairs of nodes. The clustering coefficient is

183: determined by first computing the local neighborhood of every node. The

184: number of edges in this neighborhood as a fraction of the total possible

185: number of edges measures the extent to which the neighborhood is a clique.

186: This factor

187: is averaged over all nodes to determine the clustering coefficient. The

188: other extreme in Fig.~\ref{small}

189: is a random network with a low characteristic path length and

190: almost no clustering. The small-world network, an interpolation between the

191: two, has the low characteristic path length (of a random network), and

192: retains the high clustering coefficient (of the wreath).  Measuring

193: properties such as average length and clustering coefficient in the region

194: $0 \leq p \leq 1$ produces surprising results (see Fig.~\ref{smgs}).
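
The two quantities defined above can be written compactly as follows. The notation is introduced here only for illustration: $d(u,v)$ is the shortest path length between nodes $u$ and $v$, $k_v$ is the number of neighbors of node $v$, and $e_v$ is the number of edges among those neighbors. We assume, following Watts and Strogatz, that the neighborhood of $v$ excludes $v$ itself, that the graph is connected, and that every node has at least two neighbors.
\[
L = \frac{2}{n(n-1)} \sum_{u < v} d(u,v), \qquad
C = \frac{1}{n} \sum_{v} \frac{2\, e_v}{k_v (k_v - 1)} .
\]
For the wreath of Fig.~\ref{small} ($n = 12$, $k_v = 4$ for every node), the four neighbors of any node have $e_v = 3$ edges among them out of $6$ possible, so $C = 3/6 = 0.5$. From any node, four nodes lie at distance $1$, four at distance $2$, and three at distance $3$, so $L = (4 \cdot 1 + 4 \cdot 2 + 3 \cdot 3)/11 = 21/11 \approx 1.91$.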

195:

196: As shown in Fig.~\ref{smgs}, only a very small fraction of edges needs to be

197: rewired to bring the length down to random graph limits, while the

198: clustering coefficient remains high. On closer inspection, it is easy to see why

199: this should be true. Even for small values of $p$ (e.g., $0.1$),

200: introducing edges between distantly separated nodes reduces not only the

201: distance between these nodes but also the distances between the neighbors of

202: those nodes, and so on (these reduced paths between distant nodes are

203: called {\it shortcuts}). The introduction of these edges

204: further leads to a rapid decrease in the average length of the network, but

205: the clustering coefficient remains almost unchanged. Thus, small-world

206: networks fall in between regular and random networks, having the small

207: average lengths of random networks but high clustering coefficients akin to

208: regular networks.
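
The behavior described above can be reproduced with a short simulation. The following Python sketch builds a wreath, rewires each edge with probability $p$, and measures the average length and clustering coefficient. It is a simplified illustration rather than the exact procedure of Watts and Strogatz: the function names, the network size ($200$ nodes, $8$ edges per node), the random seed, and the handling of disconnected pairs (they are simply ignored) are all assumptions made here.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Wreath: every node is linked to its k nearest neighbors (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, k, p, rng):
    """With probability p, move the far end of each lattice edge elsewhere."""
    n = len(adj)
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                # Avoid self-loops and duplicate edges.
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if choices:
                    new = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


def average_path_length(adj):
    """Mean shortest path length over reachable pairs (BFS from every node)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(adj):
    """Average, over all nodes, of the edge density among a node's neighbors."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # such nodes contribute zero
        links = sum(1 for a in nbrs for b in adj[a] if b in nbrs) / 2
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


rng = random.Random(0)  # fixed seed only for repeatability
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 8), 8, p, rng)
    print(p, round(average_path_length(g), 2),
          round(clustering_coefficient(g), 2))
```

With $p = 0$ the graph is left untouched, so the first printed row gives the values for the regular wreath; the remaining rows depend on the random seed and are therefore not quoted here.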

209:

210: While the Watts-Strogatz model describes how small-world networks can be

211: formed, it does not explain how people are adept at actually finding short

212: paths through such networks in a decentralized fashion. Kleinberg

213: addresses precisely this issue and proves that this is not possible

214: in the family of one-dimensional Watts-Strogatz networks

215: \cite{klein1}. Embedding the notion of

216: random rewiring in a

217: two-dimensional lattice leads to a unique model for which such

218: decentralization is effective.
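
To make the notion of decentralized search concrete, the following Python sketch routes a message greedily on a two-dimensional grid in which every node also knows one long-range contact. It is a simplified illustration and not Kleinberg's full model: the function names are hypothetical, each node receives exactly one long-range contact, and the contact is drawn with probability proportional to $d^{-r}$, where $d$ is the lattice distance and $r = 2$ is the inverse-square exponent that Kleinberg identifies for the two-dimensional case.

```python
import random


def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_long_range_contacts(n, r, rng):
    """Give each node of an n x n grid one long-range contact, chosen with
    probability proportional to distance ** -r."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        contacts[u] = rng.choices(others, weights=weights, k=1)[0]
    return contacts


def greedy_route(source, target, n, contacts):
    """Forward the message to whichever known contact is closest to the
    target, using only local information at each step."""
    path = [source]
    current = source
    while current != target:
        x, y = current
        candidates = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        candidates.append(contacts[current])
        current = min(candidates, key=lambda v: manhattan(v, target))
        path.append(current)
    return path


rng = random.Random(0)  # fixed seed only for repeatability
contacts = build_long_range_contacts(10, 2, rng)
path = greedy_route((0, 0), (9, 9), 10, contacts)
print(len(path) - 1, "steps")
```

Because some lattice neighbor always lies one step closer to the target, the greedy route terminates and never takes more steps than the lattice distance between source and target; long-range contacts can only shorten it.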

219:

220: The small-world network concept has implications for a variety of domains.

221: Watts and Strogatz simulate the `wildfire'-like spread of an

222: infectious disease in a small-world network \cite{watts1}.

223: Adamic shows

224: that the world wide web is a small-world network and suggests that

225: search engines capable of exploiting this fact can be more

226: effective in  hyperlink modeling, crawling, and finding authoritative

227: sources \cite{adamic1}.

228:

229: Besides the Watts-Strogatz model, a variety of models from graph

230: theory are available and can be used to analyze information networks.

231: Kumar et al.~\cite{trawling} highlight the use of traditional

232: random graph models

233: to confirm the existence of properties such as cores and connected

234: components in the web. In particular, they characterize the distributions

235: of web page degrees

236: and show that they are well approximated by power laws.

237: Finally, they perform a study similar to that of Schwartz and Wood

238: \cite{graph-schwartz} to find cybercommunities on the web.

239: Flake et al.~\cite{flake-kdd} provide a max-flow, min-cut algorithm to

240: identify cybercommunities. They also provide a focused crawling strategy to

241: approximate such communities.

242: Broder et al.~\cite{bowtie} perform a more detailed mapping of the web

243: and demonstrate that it has a bow-tie structure, which consists of

244: a strongly connected component, as well as nodes that

245: link into but are not linked

246: from the strongly connected component, and nodes that are linked from but

247: do not link to the strongly connected component.

248: Pirolli et al.~\cite{silk} use ideas from spreading activation

249: theory to subsume link analysis, content-based modeling, and usage

250: patterns.
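
The bow-tie decomposition mentioned above can be sketched with two reachability searches. The Python code below is a minimal illustration on a hypothetical toy graph, not the procedure used by Broder et al.: it assumes that one node known to lie in the strongly connected component is supplied, and it lumps everything outside the three main parts into a single remainder set.

```python
from collections import defaultdict


def reachable(start, adj):
    """Set of nodes reachable from start by following edges in adj."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def bow_tie(edges, core_node):
    """Split a directed graph into SCC, IN, OUT, and OTHER relative to the
    strongly connected component that contains core_node."""
    fwd = defaultdict(set)   # u -> pages that u links to
    bwd = defaultdict(set)   # v -> pages that link to v
    nodes = set()
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
        nodes.update((u, v))
    descendants = reachable(core_node, fwd)
    ancestors = reachable(core_node, bwd)
    scc = descendants & ancestors
    return {
        "SCC": scc,
        "IN": ancestors - scc,      # can reach the SCC, not reachable from it
        "OUT": descendants - scc,   # reachable from the SCC, cannot reach it
        "OTHER": nodes - ancestors - descendants,
    }


# Hypothetical toy graph: a three-node cycle with one page linking in,
# one page linked from it, and a disconnected pair.
edges = [("i", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("c", "o"), ("x", "y")]
parts = bow_tie(edges, "a")
print({name: sorted(members) for name, members in parts.items()})
```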

251:

252: A final thread of research, while not centered on information networks,

253: emphasizes the modeling of problems and applications in ways that make them

254: amenable to graph-based analyses. A good example in this category is

255: the approach of Gibson et al.~\cite{gibson1} for mining categorical datasets.

256:

257: While many of these ideas, especially link analysis,

258: have found their way into recommender systems,

259: they have been primarily viewed as mechanisms to mine or model

260: structures. In this paper, we show how ideas from graph analysis

261: can actually serve to provide novel evaluation criteria for recommender

262: systems.

263: