1: \documentclass[12pt]{article}
2: \usepackage{amsthm}
3:
4: \theoremstyle{definition}
5: \newtheorem{definition}{Definition}
6:
7: \title{A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model}
8:
9: \author{Tom Brijs\footnote{Tom Brijs is a research fellow of the Fund for Scientific Research Flanders.}
10: \quad Bart Goethals \\
11: Gilbert Swinnen \quad Koen Vanhoof \quad Geert Wets \\
12: University of Limburg
13: }
14:
15: \date{}
16:
17: \begin{document}
18: \maketitle
19:
20: \begin{abstract}
21: In recent years, data mining researchers have developed efficient
22: association rule algorithms for retail market basket analysis.
23: Still, retailers often complain about how to adopt association
24: rules to optimize concrete retail marketing-mix decisions. It is
25: in this context that, in a previous paper, the authors have
26: introduced a product selection model called
27: PROFSET.\footnote{PROFSET stands for PROFitability per SET because
28: the optimization model is based on the calculation of the
29: profitability per frequent set in order to determine the
30: cross-selling potential between products.} This model selects the
31: most interesting products from a product assortment based on
32: their cross-selling potential given some retailer defined
constraints. However, this model suffered from an important
34: deficiency: it could not deal effectively with supermarket data,
35: and no provisions were taken to include retail category
36: management principles. Therefore, in this paper, the authors
37: present an important generalization of the existing model in
38: order to make it suitable for supermarket data as well, and to
39: enable retailers to add category restrictions to the model.
40: Experiments on real world data obtained from a Belgian
41: supermarket chain produce very promising results and demonstrate
42: the effectiveness of the generalized PROFSET model.
43: \end{abstract}
44:
45: \section{Introduction}
46:
47: Since almost all mid to large size retailers today possess
48: electronic sales transaction systems, retailers realize that
49: competitive advantage will no longer be achieved by the mere use
50: of these systems for purposes of inventory management or
facilitating customer check-out. Instead, competitive
advantage will be gained by those retailers who are able to
extract the knowledge hidden in the data generated by those
systems and use it to optimize their marketing decision making.
55: In this context, knowledge about how customers are using the
56: retail store is of critical importance and distinctive
57: competencies will be built by those retailers who best succeed in
58: extracting actionable knowledge from these data. Association rule
59: mining \cite{ais} can help retailers to efficiently extract this
60: knowledge from large retail databases. We assume some
61: familiarity with the basic notions of association rule mining.
62:
63: In recent years, a lot of effort in the area of retail market
64: basket analysis has been invested in the development of
65: techniques to increase the interestingness of association rules.
Essentially, three research tracks that study
the interestingness of association rules can currently be distinguished.
68:
69: First, a number of objective measures of interestingness have
70: been developed in order to filter out non-interesting association
71: rules based on a number of statistical properties of the rules,
72: such as support and confidence \cite{ais}, interest
73: \cite{correlation}, intensity of implication \cite{implic},
74: J-measure \cite{nar}, and correlation \cite{prune}. Other
75: measures are based on the syntactical properties of the rules
76: \cite{p_analysis}, or they are used to discover the
77: least-redundant set of rules \cite{redundancy}. Second, it was
78: recognized that domain knowledge may also play an important role
79: in determining the interestingness of association rules.
80: Therefore, a number of subjective measures of interestingness
81: have been put forward, such as unexpectedness
82: \cite{unexpectedness}, actionability \cite{actipat} and rule
83: templates \cite{interest}. Finally, the most recent stream of
84: research advocates the evaluation of the interestingness of
85: associations in the light of the micro-economic framework of the
86: retailer \cite{papadimitriou}. More specifically, a pattern in
the data is considered interesting only to the extent to which it
88: can be used in the decision-making process of the enterprise to
89: increase its utility.
90:
91: It is in this latter stream of research that the authors have
92: previously developed a model for product selection called PROFSET
93: \cite{profset}, that takes into account both quantitative and
94: qualitative elements of retail domain knowledge in order to
95: determine the set of products that yields maximum cross-selling
96: profits. The key idea of the model is that products should not be
97: selected based on their individual profitability, but rather on
98: the \emph{total} profitability that they generate, including
99: profits from cross-selling. However, in its previous form, one
100: major drawback of the model was its inability to deal with
101: supermarket data (i.e., large baskets). To overcome this
102: limitation, in this paper we will propose an important
103: generalization of the existing PROFSET model that will
104: effectively deal with large baskets. Furthermore, we generalize
105: the model to include category management principles specified by
106: the retailer in order to make the output of the model even more
107: realistic.
108:
109: The remainder of the paper is organized as follows. In
110: Section~\ref{overview} we will focus on the limitations of the
111: previous PROFSET model for product selection. In
112: Section~\ref{general}, we will introduce the generalized PROFSET
113: model. Section~\ref{impl} will be devoted to the empirical
114: implementation of the model and its results on real-world
115: supermarket data. Finally, Section~\ref{concl} will be reserved
116: for conclusions and further research.
117:
118: \section{The PROFSET Model} \label{overview}
119:
120: The key idea of the PROFSET model is that when evaluating the
121: business value of a product, one should not only look at the
122: individual profits generated by that product (the na\"{i}ve
123: approach), but one must also take into account the profits due to
124: cross-selling effects with other products in the assortment.
125: Therefore, to evaluate product profitability, it is essential to
126: look at frequent sets rather than at individual product items
since the former represent frequently co-occurring product
128: combinations in the market baskets of the customer. As was also
129: stressed by Cabena et al.\ \cite{cabena}, one disadvantage of
130: associations discovery is that there is no provision for taking
131: into account the business value of an association. The PROFSET
132: model was a first attempt to solve this problem. Indeed, in terms
133: of the associations discovered, the sale of an expensive bottle
134: of wine with oysters accounts for as much as the sale of a carton
135: of milk with cereal. This example illustrates that, when
136: evaluating the interestingness of associations, the
137: micro-economic framework of the retailer should be incorporated.
138: PROFSET was developed to maximize cross-selling opportunities by
139: evaluating the profit margin generated per frequent set of
products, rather than per product. In the next section, we
discuss the limitations of the previous PROFSET model; more
details on the model itself can be found elsewhere \cite{profset}.
143:
144: \subsection{Limitations} \label{limit}
145:
146: The previous PROFSET model was specifically developed for market
147: basket data from automated convenience stores. Data sets of this
148: origin are characterized by small market baskets (size 2 or 3)
149: because customers typically do not purchase many items during a
150: single shopping visit. Therefore, the profit margin generated per
151: frequent purchase combination $(X)$ could accurately be
152: approximated by adding the profit margins of the market baskets
$(T_j)$ containing the same set of items, i.e., $X = T_j$.
154: However, for supermarket data, the existing formulation of the
155: PROFSET model poses significant problems since the size of market
156: baskets typically exceeds the size of frequent itemsets. Indeed,
157: in supermarket data, frequent itemsets mostly do not contain more
158: than 7 different products, whereas the size of the average market
159: basket is typically 10 to 15. As a result, the existing profit
160: allocation heuristic cannot be used anymore since it would cause
161: the model to heavily underestimate the profit potential from
162: cross-selling effects between products. However, getting rid of
163: this heuristic is not trivial and it will be discussed in detail
164: in Section~\ref{profalloc}.
165:
166: A second limitation of the existing PROFSET model relates to
167: principles of category management. Indeed, there is an increasing
168: trend in retailing to manage product categories as separate
169: strategic business units \cite{sagit}. In other words, because
170: of the trend to offer more products, retailers can no longer
171: evaluate and manage each product individually. Instead, they
172: define product categories and define marketing actions (such as
173: promotions or store layout) on the level of these categories. The
174: generalized PROFSET model takes this domain knowledge into
175: account and therefore offers the retailer the ability to specify
176: product categories and place restrictions on them.
177:
178: \section{The Generalized PROFSET Model} \label{general}
179:
180: In this section, we will highlight the improvements being made to
181: the previous PROFSET model \cite{profset}.
182:
183: \subsection{Profit Allocation} \label{profalloc}
184:
185: Avoiding the equality constraint $X = T_j$ results in different
186: possible profit allocation systems. Indeed, it is important to
187: recognize that the margin of transaction $T_j$ can potentially be
188: allocated to different frequent subsets of that transaction. In
189: other words, how should the margin $m(T_j)$ be allocated to one
190: or more different frequent subsets of $T_j$?
191:
192: The idea here is that we would like to know the purchase
193: intentions of the customer who bought $T_j$. Unfortunately, since
194: the customer has already left the store, we do not possess this
195: information. However, if we can assume that some items occur
196: more frequently together than others because they are considered
197: complementary by customers, then frequent itemsets may be
198: interpreted as purchase intentions of customers. Consequently,
199: there is the additional problem of finding out which and how many
200: purchase intentions are represented in a particular transaction
201: $T_j$. Indeed, a transaction may contain several frequent subsets
202: of different sizes, so it is not straightforward to determine
203: which frequent sets represent the underlying purchase intentions
204: of the customer at the time of shopping. Before proposing a
205: solution to this problem, we will first define the concept of a
206: maximal frequent subset of a transaction.
207:
208: \begin{definition}
209: Let $F$ be the collection of all frequent subsets of a sales
210: transaction $T_j$. Then $X \in F$ is called \emph{maximal},
211: denoted as $X_{\it max}$, if and only if $\forall Y \in F : |Y|
212: \leq |X|$.
213: \end{definition}
214:
215: Using this definition, we will adopt the following rationale to
216: allocate the margin $m(T_j)$ of a sales transaction $T_j$.
217:
218: If there exists a frequent set $X = T_j$, then we allocate
219: $m(T_j)$ to $M(X)$, just as in the previous PROFSET model.
220: However, if there is no such frequent set, then one maximal
221: frequent subset $X$ will be drawn from all maximal frequent
222: subsets according to the probability distribution $\Theta_{T_j}$,
223: with
$$\Theta_{T_j}(X_{\it max}) = \frac{\mbox{support}(X_{\it max})}{\sum_{Y_{\it max} \subseteq T_j} \mbox{support}(Y_{\it max})}$$
225: After this, the margin $m(X)$ is assigned to $M(X)$ and the
226: process is repeated for $T_j \setminus X$. In summary:
227: \begin{center}
228: \parbox{\columnwidth}{
229: \begin{tabbing}
230: \quad \= \quad \= \kill
231: \textbf{for} every transaction $T_j$ \textbf{do} \{ \+ \\
232: \textbf{while} ($T_j$ contains frequent sets) \textbf{do} \{ \+ \\
Draw $X$ from all maximal frequent subsets \\
234: using probability distribution $\Theta_{T_j}$; \\ [\medskipamount]
235: $M(X) := M(X) + m(X)$ \\
236: with $m(X)$ the profit margin of $X$ in $T_j$; \\ [\medskipamount]
237: $T_j := T_j \setminus X$; \- \\
238: \} \- \\
239: \} \\
240: \textbf{return} all $M(X)$;
241: \end{tabbing}
242: }
243: \end{center}
244: Say, during profit allocation, we are given a transaction
245: $$T = \{\mbox{cola}, \mbox{peanuts}, \mbox{cheese}\}.$$
246: Table~\ref{subsets} contains all frequent subsets of $T$ for a
247: particular transaction da\-ta\-base.
248: \begin{table}
\centering \caption{Frequent Subsets of $T$} \label{subsets}
250: \begin{tabular}{|lccc|}
251: \hline
252: \textbf{Frequent Sets} & \textbf{Support} & \textbf{Maximal} & \textbf{Unique} \\
253: \hline
254: \{cola\} & $10\%$ & No & No \\
255: \{peanuts\} & $5\%$ & No & No \\
256: \{cheese\} & $8\%$ & No & No \\
257: \{cola, peanuts\} & $2\%$ & Yes & No \\
258: \{peanuts, cheese\} & $1\%$ & Yes & No \\ \hline
259: \end{tabular}
260: \end{table}
261: In this example, there is no \emph{unique} maximal frequent subset
262: of $T$. Indeed, there are two maximal frequent subsets of $T$,
263: namely \{cola, peanuts\} and \{peanuts, cheese\}. Consequently,
264: it is not obvious to which maximal frequent subset the profit
265: margin $m(T)$ should be allocated. Moreover, we would not
266: allocate the entire profit margin $m(T)$ to the selected itemset,
267: but rather the proportion $m(X)$ that corresponds to the items
268: contained in the selected maximal subset.
269:
How can one determine to which of the two maximal frequent subsets of
$T$ this margin should be allocated? As we have already
discussed, the crucial idea here is that it depends on
the purchase intentions of the customer who
purchased $T$. Unfortunately, these can never be known exactly since the
customer was not asked at the time of purchase. However, the
276: support of the frequent subsets of $T$ may provide some
277: probabilistic estimation. Indeed, if the support of a frequent
278: subset is an indicator for the probability of occurrence of this
purchase combination, then according to the data, customers buy
the maximal subset \{cola, peanuts\} twice as frequently
as the maximal subset \{peanuts, cheese\}. Consequently, it is
more likely that the customer's purchase intention
was \{cola, peanuts\} rather than \{peanuts, cheese\}. This
284: information is used to construct the probability distribution
285: $\Theta_{T_j}$, reflecting the relative frequencies of the
maximal frequent subsets of $T$. Now, each time a sales transaction
\{cola, peanuts, cheese\} is encountered in the data, a random
draw from the probability distribution $\Theta_{T_j}$ will
provide a \emph{probable} purchase intention (i.e., a maximal
frequent subset) for that transaction. Consequently, on average
in two out of every three times this transaction is encountered,
the maximal subset \{cola, peanuts\} will be selected and
293: $m(\{\mbox{cola}, \mbox{peanuts}\})$ will be allocated to
294: $M(\{\mbox{cola}, \mbox{peanuts}\})$. After this, $T$ is split up
295: as follows: $T := T \setminus \{\mbox{cola}, \mbox{peanuts}\}$
296: and the process of assigning the remaining margin is repeated as
297: if the new $T$ were a separate transaction, until $T$ does not
298: contain a frequent set anymore.
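
As a worked illustration of the procedure, using only the supports
listed in Table~\ref{subsets}, the draw probabilities for $T$ are
$$\Theta_{T}(\{\mbox{cola}, \mbox{peanuts}\}) = \frac{2\%}{2\% + 1\%} = \frac{2}{3},
\qquad
\Theta_{T}(\{\mbox{peanuts}, \mbox{cheese}\}) = \frac{1\%}{2\% + 1\%} = \frac{1}{3}.$$
If \{cola, peanuts\} is drawn, the remainder
$T \setminus \{\mbox{cola}, \mbox{peanuts}\} = \{\mbox{cheese}\}$
is itself frequent (support $8\%$) and is the only maximal frequent
subset left, so it is drawn with probability 1 and
$m(\{\mbox{cheese}\})$ is added to $M(\{\mbox{cheese}\})$. If instead
\{peanuts, cheese\} is drawn, the remainder \{cola\} (support $10\%$)
is handled in the same way. This trace assumes, as the table
suggests, that single items count as frequent sets; in either case
the loop ends after two draws because the remaining transaction is
empty.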
299:
300: \subsection{Category Management Restrictions} \label{category}
301:
302: As pointed out in Section~\ref{limit}, a second limitation of the
303: previous PROFSET model is its inability to include category
304: management restrictions. This sometimes causes the model to
305: exclude even all products from one or more categories because
306: they do not contribute enough to the overall profitability of the
optimal set. This often contradicts the mission of
308: retailers to offer customers a wide range of products, even if
309: some of those categories or products are not profitable enough.
310: Indeed, customers expect supermarkets to carry a wide variety of
311: products and cutting away categories/departments would be against
312: the customers' expectations about the supermarket and would harm
313: the store's image. Therefore, we want to offer the retailer the
314: ability to include category restrictions into the generalized
315: PROFSET model.
316:
317: This can be accomplished by adding an additional index $k$ to the
318: product variable $Q_i$ to account for category membership, and by
319: adding constraints on the category level. Several kinds of
320: category restrictions can be introduced: which and how many
321: categories should be included in the optimal set, or how many
322: products from each category should be included. The relevance of
323: these restrictions can be illustrated by the following common
324: practices in retailing. First, when composing a promotion
325: leaflet, there is only limited space to display products and
326: therefore it is important to optimize the product composition in
327: order to maximize cross-selling effects between products and
328: avoid product cannibalization. Moreover, according to the
329: particular retail environment, the retailer will include or
330: exclude specific products or product categories in the leaflet.
331: For example, the supermarket in this study attempts to
332: differentiate from the competition by the following image
333: components: \emph{fresh}, \emph{profitable} and \emph{friendly}.
334: Therefore, the promotion leaflet of the retailer emphasizes
335: product categories that support this image, such as fresh
336: vegetables and meat, freshly-baked bread, ready-made meals, and
337: others. Second, product category constraints may reflect shelf
338: space allocations to products. For instance, large categories
have more product facings than smaller categories. These kinds of
constraints can easily be included in the generalized PROFSET
model, as will be discussed hereafter.
342:
343: \subsection{The Generalized PROFSET Model}
344:
345: Bundling the improvements suggested in Sections~\ref{profalloc}
346: and~\ref{category} results in the generalized PROFSET model as
347: presented below.
348:
349: Let categories $C_1, \ldots, C_n$ be sets of items, $L$ the set
350: of frequent itemsets, and let $P_X$, $Q_i \in \{0,1\}$ be the
351: decision variables for which the optimization routine must find
352: the optimal values. $P_X$ specifies whether an itemset $X$ will
353: positively contribute to the value of the objective function, and
354: $Q_i$ equals 1 as soon as any itemset $X$ in which it is included
355: is set to 1 ($P_X = 1$) by the optimization routine. Let ${\rm
356: Cost}_i$ be the inventory and handling cost of item $i$. The
357: objective of the following formula is to maximize all profits
358: from cross-selling effects between products:
359:
360: $$\mbox{max}\left( \sum_{X \in L} M(X) P_X - \sum_{c=1}^{n}\sum_{i \in C_c} {\rm Cost}_i Q_i\right)$$
361:
362: which is subject to the following constraints
363: \begin{eqnarray}
364: \label{c1}
365: \sum_{c=1}^{n} \sum_{i \in C_c} Q_i = \mbox{ItemMax} \\
366: \label{c2}
367: \forall X \in L,\:\forall i \in X:Q_i \geq P_X \\
368: \label{c3} \forall C_c: \sum_{i \in C_c} Q_i \geq
369: \mbox{ItemMin}_{C_c}
370: \end{eqnarray}
371:
372: Constraint~\ref{c1} determines how many items are allowed to be
373: included in the optimal set. The $\mbox{ItemMax}$ parameter,
374: specified by the retailer, will depend on the retail environment
375: in which the model is being used. For instance, it may be the
376: number of eye-catchers (products obtaining special display space)
377: in the supermarket or the number of facings in a promotion
378: leaflet. Constraint~\ref{c2} is analogous to the one in the
379: previous PROFSET model and specifies the relationship between the
380: frequent sets and the products contained in them. Finally,
constraint~\ref{c3} sets, for each category $C_c$, the minimum
number $\mbox{ItemMin}_{C_c}$ of its products that must enter the
optimal set; in this way the retailer controls which categories are
represented and by how many products.
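
To make the interplay of the objective function and the constraints
concrete, consider a small instance with purely hypothetical numbers
(they are not taken from the data set of this study). Let
$C_1 = \{a, b\}$ and $C_2 = \{c, d\}$, let
$L = \{\{a\}, \{b\}, \{c\}, \{d\}, \{b, d\}\}$ with
$M(\{a\}) = 10$, $M(\{b\}) = 6$, $M(\{c\}) = 8$, $M(\{d\}) = 5$ and
$M(\{b, d\}) = 12$, and assume ${\rm Cost}_i = 1$ for every item,
$\mbox{ItemMax} = 2$ and
$\mbox{ItemMin}_{C_1} = \mbox{ItemMin}_{C_2} = 1$.
Constraints~\ref{c1} and~\ref{c3} then force exactly one item per
category, and constraint~\ref{c2} allows $P_X = 1$ only for itemsets
whose items are all selected. Since every $M(X)$ is positive here,
each admissible $P_X$ is set to 1, and the four feasible selections
yield the objective values
$$\begin{array}{lcl}
\{a, c\} & : & 10 + 8 - 2 = 16 \\
\{a, d\} & : & 10 + 5 - 2 = 13 \\
\{b, c\} & : & 6 + 8 - 2 = 12 \\
\{b, d\} & : & 6 + 5 + 12 - 2 = 21
\end{array}$$
so the optimal set in this sketch is $\{b, d\}$, although $a$ and $c$
have the highest individual margins within their categories. The
difference is entirely due to the margin of the frequent set
$\{b, d\}$, which only counts when both of its items are selected.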
384:
385: \section{Empirical Study} \label{impl}
386:
387: The empirical study is based on a data set of $18\,182$ market
388: baskets obtained from a sales outlet of a Belgian supermarket
389: chain over a period of 1 month. The store carries $9\,965$
390: different products grouped in 281 product categories. The
391: average market basket contains $10.6$ different product items. In
392: total, $3\,381$ customers own a loyalty card of the supermarket under
393: study.
394:
395: First, frequent sets and association rules were discovered from
396: the market baskets with a minimum absolute support threshold of
397: 30 transactions. The motivation behind this is that a product or
398: set of products should have been sold at least, approximately,
399: once a day to be called frequent. Slightly more than $87\%$ of
400: the products are sold less than once a day.
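
For reference, this absolute threshold can be converted into a
relative support using only the figures given above:
$$\frac{30}{18\,182} \approx 0.00165 = 0.165\%,$$
so an itemset is called frequent if it occurs in roughly one out of
every 606 market baskets.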
401:
402: The retailer in question is interested in finding the optimal set
403: of eye-catchers such that the profit from cross-selling these
404: eye-catchers is maximized. Hence, this should be represented by
405: the objective function as described in the previous section.
406: However, because of limited shelf-space for each product
407: category, the retailer specified that each product category can
408: only delegate one product to the optimal set, represented by the
category constraint (i.e., constraint~\ref{c3}). Subsequently, it
410: is the goal of the generalized PROFSET model to select the most
411: profitable set of products in terms of cross-selling
412: opportunities between the delegates of each category.
413:
For 54 $(24.7\%)$ of the 218 product categories, the generalized
PROFSET model selects a different product than the one with the
highest individual profit ranking within the category. This
suggests that, for these products, there must be some
cross-selling opportunity with eye-catchers from other categories
that causes them to be \emph{promoted} in the
profitability ranking.
421:
Due to space limitations, Table~\ref{profit} shows the relative
improvements in cross-selling profit for only some categories,
expressed as the percentage improvement in cross-selling
profit obtained by choosing the optimal product from the generalized
PROFSET model rather than the product with the highest
individual profitability within each category.
428:
429: \begin{table}
430: \caption{Cross-selling profit improvements} \label{profit}
431: \centering
432: \begin{tabular}{|lc|}
433: \hline
434: \textbf{Category} & \textbf{Improvement} \\
435: \hline
436: Washing-up liquid & 21\% \\
437: Baby food & 49\% \\
438: Margarine 1 & 189\% \\
439: Coffee biscuits & 14\% \\
440: Sandwich filling & 43\% \\
441: Candy bars & 588\% \\
442: Canned fish & N/A \\
443: Canned fruit & 3\% \\
444: Packed-up bread & 8\% \\
445: Newspapers and magazines & 55\% \\
446: $\ldots$ & $\ldots$ \\
447: \hline
448: \end{tabular}
449: \end{table}
450: \begin{table*}[t]
451: \centering \caption{Own and cross-selling profit figures (in BEF)
452: per product} \label{cross}
453: \begin{tabular}{|lccc|}
454: \hline
455: & \textbf{Own} & \textbf{Cross-selling} & \textbf{Total} \\
456: \textbf{Product} & \textbf{profit} & \textbf{profit} & \textbf{profit} \\
457: \hline
458: 1. {\sc milky way mini} & $37\,808$ & $2\,350$ & $40\,158$ \\
459: 2. {\sc melo cakes} & $34\,333$ & 0 & $34\,333$ \\
460: 3. {\sc Leo 3-pack} & $28\,728$ & 0 & $28\,728$ \\
461: 4. {\sc Leo 10-pack} 10+2 & $12\,028$ & $264\,228$ & $276\,256$ \\
462: \hline
463: \end{tabular}
464: \end{table*}
465:
Discussing the profit improvements in detail for all categories
would lead us too far. Therefore, we highlight one of
the most striking results to illustrate the power of the model;
analogous conclusions can be obtained for other categories. Note
that N/A in Table~\ref{profit} means that no alternative product in
that category has enough support to be frequent, so that a
comparison with the product selected by the generalized PROFSET
model is not applicable. For the category candy
bars, selecting the eye-catcher chosen by the model instead of the
product with the highest individual profit increases
cross-selling profits by $588\%$. This can be observed in
Table~\ref{cross} (only relevant products are included).
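
As a consistency check, the $588\%$ figure can be reproduced from
Table~\ref{cross} under the assumption (our reading, not stated
explicitly in the tables) that the improvement compares the total
profit of product 4 with that of product 1, the product with the
highest own profit. The totals are $37\,808 + 2\,350 = 40\,158$ and
$12\,028 + 264\,228 = 276\,256$, so that
$$\frac{276\,256 - 40\,158}{40\,158} = \frac{236\,098}{40\,158} \approx 5.88 = 588\%.$$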
478:
479:
480: Table~\ref{cross} illustrates that product 4 in the candy bars
481: category is ranked last when looking at its own profit. However,
482: due to large cross-selling effects with eye-catchers of other
483: product categories, this product becomes much more important when
484: looking at the total profit. This illustrates that for the
485: eye-catchers application, it is better to display product `Leo
486: 10-pack 10+2' than to display one of its competing products in
the same category. In contrast, if the objective were the
selling volume of the individual product, then it would be better
to select product 1 as eye-catcher, but since the retailer wants
the customer to buy other products with it, product 4
is clearly the better choice. The association rules discovered
492: during the mining phase validate these conclusions. \\
493: [\medskipamount]
494: {\sc milky way}$\Rightarrow${\sc vegetable/fruit} \\
495: (sup=$0.17\%$, conf=$50.82\%$) \\
496: {\sc meat product and Leo 10-pack} $\Rightarrow${\sc cheese
497: product} \\
498: (sup=$0.396\%$, conf=$55\%$)
\\ [\medskipamount] Note that the products included in the rules
are all eye-catchers as determined by the generalized
PROFSET model. The other items contained in the
association rules carry a rather abstract name, such as ``cheese
product'', because this is a collective name for products that
do not have their own barcode, for instance different cheese
products that are weighed at the check-out and then
grouped under an abstract product name such as ``cheese product''.
507:
508: Finally, for those product categories that do not contain
509: frequent products, the generalized PROFSET model will choose the
510: product with the highest individual profit in order to maximize
511: the overall profitability of the eye-catcher set.
512:
513: \section{Further Research} \label{concl}
514:
515: The authors plan to test the proposed model in practice and
516: externally validate its performance based on a real world
517: experiment in cooperation with the Belgian supermarket chain.
518: Furthermore, additional improvements to the model will be
519: considered. More specifically, it will be studied how promotion
520: coupons affect the composition of the optimal set of products and
521: whether it is possible to measure the effect of the value price
522: reduction on the cross-selling profitability of products.
523:
524: \bibliographystyle{plain}
525: \begin{thebibliography}{10}
526:
527: \bibitem{actipat}
528: G.~Adomavicius and A.~Tuzhilin.
529: \newblock Discovery of actionable patterns in databases: the action hierarchy
530: approach.
531: \newblock In D.~Heckerman, H.~Mannila, and D.~Pregibon, editors, {\em
532: Proceedings of the Third International Conference on Knowledge Discovery \&
533: Data Mining}, pages 111--114. AAAI Press, 1997.
534:
535: \bibitem{ais}
536: R.~Agrawal, T.~Imielinski, and A.N. Swami.
537: \newblock Mining association rules between sets of items in large databases.
538: \newblock In {\em Proceedings of the 1993 {ACM SIGMOD} International Conference
539: on Management of Data}, volume 22:2 of {\em SIGMOD Record}, pages 207--216.
540: ACM Press, 1993.
541:
542: \bibitem{profset}
543: T.~Brijs, G.~Swinnen, K.~Vanhoof, and G.~Wets.
544: \newblock Using association rules for product assortment decisions: a case
545: study.
546: \newblock In Heckerman et~al. \cite{kdd99}, pages 254--260.
547:
548: \bibitem{redundancy}
549: T.~Brijs, K.~Vanhoof, and G.~Wets.
550: \newblock Reducing redundancy in characteristic rule discovery by using integer
551: programming techniques.
552: \newblock In {\em Intelligent Data Analysis Journal}, volume 4:3. Elsevier,
553: 2000.
554: \newblock To Appear.
555:
556: \bibitem{cabena}
557: P.~Cabena, P.~Hadjinian, R.~Stadler, J.~Verhees, and A.~Zanasi.
558: \newblock {\em Discovering Data Mining: From Concept to Implementation}.
559: \newblock Prentice Hall, 1997.
560:
561: \bibitem{sagit}
562: G.~Cuomo and A.~Pastore.
\newblock A category management application in the frozen food sector in Italy:
  the Unilever-Sagit case.
565: \newblock In A.~Broadbridge, editor, {\em Proceedings of the 10th International
566: Conference on Research in the Distributive Trades}, pages 225--233. Institute
567: for Retail Studies: University of Stirling, 1999.
568:
569: \bibitem{implic}
S.~Guillaume, F.~Guillet, and J.~Philipp\'e.
571: \newblock Improving the discovery of association rules with intensity of
572: implication.
573: \newblock In {\em Principles of Data Mining and Knowledge Discovery}, volume
574: 1510 of {\em Lecture Notes in Artificial Intelligence}, pages 318--327.
575: Springer, 1998.
576:
577: \bibitem{kdd99}
578: D.~Heckerman, H.~Mannila, and D.~Pregibon, editors.
579: \newblock {\em Proceedings of the Fifth International Conference on Knowledge
  Discovery \& Data Mining}. AAAI Press, 1999.
581:
582: \bibitem{papadimitriou}
583: J.~Kleinberg, C.~Papadimitriou, and P.~Raghavan.
584: \newblock A microeconomic view of data mining.
\newblock In {\em Data Mining and Knowledge Discovery}, volume 2:4, pages
  311--324. Kluwer Academic Publishers, 1998.
587:
588: \bibitem{interest}
589: M.~Klemettinen, H.~Mannila, P.~Ronkainen, H.~Toivonen, and A.I. Verkamo.
590: \newblock Finding interesting rules from large sets of discovered association
591: rules.
592: \newblock In Nabil~R. Adam, Bharat~K. Bhargava, and Yelena Yesha, editors, {\em
593: Proceedings of the Third International Conference on Information and
594: Knowledge Management}, pages 401--407. ACM Press, 1994.
595:
596: \bibitem{p_analysis}
597: B.~Liu and W.~Hsu.
598: \newblock Post-analysis of learned rules.
599: \newblock In {\em Proceedings of the Thirteenth National Conference on
600: Artificial Intelligence}, Lecture Notes in Artificial Intelligence, pages
601: 828--834. AAAI Press/MIT Press, 1996.
602:
603: \bibitem{prune}
604: B.~Liu, W.~Hsu, and Y.~Ma.
605: \newblock Pruning and summarizing the discovered associations.
606: \newblock In Heckerman et~al. \cite{kdd99}, pages 125--134.
607:
608: \bibitem{unexpectedness}
609: B.~Padmanabhan and A.~Tuzhilin.
610: \newblock Unexpectedness as a measure of interestingness in knowledge
611: discovery.
612: \newblock In {\em Decision Support Systems}, volume~27, pages 303--318.
613: Elsevier Science, 1999.
614:
615: \bibitem{correlation}
616: C.~Silverstein, S.~Brin, and R.~Motwani.
617: \newblock Beyond market baskets: generalizing association rules to dependence
618: rules.
\newblock In {\em Data Mining and Knowledge Discovery}, volume 2:1, pages
620: 39--68. Kluwer Academic Publishers, 1998.
621:
622: \bibitem{nar}
623: K.~Wang, S.H.W. Tay, and B.~Liu.
624: \newblock Interestingness-based interval merger for numeric association rules.
625: \newblock In R.~Agrawal, P.~Stolorz, and G.~Piatetsky-Shapiro, editors, {\em
626: Proceedings of the Fourth International Conference on Knowledge Discovery \&
627: Data Mining}, pages 121--127. AAAI Press, 1998.
628:
629: \end{thebibliography}
630:
631: \end{document}
632: