\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
% the importance of interactions
% the difficulties of detecting interactions
% frequent itemset mining
% random intersection
% efficiency and effectiveness
  Interactions between several features sometimes play an important role in prediction tasks.
  However, taking all possible interactions into consideration leads to an extremely heavy computational burden.
  For categorical features the situation is more complicated, since the input becomes extremely high-dimensional and sparse if one-hot encoding is applied.
  % Many efforts have been devoted to deal with interaction selection, most of which are based on the hierarchy assumption.
  % But these approaches may miss some informative interactions, and not efficient enough.
  Inspired by association rule mining, we propose Random Intersection Chains, a method for selecting interactions of categorical features.
  It uses random intersections to detect frequent patterns and then selects the most meaningful ones among them.
  First, a number of chains are generated, in which each node is the intersection of the previous node and a randomly chosen observation.
  The frequency of each pattern in the tail nodes is estimated by maximum likelihood estimation, and the patterns with the largest estimated frequencies are selected.
  The confidence of the selected patterns is then calculated by Bayes' theorem.
  Finally, Random Intersection Chains returns the most confident patterns.
  We show that if the number and length of the chains are appropriately chosen, the patterns in the tail nodes are indeed the most frequent ones in the data set.
  We analyze the computational complexity of the proposed algorithm and prove the convergence of the estimators.
  A series of experiments verifies the efficiency and effectiveness of the algorithm.
\end{abstract}
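
% A minimal formal sketch of the chain construction and the estimators
% summarized in the abstract. The notation ($x_i$, $S_k$, $p_s$, $n_s$, $M$, $N$, $c$)
% and the sampling assumptions are introduced here for illustration only.
The chain construction summarized in the abstract can be sketched as follows, under assumptions that the abstract does not state explicitly: each observation $x_i$ is viewed as a set of (feature, value) pairs, observations are drawn independently and uniformly with replacement, each chain uses $M$ draws, and $N$ chains are generated independently.
A chain with nodes $S_1, \dots, S_M$ would then be
\[
  S_1 = x_{i_1}, \qquad S_k = S_{k-1} \cap x_{i_k}, \quad k = 2, \dots, M .
\]
A pattern $s$ survives to the tail node $S_M$ exactly when it is contained in all $M$ drawn observations, so if $p_s$ denotes the fraction of observations containing $s$, then
\[
  \Pr\left( s \subseteq S_M \right) = p_s^{M} .
\]
If $n_s$ of the $N$ tail nodes contain $s$, then $n_s$ is binomial with success probability $p_s^{M}$, and one plausible form of the maximum likelihood estimate is
\[
  \hat{p}_s = \left( \frac{n_s}{N} \right)^{1/M} .
\]
Likewise, one natural reading of the confidence step, for a class label $c$, is
\[
  \Pr(c \mid s) = \frac{\Pr(s \mid c)\,\Pr(c)}{\Pr(s)} ,
\]
with the pattern frequencies on the right-hand side replaced by their estimates.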