1: \begin{abstract}% <- trailing '%' for backward compatibility of .sty file
2: % the importance of interactions
3: % the difficulties of detecting interactions
4: % frequent itemset mining
5: % random intersection
6: % efficiency and effectiveness
7: Interactions between several features sometimes play an important role in prediction tasks.
8: However, taking all possible interactions into consideration leads to an extremely heavy computational burden.
9: For categorical features, the situation is more complicated since the input will be extremely high-dimensional and sparse if one-hot encoding is applied.
10: % Many efforts have been devoted to deal with interaction selection, most of which are based on the hierarchy assumption.
11: % But these approaches may miss some informative interactions, and not efficient enough.
12: Inspired by association rule mining, we propose Random Intersection Chains, a method that selects interactions of categorical features.
13: It uses random intersections to detect frequent patterns and then selects the most meaningful ones among them.
14: First, a number of chains are generated, in which each node is the intersection of the previous node and a randomly chosen observation.
15: The frequencies of the patterns in the tail nodes are estimated by maximum likelihood estimation, and the patterns with the largest estimated frequencies are selected.
16: After that, their confidence is calculated by Bayes’ theorem.
17: The most confident patterns are finally returned by Random Intersection Chains.
18: We show that if the number and length of chains are appropriately chosen, the patterns in the tail nodes are indeed the most frequent ones in the data set.
19: We analyze the computational complexity of the proposed algorithm and prove the convergence of the estimators.
20: The results of a series of experiments verify the efficiency and effectiveness of the algorithm.
21: \end{abstract}
22: