\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
% the importance of interactions
% the difficulties of detecting interactions
% frequent itemset mining
% random intersection
% efficiency and effectiveness
  Interactions between several features sometimes play an important role in prediction tasks.
  However, taking all interactions into consideration leads to an extremely heavy computational burden.
  For categorical features the situation is more complicated, since the input becomes extremely high-dimensional and sparse once one-hot encoding is applied.
  % Many efforts have been devoted to deal with interaction selection, most of which are based on the hierarchy assumption.
  % But these approaches may miss some informative interactions, and not efficient enough.
  Inspired by association rule mining, we propose Random Intersection Chains, a method that selects interactions of categorical features.
  It uses random intersections to detect frequent patterns and then selects the most meaningful ones among them.
  First, a number of chains are generated, in which each node is the intersection of the previous node and a randomly chosen observation.
  The frequencies of the patterns in the tail nodes are estimated by maximum likelihood estimation, and the patterns with the largest estimated frequencies are selected.
  Their confidence is then calculated by Bayes' theorem.
  Finally, the most confident patterns are returned.
  We show that if the number and length of chains are appropriately chosen, the patterns in the tail nodes are indeed the most frequent ones in the data set.
  We analyze the computational complexity of the proposed algorithm and prove the convergence of the estimators.
  The results of a series of experiments verify the efficiency and effectiveness of the algorithm.
\end{abstract}
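
% Illustrative sketch only: the notation K, M, m_s, p_s and the sampling assumptions below are not taken from the abstract.
As a rough illustration of the frequency estimate described in the abstract, suppose (an assumption made here for concreteness) that each chain is built from $K$ observations drawn independently and uniformly with replacement, and that $M$ chains are generated independently.
A pattern $s$ with frequency $p_s$ then appears in the tail node of a chain exactly when all $K$ sampled observations contain it, so
\[
  \Pr(s \subseteq \mathrm{tail}) = p_s^{K}.
\]
If $s$ is contained in the tail nodes of $m_s$ of the $M$ chains, then $m_s$ follows a $\mathrm{Binomial}(M, p_s^{K})$ distribution, and by the invariance property of maximum likelihood estimators,
\[
  \hat{p}_s = \left( \frac{m_s}{M} \right)^{1/K}.
\]
Under the same hypothetical notation, one plausible reading of the confidence step is Bayes' theorem applied to class-conditional frequencies, for a class label $c$:
\[
  \Pr(c \mid s) = \frac{\Pr(s \mid c)\,\Pr(c)}{\sum_{c'} \Pr(s \mid c')\,\Pr(c')}.
\]
The exact estimators used by the method may differ; this sketch only shows how a tail-node count could be turned into a frequency and then into a confidence.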
