SMARTMINER:  A DEPTH FIRST ALGORITHM GUIDED BY TAIL INFORMATION FOR MINING MAXIMAL FREQUENT ITEMSETS
UCLA Technology Available For Licensing

Researchers in the UCLA Department of Computer Science have developed and reduced to practice an algorithm to find maximal frequent itemsets (MFI) for large datasets using tail information to augment dynamic reordering. This algorithm is faster than competing products by an order of magnitude.

This invention can be implemented as a data analyzing tool that can be bonded with database products (Sql2000, DB2, Oracle, etc.) or data processing software (SAS, Matlab, etc). It can be used to extract frequent patterns in real time for analyzing large datasets like genome data and medical data.

BACKGROUND:  There are three approaches for generating frequent itemsets (FI). The first is the candidate set generate-and-test approach. Most previous algorithms belong to this group. The second is a sampling approach. This reduces computation complexity but the result is incomplete. The third is a data transformation approach. The FP-tree method and the pattern decomposition algorithm (PDA) are examples of this approach. They both greatly reduce the original dataset and also do not need to generate candidate sets.

Generating FI, however, becomes infeasible if the frequent patterns are long because of the exponential number of frequent itemsets. Algorithms mining frequent closed itemsets (FCI) can have the same problem. Therefore, researchers turn to finding maximal frequent itemsets (MFI). A MFI is a frequent itemset that is not a subset of any other frequent itemset. Given a set of MFI, it is easy to analyze many properties of the dataset.

INNOVATION:  The SmartMiner algorithm finds exact maximal frequent itemsets for large datasets. The SmartMiner algorithm first uses global and local tail information to augment dynamic reordering to reduce the search tree. Second, the passing of tail information eliminates the need of known MFI for superset checking. Finally, SmartMiner also reduces the number of support counting for determining the frequency of tail items and thus greatly saves counting time.

DEVELOPMENT TO DATE:  Empirical testing shows that compared with Mafia and GenMax, SmartMiner generates a much smaller search tree, requires a smaller number of support counting, and does not require superset checking. Using the datasets Mushroom and Connect, experimental study reveals that SmartMiner generates the same MFI as Mafia and GenMax, but yields an order of magnitude improvement in speed.

Reference: UCLA Case No. 2003-390

For additional technical details and current licensing
availability, please contact the following UCLA office:

UCLA Office of Intellectual Property
11000 Kinross Avenue, Suite #200
Los Angeles, CA 90095-7231
Tel: 310-794-0558 Fax: 310-794-0638
email: ncd@research.ucla.edu
NCD URL:   http://www.research.ucla.edu/tech/ucla03-390.htm

Lead Inventor: Wesley Chu

UCLA Technologies Available for Licensing
http://www.research.ucla.edu/tech

Copyright © 2003 The Regents of the University of California.

keywords: communications uclancd ucla technologies intellectual property patents technology transfer invention business card