| UCLA Technology Available For Licensing |
This invention can be implemented as a data analyzing tool that can be bonded with database products (Sql2000, DB2, Oracle, etc.) or data processing software (SAS, Matlab, etc). It can be used to extract frequent patterns in real time for analyzing large datasets like genome data and medical data.
BACKGROUND: There are three approaches for generating frequent itemsets (FI). The first is the candidate set generate-and-test approach. Most previous algorithms belong to this group. The second is a sampling approach. This reduces computation complexity but the result is incomplete. The third is a data transformation approach. The FP-tree method and the pattern decomposition algorithm (PDA) are examples of this approach. They both greatly reduce the original dataset and also do not need to generate candidate sets.
Generating FI, however, becomes infeasible if the frequent patterns are long because of the exponential number of frequent itemsets. Algorithms mining frequent closed itemsets (FCI) can have the same problem. Therefore, researchers turn to finding maximal frequent itemsets (MFI). A MFI is a frequent itemset that is not a subset of any other frequent itemset. Given a set of MFI, it is easy to analyze many properties of the dataset.
INNOVATION: The SmartMiner algorithm finds exact maximal frequent itemsets for large datasets. The SmartMiner algorithm first uses global and local tail information to augment dynamic reordering to reduce the search tree. Second, the passing of tail information eliminates the need of known MFI for superset checking. Finally, SmartMiner also reduces the number of support counting for determining the frequency of tail items and thus greatly saves counting time.
DEVELOPMENT TO DATE: Empirical testing shows that compared with Mafia and GenMax, SmartMiner generates a much smaller search tree, requires a smaller number of support counting, and does not require superset checking. Using the datasets Mushroom and Connect, experimental study reveals that SmartMiner generates the same MFI as Mafia and GenMax, but yields an order of magnitude improvement in speed.
Reference: UCLA Case No. 2003-390
|
availability, please contact the following UCLA office:
|
|
Copyright © 2003 The Regents of the University of California.