UCLA Technology Available For Licensing
BACKGROUND: Clustering of high-dimensional data for data mining has applications as wide-ranging as bioinformatics, marketing, machine learning, and data analysis. Heuristics based on k-means variants have dominated the field because of their simplicity and intuitive appeal, but they offer no performance guarantees or provable bounds on solution quality. Alternatives such as k-harmonic means and approximation methods have attempted to overcome this, but in many cases they still reach only local optima or depend heavily on the initial clustering. The current innovation constructs a k-means clusterability criterion for data sets, based on a novel probabilistic seeding process for the starting configurations of Lloyd-type methods. Variants of these heuristics yield provably near-optimal clustering solutions when applied to well-clusterable instances. They are candidates to run faster than the algorithms currently used in practice, and they are also faster than recently proposed approximation algorithms.
INNOVATION: Novel seeding processes and heuristics for k-means clustering algorithms (based on Lloyd-type methods) with provably near-optimal performance.
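For readers unfamiliar with the general pattern, the following is a minimal sketch of a randomized seeding step followed by Lloyd-type iterations. It is an illustration only: the seeding rule shown (sampling each new center with probability proportional to its squared distance from the centers already chosen) is a widely known stand-in and is not claimed to be the procedure covered by this case. The function names `seed_centers`, `lloyd`, and `kmeans_cost` are hypothetical and chosen for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def seed_centers(points, k, rng):
    """Assumed stand-in seeding: first center uniform at random, each later
    center drawn with probability proportional to its squared distance from
    the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, iterations=50):
    """Lloyd-type iteration: assign points to the nearest center, then move
    each center to the mean of its cluster, until nothing changes."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                dim = len(old)
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated synthetic groups (made-up data for illustration).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
    seeds = seed_centers(data, 2, rng)
    final = lloyd(data, seeds)
    print("cost after seeding:", kmeans_cost(data, seeds))
    print("cost after Lloyd iterations:", kmeans_cost(data, final))
```

The sketch separates seeding from iteration because the quality of a Lloyd-type method depends heavily on where it starts. On well-separated data, a distance-weighted seed tends to place one starting center in each group, after which the iterations only refine the result.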
POTENTIAL APPLICATIONS
ADVANTAGES
DEVELOPMENT-TO-DATE: The invention has been mathematically proven. Implementation is straightforward for those skilled in implementing clustering and data mining algorithms.
Reference: UCLA Case No. 2007-049
Copyright © 2007 The Regents of the University of California.