SEEDING METHOD FOR K-MEANS CLUSTERING AND OTHER CLUSTERING ALGORITHMS
UCLA Technology Available For Licensing

UCLA researchers in the Department of Computer Science have developed new variations on k-means clustering algorithms for data analysis, based on novel seeding methods, that lead to provably near-optimal solutions when applied to well-clusterable instances.

BACKGROUND:  Clustering of high-dimensional data for data mining has applications as wide-ranging as bioinformatics, marketing, machine learning, and data analysis. Heuristics based on k-means variants have dominated the field because of their simplicity and intuitive appeal, but they offer no performance guarantees or provable bounds on solution quality. Various heuristics, such as k-harmonic means, and approximation methods have attempted to overcome this, but in many cases they still reach only local optima or depend heavily on the initial clustering. The current innovation constructs a k-means clusterability criterion for data sets, based on a novel probabilistic seeding process that chooses the starting configuration for Lloyd-type methods. Variants of these heuristics lead to provably near-optimal clustering solutions when applied to well-clusterable instances, and are candidates for outperforming existing algorithms in practical running time (and, in addition, are faster than recently proposed approximation algorithms).
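The general idea of probabilistic seeding followed by Lloyd-type iterations can be illustrated with a minimal Python sketch. This is not the inventors' exact procedure, which this listing does not specify. It assumes a commonly used rule in which each new starting center is drawn with probability proportional to its squared distance from the nearest center already chosen, and all function names (sq_dist, seed_centers, lloyd) are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def seed_centers(points, k, rng):
    """Pick k starting centers. Assumed rule (not taken from the listing):
    each new center is drawn with probability proportional to its squared
    distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]  # assumption: first center is uniform
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, iterations=10):
    """Lloyd-type steps: assign each point to its nearest center, then
    move each center to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            # An empty cluster keeps its previous center.
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers


# Two well-separated groups of points (illustrative data only).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
rng = random.Random(0)
final_centers = lloyd(points, seed_centers(points, 2, rng))
```

On well-separated data such as this, distance-weighted seeding tends to place one starting center in each group, so the Lloyd-type steps have little left to correct. That is the intuition behind tying the quality guarantee to well-clusterable instances.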

INNOVATION:  Novel seeding processes and heuristics for k-means clustering algorithms (based on Lloyd-type methods) with provably near-optimal performance.

POTENTIAL APPLICATIONS 

ADVANTAGES

DEVELOPMENT-TO-DATE:  The invention has been mathematically proven. Implementation is straightforward for those skilled in implementing clustering and data mining algorithms.

Reference: UCLA Case No. 2007-049

For additional technical details and current licensing
availability, please contact the following UCLA office:

UCLA Office of Intellectual Property
11000 Kinross Avenue, Suite #200
Los Angeles, CA 90095-7231
Tel: 310-794-0558 Fax: 310-794-0638
email: ncd@research.ucla.edu
NCD URL:   http://www.research.ucla.edu/tech/ucla07-049.htm

Lead Inventor: Rafail Ostrovsky

UCLA Technologies Available for Licensing
http://www.research.ucla.edu/oipa/industry

Copyright © 2007 The Regents of the University of California.

keywords: clustering, data mining, k-means, Lloyd-type, bioinformatics, machine learning