A KNOWLEDGE-BASED APPROACH FOR SCENARIO-SPECIFIC CONTENT CORRELATION IN A MEDICAL DIGITAL LIBRARY
UCLA Technology Available For Licensing

Researchers in the UCLA Department of Computer Science have developed, and reduced to practice, algorithms and methods for obtaining information, primarily medical information, from free-text sources such as patient medical records. The techniques comprise three sets of innovations: (1) keyword extraction and indexing, (2) query expansion, and (3) phrase-based vector space models of document retrieval.

The first step is generating conceptual phrase candidates in clinical texts and identifying phrases for indexing (UCLA Case 2003-358). The second step is query expansion, or content correlation, which is described herein. The final step is document retrieval, described in UCLA Case 2003-510.

INNOVATION:  Query expansion is the technique whereby a user's free-form query cascades into further, more general or more specific concepts using known word associations. For example, without query expansion, a practitioner who wants information on cancer cures must manually query for "cancer cure" as well as for other related phrases. It is preferable for the query engine to automatically expand the query to "chemotherapy" and "radiation therapy" in addition to "cancer cure". This yields substantially stronger queries than simple "Google"-type inverted-file lookup, which requires an exact word match. The researchers are able to uncover phrases and concepts that cannot be obtained from simple noun phrases, without creating irrelevant or improper combinations. The algorithms use novel indexing structures for matching text to UMLS concepts, as well as novel storage and search techniques.
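The cascade described above can be illustrated with a minimal Python sketch. It assumes a tiny, hypothetical association table (ASSOCIATIONS) standing in for the associations a knowledge source such as UMLS would provide, and it uses a plain substring test as a stand-in for exact-match inverted-file lookup. The names expand_query and search are illustrative only and are not taken from the CoBase implementation.

```python
"""Minimal sketch of knowledge-based query expansion.

Assumption: ASSOCIATIONS is a tiny, hand-written, hypothetical stand-in
for the concept associations a knowledge source such as the UMLS
Metathesaurus would supply; it is not real UMLS data.
"""

# Hypothetical associations: each phrase maps to related phrases
# (more specific or more general) that should also be searched.
ASSOCIATIONS: dict[str, list[str]] = {
    "cancer cure": ["chemotherapy", "radiation therapy"],
    "chemotherapy": ["antineoplastic agents"],
}


def expand_query(query: str, max_depth: int = 1) -> list[str]:
    """Return the query plus associated phrases, breadth-first, up to max_depth hops."""
    seen = [query.lower()]
    frontier = [query.lower()]
    for _ in range(max_depth):
        next_frontier = []
        for phrase in frontier:
            for related in ASSOCIATIONS.get(phrase, []):
                if related not in seen:
                    seen.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return seen


def search(documents: list[str], phrases: list[str]) -> list[int]:
    """Return indices of documents containing any phrase verbatim (exact match)."""
    return [
        i for i, doc in enumerate(documents)
        if any(p in doc.lower() for p in phrases)
    ]


docs = [
    "Patient responded well to chemotherapy.",
    "Radiation therapy scheduled for next week.",
    "Follow-up visit for hypertension.",
]

print(search(docs, ["cancer cure"]))               # exact match only: []
print(search(docs, expand_query("cancer cure")))   # expanded query: [0, 1]
```

Raising max_depth follows associations further (here, "chemotherapy" would add "antineoplastic agents"), which can improve recall at the risk of drifting toward less relevant concepts.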

This innovation takes a knowledge-based approach to content correlation, using the Unified Medical Language System (UMLS; http://lhncbc.nlm.nih.gov/apdb/) Metathesaurus as its knowledge source. Medical concepts such as "Surgery, Lung" have identifiers in the Metathesaurus (in this case, 38903). With proper indexing, query expansion techniques can map query phrases to a variety of UMLS concept identifiers.
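One simple way to index concept strings so that a query phrase such as "lung surgery" maps to the concept "Surgery, Lung" is to key each concept by its set of normalized words. The Python sketch below assumes that scheme purely for illustration; it is not the CoBase indexing structure. Only the "Surgery, Lung" to 38903 pairing comes from the text; the second concept uses a placeholder identifier, and normalize and map_to_concepts are hypothetical names.

```python
"""Minimal sketch: map a free-text query to concept identifiers.

Assumptions: CONCEPTS is a tiny hypothetical extract standing in for
Metathesaurus strings, and the word-set key is one illustrative
indexing choice, not the CoBase group's indexing structure.
"""
import re


def normalize(text: str) -> frozenset[str]:
    """Lowercase, drop punctuation, and return the (order-free) set of words."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


# Only "Surgery, Lung" -> 38903 comes from the text; the second
# identifier is a hypothetical placeholder.
CONCEPTS: dict[str, str] = {
    "Surgery, Lung": "38903",
    "Postoperative Complications": "HYPOTHETICAL-ID-1",
}

# Index: normalized word set -> identifiers of concepts with that word set.
INDEX: dict[frozenset[str], set[str]] = {}
for name, cid in CONCEPTS.items():
    INDEX.setdefault(normalize(name), set()).add(cid)


def map_to_concepts(query: str) -> set[str]:
    """Return identifiers of every indexed concept whose words all occur in the query."""
    words = normalize(query)
    found: set[str] = set()
    for key, ids in INDEX.items():
        if key <= words:  # subset test: every concept word appears in the query
            found |= ids
    return found


print(map_to_concepts("lung surgery"))  # {'38903'}
print(map_to_concepts("postoperative complications of lung surgery"))
```

A word-set key ignores word order and punctuation, so "lung surgery" and "Surgery, Lung" share one index entry; a production index would avoid scanning every key, for example by also indexing each concept under its individual words.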

DEVELOPMENT TO DATE:  Query expansion has been implemented, with superior results in document relevance and in precision and recall.

This work was performed by the CoBase Database Group at UCLA (http://www.cobase.cs.ucla.edu/).

Reference: UCLA Case No. 2003-357

For additional technical details and current licensing availability, please contact the following UCLA office:

UCLA Office of Intellectual Property
11000 Kinross Avenue, Suite #200
Los Angeles, CA 90095-7231
Tel: 310-794-0558 Fax: 310-794-0638
email: ncd@research.ucla.edu
NCD URL:   http://www.research.ucla.edu/tech/ucla03-357.htm

Lead Inventor: Wesley Chu

Copyright © 2003 The Regents of the University of California.