Imitating manual curation of text-mined facts in biomedicine

PLoS Comput Biol. 2006 Sep 8;2(9):e118. doi: 10.1371/journal.pcbi.0020118. Epub 2006 Jul 27.

Abstract

Text-mining algorithms make mistakes when extracting facts from natural-language texts. In biomedical applications that rely on text-mined data, it is therefore critical to assess the quality of each individual fact (the probability that it was extracted correctly) in order to resolve conflicts and inconsistencies in the data. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once), we implemented and tested a collection of algorithms that mimic human evaluation of facts produced by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score, the area under the receiver operating characteristic curve, close to 0.95). We hypothesize that, given evaluations of each sentence by a larger number of human experts, we could implement an artificial-intelligence curator that performs the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
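
To make the approach concrete, below is a minimal sketch, not the authors' implementation, of training a classifier to imitate human curation verdicts on text-mined facts and scoring it by the area under the ROC curve. The feature names, the synthetic data, and the choice of logistic regression are all illustrative assumptions; in practice, each extracted fact would be described by features derived from its source sentence and the extraction process.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical per-fact features, e.g. sentence length, parser
# confidence, distance between entity mentions, negation flag.
n_facts = 1000
X = rng.normal(size=(n_facts, 4))

# Hypothetical binary labels: 1 = a human evaluator accepted the
# extracted fact, 0 = rejected it. Here the labels are synthesized
# so that they correlate with the first two features.
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_facts)
y = (logits > 0).astype(int)

# Train a classifier to predict the human verdict and obtain
# out-of-fold probability estimates via 5-fold cross-validation.
clf = LogisticRegression(max_iter=1000)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Score how well the automated curator reproduces human evaluations.
print(f"ROC AUC: {roc_auc_score(y, proba):.3f}")
```

With informative features, an AUC near 1.0 would indicate that the automated curator ranks correct extractions above incorrect ones almost as reliably as the human evaluators whose labels it was trained on.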

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Abstracting and Indexing / methods*
  • Algorithms
  • Artificial Intelligence*
  • Automation
  • Biomedical Research*
  • Cocaine
  • Computational Biology
  • Computer Simulation
  • Sensitivity and Specificity

Substances

  • Cocaine