Automatic identification of high impact articles in PubMed to support clinical decision making

Jiantao Bian; Mohammad Amin Morid; Siddhartha Jonnalagadda; Gang Luo; Guilherme Del Fiol

doi:10.1016/j.jbi.2017.07.015

J Biomed Inform. 2017 Sep:73:95-103. doi: 10.1016/j.jbi.2017.07.015. Epub 2017 Jul 26.

Authors

Jiantao Bian¹, Mohammad Amin Morid², Siddhartha Jonnalagadda³, Gang Luo⁴, Guilherme Del Fiol⁵

Affiliations

¹ Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
² Department of Operations and Information Systems, David Eccles School of Business, University of Utah, Salt Lake City, UT, USA.
³ Microsoft Corporation, One Microsoft Way, Redmond, WA, USA.
⁴ Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA.
⁵ Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA. Electronic address: guilherme.delfiol@utah.edu.

Abstract

Objectives: The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist for clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®.

Methods: Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 evidence-based clinical guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms both a state-of-the-art classifier based on citation metadata and citation terms and PubMed®'s relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count.

Results: The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed®'s relevance sort (p=0.009). The performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085).
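
The "mean top 20 precision" reported above can be illustrated with a minimal sketch. It assumes that each query (for example, one disease topic) yields a ranked list of PubMed IDs and that the gold standard is a set of IDs per query. The function names, the data layout, and the choice to divide by k even when fewer than k articles are returned are illustrative assumptions, not details taken from the study.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked articles that appear in the gold standard.

    Assumption: the denominator is always k, so a ranking that returns
    fewer than k articles is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for article_id in ranked_ids[:k] if article_id in relevant_ids)
    return hits / k


def mean_precision_at_k(
    rankings: Dict[str, Sequence[str]],
    gold: Dict[str, Set[str]],
    k: int = 20,
) -> float:
    """Average precision_at_k over all queries in `rankings`."""
    if not rankings:
        raise ValueError("at least one query is required")
    scores = [
        precision_at_k(ranked, gold.get(query, set()), k)
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)
```

Under these assumptions, a classifier whose top 20 results for a topic include 7 guideline-cited studies would score 0.35 for that topic, and the reported figure would be the average of such scores across topics.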

Conclusion: The high impact classifier, using features such as bibliometrics, social media attention, and MEDLINE® metadata, outperformed previous approaches and is a promising approach to identifying high impact studies for clinical decision support.

MeSH terms

Algorithms
Bibliometrics*
Clinical Decision-Making*
Evidence-Based Medicine*
Humans
Information Storage and Retrieval
MEDLINE
Machine Learning*
Metadata
PubMed*
Social Media
