Building a framework for fake news detection in the health domain

Juan R Martinez-Rico; Lourdes Araujo; Juan Martinez-Romo

doi:10.1371/journal.pone.0305362

PLoS One. 2024 Jul 8;19(7):e0305362. doi: 10.1371/journal.pone.0305362. eCollection 2024.

Authors

Juan R Martinez-Rico¹, Lourdes Araujo¹ ², Juan Martinez-Romo¹ ²

Affiliations

¹ NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain.
² Instituto Mixto de Investigación - Escuela Nacional de Sanidad (IMIENS), Madrid, Spain.

Abstract

Disinformation in the medical field is a growing problem that carries a significant risk. Therefore, it is crucial to detect and combat it effectively. In this article, we provide three elements to aid in this fight: 1) a new framework that collects health-related articles from verification entities and facilitates their check-worthiness and fact-checking annotation at the sentence level; 2) a corpus generated using this framework, composed of 10335 sentences annotated in these two concepts and grouped into 327 articles, which we call KEANE (faKe nEws At seNtence lEvel); and 3) a new model for verifying fake news that combines specific identifiers of the medical domain with triplets subject-predicate-object, using Transformers and feedforward neural networks at the sentence level. This model predicts the fact-checking of sentences and evaluates the veracity of the entire article. After training this model on our corpus, we achieved remarkable results in the binary classification of sentences (check-worthiness F1: 0.749, fact-checking F1: 0.698) and in the final classification of complete articles (F1: 0.703). We also tested its performance against another public dataset and found that it performed better than most systems evaluated on that dataset. Moreover, the corpus we provide differs from other existing corpora in its duality of sentence-article annotation, which can provide an additional level of justification of the prediction of truth or untruth made by the model.

Copyright: © 2024 Martinez-Rico et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Deception
Disinformation*
Humans
Natural Language Processing
Neural Networks, Computer

Grants and funding

INITIALS: LAS, JMR; GRANT NUMBER: PID2019-106942RB-C32; FUNDER: Spanish Ministry of Science and Innovation; URL FUNDER: https://www.ciencia.gob.es/
INITIALS: JMR, LAS; GRANT NUMBER: TED2021-130398B-C21; FUNDER: Spanish Ministry of Science and Innovation; URL FUNDER: https://www.ciencia.gob.es/
INITIALS: JMR, LAS; GRANT NUMBER: PID2022-136522OB-C21; FUNDER: Spanish Ministry of Science and Innovation; URL FUNDER: https://www.ciencia.gob.es/
INITIALS: LAS; GRANT NUMBER: RAICES (IMIENS 2022); FUNDER: IMIENS (Instituto Mixto de Investigación-Escuela Nacional de Sanidad); URL FUNDER: https://www.imiens.es/