The database makes the poison: How the selection of datasets in QSAR models impacts toxicant prediction of higher tier endpoints

Regul Toxicol Pharmacol. 2024 Aug:151:105663. doi: 10.1016/j.yrtph.2024.105663. Epub 2024 Jun 12.

Abstract

As the United States and the European Union continue their steady march towards the acceptance of new approach methodologies (NAMs), we need to ensure that the available tools are fit for purpose. Critics will be well-positioned to caution against NAMs acceptance and adoption if the tools turn out to be inadequate. In this paper, we focus on Quantitative Structure-Activity Relationship (QSAR) models and highlight how the training database affects the quality and performance of these models. Our analysis goes as far as asking, "Are the endpoints extracted from the experimental studies in the database trustworthy, or are they false negatives/positives themselves?" We also discuss the impacts of chemistry on QSAR models, including issues with 2-D structure analyses when dealing with isomers, metabolism, and toxicokinetics. We close our analysis with a discussion of challenges associated with translational toxicology, specifically the lack of adverse outcome pathways/adverse outcome pathway networks (AOPs/AOPNs) for many higher tier endpoints. We recognize that it takes a collaborative effort to build better and higher quality QSAR models, especially for higher tier toxicological endpoints. Hence, it is critical to bring toxicologists, statisticians, and machine learning specialists together to discuss and solve these challenges and obtain relevant predictions.

MeSH terms

  • Adverse Outcome Pathways
  • Animals
  • Databases, Factual*
  • Endpoint Determination
  • Humans
  • Quantitative Structure-Activity Relationship*
  • Toxicology / methods