Bridging big data in the ENIGMA consortium to combine non-equivalent cognitive measures

Sci Rep. 2024 Oct 16;14(1):24289. doi: 10.1038/s41598-024-72968-x.

Abstract

Investigators in neuroscience have turned to Big Data to address replication and reliability issues by increasing sample size. These efforts raise new questions about how to integrate data across distinct sources and instruments. The goal of this study was to link scores across common auditory verbal learning tasks (AVLTs). This international secondary analysis aggregated multisite raw AVLT data from 53 studies totaling 10,505 individuals. Using the ComBat-GAM algorithm, we isolated and removed the component of memory scores associated with site effects while preserving instrument effects. After adjustment, a continuous item response theory model used multiple memory items of varying difficulty to estimate each individual's latent verbal learning ability on a single scale. Equivalent raw scores across AVLTs were then found by linking individuals through this ability scale. Harmonization reduced total cross-site score variance by 37% while preserving meaningful memory effects. Age had the largest overall impact on scores (-11.4%), while the race/ethnicity variable was not significant (p > 0.05). The resulting tools were validated on dually administered tests. The conversion tool is available online so that researchers and clinicians can convert memory scores across instruments. This work demonstrates that global harmonization initiatives can address reproducibility challenges across the behavioral sciences.
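The idea of finding equivalent raw scores by linking through a shared ability scale can be illustrated with a minimal Python sketch. It assumes that each test's expected raw score is a single increasing logistic curve of latent ability θ, with made-up parameters (maximum score, slope `a`, location `b`). The study's continuous IRT model and fitted parameters are not reproduced here, and `expected_score`, `ability_from_score`, and `link_score` are hypothetical names. A raw score on test A is mapped to θ by inverting A's curve, and θ is then mapped to the expected raw score on test B.

```python
import math

def expected_score(theta, max_score, a, b):
    """Hypothetical test curve: expected raw score at latent ability theta."""
    return max_score / (1.0 + math.exp(-a * (theta - b)))

def ability_from_score(score, max_score, a, b, lo=-10.0, hi=10.0, tol=1e-9):
    """Invert the increasing test curve by bisection to find theta for a raw score."""
    if not 0.0 < score < max_score:
        raise ValueError("score must lie strictly between 0 and max_score")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, max_score, a, b) < score:
            lo = mid  # curve is still below the target score, so theta is larger
        else:
            hi = mid
    return (lo + hi) / 2.0

def link_score(score_a, test_a, test_b):
    """Map a raw score on test A to the equivalent expected raw score on test B."""
    theta = ability_from_score(score_a, *test_a)
    return expected_score(theta, *test_b)

# Hypothetical (max_score, a, b) parameters, for illustration only.
TEST_A = (75, 0.9, 0.0)
TEST_B = (80, 1.1, 0.3)

print(round(link_score(37.5, TEST_A, TEST_B), 2))
```

Bisection is used instead of the closed-form logistic inverse so that the same routine would also work for any monotone expected-score curve, such as a sum of item curves.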

Keywords: Harmonization; Item response theory; Mega analysis; Traumatic brain injury; Verbal learning.

MeSH terms

  • Adult
  • Aged
  • Big Data*
  • Cognition* / physiology
  • Female
  • Humans
  • Male
  • Memory / physiology
  • Middle Aged
  • Neuropsychological Tests
  • Reproducibility of Results
  • Verbal Learning / physiology
  • Young Adult
