Immune-Based Machine Learning Prediction of Diagnosis and Illness State in Schizophrenia and Bipolar Disorder: How Data Bias and Overfitting Were Avoided

Brain Behav Immun. 2024 Dec 17:125:33-34. doi: 10.1016/j.bbi.2024.11.037. Online ahead of print.

Abstract

In a letter critiquing our manuscript, Takefuji highlights general pitfalls in machine learning without directly engaging with our study. The comments offer generic advice rather than a specific critique of our methods or findings. Although they raise important topics, the concerns reflect standard risks in machine learning that we were aware of and explicitly addressed in our analyses. To mitigate overfitting, class imbalance, and potential biases, we applied rigorous methods, including nested cross-validation, stratified sampling, and comprehensive performance metrics. Traditional statistical methods, such as ANCOVA and Spearman correlations, supplemented our machine learning analyses to validate the findings. Concerns about collinearity, causality, and data preprocessing were acknowledged and addressed, as detailed in the manuscript and supplementary materials. Although the critique underscores critical issues in machine learning, it does not identify specific missteps in our study. We conclude that our analyses align with best practices and sufficiently address the potential pitfalls discussed in the commentary.
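For readers less familiar with the safeguards named above, the following is a minimal, stdlib-only Python sketch of how nested cross-validation, stratified folds, and an imbalance-robust metric (balanced accuracy) fit together. Hyperparameters are chosen only inside inner folds built from the outer-training data, so the outer test fold never influences model selection. The k-nearest-neighbours classifier, the hyperparameter grid, the fold counts, the function names, and the toy data are all hypothetical illustration choices; they are not the models, features, or settings used in the study.

```python
# Illustrative sketch only: classifier, grid, fold counts and data are hypothetical,
# not the study's actual pipeline.
import random
from statistics import mean


def stratified_folds(labels, k, seed=0):
    """Split indices 0..len(labels)-1 into k folds that keep class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets a similar share of it.
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    # Ties are broken deterministically in favour of the smallest label.
    return max(sorted(set(votes)), key=votes.count)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, which is not inflated by a majority class."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)


def fit_and_score(X, y, train_idx, test_idx, k):
    """Train k-NN on train_idx and return balanced accuracy on test_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    preds = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
    return balanced_accuracy([y[i] for i in test_idx], preds)


def nested_cv(X, y, grid=(1, 3, 5), outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate performance; inner folds pick k using training data only."""
    outer_scores, chosen = [], []
    for test_idx in stratified_folds(y, outer_k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Inner folds are built from the outer-training samples only.
        inner_folds = stratified_folds([y[i] for i in train_idx], inner_k, seed + 1)

        def inner_score(k):
            scores = []
            for val_pos in inner_folds:
                val_set = set(val_pos)
                fit = [train_idx[p] for p in range(len(train_idx)) if p not in val_set]
                val = [train_idx[p] for p in val_pos]
                scores.append(fit_and_score(X, y, fit, val, k))
            return mean(scores)

        best_k = max(grid, key=inner_score)  # selected without seeing test_idx
        chosen.append(best_k)
        outer_scores.append(fit_and_score(X, y, train_idx, test_idx, best_k))
    return mean(outer_scores), chosen


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, imbalanced toy data: 40 samples of class 0, 20 of class 1.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(20)]
    y = [0] * 40 + [1] * 20
    score, ks = nested_cv(X, y)
    print(f"nested-CV balanced accuracy: {score:.2f}, chosen k per outer fold: {ks}")
```

The averaged outer-fold score is the performance estimate; the per-fold choices of `k` may differ, which is expected, because each outer fold repeats the whole selection procedure independently. Balanced accuracy is used here so that a model predicting only the majority class scores 0.5 rather than the raw majority proportion.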

Keywords: Bipolar disorder; Machine learning; Nested cross-validation; Overfitting; Schizophrenia.

Publication types

  • Letter