Learning clinical networks from medical records based on information estimates in mixed-type data

PLoS Comput Biol. 2020 May 18;16(5):e1007866. doi: 10.1371/journal.pcbi.1007866. eCollection 2020 May.

Abstract

The precise diagnosis of complex diseases requires integrating a large amount of information from heterogeneous clinical and biomedical data, whose direct and indirect interdependences are notoriously difficult to assess. To this end, we propose an efficient computational approach to simultaneously compute and assess the significance of multivariate information between any combination of mixed-type (continuous/categorical) variables. The method is then used to uncover direct, indirect and possibly causal relationships between mixed-type data from medical records, by extending a recent machine learning method to reconstruct graphical models beyond simple categorical datasets. The method is shown to outperform existing tools on benchmark mixed-type datasets, before being applied to analyze the medical records of elderly patients with cognitive disorders from La Pitié-Salpêtrière Hospital, Paris. The resulting clinical network visually captures the global interdependences in these medical records and some facets of clinical diagnosis practice, without any specific hypothesis or prior knowledge of clinically relevant information. In particular, it provides some physiological insights linking the consequences of cerebrovascular accidents to the atrophy of important brain structures associated with cognitive impairment.
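To make the notion of information between mixed-type variables concrete, the following is a minimal sketch that estimates the mutual information between one continuous and one categorical variable by first discretizing the continuous one. It is not the estimator described in the paper: the equal-frequency binning, the fixed number of bins, the plug-in estimate, the function names (`equal_frequency_bins`, `mutual_information`) and the synthetic data are all assumptions made for illustration, and no significance assessment is attempted.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Map each continuous value to a bin index so bins hold roughly equal counts.

    Assumption for illustration: ties are split by rank, not kept together.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # I(X;Y) = sum over (x, y) of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Synthetic example: a categorical label and a continuous value that depends on it.
random.seed(0)
labels = [random.choice(["a", "b"]) for _ in range(400)]
values = [random.gauss(0.0 if lab == "a" else 3.0, 1.0) for lab in labels]

binned = equal_frequency_bins(values, n_bins=4)
mi_dependent = mutual_information(binned, labels)

# Shuffling the labels breaks the dependence, so the estimate should drop.
shuffled = labels[:]
random.shuffle(shuffled)
mi_independent = mutual_information(binned, shuffled)
```

On this synthetic data the estimate for the dependent pair should be clearly larger than for the shuffled pair. A plug-in estimate with an arbitrary bin count is biased on finite samples, which is one reason a method of the kind described above must also assess the significance of the information it computes.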

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Datasets as Topic
  • Humans
  • Learning*
  • Machine Learning
  • Medical Records*
  • Paris

Grants and funding

HI received funding from the IRIS data science program of PSL University, the DIM program of Region Ile-de-France and Labex CelTisPhyBio. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.