A Prediction Model for Uncontrolled Type 2 Diabetes Mellitus Incorporating Area-level Social Determinants of Health

Sanjay Basu; Rajiv Narayanaswamy

doi:10.1097/MLR.0000000000001147

Med Care. 2019 Aug;57(8):592-600. doi: 10.1097/MLR.0000000000001147.

Authors

Sanjay Basu¹ ² ³, Rajiv Narayanaswamy⁴

Affiliations

¹ Research and Analytics, Collective Health, San Francisco, CA.
² Center for Primary Care, Harvard Medical School, Boston, MA.
³ School of Public Health, Imperial College, London, UK.
⁴ KPMG LLP, San Francisco, CA.

PMID: 31268954
DOI: 10.1097/MLR.0000000000001147

Abstract

Background: Social determinants of health (SDH) at the area level are understood to influence the likelihood of having poor glycemic control for patients with type 2 diabetes mellitus (T2DM).

Objectives: To develop a model for predicting whether a person with T2DM has uncontrolled diabetes (hemoglobin A1c ≥9%), incorporating individual and area-level (census tract) covariates.

Research design: Development and validation of machine learning models.

Subjects: A total of N=1,015,808 privately insured persons with T2DM, identified in claims data.

Measures: C-statistic, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy.
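As a rough illustration of how the six measures listed above are conventionally defined for a binary outcome (here, uncontrolled versus controlled diabetes), the following minimal sketch computes them in plain Python. It is not the authors' code: the function names, the 0.5 probability cutoff, and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, Sequence


def classification_measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, and accuracy from binary labels (1 = uncontrolled)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among actual positives
        "specificity": ratio(tn, tn + fp),  # true negatives among actual negatives
        "ppv": ratio(tp, tp + fp),          # true positives among predicted positives
        "npv": ratio(tn, tn + fn),          # true negatives among predicted negatives
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a random negative case.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg) and is meant
    only as a readable sketch, not for data sets of a million records.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = A1c >= 9%, scores are predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 cutoff

print(classification_measures(toy_labels, toy_preds))
print(c_statistic(toy_labels, toy_scores))
```

The C-statistic depends only on how the predicted scores rank cases, whereas the other five measures depend on the chosen cutoff, so a model can rank well and still show low sensitivity at a given threshold.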

Results: A standard logistic regression model selecting among the available individual-level covariates and area-level SDH covariates (at the census tract level) performed poorly, with a C-statistic of 0.685, sensitivity of 25.6%, specificity of 90.1%, positive predictive value of 56.9%, negative predictive value of 70.4%, and accuracy of 68.4% on a 25% held-out validation subset of the data. By contrast, machine learning models improved upon risk prediction, with the highest performance from a random forest algorithm with a C-statistic of 0.928, sensitivity of 68.5%, specificity of 94.6%, positive predictive value of 69.8%, negative predictive value of 94.3%, and accuracy of 90.6%. SDH variables alone explained 16.9% of variation in uncontrolled diabetes.

Conclusions: A predictive model developed through a machine learning approach may assist health care organizations to identify which area-level SDH data to monitor for prediction of diabetes control, for potential use in risk-adjustment and targeting.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Aged
Diabetes Mellitus, Type 2 / epidemiology*
Diabetes Mellitus, Type 2 / therapy
Female
Glycated Hemoglobin / analysis
Humans
Logistic Models
Machine Learning
Male
Middle Aged
Models, Statistical
Risk Assessment
Risk Factors
Social Determinants of Health / statistics & numerical data*

Substances

Glycated Hemoglobin A
hemoglobin A1c protein, human

Grants and funding

R21 MD012867/MD/NIMHD NIH HHS/United States