Future-oriented tweets predict lower county-level HIV prevalence in the United States

Health Psychol. 2015 Dec:34S:1252-60. doi: 10.1037/hea0000279.

Abstract

Objective: Future orientation promotes health and well-being at the individual level. Computerized text analysis of a dataset encompassing billions of words used across the United States on Twitter tested whether community-level rates of future-oriented messages correlated with lower human immunodeficiency virus (HIV) rates and moderated the association between behavioral risk indicators and HIV.

Method: Over 150 million tweets mapped to U.S. counties were analyzed using 2 methods of text analysis. First, county-level HIV rates (cases per 100,000) were regressed on aggregate usage of future-oriented language (e.g., will, gonna). A second data-driven method regressed HIV rates on individual words and phrases.
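The first method can be illustrated with a minimal sketch. It assumes tweets are already mapped to counties and uses a toy word list: only "will" and "gonna" come from the abstract, and "shall" is an added assumption. The function names are hypothetical, and a one-predictor least-squares fit stands in for the study's regression, which is not specified here.

```python
import re
from collections import defaultdict

# "will" and "gonna" are the abstract's examples; "shall" is an illustrative assumption.
FUTURE_WORDS = {"will", "gonna", "shall"}


def future_rate_by_county(tweets):
    """Return {county: future-oriented words / total words}.

    `tweets` is an iterable of (county_id, text) pairs (hypothetical input format).
    """
    future = defaultdict(int)
    total = defaultdict(int)
    for county, text in tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        total[county] += len(tokens)
        future[county] += sum(1 for t in tokens if t in FUTURE_WORDS)
    return {c: future[c] / total[c] for c in total if total[c] > 0}


def ols_fit(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under these assumptions, the per-county rates would serve as `x` and HIV cases per 100,000 as `y`; a negative slope would correspond to the direction of association the abstract reports.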

Results: Counties with higher rates of future tense on Twitter had fewer HIV cases, independent of strong structural predictors of HIV such as population density. Future-oriented messages also appeared to buffer health risk: Sexually transmitted infection rates and references to risky behavior on Twitter were associated with higher HIV prevalence in all counties except those with high rates of future orientation. Data-driven analyses likewise showed that words and phrases referencing the future (e.g., tomorrow, would be) correlated with lower HIV prevalence.
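The buffering pattern can be illustrated with a simplified sketch. It splits counties at the median future-orientation rate and compares the slope of HIV rate on a risk indicator in each half. This median split is a stand-in chosen for brevity, not the moderation analysis used in the study, and the row layout and function names are hypothetical.

```python
from statistics import median


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


def simple_slopes(rows):
    """Return (slope_low, slope_high) of HIV rate on risk.

    `rows` holds (future_rate, risk, hiv_rate) tuples, one per county
    (hypothetical layout). Counties are split at the median future rate.
    """
    cut = median(r[0] for r in rows)
    low = [r for r in rows if r[0] <= cut]
    high = [r for r in rows if r[0] > cut]
    return (
        slope([r[1] for r in low], [r[2] for r in low]),
        slope([r[1] for r in high], [r[2] for r in high]),
    )
```

A positive slope in the low-future group alongside a near-zero slope in the high-future group would match the pattern described, in which risk indicators track HIV prevalence except where future orientation is high.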

Conclusion: Integrating big data approaches to text analysis and epidemiology with psychological theory may provide an inexpensive, real-time method of anticipating outbreaks of HIV and etiologically similar diseases.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Forecasting
  • HIV Infections / diagnosis
  • HIV Infections / epidemiology*
  • HIV Infections / psychology*
  • Humans
  • Male
  • Prevalence
  • Risk-Taking*
  • Social Media / trends*
  • United States / epidemiology