Syndromic surveillance models using Web data: the case of scarlet fever in the UK

Inform Health Soc Care. 2012 Mar;37(2):106-24. doi: 10.3109/17538157.2011.647934.

Abstract

Recent research has shown the potential of Web queries as a source for syndromic surveillance. Existing studies show that these queries can serve as a basis for estimating and predicting the development of a syndromic disease, such as influenza, using log-linear (logit) statistical models. In this paper, two alternative models are applied to the relationship between cases and Web queries. We examine the applicability of statistical methods for relating search engine queries to scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google and using an alternative statistical method based on gamma distributions. The results show that, with logit models, the Pearson correlation coefficient between Web queries and the data obtained from official agencies must exceed 0.90; otherwise, the predicted peak and spread of the distributions deviate significantly. We describe the gamma distribution model and show that gamma transformations give better results in all cases, especially in those with a smaller correlation coefficient.
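
As an illustration of the two quantities the abstract refers to, the following minimal Python sketch computes a Pearson correlation coefficient between a query series and a case series, and summarises the peak and spread of a case curve with a gamma shape and scale. It is not the paper's procedure: the abstract does not say how the gamma model is fitted, so the method-of-moments fit used here is an assumption, and the function names (pearson, gamma_moments, gamma_mode) and the weekly figures are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def gamma_moments(weeks, counts):
    """Method-of-moments gamma fit (an assumption, not the paper's method).

    Treats the counts as weights on the week index and returns
    (shape, scale) such that shape * scale equals the weighted mean week
    and shape * scale**2 equals the weighted variance.
    """
    total = sum(counts)
    mean = sum(w * c for w, c in zip(weeks, counts)) / total
    var = sum(c * (w - mean) ** 2 for w, c in zip(weeks, counts)) / total
    if mean <= 0 or var <= 0:
        raise ValueError("need a positive mean and variance")
    return mean ** 2 / var, var / mean


def gamma_mode(shape, scale):
    """Location of the peak of a gamma density with the given parameters."""
    return (shape - 1) * scale if shape >= 1 else 0.0


if __name__ == "__main__":
    # Hypothetical weekly data, for illustration only.
    weeks = [1, 2, 3, 4, 5, 6, 7, 8]
    queries = [12, 20, 35, 50, 42, 30, 18, 10]
    cases = [5, 11, 19, 27, 25, 16, 9, 4]

    r = pearson(queries, cases)
    shape, scale = gamma_moments(weeks, cases)
    print("Pearson r:", round(r, 3))
    print("gamma shape, scale:", round(shape, 3), round(scale, 3))
    print("estimated peak week:", round(gamma_mode(shape, scale), 2))
```

In this sketch the gamma mode stands in for the peak of the outbreak and the scale and shape together describe its spread; comparing those values for the query series and the case series is one way to see how far the two curves diverge when the correlation is low.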

MeSH terms

  • Consumer Health Information / statistics & numerical data
  • Disease Outbreaks
  • Humans
  • Influenza, Human / epidemiology*
  • Internet*
  • Models, Statistical
  • Population Surveillance / methods*
  • Regression Analysis
  • Scarlet Fever / epidemiology*
  • Search Engine / statistics & numerical data*
  • Syndrome
  • Time Factors
  • United Kingdom