Geostatistical prediction of water lead levels in Flint, Michigan: A multivariate approach

Sci Total Environ. 2019 Jan 10:647:1294-1304. doi: 10.1016/j.scitotenv.2018.07.459. Epub 2018 Aug 1.

Abstract

Despite several environmental crises, little research has been conducted on citywide geospatial modeling of water lead levels (WLL) in public distribution systems. This paper presents the first application of multivariate geostatistics to lead in drinking water within a distribution system, specifically in Flint, Michigan. A key feature of the Flint data is their collection through two different sampling initiatives: (i) voluntary or homeowner-driven sampling, whereby concerned citizens decided to acquire a testing kit and conduct sampling on their own (10,717 sites), and (ii) State-administered sampling, whereby data were collected bi-weekly at 809 selected sites (sentinel sites) after residents had been trained by technical teams. The two datasets were first averaged over the 41-week sampling period within each tax parcel to attenuate sampling fluctuations, which created a set of 420 tax parcels sampled under both protocols. The two variables had a correlation of 0.62, and their direct and cross-semivariograms showed a substantial nugget effect and a long range of 7.5 km. WLLs recorded at sentinel sites, which city officials deemed more reliable, were then interpolated using cokriging to account for the more densely sampled voluntary data and for information on service line composition (lead, other, or unknown) available for each of 51,045 residential tax parcels. Cross-validation demonstrated the greater prediction accuracy of the multivariate geostatistical approach relative to kriging and inverse square distance weighting interpolation using only sentinel data. This general procedure is applicable to other cities with aging infrastructure where lead in drinking water is a concern.
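As an illustration of the simplest baseline named above, the following minimal Python sketch implements inverse square distance weighting and scores it with leave-one-out cross-validation. It is not the authors' code: the function names (`idw_predict`, `loo_mean_abs_error`), the use of mean absolute error as the accuracy score, and the four toy sites and values are assumptions made for illustration only, and no Flint data are involved. Kriging and cokriging would replace the fixed inverse-distance weights with weights derived from the fitted semivariogram models.

```python
import math


def idw_predict(target, sites, values, power=2.0):
    """Predict a value at `target` (x, y) as a distance-weighted average.

    Weights are 1 / distance**power; power=2 gives inverse square
    distance weighting. Hypothetical helper, not from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), z in zip(sites, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # Target coincides with a sampled site: return its value.
            return z
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total


def loo_mean_abs_error(sites, values, power=2.0):
    """Leave-one-out cross-validation: drop each site in turn, predict it
    from the remaining sites, and average the absolute errors."""
    errors = []
    for i in range(len(sites)):
        other_sites = sites[:i] + sites[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predicted = idw_predict(sites[i], other_sites, other_values, power)
        errors.append(abs(predicted - values[i]))
    return sum(errors) / len(errors)


# Made-up toy data: four sites on a unit square with arbitrary values.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [2.0, 4.0, 6.0, 8.0]

center = idw_predict((0.5, 0.5), sites, values)  # equidistant from all sites
mae = loo_mean_abs_error(sites, values)
print(center, mae)
```

The same leave-one-out loop can wrap any interpolator, so competing methods can be compared on identical held-out sites.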

Keywords: Cokriging; Cross-validation; Cross-variogram; Inverse square distance weighting; Voluntary sampling.