Pollen allergies adversely affect health. Information about airborne pollen concentrations can improve symptom management by guiding decisions about the timing of medication and about pollen exposure. Observations provide accurate pollen concentrations at point locations. However, in the contiguous United States and southern Canada (CUSSC), observations are sparse, and sampling is often seasonal, intermittent, or both. Modeling pollen concentration can fill these gaps with estimates where direct observations are unavailable, and can also provide much-needed forecasts. The goal of this study is to develop and evaluate statistical models that predict daily pollen concentrations using the Random Forest machine learning algorithm. To evaluate our methods, we made retrospective forecasts of four pollen types (Quercus, Cupressaceae, Ambrosia and Poaceae), each in one of four CUSSC locations. Meteorological and vegetation conditions were input to the models at city and regional scales. A data augmentation technique was investigated and found to improve model skill. Models were also developed to forecast pollen in locations where there are no observations. Forecast skill in these models was found to be greater than in previous models. Nevertheless, the skill is limited by the spatiotemporal resolution of the pollen observations.
Keywords: Allergy; Hay fever; Machine learning; Prediction; Weather.
Copyright © 2021 Elsevier B.V. All rights reserved.