Modeling approaches and performance for estimating personal exposure to household air pollution: A case study in Kenya

Abstract 

This study assessed the performance of modeling approaches to estimate personal exposure in Kenyan homes where cooking fuel combustion contributes substantially to household air pollution (HAP). We measured emissions (PM2.5, black carbon, CO); household air pollution (PM2.5, CO); personal exposure (PM2.5, CO); stove use; and behavioral, socioeconomic, and household environmental characteristics (e.g., ventilation and kitchen volume). We then applied three classes of modeling approaches: a single-zone model; indirect exposure models, which combine person-location and area-level measurements; and predictive statistical models, including standard linear regression and ensemble machine learning approaches, based on predictors such as fuel type and room volume. The single-zone model was reasonably well correlated with measured kitchen concentrations of PM2.5 (R² = 0.45) and CO (R² = 0.45), but lacked precision. The best-performing regression model used a combination of survey-based data and physical measurements and achieved an R² of 0.76 and a root mean-squared error (RMSE) of 85 µg/m³, whereas the survey-only regression model predicted PM2.5 exposures with an R² of 0.51. Of the machine learning algorithms evaluated, extreme gradient boosting performed best, with an R² of 0.57 and an RMSE of 98 µg/m³.
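
To make the terms above concrete, the following is a minimal sketch of a generic well-mixed single-zone (box) model and of the two fit metrics reported, R² and RMSE. It is not the study's implementation: the function names, the constant-emission assumption, and every numeric input are hypothetical and chosen only for illustration, and the R² shown is the coefficient of determination, which may differ from the definition the authors used.

```python
import math


def single_zone_concentration(emission_rate_ug_per_h, volume_m3,
                              air_exchange_per_h, hours, c0_ug_m3=0.0):
    """Concentration (ug/m3) in a well-mixed room with a constant source.

    Solves dC/dt = E/V - a*C, giving
    C(t) = C_ss + (C0 - C_ss) * exp(-a*t), with C_ss = E / (a*V).
    All parameter names and units are assumptions of this sketch.
    """
    c_ss = emission_rate_ug_per_h / (air_exchange_per_h * volume_m3)
    return c_ss + (c0_ug_m3 - c_ss) * math.exp(-air_exchange_per_h * hours)


def rmse(measured, predicted):
    """Root mean-squared error between paired measurements and predictions."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical kitchen: 30 m3, 10 air changes per hour, 30,000 ug/h source.
    for t in (0.0, 0.1, 0.5, 2.0):
        c = single_zone_concentration(30000.0, 30.0, 10.0, t)
        print(f"t = {t:.1f} h: {c:.1f} ug/m3")
```

In this sketch the concentration rises from the initial value toward the steady state E/(a·V); comparing such modeled values with measured concentrations using `r_squared` and `rmse` mirrors the kind of evaluation summarized in the abstract.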