Using geographical random forest models to explore spatial patterns in the neighborhood determinants of hypertension prevalence across Chicago, Illinois, USA

Abstract
In the United States, the rise in hypertension prevalence has been connected to neighborhood characteristics. While various studies have found a link between neighborhood and health, they do not evaluate the relative importance of each neighborhood factor to the rise in hypertension or, more significantly, how that importance varies geographically (i.e., across different neighborhoods). This study ranks the contribution of ten socioeconomic neighborhood factors to hypertension prevalence in Chicago, Illinois, using multiple global and local machine learning models at the census tract level. First, we use Geographical Random Forest (GRF), a recently proposed non-linear machine learning regression method, to assess each predictive factor's spatial variation and contribution to hypertension prevalence. Then we compare the performance of GRF with that of Geographically Weighted Regression (a local model), Random Forest (a global model), and ordinary least squares (OLS) regression (a global model). The results indicate that GRF outperforms all the other models and that the importance of variables varies by census tract. Across the Chicago tracts, household composition is the most important factor, whereas housing type and transportation is the least important. Household composition is the most important determinant in tracts near northern Lake Michigan, whereas in Chicago's mid-north the socioeconomic condition of the neighborhood has the greatest influence on hypertension prevalence. Understanding how the importance of socioeconomic factors associated with hypertension prevalence varies spatially aids in the design and implementation of health policies based on the most critical factors identified at the local (i.e., tract) level, rather than relying on broad city-level guidelines (i.e., guidelines for the whole of Chicago or other large cities).
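To make the idea of a local, spatially varying variable importance concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes that a geographical random forest can be approximated by fitting one small random forest per location on that location's k nearest neighbours, and it swaps in scikit-learn's impurity-based importances as the local importance measure. The function name `local_forest_importances`, its parameters, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors


def local_forest_importances(coords, X, y, n_neighbors=30, n_trees=50, seed=0):
    """Fit one random forest per location on its nearest neighbours.

    Hypothetical helper, not the study's code. Returns an array of shape
    (n_locations, n_features) holding impurity-based feature importances.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Adaptive neighbourhood: the k nearest locations (the location itself included).
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(coords)
    _, neighbor_idx = nn.kneighbors(coords)

    importances = np.empty((X.shape[0], X.shape[1]))
    for i, idx in enumerate(neighbor_idx):
        # Local model trained only on observations near location i.
        forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        forest.fit(X[idx], y[idx])
        importances[i] = forest.feature_importances_
    return importances


# Synthetic example (assumed data): feature 0 drives the outcome in the west,
# feature 1 drives it in the east, and feature 2 is pure noise.
rng = np.random.default_rng(0)
n = 120
coords = rng.uniform(0, 1, size=(n, 2))
X = rng.normal(size=(n, 3))
west = coords[:, 0] < 0.5
y = np.where(west, 3.0 * X[:, 0], 3.0 * X[:, 1]) + rng.normal(scale=0.1, size=n)

local_imp = local_forest_importances(coords, X, y, n_neighbors=30, n_trees=30)

# Index of the most important feature at each location; this is the kind of
# quantity that could be mapped per census tract.
top_feature = local_imp.argmax(axis=1)
```

In this sketch a single global random forest would report one importance ranking for the whole area, whereas the per-location forests can recover that the dominant predictor differs between the two halves of the synthetic region.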