Disentangling Climatic and Surface-Physical Drivers of the Urban Heat Island Using Explainable AI Across U.S. Cities
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Cities Area
2.2. Spatial Unit, Outcomes, and Predictors
2.3. Preprocessing and Transformations
2.4. Overview of ML Models
2.5. Model Training, Validation, and Comparison
2.6. Model Validation and Performance Assessment
2.7. Explainability (SHAP) and Effect Interpretation
3. Results
3.1. Cross-City Distributions of Tract-Level LST and SUHII
3.2. Spatial Patterns of Heat Drivers (Phoenix as an Illustrative Example)
3.3. Distributional Characteristics and Correlation Structure of Predictors
3.4. Comparative Performance of the Models for SUHII and LST
3.5. SHAP-Based Attribution of Tract-Scale Heat Drivers (SUHII vs. LST)
3.6. Heterogeneity of Driving Factors by City and Climate Zone
4. Discussion
4.1. Separating Absolute Surface Heat (LST) from Within-City Heat Inequality (SUHII)
4.2. Impervious Surface and Moisture Availability as the Dominant Drivers of SUHII
4.3. Solar Radiation Conditions SUHII Magnitude, but Is Not the Clearest Tract-Scale Differentiator
4.4. Nighttime Radiance as a Secondary Proxy for Urban Intensity
4.5. Why Water-Related Metrics Show Limited Explanatory Power
4.6. Implications for Heat Mitigation and Equity-Oriented Planning
4.7. Scope and Limits of Inference
5. Conclusions
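- Generalizable performance: Under city-held-out nested CV, the models transferred well to unseen cities. Tree ensembles achieved the highest R² for both targets (XGBoost, 0.879 for SUHII; Extra Trees, 0.908 for LST), although for LST Elastic Net (0.895) outperformed XGBoost (0.882), so the ensemble advantage over linear and neural-network alternatives was clearer for SUHII than for LST.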
- Drivers of SUHII: Within the fitted models, SUHII is most strongly associated with impervious surface fraction (warmer relative anomalies) and surface moisture availability (cooler relative anomalies), with solar radiation acting as an additional but less spatially differentiating factor.
- Drivers of LST: LST is associated mainly with latitude and long-term mean summer air temperature, while local surface properties act as consistent but secondary modifiers of this climatic baseline.
- Limited role of water-proximity metrics: Distance-to-water and water-area variables contribute relatively little within the fitted models once surface moisture is included, suggesting that static proximity measures do not capture water-related cooling as effectively as moisture-based indicators.
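
The city-held-out validation referred to in the first point can be sketched as follows. This is a minimal illustration, assuming one fold per city (leave-one-city-out); the function name `city_held_out_folds` and the toy city labels are hypothetical and are not taken from the study's code. In a nested design, the same grouping would be reapplied to the training cities of each outer fold to tune hyperparameters.

```python
from collections import defaultdict


def city_held_out_folds(cities):
    """Yield (held_out_city, train_idx, test_idx), holding out one city per fold."""
    by_city = defaultdict(list)
    for i, city in enumerate(cities):
        by_city[city].append(i)
    for held_out in sorted(by_city):
        test_idx = by_city[held_out]
        # All tracts from every other city form the training set.
        train_idx = sorted(
            i for city, idx in by_city.items() if city != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Hypothetical tract-to-city labels (toy data, not the study's tracts).
cities = ["Phoenix", "Boston", "Phoenix", "Miami", "Boston", "Miami"]
folds = list(city_held_out_folds(cities))

# Nested step (assumption): reuse the same grouping on the outer training set.
# Inner indices are positions within the outer training subset.
outer_city, train_idx, test_idx = folds[0]
inner_folds = list(city_held_out_folds([cities[i] for i in train_idx]))
```

Because no tract from the held-out city appears in training, the resulting scores reflect transfer to an unseen city rather than interpolation between neighbouring tracts.
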
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ALAND | Land Area (of census tract) |
| ALBEDO | Broadband Surface Albedo |
| AWATER | Water Area (of census tract) |
| CV | Cross-Validation |
| DNB | Day/Night Band (Visible Infrared Imaging Radiometer Suite) |
| DIST_CITY_CENTER | Distance to City Center |
| DIST_WATER | Distance to Nearest Major Water Body |
| ERA5-Land | ECMWF Reanalysis v5–Land (land-surface reanalysis product) |
| GEE | Google Earth Engine |
| GIS | Geographic Information System |
| K | Kelvin |
| LightGBM | Light Gradient Boosting Machine |
| log1p | Logarithm of (1 + x) transformation |
| LST | Census-Tract Mean Land Surface Temperature (°C) |
| MAE | Mean Absolute Error |
| ML | Machine Learning |
| MLP | Multilayer Perceptron |
| N (e.g., N = 5144) | Sample Size (number of observations) |
| NAIP | National Agriculture Imagery Program |
| NDMI | Normalized Difference Moisture Index |
| NIR | Near-Infrared |
| NLCD | National Land Cover Database |
| QA | Quality Assurance |
| QGIS | Quantum Geographic Information System |
| R² | Coefficient of Determination |
| RF | Random Forest |
| RMSE | Root Mean Square Error |
| SR | Surface Reflectance |
| SVI | Social Vulnerability Index |
| SUHI | Surface Urban Heat Island |
| SUHII | Surface Urban Heat Island Intensity |
| SWIR1 | Shortwave Infrared 1 |
| TIGER | Topologically Integrated Geographic Encoding and Referencing (U.S. Census) |
| UHI | Urban Heat Island |
| USGS | United States Geological Survey |
| VIIRS | Visible Infrared Imaging Radiometer Suite |
| XAI | Explainable Artificial Intelligence |
| XGBoost | Extreme Gradient Boosting |
Appendix A
Appendix A.1
Appendix A.2
References
- Zhao, L.; Oppenheimer, M.; Zhu, Q.; Baldwin, J.W.; Ebi, K.L.; Bou-Zeid, E.; Guan, K.; Liu, X. Interactions between Urban Heat Islands and Heat Waves. Environ. Res. Lett. 2018, 13, 034003.
- Zhao, L.; Fan, X.; Hong, T. Urban Heat Island Effect: Remote Sensing Monitoring and Assessment—Methods, Applications, and Future Directions. Atmosphere 2025, 16, 791.
- Mutani, G.; Scalise, A.; Sufa, X.; Grasso, S. Synergising Machine Learning and Remote Sensing for Urban Heat Island Dynamics: A Comprehensive Modelling Approach. Atmosphere 2024, 15, 1435.
- Kim, Y.; Yoo, C.; Im, J. Nighttime Satellite Land Surface Temperature for Urban Applications: Achievements, Challenges, and Future Prospects. GIScience Remote Sens. 2025, 62, 2527990.
- Hashemi, F.; Adib, M. Examining Thermal Inequities: Land Surface Temperature, Social Vulnerability, and Historical Redlining in San Antonio, TX. Urban Clim. 2024, 55, 101960.
- Chen, S.; Bruhn, S.; Seto, K.C. Trends in Socioeconomic Disparities in Urban Heat Exposure and Adaptation Options in Mid-Sized U.S. Cities. Remote Sens. Appl. Soc. Environ. 2024, 36, 101313.
- Mallick, J.; Alqadhi, S. Explainable Artificial Intelligence Models for Proposing Mitigation Strategies to Combat Urbanization Impact on Land Surface Temperature Dynamics in Saudi Arabia. Urban Clim. 2025, 59, 102259.
- Mansouri, A.; Erfani, A. Machine Learning Prediction of Urban Heat Island Severity in the Midwestern United States. Sustainability 2025, 17, 6193.
- Ahmed, A.N.; AlDahoul, N.; Aziz, N.A.; Huang, Y.F.; Sherif, M.; El-Shafie, A. The Urban Heat Island Effect: A Review on Predictive Approaches Using Artificial Intelligence Models. City Environ. Interact. 2025, 28, 100234.
- Snaiki, R.; Merabtine, A. Recent Advances on Machine Learning Techniques for Urban Heat Island Applications: A Review and New Horizons. Sustain. Cities Soc. 2025, 134, 106943.
- Gaur, A.; Deb, C. Machine Learning Methods and Approaches for Urban Heat Island (UHI) Assessment: A Comprehensive Review. Renew. Sustain. Energy Rev. 2026, 234, 116903.
- Darvishvand, L.; Kamkari, B.; Huang, M.J.; Hewitt, N.J. A Systematic Review of Explainable Artificial Intelligence in Urban Building Energy Modeling: Methods, Applications, and Future Directions. Sustain. Cities Soc. 2025, 128, 106492.
- Feng, F.; Ren, Y.; Xu, C.; Jia, B.; Wu, S.; Lafortezza, R. Exploring the Non-Linear Impacts of Urban Features on Land Surface Temperature Using Explainable Artificial Intelligence. Urban Clim. 2024, 56, 102045.
- Tahooni, A.; Kakroodi, A.A.; Kiavarz, M.; Mansourian, H. High-Resolution Urban LST Downscaling via Machine Learning and SHAP: A Case Study in a Rapidly Urbanizing Semi-Arid Region. Sustain. Cities Soc. 2025, 134, 106897.
- Hong, T.; Yim, S.H.L.; Heo, Y. Interpreting Complex Relationships between Urban and Meteorological Factors and Street-Level Urban Heat Islands: Application of Random Forest and SHAP Method. Sustain. Cities Soc. 2025, 126, 106353.
- Liao, S.; Liu, Z. Explaining and Reducing Urban Heat Islands Through Machine Learning: Evidence from New York City. Buildings 2026, 16, 186.
- Zhang, Y.; Ge, J.; Bai, X.; Wang, S. Blue-Green Space Seasonal Influence on Land Surface Temperatures across Different Urban Functional Zones: Integrating Random Forest and Geographically Weighted Regression. J. Environ. Manag. 2025, 374, 123975.
- Li, K.; Chen, Y.; Gao, S. Uncertainty of City-Based Urban Heat Island Intensity across 1112 Global Cities: Background Reference and Cloud Coverage. Remote Sens. Environ. 2022, 271, 112898.
- Kong, G.; Peng, J.; Corcoran, J. Modelling Urban Heat Island Effects: A Global Analysis of 216 Cities Using Machine Learning Techniques. Comput. Urban Sci. 2025, 5, 18.
- Azizi, S.; Azizi, T. Urban Climate Dynamics: Analyzing the Impact of Green Cover and Air Pollution on Land Surface Temperature—A Comparative Study Across Chicago, San Francisco, and Phoenix, USA. Atmosphere 2024, 15, 917.
- Sheridan, S.; De Guzman, E.B.; Eisenman, D.P.; Sailor, D.J.; Parfrey, J.; Kalkstein, L.S. Increasing Tree Cover and High-Albedo Surfaces Reduces Heat-Related ER Visits in Los Angeles, CA. Int. J. Biometeorol. 2024, 68, 1603–1614.
- Mejia, J.F.; Henao, J.J.; Eslami, E. Role of Clouds in the Urban Heat Island and Extreme Heat: Houston-Galveston Metropolitan Area Case. J. Geophys. Res. Atmos. 2024, 129, e2024JD041243.
- Suraj, K.C.; Chiluwal, A.; Magar, L.P.; Paudel, K. Investigating Urban Heat Islands in Miami, Florida, Utilizing Planet and Landsat Satellite Data. Atmosphere 2025, 16, 880.
- De Wit, V.; Forsythe, K.W. Urban Structure Changes in Three Areas of Detroit, Michigan (2014–2018) Utilizing Geographic Object-Based Classification. Land 2023, 12, 763.
- Bhatta, D. Grid-Level Spatial and Temporal Analysis of Land Surface Temperature and the Association with Land Use and Land Cover: A Case Study of Minnesota, USA Between 2013–2022. Available online: https://repository.stcloudstate.edu/gp_etds/18/ (accessed on 16 February 2026).
- Li, X.; Chakraborty, T.; Wang, G. Comparing Land Surface Temperature and Mean Radiant Temperature for Urban Heat Mapping in Philadelphia. Urban Clim. 2023, 51, 101615.
- Gray, L. Remote Sensing-Based Analysis of Urban Heat Islands and Historical Housing Discrimination in Boston, MA. Bachelor’s Thesis, Dartmouth College, Hanover, NH, USA, 2024. Available online: https://digitalcommons.dartmouth.edu/geography_senior_theses/8 (accessed on 16 February 2026).
- Google Earth Engine TIGER: U.S. Census Tracts, 2020 (TIGER/2020/TRACT). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/TIGER_2020_TRACT (accessed on 16 February 2026).
- Google Earth Engine USGS National Land Cover Database (NLCD) 2019 Release (USGS/NLCD_RELEASES/2019_REL/NLCD). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD_RELEASES_2019_REL_NLCD (accessed on 16 February 2026).
- Google Earth Engine USGS Landsat 8 Level-2, Collection 2, Tier 1 (LANDSAT/LC08/C02/T1_L2). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2 (accessed on 16 February 2026).
- Google Earth Engine USGS Landsat 9 Level-2, Collection 2, Tier 1 (LANDSAT/LC09/C02/T1_L2). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC09_C02_T1_L2 (accessed on 16 February 2026).
- Google Earth Engine JRC Global Surface Water Mapping Layers, v1.4 (JRC/GSW1_4/GlobalSurfaceWater). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/JRC_GSW1_4_GlobalSurfaceWater (accessed on 16 February 2026).
- Google Earth Engine VIIRS Stray Light–Corrected Nighttime Day/Night Band Monthly Composites, Version 1 (NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/NOAA_VIIRS_DNB_MONTHLY_V1_VCMSLCFG (accessed on 16 February 2026).
- Google Earth Engine ERA5-Land Hourly—ECMWF Climate Reanalysis (ECMWF/ERA5_LAND/HOURLY). 2024. Available online: https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_LAND_HOURLY (accessed on 16 February 2026).
- Naegeli, K.; Damm, A.; Huss, M.; Wulf, H.; Schaepman, M.; Hoelzle, M. Cross-Comparison of Albedo Products for Glacier Surfaces Derived from Airborne and Satellite (Sentinel-2 and Landsat 8) Optical Data. Remote Sens. 2017, 9, 110.
- Jadhav, A.V.; Belange, K.; Gajbhiv, N.; Kumar, V.; Rahul, P.R.C.; Sudeepkumar, B.L.; Bhawar, R.L. Evaluation of the Reanalysis and Satellite Surface Solar Radiation Datasets Using Ground-Based Observations over India. Atmosphere 2025, 16, 957.
- U.S. Geological Survey Landsat Collection 2 Surface Temperature 2024. Available online: https://www.usgs.gov/landsat-missions/landsat-collection-2-surface-temperature (accessed on 16 February 2026).
- Alejo-Sanchez, L.E.; Márquez-Grajales, A.; Salas-Martínez, F.; Franco-Arcega, A.; López-Morales, V.; Acevedo-Sandoval, O.A.; González-Ramírez, C.A.; Villegas-Vega, R. Missing Data Imputation of Climate Time Series: A Review. MethodsX 2025, 15, 103455.
- Ayiah-Mensah, F.; Bosson-Amedenu, S.; Baah, E.M.; Addor, J.A. Advancements in Seasonal Rainfall Forecasting: A Seasonal Auto-Regressive Integrated Moving Average Model with Outlier Adjustments for Ghana’s Western Region. Sci. Afr. 2025, 28, e02632.
- Dash, C.S.K.; Behera, A.K.; Dehuri, S.; Ghosh, A. An Outliers Detection and Elimination Framework in Classification Task of Data Mining. Decis. Anal. J. 2023, 6, 100164.
- West, R.M. Best Practice in Statistics: The Use of Log Transformation. Ann. Clin. Biochem. 2022, 59, 162–165.
- Koukaras, P.; Tjortjis, C. Data Preprocessing and Feature Engineering for Data Mining: Techniques, Tools, and Best Practices. AI 2025, 6, 257.
- Tawakuli, A.; Havers, B.; Gulisano, V.; Kaiser, D.; Engel, T. Survey: Time-Series Data Preprocessing: A Survey and an Empirical Analysis. J. Eng. Res. 2025, 13, 674–711.
- Sattar, M.U.; Dattana, V.; Hasan, R.; Mahmood, S.; Khan, H.W.; Hussain, S. Enhancing Supply Chain Management: A Comparative Study of Machine Learning Techniques with Cost–Accuracy and ESG-Based Evaluation for Forecasting and Risk Mitigation. Sustainability 2025, 17, 5772.
- Olyasani, M.; Azimi, H.; Shiri, H. Robust Tree-Based Machine Learning Algorithms for Predicting Drag Anchor Performance. J. Mar. Sci. Eng. 2026, 14, 281.
- Zou, H.; Hastie, T. Regularization and Variable Selection Via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
- Dormann, C.F.; Elith, J.; Bacher, S.; Buchmann, C.; Carl, G.; Carré, G.; Marquéz, J.R.G.; Gruber, B.; Lafourcade, B.; Leitão, P.J.; et al. Collinearity: A Review of Methods to Deal with It and a Simulation Study Evaluating Their Performance. Ecography 2013, 36, 27–46.
- Baggag, A.; Saad, Y. Deep Learning, Transformers and Graph Neural Networks: A Linear Algebra Perspective. Numer. Algorithms 2025, 100, 2095–2134.
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
- Ploton, P.; Mortier, F.; Réjou-Méchain, M.; Barbier, N.; Picard, N.; Rossi, V.; Dormann, C.; Cornu, G.; Viennois, G.; Bayol, N.; et al. Spatial Validation Reveals Poor Predictive Performance of Large-Scale Ecological Mapping Models. Nat. Commun. 2020, 11, 4540.
- Milà, C.; Ludwig, M.; Pebesma, E.; Tonne, C.; Meyer, H. Random Forests with Spatial Proxies for Environmental Modelling: Opportunities and Pitfalls. Geosci. Model Dev. 2024, 17, 6007–6033.
- Koldasbayeva, D.; Tregubova, P.; Gasanov, M.; Zaytsev, A.; Petrovskaia, A.; Burnaev, E. Challenges in Data-Driven Geospatial Modeling for Environmental Research and Practice. Nat. Commun. 2024, 15, 10700.
- Linnenbrink, J.; Milà, C.; Ludwig, M.; Meyer, H. kNNDM CV: K-Fold Nearest-Neighbour Distance Matching Cross-Validation for Map Accuracy Estimation. Geosci. Model Dev. 2024, 17, 5897–5912.
- Hutengs, C.; Vohland, M. Downscaling Land Surface Temperatures at Regional Scales with Random Forest Regression. Remote Sens. Environ. 2016, 178, 127–141.
- Shapley, L.S.; Snow, R.N. Basic Solutions of Discrete Games. Contrib. Theory Games 1952, 1, 27.
- Lundberg, S.M.; Lee, S.-I. Consistent Feature Attribution for Tree Ensembles. arXiv 2018, arXiv:1706.06060.
- Li, Z.; Ma, J.; Jiang, F.; Zhang, S.; Tan, Y. Assessing the Impacts of Urban Morphological Factors on Urban Building Energy Modeling Based on Spatial Proximity Analysis and Explainable Machine Learning. J. Build. Eng. 2024, 85, 108675.
- Seyrfar, A.; Ataei, H.; Movahedi, A.; Derrible, S. Data-Driven Approach for Evaluating the Energy Efficiency in Multifamily Residential Buildings. Pract. Period. Struct. Des. Constr. 2021, 26, 04020074.
- Zhang, Y.; Teoh, B.K.; Wu, M.; Chen, J.; Zhang, L. Data-Driven Estimation of Building Energy Consumption and GHG Emissions Using Explainable Artificial Intelligence. Energy 2023, 262, 125468.
- Senaviratna, N.A.M.R.; Cooray, T.M.J.A. Diagnosing Multicollinearity of Logistic Regression Model. Asian J. Probab. Stat. 2019, 5, 1–9.
- Zaki, A.; Métwalli, A.; Aly, M.H.; Badawi, W.K. 5G and Beyond: Channel Classification Enhancement Using VIF-Driven Preprocessing and Machine Learning. Electronics 2023, 12, 3496.
- Xi, W.-F.; Jiang, Q.-W.; Yang, A.-M. Using Stepwise Regression to Address Multicollinearity Is Not Appropriate. Int. J. Surg. 2024, 110, 3122–3123.
- Esposito, A.; Pappaccogli, G.; Bozzeda, F.; Buccolieri, R. A Multi-City Statistical Modelling of Surface Urban Heat Island: Application to Italian Cities. Urban Clim. 2025, 64, 102717.
- Hsu, A.; Sheriff, G.; Chakraborty, T.; Manya, D. Disproportionate Exposure to Urban Heat Island Intensity across Major US Cities. Nat. Commun. 2021, 12, 2721.
- Tanoori, G.; Soltani, A.; Modiri, A. Machine Learning for Urban Heat Island (UHI) Analysis: Predicting Land Surface Temperature (LST) in Urban Environments. Urban Clim. 2024, 55, 101962.
- Vahid, R.; Aly, M.H. A Comprehensive Systematic Review of Machine Learning Applications in Assessing Land Use/Cover Dynamics and Their Impact on Land Surface Temperatures. Urban Sci. 2025, 9, 234.
- Galalizadeh, S.; Morrison-Saunders, A.; Horwitz, P.; Silberstein, R.; Blake, D. The Cooling Impact of Urban Greening: A Systematic Review of Methodologies and Data Sources. Urban For. Urban Green. 2024, 95, 128157.
- Moncada-Morales, G.A.; Verichev, K.; López-Guerrero, R.E.; Carpio, M. A Global Review of Vegetation’s Interaction Effect on Urban Heat Mitigation Across Different Climates. Urban Sci. 2025, 9, 361.
- Soltanifard, H.; Amani-Beni, M. The Cooling Effect of Urban Green Spaces as Nature-Based Solutions for Mitigating Urban Heat: Insights from a Decade-Long Systematic Review. Clim. Risk Manag. 2025, 49, 100731.
- Shafizadeh-Moghadam, H.; Xu, T.; Murayama, Y. Climate-Specific Trends in Urban Land Surface Temperature: A Global Analysis of 432 Cities (2014–2024). J. Environ. Manag. 2025, 395, 127789.
- Liu, S.; Li, X.; Shi, Z.; Geng, M.; Yu, G.; Hu, T. Urbanization Is Projected to Increase Local Surface Temperature by 2100. Commun. Earth Environ. 2025, 6, 988.
- Hoang, N.-D.; Tran, V.-D.; Huynh, T.-C. From Data to Insights: Modeling Urban Land Surface Temperature Using Geospatial Analysis and Interpretable Machine Learning. Sensors 2025, 25, 1169.
- Li, H.; Yang, J.; Xin, J.; Yu, W.; Ren, J.; Yu, H.; Xiao, X.; Xia, J. (Cecilia) Investigating the Effect of Urban Form on Land Surface Temperature at Block and Grid Scales Based on XGBoost-SHAP. Environ. Model. Softw. 2026, 195, 106738.
- Kumar, P.; Debele, S.E.; Khalili, S.; Halios, C.H.; Sahani, J.; Aghamohammadi, N.; Andrade, M.D.F.; Athanassiadou, M.; Bhui, K.; Calvillo, N.; et al. Urban Heat Mitigation by Green and Blue Infrastructure: Drivers, Effectiveness, and Future Needs. Innovation 2024, 5, 100588.
- Han, M.; Zhang, T.; Si, Z. Optimizing Urban Blue-Green Space in Climate Adaptive Planning: A Systematic Review of Threshold Value of Efficiency Thresholds. Landsc. Ecol. 2025, 40, 13.
- Li, J.; Wang, L.; Xie, X.; Zhang, X. Urban Blue Spaces and Urban Heat Island Mitigation: A Bibliometric and Systematic Review of Spatiotemporal Dynamics, Morphology, and Planning Integration. Buildings 2026, 16, 834.
- Li, Y.; Svenning, J.-C.; Zhou, W.; Zhu, K.; Abrams, J.F.; Lenton, T.M.; Ripple, W.J.; Yu, Z.; Teng, S.N.; Dunn, R.R.; et al. Green Spaces Provide Substantial but Unequal Urban Cooling Globally. Nat. Commun. 2024, 15, 7108.
- Rao, P.; Torreggiani, D.; Tassinari, P.; Rötzer, T.; Pauleit, S.; Rahman, M.A. Do Urban Green Spaces Cool Cities Differently across Latitudes? Spatial Variability and Climatic Drivers of Vegetation-Induced Cooling. Sustain. Cities Soc. 2025, 130, 106513.
- Alonzo, M.; Ibsen, P.C.; Locke, D.H. Urban Trees and Cooling: A Review of the Recent Literature (2018 to 2024). J. Arboric. Urban For. 2025, 51, 420–444.

| Region | City A | City B |
|---|---|---|
| West | Phoenix, AZ: Located in the Sonoran Desert, this city experiences extremely hot summers. Rapid urban growth and extensive development have created one of the most intense urban heat island effects in the United States [20] | Los Angeles, CA: A large metropolitan region where extreme heat is an increasing public health risk, driven by dense urban development, limited tree cover, and uneven access to cooling, producing strong neighborhood-scale heat impacts [21] |
| South | Houston, TX: A warm, humid metropolitan region where urbanization raises overall heat exposure, while sea-breeze circulation and urban-enhanced clouds partially limit peak afternoon heat [22] | Miami, FL: A hot, humid city where urban heat islands are strongest in built-up areas, while vegetated zones show consistently lower surface temperatures [23] |
| Midwest | Detroit, MI: A humid continental city where population loss and widespread demolition have reshaped the urban fabric, leaving extensive vacant land and concentrated impervious surfaces in remaining built areas [24] | Minneapolis, MN: A humid continental city where heat is concentrated in built-up areas, while higher vegetation cover is associated with lower surface temperatures [25] |
| Northeast | Philadelphia, PA: A dense city where heat exposure is highest in heavily built areas and reduced where tree canopy and street shading are present [26] | Boston, MA: Dense urban areas are warmer than surrounding regions due to impervious surfaces and reduced evapotranspiration, with the urban heat island strongest during summer [27] |
| Category | Variable | Description | Unit | Data Source |
|---|---|---|---|---|
| Land Cover/Surface Properties | NDMI | Normalized Difference Moisture Index, a proxy for surface moisture and evapotranspiration | Unitless (−1 to 1) | [30,31] |
| Land Cover/Surface Properties | IMPERV | Mean impervious surface fraction within each census tract | Percent (%) | [29] |
| Land Cover/Surface Properties | ALBEDO | Surface broadband albedo, representing solar reflectance | Unitless (0–1) | derived from [30,31] |
| Land Cover/Surface Properties | AWATER | Water area within the census tract | m² | [28] |
| Land Cover/Surface Properties | DIST_WATER | Distance to the nearest major water body (coast, river, lake) | m | [32] |
| Urban Geometry/Built Environment | VIIRS_RAD | Mean VIIRS nighttime lights radiance, used as a proxy for human activity intensity | nW·cm⁻²·sr⁻¹ | [33] |
| Urban Geometry/Built Environment | ALAND | Land area of census tract | m² | [28] |
| Spatial Context/Neighborhood Effects | INTPTLAT | Latitude of census tract internal representative point | Degrees (°) | [28] |
| Spatial Context/Neighborhood Effects | DIST_CITY_CENTER | Distance from the tract centroid to the city center | km | derived from [28] |
| Climate | MEAN_SUMMER_TEMP | Long-term mean summer air temperature (climatology) | °C | [34] |
| Climate | SOLAR_RAD | Mean incoming solar radiation (long-term average) | W·m⁻² | [34] |
| Outcome Variables | LST | Mean land surface temperature aggregated to the census tract | °C | derived from [30,31] |
| Outcome Variables | SUHII | Surface Urban Heat Island Intensity | Unitless | derived from LST |
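| Variable | Equation/Definition | References |
|---|---|---|
| NDMI | $\mathrm{NDMI} = \dfrac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{SWIR1}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{SWIR1}}}$, where $\rho_{\mathrm{NIR}}$ and $\rho_{\mathrm{SWIR1}}$ are Landsat 8/9 Collection-2 Level-2 surface reflectance from bands SR_B5 and SR_B6; summer median composite; tract value = mean of pixels within tract. | [3] |
| IMPERV | $\mathrm{IMPERV} = \overline{I}$, where $I$ is the NLCD 2019 impervious surface fraction (%) at 30 m; tract value = mean impervious fraction within tract. | – |
| ALBEDO | $\alpha = \sum_{b} w_b\,\rho_b + c$, where $\rho_b$ are Landsat 8/9 surface reflectance bands (SR_B2, B4, B5, B6, B7) and $w_b$, $c$ are the narrowband-to-broadband conversion coefficients of [35]; summer median composite; tract mean. | [35] |
| AWATER | $\mathrm{AWATER} = A_{\mathrm{water}}$, where $A_{\mathrm{water}}$ is total water area within the census-tract polygon. | – |
| DIST_WATER | $\text{DIST\_WATER} = d(x, w)$, where $x$ is the tract location and $w$ is the nearest pixel with JRC Global Surface Water occurrence ≥ 50%; distance computed via Euclidean distance transform and converted to meters using analysis scale. | – |
| VIIRS_RAD | $\text{VIIRS\_RAD} = \operatorname{median}(R)$, where $R$ is monthly VIIRS DNB average radiance (avg_rad); summer months filtered and median-aggregated across years; tract mean. | – |
| ALAND | $\mathrm{ALAND} = A_{\mathrm{land}}$, where $A_{\mathrm{land}}$ is total land area of the census tract. | – |
| INTPTLAT | $\mathrm{INTPTLAT} = \varphi$, where $\varphi$ is the latitude (degrees) of the tract internal representative point. | – |
| DIST_CITY_CENTER | $\text{DIST\_CITY\_CENTER} = d(c, g)$, where $c$ is the city-center point and $g$ is the tract geometry centroid; distance converted to kilometers. | – |
| MEAN_SUMMER_TEMP | $\text{MEAN\_SUMMER\_TEMP} = \overline{T_{2\mathrm{m}}} - 273.15$, where $T_{2\mathrm{m}}$ is ERA5-Land monthly 2-m air temperature (K); averaged over summer months (2018–2024); tract mean. | – |
| SOLAR_RAD | $\text{SOLAR\_RAD} = \mathrm{SSRD} / \Delta t$, where $\mathrm{SSRD}$ is ERA5-Land surface solar radiation downwards (J m⁻² per month) and $\Delta t$ is seconds per month; summer mean; tract mean. | [36] |
| LST | $\mathrm{LST} = s \cdot \mathrm{DN} + o - 273.15$, where $\mathrm{DN}$ is the Landsat 8/9 surface-temperature band value and $s$, $o$ are the Landsat 8/9 scaling factor and offset; summer median composite; tract mean. | [37] |
| SUHII | $\mathrm{SUHII} = \dfrac{\mathrm{LST}_{\mathrm{tract}} - \mu_{\mathrm{region}}}{\sigma_{\mathrm{region}}}$, where regional mean $\mu_{\mathrm{region}}$ and standard deviation $\sigma_{\mathrm{region}}$ are computed over the city buffer. | [14] |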
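
A few of the definitions above can be sketched numerically. The snippet is illustrative only: the function names are hypothetical, the thermal-band scale factor (0.00341802) and offset (149.0 K) are the values published by USGS for Landsat Collection 2 Level-2 surface temperature and are assumed, not confirmed, to be those applied in the study, and the use of the population standard deviation for SUHII is likewise an assumption.

```python
from statistics import mean, pstdev


def ndmi(nir, swir1):
    """Normalized difference of NIR (SR_B5) and SWIR1 (SR_B6) reflectance."""
    return (nir - swir1) / (nir + swir1)


def lst_celsius(st_dn, scale=0.00341802, offset=149.0):
    """Scale a Landsat C2 L2 surface-temperature value to Kelvin, then to deg C."""
    return st_dn * scale + offset - 273.15


def solar_w_m2(ssrd_j_m2, seconds):
    """Convert accumulated solar radiation (J m^-2) to a mean flux (W m^-2)."""
    return ssrd_j_m2 / seconds


def suhii(tract_lst, regional_lst):
    """Standardize tract LST against the regional mean and standard deviation."""
    mu = mean(regional_lst)
    sigma = pstdev(regional_lst)  # population SD (assumption)
    return [(t - mu) / sigma for t in tract_lst]


# Toy values, not study data.
example_ndmi = ndmi(0.30, 0.20)
example_lst = lst_celsius(44000)
example_flux = solar_w_m2(648e6, 30 * 24 * 3600)
example_suhii = suhii([40.0, 42.0, 44.0], [40.0, 42.0, 44.0])
```

Because SUHII is a standardized anomaly, it is unitless and centred on the regional mean, which is what makes tracts comparable across cities with very different absolute LST.
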
| Model | SUHII—Best Hyperparameters | LST—Best Hyperparameters |
|---|---|---|
| ElasticNet | alpha = 0.083; l1_ratio = 0.275; fit_intercept = True; max_iter = 40,000; random_state = 42 | alpha = 0.0829; l1_ratio = 0.275; fit_intercept = True; max_iter = 40,000; random_state = 42 |
| MLP | hidden_layer_sizes = (256, 128); alpha = 2.15 × 10⁻⁶; learning_rate_init = 0.0003; activation = relu; solver = adam; early_stopping = True; max_iter = 2000; n_iter_no_change = 60; random_state = 42 | hidden_layer_sizes = (128, 128); alpha = 0.00316; learning_rate_init = 0.001; activation = tanh; solver = adam; early_stopping = True; max_iter = 2000; n_iter_no_change = 60; random_state = 42 |
| RF | n_estimators = 600; max_depth = None; min_samples_split = 5; min_samples_leaf = 3; max_features = sqrt; bootstrap = False; criterion = squared_error; n_jobs = 1; random_state = 42 | n_estimators = 600; max_depth = None; min_samples_split = 2; min_samples_leaf = 3; max_features = 0.6; bootstrap = True; criterion = squared_error; n_jobs = 1; random_state = 42 |
| Extra Trees | n_estimators = 1500; max_depth = 20; min_samples_split = 2; min_samples_leaf = 1; max_features = sqrt; criterion = squared_error; n_jobs = 1; random_state = 42 | n_estimators = 1500; max_depth = 12; min_samples_split = 2; min_samples_leaf = 1; max_features = 0.8; criterion = squared_error; n_jobs = 1; random_state = 42 |
| XGBoost | n_estimators = 1200; learning_rate = 0.01; max_depth = 3; subsample = 1.0; colsample_bytree = 0.7; min_child_weight = 8; reg_lambda = 1.0; reg_alpha = 0.1; objective = reg:squarederror; tree_method = hist; n_jobs = 4; random_state = 42 | n_estimators = 600; learning_rate = 0.01; max_depth = 5; subsample = 0.7; colsample_bytree = 0.7; min_child_weight = 5; reg_lambda = 1.0; reg_alpha = 0.5; objective = reg:squarederror; tree_method = hist; n_jobs = 4; random_state = 42 |
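
To make the ElasticNet row more concrete, the sketch below splits the reported `alpha` and `l1_ratio` into their L1 and L2 penalty weights. It assumes the scikit-learn parameterization, in which the penalty is `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`; the parameter names in the table match that library, but the implementation is not stated here, so this is an assumption. The helper name is hypothetical.

```python
def elastic_net_penalty_weights(alpha, l1_ratio):
    """Return (L1 weight, L2 weight) under the scikit-learn ElasticNet convention."""
    l1_weight = alpha * l1_ratio
    l2_weight = 0.5 * alpha * (1.0 - l1_ratio)
    return l1_weight, l2_weight


# Tuned values reported in the table above.
suhii_l1, suhii_l2 = elastic_net_penalty_weights(alpha=0.083, l1_ratio=0.275)
lst_l1, lst_l2 = elastic_net_penalty_weights(alpha=0.0829, l1_ratio=0.275)
```

With `l1_ratio = 0.275`, most of the regularization strength is assigned to the ridge-like L2 term, which shrinks correlated predictors together rather than dropping them.
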
| Variable | Mean | Median | Std | Min | Q1 | Q3 | Max | Missing (%) |
|---|---|---|---|---|---|---|---|---|
| NDMI | 0.119 | 0.106 | 0.09 | −0.412 | 0.042 | 0.181 | 0.422 | 0.33 |
| IMPERV | 59.672 | 62.746 | 19.163 | 0 | 47.386 | 73.971 | 98.608 | 0.33 |
| ALBEDO | 0.17 | 0.168 | 0.027 | 0.003 | 0.156 | 0.187 | 0.278 | 0.33 |
| AWATER | 613,833 | 0 | 18,414,222 | 0 | 0 | 16,010 | 8.2 × 10⁸ | 0.33 |
| DIST_WATER | 3045.614 | 2458.351 | 2358.01 | 0 | 1134.857 | 4489.605 | 12,264.82 | 0.33 |
| VIIRS_RAD | 48.302 | 42.46 | 31.239 | 0.821 | 27.088 | 61.25 | 289.338 | 0.33 |
| ALAND | 2,411,929 | 1,152,250 | 23,341,993 | 0 | 603,817.5 | 2,123,429 | 1.49 × 10⁹ | 0.33 |
| INTPTLAT | 36.415 | 34.111 | 5.823 | 25.473 | 33.473 | 42.33 | 45.174 | 0.33 |
| DIST_CITY_CENTER | 11.83 | 11.988 | 5.67 | 0.071 | 7.243 | 16.632 | 52.657 | 0.33 |
| MEAN_SUMMER_TEMP | 25.124 | 24.033 | 3.452 | 21.321 | 22.692 | 27.5 | 34.826 | 2.07 |
| SOLAR_RAD | 8.895 | 8.39 | 1.148 | 7.778 | 7.911 | 10.484 | 10.673 | 2.07 |
| SUHII | 0.442 | 0.516 | 0.671 | −2.68 | 0.065 | 0.907 | 2.088 | 0.33 |
| LST | 42.498 | 42.404 | 6.465 | 17.769 | 37.611 | 46.634 | 60.824 | 0.33 |
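
The strong right skew of the area variables in this table (for AWATER, a median of 0 against a maximum of 8.2 × 10⁸ m²) is the kind of distribution the log1p transformation listed in the Abbreviations is suited to. The sketch below applies it to the AWATER summary values; which predictors were actually transformed is not stated in this excerpt, so treating AWATER this way is an assumption, and the helper name is hypothetical.

```python
import math


def log1p_transform(values):
    """Apply log(1 + x), which is defined at x = 0 and compresses large values."""
    return [math.log1p(v) for v in values]


# AWATER summary values from the table above: Min, Median, Q3, Max (m^2).
awater_summary = [0.0, 0.0, 16010.0, 8.2e8]
transformed = log1p_transform(awater_summary)

# The transformation is invertible with expm1, so values can be mapped back.
restored = [math.expm1(t) for t in transformed]
```

Zero-area tracts stay at exactly zero, while the largest value is pulled in from roughly 10⁸ to about 20 on the log scale, so a few very large tracts no longer dominate the feature's range.
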
| Predictor | VIF |
|---|---|
| VIIRS_RAD | 4.667 |
| IMPERV | 4.425 |
| NDMI | 4.092 |
| ALBEDO | 3.179 |
| INTPTLAT | 2.674 |
| MEAN_SUMMER_TEMP | 2.558 |
| ALAND | 2.332 |
| DIST_WATER | 2.247 |
| AWATER | 2.232 |
| SOLAR_RAD | 2.176 |
| DIST_CITY_CENTER | 1.898 |
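
The variance inflation factor of predictor j is VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on all other predictors. The short sketch below inverts that relation for the reported values and screens them against a threshold of 5, a commonly used rule of thumb; the threshold and the helper names are assumptions for illustration and are not taken from the study.

```python
def vif_from_r2(r2):
    """VIF implied by the R^2 of regressing one predictor on the others."""
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Share of a predictor's variance explained by the other predictors."""
    return 1.0 - 1.0 / vif


# VIF values reported in the table above.
reported_vif = {
    "VIIRS_RAD": 4.667, "IMPERV": 4.425, "NDMI": 4.092, "ALBEDO": 3.179,
    "INTPTLAT": 2.674, "MEAN_SUMMER_TEMP": 2.558, "ALAND": 2.332,
    "DIST_WATER": 2.247, "AWATER": 2.232, "SOLAR_RAD": 2.176,
    "DIST_CITY_CENTER": 1.898,
}

implied_r2 = {name: r2_from_vif(v) for name, v in reported_vif.items()}
flagged = [name for name, v in reported_vif.items() if v >= 5.0]  # rule of thumb
```

Read this way, the largest reported VIF (VIIRS_RAD, 4.667) means that roughly 79% of that predictor's variance is shared with the other predictors, and no predictor reaches the threshold of 5.
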
| Target | Model | R² | MAE | RMSE |
|---|---|---|---|---|
| SUHII | XGBoost | 0.879 | 0.162 | 0.213 |
| SUHII | RF | 0.867 | 0.196 | 0.251 |
| SUHII | Extra Trees | 0.844 | 0.212 | 0.272 |
| SUHII | MLP | 0.826 | 0.198 | 0.255 |
| SUHII | Elastic Net | 0.788 | 0.254 | 0.316 |
| LST | Extra Trees | 0.908 | 0.583 | 0.745 |
| LST | RF | 0.907 | 0.570 | 0.750 |
| LST | Elastic Net | 0.895 | 0.626 | 0.795 |
| LST | XGBoost | 0.882 | 0.659 | 0.843 |
| LST | MLP | 0.874 | 0.680 | 0.871 |
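
For reference, the three metrics in this table follow their standard definitions, sketched below in plain Python on toy values. How the study aggregated scores across held-out cities (pooled predictions or per-fold averages) is not stated in this excerpt, so the sketch simply evaluates a single set of predictions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy observations and predictions (not study data).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = {"R2": r2(y_true, y_pred), "MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred)}
```

MAE and RMSE carry the units of the target (°C for LST, standardized units for SUHII), so their magnitudes are not directly comparable between the two halves of the table, whereas R² is.
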
| Study | Scale | Outcome | Method | Main Relevance to This Study |
|---|---|---|---|---|
| Li et al. [18] | 1112 global cities | SUHII | Comparative methodological analysis | Concluded that SUHII estimates are highly sensitive to background-reference definition, supporting the use of a standardized SUHII framework for cross-city comparison |
| Feng et al. [13] | Single city (Beijing) | LST | Random forest + SHAP/explainable AI | Indicated that urban thermal drivers are non-linear and can be interpreted with SHAP-based analysis |
| Tanoori et al. [66] | Single city (Shiraz) | LST | Machine-learning model comparison | Showed that ML models, especially DNN and XGB, can predict urban LST accurately |
| Kong et al. [19] | 216 global cities | UHI intensity | SVR-based machine-learning model | Concluded that harmonized multi-city machine-learning models can predict UHI intensity and identify broad cross-city drivers |
| Mansouri and Erfani [8] | Midwestern U.S. multi-state regional scale | UHI severity | Ensemble machine learning (Random Forest, XGBoost) | Indicated that ensemble models can be applied effectively across a broad regional dataset, supporting the value of regional generalization beyond single-city studies |
| Zhao et al. [2] | Systematic review | SUHI/LST | Remote-sensing monitoring and assessment review | Showed that satellite-derived LST is valuable for mapping surface thermal patterns, but it is distinct from near-surface air temperature and should not be interpreted directly as human heat exposure |
| Present study | 8 U.S. cities, tract scale | LST and SUHII | Multiple machine-learning models + nested city-held-out CV + SHAP | Separated absolute surface heat from relative within-city thermal anomaly within a standardized tract-scale cross-city framework |
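
The SHAP attributions used in the present study rest on Shapley values, which can be illustrated exactly on a tiny model. The sketch below averages each feature's marginal contribution over all feature orderings, using a single baseline input. It is not the tree-specific algorithm of [58] and not the study's model: the toy function, its coefficients, and the baseline are hypothetical, and practical SHAP implementations average over a background dataset rather than one reference point.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(baseline)
        previous = model(z)
        for j in order:
            z[j] = x[j]  # switch feature j from its baseline value to its actual value
            current = model(z)
            phi[j] += current - previous
            previous = current
    return [p / len(orders) for p in phi]


def toy_model(z):
    """Hypothetical response: impervious fraction warms, moisture cools, plus an interaction."""
    imperv, ndmi = z
    return 0.02 * imperv - 3.0 * ndmi + 0.1 * imperv * ndmi


x = [80.0, 0.05]          # hypothetical tract: high imperviousness, low moisture
baseline = [60.0, 0.12]   # hypothetical reference tract
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for the tract and the prediction for the baseline (the efficiency property), which is what allows per-feature contributions to be read as additive shares of a tract's predicted anomaly.
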
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Aljarrah, O.A.B.; Goulias, D. Disentangling Climatic and Surface-Physical Drivers of the Urban Heat Island Using Explainable AI Across U.S. Cities. Sustainability 2026, 18, 3694. https://doi.org/10.3390/su18083694
