• Normalized Difference Built-up Index (NDBI)

The Normalized Difference Built-up Index (NDBI) uses the NIR and SWIR bands to emphasize constructed built-up areas. It is a ratio based on mitigating the effects of terrain illumination differences as well as atmospheric effects [53,54]. A negative value of NDBI represents water bodies whereas a higher value represents build-up areas. NDBI value for vegetation is low. The following formula gives the NDBI value.

$$\text{NDBII} = \frac{(\text{SWIR1} - \text{NIR})}{(\text{SWIR1} + \text{NIR})} \tag{3}$$

For Landsat 8 data, NDBI = (Band 6 − Band 5)/(Band 6 + Band 5). This cannot be directly downloaded from the climate engine, so the individual NIR and SWIR1 bands were downloaded, then NDBI was calculated using the raster calculator tool in ArcGIS Pro. Table 4 shows the ranges of NDBI values and the corresponding build-up area classification.

**Table 4.** NDBI ranges for Build-up Area Classification.


#### *2.3. Methodology*

The workflow is divided into the following parts: (a) Meteorological Data and Land Surface Temperature Evaluation Methods, (b) LULC and LST Comparative and Correlation Analysis, (c) LST Spatiotemporal Pattern Analysis and Hotspots/Cold spots Identification, and (d) Intra-Urban Heat Island Map Generation.

The overall workflow of this methodology is shown in Figure 2. Finally, using the information obtained, data assessment and suggested area-specific mitigation strategies are provided.

**Figure 2.** Overview of the overall workflow of the study to assess the IUHI map and provide mitigation strategies.

2.3.1. Meteorological Data and Land Surface Temperature Evaluation Methods

This section focuses on the use of meteorological data collected at Port Area, Manila City, and how they are used to understand the temporal variability of air temperature, the relationship of meteorological parameters to land surface temperature during the day and night, and outdoor thermal comfort assessment.

a. Air Temperature and LST Trend and Relationship Analysis

This analysis's methodology and findings were already published by the authors in ref. [45]. There was no gap-filling technique used for missing information related to the in-situ measurements nor with the derived MODIS data specific to the meteorological data point. The in-situ data were directly taken from the weather agency which processed and prepared the data, while the MODIS data are directly downloaded from the Google earth engine. All data used were analysis-ready while any data point with a missing parameter entry was discarded and not used.

b. Outdoor Thermal Comfort Assessment

The RayMan Model was proposed by Matzarakis, a micro-scale model developed to calculate radiation fluxes in simple and complex environments [55,56]. This research used this model to assess the thermal comfort in Port Area. The scientific basis for the computations is thoroughly detailed in the Rayman Pro tool handbook [55].

Thermal indices have been developed to approximate human thermal perception [55]. In particular, Physiological Equivalent Temperature (PET) is "the air temperature at which, in a typical indoor setting (without wind and solar radiation), the energy budget of the human body is balanced with the same core and skin temperature as under the complex outdoor conditions to be assessed" [57,58].

The Thermal Comfort Assessment workflow is as follows:



**Figure 3.** RayMan Pro Graphical User Interface. Geographic data, personal data, and clothing and activity information are shown.



It should be emphasized that the data being used in this analysis are solely temporal point data from Manila City's Port Area. It is deemed that these values do not represent the entire city; therefore, meteorological data-point locations should be explored to offer a better understanding of the thermal comfort in Manila City.

#### 2.3.2. LULC Indicators and LST Evaluation Methods

This section discusses methods to evaluate satellite-derived data such as spectral indices (NDVI, NDWI, and NDBI, which are used as LULC indicators) and land surface temperature in Manila City. These methods include multivariate cluster analysis and correlation analysis.

#### a. Multivariate Cluster Analysis

Cluster analysis is a statistical method to use the values of the variables in devising a scheme for grouping the objects into classes so that similar objects are in the same class [60]. It is a multivariate method for classifying a sample of subjects (or objects) into several groups based on a set of measured characteristics, with related subjects placed in the same group.

Given that the group of values for each parameter is not known, we used the satellitederived data to group the values in each parameter (NDVI, NDWI, NDBI) together with land surface temperature (LST) and observed how each of these LULC indicators relate to LST. Specifically, since the indicator values can be used to classify land use and land cover, this is an initial step to see how the land use and land cover of different areas in Manila City relate to their thermal characteristic.

For this, the k-means algorithm as shown in Algorithm 1 was used to identify the clusters within the dataset. It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each dataset belongs to only one group that has similar properties [61]. The k-means clustering algorithm mainly performs two tasks: (1) determine the best value for k-center points or centroids by an iterative process and (2) assign each data point to its closest k-center. Those data points which are near a particular k-center create a cluster. Hence, each cluster has data points with some commonalities, and it is away from other clusters. Shown below is the k-means clustering algorithm flow.


In this study, we used the multivariate clustering tool in ArcGIS Pro [62] to find these natural clusters of features based solely on the feature attribute values. Given the number of clusters to create, it will look for a solution where all the features within each cluster are as similar as possible, and all the clusters themselves are as different as possible. This tool utilizes unsupervised machine learning methods to determine natural clusters in the data. The classification method is considered unsupervised as they do not require a set of reclassified features to guide or train the method to find the clusters in the data. Since the tool is used to run the clustering algorithm, the following workflow was employed:


It should be noted that cluster analysis has no mechanism for differentiating between relevant and irrelevant variables. Therefore, the choice of variables included in a cluster analysis must be underpinned by conceptual considerations. This is very important because the clusters formed can be very dependent on the variables included. To see the relationship and extent of the values used in clustering, we also employed correlation analysis with the data.

#### b. LULC Indicators and LST Correlation Analysis

We use correlation analysis in addition to multivariate clustering analysis to evaluate the relationship of NDVI, NDWI, and NDBI with LST. The same method as explained in Section 2.3.1-a was used to analyze the extent and nature of the relationship between the abovementioned parameters. On the contrary, Pearson product correlation in GeoDa software was used.

#### 2.3.3. LST Spatiotemporal Pattern Analysis

In this section, we focus on analyzing the spatial and temporal pattern of Land Surface Temperature in Manila City Philippines. Since data have both spatial and temporal context, several analytical tools in the Space-Time Pattern Analysis toolset in ArcGIS Pro software [62] were used. Before doing the analysis, a space-time cube was created based on the downloaded LST raster over the period (2014 to 2021) as shown in Figure 4.

**Figure 4.** Creating a space-time cube based on yearly maximum LST from 2013 to 2022.

A time series analysis or an integrated spatial and temporal pattern analysis may be used to view and analyze spatial-temporal data using this approach. Using the prepared space-time cube as input, we perform emerging hotspot analysis and local outlier analysis to better understand the thermal situation in Manila City.

a. Emerging Hotspot Analysis

The Emerging Hot Spot Analysis tool detects statistically significant hot and cold spot patterns over time. It is used to examine land surface temperature (LST) data in Manila City to identify new, intensifying, persistent, or sporadic hot spot trends at various time-step intervals. The workflow for this is as follows:


$$G\_i^\* = \frac{\sum\_{j=1}^n w\_{i,j} x\_j - \overline{X} \sum\_{j=1}^n w\_{i,j}}{\mathcal{S} \sqrt{\frac{\left[n \sum\_{j=1}^n w\_{i,j}^2 - \left(\sum\_{j=1}^n w\_{i,j}\right)^2\right]}{n-1}}} \tag{4}$$

where *xj* is the attribute value for feature *j*, *wi*,*<sup>j</sup>* is the subscript weight between feature *i* and *j*, *n* is equal to the number of features; also:

$$\overline{X} = \frac{\sum\_{j=1}^{n} x\_j}{n} \tag{5}$$

$$S = \sqrt{\frac{\sum\_{j=1}^{n} \mathbf{x}\_j^2}{n} - \left(\overline{\mathbf{X}}\right)^2} \tag{6}$$

The *G*∗ *<sup>i</sup>* is a z-score so no further calculations are required.

**Table 6.** *G*∗ *<sup>i</sup>* statistic values for cold spot and hotspot classes at different significance levels.


The *G*∗ *<sup>i</sup>* statistic returned for each point is a z-score. The more concentrated the clustering of high values (hot spots) of LST, the bigger the z-score for statistically significant positive z-scores. The clustering of low values (called a "cold spot") of LST is stronger, the smaller the z-score is for statistically significant negative z-scores.


any trend over time. Based on the variance for the values in the point time series, the number of ties, and the number of periods, the observed sum is compared to the expected sum (zero) to determine if the difference is statistically significant. A z-score and a *p*-value are used to represent the trend for each point time series. A small *p*-value indicates that the trend is statistically significant. The sign associated with the z-score determines if the trend is an increase in point values (positive z-score) or a decrease in bin values (negative z-score).

With the resultant trend z-score and *p*-value for each location with data, and with the hot spot z-score and *p*-value for each bin, the Emerging Hot Spot Analysis tool in ArcGIS Pro categorizes each study area location as shown in Table 7 and is then reclassified as "monitor", "intervene", and "preserve". With the new classification, those categorized as diminishing, oscillating, and historical for both hot and cold spots will be reclassified as "monitor". Those with no pattern detected will be classified as "monitor" as well. On the other hand, categories such as new, consecutive, intensifying, and sporadic will have "preserve" as their new class for a cold spot and "intervene" for a hotspot.

6. An Emerging Hotspot Analysis (EHSA) Map showing areas to preserve, monitor, and intervene is generated based on the reclassification shown in Table 7.


**Table 7.** Emerging hot spot analysis trend categories, their definition, and equivalent new class.

b. Local Outlier Analysis

The Local Outlier Analysis tool identifies statistically significant clusters of high or low land surface temperature LST values as well as outliers that have values that are statistically different from their neighbors in space and time.

The workflow for this is as follows:


3. Calculate the Anselin Local Moran's *I* statistic of special association for each bin which includes a pseudo *p*-value and a CO\_Type code. The Local Moran's *I* statistic of spatial association is given as

$$I\_i = \frac{\mathfrak{x}\_i - \overline{X}}{S\_i^2} \sum\_{\substack{j=1,\ j \neq i}}^n w\_{i,j} (\mathfrak{x}\_i - \overline{X}) \tag{7}$$

where *xi* is an attribute for feature *i*, *X* is the mean corresponding attribute, *wi*,*<sup>j</sup>* is the spatial weight between features *i* and *j*, and:

$$S\_i^2 = \frac{\sum\_{j=1,\ j \neq i}^n (\mathbf{x}\_j - \overline{\mathbf{X}})^2}{n-1} \tag{8}$$

with *n* equating to the total number of features. The *zIi* score for the statistics is computed as

$$z\_{I\_i} = \frac{I\_i - E\left[I\_i\right]}{\sqrt{V\left[I\_i\right]}}\tag{9}$$

$$V[I\_i] = E\left[I\_i^2\right] - E[I\_i]^2\tag{10}$$

A positive value for *I* indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the *p*-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.

In Table 8, the cluster/outlier type (CO Type) field distinguishes between a statistically significant cluster of high values (HH), a cluster of low values (LL), an outlier in which a high value is surrounded primarily by low values (HL), and an outlier in which a low value is surrounded primarily by high values (LH). Statistical significance is set at the 95 percent confidence level. This significance represents an FDR correction, which adjusts the *p*-value threshold from 0.05 to a value that better reflects the 95 percent confidence level taking into consideration multiple testing.



**Table 8.** Pixel representation of cluster and outliers based on the Anselin Local Moran's *I* statistic.


**Table 9.** Local outlier analysis trend categories, their definition, and equivalent new class.

#### 2.3.4. Intra-Urban Heat Island Map Generation

This section discusses the method of generating the intra-urban island map for Manila City, Philippines, using results from EHSA and LOA through a Suitability Analysis Model.

Figure 5 shows the overall process to produce the needed map for further assessment. The Emerging Hot Spot Analysis identifies trends in the data, such as new, intensifying, diminishing, and sporadic hot and cold spots, while the Local Outlier Analysis identifies significant clusters and outliers in the data. Through the suitability analysis, the combination of both methods ensures that locations of hot and cold spots in the city are precisely identified by eliminating outlier clusters in the final map produced. The suitability analysis model was used to combine the resulting raster map from the emerging hotspot analysis and local outlier analysis.

**Figure 5.** Overview of the Intra-Urban Heat Island (IUHI) Class of Action map generation based on EHSA and LOA maps using the suitability analysis model.

To carry out the suitability analysis, the classification classes of emerging hotspot analysis and local outlier analysis were given numerical equivalents to provide a common suitability scale.

Specifically, the following workflow was followed:


**Table 10.** Common suitability scale used to transform EHSA and LOA Classification maps.



**Table 11.** Suitability values and their equivalent IUHI Class of Action.

2.3.5. Intra-Urban Heat Island Map Assessment and Mitigation Strategies

The results in Sections 2.3.1–2.3.4 are then used to evaluate the Intra-Urban Heat Island map with the population data and urban settlement raster from the high-resolution settlement layer. Moreover, area-specific mitigation strategies will be suggested based on the visual inspection of the areas that need intervention. Possible strategies may also be taken from the identified areas to be preserved in the city. Assessment and mitigation strategies are simplified so that they serve as a basis for urban planners and policymakers for mitigation and improvement.

#### **3. Results**

*3.1. Satellite Data Retrieved from Landsat 8*

Ten distribution maps from 2013 to 2022 were obtained from Landsat 8 data through the climate engine web application. These data were further processed in ArcGIS Pro by providing an equalized histogram stretch and a specific color scheme in its symbology.
