#### **1. Introduction**


In the context of raising awareness of climate change, a good understanding of urban climate phenomena is a key milestone for mitigating and adapting to thermal extremes within urban environments [1,2]. Cities are not only among the main contributors to the greenhouse effect [3], but also places where many inequalities, and therefore potential vulnerabilities, accumulate [4–6]. Moreover, recent studies, such as those by Grimm et al. [7] and Youngsteadt [8], suggest that cities could provide important insights into the socio-ecological dynamics of our near future at a global scale, thus increasing the interest in reliable urban climatic data and expanding its applications to many other disciplines.

However, obtaining reliable climatic data within urban areas is still a challenging task due to the complexity of the urban climate. Nowadays, some of the most important advances are concentrated in the modelling field [9]. Examples include studies evaluating the relationship between the urban climate and parameters such as the presence of water bodies [10] or the emission of anthropogenic heat [11,12]. Regarding the accuracy of these numerical models, recent advances coupling urban canopy models with mesoclimatic ones have also proved their overall reliability [13,14]. However, there are still some barriers that limit their application in other fields. For example, Computational Fluid Dynamics (CFD) has proved to be reliable for both building-scale models and relatively small urban areas (i.e., within a few hundred meters [15,16]) but too computationally intensive for larger domains [17,18]. Other authors, such as Lauzet et al. [19], have highlighted that the high computational needs of high-resolution urban climate models pose a significant challenge in obtaining long-term datasets, therefore hindering the more widespread use of urban models within building energy simulations.

**Citation:** Núñez-Peiró, M.; Mavrogianni, A.; Symonds, P.; Sánchez-Guevara Sánchez, C.; Neila González, F.J. Modelling Long-Term Urban Temperatures with Less Training Data: A Comparative Study Using Neural Networks in the City of Madrid. *Sustainability* **2021**, *13*, 8143. https://doi.org/10.3390/su13158143

Academic Editor: Roberto Alonso González Lezcano

Received: 17 June 2021 Accepted: 13 July 2021 Published: 21 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Conducting on-site measurement campaigns is also one of the most widespread practices for improving urban climate knowledge [20,21]. Such campaigns remain an essential component of numerical model validation processes [22]. Regarding meteorological parameters, experimental data are primarily derived from urban networks consisting of multiple sensors distributed across the city [23]. Several examples can be found in the literature for cities all around the globe, such as Athens [24], London [25,26], Sendai [27], Szeged [28], Guangzhou [29], Kaohsiung [30], Guwahati [31], Augsburg [32], Nanjing [33,34], Rotterdam [35] or Berlin [36]. However, these urban networks are expensive to deploy and maintain, so their use is usually constrained in time and space, limiting their suitability for long-term studies.

Other sources of experimental data might also present important drawbacks. Citizen Weather Stations (CWS) have grown exponentially in recent years [37,38] and are being used in a variety of ways, from studying intra-urban temperature patterns [39] to complementing weather forecasts [40]. However, they require sophisticated filtering techniques and quality control procedures to manage their calibration bias, instrument errors and representativeness issues [41,42]. Mobile measurements, another widely adopted practice for studying the spatial distribution of the UHI in detail, have expanded in recent years from the traditional approach of car transects [43–45] to bicycle transects [46–49] or even drone transects [50]. Despite their versatility, mobile measurements are very demanding in terms of human resources and can hardly be used to obtain time series at a fine scale (i.e., hourly). The latter is also one of the main drawbacks of remote sensing techniques, which depend on the timing of the satellite overpass and require post-processing to address the presence of clouds and limited view angles [51].

#### *1.1. Data-Driven Approaches for Modelling Outdoor Urban Temperatures*

A widespread alternative technique for obtaining reliable and affordable long-term datasets of urban air temperatures is the development of empirical models. These models use pre-existing statistical correlations among available data to generate accurate projections without compromising their computational efficiency. Consequently, these data-driven approaches represent bespoke alternatives to more complex numerical models.

Several algorithms can be used for this purpose. A widely used technique for modelling urban temperatures is Multiple Linear Regression (MLR), which has been tested for both temporal [52–55] and spatial predictions [56–60]. However, the increasing availability of machine learning and big data solutions is boosting the use of other algorithms which, although potentially harder to interpret, are likely to improve accuracy. Popular machine learning techniques include Support Vector Machines [61–64], Random Forest [58,60,62,65,66], and Artificial Neural Networks (ANN).

ANNs appear to be the most popular approach for modelling the hourly evolution of outdoor urban temperatures. To the authors' knowledge, Mihalakakou et al. [67] presented the first attempt to model the outdoor temperature at an urban site using ANNs. They used the dry-bulb temperature data available from two existing meteorological stations in Athens: one located within the city (the target), and one at the outskirts (the reference site). In a follow-up study, the model was adapted for other urban sites in the same city, where they deployed a network of 23 temperature sensors across the city for 2 years [68,69].

In these early attempts to model urban temperatures using ANNs, the authors only used the air temperature from the reference site as the input. However, other researchers have explored the inclusion of additional predictors to increase model performance. The most common ones are meteorological parameters linked with the UHI formation. Kim and Baik [70], for example, used the maximum UHI intensity of the previous day in Seoul together with wind speed, cloud cover, and relative humidity. In London, Kolokotroni et al. [71–73] used hourly air temperature, relative humidity, wind speed, cloud cover and global solar radiation. More recently, in Ontario, Demirezen et al. [74,75] used the air temperature, humidity, solar radiation, wind speed and wind direction. Other researchers have also included a time reference as an input to better capture the hourly evolution of urban temperatures. For example, Gobakis et al. [24] and Papantoniou and Kolokotsa [76] used the date in conjunction with air temperature and global solar radiation. Similarly, Heijden et al. [35] and Erdemir and Ayata [77] used the hour of the day together with other meteorological parameters. Table 1 summarizes these and other ANN studies that focused on outdoor urban temperatures and their modelling characteristics, such as the length of their datasets.

**Table 1.** Previous studies using ANN to model the outdoor air temperature in urban areas, in chronological order.


a ISO Country codes [87]. b Output of the ANN model, as declared or shown by the authors. 1 Extends further from the limits of the city, covering the surrounding regional areas. 2 Includes other cities of the same country. 3 Year not specified.

In most of these studies, the modelling of outdoor urban air temperature time series is addressed from a common perspective: using the temperatures collected during a monitoring campaign at the urban level to train a Feed-forward Neural Network (FNN, a relatively simple type of ANN). This modelling is usually performed using data from one or several reference points, in many cases well-established meteorological observatories providing detailed and robust information on a wide range of parameters. Although this process is widespread, it is open to discussion whether other ANN topologies might be more suitable for this purpose. Cascade Neural Networks (CNN) or Elman Neural Networks (ENN) have also been widely applied [24,72,76], the latter being simplified versions of Recurrent Neural Networks (RNN). RNNs have proved to be very effective when it comes to making forecasts, especially when Long Short-Term Memory (LSTM) is used [88]. In that sense, the work of Han et al. [86] has recently demonstrated the superiority of RNNs over FNNs for predicting outdoor urban temperatures.

However, it should be noted that the aim of most of these studies is not to make time predictions or forecasts, but to model an urban time series from a preexisting one. In other words, the purpose is to obtain an adapted version of a reference time series that already exists, with the new time series being representative of a certain urban area and covering exactly the same period as the data used as a reference. This simplifies the process by eliminating the time dependence of the outputs, which justifies working with simpler neural networks, such as FNNs. In fact, under this modelling scenario, Kolokotroni et al. [72] did not find any improvement when comparing ENNs and CNNs with FNNs.

Although empirical models are site-specific (predictions are always made for a particular urban location), they can be used to extend the temporal coverage of urban monitoring campaigns, thus potentially increasing their utility for other disciplines. Although FNN-based models are not suitable for future projections, they are certainly useful for adapting historical records obtained outside the city to the reality of urban areas. However, there is currently a knowledge gap with regard to the amount of input data needed to accurately model urban temperature time series using FNNs. Collecting experimental data is very time-consuming and resource-intensive and, while it seems a common practice to rely on one whole year of data for the training, there is no evidence that this should be a minimum requirement. This study, therefore, aims to quantify the degree to which the amount of input data needed to train FNNs can be reduced without sacrificing their accuracy. We also explore the use of the UHI intensity as an alternative output of the FNN models, instead of directly targeting the air temperature, to test the hypothesis that its lower seasonality and direct association with the input variables might help reduce the amount of data required for the training phase.

The present research is structured in three phases. First, we compared the performance of more than 5000 different FNN configurations for modelling the outdoor urban temperature (TEMP approach) and the UHI intensity (UHII approach) when trained with 12 months of data in the city of Madrid. Second, an optimal configuration was selected and analysed in greater depth for both approaches, including their sensitivity to input parameters. Finally, the amount of data provided during the training phase was reduced from the initial 12 months to 9, 6 and 3 months to evaluate the capacity of these models to continue producing accurate results with fewer input data.

#### **2. Materials and Methods**

#### *2.1. Study Area: The City of Madrid*

The present study focuses on the city of Madrid. Due to its size, location and climatic conditions, Madrid is characterised by a strong UHI, with nighttime UHI intensities up to 10 °C during calm and clear nights. During the last decades, this phenomenon has been intensively studied in the city by means of on-site measurements [89–92], remote sensing [93,94] and numerical models [95,96].

Between 2016 and 2019, a continuous monitoring campaign was carried out at 20 fixed urban sites with the aim of studying the temporal patterns of the UHI in Madrid [97]. In the present study, we use part of that experimental data to define the outputs of our ANN models. More specifically, we use the hourly dry-bulb temperature gathered at the city centre site (Embajadores, see Figure 1), which is classified as compact midrise (LCZ 2) according to the Local Climate Zones (LCZ) scheme [98] and registered the highest mean and nighttime UHI intensity. The data available for this study cover the period from July 2016 to September 2018 on an hourly basis (800 days or 19,200 h, in total).

All sensors used in this monitoring campaign were protected from the rain and solar radiation using a custom-made, mechanically ventilated radiation shield. They were installed in the Urban Canopy Layer (UCL) at 5–6 m above the ground, following the guidelines of the World Meteorological Organization (WMO) for urban sites [99,100]. The location of each sensor was also studied in terms of its thermal source area [101]. In that sense, the representativeness of each sensor was appraised in terms of its surroundings' homogeneity [102,103].

Quality Control (QC) procedures were also applied, consisting of a plausible value check, a time consistency check, and an internal consistency check [104]. This analysis was complemented by a spatial consistency check [105], which analysed whether the difference between a measurement and its surroundings was too large compared to the average. For the City Centre sensor, 126 records were flagged as suspect and just three as erroneous. In addition, 72 missing values were identified due to a recording failure between the 17th and the 20th of October 2017. Both erroneous and missing values were left blank in the analysed dataset. Further details about the monitoring campaign and QC procedures can be found in [97].

In addition to the experimental data collected at the city centre, records from the nearby meteorological stations of Barajas Airport (LCZ D) and Ciudad Universitaria (LCZ 9) were used. Hourly values of dry bulb temperature, relative humidity, wind speed, wind direction and precipitation were extracted from the former, while global solar radiation was obtained from the latter. The data covered the same time period (July 2016–September 2018). Both stations are managed by the Spanish Meteorological Agency (AEMET), which complies with the requirements established by the WMO Integrated Global Observing System (WIGOS, [106,107]) regarding QC and sensor installation.

Three different types of datasets were created: training, validation and test datasets. The first two were used to fit and evaluate different ANN model configurations. Several training and validation datasets, which varied in length (12, 9, 6 and 3 months) and in the months that they covered, were created based on almost 15 months of monitoring (July 2016–September 2017, 10,440 records/hourly measurements). All these datasets were continuous over time, and each was divided into 80% training and 20% validation. These training and validation subsets were created by randomly sampling the data. This prevented the potential accumulation of specific events in any of these datasets (e.g., certain meteorological conditions), which could bias either the training or the validation of the models. Additionally, a test dataset was created based on the second year of recorded data (October 2017–September 2018, 8688 records/hourly measurements) to independently test the models and assess their accuracy over an entirely different year.
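The random 80/20 division described above can be illustrated with a minimal sketch. It assumes the records are held in a plain Python list; the function name `split_train_validation` and the fixed `seed` are hypothetical choices made here for reproducibility, not details taken from the study.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Randomly assign records to training and validation subsets.

    `seed` is an assumption added only to make the sketch reproducible.
    """
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    # Sorting restores chronological order inside each subset.
    train = [records[i] for i in sorted(indices[:cut])]
    validation = [records[i] for i in sorted(indices[cut:])]
    return train, validation


# Placeholder hourly records standing in for the 10,440 measurements.
hourly_records = list(range(10440))
train, validation = split_train_validation(hourly_records)
print(len(train), len(validation))
```

Because the hours are sampled at random rather than taken as consecutive blocks, a spell of unusual weather is spread over both subsets instead of landing entirely in one of them.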

#### *2.2. Designing the ANNs*

Feed-forward Neural Networks (FNN) were used in this study. Although FNNs are at the baseline of supervised deep neural networks, their utility for modelling urban temperatures has been widely demonstrated in previous studies (see Section 1.1). Figure 2 outlines the two different approaches, based on two different outputs, that were adopted in this study to model urban temperatures. The first one consisted of directly targeting the air temperature at the urban site, validating its outputs with the measurements previously recorded at that location. This approach is aligned with the majority of similar studies found in the literature, and it is referred to in this study as the temperature approach (TEMP approach). The second option aims at modelling the urban air temperature indirectly. In this case, the model targets the UHI intensity instead, computed as the temperature difference between the urban site (*Embajadores*) and the reference location (*Barajas Airport*), $\Delta T_{\mathrm{LCZ\,2,\,LCZ\,D}}$. The urban temperature is then derived indirectly by adding the airport temperature to the output of the model. This will be referred to as the UHII approach from this point onwards.
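The relationship between the two targets can be made concrete with a short sketch. The function names and the temperature values below are illustrative assumptions; only the arithmetic (urban minus reference for the target, reference plus model output for the reconstruction) comes from the description above.

```python
def uhi_intensity(t_urban, t_reference):
    """UHII target: urban temperature minus reference temperature, hour by hour."""
    return [u - r for u, r in zip(t_urban, t_reference)]


def urban_temperature_from_uhii(uhii_modelled, t_reference):
    """Recover the urban temperature by adding the reference temperature back."""
    return [d + r for d, r in zip(uhii_modelled, t_reference)]


# Illustrative values in degrees Celsius (not measurements from the study).
t_embajadores = [25.0, 22.5, 30.0]
t_barajas = [22.0, 18.5, 29.5]

uhii = uhi_intensity(t_embajadores, t_barajas)
reconstructed = urban_temperature_from_uhii(uhii, t_barajas)
print(uhii, reconstructed)
```

With a perfect model the reconstruction matches the measured urban series exactly, so both approaches can be scored against the same observations.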

**Figure 1.** Distribution of the MODIFICA and AEMET networks across the city of Madrid. The data needed for the training, validation and test of the ANN model was extracted from the measurement sites in black. The classification of Madrid by Local Climate Zones, extracted from the WUDAPT database [108], is presented in the background.

**Figure 2.** Schematic representation of the two approaches used for modelling the outdoor urban temperature.

The selection of the FNN model inputs in this study was informed by the previous studies in Table 1, which have identified the variables that correlate strongly with the formation of heat islands [109,110]. The inputs consist of six meteorological variables: dry bulb temperature (°C), relative humidity (%), precipitation (mm), wind direction (degrees), wind speed (m/s) and global solar radiation (J/m²). The time of the day was added to these six input parameters and was expected to reflect the daily variability of the outputs, either the temperature or the UHI intensity. Cloud cover was not used as an input parameter because its available frequency (one record every eight hours) was incompatible with the hourly frequency of the outputs. The wind speed presented strong variations at an hourly level and introduced strong oscillations in the prediction. Thus, to help avoid abrupt changes in the output, a moving average (MA) filter was applied to it. The use of an MA filter is a common pre-processing technique when modelling time series from data with high variability. Examples can be found in the fields of urban traffic (applying an MA to the car's acceleration [111]), atmospheric pollution (MA applied to measured PM2.5 concentration [112]) and urban climate modelling [113], the latter using an MA of order 8 (i.e., 8 h) to reduce the presence of wind gust peaks in the dataset prior to feeding the model. In this study, an MA of order 4 (4 h) was found to be sufficient to reduce the noise of the wind speed while preserving the time series trend.
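As an illustration of the filtering step, the sketch below applies a moving average of order 4 to a short wind speed series. The text does not state whether the window was trailing or centred, nor how the first hours were handled, so the trailing window and the shortened window at the start of the series are assumptions, as are the sample values.

```python
def moving_average(values, order=4):
    """Trailing moving average; the window is shortened at the start of the series."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - order + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Illustrative hourly wind speeds in m/s (not data from the study).
wind_speed = [2.0, 6.0, 1.0, 7.0, 3.0, 9.0]
wind_speed_smoothed = moving_average(wind_speed, order=4)
print(wind_speed_smoothed)  # [2.0, 4.0, 3.0, 4.0, 4.25, 5.0]
```

The hour-to-hour jumps in the raw series (for example from 1.0 to 7.0 m/s) are reduced to much smaller steps, while the gradual rise towards the end of the series is preserved.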

All the inputs were standardized prior to being fed to the FNN, meaning that all variables were transformed to have a mean of 0 and a standard deviation of 1 [114,115]. A diagram of the FNN structure for both approaches can be seen in Figure 3.
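A minimal sketch of this standardization is given below. It assumes that the mean and standard deviation are estimated on the training data and then reused for any other dataset, and that the population standard deviation is used; neither detail is stated in the text, and the function names and sample values are hypothetical.

```python
import statistics


def fit_standardizer(training_values):
    """Estimate mean and (population) standard deviation from training data."""
    return statistics.fmean(training_values), statistics.pstdev(training_values)


def standardize(values, mean, std):
    """Transform values to z-scores using previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Illustrative relative humidity values in % (not data from the study).
humidity_train = [40.0, 50.0, 60.0, 70.0]
mean, std = fit_standardizer(humidity_train)
humidity_z = standardize(humidity_train, mean, std)
print(statistics.fmean(humidity_z), statistics.pstdev(humidity_z))
```

Reusing the training statistics for validation and test data keeps every dataset on the same scale and avoids leaking information from the test year into the preprocessing.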

**Figure 3.** Base structure of the Feed-forward Neural Network (FNN) used in this study.

#### *2.3. Comparing and Evaluating the FNNs*

Several FNN structures with different configurations were trained during the first phase of this research. Hyperparameters, such as the number of neurons per hidden layer, the activation functions, or the number of epochs, were thoroughly iterated in order to find a common, optimal configuration for both the TEMP and the UHII approach. Although some of the tested activation functions are commonly applied to classification tasks and were not likely to give the best performance (i.e., sigmoid-like functions), they were included in the iterative process since similar preceding works made use of them [24,67]. To streamline the process and reduce the complexity of the iteration, each subsequent hidden layer adopted half the neurons of the previous one. All models initialized their weights randomly and were initially trained using 12 months of data. Configurations were compared by iterating just one parameter (e.g., the activation functions) and leaving the others fixed, while increasing the number of neurons per hidden layer. Those parameters that reached the best overall accuracy with the lowest number of neurons were selected. In total, 5478 FNNs were trained during this iterative process. Table 2 summarizes the parameters used to test these configurations, as well as the ones that were finally selected. The task outlined above was performed using Python and Keras, a deep-learning library based on TensorFlow [116,117].
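The halving rule for the hidden layers can be sketched as a small helper. The function name, the example of 64 neurons in three layers, and the use of integer division for odd counts are assumptions made for illustration; the configurations actually tested are those listed in Table 2.

```python
def hidden_layer_sizes(first_layer_neurons, n_hidden_layers):
    """Return neurons per hidden layer, halving the count at each subsequent layer.

    Integer division (and a floor of one neuron) is an assumption for odd counts.
    """
    sizes = []
    neurons = first_layer_neurons
    for _ in range(n_hidden_layers):
        sizes.append(max(1, neurons))
        neurons //= 2
    return sizes


# Hypothetical configuration: 64 neurons in the first of three hidden layers.
layer_sizes = hidden_layer_sizes(64, 3)
print(layer_sizes)  # [64, 32, 16]
```

Tying every layer to the size of the first one means a configuration is described by just two numbers, which keeps the search space manageable. In a Keras model, each value could then be used as the `units` argument of a `Dense` layer.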

Once a common structure and configuration were defined, a comparative analysis of these models was carried out. First, the contribution of each input to the model output was assessed using a sensitivity analysis [114,118,119]. The 5th, 25th, 50th, 75th and 95th percentiles were used to run the sensitivity analysis for each input, while fixing the rest at their means. The time of the day was excluded from the sensitivity analysis and fixed at two different moments: noon and midnight. Next, the overall accuracy of the TEMP and the UHII approach was compared using several error metrics, such as the root mean squared error (RMSE), the median absolute deviation (MAD) or the coefficient of determination (*R*²). Modelled results were then plotted for three different weeks to visually assess whether the modelling ability of either of these two approaches could be compromised under certain scenarios. These corresponded to a week of high atmospheric stability (and thus, strong UHI intensity), a week of high atmospheric instability (weak UHI intensity), and a week under both of these conditions.
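Two of the error metrics named above can be sketched as follows. The text does not give their exact formulas, so the definition of the coefficient of determination as one minus the ratio of residual to total sum of squares is an assumption, and the sample values are illustrative.

```python
import math


def rmse(observed, modelled):
    """Root mean squared error between observed and modelled values."""
    squared = [(o - m) ** 2 for o, m in zip(observed, modelled)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, modelled):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative temperatures in degrees Celsius (not results from the study).
observed = [20.0, 22.0, 24.0, 26.0]
modelled = [21.0, 22.0, 23.0, 26.0]
print(rmse(observed, modelled), r_squared(observed, modelled))
```

For these values the residual sum of squares is 2 and the total sum of squares is 20, giving an RMSE of about 0.71 °C and a coefficient of determination of 0.9.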


**Table 2.** Parameters used to train and evaluate different FNN configurations, including the configuration that was finally selected for both the TEMP and the UHII approach.

1 The value here presented corresponds to the number of neurons contained in the first hidden layer. Each subsequent hidden layer adopts half of the value of the previous one. 2 Maximum length of the dataset. Results with shorter lengths are also presented (Table 4).

The last step of the evaluation process consisted of modifying the amount of data provided to the neural networks during the training phase. To this end, FNN models for both the TEMP and the UHII approach were trained using 12, 9, 6 and 3 months of data, and were used to model the outdoor air temperatures for one complete year using the test dataset. The accuracy was estimated, as in the previous cases, using common error metrics. The loss of accuracy of the models trained with shorter datasets was assessed by comparing their performance with that of the models trained on more data, obtaining a percentage indicating the increase in error for each metric. In the case of models trained with just 3 months of data, the Mean Absolute Error (MAE) was estimated on a monthly basis to further explore its distribution over one year of modelling.
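The two comparisons described in this last step can be sketched as below. The exact expression used for the percentage is not given in the text, so the relative increase with respect to the reference model is an assumption, as are the function names and all numerical values.

```python
from collections import defaultdict


def error_increase_percent(error_short, error_reference):
    """Relative increase in an error metric with respect to a reference model."""
    return 100.0 * (error_short - error_reference) / error_reference


def monthly_mae(months, observed, modelled):
    """Mean absolute error grouped by month label."""
    totals = defaultdict(lambda: [0.0, 0])
    for month, o, m in zip(months, observed, modelled):
        totals[month][0] += abs(o - m)
        totals[month][1] += 1
    return {month: total / count for month, (total, count) in totals.items()}


# Hypothetical RMSE values for models trained with 12 and 3 months of data.
increase = error_increase_percent(error_short=1.25, error_reference=1.0)

# Hypothetical hourly values tagged with their month number.
mae_by_month = monthly_mae(
    months=[1, 1, 2, 2],
    observed=[10.0, 12.0, 20.0, 22.0],
    modelled=[11.0, 12.0, 18.0, 22.0],
)
print(increase, mae_by_month)
```

Expressing the loss of accuracy as a percentage makes metrics with different units comparable, while the monthly breakdown shows whether the error concentrates in the months that were absent from a short training dataset.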
