1. Introduction
The COVID-19 (coronavirus disease 2019) pandemic, an infectious disease caused by the highly transmissible coronavirus SARS-CoV-2 (Severe Acute Respiratory Syndrome—Coronavirus-2), was first detected in December 2019 in the People’s Republic of China [
1]. In Brazil, at the national level, measures such as social distancing, lockdowns, and extensive testing to track the disease were severely hampered by political bias. The number of deaths and people infected by COVID-19 increased steadily in the first months of the pandemic [
2], and the rapid spread of the virus triggered heterogeneous health and social repercussions across Brazil’s states and municipalities [
3].
São Paulo, Brazil’s most populous state (45 million inhabitants), was severely affected by COVID-19, leading the state government to declare the closure of businesses, schools, and other non-essential services. Initially, the virus spread rapidly in the capital, also called São Paulo, and its metropolitan region, where the first case of the disease was identified in Brazil. Although the capital was a hotspot for the disease, with the highest number of cases at the beginning of the pandemic, cases quickly spread to neighboring municipalities and the interior of the state, which had the highest rate in the country during this period [
3,
4].
Without a country-wide policy, some measures to restrict the movement of people were adopted by states and others by municipalities, which generated different epidemiological scenarios. The strategies adopted to control the COVID-19 pandemic in the Metropolitan Region of São Paulo (MRSP), such as social isolation, were sometimes established independently by the municipalities that comprise it, without taking into account territorial specificities and the links between them. The MRSP, with 20 million inhabitants, is the largest metropolitan area in the country and is characterized by a high level of interdependence between the municipalities and a high circulation density in the public transportation system, with millions of commuters using buses, trains, and the subway daily [5]. Jobs, businesses, and economic opportunities are highly concentrated in the capital, the center of the MRSP, where a higher-income population lives. The combination of a high degree of internal circulation density with exclusively municipal restrictions, which did not consider this metropolitan area’s territorial dynamics, may have contributed to the ineffectiveness in controlling the spread of the disease.
The inclusion of the territorial dimension as a non-pharmacological action to combat the COVID-19 pandemic is still essential today, given the risk of the emergence and re-emergence of diseases. Spatial analysis of the spread of the disease through geographic information systems (GIS) not only contributes to the elaboration of territorial scenarios that could have guided and enabled strategic actions but can also guide current policies that mitigate the effects of socio-territorial inequalities on health [6]. Therefore, this study sought to identify areas with a high and low risk of incidence and mortality from COVID-19 throughout the pandemic period, from 2020 to 2022, in the MRSP, analyzing their relationship with socioeconomic and demographic variables and considering the urban territorial dynamics of a metropolis. Our hypothesis is that the use, by inhabitants of peripheral regions, of a congested public transportation system that had no protection measures in place may have influenced the dynamics of contamination of the local population through community transmission of COVID-19.
Through the development of geoinformation technologies, detailed analyses and visualizations of the propagation patterns of the COVID-19 pandemic still play an important role in understanding spatial clusters and trends in SARS-CoV-2 transmission. This article used the free software SaTScan (an abbreviation for Space and Time Scan Statistics), which can detect increased disease activity without a priori specification of the time period, geographic location, or size. SaTScan is a recognized surveillance tool, and during the pandemic the detection of “active” and “emerging” spatiotemporal clusters of COVID-19 in Brazil was mainly carried out at the municipal scale [4,7,8]. One of these studies combined prospective spatiotemporal scanning analysis of the disease with a Generalized Linear Model (GLM) to assess whether the mortality rate, the Gini index, and social inequality were predictors of the relative risk of each cluster among Brazilian municipalities [8]. Our study is the first in Brazil to detect spatial–temporal clusters of COVID-19 cases and deaths on a more detailed scale, in areas of the São Paulo Metropolitan Region, and to assess the socioeconomic and demographic differences between them throughout the pandemic period. Although this is a retrospective study, we seek to disseminate a viable method of health surveillance that can be carried out during health emergencies and that considers the particularities of a metropolitan urban territory in local decision-making, where regional dynamics are sometimes disregarded.
3. Materials and Methods
This is an ecological and descriptive study assessing secondary data about the incidence and mortality of COVID-19 in the 633 weighting areas of the 39 municipalities that comprise the MRSP. Weighting areas (WA) are territorial units identified by sets of contiguous census sectors belonging to the same district, for the purpose of weighting the results of the population census sample questionnaire. A census sector is a territorial unit established for survey control purposes, comprising a continuous area located in a single urban or rural block, with a size and number of households that allow the survey by a census agent [
10].
Information from the period March 2020 to February 2022 about the date of notification, sex, age, disease progression, and postal code of each patient with COVID-19 who recovered or died was accessed through a partnership with the Data Center of the State of São Paulo (CDESP), which provided data from the Epidemiological Surveillance System (SIVEP-Gripe) of the State Epidemiological Surveillance Center. These data were grouped by postal code (first five digits of the postal code) and georeferenced using the postal code database of the Centro de Estudos da Metrópole (Center for Metropolitan Studies) [
11]. Linear geometries of the postal code grouping system were intersected with the weighting areas from the 2010 IBGE Census, with cases assigned proportionally to the length of the intersected lines. This study was approved by the Research Ethics Committee of the School of Psychology, Universidade de São Paulo, report number CAAE: 71605223.2.0000.5561, 14 August 2023.
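The length-proportional assignment described above can be illustrated with a minimal Python sketch. It assumes the geometric intersection has already been computed in a GIS, so the input is simply a list of line pieces with their lengths; the function name, the tuple layout, and the sample values are hypothetical and are not taken from the study's own workflow.

```python
from collections import defaultdict


def allocate_cases_by_length(cases_by_postal, segments):
    """Split postal-code case counts among weighting areas (WAs).

    cases_by_postal: {postal_prefix: case_count}
    segments: iterable of (postal_prefix, wa_id, length) tuples, one per piece
              of postal-code line geometry that falls inside a WA.
    Each WA receives a share of the cases equal to its share of the total
    line length of that postal code.
    """
    total_length = defaultdict(float)
    for postal, _wa, length in segments:
        total_length[postal] += length

    cases_by_wa = defaultdict(float)
    for postal, wa, length in segments:
        if total_length[postal] > 0:
            share = length / total_length[postal]
            cases_by_wa[wa] += cases_by_postal.get(postal, 0) * share
    return dict(cases_by_wa)


# Invented example: postal prefix "01310" crosses two WAs, "04567" lies in one.
segments = [
    ("01310", "WA_A", 300.0),
    ("01310", "WA_B", 100.0),
    ("04567", "WA_B", 250.0),
]
cases = {"01310": 40, "04567": 10}
allocated = allocate_cases_by_length(cases, segments)
```

In this toy input, three quarters of the line length of the first postal code lies in WA_A, so WA_A receives 30 of its 40 cases, and the total number of cases is preserved after allocation.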
The socioeconomic variables—per capita income, persons per household, and percentage of Black, Brown (mixed-race), and Indigenous people (BBIP)—by WA were built based on data from the Brazilian Institute of Geography and Statistics (IBGE), according to the 2010 census, the most recent census available. The variables were selected according to bibliographic references that analyzed the relationship between COVID-19 spread and socioeconomic factors [
6,
7,
8].
Population density was analyzed with dasymetric mapping techniques, which subdivide areas of origin into smaller spatial units so that there is greater internal consistency of the variable being mapped [
12]. In this study, the variable of population density was calculated by dividing the number of inhabitants in WA by the total area built for residential purposes in that area. This analysis used Google Open Buildings, a large-scale open dataset that contains the vectorization of building roof contours generated from a deep learning model that was trained to determine building areas from high-resolution satellite images. Data are available under the Creative Commons Attribution license (CC BY-4.0) and the Open Data Commons Open Database License (ODbL) v1.0 [
13].
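As a rough illustration of the density variable, the Python sketch below divides the population of a WA by the summed footprint area of its residential buildings. It assumes that the building footprints have already been clipped to the WA and filtered to residential use upstream; the function name, the units (square meters), and the numbers are illustrative assumptions only.

```python
def residential_density(population, footprint_areas_m2):
    """Inhabitants per square meter of residential built area in one WA.

    population: number of inhabitants in the WA.
    footprint_areas_m2: areas of the residential building footprints inside
                        the WA (assumed to be pre-filtered and pre-clipped).
    Returns None when there is no built area, to avoid dividing by zero.
    """
    built_area = sum(footprint_areas_m2)
    if built_area <= 0:
        return None
    return population / built_area


# Invented example: 5000 inhabitants over 100,000 m2 of residential footprint.
density = residential_density(5000, [40000.0, 60000.0])
empty_density = residential_density(5000, [])
```

Using built area rather than the total area of the WA as the denominator keeps parks, water bodies, and industrial land from diluting the density estimate, which is the point of the dasymetric approach.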
The analyses were performed using the incidence and mortality rates for COVID-19 obtained for the 633 weighting areas of the 39 municipalities of the MRSP from March 2020 to February 2021 (Year 1) and from March 2021 to February 2022 (Year 2). To detect spatiotemporal clusters, the SaTScan v10.0 software was used, which uses a scanning window that varies in both space and time. This window moves across different geographic regions and periods to identify where there is an anomalous concentration of the event [14]. Thus, the scanning window is an interval in time, a circle or an ellipse in space, or a cylinder with a circular or elliptical base in space–time, as in our study, in which multiple different window sizes were used. The Poisson probability distribution model was used, which counts cases and deaths in space and time [15]. The cluster analysis model was built with the following conditions: COVID-19 cases and deaths were grouped by month, without cluster overlap, with circular clusters; the proportion of the population considered was 10% for the spatial scanning window, calculated by the Gini index in SaTScan for purely spatial analysis. This option encourages the search for smaller true clusters and can be characterized as a coefficient of population inequality [14]. We also calculated the RR (relative risk) of COVID-19 occurrence and mortality, considering each WA and clusters in relation to the surrounding areas.
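To make the relative risk concrete, the sketch below follows the definition that, to our understanding, the SaTScan documentation gives for the Poisson model: the ratio of observed to expected cases inside the cluster divided by the same ratio outside it. The function name and the sample counts are hypothetical.

```python
def relative_risk(observed_in, expected_in, total_cases):
    """Relative risk of a cluster under the Poisson model.

    observed_in: cases observed inside the cluster (c).
    expected_in: cases expected inside the cluster under the null (E[c]).
    total_cases: total cases in the study area (C); under the null model the
                 expected counts over the whole area also sum to C.
    RR = (c / E[c]) / ((C - c) / (C - E[c]))
    """
    observed_out = total_cases - observed_in
    expected_out = total_cases - expected_in
    if expected_in <= 0 or expected_out <= 0:
        raise ValueError("expected counts must be positive inside and outside")
    if observed_out == 0:
        return float("inf")  # every case falls inside the cluster
    return (observed_in / expected_in) / (observed_out / expected_out)


# Invented example: 300 observed vs. 100 expected cases, out of 1000 in total.
rr = relative_risk(300, 100.0, 1000)
```

An RR above 1 indicates that the risk inside the window is higher than in the rest of the study area, and an RR below 1 indicates a low-risk cluster.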
In SaTScan, the expected number of cases is estimated based on the spatial and temporal distribution of the population. In our study, no population adjustment was necessary, since the population did not vary substantially in the territory and period analyzed. However, the rates were adjusted for sex and age, as they are potential confounders for the outcome analyzed. Thus, the software calculated the expected cases in each location, taking into account the expected cases and deaths in each demographic group. This means that the expected number of cases was adjusted to reflect the age and sex structure of the population in each location, by comparing the proportion of observed cases with those that would be expected for that demographic composition. In the case of COVID-19, the risk of catching the disease was higher among the elderly, and the software adjusted the expected cases to take this into account when a location had a predominantly elderly population. This type of adjustment ensures that the clusters identified reflect a real risk, and not just differences in the demographic structure of the population [14,15].
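The covariate adjustment can be sketched as indirect standardization: a study-wide rate is computed for each demographic stratum and applied to the population of that stratum in every location. The Python example below is a simplified illustration with two invented age strata; the function name, the data layout, and the numbers are assumptions and do not reproduce SaTScan's internal code.

```python
from collections import defaultdict


def expected_cases(pop_by_location, cases_by_stratum):
    """Covariate-adjusted expected cases per location.

    pop_by_location: {location: {stratum: population}}
    cases_by_stratum: {stratum: total cases in the whole study area}
    For each stratum (for example a sex and age group), the study-wide rate
    is applied to the local population of that stratum, and the results are
    summed over strata.
    """
    total_pop = defaultdict(float)
    for strata in pop_by_location.values():
        for stratum, pop in strata.items():
            total_pop[stratum] += pop

    rate = {
        stratum: cases_by_stratum.get(stratum, 0) / pop
        for stratum, pop in total_pop.items()
        if pop > 0
    }
    return {
        location: sum(pop * rate.get(stratum, 0.0) for stratum, pop in strata.items())
        for location, strata in pop_by_location.items()
    }


# Invented example: location B has a much older population than location A.
population = {
    "A": {"under_60": 900, "60_plus": 100},
    "B": {"under_60": 500, "60_plus": 500},
}
cases = {"under_60": 14, "60_plus": 60}
expected = expected_cases(population, cases)
```

Both locations have 1000 inhabitants, but B is expected to have far more cases simply because of its age structure, so an excess in B is flagged only if it goes beyond what that structure already explains.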
Statistical tests were calculated using the likelihood ratio. The null hypothesis (H0) is that the observed number of cases is the same as the expected number. The alternative hypothesis (H1) is that the number of observed cases and deaths exceeds the expected number of cases derived from the null model. The window with the maximum likelihood is the most likely cluster, meaning that the observed data are more likely under the hypothesis that a cluster exists, indicating a possible focus of concentration of cases. SaTScan uses this measure to assess whether the number of events in a region or period is higher than expected, taking into account the overall incidence rate outside the scanning window. The likelihood ratio therefore serves as the criterion for identifying where and when clusters are present, with the cluster with the highest likelihood ratio being identified as the most likely. A p-value is assigned to the cluster. Results with a p-value < 0.05 using 999 Monte Carlo simulations were considered significant [15].
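A minimal Python sketch of the two quantities involved is given here: the Poisson log-likelihood ratio of a single window when scanning for high rates, and the rank-based Monte Carlo p-value. Generating the simulated data sets and scanning all candidate windows is left out; the function names and numbers are illustrative assumptions rather than SaTScan's implementation.

```python
import math


def poisson_llr(observed_in, expected_in, total_cases):
    """Log-likelihood ratio of one window when scanning for high rates.

    Returns 0 when the window has no excess of cases (c <= E[c]).
    LLR = c*ln(c/E[c]) + (C - c)*ln((C - c)/(C - E[c]))
    """
    c, e, total = observed_in, expected_in, total_cases
    if e <= 0 or c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the second term vanishes when all cases are inside
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def monte_carlo_p_value(observed_llr, simulated_max_llrs):
    """Rank of the observed statistic among the simulated maxima."""
    at_least = sum(1 for s in simulated_max_llrs if s >= observed_llr)
    return (at_least + 1) / (len(simulated_max_llrs) + 1)


# Invented example: 300 observed vs. 100 expected cases, out of 1000 in total.
llr = poisson_llr(300, 100.0, 1000)
# With 999 replications and no simulated maximum reaching the observed value,
# the smallest attainable p-value is 1/1000.
p_value = monte_carlo_p_value(llr, [1.0] * 999)
```

Because the observed data set is counted together with the 999 simulated ones, the p-value can never be smaller than 0.001, and a cluster is significant at the 5% level when fewer than 50 simulated maxima reach the observed statistic.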
After identifying the space–time clusters, we statistically compared the values of the demographic and socioeconomic variables between the group of WAs belonging to high-mortality clusters and the group belonging to low-mortality clusters, and among the high-incidence clusters. Because the data were not normally distributed, the groups were compared using the Mann–Whitney and Kruskal–Wallis non-parametric tests. The null hypothesis was that the medians and interquartile ranges for the same variable were equal, with a significance level of 5%.
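For readers unfamiliar with the two-group comparison, the sketch below computes the Mann–Whitney U statistic from mid-ranks and a two-sided p-value from the plain normal approximation, without tie or continuity corrections. It is a teaching sketch in Python rather than the R routine used in the study, and the income values are invented.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, approximate two-sided p-value)."""
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign mid-ranks so that tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_v, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Normal approximation of U under the null hypothesis (no tie correction).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p


# Invented per capita incomes of WAs in a high- and a low-mortality cluster.
income_high_mortality = [620.0, 710.0, 580.0, 690.0]
income_low_mortality = [1850.0, 2400.0, 1990.0, 3100.0]
u_stat, p_approx = mann_whitney_u(income_high_mortality, income_low_mortality)
```

With real data, an exact or tie-corrected test from a statistical package is preferable; the Kruskal–Wallis test extends the same rank-based idea to three or more groups.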
SaTScan™ version 10.0.1 (Kulldorff, Harvard Medical School, Boston, MA, USA), which uses geographical coordinates [14], was used to identify cases grouped in space–time and time. Maps with significant clusters and their relative risks from the space–time analyses were generated in QGIS 3.28. Temporal trends were obtained in SaTScan. The significance level was set at p = 0.05. R 4.3.2 for Mac was used for database manipulation and statistical analysis.
5. Discussion
Using the multidimensional point scanning method, we identified temporal and spatiotemporal clusters of case and death notifications that demonstrated that the spread of the COVID-19 pandemic did not occur randomly or homogeneously in the MRSP. Based on surveillance data, we found that a spatiotemporal pattern of incidence and risk of death from COVID-19 during the pandemic was related to social and demographic factors and to the insertion of specific locations in the dynamics of metropolitan circulation of people and goods. The significant socioeconomic differences between the clusters express that in addition to sex, age, and comorbidities, widely discussed in the literature as mortality risk variables in relation to COVID-19, social determinants and territorial relations are also variables that can explain such an impact [
4,
7,
8].
In the purely temporal analysis, four notable moments were identified during the two periods analyzed. In the first half of both 2020 and 2021, high-incidence clusters were identified, followed by the second half of each year, displaying low-risk clusters. In the first period (March 2020 to February 2021), after just over 2 months from the first recorded case, in May 2020, a high-incidence cluster with RR > 3 was detected, followed by a prominent decrease in risk, leading to a greater relaxation of control measures in the second half of 2020 [
16].
In Brazil, although the use of masks was mandatory, gatherings were promoted in the pre-election and election periods in November, in addition to the reopening of businesses and the permitting of travel in the second half of 2020 [
16]. These measures, combined with the circulation of the Alpha, Gamma and Delta variants, influenced the high rate of transmission of COVID-19 in the population in early 2021 [
17]. The present study reinforces this premise by identifying that the months of March to June 2021 stood out with the highest risk of incidence and mortality from the disease. The decrease in the number of cases and deaths in the region occurred from the second half of 2021, when a first dose of the vaccine had been administered to more than 50% of the population [
17].
Vaccination began in February 2021, but as there were few doses available, priority was given to people with comorbidities and elderly people; socioeconomic and professional aspects were not considered, with the exception of prioritizing health professionals [18]. The poorest people, who lived in peripheral municipalities, needed to be at work in person, and made greater use of public transportation, were not prioritized in the vaccination process, even though they were territorially more vulnerable and exposed to the virus.
In the spatiotemporal analysis, this study demonstrated that the high-incidence and high-mortality clusters were concentrated in the WA of São Paulo and neighboring municipalities, indicating that the capital was an area of influence and convergence at all times during the pandemic. Studies have already shown that COVID-19 cases began in the capital, São Paulo, and that they dispersed due to spatial contiguity, shortly after the start of the pandemic in March 2020. However, the scale of analysis in these studies was intra-urban, only in the municipality of São Paulo or inter-municipal, analyzing the dispersion throughout the state of São Paulo [
3,
4,
6,
7,
19]. Our study was the first to analyze the spatiotemporal dynamics of the COVID-19 pandemic at a more detailed scale within the MRSP, the WA, which is an accessible scale of spatial analysis for the entire Brazilian territory. This method pointed to a dynamic of virus dispersion that appears to be associated with an urban dynamic of regional circulation axes that involve the capital and certain neighboring municipalities.
The high-incidence spatial-temporal cluster identified between April and June 2020 in the area where Guarulhos International Airport is located, in a city neighboring São Paulo, corroborates studies that demonstrate the influence of mobility on the spread of the SARS-CoV-2 virus [
3,
4,
8]. São Paulo/Guarulhos International Airport is the largest airport in Brazil and the second busiest in Latin America in terms of the number of passengers transported and the transportation of goods [
20]. The other incidence clusters identified are in locations with a high density of people using public transportation [
21].
The public transportation system in the MRSP follows a highly radial model, structured to transport passengers from the outskirts to the center, or from the neighborhoods to the radial transportation axes [
21]. The areas with the highest mobility rates are located in the central region of the capital, while the areas with the highest immobility rates are located in the outskirts and in neighboring municipalities [
22]. These peripheral areas of the capital and neighboring municipalities are home to the majority of the population that still needed to use public transportation to attend essential services that continued to operate in person even during the implementation of control and social distancing measures [
22].
Only essential services such as food, supplies, health, banking, cleaning, and security services continued to operate in person during the pandemic [
16,
17]. Because of this, there was a reduction in the number of public transportation services to avoid economic losses for the companies providing these services [
23]. This measure was adopted to respond to the drop in the number of passengers, which accompanied the migration of activities to remote work. However, for those whose work did not allow them to stay at home, the reduction in the number of vehicles increased waiting times for trips and, at times, increased crowding, which may have favored the transmission of the novel coronavirus [
23,
24].
Our study indicates that social determinants related to income and race influenced the incidence and mortality rates of the disease and need to be considered in further studies on the relationship between the territorial spread of the COVID-19 pandemic and urban mobility. Social behaviors, often driven by economic subsistence needs, were decisive for the pattern of virus transmission [
25]. In the present analysis of the spatiotemporal clusters of disease incidence, there were statistically significant differences in the socioeconomic variables per capita income and percentage of BBIP among the three spatiotemporal clusters with the highest risk of incidence that were detected in the same period. This reflects a concentration of areas with a high risk of disease incidence also in an area of lower social vulnerability.
Despite the limitations of the analysis that considered the average of an area with widely fragmented social and economic conditions, this result expresses the need for analyses that seek to deepen the understanding of how complex socioeconomic dynamics intertwine with territorial dynamics and interfere in the spread of diseases. We understand that analyses of the relationship between health and social vulnerabilities need to be carried out spatially, as this relationship does not materialize homogeneously throughout the territory. Regarding COVID-19, although there were other municipalities that were equally or more vulnerable in the metropolitan region, they were not as affected as those where daily interaction with the capital was more frequent. In our study, clusters were identified in densely occupied areas and point to a pattern of disease spread that is related to income and ethnicity, as well as to the circulation dynamics of a metropolitan region.
Regarding COVID-19 mortality, our study reveals low- and high-mortality clusters at different times during the pandemic in the MRSP, in addition to significant differences in income and ethnicity between these clusters. It was shown that low-risk mortality clusters had a higher average per capita income, a lower BBIP percentage, and fewer people per household. The capital of São Paulo is very segregated along ethnic–racial lines. Although 36% of the capital population is Black, some high-income districts are almost 95% white [
26]. Generally speaking, there has been a consolidation of the districts, places, and positions of the white social classes in the most developed, rich, and urbanized areas in the center southwest quadrant of the city, while in the distant outskirts, in the favelas and in low-income housing complexes, the Black population has become increasingly concentrated [
27].
Studies show a correlation between COVID-19 mortality and socioeconomic indicators, suggesting that living conditions directly affected vulnerability to the disease, as evidenced by the impact on impoverished and Black populations [
28,
29]. In Brazil, despite the cash transfer policy adopted, called “Emergency Aid”, studies show that mortality rates increased as formal remuneration decreased, highlighting the differentiated impact of the pandemic [
30]. This situation may be a reflection of limited access to quality health services in impoverished areas [
28].
In our study, we observed that there were no low-incidence clusters in the center southwest region of the city of São Paulo, but there was a low-mortality cluster, in an area with a concentration of the highest incomes in the MRSP. In contrast, high-mortality clusters were observed in the most peripheral region of the capital São Paulo, as well as in neighboring municipalities. By integrating epidemiological models with georeferenced data and socioeconomic indicators, we analyzed how the virus spread in a complex urban environment, characterizing significant territorial disparities in incidence and mortality risk. The method employed in the present study has been widely used to detect statistically significant spatiotemporal clusters of diseases, as well as to calculate relative risks, contributing to the real-time geographic surveillance of diseases and early detection of epidemics and retrospective analysis [
3,
7,
8,
25]. We highlight the role of social inequalities interwoven with the spatial dynamics of COVID-19, detailing mortality risks in the 633 weighting areas of the MRSP. Additionally, we provide insights into how urban mobility and specific variables contributed to the spread of coronavirus infection.
The limitations of this study include the use of aggregated data from the corresponding areas, without controlling for individual patient conditions, such as chronic disease conditions. There are also inherent biases in the dataset used due to differentiated access to healthcare, as there was no mass testing to track the disease and, at times, the tests were not accessible throughout the outpatient network, which led to underreporting of cases [
31]. Another limitation was the use of information on per capita income, BBIP, and people per household from the 2010 census. Data from the 2022 census have not yet been made available at the WA scale and, therefore, our study probably does not capture changes in the socioeconomic structure that occurred over this period, even if the urban aspects of the territories analyzed did not change.
The choice of analysis period influences SaTScan’s ability to detect clusters [
32]. The choice of parameters, such as the maximum temporal window for cluster detection, the Poisson distribution, and the shape of the cluster (circular or elliptical), can have a greater or lesser impact on the sensitivity of the model to detect clusters [
14]. In the COVID-19 pandemic, the temporal distribution of cases was very heterogeneous and there were outbreaks concentrated in certain months during 2020 and 2021. Our study was intentionally carried out in two periods, Year 1 and Year 2, because clusters that are evident in shorter periods may be diluted in longer periods of analysis. The incidence of COVID-19 decreased significantly in the second half of 2021, with lower rates and a more homogeneous distribution; in this situation the observed counts are closer to the expected counts, which reduces the chance of identifying significant clusters.
Even though health actions are strongly associated with medical measures, the spread of the pandemic exposed the need for a territorialized reading of health problems in order to design public policies. The use of geotechnologies during the COVID-19 pandemic, both in academic publications and on information panels of health institutions, highlights the importance of such analyses in public health management. However, access to maps as an efficient means of communicating about the spread of diseases is still a challenge due to the difficulty in incorporating urban complexity and limitations in access to data and qualified labor. This study sought an epidemiological investigation model accessible to public health surveillance management and a subsequent statistical analysis of social variables to contribute to the prioritization of policies and actions to mitigate the spread and impacts of diseases.