1. Introduction
Cleft lip and palate (CLP) is a congenital anomaly that affects the facial structure. CLP is the most prevalent congenital craniofacial anomaly worldwide, affecting between 0.7 and 1.5/1000 live births. The prevalence of CLP in Mexico has been estimated at 0.6 to 0.9/1000 births [1]. CLP has important implications for patients and their families, including difficulties with swallowing and language development. The disorder causes psychological and social afflictions, such as discrimination, low self-esteem, and difficulty interacting in society [2], as well as economic implications in terms of health care, plastic surgery, and rehabilitation. The Global Burden of Disease for CLP in 2016 was calculated at 3.4/100,000 disability-adjusted life years (DALYs), with a 95% uncertainty interval of 2.1 to 5.3 [3].
CLP is a common congenital anomaly with a complex etiology involving genetic and environmental factors [4]. Studies have identified associations between different health problems and environmental pollution [5,6,7]. Some types of cancer, cardiovascular and respiratory diseases, and congenital malformations have been associated with different pollutants, particularly particulate matter 10 micrometers or less in diameter (PM10) and particulate matter 2.5 micrometers or less in diameter (PM2.5), as has the prevalence of CLP [8,9].
The Monterrey Metropolitan Area (MMA) is one of the largest cities in Mexico and Latin America, with nearly 5 million people living in 13 municipalities. This urban area is characterized by strong industrial activity and high pollution due to the production of rubber and cardboard, the mining of metals and stone, and the manufacturing of engines and industrial machinery, among others [10]. In addition to these industrial activities, the MMA concentrates more than 2 million vehicles, which exacerbate the air pollution affecting the health of the urban community. The local air-quality monitoring system (SIMA) reported the following data for the period 2009–2014: the 24-h air quality standard for PM10 (75 μg/m³) was exceeded on 224 to 275 days/year, and the 24-h air quality standard for PM2.5 (45 μg/m³) was exceeded on 21 to 58 days/year. Finally, the one-hour air quality standard for ozone (0.095 ppm) was exceeded on 36 to 95 days/year [11].
Chronic exposure to pollutants represents an important health risk for the population. Air quality data for the same period showed that the annual air quality standard for PM10 (40 μg/m³) was exceeded by a factor of 1.5 to 2.5, depending on the zone of the MMA, and the annual air quality standard for PM2.5 (12 μg/m³) was exceeded by a factor of two to three. Because of this chronic population exposure, the World Health Organization classified the MMA as one of the most polluted cities in Mexico and Latin America [12].
The population was exposed to different concentrations of pollutants across the MMA. The chemical composition of PM2.5 showed that 50% was composed of primary components (elemental carbon, crustal material, salts, and trace metals) and secondary organic aerosols (SOA), while the other half consisted of inorganic aerosols (ammonium sulfate, ammonium nitrate) produced by different sources (refinery, industrial activity, vehicles, urban development, and wind erosion) [13]. Different health effects may be expected from these chronic exposures. Proaire 2016–2025 published a list of industrial sources of air pollutants in Monterrey and their geographical locations [11].
This research aimed to answer the following questions: Do CLP cases present a random distribution, or do they tend to concentrate in certain areas of the city? If they concentrate, what degrees of concentration are observed? Finally, which spatially concentrated CLP cases are associated with high pollution values? By answering these questions, this study may contribute to understanding the epidemiology of congenital malformations in our environment. It also identifies spatial clusters over a continuous space, providing more empirical evidence in the Latin American context, since most studies that analyze the spatial distribution of congenital malformations are reported for discrete spaces; that is, the analysis units are usually polygons with geopolitical delimitations.
The paper is structured as follows. First, the literature review describes investigations that identified the main risk factors for CLP, emphasizing environmental contamination, and reviews some studies addressing the spatial distribution of congenital malformations. Next, the nature and processing of the database and the spatial statistics techniques used are described. Finally, the main results and limitations are analyzed.
2. Environmental Risk Factors and Some Sociodemographic Characteristics of Cleft Lip and Palate (CLP)
Several reports have found a relationship between the risk of CLP and prematurity, alcohol and tobacco consumption, and drug abuse in the early stages of pregnancy [14,15,16,17,18,19]. Recently, Angulo et al. found that tobacco consumption and the lack of vitamin and folic acid supplementation are significantly associated with CLP [20]. Environmental pollution has also been associated with CLP.
Langlois et al. found an association between CLP and radon [21]. Gonzalez et al. conducted an ecological study in Mexico and found correlations between urban environmental contamination, solid waste, life expectancy, healthcare for pregnant women, and the incidence of CLP [1]. Similarly, a study by Benitez et al. [22] in Itapua, Paraguay, showed significant associations between congenital malformations and exposure to pesticides. The authors also found that pregnant women were exposed to this type of pollutant due to geographical proximity to agricultural areas where pesticides were dispersed. Garcia et al. also demonstrated an association between pesticides and congenital malformations [23].
An environmental study has documented a relationship between exposure to heavy metals, such as lead, nickel, mercury, and cadmium, among other substances, and the risk of congenital malformations such as CLP [24]. Recent studies have found evidence of the influence of environmental pollution on CLP, specifically ozone and PM2.5 [25]. Hwang and Jaakkola identified mothers who were exposed to air pollution during the first two months of pregnancy as having an increased risk of delivering children with CLP [26]. Likewise, Desrosiers et al. found that exposure of pregnant women to chlorinated solvents during pregnancy was positively associated with CLP [9].
Bentov et al. found a relationship between geographic proximity to industrial parks and congenital malformations [8,27]. Social exclusion could have some health implications, since groups living in marginalized areas tend to have low educational levels and poor health habits, such as smoking, drinking alcohol, being exposed to contaminants, or not taking vitamin supplements during pregnancy [28,29]. Some sociodemographic factors, such as social exclusion, low economic and educational level, and geographical marginalization, have been related to an increased incidence of CLP in Mexico [30]. Alfwaress et al. reported a similar situation in Jordan: children with CLP were born into families with low income and low educational levels [31].
3. Method
This was an exploratory, ecological, cross-sectional study aimed at analyzing the spatial distribution of CLP cases and their geographical association with environmental pollution in the MMA. Although this design does not establish causal relationships, the study aims to find spatial associations between the study groups in order to assess the degree of interaction between CLP and environmental pollution.
3.1. Data
Clinical information was obtained from a database of patients attending Casa Azul A.C. in the last 5 years. This non-profit medical organization is dedicated to helping low-income patients with CLP afford comprehensive therapy [40]. The inclusion criteria covered all isolated CLP cases aged 3 to 9 years, of either sex. The final sample consisted of 333 cases, excluding syndromic forms of CLP (trisomies, van der Woude syndrome, Treacher Collins syndrome, etc.). Their geographical location was obtained by means of latitude and longitude coordinates. All patients' families reported no residential mobility, having lived in the same house at least during the gestational period, which made it possible to estimate the mother's exposure during pregnancy from the environmental pollution values interpolated to their location.
The geographical coordinates were processed using the CrimeStat 3.2 and ArcGIS 10.4 software to calculate the degree of concentration and the spatial clusters, using the Nearest Neighbor Index (NNI) technique and the Nearest Neighbor Hierarchical Clustering (NNHC) technique. With the ArcGIS software, spatial interpolation techniques were used to estimate values of environmental pollution over a continuous space; these techniques are detailed in the section on spatial statistics techniques.
For the catalog of polluting industries and their emissions, the Sistema Integral de Monitoreo Ambiental del Estado de Nuevo León (SIMA) [41] was used. This system reports the polluting substances emitted by more than 300 industries from 2010 to 2015 in Nuevo León (see Table 1).
The PM10 data were obtained from 10 environmental monitoring stations. The mean values of these data were interpolated (see Table 2).
Likewise, cartographic data and shapefiles of basic geostatistical areas (AGEBs) were used for the 13 municipalities that make up the MMA. These shapefiles included data on population, education levels, health indicators, and other sociodemographic variables.
3.2. Spatial Statistic Techniques
3.2.1. Nearest Neighbor Index (NNI) Analysis
The degree of concentration of the points in space was calculated, and clusters were identified, for industrial emissions and CLP cases with the NNI [42,43]. This technique computes the mean distance between each point and its nearest neighbor and compares it with the mean distance expected under a hypothetical random distribution. If the observed mean distance is shorter than the expected mean distance, it can be assumed that the data follow an agglomeration pattern. On the other hand, if the observed mean distance is longer than the expected mean distance, the data follow a dispersion pattern [44].
The observed mean Nearest Neighbor Distance (NND) is denoted as follows:

$$d(NN) = \frac{1}{N}\sum_{i=1}^{N} \min_{j \neq i}\left(d_{ij}\right)$$

where $d(NN)$ is the observed mean distance between each point and its nearest neighbor and $d_{ij}$ is the distance between points $i$ and $j$. The expected mean distance is denoted as:

$$d(ran) = 0.5\sqrt{\frac{A}{N}}$$

where $d(ran)$ is the expected mean distance for the points in a random distribution pattern, $A$ is the area (in square meters) of the minimum rectangle that encloses all the points, and $N$ is the number of points. In general terms, the NNI is the ratio of the observed nearest neighbor distance to the random mean distance:

$$NNI = \frac{d(NN)}{d(ran)}$$

If the result is higher than 1, the pattern is dispersed; if the result is lower than 1, the pattern denotes agglomeration. The closer the result is to 0, the greater the concentration in the cloud of points, as seen in Figure 1.
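As an illustration of the NNI described above, the following is a minimal Python sketch rather than the CrimeStat routine used in the study. It assumes planar coordinates in meters and takes $A$ to be the area of the axis-aligned bounding rectangle of the points; the function name `nearest_neighbor_index` is hypothetical.

```python
import math


def nearest_neighbor_index(points):
    """Return (d_nn, d_ran, nni) for a list of (x, y) points in meters."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")

    # Observed mean distance from each point to its nearest neighbor.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    d_nn = total / n

    # Assumption: A is the area of the axis-aligned bounding rectangle.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if area == 0:
        raise ValueError("points are collinear; bounding rectangle has no area")

    # Expected mean nearest neighbor distance under a random pattern.
    d_ran = 0.5 * math.sqrt(area / n)
    return d_nn, d_ran, d_nn / d_ran
```

With two tight groups of points placed far apart, this sketch returns an index well below 1 (agglomeration), whereas a regular grid returns an index above 1 (dispersion).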
3.2.2. Nearest Neighbor Hierarchical Clustering (NNHC)
Although the NNI helps determine whether a distribution of points is dispersed or agglomerated, it does not identify the location of the clusters. Therefore, the NNHC technique was used as a second technique to identify agglomerations of CLP points. This technique identifies groups of points that are spatially close [44], as shown in Figure 2.
This first spatial agglomeration generates the first-order clusters. Another analysis is then performed on the first-order clusters that are unusually close to one another, producing the second-order clusters. This analysis can be continued until there are no more associations, although it is usually limited to third-order clusters. For cluster identification, the selected setting was five or more CLP cases with significant spatial agglomeration.
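The first-order step can be sketched in Python as follows. This is a simplified illustration, not the CrimeStat NNHC implementation: it links every pair of points closer than a user-supplied `threshold` distance (a hypothetical parameter; the software derives its own threshold from the random expectation) and keeps only groups with at least `min_points` members, using the minimum of five cases stated above.

```python
import math


def first_order_clusters(points, threshold, min_points=5):
    """Group points connected by chains of neighbors within `threshold` meters.

    Returns a sorted list of clusters, each a list of point indices.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that is unusually close (assumed threshold).
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Keep only groups that reach the minimum cluster size.
    return sorted(g for g in groups.values() if len(g) >= min_points)
```

Higher-order clusters could be sketched by applying the same function to the centroids of the first-order clusters, again under an assumed threshold.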
3.2.3. Interpolation by Inverse Distance Weighted (IDW), Empirical Bayesian Kriging (EBK) and Kernel Density Approaches
The technique used to analyze the spatial distribution of pollutants over a continuous space was inverse distance weighted (IDW) interpolation, which assumes that things that are close to one another are more alike than those that are farther apart. In order to predict a value at a location, IDW takes as reference the closest neighbors within a certain radius, because neighbors that are closer to the point to be predicted have more influence than remote ones [44].
According to Cañada [45], spatial interpolation by IDW is denoted as follows:

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i) \tag{5}$$

where $\hat{Z}(s_0)$ is the value predicted for the location $s_0$, $n$ is the total number of sample points (emitting industry locations) near the point to be predicted, $\lambda_i$ is the weight assigned to each sample point used in the prediction, which diminishes with distance, and $Z(s_i)$ is the value observed at the location $s_i$. In other words, the sample points that are farther away from the point to be predicted within a given radius will have less weight than those that are closer.
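The IDW prediction can be illustrated with a short Python sketch. It is not the ArcGIS implementation: the weights are assumed to be proportional to the inverse distance raised to a `power` of 2 and normalized to sum to 1, and the function name `idw_predict` and its parameters are hypothetical. The cap of 15 neighbors mirrors the software default mentioned in the text.

```python
import math


def idw_predict(target, samples, power=2.0, max_neighbors=15):
    """Predict Z at `target` from samples [((x, y), z), ...] by IDW."""
    if not samples:
        raise ValueError("at least one sample point is required")

    # Rank sample points by distance to the target and keep the nearest ones.
    ranked = sorted(
        (math.hypot(x - target[0], y - target[1]), z) for (x, y), z in samples
    )[:max_neighbors]

    # A target that coincides with a sample point takes the observed value.
    if ranked[0][0] == 0.0:
        return ranked[0][1]

    # Assumed weighting: lambda_i proportional to 1 / d_i ** power.
    raw = [d ** -power for d, _ in ranked]
    total = sum(raw)
    return sum(w / total * z for w, (_, z) in zip(raw, ranked))
```

For example, with one sample of value 10 at distance 1 and another of value 20 at distance 3, the nearer sample receives nine times the weight of the farther one, so the prediction stays close to 10.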
In addition, the software sets by default a maximum of 15 and a minimum of 10 nearest neighbor points. The weighted point values might have other coexisting weights. For PM10 values, the interpolation technique used was empirical Bayesian kriging (EBK), by means of the Geostatistical Analyst of ArcGIS 10.4.1, because this technique allows a better fit of air pollution data over a continuous space. With notation similar to that of Formula (5), the results were summarized by calculating the mean square error (MSE), denoted as follows:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}(s_i) - Z(s_i)\right)^2$$

where $\hat{Z}(s_i)$ is the value after the interpolation and $Z(s_i)$ is the value measured at the point $s_i$. Similarly, kernel density was used to identify the areas of the MMA where CLP cases are intensified, which according to Kelsall & Diggle (1995) [46] is denoted as follows:

$$g(x_j) = \sum_{i} W_i \, I_i \, K \exp\left(-\frac{d_{ij}^2}{2h^2}\right)$$

where $g(x_j)$ is the density of cell $j$, $d_{ij}$ is the distance between cell $j$ and the location of a CLP case $i$, $h$ is the standard deviation of the normal distribution, $K$ is a constant, $W_i$ is a weight at the location of a CLP case, and $I_i$ is an intensity at the location of a CLP case.
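Both quantities in this paragraph can be sketched in a few lines of Python. The sketch is illustrative only: the kernel is assumed to be a normal (Gaussian) kernel whose constant is the usual normalizing factor 1/(2πh²), the weights and intensities default to 1, and the function names are hypothetical.

```python
import math


def mean_square_error(predicted, measured):
    """Mean of the squared differences between interpolated and measured values."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)


def kernel_density(cell, cases, h, weights=None, intensities=None):
    """Normal-kernel density at `cell` given case locations [(x, y), ...]."""
    n = len(cases)
    weights = weights if weights is not None else [1.0] * n
    intensities = intensities if intensities is not None else [1.0] * n

    # Assumption: the constant is the Gaussian normalizing factor.
    k = 1.0 / (2.0 * math.pi * h ** 2)

    density = 0.0
    for (x, y), w, inten in zip(cases, weights, intensities):
        d2 = (x - cell[0]) ** 2 + (y - cell[1]) ** 2
        density += w * inten * k * math.exp(-d2 / (2.0 * h ** 2))
    return density
```

Evaluating `kernel_density` over a regular grid of cells yields a surface whose highest values mark the areas where cases are most concentrated; the bandwidth `h` controls how smooth that surface is.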
3.2.4. Spatial Scan Statistics (SaTScan)
Another way of detecting spatial clusters is Kulldorff's spatial scan analysis. In this analysis, the reference is not the distance between points (as in the NNI analysis) but the population at risk in a particular area [47]. For this technique, we used the AGEBs with their populations and the CLP points. Finally, we compared the results of the NNI and the spatial scan statistics.
The SaTScan software has been used for health monitoring and to explore clusters of disease in space, in time, and in space-time for congenital malformations [48,49,50]. SaTScan uses a Poisson model for discrete samples. This method makes it possible to identify high-risk groups in AGEBs associated with CLP. The expected number of cases in each AGEB is calculated as:

$$E[c] = p \cdot \frac{C}{P}$$

where $c$ is the observed number of CLP cases, $p$ is the population of the census section of interest (AGEB), and $C$ and $P$ are the total number of CLP cases and the total population, respectively. The relative risk of CLP for each AGEB is calculated by dividing the observed number of CLP cases by the expected number of CLP cases. The alternative hypothesis is that there is a higher risk of CLP inside the scan window than outside it. Under the Poisson assumption, the likelihood function for a specific window is proportional to:

$$\left(\frac{c}{E[c]}\right)^{c}\left(\frac{C - c}{C - E[c]}\right)^{C - c} I()$$

where $C$ is the total number of CLP cases, $c$ is the observed number of CLP cases within the window, and $E[c]$ is the expected number of CLP cases within the window under the null hypothesis that there is no difference. Because the analysis is conditioned on the total number of cases observed, $C - E[c]$ is the expected number of cases outside the window. $I()$ is an indicator function, with $I() = 1$ when the window has more cases than expected under the null hypothesis and 0 otherwise. The hypothesis test was carried out using 999 Monte Carlo simulations, with a test statistic calculated for each random replication as well as for the real data set.
The log likelihood ratio (LLR) was calculated based on the difference between the incidences inside and outside the windows, and a Monte Carlo test helped to determine the statistical significance of the identified groups. The scan window with the maximum LLR was considered the most likely cluster, that is, the one least likely to have occurred by chance.
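The expected count and the log likelihood ratio for a single candidate window can be sketched in Python as follows. This is an illustration of the Poisson scan statistic as described above, not the SaTScan code; the function names are hypothetical, and the Monte Carlo step that assigns significance is omitted.

```python
import math


def expected_cases(p, total_cases, total_population):
    """Expected cases in an area of population p under the null hypothesis."""
    return p * total_cases / total_population


def poisson_llr(c, expected, total_cases):
    """Log likelihood ratio for a window with c observed and `expected` cases.

    The indicator function makes the ratio 0 unless the window holds
    more cases than expected.
    """
    if c <= expected:
        return 0.0
    llr = c * math.log(c / expected)
    outside = total_cases - c
    if outside > 0:
        # Contribution of the cases observed outside the window.
        llr += outside * math.log(outside / (total_cases - expected))
    return llr
```

Scanning would repeat this calculation over many candidate windows and keep the one with the largest value; the population figures needed for `expected_cases` would come from the AGEB data.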
5. Discussion
We conducted an exploratory ecological study in order to identify distribution patterns and spatial associations of CLP cases with environmental pollutants. We found that CLP cases do not present a random distribution, as measured with the NNI and NNHC techniques. The results of these techniques showed several spatial coincidences, which supports the findings. The geographical or spatial identification of health conditions may be very useful for understanding the disease and for the creation of public policy.
The etiology of CLP probably does not stem from a single factor, such as a particular polluted environment, but more likely from a multi-causal model. The ability to identify the particular geographical areas where health events occur may be useful for understanding the ultimate cause or causes of disease and for using public policy to modulate or prevent it. The spatial distribution of CLP followed specific patterns in the urban space. This coincides with spatial agglomeration theory, in the sense that everything is related to everything else in space [54]. The present results do not establish direct causality, but they indicate geographical proximity between CLP cases and ambient pollutants, as mentioned by other authors [55,56].
Several clusters of CLP cases were associated with carbon dioxide in first- and second-order clusters detected with NNHC. In addition, Clusters 1 and 2 showed an association with modest amounts of nickel, lead, cadmium, mercury, arsenic and cyanides. All these pollutants are linked to congenital malformations.
Although a great deal is known about the genetic mechanisms of congenital malformations such as CLP, many genetic interactions with ambient pollution are not well understood, probably because they are produced by multifactorial effects [57]. Although more than 300 genes have been associated with CLP, recent research associates CLP cases with two candidate chromosomes, 17q and 11 [58,59], but the gene–environment interaction is still ambiguous [60,61]. More research is needed to understand this association. In order to study the influence of ambient pollutants on CLP, personal exposure measures and characterization of pollutant point emissions are required. Many variables need to be adjusted for, such as tobacco or alcohol habits, socioeconomic level, medical access, vitamin consumption, and season of the year, among others [28].
There is a spatial association between CLP cases and PM10 concentrations. The second-order CLP clusters showed a clear interaction with high PM10 levels, so the significant grouping of CLP was associated with high values of environmental pollution. This association might explain the onset of certain health problems, especially congenital malformations. The PM10 concentration varies throughout the year, showing high peaks (over 100 µg/m³) several times.
Limitations
To identify points in a continuous space, a precise location is necessary, and this is not always easy to achieve. Exposure to environmental pollutants was approximated at the addresses of the cases, but this does not reflect the mobility of pregnant women to other locations.
In particular, this research faced difficulty with the availability of information, because in Mexico there are no open data on the health problems of the population, particularly congenital malformations, at a disaggregated level. The difficulty of conducting studies on a continuous space is increased by the confidentiality criteria that characterize this type of data, since a point pattern analysis requires accurately identifying the location of each point. In this sense, the data used to carry out this investigation constitute a limitation, since they were not provided by a government agency but by a civil association that deals with this type of congenital malformation.
6. Conclusions
Our study reviewed the spatial distribution of children with CLP and its association with environmental pollutants in the MMA, which is one of the most polluted regions in Latin America. Although this study did not establish causal relationships, it showed the spatial interaction between CLP and environmental pollutants. With spatial statistical techniques, the space was treated as continuous and the CLP cases as individual points.
We acknowledge that the methods used for the present analysis have some limitations due to the quality of the data sources. We used data from a non-profit organization (Casa Azul A.C.) instead of public health data. Although we identified clusters of CLP, more precise data and statistics are needed to establish causality. This study provides a baseline description of the associations between air pollution and CLP that will be important for future studies.
This research constitutes a first step in analyzing CLP incidence from a spatial perspective with the use of open spatial applications. We have shown CLP agglomerations that interacted in space with different pollutants. More studies are needed to prove the interaction between the environment and the molecular biology of this disease. Our findings add important information to the very few studies that have been published in Latin America.