1. Introduction
Cancer is a group of diseases involving abnormal cell growth with the potential to invade or spread to other parts of the body. Humans are affected by over 100 types of cancer. Exposure to ionizing radiation and air pollution are among the known carcinogenic factors. Air pollutants include oxides of sulfur, nitrogen, and carbon, as well as benzene and smog-forming particles, namely PM2.5 and PM10 (PM stands for particulate matter). PM2.5 denotes atmospheric aerosols (suspended dust) with a diameter of no more than 2.5 μm, which according to the World Health Organization are the most harmful to human health of all atmospheric pollutants; PM10 is a mixture of suspended particles with a diameter of no more than 10 μm. The particles may additionally contain toxic substances such as benzopyrene, dioxins, and furans.
Air pollution occurs when harmful or excessive quantities of substances, including gases, particulates, and biological molecules, are introduced into the Earth’s atmosphere. It may cause diseases, allergies and even death in the most severe cases [1].
For many years, studies have been conducted to examine the impact of air pollution on the risk and incidence of cancer. Canadian researchers investigated the impact of sulfur dioxide (SO2) air pollution on mortality in breast and colon cancer patients in 20 cities in their country. It was assumed that SO2 absorbs ultraviolet light in the region of the spectrum which is most active in forming vitamin D in the skin. Since vitamin D plays a role in reducing the risk of colon and breast cancer, high concentrations of the pollutant (acid haze) may lead to its deficiencies in exposed populations [2]. Indeed, statistically significant positive associations were found between air pollution and age-adjusted mortality rates for colon cancer in women and men, and breast cancer in women.
Researchers from Spain (Barcelona Institute of Global Health) and the United States (American Cancer Society) carried out a large-scale epidemiological study that linked certain air pollutants to mortality from kidney, bladder, and colorectal cancer. The study involved over 600,000 adults in the US who were followed up for 22 years (from 1982 to 2004). The team investigated a possible association between cancer deaths in 29 locations in the country and long-term population exposure to three types of pollutants: PM2.5, nitrogen dioxide (NO2), and ozone (O3). PM2.5 was associated with mortality from kidney and bladder cancer, and exposure to NO2 was associated with colorectal cancer death [3].
In 2009, the results of long-term studies on geographical variation in cancer mortality rates were reported. Researchers from the Nutrition and Health Research Center in the United States found a connection between atmospheric air pollution from fossil fuel combustion and the risk of cancer development in the digestive tract, elimination organs (esophageal and bladder), and female reproductive organs [4].
Studies carried out at the National Statistics Institute in Spain identified the industries that contribute to the emission of the most dangerous compounds into the atmosphere. These include mining, paper and wood production plants, the food industry, metal production and processing, and ceramics. Further studies were then conducted to verify whether proximity to such pollutant-emitting industries could be an added risk factor for colorectal cancer mortality [5]. A summary of the relevant past literature is presented in Table 1.
Poland is among the European Union countries with the highest levels of air pollution. The very poor quality of the air that the inhabitants of many regions of Poland breathe should be viewed not only in terms of environmental degradation, but also as a striking example of neglected development in the country. The most polluted regions, based on a World Health Organization ranking of the EU cities with the most polluted air, are Lower Silesia, Silesia, and Lesser Poland.
At present, the measurements carried out by the Environmental Protection Inspectorate provide the most reliable information on the state of the atmospheric air in Poland, as they are subject to control and measurement quality rules. Of note, the quality of the environment can also be characterized and assessed with the use of drones. This method is an alternative to expensive photogrammetric measurements and time-consuming field measurements. Moreover, it is characterized by high mobility (flights below the cloud level, high time resolution, and data registration even for small areas). The method applies to the study of air quality as well as to water analysis.
In Poland, the earliest studies on the correlation between air pollution and cancer risk concerned the impact of environmental pollution on the incidence of malignant neoplasms of the upper respiratory tract, mainly in the region of Silesia [6,7]. Further studies extended the research area to all 16 provinces, distinguishing the type of gas or pollution occurring in a given area of the country. Concurrently, the problem of cancer in Poland has been described in [8,9], and the statistics of cases from 1999–2015 are presented in Figure 1.
The aim of our study was to indicate which types of cancer show the highest correlation with the emission of selected types of air pollutants. In successive stages of the research, the most important gases and pollutants influencing the selected types of cancer (malignant neoplasm of the bronchus and lung, and of both the small and large intestine) were determined. This choice was dictated by the national disease statistics, as these were the most common cancers.
Scientists from Taiwan conducted a study very similar to ours [11]. Annual mean concentrations of each air pollutant were determined at 75 air quality monitoring stations, and the concentrations were extrapolated for 349 local Taiwanese administrative areas. In total, 70 correlation coefficients between cancer incidence rates and various air pollutants were calculated. A significantly positive correlation was observed between the level of PM2.5 and the cancer incidence rate after multiple testing corrections.
2. Methods
Two databases were used in this work: a database with the results of pollutant measurements (the repository of the Environmental Protection Inspectorate) [12] and statistics on cancer incidence [10]. The former contains measurements of gases and pollutants in the air carried out in Poland in the years 2000–2016. The list of measured substances includes SO2, NO2, NOx, CO, O3, C6H6, PM10, PM2.5, Pb(PM10), As(PM10), Cd(PM10), Ni(PM10), and BaP(PM10).
The Environmental Protection Inspectorate measures PM10 and PM2.5 emission and content in the air using two complementary methods: the gravimetric (reference) method (approx. 250 locations) and the automatic method (approx. 180 locations). The data are read out every hour and verified in a four-stage system: ongoing, periodic, annual, and national verification. Two types of devices are used:
Dust collectors operating on the basis of reference methodologies. Collectors are produced by the companies Comde Derenda GmbH, MCZ GmbH, and Sven Leckel.
Meters operating in online measurement mode, according to the methodology equivalent to the reference method. These meters are manufactured by the companies Envea, Grimm Aerosol Technik, PALAS GmbH, and Thermo Fisher Scientific (US).
For gaseous pollutants (CO, SO2, NO-NO2-NOx, O3, and BTEX—volatile aromatic hydrocarbons), analyzers from companies such as Teledyne API, Thermo Fisher Scientific, Envea, Horiba, Synspec B.V., AMA Instruments GmbH, and Chromatotec are used.
The second database covers epidemiological data on cancer incidence in Poland in the years 1999–2015, divided by province and county, type of disease according to the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10), and patient gender. It should be noted that in Poland doctors are obliged to submit a Malignant Cancer Notification Card, and the data came from this source. The data available in the two databases are compared in Table 2.
We then calculated the correlation between the measured concentrations of dangerous gases (on an annual scale) and the number of cancer cases (the cancer incidence rate, determined by the number of cases or deaths per 100,000 people). To perform the statistical analysis, Python code [13] was used to estimate the Pearson correlation coefficient and to obtain the random forest regression results. The Pearson product-moment correlation coefficient is computed using the NumPy package, in which the main parameters are two arrays containing multiple variables and observations (each row represents a variable, and each column a single observation of all those variables). The random forest regression algorithm is taken from Scikit-learn, an open source machine learning library that supports supervised and unsupervised learning. In this procedure, each tree in the ensemble is built from a sample drawn with replacement from the training set. The source code used for the analysis is available in a GitHub repository (https://github.com/ntusnio/CV/blob/master/Projekt%20rak.ipynb).
The Pearson correlation coefficient is a measure of the linear correlation between two variables. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
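The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' notebook: the function names and all numbers are hypothetical, and only `numpy.corrcoef` (rows as variables, columns as observations) is taken from the text.

```python
import numpy as np

def incidence_per_100k(cases, population):
    """Convert raw case counts to a rate per 100,000 inhabitants."""
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    return cases / population * 100_000

def pearson_r(x, y):
    """Pearson product-moment correlation between two 1-D series."""
    # np.corrcoef treats each row as a variable and each column as an
    # observation; the off-diagonal entry of the 2x2 result is r(x, y).
    return float(np.corrcoef(np.vstack([x, y]))[0, 1])

# Hypothetical annual values for one province (illustrative only).
no2 = np.array([18.0, 19.5, 21.0, 20.2, 22.4])      # annual mean concentration
cases = np.array([410, 432, 455, 450, 471])         # new cases per year
population = np.array([2.10e6, 2.10e6, 2.11e6, 2.11e6, 2.12e6])

rate = incidence_per_100k(cases, population)
r = pearson_r(no2, rate)
print(f"r = {r:.3f}")
```

A value of r close to +1 or −1 indicates a strong linear relationship between the two annual series; it does not by itself indicate causation.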
The random forest algorithm [14] is a flexible, easy-to-use machine learning algorithm that, even without hyper-parameter tuning, produces an accurate result most of the time. It is also one of the most widely used algorithms because of its simplicity and because it can be employed for both classification and regression tasks. This second method was used to assess the qualitative impact of individual types of hazardous gases on the incidence statistics of small and large intestine cancers. The random forest algorithm was introduced in 1995; for the purposes of this research, its results were verified by calculating the average accuracy of 1000 calculations on a separate test portion of the data. It gave better results than the XGBoost algorithm and the Lasso method. The random forest algorithm computes qualitative effects based on a feature importance score that indicates how useful or valuable each feature was in the construction of the decision trees within the model (the more an attribute is used to make key decisions in the decision trees, the higher its relative importance).
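As an illustration of the feature importance score described above, the following sketch fits scikit-learn's `RandomForestRegressor` to synthetic data and ranks the inputs by `feature_importances_`. The data, the three pollutant labels, and the hyper-parameters are assumptions made for the example; they are not the values used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 observations of three hypothetical pollutants.
feature_names = ["NO2", "PM2.5", "SO2"]
X = rng.normal(size=(200, 3))
# The target depends strongly on the first column, weakly on the second,
# and not at all on the third.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances are non-negative and normalised to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)
print(ranking)
```

The importances describe how much each input contributed to the splits of the fitted trees; they rank features relative to one another and say nothing about the direction of an effect.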
The latency period for the selected neoplasms was not included in the analyses. For example, lung cancer usually appears 10–40 years after the onset of exposure. In addition, smoking (active or passive) and occupational exposure to inhaled asbestos increase the risk of lung cancer development, and research has confirmed the synergistic effect of these two factors. In order to prevent cancer, it is also necessary to eliminate additional factors contributing to the occurrence of cancer or mesothelioma, e.g., by avoiding exposure to aromatic hydrocarbons.
The latency period was not included in the analyses because of the wide range of neoplasms (over 100 types, according to the International Classification of Diseases), each of which is characterized by a different latency value that also depends on individual features. The consequences of adopting a zero latency period are similar to those of assuming an inappropriate value for the carcinogenesis period: the level of pollutants identified in the air in a given year is correlated with the number of cancer cases in a year with an incorrectly selected delay. Because the cellular changes leading to the formation of cancer are complex and diverse, and because carcinogenesis is a long-term process (the average period of development of a tumor with a diameter of 1 cm is about 5 years, although it depends on the type of tumor and tissue), it was decided to follow an approach that does not take the delay in cancer formation into account. Under this assumption, the incidence of a given type of cancer was examined in the geographical area where a given type of air pollution occurs.
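To make the effect of the chosen delay concrete, the sketch below correlates a pollutant series in year t with an incidence series in year t + lag. It is a hypothetical illustration only: the study itself used a zero lag, and the function name, the synthetic series, and the three-year echo are assumptions introduced here.

```python
import numpy as np

def lagged_pearson(pollutant, incidence, lag):
    """Correlate pollutant in year t with incidence in year t + lag (lag >= 0).

    Both series must cover the same years in the same order.
    """
    pollutant = np.asarray(pollutant, dtype=float)
    incidence = np.asarray(incidence, dtype=float)
    if lag < 0 or lag >= len(pollutant) - 1:
        raise ValueError("lag must leave at least two overlapping years")
    x = pollutant[: len(pollutant) - lag]   # exposure years
    y = incidence[lag:]                     # outcome years, shifted by lag
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(42)
pollutant = rng.normal(loc=20.0, scale=3.0, size=17)   # 17 hypothetical annual means
true_lag = 3
# Synthetic incidence that simply echoes the pollutant series three years later.
incidence = np.concatenate(
    [rng.normal(loc=20.0, scale=3.0, size=true_lag), pollutant[:-true_lag]]
)

for lag in range(6):
    print(lag, round(lagged_pearson(pollutant, incidence, lag), 3))
```

In this synthetic case only the matching lag recovers the built-in relationship, which mirrors the point made above: a zero lag and a wrongly chosen lag both pair exposure with outcomes from the wrong year.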
The analyses did not use any statistical methods other than correlations, and the research was limited to comparing the content of the two available databases. As a consequence, other factors leading to the formation of cancer were not taken into account, including physical carcinogens (e.g., ultraviolet radiation), chemical carcinogens (alcohol and tobacco addiction, occupational exposure), and biological carcinogens (some viruses).
It should be added that many studies on the influence of air pollution on cancer risk have been conducted in Poland, and their results are presented, for example, in [15,16,17,18,19].
3. Results
First, the possible correlation between air pollution and cancer formation for all provinces in Poland was verified. For each pollutant, the best correlated cancer type was identified (Figure 2).
Next, the most important pollutant was selected, based on the sum of the correlation values (r) over all cancer cases (C00–D09). Calculations were made for the whole country (they were not carried out for individual provinces). The results of this comparison are shown in Figure 3. The strongest correlation with the incidence of cancer was noted for the emission of nitrogen oxides.
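The selection step described above can be sketched as follows: build a table of r values for every pollutant and cancer type, sum each pollutant's row, and take the largest total. The helper name, the two pollutant labels, the three ICD-10 codes, and the toy series are assumptions for illustration and do not reproduce the study's data.

```python
import numpy as np

def correlation_table(pollutants, cancers):
    """Return an (n_pollutants, n_cancers) array of Pearson r values.

    Both inputs are 2-D arrays with one row per series and one column per year.
    """
    pollutants = np.asarray(pollutants, dtype=float)
    cancers = np.asarray(cancers, dtype=float)
    n_p = pollutants.shape[0]
    # Stack all series as rows; the upper-right block of the full matrix
    # holds the pollutant-versus-cancer correlations.
    full = np.corrcoef(np.vstack([pollutants, cancers]))
    return full[:n_p, n_p:]

years = np.arange(10, dtype=float)
pollutant_names = ["NO2", "PM10"]                      # hypothetical labels
cancer_codes = ["C18", "C34", "C50"]                   # hypothetical subset
pollutants = np.array([years, (-1.0) ** years])        # steady trend vs. oscillation
cancers = np.array([2 * years + 1, 3 * years, 0.5 * years + 4])

table = correlation_table(pollutants, cancers)
totals = table.sum(axis=1)                             # sum of r over cancer types
best = pollutant_names[int(np.argmax(totals))]
print(best, totals)
```

Because the r values are summed with their signs, negative correlations reduce a pollutant's total in this sketch.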
Based on the results mentioned above, we focused our further analyses on nitrogen oxides and examined the correlation of their presence in the air at the monitoring sites with the formation of various types of tumors in each province. Table 3 shows, for each province, the correlations between NO2 and NOx emissions and the type of cancer.
The choice of the cancer types examined in detail was guided by the ratio of deaths to malignant neoplasm cases in Poland in the period available in the database (Figure 4), as well as by the trends showing the highest increases in disease.
Of note, our findings showed that the highest rate of increase in the number of cases in Poland is related to colorectal cancer, yet the literature suggests that cancers of the bronchus and lung cause the most deaths among men and women in the country [20].
Detailed results for intestinal cancer in correlation with air pollution are presented in Figure 5, in which C17 refers to the small intestine and C18 to the large intestine.
The selection of provinces resulted from the reading presented in Table 2, in which bowel cancer (C17 and C18) was best correlated with NOx emissions.
In the last part of the analysis, we examined which contaminants have the greatest influence on the incidence of malignant neoplasm of the bronchus and lung (C34), and also of the small (C17) and large intestine (C18), by means of the random forest regression model. The random forest algorithm is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the mode of the classes (classification) or the mean prediction (regression) of the individual trees. In the simplest terms, this method is based on the best fit of a function in which the arguments are the measured amounts of gases and pollutants, and the result is the number of cases of a given type of cancer.
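The averaging behaviour described above can be checked directly in scikit-learn. The sketch below, which uses synthetic data and arbitrary hyper-parameters chosen for illustration, compares the forest's regression output with the mean of the predictions of its individual trees (exposed as `estimators_`).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic inputs standing in for measured pollutant levels.
X = rng.uniform(size=(150, 4))
y = 10.0 * X[:, 0] + rng.normal(scale=0.5, size=150)

forest = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

X_new = rng.uniform(size=(5, 4))
# One row of predictions per fitted tree.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print(np.allclose(manual_mean, forest.predict(X_new)))
```

Each tree is trained on a bootstrap sample, so the individual predictions differ; averaging them is what reduces the variance of the ensemble.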
In the case of lung cancer, the main contaminants affecting its formation turned out to be particles with a diameter of 2.5 μm or less (PM2.5). These contaminants may not be filtered by human organs, thus enabling toxic dust to penetrate into the lungs, bronchi, blood, and thus into the brain [21].
The results of the analysis are shown in Figure 6a–c.
The concept of feature importance refers to a class of techniques that assign scores to the input features of a predictive model, indicating the relative importance of each feature when making a prediction. It is a quantitative parameter. As shown in Figure 6c, the three most important air pollutants that may affect the formation of malignant colon cancer are NO2, As (PM10), and BaP (PM10).
Finally, the average accuracy of the random forest model was calculated, but the result was not high (only 20%). This suggests that air pollution is not the only factor in the formation of cancer in Poland, and that factors related to human nutrition, water quality, and smoking also need to be included.
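One way to obtain an averaged score of this kind is to repeat a random train/test split and average the held-out result. The sketch below is an assumption-laden illustration: the text does not state which metric was used, so the coefficient of determination R² (the default `score` of scikit-learn regressors) is assumed, the data are synthetic, and far fewer repeats than 1000 are run to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def mean_holdout_score(X, y, n_repeats=10, test_size=0.25):
    """Average held-out R^2 over repeated random train/test splits."""
    scores = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))   # R^2 on unseen data
    return float(np.mean(scores))

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
noise = rng.normal(size=200)

# Target mostly explained by the inputs versus mostly explained by other factors.
y_strong = 5.0 * X[:, 0] + 0.1 * noise
y_weak = 0.3 * X[:, 0] + 1.0 * noise

score_strong = mean_holdout_score(X, y_strong)
score_weak = mean_holdout_score(X, y_weak)
print(round(score_strong, 2), round(score_weak, 2))
```

A low averaged score, as in the second synthetic target, is what one would expect when most of the variation in the outcome comes from factors that are not among the model inputs.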
4. Discussion
The first major observation in this study was a strong relationship between the level of PM2.5 in the air and the incidence of lung cancer. Furthermore, we showed the effect of nitrogen oxides on the formation of tumors, in particular the correlation between the presence of NO2 in the air and the formation of colon cancer. Consequently, our data suggest that the level of NO2 in the air and the compounds present in inhaled dust (arsenic, benzo(a)pyrene) may have a strong influence on the incidence of colorectal cancer.
Our results are in line with a very interesting correlation study performed in Japan, which examined the factors that could have caused the geographic variation observed in the lung and large intestinal cancer morbidity in that country. Lung cancer was highly correlated with industrialization-related factors such as localization of manufacturing industries, automobile traffic, and air pollution, whereas colon cancer was correlated with the population density of workers in the tertiary industries such as services, trade, and government. A multiple regression analysis could not detect any single factor with an exceptionally strong influence on either cancer [22].
An important problem when examining the factors contributing to the formation of specific cancer types is the proximity of residences to incinerators or hazardous waste disposal plants. This problem has been analyzed in Spain and Italy. In Spain, increased cancer-related mortality was detected in the total population residing in the vicinity of these installations, principally in the vicinity of incinerators and scrap metal/end-of-life vehicle handling facilities. Special mention should be made of the results for tumors of the pleura, stomach, liver, kidney, ovary, lung, leukemia, colon/rectum, and bladder [23].
In the Italian analysis, no association between pollution exposure from the incinerators and all-cause and cause-specific mortality outcomes was observed in men, with the exception of colon cancer. However, exposure to the incinerators was associated with cancer mortality among women, in particular for stomach, colon, liver, and breast cancer. NO2 levels, used as a proxy for other pollution sources (traffic in particular), did not exert an important confounding role [24].
The above may be of importance in relation to recent events in Poland. In the first half of 2018, nearly 70 landfill sites burned, and these fires may have effects similar to those described in the articles mentioned above. The burning of rubber, plastic waste, and many kinds of chemical waste creates poisonous and carcinogenic substances. Breathing polluted air increases the risk of cancer, which will pose a serious health issue in the near future.
5. Conclusions
Lung cancer is not the only cancerous threat related to air pollution. The latest research suggests that other cancers are also linked to air pollution. Nitrogen oxides have been shown to be the type of gas most strongly correlated with cancer statistics, and there are scientific grounds to attribute to them an influence on the development of serious illnesses. In Poland, the number of deaths attributed to long-term exposure to NO2 is estimated at 1600 annually. It is worth mentioning that nitrogen oxides also harm us indirectly: they are precursors of carcinogenic compounds formed in soils that can penetrate into food. In this case, their impact on the incidence of chronic diseases and, as a consequence, on mortality is very difficult to estimate [25].
Our study showed that:
There are strong correlations between air pollution and the type of cancer in given provinces.
Based on the analysis, it was found that the formation of C17 and C18 disease (colorectal cancer) is the most strongly correlated with the emission of nitrogen oxides in the Masovia province and in the West Pomeranian region.
Analysis of the influence of the type of air pollutants on the formation of selected types of cancer showed that:
- for lung cancer, the release of PM2.5 pollution plays the most important role,
- the most important issue in colorectal cancer is the emission of nitrogen dioxide.
Emissions of nitrogen dioxide, as well as arsenic and benzo(a)pyrene compounds found in suspended dust, have an effect on the development of cancer of the large intestine (C18).
Our study points to the need for in-depth air pollution data collection, for example with sensors and drones, to allow for further characterization and exposure assessment. Given the non-uniform location of measurement stations, more accurate measurements of hazardous substances could then be carried out, especially when using a swarm of drones [26,27].