#### Concentration of 222Rn in Homes

The radon concentration variable was analyzed using the 11,500 data points obtained in the various measurement campaigns mentioned above. This information was stored in the GIS database and transferred to the 10 km × 10 km cell system by calculating the arithmetic mean of the radon concentration point data (in Bq/m3) contained in each 10 km × 10 km cell. The attribute table of this variable therefore contains 5478 fields, one per Spanish cell, each holding the average radon concentration (in Bq/m3) for that cell. The arithmetic mean was chosen because the EC-JRC suggested it in the European Radon Atlas [30–35] as the most appropriate parameter for representing this variable, given the great variability of the concentrations obtained per 10 km × 10 km cell, and because it is used in most epidemiological studies.
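The transfer of point measurements to the cell system can be illustrated with a minimal Python sketch. It is not the GIS workflow used in the study: the function names, the sample coordinates and the assumption that points are given in a projected coordinate system with metre units are all hypothetical.

```python
from collections import defaultdict
from math import floor

CELL_SIZE_M = 10_000  # edge of a 10 km x 10 km cell, in metres


def cell_index(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing (x, y)."""
    return (floor(x / cell_size), floor(y / cell_size))


def mean_radon_per_cell(measurements, cell_size=CELL_SIZE_M):
    """Arithmetic mean of the radon values (Bq/m3) falling in each cell.

    `measurements` is an iterable of (x, y, bq_m3) tuples; metre-based
    projected coordinates are an assumption of this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, bq in measurements:
        key = cell_index(x, y, cell_size)
        sums[key] += bq
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up sample: two points in cell (0, 0) and one point in cell (1, 0).
sample = [(1_000, 2_000, 50.0), (9_000, 9_500, 150.0), (12_000, 3_000, 300.0)]
cell_means = mean_radon_per_cell(sample)
```

Cells without any measurement simply do not appear in the result, so a real workflow would still need to decide how to represent empty cells.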

#### Exposure Rate to Terrestrial Gamma Radiation, Lithostratigraphy and Radon Potential

The information about the Spanish lithostratigraphic units was downloaded in shapefile format, so we worked with the geological data of the 329 polygons and the attribute table of lithostratigraphic units provided by IGME. The data for the rates of exposure to terrestrial gamma radiation and for radon potential in Spain were downloaded in a high-quality image format. These images were georeferenced to the Spanish administrative boundaries, and their polygons were then digitized in as much detail as possible (at an approximate scale of between 1:3000 and 1:5000): the 5 units with homogeneous radon level were digitized as 5 polygons, each assigned its value in Bq/m3, and the 22 terrestrial gamma radiation rates were digitized as 22 polygons, each assigned its value in nGy/h. Notably, there are no data on exposure to terrestrial gamma radiation for the Balearic Islands or the Canary Islands, so this variable could not be taken into account when conducting the study in these areas.

As mentioned above, each of these variables in shapefile format stores the information for its graphical units/polygons (fields) in its attribute table, where each field is a record describing the typology of the element or surface coverage to be analyzed (a homogeneous category of information). For the subsequent data analysis, it was necessary to determine, for each variable, which unit or field (polygon) occupied the largest surface area within each 10 km × 10 km cell. To do this, the cell system was intersected with the gamma radiation, lithostratigraphy and radon potential variables, and the surface area of the majority field was calculated in each cell. In this way, each 10 km × 10 km cell was assigned the value of the field with the highest probability of occurrence in those 100 km2.
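The majority-field assignment can be sketched as follows, assuming the intersection of the cell system with a variable has already produced one record per piece, with its area. The record layout, the cell identifiers and the sample areas are hypothetical and serve only to illustrate the selection rule.

```python
def majority_field(intersections):
    """Pick, for each cell, the field value covering the largest total area.

    `intersections` is an iterable of (cell_id, field_value, area_km2)
    records, one per piece resulting from the grid/layer intersection.
    """
    area_by_cell = {}
    for cell_id, field_value, area in intersections:
        fields = area_by_cell.setdefault(cell_id, {})
        # A field may enter a cell as several separate pieces: add them up.
        fields[field_value] = fields.get(field_value, 0.0) + area
    return {
        cell_id: max(fields, key=fields.get)
        for cell_id, fields in area_by_cell.items()
    }


# Made-up pieces: in cell_1 the 88 nGy/h field covers 55 km2 against 45 km2.
pieces = [
    ("cell_1", 44, 30.0),
    ("cell_1", 88, 55.0),
    ("cell_1", 44, 15.0),
    ("cell_2", 66, 100.0),
]
dominant = majority_field(pieces)
```

When two fields cover exactly the same area in a cell, `max` keeps the first one encountered; the text does not specify a tie-breaking rule, so this behaviour is an arbitrary choice of the sketch.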

#### 2.2.3. Data Processing

The input data were processed differently depending on the origin of the source information. For the dependent variable (concentration of 222Rn in homes), the arithmetic mean of the radon concentration points (in Bq/m3) contained in each 10 km × 10 km cell was transferred to the cell system. The data for the independent variables (exposure rate to terrestrial gamma radiation and lithostratigraphy) and the data used to compare and validate the study results (CSN radon potential) were transferred to the cell system by generating density maps.

The methodology of the density map creation process is shown in Figure 1 and is as follows:

**Figure 1.** Diagram of the methodology for creating the density maps.

The first step was to create a dot mesh with points spaced 2500 m apart in each direction, fitted to the limits of the 10 km × 10 km cells in Spain. This meant that each cell was covered homogeneously by a total of 16 points. This dot mesh allowed the extraction, for each 10 km × 10 km cell, of the points contained in each field of the study variables. The process was carried out by intersecting the dot mesh with each of the previously selected fields; in this way, a series of layers was created that indicated the density of points per 10 km × 10 km cell: the minimum value (0) corresponded to the absence of that field in that cell, and the maximum value (16) to the total presence of that field in that cell. Thus, 22 point layers were created for variable 1 (exposure rate to terrestrial gamma radiation), 329 point layers for variable 2 (lithostratigraphies), and 5 point layers for variable 3 (CSN radon potential). Density maps were generated from each of these point layers, fitted to the limits of the 10 km × 10 km cells, using the point density tool [50].
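The point-density idea can be illustrated with a short sketch. It assumes that the 16 mesh points of a cell sit at the centres of its 2500 m sub-squares and that membership of a point in a field can be tested with a simple function; both are assumptions, since the text does not state where the points lie within each cell, and the study itself used a GIS point density tool.

```python
CELL_SIZE_M = 10_000  # edge of a 10 km x 10 km cell, in metres
SPACING_M = 2_500     # spacing of the dot mesh, in metres


def mesh_points(cell_x0, cell_y0, cell_size=CELL_SIZE_M, spacing=SPACING_M):
    """Return the mesh points of the cell whose lower-left corner is given.

    Points are placed at sub-square centres (an assumption of this sketch).
    """
    per_side = cell_size // spacing  # 4 points per side, 16 per cell
    half = spacing / 2
    return [
        (cell_x0 + half + i * spacing, cell_y0 + half + j * spacing)
        for i in range(per_side)
        for j in range(per_side)
    ]


def point_density(cell_x0, cell_y0, in_field):
    """Count the mesh points of a cell for which `in_field(x, y)` is true."""
    return sum(1 for x, y in mesh_points(cell_x0, cell_y0) if in_field(x, y))


def west_half(x, y):
    """Hypothetical field covering the western half of the cell at the origin."""
    return x < 5_000


points = mesh_points(0, 0)
density = point_density(0, 0, west_half)
```

A field absent from the cell gives a count of 0 and a field covering the whole cell gives 16, matching the minimum and maximum values described above.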

Once the density maps of each variable had been generated, the relationship between the dependent variable (concentration of 222Rn in homes) and the independent variables (exposure rate to terrestrial gamma radiation and lithostratigraphy) was analyzed. In previous steps, the centroid ("x" and "y" coordinates) of each of the 5478 10 km × 10 km cells had been calculated, and an identification code had been added to each of them. Using the extract-by-points tool, the radon concentration value transferred to the cell system (dependent variable) was extracted. In the same way, this tool was used to extract the gamma rate value for each cell from the 22 density maps related to the fields of variable 1. The same procedure was carried out with the 329 fields of variable 2, extracting the information on the lithostratigraphic typology.

Using the extracted data, a simple linear correlation analysis was performed to check for a positive or negative relationship between the two parameters for each cell. For example, for the cell with code 1, the Pearson correlation coefficient (R) is calculated between the average radon concentration and the exposure rate to terrestrial gamma radiation for the 44 nGy/h field in that same cell, then for the 88 nGy/h field, and so on for each field of each variable. The degree of fit was quantified through the Pearson correlation coefficient (R), giving a value between −1 and +1 for each of the correlations. These correlations were normalized into 9 categories: value 1 (R > +0.75), value 2 (R +0.74 to +0.5), value 3 (R +0.49 to +0.25), value 4 (R +0.24 to +0.1), value 5 (R = 0), value 6 (R −0.1 to −0.24), value 7 (R −0.25 to −0.49), value 8 (R −0.50 to −0.74), and value 9 (R < −0.75). This grouping into ranges of values facilitated the process of representing the correlations obtained between the radon concentration and both exposure to gamma radiation and lithostratigraphy.
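The correlation and its normalization into 9 categories can be sketched as follows. This is one possible reading of the procedure, in which R is computed across cells between the mean radon concentration and the point density of a single field. The listed ranges leave small gaps (for example, between 0 and ±0.1), so the sketch assumes contiguous bins with |R| < 0.1 assigned to value 5; the function names and sample data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_category(r):
    """Map R to the values 1..9 (1 = strongest positive, 9 = strongest negative).

    Assumption: bins are made contiguous, so any |R| < 0.1 becomes value 5.
    """
    magnitude = abs(r)
    if magnitude > 0.75:
        step = 4
    elif magnitude >= 0.5:
        step = 3
    elif magnitude >= 0.25:
        step = 2
    elif magnitude >= 0.1:
        step = 1
    else:
        step = 0
    return 5 - step if r > 0 else 5 + step


# Made-up data: mean radon (Bq/m3) and point density (0..16) in four cells.
radon = [50.0, 120.0, 200.0, 310.0]
density = [0, 4, 10, 16]
r = pearson_r(radon, density)
category = correlation_category(r)
```

`pearson_r` fails with a division by zero when either sequence is constant (for instance, a field absent from every cell considered), a case that a complete implementation would need to handle explicitly.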

#### *2.3. Development of the Relationship Maps between Independent Variables and the 222Rn Concentration in Homes and the New Radon Potential Map*

For the terrestrial gamma radiation exposure rate variable, the data were represented graphically using the cell system, which gathers the 22 fields into 5 categories defined by their radon concentrations: 44 nGy/h corresponds to 100 Bq/m3 and 89 nGy/h to 300 Bq/m3 [51]. The correlations obtained between the radon concentrations and exposure to gamma radiation were represented graphically in 9 categories according to the ranges mentioned above. Similarly, the lithostratigraphy variable was represented graphically in the cell system, bringing together the 329 lithostratigraphic fields of the Iberian Peninsula, the Balearic Islands, and the Canary Islands. The correlations obtained between the radon concentrations and the lithostratigraphies were also represented graphically in 9 categories.

From these two correlation maps, a new radon potential map (Radon Potential Map Calculated) was generated. The sum of the categories of both maps was represented on this calculated map, and so the numerical range of each cell was between 1 and 18: Values from 13 to 18 indicate a positive linear relationship with radon and therefore a high probability of finding high concentrations. Values from 8 to 12 indicate the absence of a relationship and therefore an average probability of finding high concentrations. Values from 1 to 7 indicate a negative linear relationship with radon and therefore a low probability of finding high concentrations.
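The combination of the two correlation maps can be illustrated with a minimal sketch that sums the two category values of a cell and applies the three ranges given above. The function name and labels are hypothetical, and the orientation of the categories before summing follows the equivalences applied by the authors, which this sketch does not attempt to reproduce.

```python
def combined_probability(gamma_category, litho_category):
    """Sum two category values and label the result with the stated ranges."""
    total = gamma_category + litho_category
    if total >= 13:
        label = "high"      # values from 13 to 18
    elif total >= 8:
        label = "average"   # values from 8 to 12
    else:
        label = "low"       # values from 1 to 7
    return total, label


# Made-up category pairs for three cells.
examples = [combined_probability(9, 9),
            combined_probability(5, 5),
            combined_probability(1, 2)]
```

For cells where one of the two variables is unavailable, such as those lacking gamma radiation data, the text does not say how the sum is formed, so that case is left out of the sketch.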

To facilitate the interpretation of the results, and to represent the data according to their possible radon concentration range, the values were reclassified into 5 categories: Category 1 (>400 Bq/m3), Category 2 (301–400 Bq/m3), Category 3 (201–300 Bq/m3), Category 4 (101–200 Bq/m3), and Category 5 (<100 Bq/m3). The equivalences applied to the entire process are shown in Table 1 below:



The results were evaluated as follows: the successes and failures per 10 km × 10 km cell were compared for each of the variables analyzed (222Rn concentration measurements in homes, exposure rate to terrestrial gamma radiation and lithostratigraphies), for both the CSN P90 Radon Potential Map and the Radon Potential Calculated Map. A cell was counted as a success if it fell in the same concentration or range of values (see Table 1 equivalences) and as a failure if it did not match.
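The cell-by-cell comparison can be sketched as follows for the measured radon concentrations, using the 5 concentration categories defined above. The function names and sample values are hypothetical, and the treatment of boundary values (exactly 100 Bq/m3 placed in Category 5, non-integer values assigned by strict thresholds) is an assumption of this sketch.

```python
def radon_category(bq_m3):
    """Map a concentration in Bq/m3 to Categories 1..5 from the text."""
    if bq_m3 > 400:
        return 1
    if bq_m3 > 300:
        return 2
    if bq_m3 > 200:
        return 3
    if bq_m3 > 100:
        return 4
    return 5  # assumption: exactly 100 Bq/m3 falls in Category 5


def count_matches(measured_bq, map_category):
    """Return (successes, failures) over the cells present in both inputs.

    `measured_bq` maps cell ids to mean concentrations (Bq/m3) and
    `map_category` maps cell ids to the category given by a potential map.
    """
    shared = measured_bq.keys() & map_category.keys()
    successes = sum(
        1 for cell in shared
        if radon_category(measured_bq[cell]) == map_category[cell]
    )
    return successes, len(shared) - successes


# Made-up cells: c1 and c3 agree with the map, c2 does not.
measured = {"c1": 450.0, "c2": 150.0, "c3": 80.0}
potential_map = {"c1": 1, "c2": 3, "c3": 5}
successes, failures = count_matches(measured, potential_map)
```

Running the same comparison once against each potential map gives two pairs of counts that can be compared directly.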

## **3. Results**

#### *3.1. Analysis of Variables*
