#### *4.1. Data Pre-Processing Using MARS*

The importance of each variable was analysed by assessing its influence on the variable to be predicted. Two statistics were used: the generalised cross-validation (GCV) criterion and the residual sum of squares (RSS). Figure 7 shows the results for both criteria (blue and red lines) together with their mean (light blue bars).

**Figure 7.** Variable importance analysis results, using the MARS algorithm.

The results clearly show that the variables related to the atmospheric pollutants SO2 (industrial) and Cl− (marine), together with relative humidity, are the most important factors, in agreement with the literature review. All of them can be treated as independent variables capable of providing the model with enough information to obtain valuable predictions.
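
As an illustration of the two criteria used in this analysis, the following minimal sketch computes RSS and GCV for a fitted model. It assumes the usual MARS form of GCV, in which the mean squared residual is inflated by a penalty based on the effective number of parameters. The function names and the `n_effective_params` argument are hypothetical and are not taken from the software used in this work.

```python
import numpy as np


def rss(y, y_hat):
    """Residual sum of squares between observations and predictions."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))


def gcv(y, y_hat, n_effective_params):
    """Generalised cross-validation criterion.

    Assumed form: RSS / (N * (1 - C / N)^2), where N is the number of
    observations and C the effective number of model parameters.
    """
    n = len(y)
    if not 0 <= n_effective_params < n:
        raise ValueError("effective parameters must be in [0, N)")
    return rss(y, y_hat) / (n * (1.0 - n_effective_params / n) ** 2)


def mean_importance(gcv_scores, rss_scores):
    """Element-wise mean of two importance score vectors (one entry per variable)."""
    return np.mean([gcv_scores, rss_scores], axis=0)
```

With no complexity penalty (`n_effective_params = 0`), GCV reduces to the mean squared residual; as the model grows more complex for the same RSS, GCV increases, which is why it is less prone than RSS to favouring overfitted models.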

#### *4.2. First-Year Corrosion Prediction*

The supersom model yields a mesh of 7 × 7 hexagonal neurons trained with the Kohonen algorithm, which provides a good representation of the sample space. The trained map stores all the data in a vector structure, so that each training case falls on one of the neurons (Figure 8).

**Figure 8.** Number of cases on each neuron.
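
To make the training step more concrete, the sketch below implements a basic online Kohonen update on a 7 × 7 hexagonal mesh and counts the cases that fall on each neuron. It is a simplified, single-layer illustration and not the supersom implementation itself: the Gaussian neighbourhood, the linear decay of learning rate and radius, and all function and parameter names are assumptions made for this example.

```python
import numpy as np


def hex_grid(rows, cols):
    """Centres of the units on a hexagonal lattice (odd rows shifted half a cell)."""
    coords = [(c + 0.5 * (r % 2), r * np.sqrt(3) / 2)
              for r in range(rows) for c in range(cols)]
    return np.array(coords)


def train_som(data, rows=7, cols=7, n_iter=2000, lr0=0.5, seed=0):
    """Train a self-organising map with the online Kohonen rule."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    grid = hex_grid(rows, cols)
    # Initialise the codebook vectors from randomly chosen training cases.
    codebook = data[rng.integers(0, len(data), size=rows * cols)].copy()
    radius0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays to 0
        radius = radius0 + (0.5 - radius0) * frac   # radius shrinks towards 0.5
        x = data[rng.integers(len(data))]
        # Best matching unit: the codebook vector closest to the sample.
        bmu = np.argmin(np.sum((codebook - x) ** 2, axis=1))
        # Gaussian neighbourhood measured on the mesh, not in data space.
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * radius ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook, grid


def assign_cases(data, codebook):
    """Index of the closest codebook vector for every case."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def cases_per_neuron(data, codebook):
    """Number of cases falling on each neuron (zero for empty neurons)."""
    return np.bincount(assign_cases(data, codebook), minlength=len(codebook))
```

Calling `train_som(data)` on a standardised data matrix and then `cases_per_neuron(data, codebook)` gives the kind of per-neuron counts that a plot such as Figure 8 displays.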

Each neuron, whether or not any cases fall on it, is represented by a codebook vector. The neurons are arranged so that neighbouring neurons represent points that are close to each other in the sample space. The average corrosion value per neuron clearly increases towards the lower right corner of the mesh. Figure 9 shows this result: the larger the circle, the higher the average corrosion. Because the neighbourhood properties are preserved, the map shows a uniform behaviour, which indicates good training results.

**Figure 9.** Mean corrosion values per neuron. Corrosion loss in μm per year is represented by circle size.
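
The per-neuron averages behind a map of this kind can be obtained with a simple aggregation, sketched below. It assumes that each case has already been assigned to a neuron index; `mean_per_neuron` and its arguments are hypothetical names used only for illustration.

```python
import numpy as np


def mean_per_neuron(neuron_index, values, n_neurons):
    """Mean of `values` over the cases assigned to each neuron.

    Empty neurons get NaN, since no average is defined for them.
    """
    neuron_index = np.asarray(neuron_index)
    values = np.asarray(values, dtype=float)
    counts = np.bincount(neuron_index, minlength=n_neurons)
    sums = np.bincount(neuron_index, weights=values, minlength=n_neurons)
    means = np.full(n_neurons, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    return means
```

For a 7 × 7 map, `n_neurons` would be 49, and the resulting vector can be reshaped to the mesh layout to scale the circle sizes.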
