*2.3. Data Preparation*

In the first phase of data preparation, the AIIR dataset was checked for quality and consistency to filter out typographical errors and erroneous records. Inconsistent records (outliers), such as soil samples with high carbonate content and low pH, were excluded from the dataset. We then selected the records from the AIIR dataset that corresponded to agricultural parcels under intensive (i.e., high fertilizer use) cultivation. The selection was based on the amount of fertilizer applied: only records with at least 125 kg ha<sup>−1</sup> of nitrogen and 30 kg ha<sup>−1</sup> of active phosphorus input were kept. In this way, the present assessment focused on data from intensively cultivated fields.
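The two filtering steps can be illustrated with a minimal Python sketch. The fertilizer thresholds are those given in the text; the field names (`caco3_pct`, `ph`, `n_kg_ha`, `p_kg_ha`) and the numeric cut-offs for "high carbonate" and "low pH" are hypothetical, since the text does not specify them.

```python
# Fertilizer thresholds stated in the text (kg per hectare).
MIN_N_KG_HA = 125.0
MIN_P_KG_HA = 30.0

# Hypothetical cut-offs for the consistency check; the text gives no numbers.
HIGH_CARBONATE_PCT = 5.0
LOW_PH = 5.5


def is_consistent(record):
    """Reject records that combine high carbonate content with low pH."""
    return not (record["caco3_pct"] >= HIGH_CARBONATE_PCT and record["ph"] <= LOW_PH)


def is_intensive(record):
    """Keep parcels that meet both fertilizer input thresholds."""
    return record["n_kg_ha"] >= MIN_N_KG_HA and record["p_kg_ha"] >= MIN_P_KG_HA


def prepare(records):
    """Apply the consistency check, then the intensive-cultivation filter."""
    return [r for r in records if is_consistent(r) and is_intensive(r)]


# Made-up example records, for illustration only.
sample = [
    {"id": 1, "caco3_pct": 2.0, "ph": 7.1, "n_kg_ha": 140.0, "p_kg_ha": 45.0},
    {"id": 2, "caco3_pct": 12.0, "ph": 4.8, "n_kg_ha": 150.0, "p_kg_ha": 40.0},  # inconsistent
    {"id": 3, "caco3_pct": 1.0, "ph": 6.5, "n_kg_ha": 90.0, "p_kg_ha": 35.0},  # low N input
]
kept_ids = [r["id"] for r in prepare(sample)]
```

In this made-up sample only record 1 passes both checks.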

Winter wheat (*Triticum aestivum*), maize (*Zea mays*) and sunflower (*Helianthus annuus*), which have been sown on up to 80% of the croplands in Hungary [46] in recent decades, were selected for the productivity evaluation. To establish a common basis for the analysis, the yield data of these three main crops from each parcel of the dataset were normalized to a scale of 1 to 100. For the same reason, the GPP values were also normalized to a scale of 1 to 100. Normalization was applied to all wheat, maize and sunflower yield data in the five years covered by the AIIR database and to all cropland pixels in the GPP dataset.
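The text does not state the normalization formula. One common choice, assumed here, is a linear min-max rescaling, x' = 1 + 99 (x − x_min) / (x_max − x_min). The sketch below applies it separately to each crop, which is also an assumption; the field names `crop` and `yield_t_ha` are hypothetical.

```python
def normalize_1_100(values):
    """Linearly rescale values so the minimum maps to 1 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case (all values equal): mapping to 1 is an arbitrary choice.
        return [1.0 for _ in values]
    return [1.0 + 99.0 * (v - lo) / (hi - lo) for v in values]


def normalize_yields_by_crop(records):
    """Return copies of the records with a 'yield_norm' field, scaled per crop."""
    by_crop = {}
    for r in records:
        by_crop.setdefault(r["crop"], []).append(r["yield_t_ha"])
    bounds = {crop: (min(v), max(v)) for crop, v in by_crop.items()}
    out = []
    for r in records:
        lo, hi = bounds[r["crop"]]
        norm = 1.0 if hi == lo else 1.0 + 99.0 * (r["yield_t_ha"] - lo) / (hi - lo)
        out.append({**r, "yield_norm": norm})
    return out


# Made-up yields, for illustration only.
records = [
    {"crop": "wheat", "yield_t_ha": 3.0},
    {"crop": "wheat", "yield_t_ha": 5.0},
    {"crop": "wheat", "yield_t_ha": 7.0},
    {"crop": "maize", "yield_t_ha": 6.0},
    {"crop": "maize", "yield_t_ha": 10.0},
]
normalized = [r["yield_norm"] for r in normalize_yields_by_crop(records)]
```

The same `normalize_1_100` helper could be applied to the GPP values of all cropland pixels.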

The AIIR database with normalized yield data and the normalized GPP dataset were integrated with the climate geodatabase into a single geodatabase using geographical coordinates as unique identifiers. The resulting georeferenced dataset includes all soil, climate, management and yield data. Productivity analysis was carried out using the information attached to the georeferenced pixels, including their geographical coordinates.
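Using coordinates as the join key can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the attribute names (`x`, `y`, `gpp_norm`, `precip_mm`), the rounding of coordinates to whole metres, and the choice to keep only locations present in all three sources are hypothetical.

```python
def coord_key(x, y, precision=0):
    """Build a join key from coordinates, rounded to avoid float mismatches."""
    return (round(x, precision), round(y, precision))


def join_on_coordinates(parcels, gpp, climate):
    """Merge three record lists, keeping locations found in all of them."""
    gpp_by_xy = {coord_key(g["x"], g["y"]): g for g in gpp}
    climate_by_xy = {coord_key(c["x"], c["y"]): c for c in climate}
    merged = []
    for p in parcels:
        key = coord_key(p["x"], p["y"])
        if key in gpp_by_xy and key in climate_by_xy:
            row = dict(p)  # copy so the input records stay unchanged
            row["gpp_norm"] = gpp_by_xy[key]["gpp_norm"]
            row["precip_mm"] = climate_by_xy[key]["precip_mm"]
            merged.append(row)
    return merged


# Made-up records, for illustration only.
parcels = [
    {"x": 650050.0, "y": 240050.0, "yield_norm": 72.0},
    {"x": 650150.0, "y": 240050.0, "yield_norm": 55.0},
]
gpp = [{"x": 650050.0, "y": 240050.0, "gpp_norm": 64.0}]
climate = [
    {"x": 650050.0, "y": 240050.0, "precip_mm": 560.0},
    {"x": 650150.0, "y": 240050.0, "precip_mm": 548.0},
]
merged = join_on_coordinates(parcels, gpp, climate)
```

Here the second parcel is dropped because no GPP record shares its coordinates.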

The GPP data, originally produced at 500 m resolution, were downscaled to 100 m resolution and normalized to values between 1 and 100. The downscaling was performed using the nearest neighbor resampling method. The SRTM data, which were originally produced at 30 m resolution, were generalized to 100 m resolution using the bilinear interpolation technique.
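The two resampling methods can be illustrated on small grids with a pure-Python sketch. It assumes that the source and target grids share the same origin and that pixel values refer to pixel centres; the function names are hypothetical, and in practice a GIS tool would perform these steps.

```python
def upsample_nearest(grid, factor):
    """Nearest neighbor: each fine pixel takes the value of the coarse pixel it lies in."""
    n_rows, n_cols = len(grid) * factor, len(grid[0]) * factor
    return [[grid[i // factor][j // factor] for j in range(n_cols)]
            for i in range(n_rows)]


def bilinear_sample(grid, row, col):
    """Interpolate at a fractional (row, col) index, clamped to the grid edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    row = min(max(row, 0.0), n_rows - 1.0)
    col = min(max(col, 0.0), n_cols - 1.0)
    r0, c0 = int(row), int(col)
    r1, c1 = min(r0 + 1, n_rows - 1), min(c0 + 1, n_cols - 1)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


def resample_bilinear(grid, src_res, dst_res):
    """Sample the source grid at the centres of the target pixels."""
    scale = dst_res / src_res
    n_rows = int(len(grid) * src_res // dst_res)
    n_cols = int(len(grid[0]) * src_res // dst_res)
    return [[bilinear_sample(grid, (i + 0.5) * scale - 0.5, (j + 0.5) * scale - 0.5)
             for j in range(n_cols)]
            for i in range(n_rows)]


# 500 m -> 100 m: every coarse GPP pixel becomes a 5 x 5 block of equal values.
gpp_500 = [[1, 2], [3, 4]]
gpp_100 = upsample_nearest(gpp_500, 5)

# 30 m -> 100 m: a made-up 10 x 10 elevation plane (300 m x 300 m) becomes 3 x 3.
dem_30 = [[10.0 * r + c for c in range(10)] for r in range(10)]
dem_100 = resample_bilinear(dem_30, 30, 100)
```

Nearest neighbor resampling only replicates the original values and creates no new ones, whereas bilinear interpolation produces weighted averages of the four surrounding source pixels.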

All datasets were converted to the Uniform National Projection System (EOV) to create a coherent geodatabase.
