3.1. Calibration
The growth and productivity of wheat showed high variability both among cvs and across growing seasons within the same cv (Figure 1a,b).
Valgerardo and Latino exhibited significant reductions in growth in certain years, adversely affecting productivity. For Valgerardo, dry biomass accumulation halted in 1982, with values ranging between 4157 kg ha−1 (T5) and 5447 kg ha−1 (T8). The grain yield behaved accordingly, with values well below 1000 kg ha−1 for all straw treatments.
The following year, a storm occurring just before harvest led to lodging of the plants and resulted in grain loss. Therefore, data from this year were excluded from the reported modelling exercise.
For Latino, dry biomass and productivity at harvest in 1992 remained below 5000 kg ha−1 and 1000 kg ha−1, respectively.
Ofanto and Appulo achieved a fair stability of growth and productivity over the growing years, with comparable values in terms of TDM (slightly higher than 10,000 kg ha−1 for both) and grain yield (around 3000 kg ha−1).
Simeto and Claudio demonstrated the highest yield potential among the cvs, as evidenced by their high productivity in certain years (peaking over 5000 kg ha−1) compared to the other cvs.
However, even for these two cvs, some growing seasons proved to be unfavourable for the growth and accumulation of biomass, with limited grain yield falling below 2000 kg ha−1 for Simeto.
Finally, Saragolla was the cv that provided some of the highest (4508 kg ha−1 in 2021; T2) and lowest yield values (1692 kg ha−1 in 2020; T5), even though, for the worst performances, the corresponding TDM was not especially unfavourable (from 11,723 kg ha−1 to 13,974 kg ha−1).
As with Valgerardo, a storm shortly before harvesting heavily compromised grain harvesting in 2001 (cv Ofanto) and 2018 (cv Claudio for T2 and T5 treatments). Consequently, the wheat data from these growing seasons were not considered for model parametrization.
The values calibrated by trial and error for the coefficients of the parameters underlying crop growth involved the following: (i) assimilation of CO2; (ii) conversion into biomass; (iii) partitioning among the various organs of the plant; (iv) development of the canopy and intercepted radiation; (v) root length; (vi) senescence (Table 1 and Table 2).
In addition to these parameters, the coefficients of algorithms governing the simulation of evapotranspiration (
Table 2), specific partitions for each phenological phase (
Table 2), and degree days (GDD;
Table 3) for achieving the phenological stages were also modified.
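The role of degree days in reaching a phenological stage can be illustrated with a minimal sketch. It assumes the common averaging method (daily mean of minimum and maximum temperature minus a base temperature); the base temperature of 0 °C, the function names, and the example threshold are illustrative assumptions, not values or routines taken from ARMOSA or from Table 3.

```python
def growing_degree_days(tmin, tmax, t_base=0.0):
    """Return cumulative growing degree days for paired daily min/max temperatures.

    t_base is a hypothetical base temperature (0 degC is an assumption here).
    """
    total = 0.0
    cumulative = []
    for low, high in zip(tmin, tmax):
        daily_mean = (low + high) / 2.0
        # Days colder than the base temperature contribute nothing.
        total += max(daily_mean - t_base, 0.0)
        cumulative.append(total)
    return cumulative


def day_stage_reached(cumulative_gdd, threshold):
    """Return the 1-based day on which the GDD threshold is first met, or None."""
    for day, value in enumerate(cumulative_gdd, start=1):
        if value >= threshold:
            return day
    return None


# Illustrative three-day series (not experimental data).
gdd = growing_degree_days([2, 4, -6], [10, 12, 2])
stage_day = day_stage_reached(gdd, threshold=10)
```

Calibrating the GDD threshold of a stage then amounts to shifting `threshold` until the simulated stage date matches the observed one across seasons.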
For the emergence, flowering, and maturity stages, an excellent match between the observed and simulated data was achieved, both in terms of similarity of values averaged for all growing seasons and in terms of inter-annual variability (
Table 4).
Accurate calibration of crop phenology is considered the primary, basic step in the application of crop simulation models [43]. In our modelling exercise, the emergence and flowering stages of wheat, as formalized by ARMOSA, attained the highest scores, with the model capturing both the average GDD required to reach these phenological stages and their variability across years.
ARMOSA effectively formalized GDD to reach the maturity stage, albeit with a slight penalty from the low score of EF and a moderate score of d. However, this process was well depicted by NRMSE and CRM figures.
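For readers unfamiliar with the four evaluation indices, the sketch below shows how they are commonly computed. The formulas are the widely used textbook definitions (NRMSE as a percentage of the observed mean, Nash-Sutcliffe modelling efficiency for EF, Willmott's index of agreement for d, and CRM with positive values indicating underestimation); they are assumed here, since this section does not restate them, and the function name is hypothetical.

```python
import math


def evaluation_indices(observed, simulated):
    """Compute NRMSE (%), EF, d and CRM for paired observed/simulated values.

    Assumes the observed values are not all identical and do not sum to zero.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    pairs = list(zip(observed, simulated))

    # Sum of squared simulation errors, shared by NRMSE, EF and d.
    sq_err = sum((s - o) ** 2 for o, s in pairs)
    # Variability of the observations around their mean (EF denominator).
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    # "Potential error" term of Willmott's index of agreement.
    ss_pot = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2 for o, s in pairs)

    return {
        "NRMSE": 100.0 * math.sqrt(sq_err / n) / mean_obs,
        "EF": 1.0 - sq_err / ss_obs,
        "d": 1.0 - sq_err / ss_pot,
        "CRM": (sum(observed) - sum(simulated)) / sum(observed),
    }


# Illustrative values only (not experimental data).
scores = evaluation_indices([2.0, 4.0, 6.0], [3.0, 5.0, 7.0])
```

Under these definitions a perfect simulation gives NRMSE = 0, EF = 1, d = 1, and CRM = 0, and a negative EF means the observed mean is a better predictor than the model.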
A simulation model’s accuracy in replicating crop phenology correlates with its ability to capture the genetic variability underlying canopy development and biomass accumulation within the same framework [
44].
Biomass accumulation is linked to the amount of radiation intercepted by the leaf surface, which, in turn, is responsible for converting assimilated CO2 into carbohydrates, a cultivar-specific trait.
In light of this, the coefficients of certain algorithms governing canopy development and senescence, CO2 conversion into dry matter, maintenance respiration, and water and temperature stress for each cultivar were adjusted to best align with the simulation of biomass accumulation based on data gathered in the LTE (see Table 1).
As with phenology, the calibration phase demonstrated the proficiency of ARMOSA in faithfully replicating the total dry biomass at harvest, averaged for all soil treatments (Table 5).
Indeed, the highest score was observed for three out of four evaluation indices, with only a negligible deviation of NRMSE from the optimal value (25.77% vs. 25%).
When evaluating ARMOSA’s response for each cropping system separately (T2, T5, and T8), a remarkable match between observed TDM and the model output was evident for T2 and T8. There was a narrow deviation from the optimal value of NRMSE for the former and a slight overestimation of the model for the latter. Nonetheless, even the output of ARMOSA in replicating T5 could be deemed satisfactory, with the best performance for EF and d, but with a slight overestimation and deviation of the simulated data compared to the observed data.
The environment of the area under investigation (Mediterranean climate) is characterized by erratic rainfall patterns, leading to prolonged drought conditions, especially during the spring–summer period.
Additionally, common agronomic practices for durum wheat in the Mediterranean area do not include irrigation. The sum of these conditions subjects the crop to extremely variable water supply and water stress among years and within the same growing season [
45,
46,
47].
Examining the ratio between the standard deviation and the mean value of TDM revealed that certain cvs were more susceptible to climatic erraticism (e.g., Valgerardo, Latino, Appio) than others (Ofanto and Appulo;
Table 6).
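The ratio between the standard deviation and the mean is the coefficient of variation. A minimal sketch of its computation follows; the use of the sample standard deviation (n − 1 denominator) is an assumption, as the text does not specify which estimator was used, and the numbers are illustrative rather than taken from Table 6.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) of a series of seasonal values.

    Uses the sample standard deviation; this choice is an assumption.
    """
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical TDM values (kg ha-1) for three growing seasons.
cv_percent = coefficient_of_variation([9000, 10000, 11000])
```

A higher value for a cv indicates that its TDM fluctuated more from season to season relative to its average level.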
The variability observed in the experimental yield data was influenced by climatic variables, including rainfall, which exhibited high variability with differences of up to 400 mm across growing seasons, and temperature (especially during the flowering and grain-filling period).
Lower yields were recorded in years with lower precipitation during the crop growing period (around 300 mm in 1982 and 1992). The best performances were noted in growing seasons where precipitation ensured water inputs exceeding 430 mm, particularly in 1991, 1998, and several years ranging from 2018 to 2021.
Detrimental effects of temperature on productivity were observed in years when grain yield was not satisfactory (i.e., 1989, 1995, 2007, and 2020). In these instances, mean temperatures reached peaks of 24–28 °C between the beginning of flowering and the waxy maturity stage of the seed (mid-April to the second ten days of May), mainly due to anomalies in maximum temperatures (heatwaves) leading to pollination failure and/or reductions in grain mass.
Accordingly, a meticulous calibration of the crop coefficients related to the mechanisms of adaptation to temperature and rainfall pattern and any water/temperature stress (i.e., WSPar, or susceptibility of the crop to water stress; TmaxCO2, or the maximum temperature threshold for the optimal development of the crop; TOffCO2, that is, the temperature above which crop growth ceases; and KET, which represents the crop coefficient at specific phenological stages of the crop) was performed for each cv.
Among the eight cvs, ARMOSA accurately replicated TDM at the end of the growing season for four of them and produced fairly good replications for three; the simulation was unsatisfactory for only one cv.
It should be noted that for Saragolla, we investigated only three growing seasons (from 2019 to 2021), leading to a limited number of observations not sufficient to optimize ARMOSA’s response for this cv.
Simeto and Valgerardo were the cvs for which ARMOSA accurately simulated both the inter-annual variability and the average TDM observed in the field, with a slight overestimation for Simeto.
For the remaining cvs, there was a mixed response; for some of them, ARMOSA was efficient in replicating the biomass accumulation at harvest, returning negligible differences between the observed and simulated mean data, but less effective in capturing the variability between the various years (see NRMSE, EF, and d for Appulo, Claudio, and Ofanto).
For other cvs, the simulations comprehensively captured the inter-annual variability (i.e., Claudio and Latino) but overestimated or underestimated the average trend of TDM.
Overall, the analysis of ARMOSA’s response in simulating TDM at harvest showed that the calibration process correctly trained the cropping-system model to replicate the data observed in the field across the LTE under P_30 treatments.
Thus, the correct estimate of TDM by ARMOSA and therefore of biomass incorporated in the soil was the first key point for an adequate simulation of TOC dynamics.
In previous studies, ARMOSA was calibrated and validated on a wide range of climate and soil conditions throughout Europe under conventional systems and CA, simulating TOC dynamics with very good or even excellent results [
22].
Thus, the calibration step for the TOC dynamic focused only on two parameters controlling the evolution of soil organic matter, namely Khumus (1.4 × 10−4) and CMicrobEfficiency (0.4), leaving all the other parameters unchanged.
ARMOSA replicated the dynamics of TOC quite favourably, achieving the “Good” score for all the treatments under investigation (
Table 7;
Figure 2). This result was attributed to the accurate estimation of the mean value of TOC (averaged for all treatments; 64,965 vs. 64,758 kg ha−1, Table 7).
While the CRM index indicated a perfect alignment of the simulated values with the measured ones, it is noteworthy that ARMOSA tended to slightly underestimate the data collected in the initial course of the LTE and then overestimate the data in the middle period of the LTE (
Figure 2).
Measuring the robustness of ARMOSA in formalizing TOC dynamics in the last part of the LTE was not possible due to the absence of soil sampling, which occurred during the validation phase (as discussed in the next section).
The measured TOC showed high variability, both between consecutive years and within the same sampling, as indicated by the high standard deviation (Figure 2).
The source of this erraticism may be a series of conditions associated with the sampling time and sampling point. The sampling dates over the years ranged from the beginning of September to the end of November. During this period, straw could be intact (i.e., early September) or already partially degraded (i.e., late November), a state also related to the time of its burial with respect to the soil sampling. Both the sampling time and the sampling point, which is affected by the substantial content (and dynamics) of crop residues, could therefore influence the amounts of organic matter and organic carbon measured in the shallow soil layers [48].
This might explain the diminished correspondence between the measured and simulated variability of TOC (indicated by low EF and d scores). Nevertheless, ARMOSA successfully captured the high variability of this variable between the beginning and end of the growing period, attributed to the dynamic degradation of straw.
Divergent results emerged from the simulation of grain yield (refer to
Table 8).
Although the simulated total score of yield averaged for all treatments was “Fair”, only for T2 was a good result achieved, while, for the other two treatments, the outcome was not adequate.
This pattern was consequently confirmed for the simulated yield of several cultivars. Among the eight cultivars, half did not attain a satisfactory score, three achieved a fairly good score, and only one reached the maximum score (see
Table 9). NRMSE ranged from a minimum of 24.45% for Latino to a maximum of 66.51% for Claudio. The latter had a poor fit in the calibration test with EF (−9.93) and CRM (−0.23), which were the worst among the simulated varieties. In addition to Latino, the calibration of Simeto allowed satisfactory performance in terms of EF (0.1) and d (0.77), followed by Valgerardo (0.18 and 0.83 for EF and d, respectively).
The unsatisfactory outcome for Saragolla should also be highlighted, as EF and d deviated significantly from the optimum values, despite the simulation of the mean yield aligning with observed data (CRM of 0.03).
Calibration of ARMOSA focused on the parameters controlling the partitioning of biomass among the different organs, including the grain, and the maintenance respiration of the grain (Table 2).
The observed data showed that the grain yield was not linearly related to the biomass produced at harvest.
Several authors reported poor performance when calibrating crop simulation models on wheat yield across different sites, years, and cultivars, especially in hot–arid environments.
Specifically, some authors claimed that grain production depended on genetic coefficients that were not only site-specific [
49] but also year-specific [
50,
51].
Our results after the calibration of ARMOSA confirm the findings of [52], who stated that it was difficult to accurately predict wheat production when production levels are low and/or in environments characterized by high temperatures.
The simulation of grain production becomes challenging when situations of water and/or thermal stress occur during seed formation [
53].
In the climatic conditions of the experimental site, recurrent periods of low rainfall and heat waves significantly compromised the potential productivity of the crop. Additionally, the occurrence of short but intense storms and strong gusts of wind resulted in lodging of the crop. These extreme events during seed filling, which significantly impact the final yield, are rarely formalized by crop growth simulation models [
54].
Nevertheless, the 1:1 regression line depicting observed and simulated data (Figure 3) demonstrated the good fit of ARMOSA in capturing the variability of the average grain yield among cultivars (Table 8), evidenced by an R2 value of 0.82 and a slope of 1.06.
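The slope and R2 of such a comparison can be obtained with an ordinary least-squares fit, sketched below. The sketch assumes that simulated values are regressed on observed values and that an intercept is fitted; the text does not state either choice, and the function name and data are illustrative.

```python
def linear_fit(observed, simulated):
    """Ordinary least-squares fit of simulated (y) on observed (x).

    Returns (slope, intercept, r_squared). Assumes at least two points and
    non-constant observed and simulated series.
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(simulated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in simulated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, simulated))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R2 equals the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Illustrative values only (not the cultivar means of Table 8).
slope, intercept, r_squared = linear_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A slope close to 1 with a small intercept indicates that the points lie near the 1:1 line, while R2 measures how tightly they cluster around the fitted line.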
The calibration procedure involved an intricate adjustment of parameters underlying crop growth, encompassing CO2 assimilation, biomass conversion, organ separation, canopy development, intercepted radiation, root length, and senescence. Phenological stages, including emergence, flowering, and maturity, achieved an excellent match between observed and simulated data, underscoring the importance of accurate calibration in capturing genetic variability.
Beyond phenology, the calibration phase scrutinized biomass accumulation and cultivar-specific adaptation to environmental stressors. Different cultivars exhibited varying susceptibility to climatic erraticism, necessitating meticulous calibration of crop coefficients related to temperature, rainfall patterns, and water/temperature stress. ARMOSA demonstrated varying success in replicating total dry biomass at the end of the growing season for different cultivars, reflecting the intricacies of cultivar-specific responses to environmental variations.
The simulation of grain yield emerged as a challenging aspect, with ARMOSA demonstrating a tendency to slightly overestimate yields and exhibiting broader sensitivity to climate patterns than the actual plant dynamics. The nonlinear relationship between grain yield and biomass produced at harvest added an extra layer of complexity to the calibration process. Despite these challenges, ARMOSA presented a commendable ability to capture the variability of average grain yield among cultivars, demonstrating the model’s aptitude in predicting wheat productivity under diverse conditions.
3.2. Validation
ARMOSA’s performance in simulating phenology remained consistent during validation, achieving maximum scores for emergence and flowering.
While the formalization of maturity stage did not attain the same degree of excellence (EF of −1.05 and CRM of 0.46), ARMOSA closely aligned with the observed mean values (156 days vs. 155 days;
Table 10).
Indications of the reliability of ARMOSA in replicating the productivity of the cvs (Simeto, Claudio, and Saragolla) throughout the validation process were drawn from the results of the 1:1 regression (Table 11).
The average value of the grain yield of Claudio was aligned between the model output and the observed data (4300 kg ha−1 vs. 4392 kg ha−1). Although the standard deviation was much higher in ARMOSA than in the LTE data, the model reasonably captured the observed variability among years (see dispersion around the 1:1 regression line). What turned out to be off scale were the outcomes related to a single growing season for NT and MT, in which the simulated values (8154 kg ha−1, as mean) were much higher than the observed productivity (4565 kg ha−1).
For Saragolla, ARMOSA tended to slightly underestimate the actual yield (β = 0.92), but with an excellent fit between simulated and observed data (R2 = 0.99), although only two growing seasons were compared, for a total of four yield figures.
For Simeto, the overestimation of grain production by ARMOSA was around 24% (3267 kg ha−1 vs. 4416 kg ha−1). As for Claudio, a very high inconsistency between the output and the actual grain yield was observed for one growing season (2349 kg ha−1 vs. 5919 kg ha−1 as mean), but Simeto definitively proved to be the most difficult cv for ARMOSA to predict (although not dramatically) in the validation phase.
Evaluating ARMOSA overall for the NT and MT treatments highlighted a tendency of the model to slightly overestimate the observed grain productivity (+10%), together with a larger variability generated by the model: the coefficient of variation (CV, the ratio between the standard deviation and the mean) was approximately 35% for ARMOSA and 26% for the LTE.
Testing the response of ARMOSA in formalizing TOC (
Figure 4b), it emerged how the model responded differently to the two soil treatments (NT and MT) and how the outputs aligned with what was observed during the LTE.
Indeed, in the LTE, TOC went from about 51,000 kg ha−1 at the beginning of the experimental test (2002) to 63,200 kg ha−1 in NT and 55,800 kg ha−1 in MT, respectively, in 2020.
ARMOSA did not go far from the observed data, returning TOC values of 63,045 kg ha−1 and 65,247 kg ha−1 for NT and MT, respectively, for 2020.
This contrasts with the results of comparing the simulated and observed data for some of the several experimental years (i.e., 2015 and 2019), in which substantial differences were recorded between ARMOSA outputs and actual soil TOC content.
This is because TOC determined by laboratory analysis is strongly affected by the organic substances deriving from the total or partial degradation of crop residues, the content of which can be extremely variable depending on the sampling point [
48].
This also explains the extreme variability of the figures (see standard deviation in
Figure 4b) observed for each sampling, both in NT and in MT.
Concerning TOC, what has been achieved represents a judicious compromise between the performances obtained in the calibration and validation steps, as optimizing one phase over the other would have diminished the overall capacity of the model with respect to TOC.
The TOC pattern in MT, though showing an increase, suggests a more moderate impact on carbon sequestration than NT. Some soil disturbance in MT may accelerate decomposition, but the overall effect remained positive for organic carbon accumulation. This resulted in an annual increase in TOC ranging from 114 kg ha−1 to 290 kg ha−1 when simulating CA practices such as NT and MT. Similar findings were also indicated by other long-term modelling exercises with ARMOSA [22], where, under current climatic conditions, the TOC increase reached up to 320 kg ha−1.
Although simulations for all T2, T5, and T8 options resulted in an increase in TOC over the course of the wheat monocropping, the latter two showed slightly better performance than T2 during the steady-state phase.
The limited positive impact of water and nitrogen additions to straw on the dynamics of TOC accumulation in T5 and T8, as indicated by ARMOSA simulations, may be ascribable to the timing of water and nitrogen supply.
In the simulations, mirroring the experimental conditions, mineral nitrogen and water were applied during the summer period, characterized in the study environment by very high daytime temperatures (up to 40 °C). These conditions promoted water evaporation and reduced the activity of microorganisms involved in the mineralization of organic matter (with a correlated reduction in nitrogen supply), thus mitigating a more disruptive effect on TOC accumulation in the soil.
The better TOC dynamics (even if not dramatic) in T5 and T8, according to ARMOSA simulations, favoured a slightly higher yield performance than T2.
This indicates that soil health, using soil organic carbon as an indicator, also promotes an improvement in crop productivity. This is due to a more gradual release and increased availability of nitrogen from the soil to the crop. According to the ARMOSA simulations, leaving straw on the soil surface (NT) or burying it at shallow depth (MT), without prior chopping, favoured higher grain yield. This was associated with the mulching effect of residues on the soil, leading to a reduction in water loss through evaporation.
ARMOSA’s performance in simulating phenology remained consistent during validation, achieving maximum scores for emergence and flowering. While the formalization of the maturity stage did not attain the same degree of excellence, ARMOSA closely aligned with the observed mean values. This reinforced the model’s reliability in capturing critical phenological events, crucial for understanding crop development and predicting growth patterns.
Examining ARMOSA’s overall performance for different treatments highlighted a tendency to slightly overestimate yield, coupled with increased variability compared to observed plant dynamics. The model’s broader sensitivity to varying climate patterns was evident, indicating areas for potential refinement in predicting crop performance under diverse conditions.
The simulation of TOC dynamics during validation underscored ARMOSA’s ability to capture fluctuations, particularly during long-term evolution under different crop systems. The model closely aligned with observed TOC values for the year 2020 but exhibited discrepancies in certain experimental years.
The observed variability, attributed to challenges in consistently collecting soil samples under identical conditions, during the same period, and at the same sampling points across different experimental years, underscores the difficulties in accurately simulating TOC by ARMOSA.
In light of that, ARMOSA can be considered reliable in the simulation of TOC fluctuation, particularly if one considers the evolution over a period long enough to capture the correct dynamics of TOC under different crop systems [
55].
This ARMOSA study, with a primary focus on its application in the Mediterranean environment, establishes a robust foundation for comprehending crop dynamics and soil processes. While the specific regional context is integral, the findings carry substantial implications for global agriculture. The model’s adaptability and reliability, validated through successful calibration in the Mediterranean, suggest its potential utility across diverse climates and agricultural systems.
Beyond the Mediterranean scope, the study provides extensive insights into crop responses under various conditions and agronomic practices. As it accurately simulates crop- and soil-related variables, ARMOSA emerged as a versatile tool that can be fine-tuned to suit the unique conditions of various regions, making it valuable for researchers, agronomists, and policymakers involved in optimizing crop growth across different environments. A noteworthy aspect is the model’s detailed impact analysis of soil organic carbon dynamics due to agronomic practices, amplifying its applicability, particularly in regions striving to enhance soil fertility, water retention, and overall ecosystem resilience.