Article

Description of Ficus carica L. Italian Cultivars—I: Machine Learning Based Analysis of Leaf Morphological Traits

by Cristiana Giordano ¹, Lorenzo Arcidiaco ¹,*, Margherita Rodolfi ², Tommaso Ganino ¹,², Deborah Beghè ³ and Raffaella Petruccelli ¹
¹ Institute of BioEconomy, CNR, via Madonna del Piano 10, Sesto Fiorentino, 50019 Firenze, Italy
² Food and Drug Department, University of Parma, Parco Area delle Scienze, 27/a, 43124 Parma, Italy
³ Economics and Management Department, University of Parma, Via J.F. Kennedy 6, 43125 Parma, Italy
* Author to whom correspondence should be addressed.
Plants 2025, 14(3), 333; https://doi.org/10.3390/plants14030333
Submission received: 3 December 2024 / Revised: 10 January 2025 / Accepted: 18 January 2025 / Published: 23 January 2025
(This article belongs to the Section Horticultural Science and Ornamental Plants)

Abstract
Common fig, or simply fig (Ficus carica L.), is one of the most ancient species, having originated and been domesticated in the Mediterranean basin. The Italian fig germplasm consists of a large number of cultivars, more than 300. This number is approximate: many genotypes are still poorly known and studied, and some may possess interesting agronomic traits, especially in terms of response to climate change. It is therefore extremely important to study and preserve agrobiodiversity and, above all, to identify simple and rapid characterization methods to catalog “hidden” cultivated plants. In this study, geometric leaf morphometry was used to explore differences among fifteen Tuscan fig cultivars, and the effectiveness of a machine learning (ML) algorithm for characterizing cultivars was evaluated. The study analyzed two classes of cultivars, one with a predominantly three-lobed leaf shape and one with a five-lobed leaf shape, described by twenty-three and thirty-three descriptors, respectively. ANOVA showed statistically significant differences for all characters analyzed and allowed an initial characterization of the material. A Random Forest algorithm was then used to reduce the parameters to those most significant for classification. The results showed that machine learning-based techniques are a valid system for analyzing leaves of F. carica cultivars and interpreting significant differences in leaf parameters. Classification based on the Random Forest model allowed us to identify the main descriptors that best differentiate cultivars from each other.

1. Introduction

The common fig (Ficus carica L.), described by Linnaeus in Species Plantarum [1] as Ficus “foliis palmatis”, is a deciduous, perennial tree belonging to the family Moraceae. F. carica is an excellent example of a fruit tree from the Mediterranean basin, where natural populations were present before domestication. The cultivated fig appears to have been domesticated in the Mediterranean basin about 7000 years ago [2,3] from wild plants belonging to the genus Ficus. Zohary et al. [4] consider the fig, together with the olive tree, grapevine, date palm, and pomegranate, one of the most important traditional fruits of Old World agriculture and one that gave rise to early horticultural practices. The fig tree probably reached southern Italy (Magna Graecia, Sardinia) with the Greeks or Phoenicians between the 8th and 3rd centuries B.C., as evidenced by archaeological findings [5,6], and then spread to the rest of the peninsula [7]. It has been harvested or cultivated for its edible fruit, consumed both fresh and dried. Today, this perennial tree is widespread in all regions of the country; in Piemonte it is recorded as a casual alien (allochthonous) species, and in Trentino and Valle d’Aosta as a naturalized alien (Portale della Flora d’Italia). Ficus carica has great socioeconomic importance for many Mediterranean and Middle Eastern countries, which together produce 70–90% of the world’s supply of fig fruit. Worldwide, more than 1,000,000 tons are produced per year on about 300,000 hectares [8]. Globally, about 30 percent of the fruit is consumed fresh in domestic markets, while 70 percent is consumed as dried figs [9]. Turkey has the highest production (35% of global production) and is also the largest exporter of dried figs, exporting 60–70% of the world’s dried fig production. Significant production also occurs in Egypt (19%), Algeria (13%), Morocco (12%), and the Islamic Republic of Iran (7%).
Afghanistan, Tunisia, Albania, Brazil, Greece, China, India, the USA, and Japan are also important production areas, some of them more recently established than others. In Europe, more than 100,000 tons per year are produced on about 29,000 hectares; Spain contributes almost half of the entire European production, with about 40,000 tons on more than 28,000 ha, followed by Italy, France, and Portugal [8,10].
Italy is the second largest producer in Europe, with 13,030 tons, representing 12% of European production but only 1.3% of world production (ISTAT 2023; http://dati.istat.it/ accessed on 20 January 2025). The largest contribution (80%) comes from the southern regions (Campania, Apulia, Calabria, and Sicily), where cultivation in vegetable gardens and orchards coexists with specialized plantings, while production is more limited in central-northern Italy (Tuscany, Liguria). These regions differ, however, in the type of fruit produced: Calabria and Campania are characterized by dried fruit production, whereas in Apulia 90% of production is marketed as fresh fruit [7]. In Tuscany and Liguria, quality fruit production destined for niche markets is emerging. Italy has a rich varietal heritage. There are 24 cultivars of Ficus carica in the National Register of Varieties [11], but according to the Figs4fun website, about 240 main fig cultivars are found in Italy, conserved both in private collections (e.g., Pomona Gardens in Apulia, or production nurseries) and by national organizations such as CREA-MiPAAF (Research Centers for Horticulture, Fruit Growing and Citrus Farming in Rome and Caserta) or the University of Bari (Research Center for Experimentation and Training in Agriculture Basile Caramia, Locorotondo, Bari, Italy) [7].
Fig has been part of the Tuscan tradition for many centuries: the first evidence of its consumption dates back to the Etruscan period. In the sanctuary of the acropolis of Volterra, in a temple of the fifth century B.C., an abundance of achenes was found, probably related to deity cults [12]. The fig plant was depicted in 5th- and 4th-century B.C. tombs in lower Etruria (Tarquinia) [13] along with myrtle, laurel, and palms, and mineralized fruit was found in the excavation of a small Roman farm in Cinigiano, southern Tuscany [14], dated from the late 2nd century B.C. to the late 1st century B.C. At the archaeological site of the Etruscan and Roman port of Pisa, fig wood was used to build the frame of ship C, which dates to the 1st and 2nd centuries A.D. [15]. Abundant fig achenes have been found in archaeological excavations in central Florence dating from late Roman to medieval times; their extreme abundance in several excavation levels suggests that the fruit was not only consumed during the production season but also dried [16]. The earliest written records of its cultivation date back to the 1200s and are of fiscal and cadastral origin. In 15th-century writings, the Verdino, Brogiotto Bianco, and Nero cultivars are named in the “contado fiorentino” (the countryside around Florence), showing that the crop was well established in the area [17,18]. Although fig cultivation was of limited importance, Tuscany was enriched with cultivars thanks to the zeal of the Florentines and especially the passion of the Medici grand dukes, who loved to collect new “breeds”.
The court botanist Pier Antonio Micheli (1679–1737) described the cultivars that arrived on the Medici table and grew in the gardens of the Medici villas, and Bartolomeo Bimbi (1648–1729) immortalized them in paintings hung in the rooms of the villas, providing a “splendid” catalog depicting 43 fig cultivars (Figure 1). Figs of the Dottato cultivar, stored in boxes filled with millet, were usually shipped from Tuscany to Paris and Vienna, while dried figs were prized mainly in Lombardy [19]. The varietal heritage of Tuscany reached 100 fig cultivars at the beginning of the 19th century [20]. In the late 19th and early 20th centuries, fig cultivation, almost exclusively intercropped (consociated), took place in rural areas where easily storable fruits such as dried figs were preferred [18]. The decline of sharecropping was accompanied by a gradual contraction of area and production, and by the end of the 1960s, the product was mainly supplied by trees scattered throughout the territory (about 80%) and, to a lesser extent, by trees grown as a secondary crop (14%). Currently, annual production in Tuscany is only 97 tons, and the cultivated area is about 28 hectares (ISTAT 2023; http://dati.istat.it/ accessed on 20 January 2025). Although it is a minor crop in the Italian horticultural landscape, the fig has received increasing attention in recent years, both for the nutritional characteristics of the fruit and for its ability to adapt to climate change, including hot climates and low water availability. For this reason, several studies and projects have been initiated for the conservation, enhancement, and characterization of native germplasm; with regard to characterization, genetic and morphological studies have been undertaken for cultivar description [21,22].
For a long time, characterization relied on the analysis of tree, leaf, and fruit morphological characters. The classification of fig cultivars is complicated by the richness of local entities, which are not always adequately identified. Indeed, for historical or geographical reasons, the identification of locally cultivated accessions is often approximate, inaccurate, and inconsistent: accessions may have multiple names or only common names, or be affected by synonymy or homonymy.
Although the reliability of morphological characterization is a matter of debate, since many morphological and agronomic traits depend significantly on environmental conditions, tree age, and stage of plant development, this approach remains the key initial step in describing and classifying fig germplasm. Consequently, the selection of highly distinctive variables is crucial to optimize resources and enable effective morphological characterization. Morphometric traits are useful for cultivar identification because they are simple to collect, applicable in different agricultural settings, and easily understood by different stakeholders. In particular, leaf morphology is considered a stable parameter and therefore a valid basis for cultivar identification [23].
Giraldo et al. [22] included leaf morphology (number of lobes and shape of lamina) among the main discriminating characters. Some authors have recently reintroduced leaf character analysis as a valid method of varietal identification from a geometric morphometric point of view; since it is a stable character, statistical analysis is more effective and powerful [23,24].
Different morphological and geometric traits have been used to analyze the diversity of fig cultivars. Recently, Ciarmiello et al. [25] conducted a study on the germplasm of Campania and Nuzzo et al. [24] on thirty local entities from Basilicata (South Italy) [25].
Machine learning (ML) methodologies are increasingly used for their ability to automate processes and improve accuracy in the analysis of morphological traits [26], as well as in the detection and classification of leaf diseases [27]. Among these methodologies, the Random Forest (RF) algorithm is particularly noteworthy for its robust classification performance and its ability to assess variable importance (feature importance) [28,29,30]. This capability is essential for identifying the morphological parameters that contribute most to the characterization and classification of cultivars, that is, the features that most effectively discriminate among the studied cultivars, and it is what led us to choose this classifier over many alternatives. Furthermore, ML techniques have demonstrated high accuracy in distinguishing cultivars based on leaf and fruit attributes: they leverage spectral properties, textural features, and morphological traits captured from images to achieve robust classification, often outperforming traditional approaches that rely on visual or manual inspection [31,32].
Italian fig cultivars represent a rich genetic diversity, with each variety exhibiting distinct morphological and biochemical traits. Identification of fig cultivars is critical for optimizing agricultural practices and ensuring quality control in both local and export markets. Beyond traditional approaches to morphological classification, advances in ML provide a transformative tool for analyzing subtle differences in leaf morphological traits, offering higher accuracy and scalability [33].
This study investigates the potential of ML algorithms to classify Italian fig cultivars from the morphological traits of their leaves, and evaluates the feasibility and effectiveness of an ML algorithm combined with geometric leaf morphometry for characterizing fifteen fig cultivars. The approach should allow both the identification of the most discriminating morphological traits and the development of a classification model for fig cultivars.

2. Results and Discussion

2.1. Morphological Analysis

The present study was conducted in Carmignano (Prato, Tuscany), an important area for local traditional fig cultivars. Agromorphological and ethnobotanical bibliographic information was retrieved and used for the subsequent morphological analysis [18]. The fifteen cultivars chosen for the present study are the most representative of those grown in the Tuscany region and have always been present on local historical farms. All the cultivars studied are maintained on a private farm. Table 1 shows the qualitative characters defined in the IPGRI descriptors [34]. Leaf margin and central lobe shape each had a prevalent state across the fifteen cultivars: a crenate margin and a lanceolate central lobe. In contrast, the shape of the leaf base was highly varied, ranging from cordate-calcarate-truncate in BN to calcarate only in CO. All fifteen cultivars showed little lobes on the central or lateral lobes, with marked differences among cultivars. The frequency of little lobes on the central lobe ranged from 82.35% in PO to 11.11% in AL, and the same trend was observed for little lobes on the lateral lobes, whose values ranged from 100% in FI to 5% in DO and BN (Table 1).
The number of lobes is a fundamental characteristic for describing Ficus carica L. cultivars [22]. The cultivars analyzed in this study had predominantly three-lobed or five-lobed leaves (Table 1). Specifically, cultivars FI, SP, PO, VE, PA, and BN were classified as “pure” three-lobed, while AL, CO, PN, and PE were classified as “pure” five-lobed (Table 1). Five cultivars (BC, BB, GI, PB, and DO) presented mixed leaf shapes (three- and five-lobed); we included them in the three-lobed class, since that was their predominant type. Different descriptors were used for the two leaf types, and the two groups were therefore analyzed separately (see Section 3.2).
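The class assignment just described is a majority rule on leaf type. As an illustration only, it can be sketched in Python as below; `lobe_class` is a hypothetical helper (not the authors' code) that assumes per-leaf lobe counts are available for each cultivar, and the handling of ties is our own assumption, since the text does not discuss them.

```python
from collections import Counter


def lobe_class(lobe_counts):
    """Assign a cultivar to an analysis class from per-leaf lobe counts.

    Hypothetical helper: the cultivar goes to the class of its predominant
    leaf type (majority vote), which is how mixed cultivars are treated.
    Only 3- and 5-lobed leaves are expected; other counts raise KeyError.
    """
    tally = Counter(lobe_counts)
    (top, n_top), *rest = tally.most_common()
    if rest and rest[0][1] == n_top:
        # Assumption: the text does not say how an even split is handled.
        raise ValueError("tie between leaf types: no predominant class")
    return {3: "three-lobed", 5: "five-lobed"}[top]


# Made-up leaf samples, not data from this study
print(lobe_class([3] * 20))            # pure three-lobed
print(lobe_class([5] * 20))            # pure five-lobed
print(lobe_class([3] * 14 + [5] * 6))  # mixed, mostly three-lobed
```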
Descriptive values for each quantitative trait are recorded in Table 2.
The standard error (SE) and standard deviation (SD) are considerably higher for WxH (9.8 and 165.2, respectively) and BAC (3.1 and 52.7, respectively) than for the other traits, indicating high inter-sample variability; however, because SE and SD are expressed in each trait's own units (cm² and degrees here), these values also reflect the larger numerical scale of these two traits. In contrast, several traits exhibit low values of both SE and SD (Table 2). The coefficient of variation (CV), being unit-free, allows variability to be compared across traits: the character with the greatest variability was L3y (CV 72.5%), followed by BAC (CV 44.6%) and I3y (CV 36.6%), while L2/L1 and the β angle were the least variable characters (CV 8.4% and 12.5%, respectively; Table 2).
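For reference, the summary statistics in Table 2 follow the usual definitions: SD is the sample standard deviation, SE = SD/√n, and CV = 100 · SD/mean. The short sketch below uses made-up numbers rather than the study's data, and shows why SD and SE follow the scale of a trait while CV does not.

```python
import numpy as np


def describe(values):
    """Mean, SD, SE and CV (%) of one quantitative trait."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)          # sample standard deviation (trait units)
    se = sd / np.sqrt(x.size)   # standard error of the mean (trait units)
    cv = 100.0 * sd / mean      # coefficient of variation (%, unit-free)
    return {"mean": mean, "SD": sd, "SE": se, "CV%": cv}


# Made-up values: a leaf area in cm^2 and a dimensionless ratio
area = describe([300.0, 400.0, 500.0])
ratio = describe([0.8, 0.9, 1.0])
print(area)   # SD = 100 cm^2, CV = 25%
print(ratio)  # SD = 0.1, CV is about 11.1%
```

The SDs differ by a factor of 1000, whereas the CVs differ only by a factor of about 2.3, which is why CV is the fairer basis for ranking traits by variability.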
For each group (three- and five-lobed), the results of the ANOVA are presented in Table 3 and Table 4. In the three-lobed cultivars, leaf size, defined by lamina width (W) and lamina length (H), varied as follows: W ranged from 22.7 cm in BB to 16.9 cm in FI, while H was largest in DO (25.6 cm) and smallest in FI (19.7 cm) (Table 3). Leaf area (WxH, cm²) ranged from 580.6 cm² (BB) to 338.9 cm² (FI). According to IPGRI [35], cultivars BB and DO fall into the very large leaf area category (>550 cm²), BN, GI, PA, PB, and SP into the large category (400–550 cm²), and the remaining cultivars into the medium category (250–400 cm²). Petiole length and width are morphological traits that can discriminate between cultivars, as reported in the literature and in the Ficus carica L. species descriptors [22,34]; our results showed statistically significant variation only for petiole length (PL), which ranged from 9.57 cm in BB to 6.95 cm in FI. All morphometric parameters analyzed showed statistically significant differences between cultivars. The CLL/H ratio ranged from 0.60 in BB (which also recorded the highest CLL and H values) to 0.43 in VE (which recorded the smallest CLL value). The angle between the basal lobes (BAC) ranged from 226.1° in SP to 91.4° in PA. Variation was also observed in Zx and Zy, the coordinates of the point of maximum width of the central lobe; BB and BN showed the highest Zx values (5.11 and 5.09 cm, respectively). The remaining morphometric parameters also varied among the eleven cultivars; for example, the coordinates of point I2, representing the second sinus, ranged from 4.39 cm in BN to 2.83 cm in BC for I2x, and from 11.4 cm in SP to 7.76 cm in BC for I2y.
I2/L2 also showed wide, statistically significant variability: the largest values were found in SP and VE (0.76 and 0.77, respectively), while PB and BC showed the lowest values (0.52 and 0.58, respectively; Table 3).
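Each row of Tables 3 and 4 corresponds to a one-way ANOVA with cultivar as the single factor. The following sketch assumes that model (the text does not describe it further) and uses synthetic data for hypothetical cultivars cv1 to cv4 to show the per-trait computation with SciPy's `f_oneway`. A significant F only indicates that at least one cultivar differs; attributing differences to specific pairs requires a post hoc test.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_by_trait(measurements, cultivar):
    """One-way ANOVA of every trait across cultivars.

    measurements: dict mapping trait name -> 1-D array of per-leaf values
    cultivar: 1-D array of cultivar codes aligned with those values
    Returns a dict mapping trait name -> (F statistic, p-value).
    """
    cultivar = np.asarray(cultivar)
    labels = np.unique(cultivar)
    results = {}
    for trait, values in measurements.items():
        values = np.asarray(values, dtype=float)
        groups = [values[cultivar == c] for c in labels]
        res = f_oneway(*groups)
        results[trait] = (float(res.statistic), float(res.pvalue))
    return results


# Synthetic data: 4 hypothetical cultivars (cv1..cv4), 20 leaves each
rng = np.random.default_rng(1)
codes = np.repeat(["cv1", "cv2", "cv3", "cv4"], 20)
data = {
    # group means differ strongly (an angle in degrees)
    "BAC": np.concatenate([rng.normal(m, 10, 20) for m in (35, 90, 100, 115)]),
    # same distribution in every group (a dimensionless ratio)
    "ratio": rng.normal(0.9, 0.05, 80),
}
results = anova_by_trait(data, codes)
for trait, (F, p) in results.items():
    print(f"{trait}: F = {F:.1f}, p = {p:.3g}")
```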
Table 4 shows the results of the ANOVA for the five-lobed class. Cultivar CO showed the highest values for most of the characters analyzed. Among the morphological characters, W ranged from 24.6 cm in CO to 18.5 cm in PN, and H from 30.5 cm in CO to 20.5 cm in PN. Leaf area (WxH, cm²) ranged from 385.9 cm² (PN) to 760.2 cm² (CO). According to the IPGRI descriptors, AL and CO can be classified as very large (>550 cm²), PE as large (400–550 cm²), and PN as medium (250–400 cm²). Among the morphometric parameters, the greatest variability was observed between CO and PN for all characters of I2, L2, I3, L3, and Z. The angles are also of interest: the petiole sinus angle (BAC) and the angles α and β between lobes. For α and β, the highest values were observed in CO (44.6° and 45.4°, respectively) and the lowest in PE (37.9° and 37.5°, respectively). The pattern was reversed for BAC: CO recorded the lowest value (34.5°), while PN recorded the highest (113.4°) (Table 4).
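The IPGRI leaf-size classes quoted for both groups can be expressed as a simple threshold rule. This sketch encodes only the three classes mentioned in the text; treating 400 and 550 cm² as belonging to the “large” class is an assumption, because the quoted ranges share their end points. Applied to mean WxH values reported in Tables 3 and 4, it reproduces the classes stated in the text.

```python
def ipgri_leaf_size(area_cm2):
    """Leaf-size class from mean leaf area WxH (cm^2).

    Only the three classes quoted in the text are encoded; values under
    250 cm^2 are not classified here. Counting 400 and 550 as 'large' is
    an assumption, since the quoted ranges overlap at their end points.
    """
    if area_cm2 > 550:
        return "very large"
    if area_cm2 >= 400:
        return "large"
    if area_cm2 >= 250:
        return "medium"
    return "below medium (not covered in the text)"


# Mean WxH values (cm^2) reported in the text
reported = {"BB": 580.6, "FI": 338.9, "CO": 760.2, "PN": 385.9}
for code, area in reported.items():
    print(code, area, ipgri_leaf_size(area))
```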
Nuzzo et al. [24] found a relationship between the petiole sinus angle (BAC) and leaf base shape. Our results showed larger angles in cultivars with a truncate base, such as the three-lobed VE (BAC of 145.8°) and PO (BAC of 183.3°), or with a decurrent base, as in SP (BAC of 226.1°), and smaller angles in cultivars with a calcarate base, such as the five-lobed CO (BAC of 34.5°) and AL (BAC of 91.5°) (Table 1, Table 3 and Table 4).
Leaf morphological characters play a fundamental role in plant taxonomy and are highly discriminating phenotypic variables in cultivar characterization. The morphometric characteristics of Ficus leaves have been investigated by a limited number of authors [36,37], who studied different species of the genus Ficus. Recently, Nuzzo et al. [24] in Italy and Abdelkader et al. [23] used a multivariate morphometric approach to characterize autochthonous cultivars. These authors agree that leaf morphometric parameters can provide a simple and efficient means of characterizing even closely related cultivars.

2.1.1. Selection of Features

Following a comprehensive evaluation of all morphological parameters, the ten most influential variables were identified based on their Gini importance scores, calculated using a Random Forest model applied to the two classes, three-lobed and five-lobed. This selection was achieved through a cross-analysis of the feature importance ranking plots.
For the three-lobed cultivars, seven variables were sufficient to achieve an accuracy of approximately 0.65, whereas for the five-lobed cultivars, six variables reached an accuracy of 0.95 (Figure 2(a1,b1)). For the three-lobed cultivars, I2/L2, α, and PL/L1 occupy the second, third, and fourth positions, respectively, while for the five-lobed cultivars, PLØ, I2_TP, and H rank higher than in the three-lobed set, suggesting differences in the most discriminative traits between the two groups. Several descriptors (e.g., BAC, I2y, PL/L1) appear in both plots with relatively high importance, indicating their general relevance across both cultivar groups. However, specific descriptors such as α (three-lobed) and I3/L3 (five-lobed) highlight distinct morphological traits associated with structural differences in the lobes. For the three-lobed class, the selected variables were BAC, I2/L2, α, PL/L1, I2_TP, CLL, I2y, CLL/H, I2x, and PL/H; for the five-lobed class, they were BAC, PLØ, I2y, H, I2_TP, PL/L1, PL, I3/L3, β, and WxH.
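The two-step procedure behind Figure 2 (rank descriptors by Gini importance, then check how accuracy grows as top-ranked descriptors are added) can be sketched with scikit-learn, whose `feature_importances_` attribute gives the mean decrease in Gini impurity for a `RandomForestClassifier`. The data below are synthetic, and the forest sizes, the 5-fold cross-validation, and the cap of ten descriptors are our assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a descriptor table (values are made up):
# 200 leaves, 4 classes, 23 descriptors of which 5 are informative.
X, y = make_classification(n_samples=200, n_features=23, n_informative=5,
                           n_redundant=0, n_classes=4,
                           n_clusters_per_class=1, random_state=0)

# Step 1: rank descriptors by Gini importance (mean decrease in impurity,
# exposed as feature_importances_ for the default criterion="gini").
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]


# Step 2: cross-validated accuracy using only the top-k ranked descriptors.
def accuracy_curve(X, y, ranking, max_k=10, folds=5):
    curve = []
    for k in range(1, max_k + 1):
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        scores = cross_val_score(model, X[:, ranking[:k]], y, cv=folds,
                                 scoring="accuracy")
        curve.append(scores.mean())
    return curve


curve = accuracy_curve(X, y, ranking)
for k, acc in enumerate(curve, start=1):
    print(f"top {k:2d} descriptors: mean CV accuracy = {acc:.2f}")
```

Because the ranking here is computed on all samples before cross-validation, the resulting accuracies are somewhat optimistic; repeating the ranking inside each training fold (for example, a scikit-learn `Pipeline` combining `SelectFromModel` with the classifier) avoids that selection bias.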
The results highlight the importance of BAC in both groups, suggesting a fundamental role in defining key structural differences between cultivars, possibly linked to lobe size, shape, or symmetry. Differences in the ranking of descriptors such as α and H may reflect variations in the geometric or proportional characteristics specific to each leaf type. The gradual decline in Gini importance among lower-ranked descriptors suggests that many features contribute minimally to classification, indicating potential redundancy in the dataset, where only the top-ranked descriptors provide significant discriminatory power. Based on these rankings, descriptors such as BAC, PLØ, and I2_TP should be prioritized in future analyses for developing efficient predictive models or classification systems, as they consistently appear in the top ranks. For three-lobed cultivars, additional focus on α and PL/L1 may improve model accuracy, whereas for five-lobed cultivars, emphasizing H and I3/L3 could yield better results.
Figure 3 shows heatmaps for the ten most important descriptors according to the RF algorithm, for both the three-lobed group (Figure 3a–i,l) and the five-lobed group (Figure 3m–v). In the three-lobed group, the cultivars PO, SP, and VE exhibit a higher frequency of significant differences at the 0.01 significance level for the descriptors BAC, I2/L2, α, I2_TP, and I2y (Figure 3a–c,e,g). Conversely, for PL/L1, CLL/H, and PL/H, the number of significant differences is lower at all significance levels (Figure 3d,h,l). For the five-lobed group, the most significant differences occur between the PN and CO cultivars for all analyzed descriptors (p < 0.005; p < 0.01). PN and CO also differ markedly from PE and AL. Specifically, five descriptors differentiate PN from PE (I2y, I2_TP, PL/L1, I3/L3, and β; Figure 3o,q,r,t,u), whereas seven descriptors distinguish PN from AL (BAC, PLØ, I2y, H, I2_TP, I3/L3, and WxH; Figure 3m–q,t,v). For CO, statistically significant differences from PE were observed for nine of the ten descriptors (only I3/L3 was not significant, Figure 3t), whereas no statistically significant differences were found between CO and AL for PLØ, I3/L3, and WxH (Figure 3n,t,v). These findings are noteworthy, as they clearly differentiate the four cultivars.
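Heatmaps like those in Figure 3 can be built from a matrix of pairwise p-values, binned by significance level. The text does not name the post hoc test, so the sketch below assumes Tukey's HSD as implemented in `scipy.stats.tukey_hsd` (SciPy 1.8 or later); the 0.05 bin and the integer coding of levels are also our assumptions, and the data are synthetic.

```python
import numpy as np
from scipy.stats import tukey_hsd  # available in SciPy >= 1.8


def significance_levels(groups, thresholds=(0.05, 0.01, 0.005)):
    """Pairwise Tukey HSD p-values plus a binned matrix for a heatmap.

    groups: one 1-D array per cultivar, all for the same descriptor.
    Level 0 = not significant, 1 = p < 0.05, 2 = p < 0.01, 3 = p < 0.005.
    """
    pvals = tukey_hsd(*groups).pvalue
    levels = np.zeros(pvals.shape, dtype=int)
    for t in thresholds:
        levels += (pvals < t).astype(int)  # one extra level per threshold passed
    np.fill_diagonal(levels, 0)            # a cultivar is not compared with itself
    return pvals, levels


# Synthetic descriptor for 4 hypothetical cultivars, 20 leaves each
rng = np.random.default_rng(7)
groups = [rng.normal(m, 1.0, 20) for m in (0.0, 0.2, 3.0, 0.1)]
pvals, levels = significance_levels(groups)
print(levels)
```

The resulting `levels` matrix could then be drawn with, for example, matplotlib's `imshow`.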
The dendrograms resulting from the cluster analysis of the ten descriptors selected by the RF algorithm for the three-lobed and five-lobed classes are shown in Figure 4 and Figure 5, respectively.
The cluster analysis of the three-lobed cultivars, based on Euclidean distance, revealed two main groups joined at a distance of 5.8 (Figure 4). Cluster 1 is further subdivided into two subgroups. The first includes PB, GI, and BC, with BC joining at a distance of 3.3. The traits shared by BC, PB, and GI are I2x and the ratios PL/L1, CLL/H, and PL/H; these three ratios also unify the whole of Cluster 1, including the second subgroup, formed by DO and BB. The second cluster is likewise divided into two subgroups. The first consists of PA and BN, which are similar for nine of the ten descriptors (all except I2x); at a distance of 4, this subgroup is joined by FI, which shares eight of the ten descriptors with PA and BN (including I2y, I2_TP, CLL, the BAC and α angles, and the ratios I2/L2 and CLL/H). The second subgroup comprises VE and PO, which are similar for all RF-selected traits, and is joined by SP at a distance of 3; SP differs significantly from VE and PO only for BAC and I2x.
In terms of bootstrap support, the hierarchical cluster analysis in Figure 4 partitions the cultivars into two principal clusters at a relatively large distance, with the topmost branching node showing 100% bootstrap support. This value indicates that the overarching split, separating the left group (SP, PO, VE, FI, BN, PA) from the right group (BB, DO, BC, GI, PB), is recovered in all bootstrap replicates, underscoring its robustness. On the left-hand side (in orange), several sub-branches show moderate-to-high support. For instance, the node separating SP from PO–VE has a bootstrap value of 56.2%, suggesting that SP often stands apart, while PO and VE typically group together (58.5%). At a higher level, the cluster containing FI, BN, and PA joins the PO–VE subtree at a node with 78.5% support. BN and PA share a branch with 51.5% support, reflecting some variability in how these two cultivars are grouped under resampling. Collectively, the finer-level splits among SP, PO, VE, FI, BN, and PA are moderately consistent, though not as robust as the primary division.
On the right side of the dendrogram, a node with a 60.5% bootstrap value unifies the entire cluster of BB, DO, BC, GI, and PB. Within this cluster, BB and DO form a green subgroup supported at 56.5%, indicating a moderate affinity between these two cultivars. In contrast, BC, GI, and PB form a red subgroup with slightly varying support levels: BC branches off at 67.8%, while GI and PB cluster at a lower bootstrap value of 42.0%, suggesting their grouping may be more prone to variation under resampling.
Overall, the upper-level partition—separating the left and right clusters at 100% bootstrap—is highly reliable, capturing the most pronounced differences among the cultivars. In contrast, lower-level groupings display moderate-to-lower support values (ranging roughly from 42% to 78.5%), implying that subtle distinctions among closely related cultivars can shift depending on the subset of data sampled. This pattern aligns with typical expectations in hierarchical clustering, where major divisions are frequently more robust, and finer-scale splits exhibit more variability in their bootstrap support.
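The text does not state the linkage rule, whether descriptors were standardized, or how bootstrap replicates were drawn. The sketch below therefore assumes average linkage on Euclidean distances between cultivar mean profiles, z-scored descriptors, and an ordinary bootstrap over descriptors (columns resampled with replacement), counting how often each clade of the original tree reappears. It illustrates the idea of clade support on synthetic data and is not the procedure used to produce Figures 4 and 5.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def clades(X, method="average"):
    """All clades of a Euclidean-distance dendrogram as frozensets of row indices."""
    root = to_tree(linkage(X, method=method, metric="euclidean"))
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        if not node.is_leaf():
            found.add(frozenset(node.pre_order()))  # ids of the leaves below
            stack.extend([node.get_left(), node.get_right()])
    return found


def bootstrap_support(X, n_boot=1000, method="average", seed=0):
    """Percent of replicates (descriptors resampled with replacement)
    in which each clade of the original dendrogram reappears.
    The root clade (all rows) is trivially 100%."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score (assumption)
    rng = np.random.default_rng(seed)
    reference = clades(X, method)
    counts = dict.fromkeys(reference, 0)
    n_desc = X.shape[1]
    for _ in range(n_boot):
        cols = rng.integers(0, n_desc, size=n_desc)
        for clade in clades(X[:, cols], method) & reference:
            counts[clade] += 1
    return {clade: 100.0 * c / n_boot for clade, c in counts.items()}


# Synthetic cultivar-by-descriptor matrix: 6 hypothetical cultivars,
# 10 descriptors, two well-separated groups (rows 0-2 and rows 3-5)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, size=(3, 10)),
               rng.normal(3.0, 0.3, size=(3, 10))])
support = bootstrap_support(X, n_boot=200)
for clade, pct in sorted(support.items(), key=lambda kv: len(kv[0])):
    print(sorted(clade), f"{pct:.1f}%")
```

Clades that hold regardless of which descriptors are drawn (here the two synthetic groups) reach 100%, while clades that depend on a few descriptors receive lower values, mirroring the contrast between the top-level split and the finer splits described above.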
The dendrogram obtained from the cluster analysis of the five-lobed class (Figure 5) highlights a clear distinction between the CO cultivar and the other three cultivars. The topmost branching node, which separates CO from the remaining three cultivars (PN, AL, PE) at a relatively large distance, has 100% bootstrap support: this split is recovered in all bootstrap replicates, highlighting its robustness.
Focusing next on the separation of PN from the subcluster that includes AL and PE, the node has a bootstrap value of 80.5%. Although this value demonstrates that PN tends to branch off on its own in most resampling runs, it is not as absolute as the primary division. Finally, the grouping of AL and PE shows a 68.8% bootstrap value, suggesting that while these two cultivars frequently cluster together, their pairing is less stable than the top-level splits—likely reflecting subtler differences in the traits measured.
Overall, these results indicate that the highest-level partition in the dataset—between CO and the other cultivars—is exceptionally reliable (100%), whereas lower-level subdivisions (e.g., separating PN from AL and PE, and then grouping AL with PE) exhibit moderate support.
This pattern aligns with the expectation that major clusters, capturing the most pronounced differences, remain robust across bootstrap resamples, while finer-scale splits can fluctuate due to subtler trait variations.
Figure 6 illustrates the distribution of the six variables selected across multiple cultivars using boxplots. Each panel represents a distinct trait, with individual boxplots corresponding to different cultivars. The letters above the boxplots indicate statistically significant groupings based on a post hoc analysis, highlighting differences between cultivars. CO is confirmed as the cultivar with the highest absolute values for almost all parameters.
  • Panel a: variable PL shows considerable variability among cultivars, with statistically significant differences evident across groups (e.g., cultivar CO differs significantly from cultivars BC and FI).
  • Panel b: variable I2y shows statistically significant differences for all cultivars except SP, PO, and VE, which have similar values.
  • Panel c: variable PL/L1 displays a narrower range, indicating lower variability, with several cultivars sharing overlapping groups (e.g., BC, DO, PB, PE, and SP).
  • Panel d: trait I2_TP reveals intermediate variability, with notable outliers in certain cultivars, such as CO and PE.
  • Panel e: variable WxH demonstrates the greatest variability, as reflected by the wider interquartile ranges and the presence of several outliers, such as CO.
  • Panel f: trait BAC displays smaller interquartile ranges and a more consistent pattern, indicating less variability across cultivars. Significant differences are evident for some cultivars (e.g., PO, SP, and VE compared with CO).
Overall, the figure highlights substantial inter-cultivar variability for several traits. PO, SP, and VE exhibit similar values for almost all descriptors (PL, I2y, I2_TP, and WxH).
Table 5 reports the statistical comparisons among all cultivars in two sections: the pairwise comparisons with the three highest effect sizes and those with the three lowest effect sizes. The pairs in the first group show strong statistical significance (very low p-values), large effect sizes, and high Bayes factors (BF10), indicating robust evidence for the observed differences. The t-statistics are large in magnitude (e.g., T = −25.5 for BAC: CO||SP) and the p-values are exceedingly small (e.g., p = 1.59 × 10⁻²⁵). The 95% confidence intervals are narrow (e.g., [−209.54, −178.75] for BAC: CO||SP), indicating precise estimates of the effect. All comparisons in this group show "Huge" or "Very large" effect sizes according to Cohen's criteria (e.g., effect size = 11.90 for BAC: CO||PO), and the BF10 values are very large (e.g., BF10 = 1.833 × 10²³), providing strong Bayesian support for the alternative hypotheses. Statistical power is 1 (or very close to 1) for all comparisons in this group, i.e., the probability of detecting the true effect is extremely high. Pairs such as CO||PO and PA||PO for BAC, and PN||SP for I2y, show the largest effects and the strongest Bayesian evidence. Some comparisons (e.g., PL_L1: DO||PA) have lower effect sizes (1.86) but still fall into the "Very large" category. Overall, the frequentist (p-values and effect sizes) and Bayesian (BF10) results consistently indicate substantial, reliable differences for this group of pairs, which are both statistically and practically significant. In contrast, the pairs in the second group (lowest effect sizes) show negligible or very small effects and no statistically significant differences.
The t-statistics in this group are close to zero (e.g., T = 0.1) and the p-values are high (e.g., p = 0.92). The 95% confidence intervals include zero (e.g., [−21.5, 23.82]), so the data provide no evidence of meaningful differences. Effect sizes are classified as "Very small" or "Negligible" (e.g., 0.01–0.07), indicating minimal practical relevance. BF10 values are below 1 (e.g., BF10 = 0.31, meaning the data are about three times more likely under the null hypothesis), so the evidence, if anything, leans towards no difference. Statistical power in this group is approximately 0.05, roughly the nominal significance level, which means that effects of this size would almost never be detected. Comparisons such as I2_TP: BC||PE and PL: BB||BN show extremely small effect sizes, confirming the absence of practically relevant differences for these pairs.
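As an illustration of the frequentist quantities reported in Table 5, the following minimal sketch computes, for one pair of cultivars, the t-statistic, p-value, 95% confidence interval of the mean difference, and Cohen's d. It assumes a pooled-variance (Student) two-sample t-test and Cohen's d based on the pooled standard deviation; the function name `pairwise_effect` and the example values are hypothetical, and BF10 and power are not computed here.

```python
import numpy as np
from scipy import stats


def pairwise_effect(x, y, confidence=0.95):
    """Student t-test, CI of the mean difference and Cohen's d for two groups.

    Sketch assuming equal variances (pooled SD); not necessarily the exact
    procedure behind Table 5.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    diff = x.mean() - y.mean()
    dof = n1 + n2 - 2
    # Pooled standard deviation from the two unbiased sample variances
    sp = np.sqrt(((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / dof)
    se = sp * np.sqrt(1 / n1 + 1 / n2)  # standard error of the difference
    t_stat, p_val = stats.ttest_ind(x, y, equal_var=True)
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    ci = (diff - t_crit * se, diff + t_crit * se)
    cohen_d = diff / sp  # difference expressed in pooled-SD units
    return {"T": t_stat, "p": p_val, "CI": ci, "d": cohen_d}


# Hypothetical BAC angles (degrees) for two cultivars
a = [120.0, 125.5, 118.2, 130.1, 122.4]
b = [310.2, 305.8, 318.4, 300.9, 312.7]
print(pairwise_effect(a, b))
```

A negative T and d simply mean that the first cultivar of the pair has the smaller mean, as in the BAC: CO||SP example above.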
The hierarchical cluster analysis illustrated in Figure 7 partitions the examined cultivars into two principal clusters at a relatively high distance, and the topmost node has a bootstrap value of 100%. This value indicates that the fundamental split between these two overarching groups was recovered in every bootstrap replicate, underscoring its robustness.
Focusing first on the left-hand side of the dendrogram (in orange and green), cultivars VE, PO, SP form a distinct subcluster (orange) with moderate support values (e.g., 58% for the node separating VE and 61.5% for the node that groups PO and SP). Although these percentages suggest that the grouping appears in more than half of the bootstrap replicates, they are not as strong as the topmost division—reflecting some variability in the underlying traits for these samples. Meanwhile, the green subcluster (including BC, FI, PN, AL, PB, and PE) exhibits bootstrap values ranging from modest (36.2% at the node splitting off BC) to higher intermediate levels (66–70.8% for nodes connecting PN–AL and PB–PE). While these splits are fairly consistent overall, the moderate support values indicate that minor rearrangements can occur in the dendrogram under resampling, likely due to subtler morphological or numerical differences.
By contrast, the right-hand side of the dendrogram (in red) groups CO, BN, PA, DO, BB, and GI. Here, the node uniting CO with the rest has a bootstrap value of 46.8%, i.e., moderate-to-low support, suggesting that CO may be placed differently in different resampling runs. Within this cluster, the node uniting BN and PA has slightly stronger support (58.5%), whereas the splits involving DO, BB, and GI range from moderate (61.3%) to relatively low (53.5%). These intermediate or low bootstrap values reflect finer-scale distinctions among these cultivars, which may shift across bootstrap samples. Overall, the upper-level partition of the dataset, supported by a 100% bootstrap value, is highly reliable, whereas the lower-level subdivisions receive moderate to low support, reflecting more nuanced trait similarities and differences and a higher degree of uncertainty in the precise arrangement of some cultivars.

2.1.2. Cultivar Classification by Random Forest

The Random Forest classifier was applied to all cultivars in the dataset using the six most significant common variables, yielding an overall accuracy of 0.49. Table 6 shows the performance of the classifier for each cultivar in terms of precision, recall, and F1-score. The RF model performed well for certain cultivars: FI and SP achieved a precision of 1.00, while PB and CO achieved precisions of 0.75 and 0.71, respectively. In contrast, PN and DO showed lower precision (0.29 and 0.25, respectively). Recall (Table 6) was perfect (1.00) for CO and PO, whereas the lowest recall values were observed for PN and AL (0.33 and 0.20, respectively) and for BN and GI (both 0.00). The F1-score was highest for SP, CO, and PO (0.89, 0.83, and 0.75, respectively) and lowest for AL (0.29), BN, and GI (both 0.00).
The following insights can be drawn:
  • Class-wise Performance: Cultivars such as CO and SP show the best classification performance, with high precision (0.71 and 1.00), recall (1.00 and 0.80), and F1-scores (0.83 and 0.89), indicating that the model identifies these cultivars accurately with few misclassifications. Conversely, BN and GI show the poorest performance, with precision, recall, and F1-score all equal to 0.00. The classifier therefore fails entirely for these cultivars, possibly because of class imbalance, a lack of distinctive features, or other limitations of the dataset.
  • Intermediate Performance: Cultivars such as PE, PO, and PB have moderate F1-scores ranging from 0.60 to 0.75. Recall is high for some of them (e.g., PO: precision 0.60, recall 1.00), but their lower precision indicates room for improvement in reducing false positives.
  • Poor Recall for High Precision Cultivars: A notable case is FI, which shows perfect precision (1.00) but a recall of only 0.33, resulting in a moderate F1-score (0.50). This suggests that while the classifier is confident when it predicts FI, it fails to capture many actual instances, indicating underprediction for this class.
  • Weighted Metrics: the weighted average precision, recall, and F1-score are 0.49, 0.49, and 0.47, respectively. These values reflect the overall performance across all cultivars, weighted by the number of instances in each class. The low values indicate that the classifier struggles to generalize well across multiple classes.
  • Overall Accuracy: The overall accuracy of the classifier is 0.49. With fifteen cultivars, uniform random guessing would be expected to reach only about 1/15 ≈ 0.07, so the model performs well above chance; nevertheless, an accuracy below 0.5 underscores the difficulty of achieving consistent performance across cultivars.
The results presented in Figure 8 show that nine cultivars were correctly classified in at least 50% of cases; in particular, CO, PO, and SP showed high classification accuracy (100%, 100%, and 80%, respectively). Cultivar AL had a low accuracy (20%) and was misclassified 40% of the time, mainly as PN. FI and GI were confused with DO, with a misclassification rate of 66.7%. These results show that the RF model can recognize differences in the six morphometric parameters analyzed and classify cultivars; however, some misclassifications remain, probably due to leaf plasticity, which affects model performance.

2.1.3. PCA Analysis

Table 7 summarizes the Principal Component Analysis (PCA) of the six most significant traits for all fifteen cultivars, detailing the contribution (loading) of each trait to the first three principal components (PC1, PC2, and PC3), together with the associated eigenvalues and explained variance.
Together, the three components explain 91.5% of the total variance, so most of the variability in the dataset is captured by them. PC1 accounts for the largest proportion of variance (42.4%) and is primarily influenced by I2y (loading 0.51), I2_TP (0.50), and PL (0.48), with a moderate contribution from WxH (0.45). PC2 explains 30.1% of the variance and is dominated by BAC (0.49), I2_TP, and I2y (both 0.41), with a negative loading for PL_L1 (−0.51). PC3 accounts for 19.0% of the variance, with large contributions from PL_L1 (0.60), BAC (0.51), and WxH (−0.55). I2_TP and I2y load positively on both PC1 and PC2, suggesting that they describe the major sources of variability, whereas PL_L1 and WxH contribute more prominently to PC3, capturing secondary patterns in the data. Overall, the PCA indicates that the six traits are highly variable and reflect differences among cultivars. The plot in Figure 9 shows a weak degree of aggregation of the samples. Based on the length and direction of the feature vectors, four features (BAC, PL_L1, PL, and WxH) play a significant role in determining the principal components but are only weakly correlated with each other, whereas the other two selected features (I2_TP and I2y) are strongly correlated (Figure 9).

2.2. Trichome Analysis

Trichomes are appendages that cover different organs of the plant; they originate from epidermal cells and grow outward from the surface to a conspicuous size. Their location, shape, content, and density play a key role in the plant's defense against biotic and abiotic stresses. They provide a barrier against pathogens [38], reflect solar radiation and filter harmful ultraviolet rays [39], absorb moisture and nutrients from the atmosphere [40,41], and reduce evapotranspiration by affecting temperature and photosynthetic rate [42,43]. Trichomes are valuable characters for taxonomic identification at different infra-generic levels and are commonly used for classification by systematists [44,45]. We therefore assessed the trichomes on the upper and lower leaf surfaces of the studied cultivars. Ficus carica has glandular and non-glandular trichomes on the adaxial and abaxial leaf surfaces. The glandular ones are peltate; the non-glandular ones are unicellular, simple, and spine-like, of different sizes, together with papillae-like lithocysts of the same shape but emerging from a shield-like base [46].
Both glandular and non-glandular trichomes were observed on the upper and lower leaf surfaces. Most epidermal hairs were non-glandular and varied markedly in size; in this study they were grouped into 11 length classes. Figure 10 shows the distribution of the classes for the upper surface (Figure 10a) and the lower surface (Figure 10b). Classes 1 (0.1–99 µm) to 5 (240–299 µm) are the most represented on both the upper and lower epidermis, with a clear predominance on the lower surface (Figure 10a,b). Classes 6 to 11 are observed predominantly or exclusively on the lower surface (Figure 10a,b).
Table S1 shows the percentage distribution of the classes in the studied cultivars for the upper and lower epidermis. The class distribution on the lower epidermis is also shown in the circular chart in Figure 11. The cultivars DO, PB, and PO had trichomes in almost all 11 classes, whereas FI and VE only had hairs in classes 1 to 4, i.e., shorter trichomes.
The trichome frequency distributions by length class were reconstructed for the lower leaf surface, where trichomes predominate. For each cultivar, the λ parameter of the Poisson distribution (Equation (2)) was then estimated from the observed distribution (see Section 3.4.1). Figure 12 presents the observed distributions alongside the corresponding fitted Poisson distributions. Overall, the Poisson model agrees well with the observed class frequencies across cultivars, indicating that these distributions can largely be described by a Poisson process. However, notable deviations occur in cultivars with very low λ values, such as FI (λ = 0.3) and VE (λ = 0.1), where the observed data diverge markedly from the predicted distributions. Higher λ values, such as that of PO (λ = 3.0), correspond to broader distributions spanning a wider range of classes, whereas lower λ values correspond to more concentrated distributions, with most observations in the lowest classes (Figure 12). These results suggest that the processes shaping class distributions vary among cultivars, likely driven by biological or environmental factors. The Poisson model is thus a useful basis for characterizing class frequency distributions, although some cultivars may require additional modelling.
Figure 13 reports the statistical significance of the pairwise differences in λ, evaluated with the Likelihood Ratio Test (LRT). The LRT statistic was assumed to follow a chi-squared (χ2) distribution with 2 degrees of freedom (see Section 3.4.1). Significance levels vary across cultivar pairs, and several pairs of Poisson distributions differ highly significantly in λ, suggesting distinct trichome size-class distributions. The most significant results (p < 0.001, in orange) occur mainly between cultivars with contrasting λ values; for example, PO, SP, VE, and FI differ strongly from several other cultivars (Figure 13). Conversely, non-significant comparisons (green, NS) are scarce and occur mainly among cultivars with similar λ values, such as PA and PE (both λ = 1.8), indicating closely related trichome distributions. Overall, the LRT effectively identifies cultivars with divergent trichome distributions; the heterogeneity of λ among cultivars may reflect underlying biological or environmental influences, and some pairs may represent distinct phenotypic or genetic groups.
Table 8 shows the trichome density on the upper and lower leaf surfaces, both of which show high variability. On the upper surface, the highest density was observed in AL (26.3 trichomes mm⁻²) and the lowest in VE (1.02 trichomes mm⁻²). On the lower surface, PA and PE had the highest densities (93.8 and 87.5 trichomes mm⁻², respectively), while VE had the lowest (48.2 trichomes mm⁻²).
Based on the estimated λ parameters and the mean trichome density, a hierarchical cluster analysis was performed using Euclidean distance (Figure 14). At a relative distance of 0.6, the fifteen cultivars were grouped into three main clusters. Cluster 1 comprised four cultivars (CO, FI, SP, and VE) characterized by a low trichome density on the lower surface and by trichomes concentrated in a limited number of classes on both epidermises. Cluster 2 grouped six cultivars into two sub-clusters: GI and PN, in the first sub-cluster, were positioned very close to each other, while in the second sub-cluster PO was closely related to BC and DO. The cultivars in cluster 2 were characterized by larger hairs and an intermediate density on the lower surface. Cluster 3 comprised five cultivars and, at a dissimilarity level of 0.4–0.5, divides into two sub-clusters, in which PE and PA, and BB and PB, respectively, are close to each other in their trichome characteristics (Figure 14). In the left cluster (CO, FI, SP, and VE), the node separating FI and SP from CO has a bootstrap value of 60%. Although this subcluster is recovered in more than half of the bootstrap replicates, it is less stable than the main split, possibly reflecting moderate variability in the underlying traits of these samples. Similarly, the node isolating VE (59.2%) shows a comparable level of consistency: although VE typically segregates from CO, FI, and SP, the exact point at which it branches off can vary across bootstrap samples.
In contrast, the right portion of the dendrogram (including AL, GI, PN, PO, BC, DO, PA, PE, BN, BB, PB) shows varying degrees of stability, with bootstrap values ranging from moderate (61% at the node separating AL, GI, and PN from PO, BC, DO, PA, PE, BN, BB, PB) to relatively high (over 90% for the splits involving BN, BB, and PB). Nodes with higher bootstrap values, such as 93% or 94%, indicate groupings that are more consistently observed under resampling, suggesting stronger similarity among those cultivars. Conversely, nodes with more modest support (around 59–72%) highlight potential overlaps or subtler morphological differences that may shift slightly from one resample to another. Overall, these results indicate that the upper-level partition of the dataset is highly reliable, whereas lower-level subdivisions show varying degrees of stability. This pattern is consistent with the expectation that major clusters—capturing the most pronounced differences among samples—tend to remain robust, whereas finer-scale splits, which hinge on subtler distinctions, may fluctuate and thus receive lower bootstrap support.

3. Materials and Methods

3.1. Plant Material

The plant material consisted of leaf samples from fifteen Ficus carica L. cultivars collected at the private farm "Petracchi" in Carmignano (Prato, Tuscany, Italy; 43°48′49″ N, 11°1′6″ E, 189 m a.s.l.). Trees were grown under the same agro-environmental conditions according to standard organic cultivation practices. A total of 287 fully mature leaves (80 five-lobed and 207 three-lobed) were randomly collected from the middle third of the shoot, and measurements were carried out on leaves of the dominant leaf form of each cultivar. Cultivar names and codes are reported in Table 1.

3.2. Morphological Descriptors

The morphological features were described following the methodology proposed by the International Plant Genetic Resources Institute [34] (Table 9). The qualitative descriptors analyzed were the number of lobes, leaf margin, shape of the central lobe, presence and location of small lateral lobes, and shape of the leaf base. The quantitative leaf characters were petiole length (PL; cm); petiole thickness (PLØ; cm); midrib length (L1; cm); leaf width (W; cm); leaf length (from the base of the blade to the tip of the central lobe; H; cm); leaf area (WxH; cm²); central lobe length (CLL; cm); leaf base angle (BAC; °); and angles between main veins (α = angle between L1 and L2; β = angle between L2 and L3; °). To these characters we added fifteen morphometric descriptors, measured only on the right half of each leaf on the assumption that leaves are symmetrical. The right half of the leaf was placed on a Cartesian plane with the origin at the point where the veins diverge and the y-axis superimposed on the midrib. The x and y coordinates were measured for the following points: Z (maximum width of the central lobe; cm); L2 (apex of the second lobe; cm); L3 (apex of the third lobe; cm); I2 (sinus between lobe 2 and the central lobe; cm); and I3 (sinus between L2 and L3; cm). In addition, the distance of each point from the origin was calculated by applying the Pythagorean theorem (PT) to its x and y coordinates (Figure 15). The following ratios were also calculated: I2/L2; L2/L1; I3/L3; PL/H; PL/L1; CLL/H (Table 9).
$$R = \frac{\text{I2\_TP} + \text{I3\_TP}}{\text{L2\_TP} + \text{L3\_TP}}$$
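The distance descriptors and the ratio R can be derived from the measured coordinates as in the following sketch. The input layout (a dictionary mapping landmark names such as `"I2"` to `(x, y)` pairs) and the function name `morphometric_descriptors` are hypothetical; the `_TP` suffix mirrors variable names such as I2_TP used in the text. The lobe ratios (I2/L2, L2/L1, I3/L3) are omitted because the text does not state which measurement of each point they use.

```python
import math


def morphometric_descriptors(points, L1, PL, H, CLL):
    """Pythagorean distances (_TP) from the origin plus derived ratios.

    points: dict landmark -> (x, y) in cm, origin at the vein junction,
    y-axis along the midrib (hypothetical input layout).
    """
    out = {}
    for name, (x, y) in points.items():
        out[f"{name}_TP"] = math.hypot(x, y)  # sqrt(x**2 + y**2)
    # Ratio R: sinus distances over lobe-apex distances
    out["R"] = (out["I2_TP"] + out["I3_TP"]) / (out["L2_TP"] + out["L3_TP"])
    out["PL/H"] = PL / H
    out["PL/L1"] = PL / L1
    out["CLL/H"] = CLL / H
    return out


desc = morphometric_descriptors(
    {"I2": (3, 4), "I3": (6, 8), "L2": (5, 12), "L3": (8, 15)},
    L1=16.0, PL=4.0, H=20.0, CLL=10.0,
)
print(desc["R"])  # (5 + 10) / (13 + 17) = 0.5
```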

3.3. Trichome Analysis

For trichome characterization, the upper and lower leaf surfaces were analyzed with an FEI Quanta 200 Environmental Scanning Electron Microscope (ESEM; FEI Company, Eindhoven, The Netherlands) operating in low-vacuum mode (chamber pressure 1 Torr) at 25 kV. Three pieces of tissue were cut from the central part of the leaf, to the left and right of the midrib, and analyzed without pre-treatment. The non-glandular trichomes were measured and assigned to 11 length classes: class 1, 0.1–99 µm; class 2, 100–140 µm; class 3, 140.1–159 µm; class 4, 160–239 µm; class 5, 240–299 µm; class 6, 300–319 µm; class 7, 320–332 µm; class 8, 334–358 µm; class 9, 360–398 µm; class 10, 412–438 µm; class 11, 450–477 µm. Trichome density was then calculated.
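As an illustration, measured lengths can be assigned to the 11 classes with NumPy using the lower bound of each class listed above. The class ranges leave gaps (e.g., between 332 and 334 µm, or 398 and 412 µm); in this sketch a value falling in a gap is assigned to the class below, which is an assumption. The density helper and its `area_mm2` argument (the imaged surface area) are likewise hypothetical.

```python
import numpy as np

# Lower bound (µm) of each of the 11 length classes reported in the text
CLASS_LOWER_BOUNDS = np.array(
    [0.1, 100, 140.1, 160, 240, 300, 320, 334, 360, 412, 450]
)


def length_class(lengths_um):
    """Return class numbers 1-11 (0 means shorter than 0.1 µm)."""
    return np.digitize(lengths_um, CLASS_LOWER_BOUNDS)


def trichome_density(n_trichomes, area_mm2):
    """Trichomes per mm² of imaged leaf surface."""
    return n_trichomes / area_mm2


lengths = np.array([50.0, 120.0, 150.0, 300.0, 460.0])
classes = length_class(lengths)  # classes 1, 2, 3, 6, 11
print(classes, trichome_density(len(lengths), 0.5))
```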

3.4. Statistical, PCA Analysis, and Random Forest Model

3.4.1. Trichomes

Since each cultivar has a characteristic frequency distribution, the observed frequency distributions were analyzed, and the λ parameter of the corresponding Poisson distribution was derived for each cultivar (Equation (2)).
$$P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!} \qquad (2)$$
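For a Poisson model, the maximum-likelihood estimate of λ is the mean of the observed values. The following minimal sketch fits λ to a class-frequency table and computes the expected Poisson frequencies. It assumes that class k is coded as the count x = k − 1, so that the first class corresponds to x = 0; the text does not state the coding used, and the frequencies in the example are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_poisson(freqs):
    """MLE of λ from class frequencies; freqs[i] = number of trichomes with x = i."""
    freqs = np.asarray(freqs, dtype=float)
    x = np.arange(freqs.size)
    lam = (x * freqs).sum() / freqs.sum()  # Poisson MLE: the sample mean
    # Expected frequencies under Equation (2) for the same total count
    expected = freqs.sum() * stats.poisson.pmf(x, lam)
    return lam, expected


# Hypothetical counts for classes 1..5 (coded x = 0..4)
lam, expected = fit_poisson([30, 40, 20, 8, 2])
print(round(lam, 2), expected.round(1))
```

Comparing `expected` with the observed counts, class by class, gives the kind of observed-versus-fitted view shown in Figure 12.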
The λ values thus obtained were compared by means of a Likelihood Ratio Test (LRT) to determine whether the observed differences between them were statistically significant. The LRT assesses the goodness of fit of two nested models: a more complex model is compared with a simpler one to determine whether it fits a particular dataset significantly better [8]. Here, the LRT compares the log-likelihoods of two models: a null model, in which both cultivars are assumed to follow a common distribution, and an alternative model, in which each cultivar follows its own distribution. The test statistic is calculated as follows:
$$\mathrm{LRT} = 2 \times \left[\log\left(\text{likelihood of the alternative model}\right) - \log\left(\text{likelihood of the null model}\right)\right]$$
Under the null hypothesis, the LRT statistic is assumed to follow a chi-squared (χ2) distribution with degrees of freedom equal to the difference in the number of free parameters between the compared models. In this study, since the comparison involved two Poisson distributions differing by a single parameter (λ), the test statistic was evaluated with 2 degrees of freedom. A p-value was computed for each pair of cultivars to assess the statistical significance of the differences between their distributions. This procedure identified statistically significant differences in λ between pairs of cultivars at three significance levels (0.05, 0.01, and 0.001).
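The LRT can be sketched for one pair of cultivars as follows, working on the individual class values coded as integers x = 0, 1, 2, … (an assumption). Under the null model both samples share one λ (the pooled mean); under the alternative each has its own λ. The usual nested-model rule gives 1 degree of freedom here (two free parameters versus one), whereas the text reports evaluating the statistic with 2 degrees of freedom, so `df` is left as an argument. Function names and data are hypothetical.

```python
import numpy as np
from scipy import stats


def poisson_loglik(x, lam):
    """Poisson log-likelihood of integer observations x at rate lam."""
    return stats.poisson.logpmf(x, lam).sum()


def poisson_lrt(x_a, x_b, df=1):
    """Likelihood ratio test of equal Poisson λ for two samples."""
    x_a = np.asarray(x_a)
    x_b = np.asarray(x_b)
    pooled = np.concatenate([x_a, x_b])
    ll_null = poisson_loglik(pooled, pooled.mean())  # one shared λ
    ll_alt = poisson_loglik(x_a, x_a.mean()) + poisson_loglik(x_b, x_b.mean())
    lrt = 2.0 * (ll_alt - ll_null)
    p_value = stats.chi2.sf(lrt, df)  # upper tail of the χ2 distribution
    return lrt, p_value


# Hypothetical class values for two cultivars (counts per coded class 0..4)
x_a = np.repeat(np.arange(5), [30, 40, 20, 8, 2])
x_b = np.repeat(np.arange(5), [5, 15, 30, 30, 20])
stat, p = poisson_lrt(x_a, x_b)
print(round(stat, 2), p)
```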
Subsequently, the estimated λ and density values for each cultivar were used as input features for a cluster analysis, and a dendrogram was built from the Euclidean distances between cultivars.
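A minimal SciPy sketch of this clustering step follows. Several choices are assumptions not stated in the text: the two features (λ and density) are standardized before computing Euclidean distances, average linkage is used, and the "relative distance" cut-off of 0.6 is interpreted as 0.6 times the height of the root node. Cultivar codes and feature values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

cultivars = ["A", "B", "C", "D", "E", "F"]  # hypothetical codes
# Columns: Poisson λ, lower-surface density (trichomes mm^-2); hypothetical values
features = np.array([
    [0.1, 48.0],
    [0.3, 52.0],
    [1.8, 90.0],
    [1.7, 88.0],
    [3.0, 70.0],
    [2.8, 72.0],
])

X = zscore(features, axis=0)  # put λ and density on a common scale (assumption)
Z = linkage(X, method="average", metric="euclidean")
cut = 0.6 * Z[:, 2].max()  # relative cut height (assumption)
labels = fcluster(Z, t=cut, criterion="distance")
clusters = dict(zip(cultivars, labels))
print(clusters)
```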

3.4.2. Morphological Variables

A comprehensive analysis was conducted on the dataset of fifteen fig cultivars (Table 1), containing thirty-two quantitative variables (ten morphological and twenty-two morphometric descriptors; Table 9). To mitigate the impact of potential outliers, the dataset was preprocessed through data checking and cleaning, followed by the application of a statistical transformation to each morphological feature within each cultivar. For each cultivar and trait, the 25th percentile (1Q), the 75th percentile (3Q), and the interquartile range (IQR = 3Q − 1Q) were determined. Outlying values were then replaced using the Winsorization technique [47]:
$$\text{values} < 1Q - 1.5 \times IQR \ \text{were set to}\ 1Q - 1.5 \times IQR$$
$$\text{values} > 3Q + 1.5 \times IQR \ \text{were set to}\ 3Q + 1.5 \times IQR$$
Unlike data trimming, which excludes extreme values, Winsorization replaces them with fixed boundary values (here, the IQR-based fences defined above), thereby preserving the structure of the dataset while reducing the influence of outliers.
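A minimal pandas sketch of this per-cultivar Winsorization follows: each trait is clipped to the fences 1Q − 1.5 × IQR and 3Q + 1.5 × IQR computed within each cultivar. The column names and values are hypothetical, and pandas computes quantiles by linear interpolation by default, which may differ slightly from other quartile definitions.

```python
import pandas as pd


def winsorize_by_group(df, group_col, traits, k=1.5):
    """Clip each trait to [1Q - k*IQR, 3Q + k*IQR] within each group."""
    out = df.copy()
    grouped = df.groupby(group_col)
    for trait in traits:
        q1 = grouped[trait].transform(lambda s: s.quantile(0.25))
        q3 = grouped[trait].transform(lambda s: s.quantile(0.75))
        iqr = q3 - q1
        # Values outside the fences are replaced by the fence itself
        out[trait] = df[trait].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


df = pd.DataFrame({
    "cultivar": ["CO"] * 5 + ["SP"] * 5,
    "PL": [5, 6, 7, 8, 50, 3, 3.5, 4, 4.5, 5],  # hypothetical petiole lengths (cm)
})
w = winsorize_by_group(df, "cultivar", ["PL"])
print(w)
```

For CO, 1Q = 6 and 3Q = 8, so the upper fence is 8 + 1.5 × 2 = 11 and the value 50 is replaced by 11; the SP values all lie inside their fences and are unchanged.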
Descriptive statistics were calculated for all morphological variables, including the mean, median, 1st and 3rd quartiles, standard error (SE; the uncertainty of the sample mean as an estimate of the population mean), standard deviation (SD; the spread of individual data points around the mean), minimum, maximum, and coefficient of variation (CV). These statistics were derived at three levels: for the whole dataset, for each cultivar, and for the two classes defined by the number of leaf lobes (three-lobed and five-lobed).
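The statistics listed above can be computed at the three levels with pandas, as in the following sketch. The helper name `describe_trait`, the column names, and the data are hypothetical, and CV is expressed as a percentage, an assumption since the text does not state its unit.

```python
import numpy as np
import pandas as pd


def describe_trait(s):
    """Descriptive statistics for one numeric Series."""
    n = s.count()
    sd = s.std(ddof=1)
    return pd.Series({
        "mean": s.mean(), "median": s.median(),
        "Q1": s.quantile(0.25), "Q3": s.quantile(0.75),
        "SD": sd, "SE": sd / np.sqrt(n),
        "min": s.min(), "max": s.max(),
        "CV%": 100 * sd / s.mean(),
    })


df = pd.DataFrame({
    "cultivar": ["CO", "CO", "CO", "SP", "SP", "SP"],
    "lobes": [5, 5, 5, 3, 3, 3],
    "PL": [7.0, 8.0, 9.0, 4.0, 5.0, 6.0],  # hypothetical values (cm)
})

overall = describe_trait(df["PL"])  # level 1: whole dataset
by_cultivar = pd.DataFrame(  # level 2: one row per cultivar
    {k: describe_trait(g) for k, g in df.groupby("cultivar")["PL"]}).T
by_lobes = pd.DataFrame(  # level 3: three-lobed vs five-lobed
    {k: describe_trait(g) for k, g in df.groupby("lobes")["PL"]}).T
print(by_cultivar)
```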
To test whether trait means differed between classes (three-lobed and five-lobed), an Analysis of Variance (ANOVA) based on the F-statistic was conducted. Following the ANOVA, post hoc tests were performed to identify which specific group means differed while controlling the overall Type I error rate. To quantitatively evaluate the morphometric differences between cultivars, their significance was assessed using Tukey's Honest Significant Difference (Tukey's HSD) test [48] at a 95% confidence level. Additionally, effect sizes (EF), Bayes factors (BF), and confidence intervals (CI) were calculated to measure the magnitude of the observed effects relative to the sample standard deviation and the relative strength of the treatments. EF, BF, and CI values for each pairwise comparison were determined using Cohen's method [49].
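A one-way ANOVA followed by Tukey's HSD can be sketched with SciPy's `f_oneway` and `tukey_hsd` (the latter available in SciPy 1.8 and later). The trait values for the three cultivars are hypothetical.

```python
from scipy import stats

# Hypothetical PL values (cm) for three cultivars
co = [8.1, 8.4, 7.9, 8.8, 8.3]
sp = [5.0, 5.3, 4.8, 5.1, 5.4]
po = [5.2, 5.0, 5.5, 4.9, 5.3]

f_stat, p_anova = stats.f_oneway(co, sp, po)  # global test of equal means
tukey = stats.tukey_hsd(co, sp, po)
# tukey.pvalue[i, j]: adjusted p-value for group i vs group j
ci = tukey.confidence_interval(confidence_level=0.95)
print(f_stat, p_anova)
print(tukey.pvalue.round(4))
```

Cultivars whose pairwise adjusted p-values exceed 0.05 (here SP and PO) would share a letter in boxplots such as those in Figure 6.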
Prior to analysis, the dataset underwent standardization by centering the data (removing the mean) and scaling to unit variance. This preprocessing step ensures compatibility with many machine learning algorithms, which typically perform better when input data are normalized to have a mean of 0 and a standard deviation of 1.

3.4.3. Random Forest Model

To identify the most influential morphological variables characterizing the cultivars, we applied the Random Forest (RF) classifier. Feature importance was assessed using the Mean Decrease in Impurity (MDI) metric, which quantifies impurity reduction based on the Gini index.
The original dataset was split into training and test sets in an 80/20 proportion; the RF model was trained on the first subset, while the second was used exclusively for validation. To ensure reproducibility, a fixed random seed (random_state = 10) was adopted. The importance of each feature was evaluated by examining the mean and standard deviation of the impurity reductions across the trees of the forest. This process identified the most influential variables, providing insights into the primary factors driving the predictions. Separate analyses were conducted for three-lobed and five-lobed cultivars, yielding ranked lists of feature importance for both groups. From these rankings, the top ten features were identified for each group, and the six features common to both were selected for further analysis. These six features, representing key determinants of leaf morphology across all cultivars, were subsequently subjected to Tukey's HSD test, independently of class distinctions.
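The following sketch illustrates the split, training, and MDI ranking with scikit-learn. A synthetic dataset from `make_classification` stands in for the leaf measurements, and the feature names and the helper `common_top_features` are hypothetical. The seed 10 is applied to the split; Section 3.4.4 lists `random_state = None` for the classifier itself, so the forest is left unseeded here. The hyperparameters passed explicitly are those listed in Section 3.4.4, which coincide with the defaults in recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the leaf dataset
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
feature_names = [f"trait_{i}" for i in range(X.shape[1])]

# 80/20 split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10)

rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                            max_features="sqrt")
rf.fit(X_train, y_train)
acc = rf.score(X_test, y_test)

# MDI importance: mean impurity decrease, with its spread across trees
importances = rf.feature_importances_
std = np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0)
ranking = [feature_names[i] for i in np.argsort(importances)[::-1]]
top10 = ranking[:10]


def common_top_features(rank_a, rank_b, k=10):
    """Features in the top-k of both rankings, in the order of rank_a."""
    top_b = set(rank_b[:k])
    return [f for f in rank_a[:k] if f in top_b]


print(top10, round(acc, 2))
```

Running the same ranking separately on the three-lobed and five-lobed subsets and passing both lists to `common_top_features` reproduces the intersection step described above.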
To explore the clustering behavior of the cultivars, dendrograms were constructed for each class based on the six selected features, using Euclidean distance as the dissimilarity measure. To enhance the robustness and interpretability of the dendrograms, bootstrap values were added to each node. These values measure clustering reliability by assessing the stability of the clusters when the data are repeatedly resampled and the clustering process is reapplied: for each node, the bootstrap value is the percentage of resampled datasets in which that cluster was reproduced. Bootstrap values thus estimate the confidence in, or stability of, the clusters, analogous to confidence intervals in statistical analysis.
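The text does not specify what is resampled when computing the bootstrap values. The following sketch assumes that the leaves of each cultivar are resampled with replacement, cultivar mean profiles are recomputed, and the dendrogram is rebuilt with Euclidean distance and average linkage (the linkage method is also an assumption). The support of each node of the original dendrogram is the percentage of replicates in which exactly the same set of cultivars forms a cluster. Function names and data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def clusters_from_linkage(Z, n):
    """Sets of leaf indices formed by each merge of a linkage matrix."""
    members = {i: frozenset([i]) for i in range(n)}
    found = []
    for k, (a, b, _dist, _size) in enumerate(Z):
        merged = members[int(a)] | members[int(b)]
        members[n + k] = merged  # new cluster id assigned by SciPy
        found.append(merged)
    return found


def bootstrap_support(groups, n_boot=200, seed=0):
    """groups: list of 2-D arrays (leaves x traits), one per cultivar."""
    rng = np.random.default_rng(seed)
    n = len(groups)
    means = np.array([g.mean(axis=0) for g in groups])
    ref = clusters_from_linkage(
        linkage(means, method="average", metric="euclidean"), n)
    counts = dict.fromkeys(ref, 0)
    for _ in range(n_boot):
        # Resample leaves within each cultivar and recompute mean profiles
        boot_means = np.array([
            g[rng.integers(0, len(g), len(g))].mean(axis=0) for g in groups
        ])
        boot = set(clusters_from_linkage(
            linkage(boot_means, method="average", metric="euclidean"), n))
        for c in ref:
            counts[c] += int(c in boot)
    return {tuple(sorted(c)): 100 * counts[c] / n_boot for c in ref}


# Hypothetical data: four cultivars, ten leaves each, two traits
rng = np.random.default_rng(1)
centers = [[0, 0], [0.2, 0], [5, 5], [5.2, 5]]
groups = [rng.normal(c, 0.3, size=(10, 2)) for c in centers]
support = bootstrap_support(groups)
print(support)
```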
Additionally, Principal Component Analysis (PCA) was performed on six variables, for both three-lobed and five-lobed cultivars, facilitating a comprehensive understanding of the morphometric variations. To evaluate the contribution of each principal component to the total variance in the dataset, the explained variance for each component was analyzed by class. Subsequently, the contribution of the original variables to each component was examined to assess their respective influence. Finally, the relationships between the variables and the observed classes were visualized in a three-dimensional principal component space, with the axes representing the first three principal components.
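The standardization described in Section 3.4.2 and the PCA can be sketched with scikit-learn as follows. The data are random stand-ins for the six selected traits. `explained_variance_ratio_` gives the share of variance per component, `explained_variance_` the eigenvalues, and the rows of `components_` the unit-length eigenvector coefficients that are often reported as loadings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

traits = ["PL", "I2y", "PL_L1", "I2_TP", "WxH", "BAC"]
rng = np.random.default_rng(0)
X = rng.normal(size=(50, len(traits)))  # hypothetical stand-in for the leaf data
X[:, 3] = X[:, 1] + 0.1 * rng.normal(size=50)  # make I2_TP track I2y

X_std = StandardScaler().fit_transform(X)  # mean 0, unit variance per trait
pca = PCA(n_components=3)
scores = pca.fit_transform(X_std)  # sample coordinates in PC1-PC3 space

print("explained variance (%):", (100 * pca.explained_variance_ratio_).round(1))
print("eigenvalues:", pca.explained_variance_.round(2))
for name, row in zip(traits, pca.components_.T):
    print(name, row.round(2))  # coefficients of each trait on PC1-PC3
```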
The data analysis also involved applying the RF model to the entire dataset using the six descriptors common to the two classes.
To evaluate the performance of the RF classifier, we derived a classification report from the confusion matrix, which shows how many of the classifier's predictions were correct and, when they were incorrect, which cultivars the classifier confused.
In the confusion matrices the rows represent the true labels, and the columns represent predicted labels. Values on the diagonal represent the number (or percent, in a normalized confusion matrix) of times where the predicted label matches the true label. Values in the other cells represent instances where the classifier mislabeled an observation; the column tells us what the classifier predicted, and the row tells us what the right label was.
To evaluate the model performance, we derived the following metrics:
Accuracy: the proportion of correctly classified instances among all instances:
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total predictions}}$$
A high accuracy suggests that the model made a large number of correct predictions in general. However, accuracy may not be a sufficient metric in cases of imbalanced classes (where some classes are much more frequent than others). In such cases, a model might achieve high accuracy simply by predicting the dominant class.
Precision: the number of correctly identified members of a class divided by the total number of times the model predicted that class. For the Ficus dataset, the precision for a given cultivar is the number of samples correctly assigned to that cultivar divided by the total number of samples the classifier assigned to it, rightly or wrongly:
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
Recall: the number of members of a class that the classifier identified correctly, divided by the total number of members of that class. For the Ficus dataset, this is the number of samples of a given cultivar that the classifier correctly assigned to it, divided by the total number of samples of that cultivar:
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
F1 score: combines precision and recall into a single metric, their harmonic mean. If precision and recall are both high, F1 will be high too; if both are low, F1 will be low. Because the harmonic mean is pulled toward the smaller value, F1 will also be low when one is high and the other low. F1 is therefore a quick way to tell whether the classifier is actually good at identifying members of a class, or whether it is finding shortcuts (e.g., assigning everything to a large class).
$$F_1\text{-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
A high F1-score indicates that the model is performing well in balancing precision and recall and is not overly biased toward one or the other.
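The metrics defined above can be computed per class directly from a confusion matrix. The sketch below does this with NumPy and cross-checks the result against scikit-learn's `precision_recall_fscore_support`. It also illustrates the accuracy pitfall on imbalanced classes. All labels and predictions are hypothetical, and `per_class_metrics` is an illustrative helper, not part of any library.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_recall_fscore_support)

def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class but truly another class
    fn = cm.sum(axis=1) - tp  # truly this class but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return precision, recall, f1

# Hypothetical predictions for three cultivar codes.
labels = ["AL", "BC", "DO"]
y_true = ["AL", "AL", "BC", "BC", "BC", "DO"]
y_pred = ["AL", "BC", "BC", "BC", "DO", "DO"]
p, r, f1 = per_class_metrics(confusion_matrix(y_true, y_pred, labels=labels))
p_sk, r_sk, f1_sk, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0)  # cross-check with scikit-learn
for lab, pi, ri, fi in zip(labels, p, r, f1):
    print(f"{lab}: precision={pi:.2f} recall={ri:.2f} F1={fi:.2f}")

# Accuracy pitfall: on imbalanced data, always predicting the majority class
# reaches 90% accuracy, while the macro-averaged F1 exposes the shortcut.
y_true_imb = ["PO"] * 9 + ["VE"]
y_pred_imb = ["PO"] * 10
acc = accuracy_score(y_true_imb, y_pred_imb)
macro_f1 = f1_score(y_true_imb, y_pred_imb, average="macro", zero_division=0)
print(f"accuracy={acc:.2f} macro F1={macro_f1:.2f}")
```

In the first example, all three cultivars reach F1 = 0.67 through different precision/recall trade-offs. In the second, macro F1 drops to about 0.47 despite 0.90 accuracy.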

3.4.4. Software Used and Coding

Custom Python procedures were also developed to implement the data analysis, allowing us to derive general statistics, trends, and measures of statistical significance, and to build the Random Forest framework. The RF classifier was implemented with the “RandomForestClassifier” class of Scikit-Learn [50,51], adopting the following hyperparameters, which correspond to the library’s default settings: n_estimators = 100, criterion = ‘gini’, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = ‘sqrt’, max_leaf_nodes = None, min_impurity_decrease = 0.0, bootstrap = True, oob_score = False, n_jobs = None, random_state = None, verbose = 0, warm_start = False, class_weight = None, ccp_alpha = 0.0, max_samples = None, monotonic_cst = None.
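The sketch below shows one way the configuration listed above could be instantiated, fitted, and used to rank descriptors by Gini importance, as in Figure 2. It is a hedged example, not the authors' code. The dataset comes from `make_classification`, with 15 classes standing in for cultivars, and the descriptor names are hypothetical. `monotonic_cst` is left out of the call because it only exists in scikit-learn 1.4 and later, where its default is already `None`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters as listed in the text (scikit-learn defaults); monotonic_cst omitted
# so the sketch also runs on scikit-learn versions older than 1.4.
rf = RandomForestClassifier(
    n_estimators=100, criterion="gini", max_depth=None, min_samples_split=2,
    min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features="sqrt",
    max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True,
    oob_score=False, n_jobs=None, random_state=None, verbose=0,
    warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None,
)

# Synthetic stand-in for a descriptor table: 15 classes ("cultivars"), 10 descriptors.
X, y = make_classification(n_samples=450, n_features=10, n_informative=6,
                           n_redundant=0, n_classes=15, n_clusters_per_class=1,
                           random_state=0)
names = [f"desc_{i}" for i in range(X.shape[1])]  # hypothetical descriptor names

rf.fit(X, y)
# feature_importances_ is the mean decrease in Gini impurity, normalized to sum to 1.
ranking = np.argsort(rf.feature_importances_)[::-1]
top6 = [names[i] for i in ranking[:6]]
print(top6)
```

Because `random_state=None`, the ranking can change slightly between runs. Fixing it to an integer makes such a reduction of descriptors reproducible.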
To build the hierarchical clustering dendrograms and perform the bootstrap analysis, the cultivars were first linked by agglomerative clustering, and the resulting linkage was then converted into a tree structure. The linkage and the tree structure were derived using the “linkage” and “to_tree” functions of SciPy’s hierarchy module [52], with the following settings: linkage method = “ward”; distance = “euclidean”; match_mode = “jaccard”; jaccard_threshold = 0.8. Subsequently, 100 bootstrap replicates were performed, recomputing the clustering each time, and a bootstrap support value (in percent) was derived for each node of the main dendrogram.
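The authors' bootstrap routine is not shown, so the following is only a possible sketch of the procedure described above. It uses SciPy's `linkage` (Ward, Euclidean) and `to_tree`. SciPy's `linkage` has no `match_mode` or `jaccard_threshold` arguments, so this sketch reads those settings as belonging to the bootstrap matching step: a reference cluster counts as reproduced when some bootstrap cluster has a Jaccard similarity of at least 0.8 with it. Two further assumptions apply: each replicate resamples descriptors (columns) with replacement, and each row holds one cultivar's descriptor values. The helper names and the example values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def clusters(X, labels):
    """Leaf-label sets of every internal node of the Ward/Euclidean tree built on X."""
    root = to_tree(linkage(X, method="ward", metric="euclidean"))
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.is_leaf():
            found.append(frozenset(labels[i] for i in node.pre_order()))
            stack.extend([node.get_left(), node.get_right()])
    return found

def jaccard(a, b):
    return len(a & b) / len(a | b)

def bootstrap_support(X, labels, n_boot=100, threshold=0.8, seed=None):
    """Percentage of bootstrap trees containing a match for each reference cluster.

    Assumptions: descriptors (columns) are resampled with replacement, and a match
    is any bootstrap cluster with Jaccard similarity >= threshold.
    """
    rng = np.random.default_rng(seed)
    reference = clusters(X, labels)
    hits = np.zeros(len(reference))
    n_desc = X.shape[1]
    for _ in range(n_boot):
        cols = rng.integers(0, n_desc, size=n_desc)  # resampled descriptor indices
        replicate = clusters(X[:, cols], labels)
        for k, ref in enumerate(reference):
            if any(jaccard(ref, rep) >= threshold for rep in replicate):
                hits[k] += 1
    return {ref: 100.0 * h / n_boot for ref, h in zip(reference, hits)}

# Hypothetical example: two well-separated groups of cultivars.
labels = ["AL", "BC", "BB", "DO", "FI", "GI"]
X = np.array([
    [1.0, 1.1, 0.9, 1.0, 1.2, 1.0],
    [1.2, 1.0, 1.0, 1.1, 1.0, 0.9],
    [0.9, 1.2, 1.1, 0.9, 1.1, 1.1],
    [5.0, 5.1, 4.9, 5.2, 5.0, 5.1],
    [5.2, 4.9, 5.0, 5.0, 5.1, 4.8],
    [4.9, 5.0, 5.2, 5.1, 4.9, 5.0],
])
support = bootstrap_support(X, labels, n_boot=100, threshold=0.8, seed=0)
for clade, pct in support.items():
    print(sorted(clade), f"{pct:.0f}%")
```

Setting `threshold=1.0` turns the Jaccard matching into exact cluster matching. Resampling leaves within each cultivar would be an equally plausible reading of "the data is repeatedly resampled".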
For statistical analysis of the data and production of tables and plots, we used dedicated scientific Python libraries such as Pandas [53,54], SciPy [52], Statsmodels [55], and Pingouin [56]. Plots were produced using the Seaborn [57] and Matplotlib [58] Python modules.
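Figure 3 reports p-values from Tukey's HSD test. One way to run such pairwise comparisons between cultivars is Statsmodels' `pairwise_tukeyhsd`, sketched below; Pingouin's `pairwise_tukey` is an alternative. The descriptor values and group sizes are synthetic, so this is an illustration rather than the authors' script.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements of one descriptor (e.g. BAC) for three cultivars.
rng = np.random.default_rng(1)
values = np.concatenate([rng.normal(10, 1, 20),
                         rng.normal(10, 1, 20),
                         rng.normal(15, 1, 20)])
groups = ["AL"] * 20 + ["BC"] * 20 + ["PO"] * 20

res = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)
print(res.summary())  # one row per pair: AL-BC, AL-PO, BC-PO
pvals = res.pvalues   # adjusted p-values, e.g. to fill a cultivar-by-cultivar heatmap
```

With 15 cultivars the same call yields 105 pairwise comparisons per descriptor. These can be pivoted into a square matrix for a heatmap such as those in Figure 3.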

4. Conclusions

A notable aspect of the present study is that the morphological analysis was conducted on plants grown in a single environment: keeping plants under the same agro-environmental conditions for several years makes it possible to describe phenotypic variation among accessions living under common conditions. It also allows the study of the phenotypic plasticity of long-lived species such as fig in response to changing climatic conditions.
Leaf micromorphological characteristics, including the shape, size, composition, and density of trichomes on the epidermis, exhibit considerable variability. These features play a pivotal role in plant taxonomy and are highly discriminatory phenotypic traits for cultivar characterization, particularly when combined with other qualitative and quantitative morphological attributes [22,59].
The classification of fig (Ficus carica) cultivars is inherently complex, owing to the large number of traits to analyze (192 in total, comprising 126 qualitative and 66 quantitative variables) and to the species’ unique characteristics, such as the presence of multiple leaf types [60]. Consequently, identifying a subset of reliable, discriminative variables is a promising strategy for efficient characterization and classification of fig germplasm. Even when the analysis is restricted to leaf morphology, many variables remain: in this study, 32 quantitative morphological variables were identified and evaluated.
To address this complexity, a machine learning-based approach combining the RF algorithm with PCA was introduced. This methodology enabled the identification of the most effective variables for cultivar discrimination. Specifically, 10 variables were identified for each leaf class (three-lobed and five-lobed), six of which were common to both classes and could therefore be applied to all 15 analyzed cultivars.
Among the identified traits, BAC, PLØ, and I2/TP exhibited the highest discriminative power, underscoring their potential importance in defining key structural differences between cultivars, likely associated with attributes such as lobe size, shape, or symmetry.
These preliminary findings demonstrate the utility of machine learning for cultivar discrimination and suggest that automating the acquisition of morphological parameters, for example through a machine learning-based image analysis system, could further improve classification accuracy. To the best of our knowledge, this study represents the first application of RF combined with PCA to fig cultivar classification, and the results encourage further exploration of this approach for cultivars of other species.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants14030333/s1, Table S1: Percentage length distribution of non-glandular trichomes in the lower and upper leaf epidermis of 15 cultivars of Ficus carica.

Author Contributions

Conceptualization, C.G., T.G. and R.P.; methodology, C.G., L.A., M.R., T.G., D.B. and R.P.; Software and coding, L.A.; data curation, L.A. and M.R.; writing—original draft preparation, C.G., L.A. and R.P.; writing—review and editing, L.A., M.R., T.G. and D.B.; visualization, L.A.; supervision, R.P.; funding acquisition, C.G. and R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Project PRIMA (Partnership for Research and Innovation in the Mediterranean Area) 2022 “More on the adoption of a healthy Mediterranean Diet”, acronym: MoreMedDiet, financed by MUR.

Data Availability Statement

All data will be provided upon request to the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Linné, C.V.; Salvius, L. Caroli Linnaei …Species Plantarum: Exhibentes Plantas Rite Cognitas, Ad Genera Relatas, Cum Differentiis Specificis, Nominibus Trivialibus, Synonymis Selectis, Locis Natalibus, Secundum Systema Sexuale Digestas…; Impensis Laurentii Salvii: Holmiae, Sweden, 1753.
  2. Langgut, D. The Core Area of Fruit-Tree Cultivation: Central Jordan Valley (Levant), ca. 7000 BP. Palynology 2024, 48, 2347905.
  3. Langgut, D.; Garfinkel, Y. 7000-Year-Old Evidence of Fruit Tree Cultivation in the Jordan Valley, Israel. Sci. Rep. 2022, 12, 7463.
  4. Zohary, D.; Hopf, M.; Weiss, E. Domestication of Plants in the Old World: The Origin and Spread of Domesticated Plants in Southwest Asia, Europe, and the Mediterranean Basin; Oxford University Press: Oxford, UK, 2012.
  5. Walthall, D.A. Agriculture in Magna Graecia (Iron Age to Hellenistic Period). In A Companion to Ancient Agriculture; Hollander, D., Howe, T., Eds.; Wiley: New York, NY, USA, 2020; pp. 317–341. ISBN 978-1-118-97092-8.
  6. Moricca, C. Sacred and Secular Aspects of Phoenicians’ Life at Motya (Sicily, Italy) Inferred by Multidisciplinary Archaeobotanical Analyses. Rendiconti Online Della Soc. Geol. Ital. 2021, 54, 2–8.
  7. Mazzeo, A.; Magarelli, A.; Ferrara, G. The Fig (Ficus carica L.): Varietal Evolution from Asia to Puglia Region, Southeastern Italy. CABI Agric. Biosci. 2024, 5, 57.
  8. Food and Agriculture Organization of the United Nations FAOSTAT Statistical Database. Available online: https://www.fao.org/statistics/en (accessed on 24 November 2024).
  9. Desa, W.N.M.; Mohammad, M.; Fudholi, A. Review of Drying Technology of Fig. Trends Food Sci. Technol. 2019, 88, 93–103.
  10. Ferrara, G.; Mazzeo, A.; Colasuonno, P.; Marcotuli, I. Production and Growing Regions. In The Fig: Botany, Production and Uses; CABI GB: New York, NY, USA, 2022; pp. 47–92.
  11. Italian Parliament Official Gazette of 21/1/2017 Gen. Series No. 17. 2017. Available online: https://www.gazzettaufficiale.it/atto/serie_generale/caricaDettaglioAtto/originario?atto.dataPubblicazioneGazzetta=2017-01-21&atto.codiceRedazionale=17A00355&elenco30giorni=false (accessed on 24 November 2024).
  12. Bonamici, M.; Rosselli, L.; Taccola, E. Il Santuario Dell’Acropoli Di Volterra; DISCI-Archelogia, 2017; pp. 51–74. Available online: https://www.researchgate.net/publication/320877556_Il_santuario_dell’acropoli_di_Volterra (accessed on 24 November 2024).
  13. Caneva, G.; Zangari, G.; Lazzara, A.; D’Amato, L.; Maras, D.F. Trees and the Significance of Sacred Grove Imagery in Etruscan Funerary Paintings at Tarquinia (Italy). Rendiconti Lincei Sci. Fis. E Nat. 2024, 35, 637–654.
  14. Rattighieri, E.; Rinaldi, R.; Bowes, K.; Mercuri, A.M.; Bowes, K. Land Use from Seasonal Archaeological Sites: The Archaeobotanical Evidence of Small Roman Farmhouses in Cinigiano, South-Eastern Tuscany-Central Italy. Ann. Bot. 2013, 3, 207–215.
  15. Giachi, G.; Bettazzi, F.; Chimichi, S.; Staccioli, G. Chemical Characterisation of Degraded Wood in Ships Discovered in a Recent Excavation of the Etruscan and Roman Harbour of Pisa. J. Cult. Herit. 2003, 4, 75–83.
  16. Mariotti Lippi, M.; Bellini, C.; Mori Secci, M.; Gonnelli, T.; Pallecchi, P. Archaeobotany in Florence (Italy): Landscape and Urban Development from the Late Roman to the Middle Ages. Plant Biosyst. Int. J. Deal. Asp. Plant Biol. 2015, 149, 216–227.
  17. Del Riccio, A. Firenze, Università Degli Studi, Biblioteca Biomedica, Fondo Ant., MSS.R.210.2_1. Available online: https://www.internetculturale.it/it/16/search?instance=magindice&q=&qq=%28%28typeTipo%3A%22manoscritti%22%29+OR+%28typeTipo%3A%22manoscritto%22%29%29&__meta_agency=it%3A+unfi%2C+universita+degli+studi+di+firenze%2C+sistema+bibliotecario+di+ateneo&pag=8 (accessed on 24 November 2024).
  18. Baldini, E. Alcuni Aspetti Della Coltura Del Fico Nella Provincia Di Firenze. Riv. Ortoflorofruttic. Ital. 1953, 185–203. Available online: https://www.jstor.org/stable/42872468 (accessed on 24 November 2024).
  19. Targioni-Tozzetti, O. Lezioni di Agricoltura Specialmente Toscana: Tomo III.; Piatti: Firenze, Italy, 1802; Volume III.
  20. Gallesio, G.; Baldini, E. Il Commercio Della Frutta Negli Scritti di Giorgio Gallesio; Accademia dei Georgofili: Firenze, Italy, 2003.
  21. Rodolfi, M.; Ganino, T.; Chiancone, B.; Petruccelli, R. Identification and Characterization of Italian Common Figs (Ficus carica) Using Nuclear Microsatellite Markers. Genet. Resour. Crop Evol. 2018, 65, 1337–1348.
  22. Giraldo, E.; López-Corrales, M.; Hormaza, J.I. Selection of the Most Discriminating Morphological Qualitative Variables for Characterization of Fig Germplasm. J. Am. Soc. Hortic. Sci. 2010, 135, 240–249.
  23. Abdelkader, F.; Laiadi, Z.; Boso, S.; Santiago, J.-L.; Gago, P.; Martínez, M.-C. Algerian Fig Trees: Botanical and Morphometric Leaf Characterization. Horticulturae 2023, 9, 612.
  24. Nuzzo, V.; Gatto, A.; Montanaro, G. Morphological Characterization of Some Local Varieties of Fig (Ficus carica L.) Cultivated in Southern Italy. Sustainability 2022, 14, 15970.
  25. Ciarmiello, L.F.; Piccirillo, P.; Carillo, P.; De Luca, A.; Woodrow, P. Determination of the Genetic Relatedness of Fig (Ficus carica L.) Accessions Using RAPD Fingerprint and Their Agro-Morphological Characterization. S. Afr. J. Bot. 2015, 97, 40–47.
  26. Azlah, M.A.F.; Chua, L.S.; Rahmad, F.R.; Abdullah, F.I.; Wan Alwi, S.R. Review on Techniques for Plant Leaf Classification and Recognition. Computers 2019, 8, 77.
  27. Falaschetti, L.; Manoni, L.; Di Leo, D.; Pau, D.; Tomaselli, V.; Turchetti, C. A CNN-Based Image Detector for Plant Leaf Diseases Classification. HardwareX 2022, 12, e00363.
  28. Hastie, T.; Tibshirani, R.; Friedman, J. Random Forests. In The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2009; pp. 587–604. ISBN 978-0-387-84857-0.
  29. Cutler, A.; Cutler, D.R.; Stevens, J.R. Random Forests. In Ensemble Machine Learning; Zhang, C., Ma, Y., Eds.; Springer: New York, NY, USA, 2012; pp. 157–175. ISBN 978-1-4419-9325-0.
  30. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  31. Ljubobratović, D.; Vuković, M.; Brkić Bakarić, M.; Jemrić, T.; Matetić, M. Utilization of Explainable Machine Learning Algorithms for Determination of Important Features in ‘Suncrest’ Peach Maturity Prediction. Electronics 2021, 10, 3115.
  32. Ayala-Niño, D.; González-Camacho, J.M. Evaluation of Machine Learning Models to Identify Peach Varieties Based on Leaf Color. Agrociencia 2022, 56, 1–17.
  33. Ropelewska, E.; Rutkowski, K.P. Differentiation of Peach Cultivars by Image Analysis Based on the Skin, Flesh, Stone and Seed Textures. Eur. Food Res. Technol. 2021, 247, 2371–2377.
  34. International Plant Genetic Resources Institute (IPGRI). AA.VV Descriptors for Fig (Ficus carica L.); International Plant Genetic Resources Institute (IPGRI): Rome, Italy, 2003.
  35. Descriptors for Fig: Ficus carica; IPGRI: Rome, Italy; CIHEAM: Paris, France, 2003; ISBN 978-92-9043-598-3.
  36. Mubo, S.A.; Adeniyi, J.A.; Adeyemi, E. A Morphometric Analysis of the Genus Ficus Linn. (Moraceae). Afr. J. Biotechnol. 2004, 3, 229–235.
  37. Jangam, A.; Jadhav, S.; Sutar, V.; Onkar, R.; Deshmukh, S. Leaf Morphometric Studies in Some Species of Ficus L. Res. Rev. J. Bot. 2017, 6, 29–31.
  38. Wagner, G.; Wang, E.; Shepherd, R. New Approaches for Studying and Exploiting an Old Protuberance, the Plant Trichome. Ann. Bot. 2004, 93, 3.
  39. Tattini, M.; Matteini, P.; Saracini, E.; Traversi, M.L.; Giordano, C.; Agati, G. Morphology and Biochemistry of Non-Glandular Trichomes in Cistus salvifolius L. Leaves Growing in Extreme Habitats of the Mediterranean Basin. Plant Biol. 2007, 9, 411–419.
  40. Bei, Z.; Zhang, X.; Zhang, F.; Yan, X. The Response of Oxytropis aciphylla Ledeb. Leaf Interface to Water and Light in Gravel Deserts. Plants 2023, 12, 3922.
  41. Vanhoutte, B.; Schenkels, L.; Ceusters, J.; De Proft, M.P. Water and Nutrient Uptake in Vriesea Cultivars: Trichomes vs. Roots. Environ. Exp. Bot. 2017, 136, 21–30.
  42. Wang, D.; Liang, X.; Mofack, G.I.; Martin-Ducup, O. Individual Tree Extraction from Terrestrial Laser Scanning Data via Graph Pathing. For. Ecosyst. 2021, 8, 67.
  43. Bickford, C.P. Ecophysiology of Leaf Trichomes. Funct. Plant Biol. 2016, 43, 807–814.
  44. Song, J.-H.; Yang, S.; Choi, G. Taxonomic Implications of Leaf Micromorphology Using Microscopic Analysis: A Tool for Identification and Authentication of Korean Piperales. Plants 2020, 9, 566.
  45. Pinto-Silva, N.P.; De Souza, K.F.; Marques Silva, O.L.; Vitarelli, N.C.; Da Paixão Noronha Pereira, A.; Soares, D.A.; Sodré, R.C.; Medeiros, D.; Caruzo, M.B.R.; Carneiro Torres, D.S.; et al. Trichomes in the Megadiverse Genus Croton (Euphorbiaceae): A Revised Classification, Identification Parameters and Standardized Terminology. Bot. J. Linn. Soc. 2023, 203, 37–49.
  46. Giordano, C.; Maleci, L.; Agati, G.; Petruccelli, R. Ficus carica L. Leaf Anatomy: Trichomes and Solid Inclusions. Ann. Appl. Biol. 2020, 176, 47–54.
  47. Dixon, W.J. Simplified Estimation from Censored Normal Samples. Ann. Math. Stat. 1960, 31, 385–391.
  48. Abdi, H.; Williams, L.J. Tukey’s Honestly Significant Difference (HSD) Test. Encycl. Res. Des. 2010, 3, 1–5.
  49. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Cohen, J., Ed.; Academic Press: New York, NY, USA, 1977; ISBN 978-0-12-179060-8.
  50. Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J.; et al. API Design for Machine Learning Software: Experiences from the Scikit-Learn Project. In Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning, INRIA Saclay-Ile de France, Prague, Czech Republic, 23–27 September 2013; pp. 108–122.
  51. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  52. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
  53. The Pandas Development Team. Pandas-Dev/Pandas: Pandas; 2020. Available online: https://github.com/pandas-dev/pandas/blob/main/CITATION.cff (accessed on 24 November 2024).
  54. Jordahl, K. GeoPandas: Python Tools for Geographic Data. 2014. Available online: https://github.com/geopandas/geopandas (accessed on 24 November 2024).
  55. Seabold, S.; Perktold, J. Statsmodels: Econometric and Statistical Modeling with Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28–30 June 2010.
  56. Vallat, R. Pingouin: Statistics in Python. J. Open Source Softw. 2018, 3, 1026.
  57. Waskom, M. Seaborn: Statistical Data Visualization. J. Open Source Softw. 2021, 6, 3021.
  58. Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9, 90–95.
  59. Podgornik, M.; Vuk, I.; Vrhovnik, I.; Mavsar, D.B. A Survey and Morphological Evaluation of Fig (Ficus carica L.) Genetic Resources from Slovenia. Sci. Hortic. 2010, 125, 380–389.
  60. Khadivi, A.; Mirheidari, F. Phenotypic Variability of Fig (Ficus carica L.). In Fig (Ficus carica L.): Production, Processing, and Properties; Ramadan, M.F., Ed.; Springer International Publishing: Cham, Switzerland, 2023; pp. 129–174. ISBN 978-3-031-16492-7.
Figure 1. Bimbi Bartolomeo, 1696, Figs, oil on canvas. Prato, Poggio a Caiano, Museo della Natura Morta. Credit: “By concession of the Ministero della Cultura—Gabinetto Fotografico delle Gallerie degli Uffizi”. Inv. Oggetti d’Arte Castello (1911) n. 614.
Figure 2. Comparative analysis of descriptor importance rankings, as measured by Gini importance, for distinguishing between three-lobed (a) and five-lobed (b) cultivars. The rankings are derived from a Random Forest classifier and highlight the descriptors that contribute most to classification accuracy in each group. The embedded plots (a1,b1) show the performance curve as the number of trees in the ensemble increases, revealing a saturation point beyond which adding trees no longer significantly improves the model’s performance. These values are 7 for three-lobed and 6 for five-lobed cultivars. Leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
Figure 3. Heatmaps of p-values obtained from Tukey’s HSD test applied to the ten most significant descriptors of three-lobed (a–i,l) and five-lobed (m–v) Ficus carica cultivars. The selected descriptors for three-lobed cultivars are: (a) BAC; (b) I2/L2; (c) α; (d) PL/L1; (e) I2_TP; (f) CLL; (g) I2y; (h) CLL/H; (i) I2x; (l) PL/H. The selected descriptors for five-lobed cultivars are: (m) BAC; (n) PLØ; (o) I2y; (p) H; (q) I2_TP; (r) PL/L1; (s) PL; (t) I3/L3; (u) β; (v) WxH. Significance levels are indicated by different colors in the legend. Codes for cvs are reported in Table 1; leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
Figure 4. Cluster analysis of three-lobed cultivars based on ten descriptors selected by the Random Forest algorithm. Cultivars are grouped according to their pairwise distances, shown on the vertical axis (Distance); the horizontal axis lists the cultivar labels, and the branching structure indicates the hierarchical relationships among them. The bootstrap value reported at each node measures clustering reliability by assessing the stability of the cluster formed. The codes of the cultivars are reported in Table 1.
Figure 5. Cluster analysis of five-lobed cultivars based on ten descriptors selected by the Random Forest algorithm. Cultivars are grouped according to their pairwise distances, shown on the vertical axis (Distance); the horizontal axis lists the cultivar labels, and the branching structure indicates the hierarchical relationships among them. The bootstrap value reported at each node measures clustering reliability by assessing the stability of the cluster formed. The codes of the cultivars are reported in Table 1.
Figure 6. Box plots of the six most significant variables: (a) PL; (b) I2y; (c) PL/L1; (d) I2_TP; (e) WxH; (f) BAC. Different letters (a–g) indicate significant differences (p < 0.005). Codes for cvs are reported in Table 1; leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
Figure 7. Result of a hierarchical cluster analysis based on the six most significant variables present in all the cultivars. Cultivars are grouped according to their pairwise distances, shown on the vertical axis (Distance); the horizontal axis lists the cultivar labels, and the branching structure indicates the hierarchical relationships among them. The bootstrap value reported at each node measures clustering reliability by assessing the stability of the cluster formed. The codes of the cultivars are reported in Table 1.
Figure 8. Confusion matrix derived from the RF classifier. The codes of the cultivars are reported in Table 1.
Figure 9. 3D representation of the PCA, in which the three dimensions are defined by the first three principal components. The descriptors used are: WxH; PL; I2_TP; I2y; PL/L1; BAC. Codes for cvs are reported in Table 1; leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
Figure 10. Distribution of trichomes by length classes. (a) upper epidermis. (b) lower epidermis. Class n. 1: 0.1–99 µm; class n. 2: 100–140 µm; class n. 3: 140.1–159 µm; class n. 4: 160–239 µm; class n. 5: 240–299 µm; class n. 6: 300–319 µm; class n. 7: 320–332 µm; class n. 8: 334–358 µm; class n. 9: 360–398 µm; class n. 10: 412–438 µm; class n. 11: 450–477 µm.
Figure 11. ESEM photographs of trichomes on the abaxial epidermis of Ficus carica, with the corresponding pie chart of the percentage distribution of hairs across the 11 trichome length classes. The cultivars are (a): AL; (b): BC; (c): BB; (d): BN; (e): CO; (f): DO; (g): FI; (h): GI; (i): PA; (l): PB; (m): PN; (n): PE; (o): PO; (p): SP; (q): VE. Class n. 1: 0.1–99 µm; class n. 2: 100–140 µm; class n. 3: 140.1–159 µm; class n. 4: 160–239 µm; class n. 5: 240–299 µm; class n. 6: 300–319 µm; class n. 7: 320–332 µm; class n. 8: 334–358 µm; class n. 9: 360–398 µm; class n. 10: 412–438 µm; class n. 11: 450–477 µm. The codes of the cultivars are reported in Table 1.
Figure 12. Distribution of observed frequencies of classes alongside modeled Poisson distributions for various cultivars, each characterized by a specific λ parameter.
Figure 13. Statistical significance of differences in trichome distribution patterns, derived using the likelihood ratio test (LRT) evaluated against a chi-squared distribution with two degrees of freedom. Significance levels are indicated by different colors in the legend. The codes of the cultivars are reported in Table 1.
Figure 14. Dendrogram of the studied cultivars, derived from the λ (lambda) parameter of each cultivar’s Poisson distribution and from the density values. The bootstrap value reported at each node measures clustering reliability by assessing the stability of the cluster formed. The codes of the cultivars are reported in Table 1.
Figure 15. Morphometric descriptors used in the study for (a) three-lobed leaves and (b) five-lobed leaves. The codes of leaf descriptors are reported in.
Table 1. Studied fig cultivars, their abbreviations (codes), and the qualitative descriptors included in the morphological analysis.
| Cultivar | Code | Leaf Margin | Shape of Central Lobe | Shape of Leaf Base | Little Lobe in Central Lobe (%) | Little Lobe in Lateral Lobe (%) | No. of Lobes |
|---|---|---|---|---|---|---|---|
| ALBO | AL | crenate | lyrate-lanceolate | calcarate-cordate | 11.11 | 44.44 | 5 |
| BIANCO DI CARMIGNANO | BC | crenate | lanceolate | truncate-cordate | 31.58 | 25.00 | 3 * |
| BROGIOTTO BIANCO | BB | crenate/dentate | lanceolate | cordate-calcarate | 25.00 | 20.00 | 3 * |
| BROGIOTTO NERO | BN | crenate | lanceolate-rhomboidal | cordate calcarate-truncate | 15.00 | 5.00 | 3 |
| CORBO | CO | crenate | lanceolate-lyrate | calcarate | 68.42 | 55.00 | 5 |
| DOTTATO | DO | crenate | lanceolate | cordate | 65.00 | 5.00 | 3 * |
| FIORONE | FI | crenate | lanceolate | cordate | 18.18 | 100.00 | 3 |
| GIGANTE DI CARMIGNANO | GI | crenate | lanceolate | cordate-truncate | 65.00 | 31.58 | 3 * |
| PARADISO | PA | crenate | lanceolate | calcarate | 20.83 | 8.70 | 3 |
| PECCIOLO BIANCO | PB | undulate/crenate | lanceolate | cordate-calcarate | 30.00 | 25.00 | 3 * |
| PECCIOLO NERO | PN | crenate | linear | cordate-calcarate | 65.00 | 50.00 | 5 |
| PERTICONE | PE | crenate | lanceolate-spatulate | cordate | 45.45 | 59.09 | 5 |
| PORTOGALLO | PO | crenate | lanceolate | truncate | 82.35 | 78.57 | 3 |
| SAN PIERO | SP | crenate | lanceolate | decurrent | 45.00 | 15.00 | 3 |
| VERDINO | VE | crenate | lanceolate | truncate | 22.22 | 33.33 | 3 |

* Predominant type found in the analyzed cultivars.
Table 2. Main statistical parameters of the descriptors, sorted by coefficient of variation (CV). Leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
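| Descriptor | No. of Samples | Mean | Standard Error | Standard Deviation | Median | Min | Max | 1st Quartile | 3rd Quartile | CV |
|---|---|---|---|---|---|---|---|---|---|---|
| L3y (cm) | 134 | 2.7 | 0.2 | 1.9 | 2.5 | −3.8 | 9.8 | 1.5 | 3.8 | 72.5% |
| BAC (°) | 287 | 117.8 | 3.1 | 52.7 | 110.0 | 15.2 | 270.0 | 85.0 | 157.5 | 44.6% |
| I3y (cm) | 134 | 3.3 | 0.1 | 1.2 | 3.3 | 0.5 | 6.5 | 2.4 | 4.0 | 36.6% |
| WxH (cm²) | 287 | 483.6 | 9.8 | 165.2 | 460.0 | 185.0 | 1156.0 | 371.9 | 567.8 | 34.1% |
| I2x (cm) | 286 | 3.3 | 0.1 | 1.0 | 3.2 | 1.5 | 6.8 | 2.5 | 4.0 | 31.5% |
| I3x (cm) | 134 | 6.5 | 0.1 | 1.6 | 6.4 | 2.6 | 10.5 | 5.5 | 7.5 | 24.2% |
| L3 (cm) | 134 | 10.5 | 0.2 | 2.5 | 10.4 | 5.5 | 17.7 | 8.5 | 12.5 | 23.8% |
| I2_TP (cm) | 287 | 9.8 | 0.1 | 2.3 | 9.8 | 4.4 | 16.0 | 8.2 | 11.5 | 23.5% |
| I2y (cm) | 286 | 9.2 | 0.1 | 2.2 | 9.2 | 4.0 | 15.4 | 7.5 | 11.0 | 23.5% |
| L3x (cm) | 134 | 10.0 | 0.2 | 2.3 | 10.0 | 5.2 | 15.1 | 8.0 | 11.8 | 23.0% |
| I2/L2 | 287 | 0.6 | 0.0 | 0.1 | 0.6 | 0.3 | 0.9 | 0.5 | 0.7 | 22.4% |
| I3_TP (cm) | 134 | 7.4 | 0.1 | 1.6 | 7.4 | 3.5 | 11.3 | 6.2 | 8.8 | 22.1% |
| CLL (cm) | 287 | 12.0 | 0.2 | 2.6 | 12.0 | 6.0 | 19.3 | 10.0 | 14.0 | 21.8% |
| PL Ø (cm) | 287 | 0.6 | 0.0 | 0.1 | 0.5 | 0.3 | 1.0 | 0.5 | 0.6 | 21.3% |
| PL (cm) | 287 | 8.8 | 0.1 | 1.8 | 8.5 | 5.3 | 14.0 | 7.5 | 10.0 | 21.0% |
| PL/H | 287 | 0.4 | 0.0 | 0.1 | 0.4 | 0.2 | 0.6 | 0.3 | 0.4 | 19.1% |
| L2y (cm) | 287 | 13.5 | 0.2 | 2.6 | 13.3 | 7.0 | 20.8 | 11.7 | 15.2 | 19.0% |
| L2x (cm) | 287 | 9.1 | 0.1 | 1.6 | 9.0 | 5.5 | 13.5 | 8.0 | 10.0 | 18.1% |
| PL/L1 | 287 | 0.4 | 0.0 | 0.1 | 0.4 | 0.3 | 0.7 | 0.4 | 0.5 | 18.1% |
| W (cm) | 287 | 20.3 | 0.2 | 3.6 | 20.0 | 12.5 | 33.3 | 17.9 | 22.5 | 17.8% |
| α (°) | 287 | 39.8 | 0.4 | 7.1 | 40.0 | 20.0 | 62.1 | 35.0 | 45.0 | 17.7% |
| L3/L1 | 134 | 0.5 | 0.0 | 0.1 | 0.5 | 0.3 | 0.7 | 0.4 | 0.5 | 17.3% |
| Zx (cm) | 287 | 4.7 | 0.0 | 0.8 | 4.6 | 2.5 | 7.0 | 4.0 | 5.0 | 17.3% |
| I3/L3 | 134 | 0.7 | 0.0 | 0.1 | 0.7 | 0.5 | 1.0 | 0.6 | 0.8 | 17.3% |
| H (cm) | 287 | 23.3 | 0.2 | 3.9 | 23.0 | 15.0 | 37.0 | 20.5 | 25.6 | 16.9% |
| L2_TP (cm) | 287 | 16.4 | 0.2 | 2.7 | 16.3 | 9.7 | 24.1 | 14.4 | 18.2 | 16.6% |
| R | 136 | 0.6 | 0.0 | 0.1 | 0.6 | 0.4 | 0.8 | 0.5 | 0.7 | 16.5% |
| Zy (cm) | 287 | 14.1 | 0.1 | 2.3 | 13.9 | 7.7 | 20.0 | 12.5 | 15.6 | 16.3% |
| CLL/H | 276 | 0.5 | 0.0 | 0.1 | 0.5 | 0.3 | 0.8 | 0.5 | 0.6 | 16.2% |
| Z_TP (cm) | 287 | 14.9 | 0.1 | 2.3 | 14.6 | 9.3 | 20.8 | 13.2 | 16.5 | 15.4% |
| L1 (cm) | 287 | 21.3 | 0.2 | 3.0 | 21.3 | 14.2 | 29.0 | 19.2 | 23.1 | 14.1% |
| β (°) | 133 | 42.3 | 0.5 | 5.3 | 42.0 | 30.0 | 57.6 | 40.0 | 45.0 | 12.5% |
| L2/L1 | 287 | 0.8 | 0.0 | 0.1 | 0.8 | 0.5 | 1.0 | 0.7 | 0.8 | 8.4% |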
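The CV and standard error columns of Table 2 follow from the other columns:

- CV = SD/mean × 100
- SE = SD/√n

The short check below recomputes both for the BAC row. It uses the rounded published values, so small discrepancies from the table are expected.

```python
import math


def coefficient_of_variation(sd: float, mean: float) -> float:
    """Coefficient of variation in percent: SD relative to the mean."""
    return 100.0 * sd / mean


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


# BAC row of Table 2: n = 287, mean = 117.8 degrees, SD = 52.7 degrees (rounded values).
n, mean_bac, sd_bac = 287, 117.8, 52.7
print(f"CV = {coefficient_of_variation(sd_bac, mean_bac):.1f}%")  # table reports 44.6%
print(f"SE = {standard_error(sd_bac, n):.1f}")  # table reports 3.1
```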
Table 3. Leaf descriptors of the three-lobed cultivars. Cultivar codes are reported in Table 1; leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
| Descriptor | BC | BB | BN | DO | FI | GI | PA | PO | PB | SP | VE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PL (cm) | 7.44 ± 1.2 b | 9.57 ± 1.7 a | 9.54 ± 1.9 a | 8.52 ± 1.5 a–c | 6.95 ± 1.5 c | 8.74 ± 1.0 a–c | 9.49 ± 1.5 a | 8.52 ± 1.9 a–c | 8.02 ± 1.4 a–c | 8.62 ± 1.4 a–c | 8.93 ± 1.8 ab |
| PL Ø (cm) | 0.55 ± 0.1 ns | 0.51 ± 0.06 ns | 0.53 ± 0.07 ns | 0.64 ± 0.1 ns | 0.49 ± 0.1 ns | 0.54 ± 0.09 ns | 0.60 ± 0.1 ns | 0.50 ± 0.1 ns | 0.70 ± 0.08 ns | 0.51 ± 0.09 ns | 0.47 ± 0.08 ns |
| L1 (cm) | 19.6 ± 2.5 c–e | 24.0 ± 2.2 a | 19.8 ± 2.8 c–e | 22.8 ± 1.8 ab | 18.4 ± 1.9 e | 21.8 ± 2.7 a–d | 19.6 ± 2.3 de | 20.7 ± 1.8 b–e | 21.6 ± 2.9 a–d | 22.1 ± 2.3 a–c | 20.1 ± 2.0 c–e |
| I2x (cm) | 2.83 ± 0.8 c | 2.89 ± 0.4 c | 4.39 ± 1.1 a | 3.88 ± 0.6 ab | 3.25 ± 0.8 bc | 3.42 ± 1.1 bc | 3.31 ± 0.7 bc | 3.27 ± 0.4 bc | 3.26 ± 1.2 bc | 3.98 ± 1.1 ab | 4.05 ± 0.9 ab |
| I2y (cm) | 7.76 ± 1.1 f | 8.94 ± 1.2 d–f | 9.38 ± 2.1 c–e | 10.9 ± 1.3 ab | 9.37 ± 1.5 c–f | 9.75 ± 2.3 b–d | 8.82 ± 1.6 d–f | 11.3 ± 1.1 ab | 7.91 ± 1.3 ef | 11.4 ± 1.3 a | 10.9 ± 1.0 a–c |
| I2_TP (cm) | 8.30 ± 1.1 d | 9.40 ± 1.2 cd | 10.4 ± 2.1 bc | 11.6 ± 1.3 ab | 9.93 ± 1.6 b–d | 10.4 ± 2.4 bc | 9.43 ± 1.7 cd | 11.8 ± 1.1 ab | 8.61 ± 1.5 d | 12.1 ± 1.5 a | 11.6 ± 1.5 ab |
| L2x (cm) | 8.69 ± 1.4 b–d | 10.5 ± 1.2 a | 9.26 ± 1.2 a–d | 9.94 ± 1.2 ab | 7.52 ± 0.9 d | 9.39 ± 1.7 a–c | 8.21 ± 1.1 cd | 7.89 ± 1.3 cd | 10.5 ± 1.4 a | 8.65 ± 1.6 b–d | 8.04 ± 1.2 cd |
| L2y (cm) | 11.6 ± 2.2 c | 15.0 ± 2.0 ab | 12.3 ± 2.0 c | 15.9 ± 1.8 a | 11.6 ± 1.7 c | 12.3 ± 2.2 c | 12.3 ± 2.0 c | 13.7 ± 1.8 a–c | 12.7 ± 2.6 bc | 13.4 ± 1.7 bc | 13.0 ± 1.1 bc |
| L2_TP (cm) | 14.5 ± 2.3 c | 18.3 ± 2.1 ab | 15.5 ± 2.2 c | 18.8 ± 1.9 a | 13.8 ± 1.8 c | 15.7 ± 2.9 c | 14.8 ± 2.2 c | 15.9 ± 1.9 bc | 16.6 ± 3.0 a–c | 15.9 ± 2.1 bc | 15.3 ± 1.8 c |
| Zx (cm) | 4.27 ± 0.6 b | 5.11 ± 0.7 a | 5.09 ± 0.6 a | 4.31 ± 0.5 b | 5.05 ± 0.6 ab | 4.69 ± 0.9 ab | 4.37 ± 0.7 b | 4.48 ± 0.7 ab | 4.89 ± 0.6 ab | 4.88 ± 0.4 ab | 4.73 ± 0.7 ab |
| Zy (cm) | 14.2.3 ± 2.4 ab | 13.5 ± 1.6 ab | 14.5 ± 2.3 ab | 13.4 ± 2.0 ab | 15.7 ± 1.8 a | 13.4 ± 2.4 ab | 13.0 ± 2.44 b | 13.0 ± 1.2 ab | 12.5 ± 2.7 b | 14.2 ± 0.8 ab | 11.1 ± 1.9 b |
| Z_TP (cm) | 14.4 ± 3.2 a–c | 14.5 ± 1.6 | 15.4 ± 2.1 ab | 14.1 ± 1.9 a–c | 16.5 ± 1.7 a | 14.2 ± 2.6 a–c | 13.7 ± 1.7 bc | 14.4 ± 1.2 a–c | 13.0 ± 3.2 c | 15.0 ± 2.1 a–c | 13.9 ± 1.8 a–c |
| CLL (cm) | 11.8 ± 2.1 c | 15.1 ± 1.3 a | 10.4 ± 2.2 cd | 11.8 ± 1.7 c | 8.99 ± 1.1 d | 12.1 ± 1.9 bc | 10.8 ± 1.4 cd | 9.46 ± 1.4 d | 13.7 ± 2.6 ab | 10.8 ± 1.6 cd | 9.22 ± 1.8 d |
| W (cm) | 18.3 ± 1.9 cd | 22.7 ± 3.4 a | 19.3 ± 2.6 b–d | 21.5 ± 3.2 a–c | 16.9 ± 2.5 d | 21.7 ± 3.4 ab | 18.6 ± 2.5 cd | 18.3 ± 2.1 d | 21.6 ± 3.2 a–c | 19.3 ± 3.0 b–d | 17.8 ± 3.3 d |
| H (cm) | 21.2 ± 2.7 bc | 25.5 ± 3.5 a | 22.0 ± 2.9 bc | 25.6 ± 3.0 a | 19.7 ± 3.1 c | 23.7 ± 3.0 ab | 21.7 ± 2.8 bc | 21.1 ± 2.0 bc | 23.9 ± 4.0 ab | 22.6 ± 2.4 a–c | 21.4 ± 2.6 bc |
| BAC (°) | 140.5 ± 45 c | 113.6 ± 36 ef | 100.0 ± 37 cd | 98.7 ± 36 d | 110.4 ± 42 ef | 120.2 ± 37 cd | 91.4 ± 18 d | 183.3 ± 17 b | 103.1 ± 25 d | 226.1 ± 32 a | 145.8 ± 39 bc |
| α (°) | 40.1 ± 4.1 bc | 41.3 ± 4.2 bc | 41.9 ± 6.7 b | 36.6 ± 4.3 bc | 38.4 ± 6.2 b–d | 42.5 ± 7.5 b | 39.6 ± 4.7 bc | 31.4 ± 3.1 de | 49.4 ± 6.3 a | 29.0 ± 6.4 e | 36.0 ± 5.2 cd |
| CLL/H | 0.56 ± 0.06 ab | 0.60 ± 0.06 a | 0.47 ± 0.07 cd | 0.46 ± 0.05 cd | 0.46 ± 0.05 cd | 0.51 ± 0.07 bc | 0.50 ± 0.05 bc | 0.45 ± 0.04 cd | 0.57 ± 0.07 a | 0.48 ± 0.04 cd | 0.43 ± 0.07 d |
| PL/H | 0.35 ± 0.05 bc | 0.38 ± 0.08 a | 0.44 ± 0.09 a | 0.32 ± 0.04 c | 0.35 ± 0.03 a–c | 0.37 ± 0.07 a–c | 0.44 ± 0.07 a | 0.40 ± 0.08 a–c | 0.35 ± 0.1 bc | 0.38 ± 0.04 a–c | 0.42 ± 0.09 ab |
| WxH (cm²) | 396.2 ± 82 de | 580.6 ± 122 a | 433.9 ± 95 c–e | 558.9 ± 124 ab | 338.3 ± 91 e | 522.3 ± 141 a–d | 410.5 ± 103 c–e | 390.8 ± 103 c–e | 527.5 ± 147 a–c | 442.4 ± 112 b–e | 388.5 ± 86 e |
| PL/L1 | 0.38 ± 0.07 b | 0.39 ± 0.07 b | 0.48 ± 0.1 a | 0.37 ± 0.05 b | 0.37 ± 0.03 b | 0.40 ± 0.08 b | 0.48 ± 0.07 a | 0.41 ± 0.09 ab | 0.38 ± 0.1 b | 0.39 ± 0.05 b | 0.44 ± 0.09 ab |
| L2/L1 | 0.74 ± 0.06 b | 0.76 ± 0.04 ab | 0.78 ± 0.08 ab | 0.82 ± 0.04 a | 0.75 ± 0.04 ab | 0.72 ± 0.08 b | 0.76 ± 0.09 ab | 0.76 ± 0.06 ab | 0.77 ± 0.09 ab | 0.72 ± 0.05 b | 0.76 ± 0.05 ab |
| I2/L2 | 0.58 ± 0.1 d–f | 0.51 ± 0.04 f | 0.67 ± 0.1 a–d | 0.62 ± 0.06 c–e | 0.72 ± 0.08 a–c | 0.66 ± 0.1 a–d | 0.64 ± 0.1 b–d | 0.74 ± 0.08 ab | 0.52 ± 0.08 ef | 0.76 ± 0.08 a | 0.77 ± 0.1 a |
Data are means ± SD (n = 20). Within each row, different letters indicate significant differences among cultivars (p < 0.05) according to Tukey's HSD test; ns, not significant.
Table 4. Leaf descriptors of the five-lobed cultivars. Cultivar codes are reported in Table 1; leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
| Descriptor | AL | CO | PN | PE |
|---|---|---|---|---|
| PL (cm) | 9.07 ± 1.7 b | 11.6 ± 1.6 a | 8.01 ± 1.3 bc | 7.58 ± 1.3 c |
| PL Ø (cm) | 0.70 ± 0.09 a | 0.72 ± 0.13 a | 0.46 ± 0.07 b | 0.47 ± 0.09 b |
| L1 (cm) | 23.1 ± 2.4 a | 24.9 ± 2.6 a | 18.9 ± 2.6 b | 20.3 ± 2.8 b |
| I2x (cm) | 3.29 ± 1.5 a | 3.40 ± 0.9 a | 2.21 ± 0.5 b | 2.41 ± 0.7 b |
| I2y (cm) | 8.68 ± 1.9 ab | 10.5 ± 2.9 a | 5.91 ± 0.64 c | 7.82 ± 2.1 b |
| I2_TP (cm) | 9.32 ± 1.2 ab | 12.0 ± 3.0 a | 6.32 ± 0.7 c | 8.21 ± 2.1 b |
| L2x (cm) | 9.34 ± 1.89 ab | 10.7 ± 1.7 a | 8.43 ± 1.3 b | 8.56 ± 1.4 b |
| L2y (cm) | 14.3 ± 3.7 ab | 16.4 ± 2.5 a | 12.9 ± 2.8 b | 13.9 ± 2.7 b |
| L2_TP (cm) | 17.5 ± 2.8 b | 19.6 ± 2.5 a | 15.4 ± 2.1 b | 16.4 ± 2.9 b |
| I3x (cm) | 7.31 ± 1.6 b | 8.01 ± 1.0 a | 4.89 ± 0.9 d | 5.95 ± 1.2 c |
| I3y (cm) | 3.21 ± 1.8 b | 3.60 ± 0.6 ab | 2.88 ± 0.7 b | 4.28 ± 1.2 a |
| I3_TP (cm) | 8.18 ± 1.6 ab | 8.86 ± 0.9 a | 5.71 ± 1.0 c | 7.37 ± 1.5 b |
| L3x (cm) | 10.9 ± 2.4 ab | 12.2 ± 1.7 a | 9.31 ± 1.5 b | 10.9 ± 2.2 ab |
| L3y (cm) | 2.76 ± 3.9 ab | 2.95 ± 1.3 ab | 2.88 ± 1.0 ab | 4.12 ± 1.6 a |
| L3_TP (cm) | 11.4 ± 2.2 ab | 12.6 ± 1.8 a | 9.9 ± 1.6 b | 11.7 ± 2.4 ab |
| Zx (cm) | 4.79 ± 1.4 a | 5.17 ± 0.8 a | 3.81 ± 0.7 b | 4.46 ± 0.5 ab |
| Zy (cm) | 15.1 ± 2.7 ab | 16.3 ± 2.3 a | 13.7 ± 2.0 b | 14.0 ± 1.9 b |
| Z_TP (cm) | 16.1 ± 2.5 ab | 17.1 ± 2.2 a | 14.2 ± 1.9 b | 14.9 ± 1.9 b |
| CLL (cm) | 14.3 ± 2.4 ab | 14.4 ± 2.4 a | 12.9 ± 2.3 ab | 12.5 ± 2.0 b |
| W (cm) | 23.7 ± 3.2 ab | 24.6 ± 4.4 a | 18.5 ± 2.5 c | 20.7 ± 3.7 bc |
| H (cm) | 26.1 ± 2.7 b | 30.5 ± 3.9 a | 20.5 ± 2.7 d | 22.4 ± 3.3 cd |
| BAC (°) | 91.5 ± 13 b | 34.5 ± 12.8 c | 113.4 ± 37 a | 114.8 ± 23 a |
| α (°) | 44.1 ± 8.0 ab | 44.6 ± 4.0 a | 39.7 ± 5.3 bc | 37.9 ± 4.5 c |
| β (°) | 41.4 ± 4.0 bc | 45.4 ± 2.3 a | 42.0 ± 5.2 ab | 37.5 ± 4.1 c |
| CLL/H | 0.56 ± 0.08 b | 0.48 ± 0.08 c | 0.63 ± 0.05 a | 0.56 ± 0.07 b |
| PL/H | 0.35 ± 0.05 b | 0.35 ± 0.07 b | 0.39 ± 0.05 a | 0.34 ± 0.04 b |
| WxH (cm²) | 623.6 ± 136 a | 760.2 ± 223 a | 385.9 ± 116 b | 478.9 ± 145 b |
| PL/L1 | 0.39 ± 0.06 b | 0.50 ± 0.09 a | 0.43 ± 0.05 ab | 0.38 ± 0.04 b |
| L2/L1 | 0.75 ± 0.13 ns | 0.79 ± 0.07 ns | 0.82 ± 0.06 ns | 0.81 ± 0.07 ns |
| L3/L1 | 0.50 ± 0.21 b | 0.51 ± 0.23 b | 0.52 ± 0.17 ab | 0.58 ± 0.06 a |
| I2/L2 | 0.56 ± 0.17 a | 0.56 ± 0.13 a | 0.41 ± 0.04 b | 0.50 ± 0.06 ab |
| I3/L3 | 0.72 ± 0.04 a | 0.71 ± 0.09 a | 0.59 ± 0.05 b | 0.64 ± 0.06 ab |
| R | 0.69 ± 0.19 a | 0.62 ± 0.10 ab | 0.48 ± 0.05 c | 0.55 ± 0.06 bc |
Data are means ± SD (n = 20). Within each row, different letters indicate significant differences among cultivars (p < 0.05) according to Tukey's HSD test; ns, not significant.
Table 5. Results of the inferential statistical analysis, summarizing pairwise comparisons between cultivars for selected descriptors. The columns are as follows:

- T: Student's t-test statistic.
- p-val: p-value, indicating the statistical significance of the result.
- CI95%: 95% confidence interval, the range within which the true difference is likely to lie.
- Effect Size: a measure of the magnitude of the observed difference between groups.
- BF10: Bayes factor, comparing the likelihood of the alternative hypothesis with that of the null hypothesis.
- Power: statistical power, the probability of detecting a true effect if one exists.
- Class: classification of the effect size (e.g., Huge, Very large, Very small).

The table is divided into two sections. High Effect Size reports, for each descriptor, the three comparisons with the largest effects (large effect sizes and very low p-values). Low Effect Size reports, for each descriptor, the three comparisons with the smallest or negligible effects (very low effect sizes and high p-values). Cultivar codes are reported in Table 1; leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
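| Section | Descriptor | Comparison | T | p-val | CI95% | Effect Size | BF10 | Power | Class (Cohen) |
|---|---|---|---|---|---|---|---|---|---|
| High Effect Size | BAC | CO‖SP | −25.5 | 1.59 × 10⁻²⁵ | [−209.54, −178.75] | 8.07 | 9.91 × 10²¹ | 1 | Huge |
| | | CO‖PO | −34.5 | 3.00 × 10⁻²⁵ | [−158.07, −140.4] | 11.90 | 1.83 × 10²³ | 1 | Huge |
| | | PA‖PO | −21.2 | 4.05 × 10⁻²⁰ | [−101.3, −83.52] | 6.80 | 3.01 × 10¹⁸ | 1 | Huge |
| | I2_TP | PN‖SP | −17.5 | 8.91 × 10⁻²⁰ | [−6.9, −5.47] | 5.53 | 2.72 × 10¹⁶ | 1 | Huge |
| | | DO‖PN | 16.4 | 8.14 × 10⁻¹⁹ | [4.64, 5.95] | 5.18 | 3.23 × 10¹⁵ | 1 | Huge |
| | | PN‖PO | −17.0 | 3.63 × 10⁻¹³ | [−6.09, −4.76] | 6.47 | 2.02 × 10¹⁴ | 1 | Huge |
| | I2y | PN‖SP | −19.6 | 1.95 × 10⁻²¹ | [−6.01, −4.88] | 6.19 | 1.09 × 10¹⁸ | 1 | Huge |
| | | DO‖PN | 16.1 | 1.47 × 10⁻¹⁸ | [4.43, 5.7] | 5.09 | 1.82 × 10¹⁵ | 1 | Huge |
| | | PB‖SP | −10.7 | 5.37 × 10⁻¹³ | [−4.26, −2.91] | 3.38 | 8.62 × 10⁹ | 1 | Huge |
| | PL | BC‖CO | −9.0 | 5.33 × 10⁻¹¹ | [−5.05, −3.2] | 2.86 | 1.11 × 10⁸ | 1 | Huge |
| | | CO‖FI | 9.9 | 8.65 × 10⁻¹¹ | [3.66, 5.56] | 3.22 | 6.88 × 10⁷ | 1 | Huge |
| | | CO‖PE | 8.7 | 1.93 × 10⁻¹⁰ | [3.06, 4.91] | 2.72 | 7.26 × 10⁷ | 1 | Huge |
| | PL_L1 | CO‖PE | 7.5 | 4.9 × 10⁻⁹ | [0.06, 0.11] | 2.34 | 2.16 × 10⁶ | 1 | Huge |
| | | PA‖PE | 7.1 | 1.95 × 10⁻⁸ | [0.07, 0.13] | 2.05 | 1.03 × 10⁶ | 1 | Huge |
| | | DO‖PA | −6.2 | 2.13 × 10⁻⁷ | [−0.14, −0.07] | 1.86 | 5.24 × 10⁴ | 1 | Very large |
| | WxH | BC‖CO | −6.9 | 3.0 × 10⁻⁸ | [−479.15, −262.56] | 2.19 | 3.02 × 10⁵ | 1 | Huge |
| | | CO‖PN | 6.7 | 6.02 × 10⁻⁸ | [261.43, 487.17] | 2.12 | 1.61 × 10⁵ | 1 | Huge |
| | | CO‖FI | 7.2 | 7.28 × 10⁻⁸ | [301.84, 542.02] | 2.22 | 1.38 × 10⁵ | 1 | Huge |
| Low Effect Size | BAC | BN‖DO | 0.1 | 0.92 | [−21.5, 23.82] | 0.03 | 0.31 | 0.0512 | Very small |
| | | AL‖PA | 0.0 | 0.96 | [−8.45, 8.87] | 0.01 | 0.306 | 0.0503 | Very small |
| | | BB‖PN | 0.0 | 1.00 | [−22.4, 22.35] | 0.00 | 0.309 | 0.0500 | Negligible |
| | I2_TP | BC‖PE | 0.2 | 0.81 | [−0.96, 1.21] | 0.07 | 0.31 | 0.0558 | Very small |
| | | BB‖PA | 0.0 | 0.97 | [−0.81, 0.78] | 0.01 | 0.299 | 0.0502 | Very small |
| | | DO‖VE | 0.0 | 0.99 | [−0.92, 0.94] | 0.01 | 0.315 | 0.0500 | Negligible |
| | I2y | BC‖PB | 0.1 | 0.93 | [−0.63, 0.69] | 0.03 | 0.31 | 0.0509 | Very small |
| | | PB‖PE | 0.0 | 0.97 | [−1.09, 1.04] | 0.01 | 0.304 | 0.0502 | Very small |
| | | BC‖PE | 0.0 | 0.99 | [−1.05, 1.07] | 0.00 | 0.303 | 0.0500 | Negligible |
| | PL | BB‖BN | 0.1 | 0.96 | [−1.16, 1.22] | 0.02 | 0.309 | 0.0503 | Very small |
| | | PB‖PN | 0.0 | 0.99 | [−0.86, 0.87] | 0.00 | 0.309 | 0.0500 | Negligible |
| | | DO‖PO | 0.0 | 1.00 | [−1.29, 1.29] | 0.00 | 0.333 | 0.0500 | Negligible |
| | PL_L1 | BB‖GI | −0.1 | 0.92 | [−0.04, 0.04] | 0.03 | 0.31 | 0.0510 | Very small |
| | | FI‖PE | 0.1 | 0.94 | [−0.02, 0.03] | 0.03 | 0.348 | 0.0505 | Very small |
| | | AL‖SP | 0.1 | 0.94 | [−0.03, 0.04] | 0.02 | 0.316 | 0.0506 | Very small |
| | WxH | PO‖VE | 0.1 | 0.94 | [−57.68, 62.31] | 0.03 | 0.339 | 0.0507 | Very small |
| | | BC‖PO | 0.0 | 0.96 | [−59.82, 57.02] | 0.02 | 0.333 | 0.0503 | Very small |
| | | BC‖VE | 0.0 | 0.97 | [−55.73, 57.56] | 0.01 | 0.315 | 0.0501 | Very small |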
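The effect sizes and t statistics in Table 5 can be approximated from the summary statistics in Tables 3 and 4, where n = 20 per cultivar. The sketch below rests on three assumptions:

- Cohen's d is computed with a pooled standard deviation.
- The t statistic is the pooled-variance Student's t.
- The effect-size labels use the thresholds of Sawilowsky's extension of Cohen's scale (very small 0.01, small 0.2, medium 0.5, large 0.8, very large 1.2, huge 2.0), with values below 0.01 labeled Negligible.

These thresholds reproduce the classes shown in the table. Recomputed values differ somewhat from Table 5 because the published means and SDs are rounded.

```python
import math

# Assumed thresholds (Sawilowsky's extension of Cohen's scale), largest first.
EFFECT_CLASSES = [
    (2.0, "Huge"),
    (1.2, "Very large"),
    (0.8, "Large"),
    (0.5, "Medium"),
    (0.2, "Small"),
    (0.01, "Very small"),
]


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent samples."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Two-sample Student's t with pooled variance: d / sqrt(1/n1 + 1/n2)."""
    return cohens_d(m1, sd1, n1, m2, sd2, n2) / math.sqrt(1 / n1 + 1 / n2)


def effect_class(d: float) -> str:
    """Label the absolute effect size with the assumed thresholds."""
    size = abs(d)
    for cutoff, label in EFFECT_CLASSES:
        if size >= cutoff:
            return label
    return "Negligible"


# BAC, CO vs SP: means and SDs from Tables 3 and 4, n = 20 per cultivar.
args = (34.5, 12.8, 20, 226.1, 32.0, 20)
d = cohens_d(*args)
t = student_t(*args)
print(f"d = {d:.2f} ({effect_class(d)}), t = {t:.1f}")  # Table 5 reports 8.07 and -25.5
```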
Table 6. Classification Report. The codes of the cultivars are reported in Table 1.
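| Cultivar | Precision | Recall | F1-Score |
|---|---|---|---|
| AL | 0.50 | 0.20 | 0.29 |
| BC | 0.40 | 0.40 | 0.40 |
| BB | 0.43 | 0.60 | 0.50 |
| BN | 0.00 | 0.00 | 0.00 |
| CO | 0.71 | 1.00 | 0.83 |
| DO | 0.25 | 0.40 | 0.31 |
| FI | 1.00 | 0.33 | 0.50 |
| GI | 0.00 | 0.00 | 0.00 |
| PA | 0.44 | 0.67 | 0.53 |
| PB | 0.75 | 0.60 | 0.67 |
| PN | 0.29 | 0.33 | 0.31 |
| PE | 0.60 | 0.60 | 0.60 |
| PO | 0.60 | 1.00 | 0.75 |
| SP | 1.00 | 0.80 | 0.89 |
| VE | 0.67 | 0.50 | 0.57 |
| Weighted average | 0.49 | 0.49 | 0.47 |

Accuracy: 0.49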
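Each F1-score in Table 6 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), so the column can be checked directly, as below.

The weighted-average recall (0.49) coincides with the overall accuracy (0.49). This is expected when the class weights are the class supports: the sum over classes of (n_k/N)(TP_k/n_k) reduces to the total number of correct predictions divided by N. The per-class supports needed to recompute the weighted averages are not reported here, so the sketch does not attempt it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per cultivar, copied from Table 6.
REPORT = {
    "AL": (0.50, 0.20, 0.29), "BC": (0.40, 0.40, 0.40), "BB": (0.43, 0.60, 0.50),
    "BN": (0.00, 0.00, 0.00), "CO": (0.71, 1.00, 0.83), "DO": (0.25, 0.40, 0.31),
    "FI": (1.00, 0.33, 0.50), "GI": (0.00, 0.00, 0.00), "PA": (0.44, 0.67, 0.53),
    "PB": (0.75, 0.60, 0.67), "PN": (0.29, 0.33, 0.31), "PE": (0.60, 0.60, 0.60),
    "PO": (0.60, 1.00, 0.75), "SP": (1.00, 0.80, 0.89), "VE": (0.67, 0.50, 0.57),
}

for cultivar, (p, r, f1_reported) in REPORT.items():
    print(f"{cultivar}: recomputed {f1_score(p, r):.2f}, reported {f1_reported:.2f}")
```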
Table 7. The first three components of the Principal Component Analysis (PCA) of the six most significant descriptors across all fifteen cultivars studied. Leaf descriptors are explained in Section 3.2 (Morphological Descriptors).
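| Trait | PC1 | PC2 | PC3 |
|---|---|---|---|
| WxH | 0.45 | −0.12 | −0.55 |
| PL | 0.48 | −0.39 | 0.24 |
| I2_TP | 0.50 | 0.41 | 0.11 |
| I2y | 0.51 | 0.41 | 0.09 |
| PL_L1 | 0.19 | −0.51 | 0.60 |
| BAC | −0.16 | 0.49 | 0.51 |
| Eigenvalues | 2.55 | 1.8 | 1.14 |
| % of variance | 42.4 | 30.1 | 19 |
| Cumulative variance (%) | 42.4 | 72.5 | 91.5 |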
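Suppose the PCA in Table 7 was run on standardized variables (the correlation matrix). The total variance then equals the number of variables (six), so each component's share of variance is its eigenvalue divided by 6. This is an assumption, but it reproduces the reported percentages to within the rounding of the eigenvalues. The loading vectors of a PCA should also have unit length, which the check below confirms to within rounding.

```python
import math

# Values copied from Table 7; loadings are in the order WxH, PL, I2_TP, I2y, PL_L1, BAC.
EIGENVALUES = [2.55, 1.8, 1.14]
LOADINGS = {
    "PC1": [0.45, 0.48, 0.50, 0.51, 0.19, -0.16],
    "PC2": [-0.12, -0.39, 0.41, 0.41, -0.51, 0.49],
    "PC3": [-0.55, 0.24, 0.11, 0.09, 0.60, 0.51],
}
N_VARIABLES = 6

# Assumption: PCA on standardized variables, so total variance = number of variables.
explained = [100.0 * ev / N_VARIABLES for ev in EIGENVALUES]
cumulative = [sum(explained[: i + 1]) for i in range(len(explained))]
norms = {pc: math.sqrt(sum(v * v for v in vec)) for pc, vec in LOADINGS.items()}

for pc, share, cum in zip(LOADINGS, explained, cumulative):
    print(f"{pc}: {share:.1f}% (cumulative {cum:.1f}%), loading norm {norms[pc]:.3f}")
```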
Table 8. Trichome density (trichomes/mm²) on the upper and lower leaf epidermis. The codes of the cultivars are reported in Table 1.
| Cultivar | Upper Epidermis (trichomes/mm²) | Lower Epidermis (trichomes/mm²) |
|---|---|---|
| AL | 26.3 ± 3.26 a | 48.5 ± 2.38 f |
| BB | 2.02 ± 0.77 ef | 71.5 ± 2.65 b |
| BC | 7.52 ± 1.5 b | 64.7 ± 2.78 c |
| BN | 3.50 ± 0.51 d–f | 64.5 ± 2.88 c |
| CO | 4.25 ± 0.35 c–f | 32.2 ± 2.18 h |
| DO | 7.48 ± 1.02 bc | 63.2 ± 1.90 c |
| FI | 5.02 ± 0.76 b–e | 23.2 ± 2.91 i |
| GI | 2.04 ± 0.89 ef | 54.2 ± 2.65 ef |
| PA | 3.26 ± 0.46 d–f | 93.8 ± 1.79 a |
| PB | 2.50 ± 0.35 d–f | 77.0 ± 1.86 b |
| PE | 5.53 ± 0.58 b–d | 87.5 ± 1.10 a |
| PN | 3.01 ± 0.36 d–f | 55.8 ± 1.56 de |
| PO | 3.02 ± 0.32 d–f | 61.5 ± 2.41 cd |
| SP | 2.76 ± 0.28 d–f | 39.7 ± 0.68 g |
| VE | 1.02 ± 0.12 f | 48.2 ± 1.14 f |
Data are the means ± SD. In each column different letters represent significant differences (p < 0.05) according to Tukey’s HSD-test.
Table 9. Abbreviation and units of morphometric parameters.
| Abbreviation | Description | Units |
|---|---|---|
| H | Lamina length | cm |
| W | Lamina width | cm |
| WxH | Area: leaf length × width | cm² |
| PL | Petiole length | cm |
| PLØ | Petiole diameter | cm |
| CLL | Length of the central lobe | cm |
| BAC | Petiole sinus: angle between the left and right basal lobes | ° |
| α | Angle between L1 and L2 | ° |
| β | Angle between L2 and L3 | ° |
| Z_TP | Maximum width of the central lobe, calculated using the Pythagorean theorem applied to Zx and Zy | cm |
| Zx | x coordinate of point Z on the Cartesian plane | cm |
| Zy | y coordinate of point Z on the Cartesian plane | cm |
| L1 | Apex of the central lobe; coincides with L1y | cm |
| L2_TP | Apex of the secondary lobe, calculated using the Pythagorean theorem applied to L2x and L2y | cm |
| L2x | x coordinate of point L2 on the Cartesian plane | cm |
| L2y | y coordinate of point L2 on the Cartesian plane | cm |
| I2_TP | Sinus 2, calculated using the Pythagorean theorem applied to I2x and I2y | cm |
| I2x | x coordinate of point I2 on the Cartesian plane | cm |
| I2y | y coordinate of point I2 on the Cartesian plane | cm |
| L3_TP | Apex of the tertiary lobe, calculated using the Pythagorean theorem applied to L3x and L3y | cm |
| L3x | x coordinate of point L3 on the Cartesian plane | cm |
| L3y | y coordinate of point L3 on the Cartesian plane | cm |
| I3_TP | Sinus 3, calculated using the Pythagorean theorem applied to I3x and I3y | cm |
| I3x | x coordinate of point I3 on the Cartesian plane | cm |
| I3y | y coordinate of point I3 on the Cartesian plane | cm |
| I2/L2 | Ratio between sinus 2 (I2_TP) and the apex of the secondary lobe (L2_TP) | |
| L2/L1 | Ratio between the apex of the secondary lobe (L2_TP) and the apex of the central lobe (L1) | |
| I3/L3 | Ratio between sinus 3 (I3_TP) and the apex of the tertiary lobe (L3_TP) | |
| R | (I2_TP + I3_TP)/(L2_TP + L3_TP) | |
| PL/H | Ratio between petiole length and lamina length | |
| PL/L1 | Ratio between petiole length and the apex of the central lobe | |
| CLL/H | Ratio between central lobe length and lamina length | |
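The _TP descriptors in Table 9 are Euclidean distances from the origin of the Cartesian plane, obtained with the Pythagorean theorem (for example, I2_TP = √(I2x² + I2y²)). The ratio descriptors follow from them, with L1 used directly because it coincides with L1y.

The sketch below computes them for one five-lobed leaf. It assumes all landmark coordinates share the same origin. The class name `FiveLobedLeaf` and the coordinates are hypothetical, chosen so that the distances come out as round numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class FiveLobedLeaf:
    """Hypothetical container for one leaf's landmarks (cm); names follow Table 9."""
    L1: float
    L2x: float
    L2y: float
    I2x: float
    I2y: float
    L3x: float
    L3y: float
    I3x: float
    I3y: float


def tp(x: float, y: float) -> float:
    """Distance from the origin via the Pythagorean theorem: sqrt(x^2 + y^2)."""
    return math.hypot(x, y)


def derived_descriptors(leaf: FiveLobedLeaf) -> dict[str, float]:
    """_TP distances and the ratio descriptors defined in Table 9."""
    l2_tp = tp(leaf.L2x, leaf.L2y)
    i2_tp = tp(leaf.I2x, leaf.I2y)
    l3_tp = tp(leaf.L3x, leaf.L3y)
    i3_tp = tp(leaf.I3x, leaf.I3y)
    return {
        "L2_TP": l2_tp,
        "I2_TP": i2_tp,
        "L3_TP": l3_tp,
        "I3_TP": i3_tp,
        "I2/L2": i2_tp / l2_tp,
        "L2/L1": l2_tp / leaf.L1,
        "I3/L3": i3_tp / l3_tp,
        "R": (i2_tp + i3_tp) / (l2_tp + l3_tp),
    }


# Hypothetical coordinates (cm), for illustration only.
leaf = FiveLobedLeaf(L1=20.0, L2x=9.0, L2y=12.0, I2x=6.0, I2y=8.0,
                     L3x=8.0, L3y=6.0, I3x=4.8, I3y=6.4)
for name, value in derived_descriptors(leaf).items():
    print(f"{name}: {value:.2f}")
```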
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Giordano, C.; Arcidiaco, L.; Rodolfi, M.; Ganino, T.; Beghè, D.; Petruccelli, R. Description of Ficus carica L. Italian Cultivars—I: Machine Learning Based Analysis of Leaf Morphological Traits. Plants 2025, 14, 333. https://doi.org/10.3390/plants14030333
