1. Introduction
Sour (tart) cherry (Prunus cerasus L.) is, together with sweet cherry (Prunus avium L.), one of the two main species of the genus Prunus whose fruits are traded globally. Cherry pits recovered from archaeological sites indicate that these fruit crops have been used by humans since 5000–4000 BCE. Nowadays, there are many sour cherry cultivars. Due to the health benefits of cherries, tree crop cultivation should increase, and processing technology should be improved [1]. The cherry fruit has a low caloric content and significant amounts of nutrients and bioactive components, e.g., polyphenols, fiber, vitamin C, carotenoids, and potassium, as well as melatonin, serotonin, and tryptophan. Only a small share of sour cherries is consumed fresh; up to 97% of the fruits are processed, mainly for cooking or baking [2]. Before processing, cherries are usually accurately pitted, as pits unintentionally left in processed cherry products may be a major concern for consumers (potential for injury) and processors (litigation) [3]. The pit accounts for 6.30% by weight, or even 7–15%, of the whole fruit, and it consists of the shell (75–80%) and the kernel (20–25%) [4,5]. The very hard shell contains sclerenchyma and fibrous matter. The kernel contains dietary proteins and fiber and has antimicrobial and antioxidant activities. The kernels may be used to produce oils for the pharmaceutical, perfume, and cosmetic industries or to produce biodiesel [4]. Additionally, cherry pit biomass may be converted into biochar for water remediation or cofired with coal to generate electricity. Cherry pit biochar may be applied as a catalyst support, alkaline-functionalized gas adsorbent, electrode material, or soil amendment for greenhouse crop production [6,7,8,9,10,11]. However, pits remain an important waste disposal problem for the processing industry [4]. Traditional waste disposal should be replaced by greener ways of using cherry pit biomass [11].
Depending on the extraction procedure and roasting process, nutrients may pass from the sour cherry kernels into the oil in different proportions [12]. The cultivar may also influence the oil content of the kernel, which is about 17–36% [5]. The cultivar also has a great effect on the lipophilic bioactive compounds of the kernel, e.g., sterols, essential fatty acids, tocopherols, tocochromanols, squalene, and carotenoids [5,13]. Because the chemical properties of sour cherry kernels depend on the cultivar, correct cultivar recognition may be important in practice. Processing may require a uniform batch of kernels with the same characteristics, and cultivars with certain chemical properties may be more desirable for processing than others. Therefore, authentication may be needed to avoid adulteration and the mixing of different cultivars.
Machine learning may be useful for plant research. As a subfield of artificial intelligence, machine learning is an important topic in computer science, and researchers currently strive to increase the precision of algorithms and the intelligence of machines. Learning has become a significant capability of machines. Through computer vision, a field closely tied to machine learning, machines can be trained to process, analyze, and recognize visual data [14]. Machine learning is intended to enable machines to learn from the available data and make predictions. The ability of computers to learn automatically, without human intervention, may be important for precise prediction [15]. Prediction models developed using machine learning and artificial intelligence can provide promising and accurate results. Such models can learn from existing data and then predict even nonlinear phenomena, e.g., food production, crop yield, or the number of immature fruits [16]. The application of machine learning in modern agriculture is important due to the increasing demand for food and the need to increase the effectiveness of agricultural practices while decreasing the environmental burden. Machine learning offers greater analytical power than conventional data processing techniques, which can be incapable of extracting all the necessary information from field data and thus of meeting the growing demands of smart farming [17]. Machine learning focused on the detection of diseases, species, and weeds in crops, the prediction of crop yield and soil parameters, and the classification of crop images to evaluate plant quality and yield can be one of the key components of the agricultural revolution [18].
In the seed industry, machine learning may be important for production, correct cultivar identification, the identification of contaminants, and quality control. Machine vision techniques can give more accurate and faster classification than manual inspection performed by specialists based on the color and morphological features of seeds [19]. Machine learning has brought significant advances in seed research by providing decision-making support and facilitating the development of robust approaches in the seed industry [20]. The usefulness of machine learning for seed classification has been reported in the literature, with models built on various image features. For cultivar discrimination of fruit seeds, pits, and stones, a high efficiency of models based on texture parameters was reported for pepper seeds [21], apple seeds [22], peach seeds and stones [23], sour cherry pits [24], and sweet cherry pits [25]. Furthermore, geometric features proved useful for discriminating the pits or stones of different cultivars of apricot [26], plum [27,28,29], olive [30], jujube [31], and sweet cherry [25]. In the present study, however, extensive research using dozens of geometric parameters, including linear dimensions and shape factors, was performed for the first time to discriminate sour cherry pits of the cultivars ‘Debreceni botermo’, ‘Łutówka’, ‘Nefris’, and ‘Kelleris’ using different classifiers (machine learning algorithms). Innovative models based on sets of selected linear dimensions, shape factors, and combined linear dimensions and shape factors were developed. This approach to distinguishing cultivars of sour cherry pits is original.
The aim of this study was to develop discriminative models based on geometric features, namely linear dimensions, shape factors, and the combination of both, for the discrimination of sour cherry pits of different cultivars. The discriminative power of the geometric parameters was compared for distinguishing pairs of cultivars and all four cultivars.
3. Results and Discussion
The linear dimensions of ‘Debreceni botermo’, ‘Łutówka’, ‘Nefris’, and ‘Kelleris’ cherry pits were compared to determine the differences in the mean values between cultivars (Table 1). All four cultivars differed in their basic linear dimensions, length (L) and width (S), with each cultivar forming a separate homogenous group. The ‘Kelleris’ pits had the highest mean L, equal to 12.14 mm, followed by ‘Nefris’ (11.80 mm), ‘Łutówka’ (11.54 mm), and ‘Debreceni botermo’ (11.33 mm). The mean S was the highest for the ‘Nefris’ pits (10.49 mm), followed by ‘Debreceni botermo’ (10.09 mm), ‘Łutówka’ (9.87 mm), and ‘Kelleris’ (9.49 mm). Four homogenous groups were also found for the length of the skeletonized object (Lsz), Martin’s minimal radius (Mmin), and the minimal Feret diameter (Fmin). For these parameters, the ‘Nefris’ pits had the highest values (Lsz: 174.71 mm, Mmin: 4.92 mm, Fmin: 10.29 mm) and the ‘Kelleris’ pits had the lowest values (Lsz: 125.55 mm, Mmin: 4.45 mm, Fmin: 9.32 mm). For many parameters (Uw, Ug, Spol, Ft, Fh, Maver), the ‘Debreceni botermo’, ‘Łutówka’, and ‘Kelleris’ pits were in one homogenous group, and the ‘Nefris’ pits formed a second homogenous group with a statistically significantly different mean value.
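As a rough illustration of how the two basic dimensions relate, the mean L and S values quoted above can be turned into a simple width-to-length ratio. The sketch below is only an aid to reading Table 1: the ratio S/L is a hypothetical indicator introduced here, not one of the shape factors used in the study, and a ratio of means is not the same as the mean of per-pit ratios.

```python
# Mean length (L) and width (S) in mm, as quoted in the text for Table 1.
MEAN_DIMENSIONS_MM = {
    "Debreceni botermo": {"L": 11.33, "S": 10.09},
    "Łutówka": {"L": 11.54, "S": 9.87},
    "Nefris": {"L": 11.80, "S": 10.49},
    "Kelleris": {"L": 12.14, "S": 9.49},
}


def width_to_length_ratio(dims):
    """Illustrative ratio S/L (assumption: not a shape factor from the study)."""
    return dims["S"] / dims["L"]


def rank_by_ratio(table):
    """Return (cultivar, ratio) pairs sorted from lowest to highest S/L."""
    ratios = {name: width_to_length_ratio(d) for name, d in table.items()}
    return sorted(ratios.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for cultivar, ratio in rank_by_ratio(MEAN_DIMENSIONS_MM):
        print(f"{cultivar}: S/L = {ratio:.3f}")
```

With these means, ‘Kelleris’ has the lowest ratio (the longest and narrowest pits), whereas ‘Nefris’ and ‘Debreceni botermo’ have nearly identical ratios despite differing in both L and S.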
The mean values of the shape factors of ‘Debreceni botermo’, ‘Łutówka’, ‘Nefris’, and ‘Kelleris’ cherry pits are presented in Table 2. For some parameters, namely the mean thickness factor (W5), compactness (W6), area ratio (W9), roundness (W12 and W14), Malinowska ratio (RM), and circularity (Rc), the pits differed statistically significantly, and each cultivar formed a separate homogenous group. For one parameter, the Feret ratio (RF), the pits of all cultivars were in one homogenous group, with no statistically significant differences between the mean values. For most shape factors, namely the elliptic shape factor (W1), circular shape factor (W2), circularity (W3), elongation and irregularity ratio (W7), rectangular aspect ratio (W8), radius ratio (W10), diameter range (W11), roundness (W13 and W15), standard deviation of all radii (SigR), Haralick ratio (RH), Blair–Bliss ratio (RB), and Feret ratio (RFf), three homogenous groups were formed, and in most of these cases (W7, W10, W11, W13, SigR, RH, RFf), the ‘Debreceni botermo’ and ‘Łutówka’ pits were in one group.
In the first step of the discriminant analysis, the cherry pits were compared in pairs of two different cultivars. The results of the discrimination based on selected linear dimensions are presented in Table 3. The highest average accuracy, 95%, was obtained for distinguishing between ‘Nefris’ and ‘Kelleris’ pits. The confusion matrix revealed that 95% of the ‘Nefris’ pits were correctly assigned to the class ‘Nefris’ and 5% were incorrectly assigned to the class ‘Kelleris’, whereas 94% of the ‘Kelleris’ pits were correctly assigned to the class ‘Kelleris’ and 6% were incorrectly assigned to the class ‘Nefris’. For this pair, the values of the true positive (TP) rate (‘Nefris’: 0.95, ‘Kelleris’: 0.94), precision (‘Nefris’: 0.94, ‘Kelleris’: 0.96), F-measure (‘Nefris’: 0.94, ‘Kelleris’: 0.95), receiver operating characteristic (ROC) area (‘Nefris’: 0.97, ‘Kelleris’: 0.97), and precision–recall curve (PRC) area (‘Nefris’: 0.95, ‘Kelleris’: 0.95) were the highest. This may indicate that the ‘Nefris’ and ‘Kelleris’ pits differed the most in terms of linear dimensions. It is consistent with the comparison of the mean values of linear dimensions (Table 1), which showed that for most parameters the ‘Nefris’ and ‘Kelleris’ pits were not in one homogenous group and in some cases formed the two most distant groups. The lowest average accuracies were observed for the discrimination of ‘Łutówka’ vs. ‘Nefris’ (78%) and ‘Debreceni botermo’ vs. ‘Łutówka’ (84%) pits; in these cases, the linear dimensions had the lowest discriminative power. The pits in these two pairs were the most similar in length: the difference was 0.26 mm between ‘Łutówka’ and ‘Nefris’ and 0.21 mm between ‘Debreceni botermo’ and ‘Łutówka’ (Table 1). For the other pairs, the average accuracy was 90% for ‘Debreceni botermo’ vs. ‘Kelleris’ and 87% for both ‘Debreceni botermo’ vs. ‘Nefris’ and ‘Łutówka’ vs. ‘Kelleris’ (Table 3).
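The relation between a confusion matrix and the reported TP rate, precision, and F-measure can be sketched as follows. This is a minimal illustration, not the software used in the study: it treats the rounded percentages quoted for ‘Nefris’ vs. ‘Kelleris’ as counts out of 100 pits per cultivar (an assumption), so the outputs only approximate the values reported for Table 3. The function names are hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Compute TP rate (recall), precision, and F-measure for each class.

    confusion[i][j] is the number of cases of actual class i
    that were predicted as class j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        actual_total = sum(confusion[k])                    # row sum
        predicted_total = sum(row[k] for row in confusion)  # column sum
        tp_rate = true_positive / actual_total if actual_total else 0.0
        precision = true_positive / predicted_total if predicted_total else 0.0
        denominator = precision + tp_rate
        f_measure = 2 * precision * tp_rate / denominator if denominator else 0.0
        metrics[label] = {
            "tp_rate": tp_rate,
            "precision": precision,
            "f_measure": f_measure,
        }
    return metrics


def average_accuracy(confusion):
    """Share of all cases that lie on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Assumption: percentages from the text used as counts per 100 pits per class.
LABELS = ["Nefris", "Kelleris"]
CONFUSION = [
    [95, 5],   # actual 'Nefris': 95 correct, 5 assigned to 'Kelleris'
    [6, 94],   # actual 'Kelleris': 6 assigned to 'Nefris', 94 correct
]

if __name__ == "__main__":
    print(f"average accuracy = {average_accuracy(CONFUSION):.3f}")
    for label, values in per_class_metrics(CONFUSION, LABELS).items():
        print(label, {name: round(value, 3) for name, value in values.items()})
```

Small differences from the published precision and F-measure values are expected, because the inputs here are rounded percentages and equal class sizes are assumed.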
The results of discrimination of the pairs of ‘Debreceni botermo’, ‘Łutówka’, ‘Nefris’, and ‘Kelleris’ cherry pits based on shape factors are shown in Table 4. The tendency was similar to that of the models built on linear dimensions (Table 3). In both cases, the ‘Nefris’ and ‘Kelleris’ pits were discriminated with the highest average accuracy of 95% (Table 3 and Table 4), and the lowest average accuracies were obtained for ‘Łutówka’ vs. ‘Nefris’ (78% in both Table 3 and Table 4) and ‘Debreceni botermo’ vs. ‘Łutówka’ (84% in Table 3, 85% in Table 4). The other models built on shape factors produced average accuracies of 92% for ‘Debreceni botermo’ vs. ‘Kelleris’, 88% for ‘Debreceni botermo’ vs. ‘Nefris’, and 87% for ‘Łutówka’ vs. ‘Kelleris’ pits (Table 4). Overall, the accuracies of the models built on shape factors (Table 4) were equal to or slightly higher than those of the models built on linear dimensions (Table 3).
The accuracies of discrimination based on selected combined linear dimensions and shape factors (Table 5) were higher than those obtained with shape factors alone (Table 4) or linear dimensions alone (Table 3). For the models built on the combined sets (Table 5), the average accuracy reached 96% for distinguishing ‘Nefris’ and ‘Kelleris’, which is 1 percentage point higher than for the models built on linear dimensions (95%, Table 3) and shape factors (95%, Table 4). In addition, the lowest accuracy, 79% for ‘Łutówka’ vs. ‘Nefris’ pits (Table 5), was 1 percentage point higher than for the models based on linear dimensions (78%, Table 3) and shape factors (78%, Table 4). The discrimination accuracies for all other pairs also increased with the combined sets (Table 5), to 86% for ‘Debreceni botermo’ vs. ‘Łutówka’, 89% for ‘Debreceni botermo’ vs. ‘Nefris’, 93% for ‘Debreceni botermo’ vs. ‘Kelleris’, and 90% for ‘Łutówka’ vs. ‘Kelleris’.
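The pairwise accuracies quoted in the text for Tables 3, 4, and 5 can be collected in one place to compare the three feature sets. The following sketch only summarizes those quoted percentages; the summary statistics (mean accuracy over the six pairs and the gain from combining feature sets) and the function names are introduced here for illustration and are not reported in the study.

```python
from statistics import mean

# Average pairwise accuracies (%) as quoted in the text for Tables 3, 4, and 5.
PAIRWISE_ACCURACY = {
    ("Debreceni botermo", "Łutówka"): {"linear": 84, "shape": 85, "combined": 86},
    ("Debreceni botermo", "Nefris"): {"linear": 87, "shape": 88, "combined": 89},
    ("Debreceni botermo", "Kelleris"): {"linear": 90, "shape": 92, "combined": 93},
    ("Łutówka", "Nefris"): {"linear": 78, "shape": 78, "combined": 79},
    ("Łutówka", "Kelleris"): {"linear": 87, "shape": 87, "combined": 90},
    ("Nefris", "Kelleris"): {"linear": 95, "shape": 95, "combined": 96},
}

FEATURE_SETS = ("linear", "shape", "combined")


def mean_accuracy_by_feature_set(table):
    """Mean accuracy over all cultivar pairs for each feature set."""
    return {fs: mean(row[fs] for row in table.values()) for fs in FEATURE_SETS}


def gain_from_combining(table):
    """Per pair: combined accuracy minus the better single feature set."""
    return {
        pair: row["combined"] - max(row["linear"], row["shape"])
        for pair, row in table.items()
    }


if __name__ == "__main__":
    for feature_set, value in mean_accuracy_by_feature_set(PAIRWISE_ACCURACY).items():
        print(f"{feature_set}: mean pairwise accuracy = {value:.1f}%")
    for pair, gain in gain_from_combining(PAIRWISE_ACCURACY).items():
        print(f"{pair[0]} vs. {pair[1]}: +{gain} percentage point(s)")
```

On the quoted values, combining the feature sets adds 1 percentage point for five of the six pairs and 3 percentage points for ‘Łutówka’ vs. ‘Kelleris’.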
The performance of the discrimination of all four cultivars was compared for the models built separately on linear dimensions, shape factors, and combined linear dimensions and shape factors (Table 6). The highest average accuracy, 75%, was obtained for the models including combined linear dimensions and shape factors. In this analysis, the ‘Debreceni botermo’ and ‘Kelleris’ pits were each classified with an accuracy of 82%, the ‘Nefris’ pits with 76%, and the ‘Łutówka’ pits with 59%. The fewest misclassifications occurred between the ‘Nefris’ and ‘Kelleris’ pits, and the most between the ‘Łutówka’ and ‘Nefris’ pits. The models built on shape factors produced an accuracy of 73%, and the lowest average accuracy for the four cultivars was observed for the models built on linear dimensions (72%). This indicates that the combined linear dimensions and shape factors had the highest discriminative power for distinguishing cherry pits of different cultivars, and the linear dimensions had the lowest.
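The 75% average for the four-cultivar model can be checked against the per-cultivar accuracies quoted above. The sketch below assumes equally sized classes, so that the unweighted mean of the per-cultivar accuracies equals the overall accuracy; the class sizes are not stated in this section, and the function names are hypothetical.

```python
# Per-cultivar accuracies (%) quoted in the text for the combined model (Table 6).
PER_CULTIVAR_ACCURACY = {
    "Debreceni botermo": 82,
    "Łutówka": 59,
    "Nefris": 76,
    "Kelleris": 82,
}


def macro_average(accuracies):
    """Unweighted mean of per-class accuracies (assumes equal class sizes)."""
    return sum(accuracies.values()) / len(accuracies)


def weakest_cultivar(accuracies):
    """Cultivar with the lowest per-class accuracy."""
    return min(accuracies, key=accuracies.get)


if __name__ == "__main__":
    print(f"macro-average accuracy = {macro_average(PER_CULTIVAR_ACCURACY):.2f}%")
    print(f"hardest cultivar to classify: {weakest_cultivar(PER_CULTIVAR_ACCURACY)}")
```

The unweighted mean is 74.75%, which rounds to the reported 75%, and ‘Łutówka’ is the cultivar that pulls the four-class average down.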
The results revealed the usefulness of geometric parameters for the discrimination of sour cherry pits of different cultivars. Both linear dimensions and shape factors had a high discriminative power. However, the models built on combined linear dimensions and shape factors provided the highest accuracies, up to 96% for the discrimination of two cultivars and 75% for four cultivars. The results obtained by Ropelewska [24] indicated that textures had even higher discriminative power for the pits of different sour cherry cultivars: pairs of cultivars were discriminated with an average accuracy of up to 100%, and four cultivars with an accuracy of up to 96.25%. Ropelewska [25] reported that for sweet cherry pits as well, the discrimination accuracies for models built on textural features (up to 100% for two cultivars and 95% for three cultivars) were higher than for geometric parameters (up to 99% for two cultivars and 95% for three cultivars). Additionally, Ropelewska [25] found that models combining geometric and textural parameters provided the highest accuracies, up to 100% for two cultivars and 98% for three cultivars. The accuracies of cultivar discrimination of sour cherry pits based on geometric parameters presented in this paper did not reach 100%. This may indicate limitations of the developed models: geometric features alone did not allow ‘Debreceni botermo’, ‘Łutówka’, ‘Nefris’, and ‘Kelleris’ sour cherry pits to be distinguished with 100% accuracy. This motivates further research on sour cherry pits to build discriminative models combining selected geometric and other features. Nevertheless, the contribution of this study to distinguishing sour cherry pit cultivars using machine learning is significant. The linear dimensions and shape factors with the highest discriminative power were identified, and the mean values of these selected parameters differed the most among the cultivars. The next stage of the research may involve combining these geometric features with selected textures in one model to increase the discrimination accuracy. Models based on both geometric and textural features could be applied more successfully in practice to detect the falsification of sour cherry pit cultivars.