Article

Deep Learning-Based Classification Consisting of Pre-Trained Models and Proposed Model Using K-Fold Cross-Validation for Pistachio Species

by
Mustafa Serter Uzer
Electronics and Automation, Ilgın Vocational School, Selcuk University, Konya 42600, Turkey
Appl. Sci. 2025, 15(8), 4516; https://doi.org/10.3390/app15084516
Submission received: 15 February 2025 / Revised: 8 April 2025 / Accepted: 15 April 2025 / Published: 19 April 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract:
Pistachio is a nut originating from the Middle East, and the main varieties grown and exported in Turkey are Kirmizi and Siirt pistachios. Due to their strategic importance in the agricultural economy, they need to be classified correctly. This study aims to classify Kirmizi and Siirt pistachios using various deep learning-based models and k-fold cross-validation. For this purpose, seven convolutional neural network models trained by transfer learning and the proposed MSU-CNN model are used for classification with k-fold cross-validation. The dataset used in this study consists of 2148 images, 1232 of which belong to Kirmizi pistachio and 916 to Siirt pistachio. The k-fold cross-validation method is applied to enhance the generalization ability of the classification models, prevent overfitting, and improve performance reliability. The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved average classification accuracies of 94.88%, 96.79%, 96.79%, 97.90%, 98.88%, 99.02%, 99.21%, and 99.63%, respectively, over 5-fold cross-validation, with the highest accuracy attained by ResNet-50. The performance of the models was evaluated using classification accuracy, sensitivity, specificity, precision, F1-score, and ROC-AUC values. According to the results, many of the proposed models proved to be effective in the identification of pistachio species.

1. Introduction

Pistachios (Pistacia vera L.), classified under the family Anacardiaceae, are widely recognized as a global delicacy, valued for their distinctive flavor and versatile applications [1,2]. Pistachios are rich in protein, fiber, monounsaturated fats, various vitamins, minerals, phytochemicals, and polyphenols [3,4,5,6]. Both epidemiological and clinical research on pistachios increasingly highlights the health benefits associated with the consumption of tree nuts [3,6]. Pistachio is one of the agricultural products originating from the Middle East and Central Asia. The USA, Iran, Turkey, and Syria, the leading global producers of pistachios, account for nearly 90% of the total worldwide production. Pistachio is consumed globally, with Turkey ranking third in the world in pistachio production [7].
The consumer price of pistachios, as a high-value agricultural product, is influenced by the product’s quality. Therefore, assessing the quality of shelled pistachios is crucial for economic, marketing, and export purposes. Higher quality can lead to increased consumption rates and enhanced marketing opportunities. Moreover, accurately assessing the quality of pistachios through efficient and straightforward smart-system-based methods is crucial for preventing economic losses in marketing and export [8,9]. Consequently, the demand for innovative techniques and technologies for pistachio shelling and classification continues to increase.
Some studies in the literature on pistachios are summarized as follows. A smart classification system for pistachios is developed using an artificial neural network (ANN) in [8]. The ANN classifier is trained using acoustic signals generated by the impact of pistachios against a steel plate, with features extracted using various signal processing techniques. This method reportedly provided 99.89% classification accuracy. In [10], an approach combining machine learning techniques and image processing is proposed to classify in-shell pistachios. The ANN classifier achieves a success rate of 99.4%, while the Support Vector Machine (SVM) classifier achieves 99.8%. In [11], various machine learning classifiers are used to detect pistachios with open shells, rotten pistachios, and pistachios with closed shells. First, the dataset is enlarged using data augmentation, and feature extraction is performed with AlexNet and GoogLeNet. Then, the 300 most decisive features are selected with PCA and classified with SVM. A maximum accuracy of 99% is achieved by feeding the features obtained from GoogLeNet to the classifier.
In [1], a deep learning-based method is proposed to detect spinach mixtures in pistachios. The method achieves more than 90% accuracy on 6-class dataset combinations using various color spaces and CNN architectures. The VGGNet-19 model achieves 100% accuracy in the LAB color space, and VGGNet-19 on HSV and ResNet-50 on YUV achieve more than 98% accuracy in detecting pistachio adulteration. In [12], deep autoencoder neural networks are used to separate pistachios into defective and perfect classes, achieving a classification success of 80.3% in detecting defective pistachios. In [13], a dataset of 958 images of defective and perfect pistachios is used for classification. The GoogLeNet, VGG16, and ResNet models achieve classification accuracies of 95.8%, 95.83%, and 97.2%, respectively.
In [14], image features are extracted using deep neural network embeddings, and pistachio species are classified with small-scale machine learning algorithms trained on these feature vectors. The findings show that logistic regression applied to the features extracted from the second-to-last layer of the Painters network achieves an accuracy of 97.20%. In [15], a computer vision-based system is presented to recognize whether the shell of a pistachio is open or closed, using feature extraction and CNN-based classification. Average classification successes of 85.28%, 85.19%, and 83.32% are achieved for the ResNet50, ResNet152, and VGG16 models, respectively. In [16], Kirmizi and Siirt pistachio images are classified with a CNN model based on VGG16, which achieves 99.91% classification accuracy. In [17], Kirmizi and Siirt pistachio types are classified using the AlexNet, VGG16, and VGG19 CNN models; among these, VGG16 achieves the highest accuracy with 98.84%. The results show that computer vision can successfully classify pistachio species. In [18], Kirmizi and Siirt pistachios are classified using 2148 images and a ResNet architecture. The highest classification accuracy is 88.58% in fold-1, and the 5-fold average classification accuracy is 86.16%.
In [19], a system using a dual-camera setup and a Cartesian manipulator is constructed to distinguish between open- and closed-shell pistachios, which are detected with a deep learning-based object detection algorithm. This method provides 98% and 85% successful detection for open- and closed-shell pistachios, respectively. In [20], the different frequency components generated by the impact of pistachios on a surface are captured with a high-sensitivity carbon microphone and processed in a high-level programming language to distinguish between open- and closed-shell pistachios. The authors propose an acoustic emission-based system consisting of mel frequency cepstral coefficients (MFCCs) for feature extraction and the SVM method for classification, which they believe will improve the product quality and processing efficiency of pistachios. In [21], a computer vision system is created to generate and classify 2148 high-resolution images of pistachio species. The resulting dataset is used in an advanced multi-stage model based on principal component analysis (PCA) and the k-NN technique, and a classification success of 94.18% is obtained for the 16 features extracted.
In [22], an unsupervised learning-based method is introduced to identify pistachio shell opening, using clustering algorithms and a ResNet18 CNN to categorize pistachio features. The method achieves 99.31% accuracy in three-dimensional space and can be applied to both qualitative and quantitative analysis of pistachio opening. This approach offers significant potential for enhancing pistachio quality detection and product classification while reducing resource consumption, providing valuable insights for future intelligent sorting equipment and quality detection of other agricultural products.
As a result, there is a need for new techniques and technologies for the categorization and labeling of pistachios. Therefore, this research aims to categorize Kirmizi and Siirt pistachios using deep learning-based classification methods for pistachio species, incorporating k-fold cross-validation.
The significant contributions of this study are as follows:
  • Since the quality of shelled pistachios is crucial for the economy, export, and marketing efforts according to their type, Kirmizi and Siirt pistachios have been classified using various deep learning-based classification models with k-fold cross-validation. A dataset of 2148 images, consisting of 1232 Kirmizi and 916 Siirt pistachio images, has been used for this classification.
  • Seven convolutional neural network (CNN) models, trained through transfer learning, have been utilized for classification, along with the proposed model, MSU-CNN, which also incorporates k-fold cross-validation.
  • The k-fold cross-validation technique has been utilized to enhance the generalization ability of the classification model, prevent overfitting, and improve performance reliability.
  • The classification performances have been evaluated not only based on classification accuracy but also using sensitivity, specificity, precision, F1-scores, and ROC-AUC values. The results show that the proposed approaches can properly identify pistachio species.
  • According to the existing literature, there does not appear to be a study that has utilized such a large number of models and achieved such high classification accuracy rates for pistachio species. Additionally, the proposed models perform effective and reliable classification and compare favorably with the existing literature in terms of accuracy and novelty.
The organization of this study is as follows: Section 2 covers the pistachio image dataset, convolutional neural network, k-fold cross-validation, and classification performance indicators. Section 3 explains the recommended methodologies, which include pre-trained CNN models and the proposed MSU-CNN model. Section 4 includes the classification performance results for all models, as well as a comparison of the suggested techniques to the existing literature. Finally, Section 5 highlights the important findings and wraps up the study.

2. Materials and Methods

In this section, the pistachio image dataset, convolutional neural networks, k-fold cross-validation, and classification performance indicators are discussed in detail.

2.1. Pistachio Image Dataset

The pistachio image dataset used in this study for the classification of pistachio species consists of a total of 2148 images and has 2 classes. Of these images, 1232 belong to the Kirmizi pistachio species and 916 belong to the Siirt pistachio species. The size of each image in the dataset is 600 × 600 pixels. The images consist of original pistachio samples used in the training and testing of deep learning models [17,21]. Sample images of pistachios taken from the dataset are shown in Figure 1.
The dataset was constructed as follows. The pistachio species in the dataset are licensed varieties. A computer vision system consisting of a lens mount and a Prosilica GT2000C (AVT.ca, Mississauga, Canada) image capture camera was used to acquire the pistachio images. The distance between the pistachio samples and the camera was set to 15 cm. An image capture box was used to eliminate outdoor light differences and to prevent shadows on the pistachios. The background surface was set to black to avoid unwanted noise. Noise in the resulting pistachio images was removed using image processing methods. For contrast enhancement, histogram information was obtained and histogram equalization was performed [21].
In this study, the pistachio image dataset was resized to the appropriate input size required by the proposed networks before being used in the CNNs. After dividing the dataset into five subsets for 5-fold cross-validation, training and test sets were separated for each fold. Random rotation, random horizontal translation, and random vertical translation transformations were applied to the images in the training dataset of each fold for data augmentation, thereby ensuring that the model was trained with more diverse data. For the images in the test set of each fold, no data augmentation was applied; instead, they were only resized to the appropriate input size. This process helps the model learn more generalizable features and reduces the risk of overfitting. Additionally, all processes in this study were carried out using the MATLAB (version R2024b) program.
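To make the augmentation step concrete, the sketch below implements crude, deterministic stand-ins for the rotation and translation transforms in Python (the study itself used MATLAB). The helper names `rotate90` and `shift_right` are hypothetical; a real pipeline would apply small random angles and offsets with an image library.

```python
def rotate90(img):
    """Rotate a 2D image 90 degrees clockwise (a coarse stand-in for small random rotations)."""
    return [list(row) for row in zip(*img[::-1])]

def shift_right(img, by=1, fill=0):
    """Horizontal translation: shift pixels right, padding with the background value."""
    return [[fill] * by + row[:-by] for row in img]

img = [[1, 2],
       [3, 4]]

# Augmented copies are generated for training images only; test images
# are merely resized to the network's input size.
augmented = [rotate90(img), shift_right(img)]
```

Because augmentation enlarges only the training folds, the test folds remain untouched measurements of generalization.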

2.2. Convolutional Neural Networks

Deep learning (DL) is a subfield of machine learning that consists of algorithms that can efficiently process large-scale, high-resolution datasets using state-of-the-art graphics processing units. These algorithms consist of interconnected nodes, connections, and parameter systems built on a predetermined architecture, and various optimization techniques and weighting methods are used to classify data accurately. Convolutional neural networks (CNNs) are widely used in DL for tasks such as image object detection, due to their effectiveness in capturing spatial hierarchies and local patterns in visual data [1]. DL is a multilayered approach used for feature extraction and identification from large datasets [23]. It consists of different layers designed for specific tasks, including convolutional layers, activation layers, pooling layers, flattening layers, and fully connected layers [24].
Convolutional Layer: This layer derives features from the input data. Being directly connected to the image data, it usually extracts low-level features such as edges and colors [25]. The input vectors are analyzed using a predefined filter that transforms the data into the feature space [26].
Activation Layer: In this layer, a nonlinear function is applied to each pixel of the images [27]. The rectified linear unit (ReLU) activation function has become increasingly popular and has replaced traditional activation functions, such as sigmoid and hyperbolic tangent functions [28].
Pooling Layer: This layer reduces the number of parameters in the network and the computational load, which lowers the computation required by subsequent layers and helps prevent overfitting by limiting the network’s capacity to memorize data. Typical pooling techniques include max pooling, sum pooling, and average pooling [29].
Flatten Layer: This layer prepares the data for the final layers. The subsequent layers require input in the form of one-dimensional arrays, so this layer converts the matrix-like data from previous layers into one-dimensional arrays. Since the pixels of the image are arranged into a single row, this operation is referred to as flattening [29].
Fully Connected Layers: The fully connected layer depends on all the nodes from the previous layer. The number of fully connected layers can vary across different architectures. In these layers, features are retained, and the learning process is performed by adjusting the weights and biases. This layer handles the main processing task by receiving input from all feature extraction phases and evaluating the outputs of all previous processing layers [17,30].
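As a minimal illustration of the activation and pooling layers described above, the following Python sketch applies ReLU and non-overlapping 2 × 2, stride-2 max pooling to a toy 4 × 4 feature map (the study’s models were built in MATLAB; this snippet is purely illustrative):

```python
def relu(x):
    """ReLU activation: replace negative values with zero, pixel by pixel."""
    return [[max(0.0, v) for v in row] for row in x]

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling with stride 2: halves each spatial dimension."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

fmap = [[-1.0,  2.0,  0.5, -3.0],
        [ 4.0, -2.0,  1.0,  2.5],
        [ 0.0,  1.0, -1.0,  0.0],
        [-5.0,  3.0,  2.0,  1.0]]

# 4x4 input -> 2x2 output: later layers see fewer values to process.
pooled = max_pool_2x2(relu(fmap))
```

The halving of each spatial dimension is exactly the parameter and computation saving the pooling-layer description refers to.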

2.3. Transfer Learning

Transfer learning aims to reuse the knowledge gained during one training phase to solve similar problems, and it has gained significant popularity in recent years. The technique involves transferring learned weights, based on features previously extracted from a large dataset, to a new neural network [31]. By utilizing a pre-trained CNN model, researchers can leverage features extracted from the final layer along with various classifier layers to address new tasks. These features can be processed by fully connected (classifier) layers for classification. This pre-training approach has been shown to produce positive performance outcomes [32].

2.4. Pre-Trained CNN Models

Numerous CNN architectures are presented in the literature. When selecting among them, classification accuracy, model size, and processing speed were considered. The models used in this study were determined through extensive experiments. For the dataset used, models that demonstrated the best performance were specifically preferred.

2.5. K-Fold Cross-Validation

Cross-validation is a widely used technique for evaluating the classification and prediction performance of models [7,33]. In this study, the k-fold cross-validation method was used to obtain more reliable classification results. Usually, k is set to 5 or 10 [34]. Here, k was set to 5, and accordingly, the dataset was divided into five parts and the training and testing procedures were performed for five iterations. In each iteration, one subset was held out for testing while the remaining four subsets were used for training, so that each subset served as the test set exactly once. To evaluate the success of the model, the average classification accuracy was calculated over all iterations. In summary, cross-validation increases the generalization ability of the classification model, prevents overfitting, and improves performance reliability [35,36].
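The fold rotation and averaging described above can be sketched as follows. The `cross_validate` helper is illustrative (the study used MATLAB), and the stub evaluator simply returns the per-fold AlexNet accuracies reported in Section 4:

```python
def cross_validate(n_folds, evaluate):
    """Run k-fold CV: each fold serves as the test set exactly once;
    the mean of the per-fold accuracies is the reported score."""
    accs = []
    for test_fold in range(n_folds):
        train_folds = [f for f in range(n_folds) if f != test_fold]
        accs.append(evaluate(train_folds, test_fold))
    return sum(accs) / n_folds

# Per-fold AlexNet accuracies from Section 4; the mean matches the
# reported average accuracy (~0.9488).
fold_acc = [0.9394, 0.9047, 0.9581, 0.9884, 0.9534]
mean_acc = cross_validate(5, lambda train, test: fold_acc[test])
```

Averaging over folds is what makes the reported accuracies less sensitive to any single lucky or unlucky train/test split.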

2.6. Confusion Matrix

The confusion matrix is used to evaluate the prediction accuracy on datasets, and its elements are often used to measure the efficiency of classification tasks [37]. The confusion matrix for the binary classification task in this study is shown in Table 1.
The performance metrics obtained from the confusion matrix are explained as follows. TP is the total number of examples where the model correctly predicted the positive class, while FP is the number of instances in which a negative sample is incorrectly classified as positive. TN is the total number of examples where the model correctly predicted the negative class, while FN is the number of instances in which a positive sample is incorrectly classified as negative.
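These four counts can be computed directly from label lists; the Python sketch below (illustrative, not the study’s MATLAB code) takes Kirmizi as the positive class, which is an assumption for the example:

```python
def confusion_counts(y_true, y_pred, positive="Kirmizi"):
    """Count TP, FP, TN, FN for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn

# Toy labels: two Kirmizi correctly found (TP=2), one Siirt called
# Kirmizi (FP=1), one Siirt correct (TN=1), one Kirmizi missed (FN=1).
y_true = ["Kirmizi", "Kirmizi", "Siirt", "Siirt",   "Kirmizi"]
y_pred = ["Kirmizi", "Siirt",   "Siirt", "Kirmizi", "Kirmizi"]
counts = confusion_counts(y_true, y_pred)
```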

2.7. Performance Metrics

Performance metrics are important to evaluate the effectiveness of the classifier [38,39,40,41]. In this study, performance metrics such as accuracy, sensitivity, specificity, F1-score, and precision were calculated. The formulas used to calculate these metrics are given in Table 2.
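Assuming the standard formulas (as in Table 2), these metrics follow directly from the confusion-matrix counts. The illustrative Python sketch below uses the ResNet-50 fold-4 counts reported in Section 4, taking Kirmizi as the positive class (an assumption made for the example):

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics derived from the confusion matrix."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)      # recall / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    precision   = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# ResNet-50 fold-4 counts from Section 4: 245 Kirmizi correct (TP),
# 181 Siirt correct (TN), 3 Siirt predicted as Kirmizi (FP),
# 1 Kirmizi predicted as Siirt (FN).
acc, sens, spec, prec, f1 = metrics(tp=245, fp=3, tn=181, fn=1)
```

Note that the resulting accuracy, (245 + 181) / 430 ≈ 0.9907, matches the fold-4 value reported in Section 4.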

2.8. ROC and AUC

The receiver operating characteristic (ROC) curve is a widely used method to evaluate the effectiveness of a classification model in classification tasks. In the ROC curve, the false positive rate is shown on the x-axis and the true positive rate is shown on the y-axis. The area under the curve (AUC) is a metric to evaluate the classification performance of a model, and AUC values close to 1 indicate that the model performs well [42].
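The AUC can equivalently be computed as a rank statistic: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann–Whitney U formulation). This is a general illustrative sketch, not code from the study:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random
    negative; ties count one half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# 5 of the 6 positive/negative pairs are ordered correctly -> AUC = 5/6.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

A value of 1.0 corresponds to perfect separation of the two classes, which is why AUC values close to 1 indicate good performance.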

3. The Proposed Approaches

In this study, the classification of Kirmizi and Siirt pistachios is performed using deep learning-based approaches with k-fold cross-validation. For this purpose, seven pre-trained convolutional neural networks and a proposed model, MSU-CNN, are trained and evaluated with k-fold cross-validation to ensure robust and reliable performance. The seven pre-trained CNNs used in this study are AlexNet, GoogLeNet, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50. In addition to these, the proposed MSU-CNN model is used. The block diagram representation of the eight proposed approaches in total is given in Figure 2.
First, the image dataset is divided into five subsets to create the 5-fold CV; in each fold, 80% of the images form the training set and 20% form the test set. For the training dataset in each fold, image data augmentation is performed: random rotation, random horizontal translation, and random vertical translation transformations are applied to train the model with more diverse data. Before training starts, layer changes are made to the pre-trained models with transfer learning, as described in the subsections of this section. Then, the network is trained with the training data of each fold, the classification accuracy for each fold is obtained, and the average of these values gives the average classification accuracy.

3.1. Pre-Trained CNN Models with Transfer Learning

Seven different models created using transfer learning were run to categorize the pistachio images: the pre-trained AlexNet, GoogLeNet, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models. These seven models were fine-tuned with the transfer learning method and trained on pistachio images. The model outputs were changed to two categories, since the pistachio dataset used in the study has two classes. For all models, the pistachio dataset was divided into five parts according to 5-fold cross-validation; in other words, in each fold, 80% of the data was used for training and 20% for testing. Then, the results obtained from each fold were averaged to measure classification performance, yielding more reliable results.
Transfer learning is a method that uses pre-trained models on large-scale datasets to perform tasks involving smaller datasets, thus improving model efficiency. Fine-tuning, on the other hand, represents a subset of transfer learning and is the adaptation of the weights of a pre-trained model to improve performance for a new task. In this process, selected layers of the model remain frozen while others are retrained to better adapt to the target problem. Typically, fine-tuning involves making the final layers of the model compatible with the new dataset, thus speeding up the training process.
First, one of the pre-trained models is loaded, and its layers are transferred to a layer graph in code; this graph is used to modify and manage the model’s layers. A new fully connected layer is created to produce the classification output, and its learning-rate factors are adjusted. In addition, a new classification layer is created to classify the model’s output. In the layer graph, the last layers are replaced with the newly defined fully connected and classification layers. Then, after setting the training parameters, such as the learning rate and the optimization algorithm, the model is trained with the updated classification layers. Consequently, the pre-trained network is adapted to the new task upon completion of the training process. The other pre-trained network models used in this study underwent the same procedure.
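The study performs this layer surgery in MATLAB (via a layer graph); the stand-alone Python sketch below mimics only the bookkeeping with a toy list-of-dicts model. The helper `adapt_for_transfer` and the layer names are hypothetical, used solely to illustrate the idea of swapping the final layers for new two-class ones while keeping the earlier layers frozen:

```python
def adapt_for_transfer(layers, n_classes=2):
    """Freeze all but the last two layers, then append new two-class
    fully connected and classification layers (toy illustration)."""
    adapted = [dict(l, frozen=True) for l in layers[:-2]]
    adapted.append({"name": "fc_new", "type": "fully_connected",
                    "units": n_classes, "frozen": False})
    adapted.append({"name": "out_new", "type": "classification",
                    "frozen": False})
    return adapted

# A hypothetical pre-trained network whose original head has 1000 classes.
pretrained = [{"name": "conv1", "type": "conv"},
              {"name": "fc8",   "type": "fully_connected", "units": 1000},
              {"name": "out",   "type": "classification"}]
new_net = adapt_for_transfer(pretrained)
```

Only the replaced head is trained from scratch; the frozen earlier layers keep the features learned on the large source dataset, which is what makes fine-tuning fast.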

3.2. Proposed MSU-CNN Model

Network architecture of the proposed MSU-CNN model is given in Figure 3. The description of the blocks in the proposed architecture is provided below.
Input Layer: This layer defines the input to the model and specifies the dimensions of the input images. Additionally, normalization is applied so that each feature has a mean of zero and a standard deviation of one.
First Convolutional Block: This block consists of a convolution2dLayer, batchNormalizationLayer, and reluLayer. The convolution2dLayer applies convolution using a 3 × 3 kernel with 32 filters, deriving basic features from the input. The batchNormalizationLayer normalizes activations to improve training stability and accelerate convergence. The reluLayer introduces non-linearity using the Rectified Linear Unit (ReLU) activation function, which replaces negative values with zero, facilitating efficient gradient propagation.
Block 1: This block comprises a convolution2dLayer, batchNormalizationLayer, reluLayer, and maxPooling2dLayer. A 3 × 3 convolution is performed with 64 filters and a stride of 1, keeping the output size similar to the input. The Max pooling (2 × 2, stride = 2) reduces the spatial dimensions by half, preserving essential features while lowering computational complexity.
Block 2: This block increases the number of filters to 128, maintaining the 3 × 3 kernel size. Batch normalization and ReLU activation are applied as before. Pooling further reduces the feature map dimensions, capturing higher-level feature representations.
Block 3: The number of filters is increased to 256, and convolution is applied. Batch normalization and ReLU activation follow. Pooling continues to reduce spatial dimensions.
Block 4: This block utilizes 512 filters for convolution, followed by average pooling to aggregate spatial information more effectively.
Fully Connected Layers: These layers refine the extracted features and perform classification. The initial fully connected layer consists of 1024 neurons, followed by another with 512 neurons. Dropout (0.5) is applied between these layers to prevent overfitting by randomly deactivating 50% of the neurons. The softmax layer converts the output into probability distributions over the class labels. Finally, the classification layer assigns the predicted class based on the highest probability.
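The way the feature maps shrink while the filter count grows can be traced with simple arithmetic. The sketch below assumes a hypothetical 224 × 224 input and ‘same’-padded convolutions (the paper does not state the exact input resolution here), so it is an illustration of the size bookkeeping, not the model itself:

```python
def pool_out(size, pool=2, stride=2):
    """Spatial output size of a non-overlapping pooling layer."""
    return (size - pool) // stride + 1

# 'Same'-padded 3x3 convolutions preserve spatial size; each of Blocks 1-3
# ends with 2x2, stride-2 max pooling, which halves it. The 224x224 input
# is a placeholder assumption.
size = 224
for _ in range(3):          # Blocks 1, 2, and 3
    size = pool_out(size)
# The spatial side shrinks 224 -> 112 -> 56 -> 28, while the filter count
# grows 32 -> 64 -> 128 -> 256 -> 512 across the blocks.
```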

4. Results and Discussion

The proposed models were implemented on a computer running a 64-bit Windows 11 Pro OS, featuring an i5-11400H processor with a 2.70 GHz clock speed, 32 GB of RAM, and a GTX 1650 GPU. The classification accuracy and training parameters of all models are given in Table 3. Stochastic gradient descent with momentum (SGDM) and adaptive moment estimation (Adam) are the optimization algorithms used to update the networks’ learnable parameters during training. The learning rate for all algorithms was set to 0.0001.
The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models attained mean classification accuracies of 94.88%, 96.79%, 96.79%, 97.90%, 98.88%, 99.02%, 99.21%, and 99.63%, respectively, during 5-fold cross-validation; among these, ResNet-50 attained the highest accuracy. The proposed MSU-CNN model stands out for its notable competitiveness when compared to the pre-trained models and demonstrates a good level of performance in accurately classifying the pistachio types. In the 5-fold cross-validation, where the dataset is divided into 80% for training and 20% for testing, the same models reached highest single-fold classification accuracies of 98.84%, 98.37%, 97.21%, 99.07%, 99.53%, 99.53%, 99.77%, and 100%, respectively.

4.1. Comparison of Proposed Models Among Themselves

The classification accuracies for each fold of the seven pre-trained CNN models with transfer learning, as well as for each fold of the proposed MSU-CNN model, are presented below, together with the average classification accuracy of each model. According to these results, the ResNet-50 model performed best. Confusion matrices for each fold, ROC curves, and AUC values are provided for the best-performing ResNet-50, and the training accuracy and loss graphs for one fold of this model are also shown. The proposed MSU-CNN model ranked sixth in classification performance among the eight models, demonstrating its competitiveness with the other models. The mean performance metrics of all models are calculated and presented in Table 4.
The classification accuracies of the AlexNet model were found to be 0.9394 for fold-1, 0.9047 for fold-2, 0.9581 for fold-3, 0.9884 for fold-4, and 0.9534 for fold-5, and the average accuracy was 0.9488. Among the folds, fold-4 reached the highest classification accuracy with a value of 0.9884.
The classification accuracies of the GoogLeNet model were found to be 0.9674 for fold-1, 0.9605 for fold-2, 0.9837 for fold-3, 0.9605 for fold-4, and 0.9674 for fold-5, and the average accuracy was 0.9679. Among the folds, fold-3 reached the highest classification accuracy with a value of 0.9837.
The classification accuracies of the MSU-CNN model were found to be 0.9697 for fold-1, 0.9651 for fold-2, 0.9721 for fold-3, 0.9628 for fold-4, and 0.9697 for fold-5, and the average accuracy was 0.9679. Among the folds, fold-3 reached the highest classification accuracy with a value of 0.9721.
The classification accuracies of the VGG16 model were found to be 0.9697 for fold-1, 0.9814 for fold-2, 0.9744 for fold-3, 0.9907 for fold-4, and 0.9790 for fold-5, and the average accuracy was 0.9790. Among the folds, fold-4 reached the highest classification accuracy with a value of 0.9907.
The classification accuracies of the EfficientNet-b0 model were found to be 0.9953 for fold-1, 0.9860 for fold-2, 0.9930 for fold-3, 0.9860 for fold-4, and 0.9837 for fold-5, and the average accuracy was 0.9888. Among the folds, fold-1 reached the highest classification accuracy with a value of 0.9953.
The classification accuracies of the ResNet-18 model were found to be 0.9883 for fold-1, 0.9907 for fold-2, 0.9860 for fold-3, 0.9907 for fold-4, and 0.9953 for fold-5, and the average accuracy was 0.9902. Among the folds, fold-5 reached the highest classification accuracy with a value of 0.9953.
The classification accuracies of the Inception-v3 model were found to be 0.9930 for fold-1, 0.9907 for fold-2, 0.9860 for fold-3, 0.9977 for fold-4, and 0.9930 for fold-5, and the average accuracy was 0.9921. Among the folds, fold-4 reached the highest classification accuracy with a value of 0.9977.
Among the eight proposed models, the ResNet-50 model was the best-performing CNN in this study. The performance evaluation of the ResNet-50 model is presented in Table 5. Performance evaluation accuracies were found to be 1.00 for fold-1, 1.00 for fold-2, 1.00 for fold-3, 0.9907 for fold-4, and 0.9907 for fold-5, and the average accuracy was 0.9963. Among the folds, fold-1, fold-2, and fold-3 reached the highest classification accuracy with a value of 1.00.
The training accuracy and loss graph of fold-5 for the ResNet-50 model is shown in Figure 4. In the training accuracy graph, the model’s learning speed, the improvement in its accuracy, and its ability to generalize during training can be observed. On the other hand, the loss graph shows the rate at which the model’s errors decrease, the speed of its progress, and potential issues, such as overfitting or underfitting.
The confusion matrix with 5-fold CV and ROC curves with 5-fold CV for the ResNet-50 model are given in Figure 5 and Figure 6, respectively. These figures regarding the classification performance of the proposed ResNet-50 model allow us to make inferences, such as evaluating model accuracy and detecting misclassifications. For example, for fold-4, the confusion matrix results indicate that the model correctly classified 245 instances of the ‘Kirmizi’ class and 181 instances of the ‘Siirt’ class. However, 1 ‘Kirmizi’ sample was misclassified as ‘Siirt’, and 3 ‘Siirt’ samples were incorrectly predicted as ‘Kirmizi’.
The classification accuracy values of the models used are presented in Figure 7. AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved accuracy values of 0.9488, 0.9679, 0.9679, 0.9790, 0.9888, 0.9902, 0.9921, and 0.9963, respectively. Accordingly, the ResNet-50 model achieved the highest accuracy. It was followed by Inception-v3, ResNet-18, EfficientNet-b0, VGG16, the proposed MSU-CNN, GoogLeNet, and AlexNet, respectively.
The F1-scores of the classification models are shown in Figure 8. The F1-score offers a balanced assessment of both precision and sensitivity. If either of these metrics is significantly low, the F1-score will also reflect a low value.
The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved F1-scores of 0.9466, 0.9669, 0.9672, 0.9786, 0.9886, 0.99, 0.9919, and 0.9978, respectively. According to these results, the ResNet-50 model was the best at balancing precision and sensitivity. It was followed by Inception-v3, ResNet-18, EfficientNet-b0, VGG16, the proposed MSU-CNN, GoogLeNet, and AlexNet, respectively.
The precision values of the classification models are illustrated in Figure 9. Precision indicates the accuracy of the model’s positive predictions. In other words, it evaluates the proportion of instances predicted as positive that are actually positive.
The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved precision values of 0.9553, 0.9712, 0.9668, 0.9791, 0.9878, 0.9901, 0.9916, and 0.9935, respectively. According to these results, the ResNet-50 model produces the most reliable positive predictions. It is followed by Inception-v3, ResNet-18, EfficientNet-b0, VGG16, GoogLeNet, the proposed MSU-CNN, and AlexNet, respectively.
The sensitivity values of the classification models are illustrated in Figure 10. Sensitivity reflects the model’s ability to accurately detect true positives. In other words, it quantifies the proportion of actual positive instances that are correctly identified.
The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved sensitivity values of 0.9426, 0.9639, 0.9678, 0.9784, 0.9894, 0.9899, 0.9923, and 0.9984, respectively. According to these results, the ResNet-50 model has the highest ability to correctly identify true positives. It is followed in the sensitivity ranking by Inception-v3, ResNet-18, EfficientNet-b0, VGG16, the proposed MSU-CNN, GoogLeNet, and AlexNet, respectively.
The specificity values of the classification models used are presented in Figure 11. These values reflect the ability of each model to correctly detect negative cases. The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved specificity values of 0.9426, 0.9639, 0.9678, 0.9784, 0.9894, 0.9899, 0.9923, and 0.9956, respectively. According to these specificity results, the ResNet-50 model is the best at distinguishing true negatives. It is followed in the specificity ranking by Inception-v3, ResNet-18, EfficientNet-b0, VGG16, the proposed MSU-CNN, GoogLeNet, and AlexNet, respectively.
The ROC-AUC values of the models are presented in Figure 12. AUC represents the area under the ROC curve. It ranges from 0 to 1 and serves as a single metric summarizing the model’s performance. An AUC of 1 indicates a perfect model, meaning it distinguishes all classes without error.
The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved AUC values of 0.9936, 0.9957, 0.9952, 0.9982, 0.9995, 0.9993, 0.9997, and 0.9997, respectively. According to these results, the ResNet-50 and Inception-v3 models separate the classes with fewer errors than the others. They are followed by EfficientNet-b0, ResNet-18, VGG16, GoogLeNet, the proposed MSU-CNN, and AlexNet, respectively.
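One intuitive reading of AUC is that it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counting as half). A minimal pure-Python sketch illustrates this; the toy labels and scores below are purely illustrative, not outputs of any model in this study.

```python
# AUC as a ranking statistic (Mann-Whitney formulation): the fraction of
# positive-negative pairs in which the positive sample is scored higher.

def roc_auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # one positive ranked below one negative
print(roc_auc(labels, scores))             # 8 of 9 pairs correct: 0.888...
```

A perfectly separating model (every positive scored above every negative) would return 1.0, matching the statement above that an AUC of 1 indicates error-free class separation.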
In this study, data augmentation was applied to the training dataset. The augmentation was limited to random rotation, random horizontal translation, and random vertical translation. Other transformations, such as horizontal or vertical flipping and random scaling, were also tried but did not yield good results. Based on these trials, random rotation was set to random angles between −10 and +10 degrees, and random horizontal and vertical translations were set to shifts of between −5 and +5 pixels.
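As an illustration of this policy (not the authors' exact pipeline, and the parameter names are my own), the augmentation can be expressed as a per-image parameter sampler; in torchvision it would roughly correspond to `RandomAffine(degrees=10, translate=(5/W, 5/H))`.

```python
import random

# Sketch of the augmentation policy described above: each training image gets
# a random rotation in [-10, +10] degrees and independent horizontal and
# vertical shifts in [-5, +5] pixels.

def sample_augmentation(rng: random.Random) -> dict:
    return {
        "rotation_deg": rng.uniform(-10.0, 10.0),  # random rotation angle
        "shift_x_px":   rng.uniform(-5.0, 5.0),    # horizontal translation
        "shift_y_px":   rng.uniform(-5.0, 5.0),    # vertical translation
    }

rng = random.Random(42)                            # seeded for reproducibility
for _ in range(1000):
    p = sample_augmentation(rng)
    assert -10.0 <= p["rotation_deg"] <= 10.0
    assert -5.0 <= p["shift_x_px"] <= 5.0 and -5.0 <= p["shift_y_px"] <= 5.0
```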
To summarize, in this study, seven pre-trained convolutional neural networks (AlexNet, GoogLeNet, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50) and a purpose-designed model, MSU-CNN, are evaluated with k-fold cross-validation to ensure robust and reliable performance for pistachio type classification. These networks are distinguished by their different architectures and features; each has strengths and drawbacks, which lead to performance differences. AlexNet is a basic architecture for deep networks: it is simple and fast, but its generic feature set can limit its learning capacity. GoogLeNet has a very deep structure, can learn more features, and offers high computational efficiency, although its training process may be longer. The proposed MSU-CNN was designed specifically for this classification task; it has fewer layers than the other networks and is fast, but because it is tailored to this task, its performance may vary on different datasets. VGG16 tries to learn in-depth features using its filters and has a simple structure, but it uses a large number of parameters. EfficientNet applies an advanced network-scaling strategy, and EfficientNet-b0 is its smallest model; it generally provides higher performance with fewer parameters. ResNet-18 and ResNet-50 use residual connections to train deep networks; ResNet-50 has a deeper structure with 50 layers, and its residual connections allow the network to learn effectively despite its depth. Inception-v3 learns features in parallel with filters of different sizes using Inception modules, providing high accuracy and efficient computation, although its training process can be long and requires more computational resources.
As a result, ResNet-50 and Inception-v3 networks can learn more complex features and thus achieve higher accuracies due to their high depth and parallel processing capacity.
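The transfer-learning idea behind the seven pre-trained models can be sketched in miniature: freeze the learned feature extractor and train only a new two-class head. The "extractor" below is a stand-in random projection, not an actual pre-trained CNN, and the dimensions are arbitrary; the sketch only illustrates the mechanics of freezing features and updating the head.

```python
import math
import random

random.seed(1)
D_IN, D_FEAT = 8, 4
# Stand-in for pre-trained convolutional layers: a fixed (frozen) projection.
W_frozen = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_FEAT)]

def features(x):
    """Frozen feature extractor (never updated during fine-tuning)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W_frozen]

w_head = [0.0] * D_FEAT   # new 2-class head: the only trained part
b_head = 0.0

def predict(x):
    """Sigmoid output of the new head on frozen features."""
    z = sum(w * f for w, f in zip(w_head, features(x))) + b_head
    return 1.0 / (1.0 + math.exp(-z))

# One SGD step on a single positive example (label 1):
x, y, lr = [0.5] * D_IN, 1.0, 0.1
g = predict(x) - y                  # gradient of log-loss w.r.t. the logit
f = features(x)
w_head = [w - lr * g * fi for w, fi in zip(w_head, f)]
b_head -= lr * g
```

After this single step the head's output on `x` moves above 0.5, while `W_frozen` is untouched, which is exactly the division of labor in fine-tuning a pre-trained network.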

4.2. Comparison of the Proposed Approaches with the Literature

Comparison of the proposed approaches with the literature is presented in Table 6. To facilitate comparison, this table lists the sample size, class count, classifier used, and classification accuracy for each study.
The literature studies in the table show that both classical machine learning and deep learning-based classifiers have been used to classify pistachios, and hybrid approaches also exist. In these studies, pistachios are classified as open- or closed-shelled [8,15,19,22], as defective or not [12,13], or as open, closed, and rotten [11]; adulteration in pistachios is detected with a CNN [1]; and pistachio varieties are classified [10,14,16,17,18,21].
Some of the studies classifying pistachio varieties used the same pistachio dataset as this study, while others used different datasets. Compared with the studies using the same dataset, namely Ozkan et al. (2021) [21], Singh et al. (2022) [17], Avuclu (2023) [18], and Kumar et al. (2023) [14], all models in this study achieve higher classification accuracy when the dataset is divided into 80% training and 20% testing, except for the AlexNet, proposed MSU-CNN, and GoogLeNet models. Singh et al. (2022) [17] used the same dataset as this study, split into 80% training and 20% testing, and reported success rates of 94.42%, 98.84%, and 98.14% for the AlexNet, VGG16, and VGG19 models, respectively, with VGG16 achieving the highest rate of 98.84%. In this study, the VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved higher success rates than those reported in their study. Sabah and Abu-Naser (2024) [16] used a different dataset composed of Kirmizi and Siirt types, split into 80% training and 20% testing, and achieved a classification accuracy of 99.91%. Although trained on a different dataset, the proposed ResNet-50 model outperformed their model, reaching 100% accuracy and demonstrating its effectiveness. Additionally, the MSU-CNN model designed in this study outperforms Ozkan et al. (2021) [21], Avuclu (2023) [18], and Kumar et al. (2023) [14] on the same dataset in terms of classification accuracy. Moreover, the average classification accuracy of the EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models with 5-fold cross-validation in this study is higher than that of the Ozkan et al. (2021) [21], Singh et al. (2022) [17], Avuclu (2023) [18], and Kumar et al. (2023) [14] studies.

5. Conclusions

This study focuses on classifying Kirmizi and Siirt pistachios using deep learning-based approaches, with k-fold cross-validation employed to enhance model generalization and prevent overfitting. For classification, seven convolutional neural network models trained with transfer learning, along with the proposed MSU-CNN model, were used. The dataset consists of 2148 images, 1232 of Kirmizi and 916 of Siirt pistachios. The AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models achieved average classification accuracies of 94.88%, 96.79%, 96.79%, 97.90%, 98.88%, 99.02%, 99.21%, and 99.63%, respectively, with 5-fold cross-validation; the highest accuracy was attained by ResNet-50. The proposed MSU-CNN model is competitive with the pre-trained models and performs remarkably well in accurately classifying pistachio species. Since each fold of 5-fold cross-validation corresponds to an 80% training-20% testing split, the highest single-fold classification accuracies for the AlexNet, GoogLeNet, proposed MSU-CNN, VGG16, EfficientNet-b0, ResNet-18, Inception-v3, and ResNet-50 models were 98.84%, 98.37%, 97.21%, 99.07%, 99.53%, 99.53%, 99.77%, and 100%, respectively; the highest accuracy for the 80% training-20% testing split was again attained by ResNet-50. Performance evaluation was carried out using sensitivity, specificity, precision, F1-score, ROC curves, and AUC values. These findings demonstrate that the proposed models can classify pistachio species effectively.
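For concreteness, the correspondence between 5-fold cross-validation and an 80%/20% split follows from the fold sizes. A sketch of the usual fold-size rule (the first n mod k folds receive one extra sample) applied to the 2148-image dataset:

```python
# Each of the 5 folds is held out once as a ~20% test set; the remaining
# ~80% is used for training, so every image is tested exactly once.

def kfold_sizes(n: int, k: int) -> list:
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]

test_sizes = kfold_sizes(2148, 5)
print(test_sizes)                      # [430, 430, 430, 429, 429]
print([2148 - t for t in test_sizes])  # matching training-set sizes
```

This also explains why the per-fold accuracies in Table 3 step in increments of roughly 1/430 ≈ 0.23 percentage points.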
Future work could involve applying different artificial intelligence methods and models, scaled to the total number of images in the dataset, to achieve higher classification accuracies. The proposed model can be further adapted with more complex layer configurations, making it suitable for both pistachio classification and other classification tasks. A broader classification study could be carried out by gathering images of different pistachio varieties. Additionally, the implementation could be developed into a mobile platform for pistachio species identification in agricultural fields.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in the study can be accessed from the link https://www.muratkoklu.com/datasets/Pistachio_Image_Dataset.zip (accessed on 30 November 2024).

Acknowledgments

The author thanks the Selcuk University Scientific Research Projects Coordinatorship for supporting this manuscript.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Çinarer, G.; Dogan, N.; Kiliç, K.; Dogan, C. Rapid detection of adulteration in pistachio based on deep learning methodologies and affordable system. Multimed. Tools Appl. 2024, 83, 14797–14820.
  2. Grace, M.H.; Esposito, D.; Timmers, M.A.; Xiong, J.; Yousef, G.; Komarnytsky, S.; Lila, M.A. Chemical composition, antioxidant and anti-inflammatory properties of pistachio hull extracts. Food Chem. 2016, 210, 85–95.
  3. Mandalari, G.; Barreca, D.; Gervasi, T.; Roussell, M.A.; Klein, B.; Feeney, M.J.; Carughi, A. Pistachio Nuts (Pistacia vera L.): Production, Nutrients, Bioactives and Novel Health Effects. Plants 2022, 11, 21.
  4. Nadimi, A.E.; Ahmadi, Z.; Falahati-pour, S.K.; Mohamadi, M.; Nazari, A.; Hassanshahi, G.; Ekramzadeh, M. Physicochemical properties and health benefits of pistachio nuts: A comprehensive review. Int. J. Vitam. Nutr. Res. 2020, 90, 564–574.
  5. Bulló, M.; Juanola-Falgarona, M.; Hernández-Alonso, P.; Salas-Salvadó, J. Nutrition attributes and health effects of pistachio nuts. Br. J. Nutr. 2015, 113, S79–S93.
  6. Derbyshire, E.; Higgs, J.; Feeney, M.J.; Carughi, A. Believe it or ‘nut’: Why it is time to set the record straight on nut protein quality: Pistachio (Pistacia vera L.) focus. Nutrients 2023, 15, 2158.
  7. Saglam, C.; Cetin, N. Prediction of Pistachio (Pistacia vera L.) Mass Based on Shape and Size Attributes by Using Machine Learning Algorithms. Food Anal. Meth. 2022, 15, 739–750.
  8. Mahdavi-Jafari, S.; Salehinejad, H.; Talebi, S. A Pistachio Nuts Classification Technique: An ANN Based Signal Processing Scheme. In Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation, Vienna, Austria, 10–12 December 2008; pp. 447–451.
  9. Mahmoudi, A.; Omid, M.; Aghagolzadeh, A. Artificial neural network based separation system for classifying pistachio nuts varieties. In Proceedings of the International Conference on Innovations in Food and Bioprocess Technologies, Pathum Thani, Thailand, 12–14 December 2006.
  10. Omid, M.; Firouz, M.S.; Nouri-Ahmadabadi, H.; Mohtasebi, S.S. Classification of peeled pistachio kernels using computer vision and color features. Eng. Agric. Environ. Food 2017, 10, 259–265.
  11. Farazi, M.; Abbas-Zadeh, M.J.; Moradi, H. A machine vision based pistachio sorting using transferred mid-level image representation of Convolutional Neural Network. In Proceedings of the 2017 10th Iranian Conference on Machine Vision and Image Processing (MVIP), Isfahan, Iran, 22–23 November 2017; pp. 145–148.
  12. Abbaszadeh, M.; Rahimifard, A.; Eftekhari, M.; Zadeh, H.G.; Fayazi, A.; Dini, A.; Danaeian, M. Deep learning-based classification of the defective pistachios via deep autoencoder neural networks. arXiv 2019, arXiv:1906.11878.
  13. Dini, A.; Zadeh, H.G.; Rahimifard, A.; Fayazi, A.; Eftekhari, M.; Abbaszadeh, M. Designing a hardware system to separate defective pistachios from healthy ones using deep neural networks. Iran. J. Biosyst. Eng. 2020, 51, 149–159.
  14. Kumar, S.S.; Sigappi, A.N.; Thomas, G.A.S.; Robinson, Y.H.; Raja, S.P. Classification and Analysis of Pistachio Species Through Neural Embedding-Based Feature Extraction and Small-Scale Machine Learning Techniques. Int. J. Image Graph. 2024, 24, 23.
  15. Rahimzadeh, M.; Attar, A. Detecting and counting pistachios based on deep learning. Iran J. Comput. Sci. 2022, 5, 69–81.
  16. Sabah, A.S.; Abu-Naser, S.S. Pistachio Variety Classification using Convolutional Neural Networks. Int. J. Acad. Inf. Syst. Res. (IJAISR) 2024, 8, 113–119.
  17. Singh, D.; Taspinar, Y.S.; Kursun, R.; Cinar, I.; Koklu, M.; Ozkan, I.A.; Lee, H.N. Classification and Analysis of Pistachio Species with Pre-Trained Deep Learning Models. Electronics 2022, 11, 981.
  18. Avuçlu, E. Classification of pistachio images with the resnet deep learning model. Selcuk. J. Agric. Food Sci. 2023, 37, 291–300.
  19. Karadag, A.E.; Kiliç, A. Non-destructive robotic sorting of cracked pistachio using deep learning. Postharvest Biol. Technol. 2023, 198, 112229.
  20. Turkay, Y.; Tamay, Z.S. Pistachio Classification Based on Acoustic Systems and Machine Learning. Elektron. Elektrotech. 2024, 30, 4–13.
  21. Ozkan, I.A.; Koklu, M.; Saraçoglu, R. Classification of Pistachio Species Using Improved k-NN Classifier. Prog. Nutr. 2021, 23, e2021044.
  22. Liu, X.Y.; Yang, L.H.; Zhu, H.F.; Zhang, L.Q.; An, Y.Y.; Wei, L.N.; Han, Z.Z. The pistachio quality detection based on deep features plus unsupervised clustering. J. Food Process Eng. 2024, 47, e14519.
  23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
  24. Dandil, E.; Polattimur, R. Dog behavior recognition and tracking based on faster R-CNN. J. Fac. Eng. Archit. Gazi Univ. 2020, 35, 819–834.
  25. Chen, H.Z.; Chen, A.; Xu, L.L.; Xie, H.; Qiao, H.L.; Lin, Q.Y.; Cai, K. A deep learning CNN architecture applied in smart near-infrared analysis of water pollution for agricultural irrigation resources. Agric. Water Manag. 2020, 240, 106303.
  26. Arsa, D.M.S.; Susila, A.A.N.H. VGG16 in batik classification based on random forest. In Proceedings of the 2019 International Conference on Information Management and Technology (ICIMTech), Jakarta, Indonesia, 19–20 August 2019; pp. 295–299.
  27. Bayar, B.; Stamm, M.C. A deep learning approach to universal image manipulation detection using a new convolutional layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, Galicia, Spain, 20–22 June 2016; pp. 5–10.
  28. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
  29. Akhtar, N.; Ragavendran, U. Interpretation of intelligence in CNN-pooling processes: A methodological survey. Neural Comput. Appl. 2020, 32, 879–898.
  30. Habib, G.; Qureshi, S. Optimization and acceleration of convolutional neural networks: A survey. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 4244–4268.
  31. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 2, pp. 3320–3328.
  32. Kolar, Z.; Chen, H.; Luo, X. Transfer learning and deep convolutional neural networks for safety guardrail detection in 2D images. Autom. Constr. 2018, 89, 58–70.
  33. Stegmayer, G.; Milone, D.H.; Garran, S.; Burdyn, L. Automatic recognition of quarantine citrus diseases. Expert Syst. Appl. 2013, 40, 3512–3517.
  34. Atas, M.; Yardimci, Y.; Temizel, A. A new approach to aflatoxin detection in chili pepper by machine vision. Comput. Electron. Agric. 2012, 87, 129–141.
  35. Inan, O.; Uzer, M.S. A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation. Arab. J. Sci. Eng. 2021, 46, 1199–1212.
  36. Uzer, M.S.; Inan, O.; Yilmaz, N. A hybrid breast cancer detection system via neural network and feature selection based on SBS, SFS and PCA. Neural Comput. Appl. 2013, 23, 719–728.
  37. Tutuncu, K.; Cataltas, O.; Koklu, M. Occupancy detection through light, temperature, humidity and CO2 sensors using ANN. Int. J. Ind. Electron. Electr. Eng. 2016, 5, 63–67.
  38. Koklu, M.; Tutuncu, K. Classification of chronic kidney disease with most known data mining methods. Int. J. Adv. Sci. Eng. Technol. 2017, 5, 14–18.
  39. Acharya, U.R.; Fernandes, S.L.; WeiKoh, J.E.; Ciaccio, E.J.; Fabell, M.K.M.; Tanik, U.J.; Rajinikanth, V.; Yeong, C.H. Automated detection of Alzheimer’s disease using brain MRI images–a study with various feature extraction techniques. J. Med. Syst. 2019, 43, 302.
  40. Rajinikanth, V.; Joseph Raj, A.N.; Thanaraj, K.P.; Naik, G.R. A customized VGG19 network with concatenation of deep and handcrafted features for brain tumor detection. Appl. Sci. 2020, 10, 3429.
  41. Koklu, M.; Kursun, R.; Taspinar, Y.S.; Cinar, I. Classification of date fruits into genetic varieties using image analysis. Math. Probl. Eng. 2021, 2021, 4793293.
  42. Taspinar, Y.S.; Cinar, I.; Koklu, M. Classification by a stacking model using CNN features for COVID-19 infection diagnosis. J. X-Ray Sci. Technol. 2022, 30, 73–88.
Figure 1. Sample images of pistachio species utilized in the study.
Figure 2. Block representation of the proposed approaches.
Figure 3. Network architecture of proposed MSU-CNN model.
Figure 4. Training accuracy and loss graphs of fold-5 of the ResNet-50 model.
Figure 5. Confusion matrix with 5-fold CV for the ResNet-50 model.
Figure 6. ROC curves with 5-fold CV for the ResNet-50 model.
Figure 7. Comparison of classifier accuracies using 5-fold CV.
Figure 8. Comparison of classifier F1-scores using 5-fold CV.
Figure 9. Comparison of classifier precisions using 5-fold CV.
Figure 10. Comparison of classifier sensitivities using 5-fold CV.
Figure 11. Evaluation of specificity of classifiers using 5-fold CV.
Figure 12. Comparison of classifier ROC-AUC values using 5-fold CV.
Table 1. Confusion matrix representation for the binary classification.

                  Predicted Positive    Predicted Negative
Actual Positive   True Positive (TP)    False Negative (FN)
Actual Negative   False Positive (FP)   True Negative (TN)
Table 2. Formulas for performance metrics of the classifier.

Performance Metric   Formula
Accuracy             (TP + TN) / (TP + TN + FP + FN)
Sensitivity          TP / (TP + FN)
Specificity          TN / (TN + FP)
Precision            TP / (TP + FP)
F1-score             2 × TP / (2 × TP + FP + FN)
Table 3. Classification accuracy and training parameters of all models. Fold columns give the classification accuracies of 5-fold cross-validation (%).

Network Model      Optimizer  Epochs  Batch Size  Fold1   Fold2   Fold3   Fold4   Fold5   Mean Accuracy
AlexNet            SGDM       20      9           93.94   90.47   95.81   98.84   95.34   94.88
GoogLeNet          SGDM       10      11          96.74   96.05   98.37   96.05   96.74   96.79
Proposed MSU-CNN   Adam       40      12          96.97   96.51   97.21   96.28   96.97   96.79
VGG-16             SGDM       8       9           96.97   98.14   97.44   99.07   97.90   97.90
EfficientNet-b0    SGDM       8       9           99.53   98.60   99.30   98.60   98.37   98.88
ResNet-18          SGDM       17      13          98.83   99.07   98.60   99.07   99.53   99.02
Inception-v3       SGDM       8       9           99.30   99.07   98.60   99.77   99.30   99.21
ResNet-50          Adam       15      16          100     100     100     99.07   99.07   99.63
Table 4. Performance metrics of all models (mean performance metrics of 5-fold CV).

Networks           Accuracy  F1-Score  Precision  Sensitivity  Specificity  ROC_AUC
AlexNet            0.9488    0.9466    0.9553     0.9426       0.9426       0.9936
GoogLeNet          0.9679    0.9669    0.9712     0.9639       0.9639       0.9957
Proposed MSU-CNN   0.9679    0.9672    0.9668     0.9678       0.9678       0.9952
VGG-16             0.9790    0.9786    0.9791     0.9784       0.9784       0.9982
EfficientNet-b0    0.9888    0.9886    0.9878     0.9894       0.9894       0.9995
ResNet-18          0.9902    0.9900    0.9901     0.9899       0.9899       0.9993
Inception-v3       0.9921    0.9919    0.9916     0.9923       0.9923       0.9997
ResNet-50          0.9963    0.9978    0.9935     0.9984       0.9956       0.9997
Table 5. Performance evaluation of the proposed ResNet-50 model.

Fold   Accuracy  Precision  Recall   Specificity  F1-Score  ROC_AUC
1      1         1          1        1            1         1
2      1         1          1        1            1         1
3      1         1          1        1            1         1
4      0.9907    0.9945     0.9837   0.9959       0.9891    0.9999
5      0.9907    0.9945     0.9836   0.9959       0.9890    0.9987
Mean   0.9963    0.9978     0.9935   0.9984       0.9956    0.9997
Table 6. Comparison of the proposed approaches with the literature.

References                                          Sample Size  Class Count  Classifier                                            Accuracy (%)
Mahdavi-Jafari, Salehinejad, and Talebi (2008) [8]  150          3            ANN                                                   99.89
Omid et al. (2017) [10]                             850          5            ANN                                                   99.40
                                                                              SVM                                                   99.80
Farazi, Abbas-Zadeh, and Moradi (2017) [11]         1000         3            AlexNet + SVM                                         98
                                                                              GoogleNet + SVM                                       99
Abbaszadeh et al. (2019) [12]                       305          2            Deep auto-encoder neural networks                     80.30
Dini et al. (2020) [13]                             958          2            GoogleNet                                             95.80
                                                                              ResNet                                                97.20
                                                                              VGG16                                                 95.83
Rahimzadeh and Attar (2021) [15]                    3927         2            ResNet50                                              85.28
                                                                              ResNet152                                             85.19
                                                                              VGG16                                                 83.32
Ozkan, Koklu, and Saracoglu (2021) [21]             2148         2            k-NN                                                  94.18
Singh et al. (2022) [17]                            2148         2            AlexNet                                               94.42
                                                                              VGG16                                                 98.84
                                                                              VGG19                                                 98.14
Avuclu (2023) [18]                                  2148         2            ResNet                                                86.16
Kumar et al. (2023) [14]                            2148         2            DNN-based feature extraction and logistic regression  97.20
Karadag and Kilic (2023) [19]                       3700         2            Deep learning-based object detection for open- and
                                                                              closed-shelled pistachios, respectively               98 and 85
Çinarer et al. (2024) [1]                           1200         6            VGGNet-19 in the LAB color space and with 5-fold CV,
                                                                              respectively                                          100 and 99.16
Sabah and Abu-Naser (2024) [16]                     6000         2            Augmentation technique and VGG16                      99.91
Liu et al. (2024) [22]                              2095         2            ResNet18 and clustering                               99.31
This study                                          2148         2            AlexNet (5-fold CV)                                   94.88
                                                                              Proposed MSU-CNN (5-fold CV)                          96.79
                                                                              GoogLeNet (5-fold CV)                                 96.79
                                                                              VGG16 (5-fold CV)                                     97.90
                                                                              EfficientNet-b0 (5-fold CV)                           98.88
                                                                              ResNet-18 (5-fold CV)                                 99.02
                                                                              Inception-v3 (5-fold CV)                              99.21
                                                                              ResNet-50 (5-fold CV)                                 99.63
                                                                              AlexNet (80% train-20% test)                          98.84
                                                                              Proposed MSU-CNN (80% train-20% test)                 97.21
                                                                              GoogLeNet (80% train-20% test)                        98.37
                                                                              VGG16 (80% train-20% test)                            99.07
                                                                              EfficientNet-b0 (80% train-20% test)                  99.53
                                                                              ResNet-18 (80% train-20% test)                        99.53
                                                                              Inception-v3 (80% train-20% test)                     99.77
                                                                              ResNet-50 (80% train-20% test)                        100
