Article

A Classifier Model Using Fine-Tuned Convolutional Neural Network and Transfer Learning Approaches for Prostate Cancer Detection

by Murat Sarıateş and Erdal Özbay *
Department of Computer Engineering, Firat University, 23119 Elazig, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(1), 225; https://doi.org/10.3390/app15010225
Submission received: 29 November 2024 / Revised: 23 December 2024 / Accepted: 28 December 2024 / Published: 30 December 2024

Abstract

Background: Accurate and reliable classification models play a major role in clinical decision-making processes for prostate cancer (PCa) diagnosis. However, existing methods often demonstrate limited performance, particularly when applied to small datasets and binary classification problems. Objectives: This study aims to design a fine-tuned deep learning (DL) model capable of classifying PCa MRI images with high accuracy and to evaluate its performance by comparing it with various DL architectures. Methods: In this study, a basic convolutional neural network (CNN) model was developed and subsequently optimized using techniques such as L2 regularization, Tanh activation, dropout, and early stopping to enhance its performance. Additionally, a pyramid-type CNN architecture was designed to simultaneously evaluate both fine details and broader structures by combining low- and high-resolution information through feature maps extracted from different CNN layers. This approach enabled the model to learn complex features more effectively. For performance comparison, the developed fine-tuned enhanced pyramid network (FT-EPN) model was benchmarked against Vgg16, Vgg19, ResNet50, InceptionV3, DenseNet121, and Xception models trained using transfer learning (TL) techniques. It was also compared to next-generation models such as the vision transformer (ViT) and MaxViT-v2. Results: The developed fine-tuned model achieved an accuracy rate of 96.77%, outperforming the pre-trained TL models as well as ViT and MaxViT-v2. Among the TL models, Vgg19 achieved the highest accuracy rate at 92.74%, while ViT achieved 93.55% and MaxViT-v2 achieved 95.16%. Conclusions: This study presents an optimized FT-EPN model that enhances the performance of DL models for PCa classification, offering a reference solution for future research. The model provides significant advantages in terms of classification accuracy and simplicity and has been evaluated as an effective solution for clinical applications.

1. Introduction

Cancer is one of the most significant diseases worldwide due to its mortality rates and impact on public health [1]. According to the World Health Organization (WHO), millions of people are diagnosed with cancer every year, and the disease is considered one of the most common causes of death after cardiovascular diseases [2]. The most common types include lung, breast, colorectal, prostate, and stomach cancers [3]. Prostate cancer (PCa) is the second most common cancer in men and usually occurs at older ages; annual PCa cases are predicted to exceed 1.5 million [4]. Diagnosis rates are higher in developed countries, where screening programs and early detection methods are more widespread. Treatment success, however, is often directly related to early diagnosis, so early diagnosis and management of PCa occupy an important place in the fight against the disease. Although screening incentives, avoidance of carcinogens, a balanced diet, and regular prostate-specific antigen (PSA) tests have been suggested to reduce PCa deaths, the contribution of these measures has been limited. Traditional diagnostic methods are generally invasive, imposing difficult processes on patients and heavy financial burdens on the healthcare system [5]. The prevalence of PCa, especially in countries with large elderly populations, increases the need for medical care and the workload of healthcare personnel. Therefore, the development of automated methods that provide less invasive and more accurate diagnoses from medical imaging data has become an important research area in PCa diagnosis [6].
In recent years, artificial intelligence techniques such as deep learning (DL) and transfer learning (TL) have advanced significantly, particularly in image classification [7]. DL models such as convolutional neural networks (CNNs) have achieved high success rates in image processing and classification tasks. TL, in turn, delivers faster and more effective results by fine-tuning previously trained models for new tasks [8]. In this study, a CNN model was developed to classify prostate magnetic resonance imaging (MRI) data for effective prostate cancer detection (PCaD), and the TL technique was employed to enhance its accuracy. The use of artificial intelligence in healthcare has emerged as a promising approach for PCaD [9]. In addition to various DL and TL models, machine learning (ML) models also strive to improve diagnostic accuracy while reducing the workload of medical personnel [10]. Positron emission tomography (PET) and computed tomography (CT), like MRI, are fundamental tools for imaging tissues and organs in PCaD; however, these images often contain noise that must be removed before diagnosis [11]. In the study by Pepe et al., PSMA PET/CT achieved an overall diagnostic accuracy of 92.3% in detecting nodal metastases in clinically significant PCa, with 100% accuracy in ISUP Grade Group 2 patients and 63.6% in Grade Group 5 patients; furthermore, negative PSMA PET/CT results did not exclude the need for extended pelvic lymph node dissection (ePLND) in high-risk patients or in the presence of ductal adenocarcinoma [12]. Fiorentino et al. found that the biopsy Gleason score, which is important in determining the treatment method in PCa, agreed with the Gleason score after radical prostatectomy in 82.6% of cases, while in 17.4% of patients the biopsy underestimated the true aggressiveness of the tumor [13]. Their study showed that Gleason score agreement increased when the ratio between the total volume of the biopsy cores in the tumor area and the tumor volume after radical prostatectomy was >0.05, and that certain threshold values related to tumor volume at biopsy and total biopsy volume were critical for accurate Gleason score estimation. In a review, Sekhoacha et al. summarized PCa diagnostic methods, the genetic mutations involved in disease onset and progression, current treatment options, and research on alternative approaches such as conventional medicine, nanotechnology, and gene therapy, highlighting the impact of combined treatment strategies for a disease that remains incurable [14]. Transferring knowledge from different datasets to the target dataset through TL can significantly enhance the accuracy of PCa diagnosis models [15]. In this context, the fine-tuned enhanced pyramid network (FT-EPN) model proposed in our study was tested on a dataset containing benign and malignant PCa MRI images.
To evaluate the performance of the proposed model for PCaD, tests were conducted on a publicly available dataset and the results were analyzed. The developed model aims to enhance the PCaD process and add value to the healthcare system by contributing to accurate and rapid diagnosis. This study introduces a new approach to PCaD that addresses specific limitations identified in previous research: in some prior work, the benchmark dataset was not fully utilized or only a single-split validation was performed; the accuracy of many existing models remains below 90% and requires improvement; and some studies exhibit significant discrepancies between precision and recall, leading to classification bias [16,17,18]. To address these limitations, a robust CNN with a fine-tuned structure and high accuracy in classifying PCa MRI images was developed in this study. Regularization techniques and hyperparameter tuning were applied to prevent overfitting and improve the model’s generalization ability, and a deep CNN architecture was designed to capture the complex features of PCa MRI images, achieving a classification accuracy of 96.77%. The model’s effectiveness was further evaluated against common TL models and popular architectures such as ViT and MaxViT-v2. We propose FT-EPN, a fine-tuned, enhanced pyramid-type deep network model that facilitates more effective learning of complex features by gradually adjusting the number of neurons across layers, enabling accurate classification of PCa MRI images.
The main contributions of the paper are summarized as follows:
  • A powerful and flexible FT-EPN model was developed to classify PCa images. Unlike the basic model, this model can be fine-tuned in terms of the number of layers, neuron structure, and activation functions. Its multi-layered structure provides high accuracy by extracting both local and abstract features.
  • To improve model performance, architectural depth was enhanced using regularization techniques such as L2 regularization, dropout layers, Tanh activation, early stopping, and an appropriate learning rate. The effects of hyperparameters were tested through large-scale studies to determine the most suitable configuration, preventing overfitting and improving generalization ability.
  • A deep, pyramid-type CNN architecture was designed to better capture the complex features of prostate images. The information extraction capacity of the model was increased by adding convolutional layers before the MaxPooling layers, resulting in a classification accuracy of 96.77%.
  • The effective performance of the developed FT-EPN model was evaluated using various metrics. Classification was also performed with Vgg16, Vgg19, ResNet50, InceptionV3, DenseNet121, and Xception architectures pre-trained on ImageNet, as well as with the next-generation ViT and MaxViT-v2 models.

2. Related Works

In recent years, many researchers have addressed the problem of classifying PCa MRI images using ML, DL, and TL techniques. Although public datasets exist for PCa MRI image classification, most of them have limited or incomplete access. Various models have been developed for PCaD, and different techniques have been applied to improve their performance. Chen et al. proposed a method for PCa classification by pre-training InceptionV3 and Vgg16 models on ImageNet and fine-tuning them with multi-parametric MRI data [19]. Xu et al. used residual networks that can learn both low-level and high-level features to detect PCa and demonstrated the potential of these networks in detecting complex disease symptoms [20].
Singh et al. proposed a novel DL technique combining CNN and TL for PCaD in MRI images, achieving 87% accuracy, 85% specificity, and 89% sensitivity [21]. Linkon et al. investigated DL techniques in histopathology images for PCa diagnosis and Gleason grading, demonstrating the potential for accurate diagnosis and classification [22]. Rabilloud et al. systematically evaluated DL techniques in the digital pathology of PCa, summarizing the current status, innovations, and challenges, and achieved 99% AUC performance [23].
Alkadi et al. used a deep CNN with a 3D sliding-window technique to segment prostate lesions in T2W MRI images, achieving 99% AUC, 89% accuracy, and 92% recall [24]. Viswanath et al. evaluated the performance of supervised classifiers for PCaD with T2W MRI in a multicenter setting, aiming to improve treatment through early and accurate diagnosis; the QDA classifier showed the highest performance, with AUC values of 0.735, 0.683, and 0.768 at the three centers [25]. Abraham and Nair proposed a method for PCa grading that combines automatically and manually generated features, using autoencoders to extract features and softmax for classification, and achieved a kappa score of 0.2326 and a PPV of 80.26% [26]. Song et al. developed a patch-based DCNN model to distinguish cancerous from non-cancerous patients using MRI data from regions manually marked by radiologists; the model achieved an AUC of 0.944, a sensitivity of 87%, and a specificity of 90.6% [27].
Yu et al. used Faster R-CNN and PI-RADSAI methods for PCaD, improving accuracy and speed with Adam and SGD dual optimizers and human-in-the-loop DL analysis [28]. Bygari et al. achieved 92.38% accuracy on histopathological images with a model composed of UNet, Xception, and EfficientNetB7 [29]. Zhu et al. developed a CNN method to predict the origin of bone metastatic cancer, surpassing existing methods with 95.2% accuracy [30]. Talaat et al. achieved 95.24% accuracy and high sensitivity by improving the ResNet50-based PCaDM model with Faster R-CNN and dual optimizers [31]. Salman et al. proposed an automatic cancer detection system that achieved 97% accuracy on prostate biopsy images [32].
Recent studies have aimed to improve the accuracy and speed of DL, TL, and ViT methods for MRI-based PCaD. However, limitations remain, including dependency on large labeled datasets, difficulties with interpretability, and risks of overfitting. To overcome these problems, this study uses strategies adapted to small datasets, including hybrid models and a pyramid network learning approach.

3. Materials and Methods

In this study, experiments were conducted on various DL architectures to compare the performance of different model configurations, and the results were thoroughly evaluated. First, a basic model was developed, and the pyramid-type FT-EPN architecture was designed by modifying the number of neurons in the layers of the basic model; its performance was further enhanced through fine-tuning and TL. To assess the effectiveness of the proposed FT-EPN model, its performance was compared with traditional pre-trained TL models, with the ViT, which has gained popularity in recent years for producing more effective results than CNNs, particularly when data are insufficient, and with the MaxViT-v2 model, a hybrid combination of CNN and ViT. The models and methods employed in this study ensured an objective and fair comparison of all model performances.

3.1. Dataset

A dataset of PCa MRI images, publicly available to researchers via the Kaggle platform, was used in this study. The dataset consists of a total of 620 prostate MRI images divided into two classes: 240 ‘benign’ and 380 ‘malignant’. Each MRI image is black and white, with 8-bit depth and 96 dpi resolution, in JPG format. The images in the benign class have a resolution of 320 × 320 pixels, while those in the malignant class have a resolution of 256 × 256 pixels. A sample image from each class in the dataset is shown in Figure 1.
To facilitate model training, the dataset was first converted to NumPy arrays (NumPy 2.2.1, a widely used scientific computing library for the Python programming language). Labels were assigned to each sample based on its class name and converted into binary vectors using the LabelEncoder, enabling effective classification. To calculate classification accuracies, the dataset was split into 60% training, 20% validation, and 20% testing sets.

3.2. Base Model

In this study, instead of using pre-built CNN architectures, the Sequential model was employed to create the basic model. One of the main reasons for this choice is the small size of the dataset, which contains only two classes. In such cases, using more complex architectures like VggNet, DenseNet, or ResNet can lead to unnecessary computational overhead and an increased risk of overfitting. Additionally, the Sequential model’s advantages in rapid prototyping and development proved highly beneficial for achieving results within a short timeframe. Its straightforward and understandable structure accelerated the development process, enabling faster testing and verification. The structure of the Sequential model, which incorporates both classification and feature extraction stages, is illustrated in Figure 2.
Conv2D: These layers extract features from the input data. The first Conv2D layer uses 128 filters with a 3 × 3 kernel. The ReLU activation function is applied in this layer; it sets negative values to zero, provides a non-linear transformation, and increases the model’s learning capacity. To keep the output size the same as the input size after convolution, padding = ‘same’ is used. The second Conv2D layer uses 64 filters, also with ReLU activation, and processes the features extracted by the previous layer at a higher level.
MaxPooling2D: This layer reduces the dimensionality and complexity of the model by selecting the maximum value within a specified window size. One MaxPooling2D layer is added after each convolution layer. These layers represent the feature maps at a lower resolution, thereby reducing the computational load and minimizing the risk of overfitting.
Flatten: This layer converts the multidimensional outputs from the convolutional layers into a one-dimensional vector. This step is essential for fully connected layers, as they require flattened inputs to operate.
Dense: The model contains three dense layers. The first dense layer has 128 neurons and uses the ReLU activation function, providing non-linear processing of the features extracted by the convolutional layers. The second dense layer has 32 neurons, again using ReLU activation. The final dense layer is the output layer, which consists of 2 neurons and employs the sigmoid activation function. This layer is used for binary classification and outputs a probability value for each class.
The Adam optimization algorithm was used to train the Sequential-based model, with the training process carried out over 10 epochs.

3.3. Transfer Learning (TL)

TL is a learning method that reuses the knowledge from a previously trained model to solve a new problem. This approach speeds up model training and improves performance. In DL models, leveraging the features of a model that has been trained extensively on large datasets can reduce the quantity of data required for a new task and enhance generalization performance. TL enables the model to quickly adapt to complex tasks by utilizing previously learned features, while also reducing computational costs by shortening the training time. A schematic representation of TL through fine-tuning is shown in Figure 3.
In this study, to compare the performance of the proposed fine-tuned FT-EPN model, TL was applied to the same dataset using the most popular traditional models—Vgg16, Vgg19, ResNet50, InceptionV3, DenseNet121, and Xception architectures—which were pre-trained on ImageNet.
First, the Vgg16 model pre-trained on the ImageNet dataset was loaded, excluding the classification layers. The ‘include_top=False’ argument ensured that only the feature extraction part of the model was retained, preserving general features learned in the deep layers, such as edges, colors, and textures. The trainable parameter of all layers was set to False, freezing the weights so that they were not updated during training and only the upper layers learned. Since the main purpose of the TL strategy is to reuse previously learned features, this step enabled efficient general feature extraction from the data and significantly reduced training time. The optimal value for the patience parameter was determined to be 5, so the EarlyStopping mechanism halted training when no improvement in the validation loss (val_loss) was observed for 5 consecutive epochs. The restore_best_weights=True setting restored the model’s weights to their state at the minimum validation loss, which helped prevent overfitting and improved overall performance. A sequential model was then created, with the pre-trained Vgg16 base added as its first component, and the following new layers were added in order:
Flatten: This layer flattens the data coming from the Vgg16 base. The data from the Vgg16 model are multidimensional and are transformed into a one-dimensional vector using the Flatten() layer. This process reshapes the data into a form suitable for the classifier layers, allowing for the dense layers to process it effectively.
First Dense Layer: This layer contains 128 neurons. The features extracted by Vgg16 are reduced to a 128-dimensional vector at this stage. The Tanh activation function is used, which returns values between −1 and 1, limiting the output of each neuron within this range. This function helps mitigate the vanishing gradient problem and allows for the relationships between neurons to be expressed over a broader range. The L2 regularization value is set to 0.0001, helping reduce overfitting by preventing the weights from becoming excessively large.
Second Dense Layer: This layer contains 64 neurons and converts the 128-dimensional vector from the previous layer into a 64-dimensional vector. Again, the Tanh activation function and L2 regularization are applied.
Output Dense Layer: In this layer, 2 neurons are used, corresponding to the two classes in the dataset. The sigmoid activation function, commonly used in binary classification tasks, is employed here.
The TL models were trained with the Adam optimization algorithm, using a learning rate of 0.0001, a batch size of 32, and 10 epochs as the model parameters.

3.4. Vision Transformer (ViT)

Transformers are deep neural networks initially used in natural language processing. Owing to their powerful capabilities, transformer-based models have often performed similarly to or better than other network types, such as CNNs and RNNs, in visual benchmarking studies [33]. The ViT divides each image into small patches, which are flattened in positional order, with each patch represented at a fixed size. Positional encoding preserves the order of the patches, allowing the model to learn the relationship of each patch with the others. ViT models use the self-attention mechanism (SAM) to capture the context between distant pixels in the image, enabling the model to evaluate the relationship between any two points. Structurally, the inter-layer SAM allows each patch to establish relationships with all other patches, supporting more global feature learning. The structure of the ViT architecture used in this study is shown in Figure 4.
CNNs use convolutional layers to capture spatial hierarchies in images. These layers apply small local filters to learn features: basic edges and textures are learned in the initial layers, while more complex patterns are captured in deeper layers. Structurally, CNNs have a flat hierarchy, with larger, more global features represented in the upper layers. As a result, CNNs work quickly and effectively on smaller datasets and are particularly successful at detecting local patterns; they are also widely used, with structures well suited to TL. In contrast, ViT models can learn more complex relationships and achieve very high accuracies when trained on large datasets with high computational power. In particular, ViT models offer a significant advantage in tasks that require modeling global relationships.
In this study, an image classification model was developed to perform PCa classification using the ViT architecture. Unlike traditional CNN-based approaches, the model divides the images into small patches, uses each patch as an input unit, and models the relationship of each patch with all other patches. First, the image data are resized to 150 × 150 pixels, normalized, and then divided into patches. These patches are connected through the SAM used in the ViT model to learn global relationships within the image. Each patch passes through layers that include attention layers and multi-layer perceptron (MLP) blocks. The Transformer block essentially consists of two main components: multi-head self-attention and the MLP block. Self-attention first learns the relationships between different regions (patches) of the image, capturing global features; the MLP block then further enriches the representation of each patch by processing the output of the attention mechanism.
In the construction of the ViT classifier model, transformer blocks equipped with multi-layer attention and residual connections are built using patch-based image representations. In the model, MLP blocks are used to learn certain non-linear transformations, and the classification layer is adapted for binary classification with a sigmoid activation function. The model, compiled with the Adam optimization algorithm and binary cross-entropy loss function, is trained for 10 epochs with 10% validation data.

3.5. Hybrid Model (MaxViT-v2)

MaxViT, a multi-axis vision transformer, is a hybrid model that combines CNN and ViT architectures [34]. MaxViT uses a multi-axis attention mechanism to learn local and global features simultaneously, achieving high performance by effectively capturing both local details and global context in image processing tasks. Its block architecture combines a set of attention and convolution mechanisms designed to capture local and global features effectively: depthwise convolutions and block attention extract local features, while grid attention and channel attention learn global context and feature importance. The multi-axis attention mechanism allows the model to learn relationships across dimensions simultaneously, and skip connections and normalization techniques improve the training process and increase the depth of the model while preserving its performance. This hybrid approach gives MaxViT high accuracy and efficiency in computer vision tasks. Its architectural structure is shown in Figure 5.
While developing the model, a DataFrame was first created containing the location, name, label, and dimensions of each image. The images were resized to 224 × 224 and converted to tensors. Augmentation was performed by applying operations such as random horizontal flip, brightness and contrast adjustment, and random rotation. The dataset was divided into 80% training, 10% validation, and 10% test sets. The pre-trained MaxViT-v2 model was loaded using the Timm library. Then, the classification layer of the model was restructured to match the number of classes in the current dataset (2 classes). The weights of all layers were frozen, and only the classification layer was allowed to be trained. CrossEntropyLoss was used as the loss function, the Adam algorithm (lr = 0.001) was used for optimization, and the early stopping mechanism was defined to prevent overfitting during training. In this way, the model was trained for 15 epochs.

3.6. Developed Fine-Tuned Enhanced Pyramid Network (FT-EPN)

Our first strategy for improving the base model focused on reducing overfitting. To this end, dropout layers and L2 weight regularization were evaluated: among the tested regularization coefficients, 0.0001 proved the most effective, and although dropout rates of 0.3 and 0.4 were also tried, the L2 regularization approach performed better. L2 regularization adds a penalty term, proportional to the squares of the model weights, to the loss function. This penalty is calculated from the magnitudes of the weights and prevents them from growing very large: each element of the weight matrices is squared, and the sum of these values is added to the loss. L2 regularization thus updates the model’s loss function as shown in Equation (1):
$$L_{\text{new}} = L_{\text{original}} + \lambda \sum_{i} w_{i}^{2} \quad (1)$$
Here, $L_{\text{new}}$ represents the updated loss value after regularization, $L_{\text{original}}$ the original loss value, $\lambda$ the regularization coefficient (0.0001), and $w_{i}$ each weight in the model. This penalty keeps the weights from reaching very large values, producing a model with smaller weights; limiting the weights prevents the model from overfitting the training data and makes it more generalizable. When using the Adam optimization algorithm, the most appropriate learning rate was found to be 0.0001. The number of epochs was set to 10 and the patience value to 3, with early stopping applied so that training halted when no improvement was observed. A convolutional layer was added before each MaxPooling layer in the robust model created with the optimal learning rate, L2 regularization, and early stopping, resulting in an increase in accuracy.
A pyramid-type architecture was applied to help the model learn complex features more effectively, with the number of neurons adjusted gradually across the layers. In this strategy, the model learns simple features using fewer neurons in the first layers; the number of neurons increases in the deeper layers to extract more complex, high-level features and decreases again towards the last layers, supporting the model’s specialization for the output. This pyramid structure fine-tunes the model’s complexity and increases the accuracy rate by allowing the model to learn features at a different level in each layer.
Figure 6 illustrates the model’s pyramidal CNN architecture. In this pyramid-type structure, the number of filters increases gradually as the number of layers increases and decreases in the last layer. In this architecture, simple features (edges, corners, etc.) are extracted in the first layers, while more complex structures are learned as the model progresses deeper.
The last layer Conv2D (256 filters, 3 × 3 kernel size) plays an important role in the model and serves two main purposes:
  • Feature Enhancement: It provides a deeper and more detailed analysis of the features extracted in the previous layers of the model. In this layer, the complex patterns learned from the input data are processed at a higher level. As a result, the feature map becomes more meaningful and contains features that are more suitable for the model’s classification or detection tasks.
  • Reducing Channel Depth: Using 512 filters in the previous layer extracts a large number of features, which increases the computational cost. Reducing the number of filters in the last layer to 256 lightens the computational load of the model. Thus, a more compact representation is obtained before moving to the classification layer. This process helps the model work more efficiently while preserving the essential information needed.
A batch size of 32 was chosen because it provided the best accuracy and convergence speed during the training process. Considering the model’s performance and the dataset size, the reason for selecting the Tanh activation function instead of ReLU is that Tanh offers a faster and more balanced learning process.
As indicated in the graph in Figure 7, Tanh compresses the outputs between −1 and 1, ensuring that the neurons have a zero-centered distribution, which contributes to more balanced weight updates. ReLU may cause some neurons to be completely disabled because it sets negative values to zero, potentially limiting the model’s learning capacity and causing the ‘Dead Neuron’ problem. The sensitivity provided by the Tanh function, particularly in learning complex features, increased the model’s accuracy and led to a more stable convergence process.
Finally, the flatten layer flattens the multidimensional feature maps into one-dimensional arrays and converts this output into a format suitable for the fully connected layers. This process prepares the data for classification or regression tasks and enables the model to process the features more effectively in the final stage.

4. Experimental Results

In this study, classification was performed on a PCa dataset consisting of 620 MRI images in total. To improve on the performance obtained with the initial Sequential-based model, a fine-tuned, advanced pyramid-type deep network model (FT-EPN) was designed. To compare classification accuracy and other performance metrics, TL methods were then applied using pre-trained CNN architectures, namely Vgg16, Vgg19, ResNet50, InceptionV3, DenseNet121, and Xception. Given the limited number of samples in the dataset, the performance of ViT and of its hybrid combination with CNNs, MaxViT-v2, both of which have gained popularity in recent years and can produce effective results on limited data, was also compared with that of the FT-EPN model. The results of the experiments, conducted on the common dataset across the different models, were evaluated to assess the effect of the selected parameters and the trained models on classification accuracy.

4.1. Performance Metrics

Model evaluation metrics, chiefly accuracy, were used to compare the results and support better decisions. Accuracy indicates the proportion of correctly predicted classes relative to the true classes; for PCa, for example, it reflects the ratio of cases correctly predicted as benign to all truly benign cases, and of cases correctly predicted as malignant to all truly malignant cases. The classification accuracy value is obtained from the quantities defined in the confusion matrix shown in Table 1.
True Positive (TP): Represents the quantity of data belonging to the positive class that is correctly classified by the classifier.
True Negative (TN): Represents the quantity of data belonging to the negative class that is correctly classified by the classifier.
False Positive (FP): Refers to data that actually belong to the negative class but are incorrectly classified as the positive class.
False Negative (FN): Refers to data that actually belong to the positive class but are incorrectly classified as the negative class.
Accuracy: To evaluate the success of the proposed model, the accuracy value is calculated by dividing the number of correct predictions by the total number of data points, as shown in Equation (2).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$
Precision: Precision is particularly important when the cost of false positive predictions is high. It is a key criterion for assessing the effectiveness of a model and is calculated as shown in Equation (3).
$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$
Recall: Recall is a useful measure when the cost of predicting false negatives is high. It is expected to be as high as possible for efficient performance and is calculated as shown in Equation (4).
$$\text{Recall} = \frac{TP}{TP + FN} \quad (4)$$
F1 score: This value provides the harmonic mean of precision and recall and is calculated as shown in Equation (5).
$$F1\ \text{score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \quad (5)$$
Receiver Operating Characteristic (ROC) Curve: A graphical tool used to evaluate classification performance, visualizing the trade-off between sensitivity and specificity. The ROC curve plots the true positive rate (TPR, i.e., sensitivity) on the Y-axis against the false positive rate (FPR, i.e., 1 − specificity) on the X-axis at different thresholds. Area under the ROC curve (AUC): the area under the ROC curve, ranging from 0 to 1. An AUC close to 0.5 indicates that the model is classifying at random, while performance improves as the AUC approaches 1.

4.2. Performance Evaluations

In evaluating the performance of the proposed FT-EPN model for PCaD, other criteria were also examined alongside the classification accuracy rates. In this performance evaluation, the confusion matrices obtained from the models were used. Accuracy, precision, recall, and F1 score metrics were calculated using the confusion matrix. First, 90.32% classification accuracy was achieved with the developed Sequential-based model on the PCa MRI dataset using 10 epochs and the Adam optimization algorithm. The accuracy and loss curves of the Sequential-based model are shown in Figure 8, and the confusion matrix and ROC curve are shown in Figure 9.
The accuracy obtained by the Sequential-based model on a limited dataset can be considered insufficient for real-world applications. For a more comprehensive evaluation, the performance of this model is further supported by different metrics, as shown in Table 2.
Although the accuracy rate in Table 2 seems quite good, it can be misleading, especially on unbalanced datasets, because the model tends to predict negative examples as positive. The gap between the precision and recall values is also striking: the model is less selective in its positive predictions and produces a large number of them.
In our FT-EPN model, a pyramid architecture was used to learn complex features more effectively, and an increase in accuracy was achieved by adding convolutional layers before each MaxPooling layer. The L2 regularization technique was chosen to prevent overfitting, and the most appropriate regularization coefficient was determined to be 0.0001. It was observed that L2 regularization provided better results than dropout. The training process was optimized by using the Adam optimization algorithm with a learning rate of 0.0001, and training was terminated when no progress was made in the model, using the early stopping method. The Tanh activation function contributed to a more balanced learning process by providing a zero-centered distribution. The accuracy and loss curves of the FT-EPN model are shown in Figure 10, and the confusion matrix and ROC curve are presented in Figure 11. The performance results are summarized in Table 3.
According to the ROC curves and confusion matrix analysis, the proposed FT-EPN model achieved 96.77% accuracy, a 7% increase over the Sequential-based model, and demonstrated strong predictive ability.
The accuracy and overall performance of the model were evaluated against the pre-trained CNN architectures Vgg16, Vgg19, ResNet50, InceptionV3, DenseNet121, and Xception, as described in Section 3.3. Only the feature extraction layers of these models, pre-trained on the ImageNet dataset, were used, and the classification layers were re-trained. The accuracy and loss curves obtained with the TL strategy for each pre-trained model are shown in Figure 12, and the confusion matrices in Figure 13. The performance results, calculated using the metrics for a comprehensive comparison of the models’ classification success, are given in Table 4.
The Vgg16 and Vgg19 models provided high test accuracy, achieving success rates of 91.13% and 92.74%, respectively, while the ResNet50 model performed lower than the others at 81.45%. The InceptionV3, DenseNet121, and Xception models showed performance similar to the Vgg-based models, with accuracy rates of 91.13%, 90.32%, and 89.52%, respectively. Overfitting was prevented and a balanced learning process achieved using techniques such as early stopping, the Tanh activation function, L2 regularization, and an optimal learning rate during training.
Although some of the pre-trained TL models were observed to achieve higher performance than the Sequential-based model, they did not provide a significant improvement in the PCa classification problem. Furthermore, the performance results of the TL models were considerably lower compared to the proposed FT-EPN for PCaD.
The performance results of the proposed FT-EPN model are finally compared with the classification performances of the popular ViT and MaxViT-v2 hybrid models. Unlike traditional CNN-based approaches, the ViT model uses SAM to divide the image into small patches and model the relationship of each patch with the others. This approach allows for capturing global contexts in the image, providing more global feature learning. MaxViT combines the strengths of both CNN and ViT architectures, enabling it to learn local and global features simultaneously. The model successfully captures both local information and the image’s global context due to the multi-axis attention mechanism. The confusion matrix and the ROC curve of the ViT architecture are shown in Figure 14. The confusion matrix and the ROC curve of the MaxViT-v2 architecture are shown in Figure 15. The performance results of the ViT and MaxViT-v2 transformer models are presented in Table 5.
When Table 5 is examined, it can be seen that the ViT model achieves a classification accuracy of 93.55%, while the MaxViT-v2 model achieves 95.16%. Accordingly, it is observed that the MaxViT-v2 hybrid model demonstrates more effective performance compared to the ViT model and also provides higher classification success than all the pre-trained TL architectures. On the other hand, the proposed FT-EPN model in this study achieves 1.6% higher classification success compared to the high-performance MaxViT-v2.

5. Conclusions

Early diagnosis of PCa is a critical factor in the course of the disease and the treatment process. Computerized diagnostic systems offer significant advantages over traditional methods by detecting small changes or details that the human eye cannot notice. This study demonstrated that the proposed FT-EPN model can assist doctors in detecting cancerous prostate tissue by providing fast and effective analysis, aiming to reduce their workload as an alternative to the time-consuming and error-prone steps of traditional manual diagnosis. The FT-EPN model was developed and compared with different DL architectures to achieve high accuracy in classifying PCa MRI images, reaching an accuracy rate of 96.77% thanks to an effective combination of techniques such as the optimized pyramid structure, the Tanh activation function, L2 regularization, and early stopping. This represents a 7% increase over the accuracy of the initial Sequential-based model. With the added convolutional layers and appropriate regularization coefficients, the risk of overfitting was reduced, enabling the model to learn complex features more effectively. Comparisons with TL models and new-generation architectures such as ViT and MaxViT-v2 showed that the fine-tuned enhanced model provides remarkable success, especially on small two-class datasets: while the Vgg16 and Vgg19 models with TL achieved accuracy rates of 91.13% and 92.74%, respectively, the ViT model achieved 93.55% and the MaxViT-v2 hybrid model 95.16%. The proposed FT-EPN model thus offers advantages in both accuracy and model simplicity compared to TL and new-generation transformer models. Finally, the model developed in this study is optimized for the size and class structure of the dataset and provides an effective solution for PCa classification, with its pyramid structure and integrated optimization techniques delivering strong performance on small datasets and serving as an important reference for future research. In future work, the model is expected to be further developed by training on more comprehensive datasets and validated under real-world conditions.

Author Contributions

Conceptualization, E.Ö. and M.S.; methodology, E.Ö. and M.S.; implementation, M.S.; supervision, E.Ö.; data curation, M.S. All authors declare equal and joint responsibility for the study. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Scientific Research Projects Management Unit of Firat University under project MF.24.103.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All relevant data are fully available within the manuscript without restriction. The Kaggle data that support this study’s conclusions are publicly available online at https://www.kaggle.com/code/waichunchin/prostate-cancer-inception-v3/input (accessed on 29 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gifani, P.; Shalbaf, A. Transfer Learning with Pretrained Convolutional Neural Network for Automated Gleason Grading of Prostate Cancer Tissue Microarrays. J. Med. Signals Sens. 2024, 14, 4.
  2. Siegel, R.L.; Giaquinto, A.N.; Jemal, A. Cancer statistics, 2024. CA A Cancer J. Clin. 2024, 74, 12–49.
  3. Chui, K.T.; Gupta, B.B.; Chi, H.R.; Arya, V.; Alhalabi, W.; Ruiz, M.T.; Shen, C.W. Transfer learning-based multi-scale denoising convolutional neural network for prostate cancer detection. Cancers 2022, 14, 3687.
  4. Balaha, H.M.; Shaban, A.O.; El-Gendy, E.M.; Saafan, M.M. Prostate cancer grading framework based on deep transfer learning and Aquila optimizer. Neural Comput. Appl. 2024, 36, 7877–7902.
  5. Abdelmaksoud, I.R.; Shalaby, A.; Mahmoud, A.; Elmogy, M.; Aboelfetouh, A.; El-Ghar, M.A.; El-Melegy, M.; Alghamdi, N.S.; El-Baz, A. Precise identification of prostate cancer from DWI using transfer learning. Sensors 2021, 21, 3664.
  6. Mehmood, M.; Abbasi, S.H.; Aurangzeb, K.; Majeed, M.F.; Anwar, M.S.; Alhussein, M. A classifier model for prostate cancer diagnosis using CNNs and transfer learning with multi-parametric MRI. Front. Oncol. 2023, 13, 1225490.
  7. Mashak, N.P.; Akbarizadeh, G.; Farshidi, E. Transfer learning; powerful and fast segmentation and classification prostate cancer from MRI scans, in the development set. J. Intell. Fuzzy Syst. 2023, 45, 2005–2017.
  8. Davila, A.; Colan, J.; Hasegawa, Y. Comparison of fine-tuning strategies for transfer learning in medical image classification. Image Vis. Comput. 2024, 146, 105012.
  9. El-Melegy, M.; Mamdouh, A.; Ali, S.; Badawy, M.; El-Ghar, M.A.; Alghamdi, N.S.; El-Baz, A. Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning. Bioengineering 2024, 11, 635.
  10. Rippa, M.; Schulze, R.; Kenyon, G.; Himstedt, M.; Kwiatkowski, M.; Grobholz, R.; Wyler, S.; Cornelius, A.; Schindera, S.; Burn, F. Evaluation of Machine Learning Classification Models for False-Positive Reduction in Prostate Cancer Detection Using MRI Data. Diagnostics 2024, 14, 1677.
  11. Dondi, F.; Albano, D.; Bertagna, F.; Treglia, G. Bone scintigraphy versus PSMA-targeted PET/CT or PET/MRI in prostate cancer: Lessons learned from recent systematic reviews and meta-analyses. Cancers 2022, 14, 4470.
  12. Pepe, P.; Pepe, L.; Fiorentino, V.; Curduman, M.; Pennisi, M.; Fraggetta, F. PSMA PET/CT Accuracy in Diagnosing Prostate Cancer Nodes Metastases. In Vivo 2024, 38, 2880–2885.
  13. Fiorentino, V.; Martini, M.; Dell’aquila, M.; Musarra, T.; Orticelli, E.; Larocca, L.M.; Rossi, E.; Totaro, A.; Pinto, F.; Lenci, N.; et al. Histopathological ratios to predict Gleason score agreement between biopsy and radical prostatectomy. Diagnostics 2020, 11, 10.
  14. Sekhoacha, M.; Riet, K.; Motloung, P.; Gumenku, L.; Adegoke, A.; Mashele, S. Prostate cancer review: Genetics, diagnosis, treatment options, and alternative approaches. Molecules 2022, 27, 5730.
  15. Abbasi, A.A.; Hussain, L.; Awan, I.A.; Abbasi, I.; Majid, A.; Nadeem, M.S.A.; Chaudhary, Q.A. Detecting prostate cancer using deep learning convolution neural network with transfer learning approach. Cogn. Neurodynamics 2020, 14, 523–533.
  16. Khosravi, P.; Lysandrou, M.; Eljalby, M.; Li, Q.; Kazemi, E.; Zisimopoulos, P.; Sigaras, A.; Brendel, M.; Barnes, J.; Ricketts, C.; et al. A deep learning approach to diagnostic classification of prostate cancer using pathology–radiology fusion. J. Magn. Reson. Imaging 2021, 54, 462–471.
  17. da Silva, G.L.; Diniz, P.S.; Ferreira, J.L.; Franca, J.V.; Silva, A.C.; de Paiva, A.C.; de Cavalcanti, E.A. Superpixel-based deep convolutional neural networks and active contour model for automatic prostate segmentation on 3D MRI scans. Med. Biol. Eng. Comput. 2020, 58, 1947–1964.
  18. Molahasani Majdabadi, M.; Choi, Y.; Deivalakshmi, S.; Ko, S. Capsule GAN for prostate MRI super-resolution. Multimed. Tools Appl. 2022, 81, 4119–4141.
  19. Chen, Q.; Hu, S.; Long, P.; Lu, F.; Shi, Y.; Li, Y. A transfer learning approach for malignant prostate lesion detection on multiparametric MRI. Technol. Cancer Res. Treat. 2019, 18, 1533033819858363.
  20. Xu, H.; Baxter, J.S.; Akin, O.; Cantor-Rivera, D. Prostate cancer detection using residual networks. Int. J. Comput. Assist. Radiol. Surg. 2019, 14, 1647–1650.
  21. Singh, S.K.; Sinha, A.; Singh, H.; Mahanti, A.; Patel, A.; Mahajan, S.; Pandit, A.K.; Varadarajan, V. A novel deep learning-based technique for detecting prostate cancer in MRI images. Multimed. Tools Appl. 2024, 83, 14173–14187.
  22. Linkon, A.H.M.; Labib, M.M.; Hasan, T.; Hossain, M. Deep learning in prostate cancer diagnosis and Gleason grading in histopathology images: An extensive study. Inform. Med. Unlocked 2021, 24, 100582.
  23. Rabilloud, N.; Allaume, P.; Acosta, O.; De Crevoisier, R.; Bourgade, R.; Loussouarn, D.; Rioux-Leclercq, N.; Khene, Z.-E.; Mathieu, R.; Bensalah, K.; et al. Deep learning methodologies applied to digital pathology in prostate cancer: A systematic review. Diagnostics 2023, 13, 2676.
  24. Alkadi, R.; Taher, F.; El-Baz, A.; Werghi, N. A deep learning-based approach for the detection and localization of prostate cancer in T2 magnetic resonance images. J. Digit. Imaging 2019, 32, 793–807.
  25. Viswanath, S.E.; Chirra, P.V.; Yim, M.C.; Rofsky, N.M.; Purysko, A.S.; Rosen, M.A.; Bloch, B.N.; Madabhushi, A. Comparing radiomic classifiers and classifier ensembles for detection of peripheral zone prostate tumors on T2-weighted MRI: A multi-site study. BMC Med. Imaging 2019, 19, 22.
  26. Abraham, B.; Nair, M.S. Computer-aided classification of prostate cancer grade groups from MRI images using texture features and stacked sparse autoencoder. Comput. Med. Imaging Graph. 2018, 69, 60–68.
  27. Song, Y.; Zhang, Y.D.; Yan, X.; Liu, H.; Zhou, M.; Hu, B.; Yang, G. Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI. J. Magn. Reson. Imaging 2018, 48, 1570–1577.
  28. Yu, R.; Jiang, K.-W.; Bao, J.; Hou, Y.; Yi, Y.; Wu, D.; Song, Y.; Hu, C.-H.; Yang, G.; Zhang, Y.-D. PI-RADSAI: Introducing a new human-in-the-loop AI model for prostate cancer diagnosis based on MRI. Br. J. Cancer 2023, 128, 1019–1029.
  29. Bygari, R.; Rithesh, K.; Ambesange, S.; Koolagudi, S.G. Prostate Cancer Grading Using Multistage Deep Neural Networks. In Machine Learning, Image Processing, Network Security and Data Sciences: Select Proceedings of 3rd International Conference on MIND 2021; Springer Nature: Singapore, 2023; pp. 271–283.
  30. Zhu, L.; Shi, H.; Wei, H.; Wang, C.; Shi, S.; Zhang, F.; Yan, R.; Liu, Y.; He, T.; Wang, L.; et al. An accurate prediction of the origin for bone metastatic cancer using deep learning on digital pathological images. EBioMedicine 2023, 87, 104426.
  31. Talaat, F.M.; El-Sappagh, S.; Alnowaiser, K.; Hassan, E. Improved prostate cancer diagnosis using a modified ResNet50-based deep learning architecture. BMC Med. Inform. Decis. Mak. 2024, 24, 23.
  32. Salman, M.E.; Çakar, G.Ç.; Azimjonov, J.; Kösem, M.; Cedimoğlu, İ.H. Automated prostate cancer grading and diagnosis system using deep learning-based Yolo object detection algorithm. Expert Syst. Appl. 2022, 201, 117148.
  33. Parvaiz, A.; Khalid, M.A.; Zafar, R.; Ameer, H.; Ali, M.; Fraz, M.M. Vision Transformers in medical computer vision—A contemplative retrospection. Eng. Appl. Artif. Intell. 2023, 122, 106126.
  34. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; Volume 13684, pp. 459–479.
Figure 1. MRI image samples of the PCa dataset: (a) benign and (b) malignant.
Figure 2. The architecture of the Keras Sequential model.
Figure 3. The schematic representation of the fine-tuning process with TL.
Figure 4. The flow diagram of the vision transformer (ViT) architecture; (*) indicates an extra learnable class embedding, and (0–9) indicate the unrolled patches in positional order.
Figure 5. The schematic representation of the multi-axis vision transformer (MaxViT) architecture.
Figure 6. A schematic representation of the developed pyramid-type model architecture; (0) and (1) indicate the benign and malignant classes.
Figure 7. Graphs of the Tanh and ReLU activation functions.
Figure 8. Accuracy and loss curve results of the Sequential-based model.
Figure 9. (a) Confusion matrix and (b) ROC curve results of the Sequential-based model.
Figure 10. Accuracy and loss curve results of the proposed FT-EPN model.
Figure 11. (a) Confusion matrix and (b) ROC curve results of the proposed FT-EPN model.
Figure 12. Accuracy and loss curve results of the TL models.
Figure 13. Confusion matrix results of the TL models.
Figure 14. (a) Confusion matrix and (b) ROC curve results of the ViT architecture.
Figure 15. (a) Confusion matrix and (b) ROC curve results of the MaxViT-v2 architecture.
Table 1. Two-label confusion matrix.

                         Predicted Positive    Predicted Negative
Actual Positive          TP                    FN
Actual Negative          FP                    TN
Table 2. Performance results of the Sequential-based model.

Model                     Accuracy    Precision    Recall    F1 Score
Sequential-based model    90.32%      84.21%       100%      91.43%
Table 3. Performance results of the proposed FT-EPN model.

Model              Accuracy    Precision    Recall    F1 Score
Proposed FT-EPN    96.77%      94.74%       100%      97.30%
Table 4. Performance results of the TL models.

Models         Accuracy    Precision    Recall    F1 Score
Vgg16          91.13%      86.84%       98.51%    92.31%
Vgg19          92.74%      88.16%       100%      93.71%
ResNet50       81.45%      84.21%       85.33%    84.77%
InceptionV3    91.13%      92.11%       93.33%    92.72%
DenseNet121    90.32%      88.16%       95.71%    91.78%
Xception       89.52%      88.16%       94.37%    91.16%
Table 5. Performance results of the transformer models.

Models       Accuracy    Precision    Recall    F1 Score
ViT          93.55%      100%         85.71%    92.31%
MaxViT-v2    95.16%      95.65%       91.67%    93.62%