1. Introduction
Digital images carry valuable information, including sensitive and critical features. Their smallest units are pixels, which play an important role in information processing. For grayscale images, pixel values range from 0 to 255. Values close to 0 represent darker tones, and values close to 255 represent lighter tones. Grayscale digital images consist of a single channel [1].
Physicians use digital images obtained through X-ray, CBCT, and intraoral scanning systems to evaluate the oral and maxillofacial regions [2], which supports treatment planning [3]. This allows surgical interventions to be performed more safely and post-treatment processes to be monitored more effectively [4]. Image processing is frequently used to interpret these images, which consist of shades of gray [5].
Image processing is a field in which mathematical and algorithmic techniques are used to analyze, process, and interpret digital images. In this field, patterns, symmetry, color tones, shapes, and other image features are processed. This makes images more meaningful or allows information to be extracted for a specific purpose. The effectiveness of these techniques provides great benefits to healthcare professionals, especially in the field of medical imaging [6].
In dentistry, image processing techniques play an important role in the accurate interpretation of coronal sections, which allow the evaluation of the anterior and posterior surfaces of the teeth, and sagittal sections, which reveal the lateral profiles of the teeth and defects in the jaw structure. This supports the detailed evaluation of intraoral structures [7]. Especially in three-dimensional (3D) imaging systems, the combination of coronal and sagittal sections allows for a more detailed and clearer analysis of the teeth and surrounding tissues. This enables dentists to diagnose more precisely and to carry out more comprehensive treatment processes [8]. On the other hand, manual interpretation of digital medical images is time-consuming, requires expertise, and carries a high risk of error. Computer-aided systems are therefore needed to address these problems.
Image processing and deep learning-based convolutional neural network (CNN) architectures are complementary approaches actively used in computer-aided systems developed to detect healthy and diseased regions in digital images [6]. Image processing, which reveals sensitive and critical information in digital images, provides a powerful input to CNN architectures customized for visual data. Thus, the success in the evaluation of clinical images increases [6]. In this context, image processing and CNNs stand out as two critical areas [9]. In particular, image processing improves the analysis and interpretation of images by highlighting salient features, thereby reducing error rates [5]. CNNs run on data optimized by image processing then perform more efficient and effective feature extraction. This is valuable for classification and analysis with higher accuracy [10]. It also has great potential for optimizing the diagnosis and treatment of diseases in the health sector [11]. For this reason, many computer-aided multidisciplinary studies have been carried out in the field of dentistry [
12]. In [
13], a deep convolutional neural network (DCNN) was used to classify tooth types from dental cone beam CT images in forensic identification. Fifty-two CT data samples were divided into 42 training and 10 test samples. During training, data augmentation techniques were used to prevent overfitting. In the classification with the AlexNet architecture, an accuracy rate of 88.8% was obtained. In [
14], the accuracy of deep learning algorithms for the diagnosis and classification of dental caries was investigated. The dataset has 382 decayed teeth and 403 healthy teeth cone beam computed tomography (CBCT) images. The data were presented to a multi-input CNN model. The accuracy rates were 95.3% for carious teeth and 94.8% for healthy teeth. The deep learning model classified the depth and type of caries with high accuracy. These results show that deep learning is an effective tool for accurate diagnosis and treatment planning in dentistry. In [
15], self-supervised learning (SSL) was used to improve the classification of dental caries. In the training with unlabeled CBCT images, the hybrid use of ResNet-18 architecture and SimCLR technique resulted in an F1-score of 88.42%, 90.44% accuracy, and 86.67% sensitivity. These results show that SSL is an effective method for improved accuracy and efficiency for tooth decay classification. In [
16], a deep neural network using 3D CBCT images for tooth classification was proposed. Combining transformer and CNN structures, this neural network aims to solve the shortcomings of the transformer model, which requires high computational complexity. In experiments with 450 training and 104 test samples, the improved model achieved 91.3% accuracy and a 99.7 AUC score. In [
17], an automatic deep convolutional neural network (DCNN) was applied to 11,980 dental radiographs collected from three dental hospitals for the classification of dental implant systems (DIS). The automatic DCNN achieved an AUC of 0.954, a sensitivity of 0.955, and a specificity of 0.853. The study shows that this DCNN classifies dental implant systems with high accuracy and that further research is needed for clinical applications. In [
18], a decision support system was proposed for the classification of dental periapical cysts and keratocystic odontogenic tumor (KCOT) lesions using CBCT. In the first stage, segmentation was performed on 50 CBCT 3D image datasets. Then, a vector containing 636 different features was created for each dataset. In experiments with six classifiers, the support vector machine (SVM) performed the best with 100% accuracy and F1-scores. The results show that periapical cyst and KCOT lesions can be classified with high accuracy, and this study makes an important contribution for the computer-aided diagnosis of apical lesions of teeth. In [
19], a CNN architecture was proposed for the detection and diagnosis of dental caries in periapical dental images. In the proposed model, CNN and long short-term memory (LSTM) networks were combined for feature extraction. Experimental results showed that the proposed CNN-LSTM model provides higher accuracy (96%) compared to the AlexNet (93%) and GoogleNet (94%) pretrained models. In [
20], a new approach for the automatic diagnosis of dental caries was proposed. The study used a multi-input deep convolutional neural network ensemble (MI-DCNNE) model that utilizes periapical images. The model uses a dataset of 340 caries and 340 non-caries periapical images as the inputs. The results show that the proposed model achieves 99.13% accuracy in diagnosing dental caries. In [
21], the detection and diagnosis of dental caries in periapical radiographs using CNN was evaluated. In total, 3000 images were divided into training and test datasets and classified with the Inception v3 model. The accuracy rates of the models, including premolar, molar, and both teeth, were 89.0%, 88.0%, and 82.0%, respectively. In addition, the premolar model was reported as the category with the highest AUC value (0.917). In [
22], the use of image processing and deep CNN techniques for the early detection of dental caries was investigated. Following the preprocessing steps of histogram equalization, contrast enhancement, and feature selection, edges in tooth images were detected using the Sobel method. The resulting enhanced images were given as inputs to a custom CNN model. The method was compared with Otsu's threshold segmentation and watershed segmentation. The proposed method achieved an accuracy of 96.08%, higher than Otsu's threshold segmentation (72.3%) and watershed segmentation (80.4%).
Image processing and deep learning-based studies in the literature make significant contributions to the diagnosis and treatment processes of dentists. In particular, the analysis of CT images increases the accuracy of lesion detection and the early diagnosis of dental diseases, enabling more reliable results to be obtained. In this way, time is saved in clinical applications and treatment processes can be managed more effectively. Furthermore, these advanced methods allow the customization of treatment plans, making treatment processes more efficient. In this context, a computer-aided multidisciplinary study was conducted.
In this study, coronal and sagittal slices obtained from computed tomography (CT) images in the field of dentistry were used to measure the classification success between images with and without lesions. In the first stage, an image processing method was proposed to extract critical and sensitive information from the images. Then, these enhanced images were given as inputs to the VGG16 transfer learning architecture. Finally, different regularization methods were integrated into the VGG16 transfer learning architecture with the aim of developing a generalizable and stable structure.
2. Material and Methods
Image processing is an approach used to analyze and make sense of images. Raw images may contain noise, low-resolution areas, varying lighting conditions, and insufficient contrast, which is a significant obstacle to accurate and effective model learning [23]. Image preprocessing techniques are preferred to overcome these difficulties and to highlight important structural features in images.
Image preprocessing techniques improve the details in low-quality images and provide consistency across images obtained under different acquisition conditions [5]. In addition, models that receive preprocessed images as inputs can extract powerful features from real-world data. For this reason, image processing serves as a preliminary stage for deep learning-based image analysis models, increasing their accuracy and reducing classification errors. The input obtained as a result of this stage is valuable for deep learning-based CNN architectures.
Transfer learning-based CNN, which is a subset of the deep learning approach customized for visual data, has the ability to automatically learn local features in visual data. With its multi-layered structure, it can effectively extract complex patterns in images [
24].
In this study, images processed with image processing techniques are classified using the VGG16 transfer learning architecture. VGG16 is a deep convolutional neural network known for its simple and uniform architecture, consisting of sequential 3 × 3 convolutional layers followed by max pooling layers and fully connected layers. Despite its relatively large number of parameters, VGG16 is widely used in medical image analyses due to its robust feature extraction capabilities and proven performance in classification tasks [
25]. Its deep and orderly structure allows for effective learning of hierarchical features from medical images, which is essential for the early diagnosis of lesions. Moreover, to enhance the stability, generalization ability, and classification performance of the model, proposed regularization methods are integrated into the VGG16 transfer learning architecture. These improvements contribute significantly to minimizing overfitting, ensuring that the model maintains high accuracy across various unseen data samples and clinical scenarios.
Regularization is a technique that makes the learning process more generalizable by limiting the complexity of the model [26]. It prevents the model from focusing only on the training data and losing its ability to generalize, and it provides a flexible structure [27]. To reduce the complexity of the model, it adds a penalty term to the loss function, which reduces the overfitting potential of the model [28].
In this study, the classification of samples with and without lesions in a dataset consisting of coronal and sagittal slices is planned. For this purpose, the dataset used, the preferred transfer learning architecture, the applied image processing methods, and the proposed regularization techniques are presented in detail. The experiments were carried out in the Jupyter Notebook environment (version 7.2.2) using the Python programming language. TensorFlow libraries were used to build and train the deep learning models. All experiments were conducted on a computer with an Intel(R) Core(TM) i5-9400 CPU @ 2.90 GHz, 8 GB of RAM, a 64-bit operating system, and an x64-based architecture.
2.1. Dataset
Datasets represent an important tool used in various analysis and research processes. They play a role in the evaluation of different classification and modeling methods. In this study, the UFPE dataset was used to classify tooth sections with and without lesions. The UFPE dataset was prepared for use in health research in Brazil and approved by the Local Research Ethics Committee of the University of Pernambuco. The dataset is divided into two main categories, healthy and unhealthy tooth samples, and contains a total of 1000 CBCT tooth scans. Each sample in this dataset is organized as pairs of images in both the coronal and sagittal planes [
29]. Examples of lesioned and non-lesioned tooth sections in the dataset are presented in Figure 1.
The images labeled a and b in Figure 1 show slices representing tooth samples with and without lesions, respectively. The images used in the study were processed and analyzed at a size of 186 × 115 pixels, in both their original and enhanced versions. These images represent the different examples used to classify the tooth lesions in the dataset. The dataset has three different categories: no lesions, large lesions, and small lesions. To test the accuracy of the model, the dataset is divided into 80% training data and 20% test data.
2.2. Image Processing
Image processing offers an approach that enables the extraction of valuable features in images. It is especially important for the interpretation of grayscale images. Therefore, an image processing approach is proposed to perform a successful classification process in the CBCT tooth scan data. In the proposed image processing approach, the entropy curve modification method in [
31] is reconstructed by integrating the modified alpha value to provide optimal enhancement. Then, morphological and logical (bit-level processing) processing steps are performed to preserve details and isolate critical regions, respectively.
2.2.1. Proposed Image Processing Approach 1
In the second filter developed within the proposed image processing approach, the processing steps are applied in a specific hierarchical order to the images with and without lesions.
Improved Entropy Curve Modification
Entropy curve modification, which is based on modifying the entropy curve in the image, obtains the entropy curve through the entropy values corresponding to each gray level of the image. The entropy curve corresponding to the gray levels is then modified and a balanced gray level distribution is achieved [
31]. This process is explained step-by-step below.
- 1. The image read in the RGB color space is transformed into the grayscale space.
- 2. The entropy curve of the image is drawn. To draw this curve, entropy information about the gray levels of the image is used. Equation (1) gives a mathematical expression for the entropy of the i-th gray level [30].
In the mathematical expression given in Equation (1), b represents the gray level. When t = i, I_i(t) = 1. When t takes a different value, I_i(t) = 0. Based on the assumption P_i = P(t = i), the expression given in Equation (1) is simplified and represented in Equation (2) [31].
P_i given in Equation (2) is the probability of occurrence of the i-th gray level. Accordingly, E_i is the information associated with the i-th gray level [30].
- 3. The uniform entropy value is calculated to provide a reference point. In this process, it is assumed that the image is of size M × N. The M × N pixels are then evenly distributed over 256 gray levels, and the mathematical expression given in Equation (3) is obtained [31].
The mathematical expression given in Equation (3) is redefined in Equation (4) for the entropy corresponding to each gray level [31].
- 4. A modified entropy curve is obtained to improve the quality of the image. The uniform entropy serves as the guide in this process. The mathematical expression for the entropy curve modification is given in Equation (5).
The E given in Equation (5) is the entropy curve of the input image, and E_U is the uniform entropy curve. The value of α is chosen to provide an optimal improvement.
- 5. An adaptive scheme was developed for the selection of α given in Equation (5). In this scheme, the variance of the image is first calculated and then normalized. Mathematical expressions for the normalized variance value are given in Equations (6)–(8).
The normalized variance, substituted for α in Equation (5), expresses the variation of the differences between gray levels at a given distance [32]. Therefore, it is useful in image optimization.
- 6. To better understand and balance the information density corresponding to the gray levels of the image, the probability density function is calculated from the modified entropy curve [31]. For this, Equations (9) and (10) are used.
The mathematical expression given in Equation (9) is the sum of the modified entropy values over the m gray levels. The PDF value given in Equation (10) is the probability density value obtained by dividing the modified entropy corresponding to each gray level by the total entropy. This is followed by a histogram equalization process based on the modified entropy curve. First, the PDF values are summed cumulatively, which gives the cumulative sum up to each gray level [30]. Its mathematical expression is given in Equation (11).
The CDF given in Equation (11) is the cumulative distribution function. The cumulative value is scaled to the range [0, L − 1] to homogenize the distribution of gray levels [31]. Its mathematical expression is given in Equation (12).
L in Equation (12) is the number of gray levels, defined as 256, so the maximum gray level is L − 1 = 255. T is the converted gray level [31].
The operations performed in Equations (1)–(12) comprise the steps needed to use the proposed modified entropy curve. This approach [31], which transforms the old gray level of each pixel into a new level, has an advantage in the interpretation of X-ray images, which present outputs in the grayscale range. The image is converted into a one-dimensional vector, and each pixel value in the vector is mapped to a new gray level according to T, which yields a new image with improved contrast.
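The six steps above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The per-level entropy E_i = −P_i·log2(P_i), the cumulative sum, and the scaling to [0, L − 1] follow the description in the text. The convex blend used for the modified curve and the way the variance is normalized to obtain α are assumptions made only for illustration, and the function names `entropy_curve`, `normalized_variance`, and `enhance` are hypothetical.

```python
import math


def entropy_curve(hist):
    """Per-gray-level entropy E_i = -P_i * log2(P_i); empty levels contribute 0."""
    total = sum(hist)
    curve = []
    for count in hist:
        p = count / total
        curve.append(-p * math.log2(p) if p > 0 else 0.0)
    return curve


def normalized_variance(pixels, levels=256):
    """ASSUMPTION: variance divided by its largest possible value ((levels - 1) / 2)^2."""
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    return min(1.0, var / ((levels - 1) / 2) ** 2)


def enhance(image, levels=256):
    """Equalize a grayscale image (list of rows of ints) using a modified entropy curve."""
    pixels = [v for row in image for v in row]
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    e = entropy_curve(hist)
    # Uniform reference: every gray level has probability 1 / levels.
    e_u = -(1 / levels) * math.log2(1 / levels)
    alpha = normalized_variance(pixels, levels)

    # ASSUMPTION: the modified curve is a convex blend of E and the uniform entropy.
    e_mod = [(1 - alpha) * ei + alpha * e_u for ei in e]
    total = sum(e_mod)
    if total <= 0:  # constant image: nothing to equalize
        return [row[:] for row in image]

    # PDF from the modified curve, then CDF, then scaling to [0, levels - 1].
    mapping = []
    running = 0.0
    for value in e_mod:
        running += value / total
        mapping.append(min(levels - 1, round((levels - 1) * running)))
    return [[mapping[v] for v in row] for row in image]
```

Because every modified entropy value is non-negative, the mapping T is monotonically non-decreasing, so the ordering of gray levels in the input is preserved in the output.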
Morphological Gradient Calculation
The image whose contrast was enhanced by the improved entropy curve modification is subjected to morphological gradient processing. First, a 3 × 3 matrix with all element values equal to 1 is created. Then, a morphological gradient operation, which calculates the difference between the dilation and erosion operations, is applied with this matrix. Thus, an output image with emphasized edges and boundaries is created.
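As a small illustration of this step, the sketch below computes the morphological gradient (dilation minus erosion) with a 3 × 3 all-ones structuring element in pure Python. The border handling, which uses only the neighbors that fall inside the image, is an assumption, and the function name `morphological_gradient` is hypothetical.

```python
def morphological_gradient(image):
    """Dilation minus erosion of a grayscale image with a 3x3 all-ones element.

    ASSUMPTION: at the borders, only neighbors inside the image are considered.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            # Dilation is the local maximum, erosion is the local minimum.
            out[y][x] = max(window) - min(window)
    return out
```

On a flat region the local maximum and minimum coincide and the gradient is zero, whereas pixels adjacent to an intensity step receive the height of that step, which is why edges and boundaries are emphasized.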
Logical Operation and Weighted Image Blending
The output obtained as a result of applying the morphological gradient process to the contrast-enhanced image obtained through the improved entropy curve modification and the original image is subjected to the logical NOT process. The output obtained as a result of this process is blended with the contrast-enhanced image obtained through step 1 at ratios of 0.8 and 0.2, respectively. The final output is then rotated.
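The logical NOT and the weighted blending can be sketched as follows for 8-bit images. This is an illustrative reading of the step, not the authors' code: it assumes that the NOT is applied to the gradient image, that the inverted image receives the weight 0.8 and the contrast-enhanced image the weight 0.2, and that results are rounded and clipped to [0, 255]. The final rotation is omitted because its angle is not stated. The names `bitwise_not` and `blend` are hypothetical.

```python
def bitwise_not(image):
    """Logical NOT of an 8-bit image: every value v becomes 255 - v."""
    return [[255 - v for v in row] for row in image]


def blend(first, second, w_first=0.8, w_second=0.2):
    """Weighted sum of two equally sized 8-bit images, rounded and clipped."""
    return [
        [min(255, max(0, round(w_first * p + w_second * q))) for p, q in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


# Usage with toy values: strong edges (255) become dark after the NOT,
# so they remain visible as dark contours in the blended result.
gradient = [[0, 255]]
enhanced = [[100, 100]]
result = blend(bitwise_not(gradient), enhanced)
```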
The flow diagram of the three basic steps of proposed image processing approach 1 for images with and without lesions is given in Figure 2 and Figure 3.
The enhanced images shown in Figure 2 and Figure 3 were converted into data in which sensitive points and boundaries are highlighted.
The image processing approach successfully localized pixel values ranging from 0 to 255. This offers the potential for a powerful feature set for the classification of images with and without lesions.
2.3. VGG16 Transfer Learning Architecture
Transfer learning is a deep learning approach that makes it possible to reuse models trained on large datasets with powerful hardware for smaller datasets in a specific problem space [33]. Because the model does not learn from scratch for the customized task, training time is reduced and high computational power is not required [33]. In this study, the VGG16 architecture was chosen for the classification of lesion images. VGG16 enables efficient feature extraction through its simple and consistent layer structure, making it well-suited for medical image analysis tasks [25]. Therefore, in the proposed system, the VGG16 model is retrained using a transfer learning approach to adapt to the specific characteristics of the lesion dataset, with the aim of enhancing diagnostic performance and robustness.
VGG16 is a prominent model among deep convolutional neural network architectures. This architecture uses filters with a fixed size of 3 × 3 in all convolution layers. It is also based on the principle of size reduction, with 2 × 2 max pooling operations after every two or three convolution layers [25]. The most important advantage of VGG16 is that its architecture is simple, consistent, and modular. This deep structure, consisting of 16 weight layers, is able to effectively learn low-level and high-level features in images. Moreover, reusing pretrained weights with less data offers the potential for high accuracy. This makes VGG16 a very suitable option in data-constrained domains such as medical image processing [34].
Figure 4 shows the structure of VGG16.
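To make the layer structure concrete, the following sketch traces the feature-map size and the number of trainable parameters through the convolutional base of the standard VGG16 layout (blocks of two or three 3 × 3 convolutions, each block followed by 2 × 2 max pooling). It assumes 'same' padding in the convolutions, pooling with stride 2 that floors odd sizes, and a three-channel input of 186 × 115 pixels; the channel count, the height/width orientation, and the function name `trace_vgg16_base` are assumptions for illustration.

```python
# (number of 3x3 convolutions, output channels) for the five VGG16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_vgg16_base(height, width, channels=3):
    """Return the output shape and parameter count of the VGG16 convolutional base."""
    params = 0
    in_channels = channels
    for n_convs, out_channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel weights plus one bias per output channel
            params += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels
        # 2x2 max pooling with stride 2 halves each spatial dimension (floored)
        height //= 2
        width //= 2
    return (height, width, in_channels), params


shape, n_params = trace_vgg16_base(186, 115)
# shape == (5, 3, 512); n_params == 14_714_688
```

Under these assumptions, a 186 × 115 input is reduced to a 5 × 3 × 512 feature map before the classification layers, and the convolutional base alone holds about 14.7 million parameters, which is the part that transfer learning reuses.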
In this study, the fine-tuned hyperparameters for the VGG16 transfer learning architecture are given in Table 1.
Table 1 presents the chosen fine-tuned hyperparameters for the transfer learning models. Each parameter was optimized to maximize the performance of the model.
2.4. Regularization Functions
Regularization is a method that improves the ability of the model to represent the data and is used to avoid overfitting. It aims to optimize the error value by constraining the coefficients of the model, since a lower error on unseen data means a more successful model. There are three common regularization methods: lasso, ridge, and ElasticNet [35].
Lasso is a regularization method that uses the L1 norm. Its mathematical expression is given in Equation (13) [
35].
Ridge is a regularization method that uses the L2 norm. Its mathematical expression is given in Equation (14) [
35].
ElasticNet is a regularization method that combines the advantages of the ridge and lasso methods. Its mathematical expression is given in Equation (15) [
35].
In the mathematical equations given in Equations (13)–(15), parameter λ controls the degree of minimization of the coefficients; β is a parameter that shows the effect of the independent variables on the target variable. ElasticNet, on the other hand, uses the parameter ρ. The ρ parameter determines the trade-off between the ridge and lasso regularization methods [
35].
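The three penalty terms can be sketched in a few lines of Python. The lasso and ridge penalties below are the standard L1 and L2 forms scaled by λ. For ElasticNet, several parameterizations exist; the form λ·(ρ·L1 + (1 − ρ)·L2) used here is one common choice and is an assumption about the form intended above. The function names are hypothetical.

```python
def lasso_penalty(beta, lam):
    """L1 penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(b) for b in beta)


def ridge_penalty(beta, lam):
    """L2 penalty: lam times the sum of squared coefficient values."""
    return lam * sum(b * b for b in beta)


def elastic_net_penalty(beta, lam, rho):
    """ASSUMED form: lam * (rho * L1 + (1 - rho) * L2), with rho in [0, 1]."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (rho * l1 + (1 - rho) * l2)


# Usage: the penalty is added to the data loss before minimization.
beta = [1.0, -2.0, 0.0]
total_loss = 0.25 + elastic_net_penalty(beta, lam=0.1, rho=0.5)
```

With ρ = 1 the ElasticNet penalty reduces to lasso, and with ρ = 0 it reduces to ridge, which is the trade-off that ρ controls.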
In this study, the ridge, lasso, and ElasticNet regularization methods were used as the basis for 18 newly developed regularization methods. These methods, which were developed to control the complexity of the model, prevent overfitting, and increase accuracy, are presented below.
Proposed Regularization Methods
The first group of proposed regularization methods is shaped by integrating the entropy calculation into the lasso, ridge, and ElasticNet regularization methods. In this context, the entropy value calculated in Equation (16) [32] is given as the input to the lambda update function in Equation (17). Then, a normalized result is obtained by dividing it by the specified normalization constant. This result is used in Equations (18)–(20) as the parameter controlling the degree of minimization of the coefficients.
The second group of proposed regularization methods is shaped by integrating the energy calculation into the lasso, ridge, and ElasticNet regularization methods. In this context, the energy value calculated in Equation (21) [32] is given as the input to the lambda update function in Equation (22). Then, a normalized result is obtained by dividing it by the specified normalization constant. This result is used in Equations (23)–(25) as the parameter controlling the degree of minimization of the coefficients.
The third group of proposed regularization methods is shaped by integrating the root mean square (RMS) calculation into the lasso, ridge, and ElasticNet regularization methods. In this context, the RMS value calculated in Equation (26) [32] is given as the input to the lambda update function in Equation (27). Then, a normalized result is obtained by dividing it by the specified normalization constant. This result is used in Equations (28)–(30) as the parameter controlling the degree of minimization of the coefficients.
The fourth group of proposed regularization methods is shaped by integrating normalized values into the entropy calculation used with the lasso, ridge, and ElasticNet regularization methods. In this context, the entropy value obtained with the normalized values calculated in Equations (31) and (32) is given as the input to the lambda update function in Equation (33). The result obtained through this function is used in Equations (34)–(36) as the parameter controlling the degree of minimization of the coefficients.
The fifth group of proposed regularization methods is shaped by integrating normalized values into the energy calculation used with the lasso, ridge, and ElasticNet regularization methods. In this context, the energy value obtained with the normalized values calculated in Equation (37) is given as the input to the lambda update function in Equation (38). The result obtained through this function is used in Equations (39)–(41) as the parameter controlling the degree of minimization of the coefficients.
The sixth group of proposed regularization methods is shaped by integrating normalized values into the RMS calculation used with the lasso, ridge, and ElasticNet regularization methods. In this context, the RMS value obtained with the normalized values calculated in Equation (42) is given as the input to the lambda update function in Equation (43). The result obtained through this function is used in Equations (44)–(46) as the parameter controlling the degree of minimization of the coefficients.
The 18 proposed regularization methods are intended to prevent the model from losing its generalization ability and to increase its stability.
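The idea of deriving λ from a statistic of the model outputs can be sketched as follows. The sketch uses the standard definitions of entropy (−Σ p·ln p), energy (Σ x²), RMS (square root of the mean energy), and softmax. The lambda update shown, a plain division of the statistic by a normalization constant, is a hypothetical placeholder for the update functions referred to above, and the constant 100.0 and all function names are assumptions for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def energy(values):
    """Sum of squared values."""
    return sum(v * v for v in values)


def rms(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(energy(values) / len(values))


def updated_lambda(statistic, norm_const=100.0):
    """HYPOTHETICAL update: the statistic divided by a normalization constant."""
    return statistic / norm_const


# Groups 1-3: statistic of the outputs as given (here assumed to be probabilities).
outputs = [0.5, 0.5]
lam_entropy = updated_lambda(entropy(outputs))

# Groups 4-6: the outputs are first normalized with softmax.
logits = [2.0, 0.0]
lam_energy_softmax = updated_lambda(energy(softmax(logits)))
```

The resulting value would then take the place of a fixed λ in the lasso, ridge, or ElasticNet penalty, which gives three variants per statistic and 18 in total across the six groups.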
3. Results and Discussion
In recent years, deep learning has developed significantly. The capacity of deep learning algorithms to process large datasets and make meaningful inferences from these data provides higher accuracy and generalization capability than traditional machine learning methods [36]. The success of deep learning methods increases significantly on images preprocessed with various methods [37,38]. In particular, image processing reduces the effort required to learn from complex data. This is valuable for improving the performance [39] of deep learning approaches involving CNN architectures.
CNNs automatically extract meaningful features from images and create in-depth feature representations. This makes it possible to achieve high success rates in object detection and image recognition tasks [6]. However, the success of CNNs does not depend only on the network architecture and learning algorithms. The quality of the datasets is also important. Here, image processing comes to the fore with the aim of improving data quality. It significantly improves the learning process of the model by providing strong inputs to CNN architectures [37,38].
The techniques used at the preprocessing stage prevent the model from making erroneous predictions that might arise from low-quality data. Examples of image processing techniques include color correction, denoising, normalization, histogram equalization, morphological operations, and data augmentation [40]. In addition, standardizing the brightness and contrast levels of images helps the model to obtain more robust results [41]. Edge detection methods, in turn, help to highlight important structural features in the images, allowing the boundaries to be clearly defined and increasing the accuracy of the model [42]. Thus, classification performance is strengthened and more reliable results are obtained.
In this study, lesion classification was performed on preprocessed dental images using the VGG16 transfer learning architecture. The performance metrics for the classification results are given in Table 2.
According to Table 2, the VGG16 model achieved 58.64% accuracy on the processed images. The model correctly detects 72% of the positive samples (recall), but its precision is 40.9% because of the high number of false alarms among the positive predictions. The F1-score, the harmonic mean of precision and recall, is 52.17%, which indicates moderate overall performance.
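The relationship between the reported precision, recall, and F1-score can be checked with a short sketch. The function names and the confusion-matrix counts in the second example are hypothetical and are not taken from Table 2; only the precision and recall values of 40.9% and 72% come from the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


# Reported values: precision 40.9%, recall 72%
f1 = f1_score(0.409, 0.72)

# Hypothetical counts, for illustration only
acc, prec, rec, f1_counts = metrics_from_counts(tp=3, fp=1, fn=2, tn=4)
```

Rounded to two decimals, `f1` reproduces the reported 52.17%.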
The images enhanced with the image processing approach were given as inputs to VGG16, into which 18 different regularization techniques were integrated to produce stable and generalizable results. The performance metrics for the regularization methods integrated into the VGG16 architecture for the classification of lesioned and non-lesioned dental images are presented in Table 3.
Table 3 shows that the regularization 13 method provides the greatest improvement in performance. Without regularization, the F1-score is 52.17%; with regularization 13 integrated, the F1-score is 70.80%, indicating a strong improvement.
Figure 5 shows the ROC curve results.
The AUC value obtained in Figure 5 represents a result that could be improved. However, the model in which the developed regularization method is integrated is more successful than the original model. Heat maps showing the areas the model focuses on during the classification process are given in Figure 6. Additionally, to statistically validate the performance difference between the models, Bowker's test of symmetry was conducted based on the results from the best-performing configuration, namely Reg13. The test yielded a chi-square value of 11.23 with 1 degree of freedom and a p-value of 0.0008, indicating statistically significant asymmetry (p < 0.001). This result supports the effectiveness of the proposed method.
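The reported p-value can be reproduced from the chi-square statistic alone. For one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(√(x/2)), which is available in the Python standard library. For a 2 × 2 table, Bowker's test of symmetry reduces to McNemar's statistic (b − c)² / (b + c) on the two discordant counts. The discordant counts used in the example are hypothetical, since they are not reported in the text, and the function names are assumptions.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def symmetry_statistic_2x2(b, c):
    """Bowker's statistic for a 2x2 table (identical to McNemar's, no correction)."""
    return (b - c) ** 2 / (b + c)


# Reported statistic: chi-square = 11.23 with 1 degree of freedom
p_value = chi2_sf_1df(11.23)

# Hypothetical discordant counts, for illustration only
example_stat = symmetry_statistic_2x2(15, 5)
```

Here `p_value` evaluates to approximately 0.0008, in agreement with the reported result.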
Grad-CAM heat maps are valuable for decision-making. However, image processing and the regularization method developed for hyperparameter tuning play complementary roles in making accurate decisions: image processing enables the model to learn more accurately and effectively by extracting meaningful features from raw images, while regularization techniques strengthen the generalization capacity of the model by preventing overfitting. In addition, to evaluate the proposed method, direct comparisons were made with widely used transfer learning-based models, namely Xception, ResNet50, MobileNetV2, MobileNetV3, EfficientNetB0, and DenseNet169, each with Reg13 integrated. The obtained results are given in Table 4.
In this comparison, the Proposed_VGG16 model exhibited the best performance, with 58.65% accuracy, 90.91% precision, and a 70.80% F1-score. Its precision in particular indicates that the model is reliable when it predicts the positive class. Although ResNet50, MobileNetV3, and EfficientNetB0 achieved 100% precision, their TN values of zero indicate that this result reflects limited generalization. Evaluating all models under the same conditions demonstrated the relative superiority of the proposed method.
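The metrics discussed above all derive from the four cells of a binary confusion matrix. The Python sketch below shows the standard definitions. The function name and all counts are hypothetical and chosen only for illustration; they are not the confusion matrices of this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical balanced example.
balanced = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Hypothetical degenerate example: every image is predicted as positive,
# so TN = 0 and specificity is zero even though recall is perfect.
always_positive = classification_metrics(tp=10, fp=10, fn=0, tn=0)

for name, result in (("balanced", balanced), ("always positive", always_positive)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

Reporting specificity or the raw TN count alongside precision and F1-score makes a degenerate classifier visible, which is why a TN value of zero is treated as a warning sign.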
The regularization methods presented in Table 3 were used to prevent overfitting and to further optimize an already successful model. Lasso (L1), ridge (L2), and ElasticNet are the three widely used basic regularization methods. The lasso method drives some coefficients to exactly zero, which performs variable selection. The ridge method limits the magnitude of the coefficients without reducing them to zero. The ElasticNet method combines the lasso and ridge penalties and thereby retains feature selection. Customized versions of each method have the potential to improve performance, increase generalization, and enhance feature selection.
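The difference between the three basic penalties can be made concrete with a short Python sketch. It uses one common parameterization (a strength `lam` and a mixing weight `l1_ratio`); both names are assumptions for illustration, and other conventions scale the terms differently. The two shrinkage helpers show why lasso produces exact zeros while ridge only reduces magnitudes.

```python
import math


def l1_penalty(weights, lam):
    """Lasso penalty: lam * sum of absolute values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam * sum of squares."""
    return lam * sum(w * w for w in weights)


def elastic_net_penalty(weights, lam, l1_ratio=0.5):
    """ElasticNet penalty: weighted mix of the lasso and ridge terms."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


def soft_threshold(w, t):
    """Minimizer of 0.5*(x - w)**2 + t*|x|: small weights become exactly zero."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def ridge_shrink(w, t):
    """Minimizer of 0.5*(x - w)**2 + t*x**2: weights shrink but stay nonzero."""
    return w / (1.0 + 2.0 * t)


weights = [0.3, -2.0, 1.2]  # hypothetical coefficients
print([soft_threshold(w, 0.5) for w in weights])  # first entry is zeroed
print([ridge_shrink(w, 0.5) for w in weights])    # all entries are halved
```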
The 18 regularization methods developed in this study are organized into six groups of three. Each group customizes the coefficient that weights the penalty term in the widely used lasso, ridge, and ElasticNet regularization methods.
In group 1, the coefficient that weights the penalty term is computed from the entropy parameter. This avoids extreme values in the model output distribution and encourages balanced predictions with soft classification outputs. In group 2, the coefficient is computed from the energy parameter. This parameter produces a sharp response for the model decision, provides strong discrimination between classes, and behaves in the opposite way to the entropy approach. In group 3, the coefficient is computed from the RMS parameter, which takes the average of the energy calculated in group 2 and then applies the square root. Because it provides an overall energy measurement for the dataset, it helps prevent extreme errors.
In groups 4, 5, and 6, the model outputs are first normalized with the softmax function and then given as inputs to the entropy, energy, and RMS calculations, respectively. The softmax function scales the data, so that data mapped to the same space are evaluated in a balanced and equal manner. This is valuable for the model to produce reliable predictions. Together, these six interrelated groups provide a theoretical basis for analyzing the success of the model and enable reliable, stable, and high-performance outputs.
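The six groups can be sketched in Python as three measures (entropy, energy, RMS) applied either directly to the model outputs or to their softmax-normalized version. The exact formulas are not stated in this section, so the sketch assumes Shannon entropy, energy as the sum of squares, and RMS as the square root of the mean square. All function names, and the way the coefficient multiplies a lasso penalty, are illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax; maps raw outputs to probabilities."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (assumed form); largest for uniform outputs."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def energy(values):
    """Sum of squares (assumed form); largest for sharp, one-hot-like outputs."""
    return sum(v * v for v in values)


def rms(values):
    """Root mean square: square root of the averaged energy."""
    return math.sqrt(energy(values) / len(values))


def penalty_coefficient(outputs, measure, use_softmax=False):
    """Groups 1-3 use the outputs as given; groups 4-6 apply softmax first."""
    if use_softmax:
        outputs = softmax(outputs)
    return measure(outputs)


def regularized_loss(base_loss, weights, outputs, measure, use_softmax=False):
    """Hypothetical lasso variant: coefficient * L1 norm added to the base loss."""
    coeff = penalty_coefficient(outputs, measure, use_softmax)
    return base_loss + coeff * sum(abs(w) for w in weights)


outputs = [0.9, 0.1]  # hypothetical class probabilities
for group, (measure, use_sm) in enumerate(
        [(entropy, False), (energy, False), (rms, False),
         (entropy, True), (energy, True), (rms, True)], start=1):
    print(group, round(penalty_coefficient(outputs, measure, use_sm), 4))
```

Under these assumptions, entropy peaks for uniform outputs while energy peaks for one-hot outputs, which matches the description of the two as behaving in opposite ways.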
In the transfer learning experiments, the VGG16 model pretrained on the ImageNet dataset for 1000 categories with an input size of (224, 224) was selected. In its original form, this model has 138,357,544 total trainable parameters. In the modified VGG16 model, the two consecutive fully connected layers that follow the flattening step and the 1000-category prediction layer were removed, and a prediction layer for 2 categories was integrated instead. As a result, the total number of parameters was reduced to 14,714,688. A change in the number of parameters changes the training time and affects hardware selection, so the choice should be made according to the goal.
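Both parameter counts can be reproduced by hand from the standard VGG16 layer configuration (thirteen 3 × 3 convolutional layers followed by three fully connected layers). The Python sketch below performs this arithmetic without any deep learning library; the helper names are illustrative.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (input channels, output channels) of the 13 convolutional layers of VGG16.
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

conv_base = sum(conv_params(i, o) for i, o in VGG16_CONV)

# Five 2x2 poolings reduce 224 to 7, so flattening yields 7 * 7 * 512 features.
flat = 7 * 7 * 512
original_head = (dense_params(flat, 4096)
                 + dense_params(4096, 4096)
                 + dense_params(4096, 1000))

print(conv_base)                  # 14714688
print(conv_base + original_head)  # 138357544
```

The 14,714,688 figure equals the convolutional base alone. Any new prediction layer adds its own parameters, and their number depends on how the features are pooled before that layer, which this section does not specify.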
In the proposed scenario, classification accuracy is significantly improved through optimization. To put the contributions of the study in context, similar studies in the literature were analyzed. Previous studies using CT and CBCT datasets are compared in Table 5 to highlight the innovative approaches and findings of this research.
Several related articles in the literature were reviewed. These studies [13,14,15,16,21,43,44,45,46,47,48] show that deep learning and image processing approaches are actively used in medical imaging. The capacity of such approaches for automatic classification, early diagnosis, and clinical decision support, together with their generalizability across different diagnostic processes, has paved the way for multidisciplinary studies based on medicine and informatics. However, preprocessing and fine-tuning are key for the improved architectures and must be implemented successfully to reduce false positives and false negatives. Therefore, in this work, we introduced an image processing filter and the improved regularization functions in a suitable combination.
Table 1, Table 2, and Table 3 show that the proposed methodology contributes effectively to performance improvements in lesion image classification. These approaches have the potential to be integrated into the structures presented in [13,14,15,16,21,43,44,45,46,47,48]. We plan to pursue such integrations to obtain more effective outputs.