1. Introduction
According to projections, 1.56 billion individuals will have hypertension by 2025. Furthermore, 66% of the hypertensive population resides in underdeveloped or impoverished nations, where a lack of adequate healthcare services to identify, manage, and treat hypertension exacerbates the problem [
1,
2,
3]. Hypertension is triggered by an increase in arterial pressure [
4]; its prevalence has lately been increasing globally. Several human organs, including the retina, heart, and kidneys, are damaged by hypertension [
5]. Among these consequences, hypertensive retinopathy (HR) [
6] is a leading contributor to cardiovascular disease and related deaths, and it has therefore been identified as a global public health threat. Detecting and treating hypertension early can reduce the risk of HR. However, HR is notoriously challenging to detect in its earliest stages because modern imaging equipment and trained ophthalmologists are scarce in many affected regions.
Generally, HR is an abnormality of the retina triggered by an excessive rise in blood pressure. Signs such as hemorrhage (HE) spots in the retina, cotton wool patches, and micro-aneurysms are potential indicators of HR-related eye illness. Early detection and proper treatment of HR-related eye illness [
7] are critical to saving human life. Many researchers have recently demonstrated that retinal experts' use of a digital fundus camera, to acquire microscopic retinal samples and assess the presence of micro-vascular alterations caused by HR, is a cost-effective and non-invasive approach. Many HR patients are screened non-invasively with fundus cameras because the cameras are inexpensive and simple to use, and most anatomical features of lesions are visible in this type of imaging. The major goal of utilizing automated systems [
8] is to determine the existence of HR quickly and at an early stage, while relieving ophthalmologists of the burden of assessing vast numbers of images.
Many previous research efforts have established procedures for retinal image analysis, including image enhancement, segmentation of HR lesions and retinal vessels, extracting features, and supervised machine learning classifiers for HR illness [
9]. Notably, a significant HR indication is a widening of the retinal veins, which reduces the A/V ratio (the ratio of mean artery diameter to mean vein diameter). Detecting retinal vessel diameter [
10] and other features such as AVR is difficult when using an image analysis system to diagnose HR-related eye disease. Furthermore, getting precise measurements of vascular diameters is quite difficult for ophthalmologists.
As stated in the preceding paragraph, ophthalmologists use automatic analysis of digital fundus images to identify HR by observing abnormalities in retinography photos. These abnormalities manifest as HR-related lesions and damage to other key eye regions. If these alterations are not noticed early enough, they can progress to HR. It has been found that HR-related eye disease can be reversed [
11], whereas diabetic retinopathy (DR) related eye illness cannot be reversed. A normal retinal fundus image and images of HR affected fundi with different symptoms are illustrated in
Figure 1. Eye specialists can use CAD systems to diagnose certain retinal diseases, such as HR-related eye illness. These technologies benefit researchers and the health care industry by enabling automated diagnosis, and ophthalmologists use such systems to diagnose and treat eye-related diseases, particularly HR-related diseases. Recent automated HR identification techniques are surveyed and compared in
Section 2.
A few studies suggested that hypertensive retinopathy (HR) might be automatically identified by segmenting the retina’s structural features, such as the macula, optic disc (OD), blood vessels (BV), and microaneurysms (MA), as illustrated in
Figure 1. Deep-learning techniques can also extract such structures automatically. These characteristics are then quantitatively assessed to detect irregularities and ultimately to label HR or non-HR cases. Extracting clinical features for HR diagnosis is a time-consuming and challenging operation for computerized algorithms, and researchers spent much effort on feature extraction alone instead of focusing on more serious symptoms such as cotton-wool patches or hemorrhages. Microaneurysms are characteristic of early HR, but the other lesions are critical in diagnosing malignant HR-related illness. In contrast, deep learning methods have recently been deployed extensively for various computer vision applications and biological image analysis tasks.
Major Contributions
In this study, a CAD-HR system is designed to overcome the concerns discussed above. The system uses a DSC model with residual connections and an LSVM for classification between HR and non-HR. The significant contributions of the CAD-HR system are as follows.
This paper develops a new CAD-HR system that recognizes HR eye-related disease using a DSC model with residual connections together with an LSVM.
To effectively extract features from HR and non-HR photos, a lightweight pre-trained CNN model based on a DSC network is developed. To our knowledge, this is the first use of a depth-wise separable CNN for feature extraction in an HR classification task. The proposed feature extraction is also efficient enough for real-time contexts.
For automatic HR classification, we employed a 75–25% train-test split with the linear SVM machine learning classifier. The efficiency and performance of the linear SVM make it a popular choice, especially when working with limited datasets.
Extensive experiments were conducted employing several statistical metrics on two publicly available benchmarks and one proprietary benchmark, namely DRIVE, DiaRetDB0, and Imam-HR. A full comparative study of the suggested strategy against other existing DL approaches is presented.
The proposed CAD-HR system outperforms other transfer learning (TL) based architectures in recognizing HR.
3. Materials and Methods
3.1. Data Acquisition
To develop the CAD-HR diagnosis system, we needed a dataset with a sensible number of retinography images. We therefore prepared 1310 HR retinography photos and 2270 regular retinography photos to test and compare the proposed CAD-HR diagnosis methodology. These retinography photos were obtained from two distinct web sources and one private source. An experienced ophthalmologist was involved in developing the training dataset, manually distinguishing the HR and regular fundus photos from the different datasets. To produce a gold standard, the medical professional inspected all the HR-associated characteristics in a set of 3580 fundus photos, as shown in
Figure 1. The three separate datasets used to prepare our training and testing fundus sets, with their varied lighting settings and dimensions, are described in
Table 2. All the images were re-sized to 700 × 600 pixels for the experiments.
In addition, the proposed CAD-HR system is trained and tested with data obtained from a private hospital, referred to as the Imam-HR dataset. The Imam-HR dataset comprises a total of 3170 retinal samples, of which 1130 images are from HR patients and 2040 represent non-HR cases. All images have a resolution of 1125 × 1264 pixels and were saved in JPEG format. These pictures were obtained as part of a standard procedure for diagnosing patients with hypertension. The images from the three datasets were all downsized to a standard resolution of 700 × 600 pixels. In addition, a professional ophthalmologist was involved in preparing the HR and non-HR datasets for ground-truth evaluation. A sample fundus image from the datasets used in this study is shown in
Figure 2.
3.2. Preprocessing and Data Augmentation
The goal of the initial image processing stage is to lessen the impact of lighting disparities between photos, such as differences in the fundus camera's brightness and angle of incidence. Using a Gaussian filter, we normalized the color balance and local illumination of each fundus picture. This was achieved in two phases: (1) color space conversion, and (2) lightening and contrast correction.
Consider fundus images in RGB space with a white point of a given chromaticity (e.g., D65). First, the RGB coordinates were linearized to remove gamma and camera-response nonlinearities. Then the linearized RGB components were converted to colorimetric CIE-XYZ coordinates, and the XYZ axes were translated to linear LMS space. Equations (1) and (2) represent the transformation matrices:
The data in LMS color space is highly skewed; the skew can be reduced significantly by converting to logarithmic space, (L, M, S) = log(L, M, S) [44,48]. The lαβ color space, a perceptual color space based on the LMS cone responses of the human retina, is used in this paper. The logarithmic LMS space is then converted into lαβ using Equation (3) [44]:
Here, l ∝ (r + g + b) is the achromatic channel, α ∝ (r + g − 2b) is the yellow–blue channel, and β ∝ (r − g) is the red–green channel. This lαβ color space is used in the next section for fundus image color enhancement because the decorrelation allows the three color channels to be handled disjointedly, streamlining the enhancement process. After the color enhancement was completed, all the retina photos were transformed back to RGB color space.
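As a concrete illustration, the RGB → LMS → log-LMS → lαβ pipeline can be sketched in a few lines of NumPy. The matrix entries below follow the widely used Reinhard-style lαβ transform and are an assumption on our part, since Equations (1)–(3) are not reproduced in this text:

```python
import numpy as np

# RGB -> LMS matrix (Reinhard-style convention; an assumption, since the
# paper's Equations (1)-(2) are not reproduced here).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])

# log-LMS -> l-alpha-beta decorrelation matrix (analogue of Equation (3)).
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ \
          np.array([[1, 1, 1],
                    [1, 1, -2],
                    [1, -1, 0]])

def rgb_to_lab(rgb):
    """Convert linear RGB values (..., 3) in [0, 1] to l-alpha-beta space."""
    lms = rgb @ RGB2LMS.T
    log_lms = np.log10(np.clip(lms, 1e-6, None))  # log space reduces skew
    return log_lms @ LMS2LAB.T
```

For a neutral gray pixel, the chromatic channels α and β come out near zero, which is precisely the decorrelation property that makes per-channel enhancement convenient.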
Most of the retina fundus images suffered from poor luminance, which significantly affected the detection of retinal lesions or features and consequently affected the classification accuracy. As a result, the luminance enhancement procedure must be performed correctly so that the enhanced photos preserve the correct color information. This can be done by producing the following luminance gain matrix LG(α, β):
where r′, g′, and b′(α, β) represent the RGB values of any image pixel within the enhanced retinal image. As indicated, the illumination matrix defined in Equation (5) is used for color-invariant improvement of an RGB image.
where the first quantity is the illumination strength of a pixel at position (α, β) and the second is the boosted illumination image. The Adaptive Gamma Correction (AGC) technique is then employed to increase the brightness of a given image; a quantile-based histogram equalization approach [49] was used for this purpose. The q-quantiles are a set of q values that divide the intensity distribution into q equal proportions.
Figure 3 illustrates the results of color space conversion and light and contrast adjustments.
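A minimal sketch of these two luminance steps is given below. The quantile remapping and the mean-based gamma rule are illustrative assumptions, as the exact formulas of the AGC and of ref. [49] are not spelled out here:

```python
import numpy as np

def quantile_equalize(img, q=8):
    """Quantile-based histogram equalization (sketch of the idea in [49]):
    the q-quantiles of the intensity distribution are remapped to equally
    spaced target levels, flattening the histogram."""
    quantiles = np.quantile(img, np.linspace(0.0, 1.0, q + 1))
    targets = np.linspace(0.0, 1.0, q + 1)
    return np.interp(img, quantiles, targets)

def adaptive_gamma(img, gamma=None):
    """Adaptive Gamma Correction sketch: the exponent is chosen from the
    mean intensity (a heuristic assumption), so dark images are brightened
    and overly bright images are darkened."""
    if gamma is None:
        gamma = -np.log2(max(np.mean(img), 1e-6))
        gamma = np.clip(gamma, 0.3, 3.0)
    return np.power(np.clip(img, 0.0, 1.0), 1.0 / gamma)
```

Because the quantile remap is monotonic, the ordering of pixel intensities is preserved while the distribution is stretched toward uniform.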
A substantial dataset is necessary for a CNN model to attain good accuracy, and training a CNN model on a limited dataset yields low performance due to overfitting. This work applies a data augmentation approach to increase the dataset size and prevent the proposed CAD-HR system from overfitting. The augmentation strategy and parameters are shown in
Table 3. Following the data augmentation, the dataset size increased to 9500 retinal samples. A visual example of the proposed data augmentation approach is shown in
Figure 4.
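The augmentation idea can be sketched as follows, assuming typical operations (flips, rotations, shifts); the actual strategy and parameter values used are those listed in Table 3:

```python
import numpy as np

def augment(img, rng):
    """Generate simple augmented variants of one retinal sample (H, W, 3).
    The specific operations and ranges below are illustrative assumptions;
    the paper's Table 3 lists the exact augmentation parameters."""
    ops = [
        lambda x: np.fliplr(x),                               # horizontal flip
        lambda x: np.flipud(x),                               # vertical flip
        lambda x: np.rot90(x, k=int(rng.integers(1, 4))),     # 90/180/270 deg
        lambda x: np.roll(x, int(rng.integers(-10, 11)), axis=1),  # width shift
    ]
    return [op(img) for op in ops]
```

Each call thus turns one labeled sample into several, which is how the set grows toward the 9500 samples mentioned above.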
3.3. CNN and Transfer Learning
Recently, DL frameworks have surpassed conventional methods in various computer vision problems, such as iris recognition [
66], face detection [
67], classification [
68], and many more. According to the findings of these studies, the CNN performs better than traditional methods.
Figure 5 shows the fundamental notion of CNN. To categorize the input image into predetermined classes, the CNN uses a neural network (shown in
Figure 5 as fully connected layers) that uses extracted feature vectors to classify the image. Even though CNNs have obtained remarkable results for many image-based systems, they have several problems. The two key issues associated with the CNN model are the computational expense and overfitting. Because CNNs require long processing times, it is challenging to implement CNN models on a single general-purpose machine with only a few central processing units (CPUs). Fortunately, the introduction of graphical processing units (GPUs) [69] has addressed this problem: by employing numerous CPUs running in parallel together with GPUs, CNNs can be used in real-time systems. The other issue associated with CNNs is overfitting. A CNN, as previously discussed, is built by learning millions of trainable parameters; consequently, CNN-based systems often require an enormous amount of training data. Although this problem has been addressed by several techniques, including data augmentation and dropout, a massive amount of data is still needed to train such CNN systems. This problem was recently mitigated by the transfer learning (TL) approach [70]. With the TL method, we can use a CNN trained with sufficient data for one problem to solve a different problem. This strategy has proven useful for many problems, particularly when substantial training data, such as medical images [71], are limited.
Since 1995, research based on transfer learning has grown in popularity. This line of research is also well known as multitask learning [72]. Multitask learning is a method of learning numerous tasks simultaneously and is closely related to other learning schemes aimed at knowledge transfer [
73].
Figure 5 shows how a standard ML technique for various tasks differs from a transfer learning method for the same tasks. In transfer learning, the first task is trained on large benchmarks, and the second task then takes advantage of the knowledge gained from the first to learn more quickly and accurately. Based on the availability of annotated data, TL tasks are classified mainly into inductive, transductive, and unsupervised transfer learning. We employed a pre-trained deep learning model, one of the most common inductive TL strategies, which uses the source domain and task to fine-tune different layers for the target goal; this allows automated algorithms to obtain more precise results.
There are two ways that the target predictive function FT(.) in the target domain can be improved: first, by combining the knowledge from the source domain DS and its learning task LTS, and second, by utilizing the knowledge from the target domain DT and its learning task LTT, where DS ≠ DT or LTS ≠ LTT.
A comparison of the classic ML method and the TL methodology is shown in
Figure 6a,b. In the TL technique of Figure 6b, the system learns from two sources: the target task ("Target Task") itself and the knowledge (model) generated from another ML problem. In a traditional ML system, the model is learned for a particular task utilizing input from a single source only (as shown in
Figure 6). A CNN can be reused and applied to a new task using the transfer learning method. To establish the proposed CNN architecture of this work, we adapted the idea of the Xception model [74]. The Xception network was trained on the ImageNet dataset. The architectures of the pre-trained and updated models differ significantly: only the entry flow of the Xception model is used, and, whereas the pre-trained Xception model employs a fully connected layer as the final classification layer, our proposed modified model uses an LSVM technique to classify the two classes.
3.4. Proposed DL Architecture
A novel classification approach named
DSC-LSVM is presented to differentiate HR-related classes, as depicted in
Figure 7. To generate the feature map, this architecture uses two convolutional layers of kernel size three, three max-pooling layers, eight batch normalization (BN) layers, three residual blocks each with two separable convolutional layers, and one convolutional layer of kernel size one. Following that, one flatten layer, one dense layer, and an SVM classifier with a linear activation function were used for classification. For the separable convolutional layers, the kernel size was set to three, with border padding of two and a stride of two. In the pooling layers, a kernel size of three and a stride of two were also employed. Using batch normalization after each separable convolution layer, before the activation function, can improve the efficiency and stability of deep neural network training [
75].
Additionally, the rectified linear unit (ReLU) activation function is applied to the output of each separable convolutional layer; its low computational cost benefits the classification performance. Finally, after the features had been flattened, passed through the dense layer, and reduced in size, a linear SVM (LSVM) classifier was used to categorize each fundus sample as an HR (true) or non-HR (false) image. Algorithm 1 describes the mechanism underpinning the suggested network model for the feature extractor.
The proposed work uses the DSC instead of the regular convolution structure to reduce the number of calculations while preserving the CNN's generalization ability, as shown in Figure 8. The DSC model divides the convolution into two stages: (1) a depth-wise convolution that filters each channel, and (2) a point-wise convolution that serves as the combining kernel. The outputs of the depth-wise convolution are then combined by the point-wise convolution. In contrast, a standard convolution filters and merges the input in a single step to generate a new output set.
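The two-stage computation can be made concrete with a small NumPy sketch (stride 1 and 'valid' padding assumed; the explicit loops favor clarity over speed):

```python
import numpy as np

def depthwise_separable_conv(x, depth_k, point_k):
    """Two-stage DSC on input x of shape (H, W, C).
    depth_k: (k, k, C), one spatial filter per input channel (depth-wise stage).
    point_k: (C, F), 1x1 kernels that combine the channels (point-wise stage)."""
    k = depth_k.shape[0]
    H, W, C = x.shape
    out = np.zeros((H - k + 1, W - k + 1, C))
    # Stage 1: depth-wise convolution filters each channel independently.
    for c in range(C):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * depth_k[:, :, c])
    # Stage 2: point-wise (1x1) convolution merges the channel outputs.
    return out @ point_k
```

A standard convolution would instead apply F full (k, k, C) kernels, filtering and merging in one step; splitting the two roles is what yields the parameter savings quantified next.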
This design reduces the computation time and parameter size. Equation (6) shows how to compute the total size (in bytes) of a single convolutional layer's weights [
76].
where C represents the total number of channels, Nf indicates the total number of filters, Lf indicates the length of the filter, and Wf indicates the width of the filter. The computation of the parameters in the separable convolution layer is shown in Equation (7). By utilizing the separable convolutional layer, the number of parameters was reduced compared to the original CNN.
where L indicates the convolution layer, C shows the total number of channels, Nf indicates the total number of filters, Lf represents the length of the filter, and Wf indicates the width of the filter.
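Both counts are easy to verify in code. The separable-layer count below follows the standard DSC formula (depth-wise plus point-wise weights), which we assume matches Equation (7):

```python
def standard_conv_params(C, Nf, Lf, Wf):
    """Equation (6): weight count of a standard convolutional layer,
    Nf full kernels each spanning all C input channels."""
    return Nf * C * Lf * Wf

def separable_conv_params(C, Nf, Lf, Wf):
    """Assumed form of Equation (7): C depth-wise spatial filters
    (C * Lf * Wf) plus C x Nf point-wise (1x1) weights."""
    return C * Lf * Wf + C * Nf
```

For C = 64 input channels, Nf = 128 filters, and a 3 × 3 kernel, the separable layer needs 8768 weights against 73,728 for the standard layer, roughly an eight-fold reduction.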
The DSC offers two advantages for constructing a deep learning model: (1) it helps minimize the number of parameters, and (2) it improves model generalization. Thus, it is reasonable to expect the DSC to improve detection accuracy and training effectiveness.
Algorithm 1: Implementation of the DSC model for feature map extraction
Input: Array X
Output: Extracted feature map x = (x1, x2, …, xn)
Process:
Step 1. Normalize the raw input data.
Step 2. Define the functions.
Step 3. The inputs to the Conv-BatchNorm block are the array X, the number of filters, and the kernel size.
  a. X = Conv(X), and
  b. X = BN(X) is then applied.
Step 4. Separable Conv2D is utilized instead of Conv2D.
Step 5. Construct the DSC network:
  a. The network begins with two Conv layers containing 32 and 64 filters, each followed by ReLU activation.
  b. Skip connections are then added using element-wise addition.
  c. Three skip connections are used; each contains two separable Conv layers followed by a max-pooling layer. The skip path uses a 1 × 1 convolution with a stride of two.
Step 6. The feature map x = (x1, x2, …, xn) is then formed and flattened by the flatten layer.
The residual connection, also referred to as a skip connection, allows the signal to bypass two or three layers of the network.
Figure 9 depicts a single residual block in the deep learning network. As can be seen in
Figure 3, our suggested CNN model incorporates three residual blocks [
77]. The use of residual connectivity in a deep learning model has many benefits, one of which is that it allows earlier layers of the network to pass their output directly to a later layer. The final connection can be formally expressed as follows.
The function O(x) denotes the actual output value, R(x) the residual mapping learned by the layers, and x the network input, so that O(x) = R(x) + x.
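A minimal sketch of this skip connection, with the stacked layers learning only the residual R(x):

```python
import numpy as np

def residual_block(x, layers):
    """O(x) = R(x) + x: the stacked layers compute only the residual R(x),
    while the skip (identity) path passes x through unchanged."""
    r = x
    for layer in layers:
        r = layer(r)          # apply the stacked transformations
    return r + x              # add the identity shortcut
```

If the stacked layers learn nothing (R(x) = 0), the block degenerates to the identity, which is what makes deeper networks with such blocks easier to train.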
SVM is an ML classification technique that yields superior results relative to other classifier types and is regarded as an effective classifier for various practical problems [78]. For computer vision and image classification tasks, prior authors have suggested depthwise separable CNNs [79] in place of conventional deep learning or machine learning classifiers. For this work, we employed the linear SVM classifier because of its ability to deal with small datasets and its strong performance in high-dimensional settings [80]. Linear SVM was a logical choice, given that we were working with a binary classification task. A further reason for employing linear SVM was to improve the efficacy of our method by identifying the optimal hyperplane that divides the feature space of diseased and normal retinal images [
63]. Typically, an LSVM accepts a vector x = (x1, x2, …, xn) and returns a value y ∈ R, which can be described as:
In the preceding Equation (4), Wei represents the weight vector and b the offset; both belong to R and are acquired through training. The input vector Xiv is assigned to class 1 or −1 according to whether y is greater than or less than 0.
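To make the decision rule concrete, the toy NumPy sketch below trains a linear SVM by sub-gradient descent on the hinge loss with an L2 weight penalty. It is illustrative only; the actual system trains its LSVM head with an L2 kernel regularizer as described in Algorithm 2:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimal linear SVM: minimize hinge loss + (lam/2)*||w||^2 by
    sub-gradient descent. Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                     # samples violating the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).mean(axis=0) if mask.any() else lam * w
        grad_b = -y[mask].mean() if mask.any() else 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Assign class +1 or -1 by the sign of the decision value w.x + b."""
    return np.where(X @ w + b >= 0, 1, -1)
```

On linearly separable features, the learned hyperplane separates the two classes, mirroring the role the LSVM plays on the flattened DSC features.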
To produce the optimal hyperplane for data separation, we must minimize:
In Algorithm 2, the mechanism of our suggested classifier is detailed.
Algorithm 2: Proposed LSVM Classifier
Input: Extracted feature map x = (x1, x2, …, xn) with annotations y = 0, 1; test data Xtest
Output: Classification of normal and abnormal samples
Process:
Step 1. The classifier and the L2 kernel-regularizer parameters are defined for optimization.
Step 2. Construction of the LSVM:
  a. The LSVM is trained on the features x = (x1, x2, …, xn) extracted by Algorithm 1.
  b. The separating hyperplane is generated using Equation (6).
Step 3. A class label is assigned to each test sample Xtest using the decision function Xtest = (Wei, Xiv) + b.
4. Results
To assess the effectiveness of the proposed deep learning-based CAD-HR approach, in all experimental trials the datasets were split into training and testing sets at a 3:1 ratio, meaning that three-quarters of the data were used for training and the remaining one-quarter for testing. Two separate internet sources and one private source provided these retina images. To execute the feature extraction and classification tasks, all 3580 photos were scaled to 700 × 600 pixels. Our CAD system combines two techniques, a DSCNN with residual connections and a linear SVM. A computer with a Core i7 processor, 16 GB RAM, and a 4 GB Gigabyte NVIDIA GPU was utilized to implement and program our CAD-HR system. The deep learning libraries TensorFlow (version 2.7) and Keras were installed on this PC running Windows 10 Professional 64-bit.
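The 3:1 partition can be sketched as follows (a simple shuffled split; the exact partitioning procedure used in the experiments is not detailed here):

```python
import numpy as np

def split_3_to_1(X, y, seed=0):
    """Shuffle and split data 75%/25% into train and test sets,
    matching the 3:1 split used in all experimental trials."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(0.75 * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```

Shuffling before splitting keeps the HR/non-HR proportions roughly balanced between the two sets.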
Various kernel dimensions are used to produce feature maps from the preceding stage when creating and training the CNN architecture. Because kernel dimensions of 3 × 3 or 5 × 5 are generally used in this project, the convolutional layers' weight parameters vary accordingly. The convolutional layers are convolved using varied window sizes and the values acquired from each feature map's activation function. A similar process was utilized to create the pooling layer, with one difference: to condense the characteristics gained from the previous layer, a sliding step of two and a window size of 2 × 2 are used. This phase boosts the network's overall speed while lowering the number of convolutional weights. The output of this average pooling is fed into a fully connected LSVM stage, which is employed to distinguish between HR and non-HR cases.
Table 4 illustrates the number of parameters in the convolution layer of the proposed CAD-HR model.
To assess the proposed CAD system’s performance, statistical analysis was used to calculate the accuracy (ACC), specificity (SP), and sensitivity (SE) values. These metrics are utilized to evaluate the generated CAD system’s performance and to compare it to previously developed systems [
48]. This section provides a comprehensive discussion of the various experiments performed to assess the efficiency of the proposed CAD-HR model. These experiments are described in the subsequent paragraphs.
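For reference, the three metrics follow directly from the confusion-matrix counts, with HR taken as the positive class:

```python
def classification_metrics(tp, fn, tn, fp):
    """SE, SP, and ACC from confusion-matrix counts
    (tp/fn/tn/fp: true/false positives and negatives, HR = positive)."""
    se = tp / (tp + fn)                    # sensitivity: HR cases found
    sp = tn / (tn + fp)                    # specificity: non-HR correctly rejected
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    return se, sp, acc
```

For example, counts of 94 true positives, 6 false negatives, 96 true negatives, and 4 false positives yield SE = 0.94, SP = 0.96, and ACC = 0.95, matching the shape of the results reported below.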
Experiment 1: A 10-fold cross-validation testing scheme was utilized in Experiment 1 to compare the resulting AUC metric with those of other DL methods. Overall, the AUC metric was primarily used to judge categorization accuracy. The developed CAD system's performance is quantitatively measured as shown in
Table 5. The developed system achieves very good outcomes with regard to SE (94%) and SP (96%), as well as a low training error (0.76), in the identification of HR eye illness.
Experiment 2: Several studies were carried out in Experiment 2 to illustrate the efficacy of the proposed CAD system in detecting HR from input retina images. The optimal network design for the retinal-image HR categorization task was determined through experimentation, and the developed CAD system demonstrates the efficacy of the features learned with this optimal design.
Figure 10 shows a visual example of classification results for various input fundi.
Figure 10a shows HR-classified retinography samples with various signs such as cotton wool patches and hemorrhages. A non-HR fundus picture is also shown in
Figure 10b.
Experiment 3: On the pre-processed retinal fundus datasets, the training phase was carried out on various CNN and DRL architectures with eight to sixteen stages to perform comparisons.
Table 5 summarizes the findings. It is worth noting that all the CNN and RNN deep-learning models were trained for the same number of epochs. The highest-performing network, with a validation accuracy of 59%, was chosen, and identical traditional convolutional networks were trained. Sensitivity, specificity, accuracy, and AUC metrics were utilized to compare the performance of the traditional CNN, DRL, trained-CNN and trained-DRL models against the developed CAD system, as shown in
Table 6.
On this dataset, the trained-CNN model performed well, achieving 81.5%, 83.2%, 81.5%, and 0.85 for SE, SP, ACC and AUC, respectively. The DRL model attained 82.5%, 84%, 83.5%, and 0.86 for the same metrics. The developed CAD-HR system obtained higher results than these deep-learning models by merging the capabilities of DSC on four annotated fundus sets with residual connections that are not prone to overfitting.
Experiment 4: We tested our proposed DSC-CNN model on the DRIVE and DiaRetDB0 datasets, tracking the LSVM training and validation accuracy together with the training and validation loss. As can be seen in Figure 11a,b, our proposed model performs very well, with training and validation accuracy approaching 100% after only ten iterations of the training and validation processes. Additionally, the loss remained below 0.1 for both the training and validation data, which is evidence that our proposed strategy is effective.
To adequately evaluate classification performance, we must first obtain the confusion matrix, which reveals the discordance between the predicted and true labels: the columns show the predicted label and the rows the true label. Consequently, based on the test set identified by our model, we obtained results for the two categories, namely HR and non-HR samples. Using 110 samples from the DRIVE and DiaRetDB0 datasets, we evaluated the performance of our model. The confusion matrix in
Figure 11b makes it abundantly evident that our suggested model successfully identified every HR image in the DRIVE and DiaRetDB0 datasets. Even with the limited training sample, the predicted label for each category is not confused: both categories are accurately classified. Consequently, the confusion matrix demonstrates that our suggested CAD-HR model has high detection accuracy.
Experiment 5: To determine how well our proposed CAD-HR system works, we utilized a different dataset, Imam-HR, in this experiment. First, we analyzed the model's accuracy and loss during training and validation. The training and validation accuracy of the CAD-HR model on Imam-HR is shown in
Figure 12. As demonstrated in
Figure 12, our model performs well in both training and validation. We achieved 100% accuracy on training and validation data, demonstrating that our technique functions well on the retina Imam-HR dataset images.
Next, to better understand our CAD-HR algorithm's performance, a confusion matrix reflecting the classification outcome of the suggested CAD-HR model was obtained. Our suggested approach successfully detected all the HR samples of Imam-HR, as shown in
Figure 13. This indicates that every sample is accurately categorized according to the predicted value. As a result, it demonstrates that our technique also achieves impressive detection accuracy on Imam-HR.
4.1. State-of-the-Art Comparisons
Few research attempts have used deep learning methodologies for identifying HR in retinal pictures. In this research project, two deep learning models, Triwijoyo-2017 [
7] and Pradipto-2017 [
33] were selected to compare to the results of our developed system.
Table 7 shows how the proposed CAD system compares to other deep learning approaches (Triwijoyo-2017 [
7] and Pradipto-2017 [
33]).
The Triwijoyo-2017 system [
7] was trained using (32 × 32) scaled gray-level patches extracted from a large fundus set of 9500 retinal pictures, 5000 of which were non-HR and 4500 of which had HR signs. The CNN produced either HR or non-HR decisions. We implemented a model similar to that in [
7].
Table 7 presents the results of several DL approaches on this 3580-image collection. The performance of the Triwijoyo-CNN-2017 system and ours were quantitatively compared in terms of the SE, SP, ACC, and AUC indicators. On the 3580 retinal pictures, the Triwijoyo-2017 system achieved 78.5%, 81.5%, 80%, and 0.84 for those metrics, respectively. The CNN architecture proposed in the Triwijoyo-CNN-2017 system fails to extract effective and general features that can distinguish between HR and non-HR.
Consequently, the developed CAD-HR system produced better results, with 94%, 96%, 95%, and 0.96 for SE, SP, ACC, and AUC, respectively. The authors of Triwijoyo-2017 [7] reported an identification accuracy of 98.6%; however, we note that they used a very restricted training set of only 40 retina photos (20 normal and 20 HR), which led to that high precision. By contrast, our CAD-HR system was trained and tested on a much larger dataset and attained a classification accuracy of 95%.
The CNN-RBM model was only used as a classifier in the Pradipto-2017 [
33] system. For segmentation and feature extraction, the system employs image processing algorithms. The input to the classification stage was a feature vector consisting of the A/V ratio and OD, rather than the retinal images themselves. We applied the same processes to our dataset of 3580 photos to compare with Pradipto-2017 [
33].
Table 7 shows that our system also significantly outperformed the Pradipto-2017 model. In this paper, an architecture combining one DL method, DSC, with an LSVM classifier was constructed using the DSC-LSVM technique. Therefore, we were able to achieve greater accuracy than recent HR detection approaches. One more stage was integrated into the improved DSC model: three residual connections, which capture localized and specialized features. In comparison to the features recovered by the Triwijoyo-CNN-2017 [7] and Pradipto-2017 [33] models, these DSC features with three residual connections are more generic. We updated the residual connections with skip paths to resolve the issue of extracting HR-related lesions using a fresh from-scratch training technique rather than employing pre-trained architectures to create features.
Moreover, we compared our system with five transfer-learning-based (TL) CNN architectures: VGG-16, VGG-19, ResNet-50, Inception-v3, and AlexNet. To assess performance, the preprocessing and data augmentation techniques were first applied to these TL architectures. We employed data augmentation to work around the issue that most CNN architectures need a large amount of labeled data for training. To perform comparisons with the TL algorithms, we used default hyper-parameters with 200 epochs. A visual example of the performance of the TL algorithms is displayed in Figure 14. The obtained results demonstrate that our CAD-HR system outperforms these state-of-the-art TL-based CNN architectures. In future work on our CAD-HR approach, we intend to deploy more depthwise separable CNN architectures. Furthermore, merging the deep features of many architectures is an intriguing strategy that can boost performance. ResNet-50 significantly outperformed the other TL models (VGG-16, VGG-19, Inception-v3, and AlexNet); therefore, among TL models, we suggest using ResNet-50 to classify HR eye-related diseases. Nevertheless, we conclude that even the ResNet-50 model cannot match the performance of the proposed CAD-HR model, which is trained with the optimal hyper-parameter setup. Even though TL-based approaches are often employed to improve classification accuracy, they also significantly increase the architectural complexity of the model and might not make a meaningful difference in how well deep learning models perform when configured with the best hyper-parameters.
4.2. Computational Cost
The proposed image preprocessing phase, used to adjust the lightness and remove noise from the input image, took 25 s. Similarly, the feature learning and extraction step of the proposed CAD-HR system took 16.5 s on average, while the LSVM classifier took only 3 s to be created. The LSVM training to perform binary classification of hypertensive retinopathy into HR and non-HR took 5.20 s over a fixed number of 30 iterations. Once training is complete, classifying a test image takes only 8.12 s on average. The proposed CAD-HR took 2.1 s longer to compute than [7,33], which utilize the convolutional neural network (CNN) model. This is because we implemented a novel technique employing DSC, three residual blocks, and LSVM in a perceptually oriented color space.
This computational time complexity is generalized in terms of the training of the network. In the CAD-HR system, there are four blocks with residual connections, whose per-example costs can be represented as i, j, k, and l; with t training examples and n epochs, the training complexity is O(n × t × (i + j + k + l)). This time complexity can be reduced by using tensor processing units (TPUs), which are provided by the Google cloud. In practice, TPUs achieve substantial speedups for DL models while using less power. This direction will be addressed in future work as well.
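The O(n × t × (i + j + k + l)) bound can be sketched as a simple counting function; the per-block unit costs in the example below are hypothetical placeholders, not measured FLOPs:

```python
# Training-work estimate for a network of four residual blocks whose
# per-example costs are i, j, k, l: O(n_epochs * t_examples * (i+j+k+l)).
def training_cost(n_epochs, t_examples, block_costs):
    return n_epochs * t_examples * sum(block_costs)

# Example: 30 epochs, 3580 training images, hypothetical unit costs per block
work = training_cost(30, 3580, (2, 3, 3, 4))  # 1,288,800 abstract units
```

Because the bound is linear in both t and n, halving either the dataset size or the epoch count halves the estimated training work; hardware such as TPUs instead reduces the constant per-unit cost.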
5. Discussion
The CAD-HR system was implemented by feeding a trained CNN model into a DSC architecture that classifies HR using three consecutive residual blocks and a linear SVM. The learning of specialized features was accomplished with a multilayered hierarchical framework, without resorting to complex image-processing and feature-selection methods. This multi-layer architecture learns features directly from the input image, eliminating the need for human intervention. To develop the CAD-HR architecture, the Xception model was modified to include DSC and residual blocks that create more generalizable features. Using a from-scratch training technique, the depthwise convolutional layer obtains localized, trained features from four HR-related lesions. A CNN model for learning deep features consists mostly of convolutional, pooling, and fully connected layers; to construct such a model, those layers must be trained and shown to be effective at extracting useful information. Such generic features alone are not optimal for recognizing HR in retinography pictures. As a result, deep residual connections were integrated, which provide highly specialized features rather than relying on feature-based categorization methods that require human intervention.
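The final stage described above, deep features fed to a linear SVM, can be imitated with a minimal hinge-loss trainer. This is a simplified stand-in for the paper's LSVM (L2-regularized hinge loss optimized by sub-gradient descent on synthetic two-dimensional features), not the actual implementation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Minimal L2-regularized hinge-loss linear SVM via sub-gradient descent.
    X: (n, d) deep-feature vectors; y: labels in {-1, +1} (non-HR vs. HR)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                  # samples violating the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # Sign of the decision function: +1 (HR) or -1 (non-HR)
    return np.sign(X @ w + b)
```

On well-separated features, which the deep network is trained to produce, a few dozen sub-gradient steps suffice for a clean linear decision boundary.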
The success in detecting HR was made possible by an autonomous feature-learning method. For the diagnosis of HR disease, in contrast, hand-crafted classification techniques rely on the pre-processing, segmentation, and localization of HR-related structures, which are computationally expensive algorithms. Moreover, researchers devoted considerable attention only to extracting the necessary elements, at the expense of other crucial symptoms such as cotton wool spots or hemorrhages.
In the past, a few classification systems for HR-related and non-HR-related eye diseases were developed, as described in Section 2. These systems use deep-learning methods instead of traditional machine-learning techniques. No datasets with clinical expert annotations defining these HR-related lesion patterns are available for training and evaluating the network; therefore, it is challenging for computerized systems to identify these illness traits. According to the available literature, the authors used manually constructed features to train the network and assess the performance of standard and cutting-edge deep learning models. Consequently, an automatic technique is necessary to identify optimal features. Compared to conventional approaches, deep-learning models produce superior outcomes. However, other models utilized architectures trained from scratch to learn features automatically, but they all shared the same weighting technique at each level; subsequently, it may be difficult for layers to transfer accurate decision-making weights to deeper network levels.
The CAD-HR system was created in this study to overcome these problems by classifying images into HR and non-HR using two multi-layer deep learning techniques rather than concentrating on image-processing algorithms. The CAD-HR system’s significant contributions are listed here. This research developed two new deep learning architectures using a convolutional neural network (CNN) and residual blocks. Four distinct HR lesions were used to train the first CNN model, which was then used to determine the hierarchy of features. To enhance the efficiency of the learning procedure, the residual blocks were used to identify the feature maps with the most useful information. This is the first system built for classifying HR on the basis of a perceptually oriented color space. The deep features are classified utilizing the DSC model and the LSVM classifier. To our knowledge, this is the first attempt to identify HR illness automatically. To construct the CAD-HR system described in this paper, the multilayer deep learning network must be trained on many samples to provide greater feature generalization. Using a DSCNN with three residual blocks, a new deep learning-based technique was created to learn features automatically. However, the proposed CAD-HR system misclassifies a small number of samples; a visual illustration is shown in Figure 13. One such case was hypertensive retinopathy (HR) of severe grade, and we will address this problem in future research.
For HR recognition, Table 6 and Table 7 indicate that the CAD-HR system obtained higher accuracy than the systems of [7] and [33]. This is because the CAD-HR system is built with trained features employing the DSCNN architecture and deep residual learning techniques. Furthermore, the DRL design was modified by the addition of three residual blocks that extract localized and specialized information. We updated the deep residual network blocks with three shortcuts to solve the challenge of recovering HR-related lesions through a from-scratch training technique rather than employing off-the-shelf pre-trained models to define specialized features.
The CAD-HR system for HR recognition can be enhanced in the future by a larger collection of retinography images gathered from various sources. Instead of merely employing deep features, hand-crafted features could be included to improve the model’s classification accuracy. Several research groups have used the saliency-map technique for segmenting DR-related lesions, extracting those lesions from retinography pictures with a trained classifier; however, only the segmentation step was performed in those studies. Future integration of these saliency maps will improve the precision of HR eye-related disease classification. Furthermore, the various HR severity levels will be evaluated in the future. Much recent research has found that clinical characteristics are crucial indicators for determining the severity level of HR. The extraction of those HR-related lesions with varied thresholds, on the other hand, could be used to detect the disease level of HR. As a result, clinicians may find it advantageous to use them to tackle the problem of hypertension.
We developed this CAD-HR system to recognize only two classes, HR and normal; it has not yet been tested on the five stages of HR. In addition, hyper-parameter optimization is required to fine-tune this deep-learning model. Apart from the time complexity, the overall computational time can be decreased by adding a block-based fine-tuning strategy. The preprocessing step can also be enhanced by utilizing different color-space models. Testing the ConvMixer architecture against the proposed model is another item of future work.