1. Introduction
In modern medicine, certain disease diagnoses and clinical treatments are based on findings obtained from medical images, such as X-rays, Magnetic Resonance Imaging (MRI), and Computed Tomography (CT). This is also applicable to cervical radiculopathy.
Cervical radiculopathy often results from disc herniation or cervical spondylosis, which compress the nerves that extend into the arm and cause neck and arm pain, paralysis, or sensory loss [
1]. Foraminal stenosis, a finding that underlies the diagnosis of cervical radiculopathy, is the narrowing of the foramina between the cervical vertebrae, which compresses the nerves exiting the cervical spine and causes pain, decreased sensation, or paralysis in the arms. It arises as age-related disc degeneration reduces disc height and narrows the foramina [
1]. Therefore, detecting foraminal stenosis is important for the early diagnosis and treatment of cervical radiculopathy. This paper focuses on deep learning-based approaches to determining whether foraminal stenosis is present.
Foraminal stenosis is assessed by experts based on medical images, such as X-rays, MRIs, and CTs. Diagnosis is mainly made based on MRI because it has the highest accuracy, and the predictive success rate is about
[
2]. However, MRI examinations are a potential burden to patients because they are expensive. X-rays, in contrast, are relatively inexpensive, but it is difficult for experts to diagnose foraminal stenosis using X-rays alone. This does not mean that X-rays contain no diagnostic clues. According to [
1], narrowing of the foramen can be confirmed even with the naked eye in the oblique X-ray view. Therefore, this paper proposes a novel foraminal stenosis classification model that applies a deep learning algorithm to X-ray images in order to learn, automatically and efficiently, features that are difficult to identify with the naked eye. It is expected that the proposed model, as an auxiliary tool, will help experts diagnose foraminal stenosis more consistently. Furthermore, because the proposed model uses X-ray images rather than MRIs or CTs, it can be expected to relieve patients of the burden of examination costs. In addition, the proposed classification model can automate the diagnosis of foraminal stenosis without requiring much professional expertise.
To date, few studies have applied deep learning algorithms to cervical spine X-rays to diagnose foraminal stenosis. Most studies using X-ray images have focused on chest X-ray images [
3,
4,
5] or lumbar spine radiographs [
6,
7] rather than the cervical spine. Therefore, this paper proposes a classification model to diagnose foraminal stenosis by applying deep learning algorithms to cervical spine X-ray images. In addition, most studies related to diagnosing spinal diseases used MRIs or CTs [
6,
8,
9], which are expensive for patients. This study aims to diagnose foraminal stenosis using X-ray images only, which will be less expensive. It is often difficult to obtain a large amount of data owing to the characteristics of medical data. This paper proposes methods that can substantially increase the accuracy of the model with only a small amount of data by using various image preprocessing and data augmentation methods.
Our contributions can be summarized as follows: (i) we introduce a new technique for classifying foraminal stenosis; (ii) we propose a classification model using cervical spine X-ray images; and (iii) we demonstrate that the proposed methods are suitable for a small number of X-ray images.
First, to detect foraminal stenosis, the proposed model needs to focus on the foramen. In an original X-ray image, the oblique view of the cervical spine contains not only the foramen but also other bony structures such as the teeth and skull, so we cropped the input image to the Region of Interest (ROI) only. To locate the region to crop, we applied YOLOv5 [
10] to learn the ROI, as described in
Section 3.1.
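The cropping step can be illustrated with a minimal NumPy sketch. It assumes that the detector returns one bounding box in normalized (x_center, y_center, width, height) form, as in YOLO label files; the function name `crop_roi`, the placeholder image, and the box values are hypothetical and are not taken from this study.

```python
import numpy as np

def crop_roi(image, box_xywhn):
    """Crop an image to a box given as normalized (x_center, y_center, width, height).

    Assumption: the box format follows the YOLO label convention; the detector
    itself (e.g., YOLOv5) is not part of this sketch.
    """
    height, width = image.shape[:2]
    xc, yc, bw, bh = box_xywhn
    # Convert the normalized centre/size box to pixel corner coordinates.
    x1 = int(round((xc - bw / 2) * width))
    y1 = int(round((yc - bh / 2) * height))
    x2 = int(round((xc + bw / 2) * width))
    y2 = int(round((yc + bh / 2) * height))
    # Clamp to the image bounds so a box that spills over the edge still crops cleanly.
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, width), min(y2, height)
    return image[y1:y2, x1:x2]

# Placeholder 100 x 80 "X-ray" and a hypothetical detected box.
image = np.arange(100 * 80, dtype=np.uint16).reshape(100, 80)
roi = crop_roi(image, (0.5, 0.5, 0.5, 0.4))
```

Only the cropped array `roi` would then be passed to the later preprocessing steps.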
Second, because Convolutional Neural Network (CNN)-based models tend to be sensitive to the input, we applied Histogram Equalization, one of the most popular methods for X-ray images, to emphasize the foramen [
11,
12]. Histogram equalization makes the image clearer because the contrast between the bone part and non-bone part is emphasized. As shown in
Section 3.2, such image preprocessing can help CNN-based approaches [
13] learn X-ray classification models more effectively.
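As an illustration of the contrast enhancement described above, the following is a minimal NumPy sketch of global histogram equalization for an 8-bit grayscale image. The function name `equalize_histogram` and the synthetic low-contrast input are assumptions made for this example; it is not necessarily the implementation used in this study.

```python
import numpy as np

def equalize_histogram(image):
    """Global histogram equalization for an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    # CDF value at the darkest intensity that actually occurs in the image.
    cdf_min = cdf[cdf > 0][0]
    total = image.size
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return image.copy()
    # Map each intensity so that the cumulative distribution becomes roughly linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]

# Synthetic low-contrast image whose intensities span only 100..139.
low_contrast = np.tile(np.arange(100, 140, dtype=np.uint8), (10, 1))
equalized = equalize_histogram(low_contrast)
```

After equalization, the narrow intensity band of the input is spread over the full 0 to 255 range, which is the effect that makes bone and non-bone regions easier to separate.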
Third, in the case of the oblique view of the cervical spine X-ray used in this study, the labels of the left view and the right view may be different even for the same patient. Therefore, the left and right X-ray images are learned separately. As the amount of data used in this paper is limited, we perform data augmentation by using flipped images of the left view for training the right view, and vice versa. As a result, we can double the number of images for generating the model, as described in
Section 3.3.
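The flip-based augmentation can be sketched as follows. This is a minimal example that assumes each image is a 2-D NumPy array and that a flipped image keeps the label of the image it was flipped from; the function name `augment_with_flipped` and the toy arrays are hypothetical.

```python
import numpy as np

def augment_with_flipped(own_images, own_labels, other_images, other_labels):
    """Extend one view's training set with horizontally flipped images of the other view.

    Assumption: the label travels with the flipped image unchanged.
    """
    flipped = [np.fliplr(img) for img in other_images]
    return list(own_images) + flipped, list(own_labels) + list(other_labels)

# Toy data: one right-view image and one left-view image with their labels.
right_images = [np.array([[1, 2], [3, 4]])]
right_labels = [0]
left_images = [np.array([[5, 6], [7, 8]])]
left_labels = [1]

train_images, train_labels = augment_with_flipped(
    right_images, right_labels, left_images, left_labels
)
```

Calling the same function with the arguments swapped would build the training set for the left view.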
Fourth, because CNN-based models tend to be sensitive to the input, we applied a Spatial Transformer Network (STN) to align the slope of the cervical spine, which differs from person to person, to a similar slope. In
Section 3.4, we show the effectiveness of STN in increasing the accuracy of the model for diagnosing foraminal stenosis.
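To make the alignment idea concrete, the following NumPy sketch shows the two fixed components of a spatial transformer, the affine grid generator and the bilinear sampler. In an actual STN, the 2 x 3 matrix `theta` is predicted by a trainable localization network; here a fixed rotation stands in for that prediction, and all function names are assumptions made for this example.

```python
import numpy as np

def rotation_theta(angle_deg):
    """2 x 3 affine matrix for a rotation about the image centre (stand-in for a learned theta)."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0]])

def affine_grid(theta, height, width):
    """For every output pixel, compute the source (x, y) position in normalized [-1, 1] coordinates."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # shape (H, W, 3)
    return coords @ theta.T                                  # shape (H, W, 2)

def bilinear_sample(image, grid):
    """Sample a 2-D image at the grid positions with bilinear interpolation (zeros outside)."""
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates.
    x = (grid[..., 0] + 1) * (w - 1) / 2
    y = (grid[..., 1] + 1) * (h - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def at(yy, xx):
        # Read pixel values, returning 0 for positions outside the image.
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        values = image[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return np.where(inside, values, 0.0)

    return ((1 - wx) * (1 - wy) * at(y0, x0) + wx * (1 - wy) * at(y0, x1)
            + (1 - wx) * wy * at(y1, x0) + wx * wy * at(y1, x1))

image = np.arange(25, dtype=float).reshape(5, 5)
# Hypothetical correction angle; a trained localization network would supply theta instead.
aligned = bilinear_sample(image, affine_grid(rotation_theta(10.0), 5, 5))
```

Because both the grid generator and the sampler are differentiable with respect to `theta`, the transformation can be learned jointly with the classifier, which is what allows an STN to bring differently tilted spines to a similar orientation.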
Finally, in
Section 3.5, this paper proposes a novel foraminal stenosis classification model based on ResNet50 [
14], which uses cervical spine X-rays and makes effective use of a small amount of medical data. Transfer learning [
15] was performed using a pre-trained model to make the most of the small amount of data, and fine-tuning was applied to reflect the characteristics of the medical data domain in the parameters of the pre-trained model.
Table 1.
Overview of studies using deep learning for medical images, especially X-ray images or spine data.
Reference | Task | Method | Data (modality) | Metric |
---|---|---|---|---|
Jamaludin et al., 2017 [16] | Spinal stenosis diagnosis | 3D CNN | 12,018 MRI images of 2009 subjects | Accuracy (Acc) |
Won et al., 2020 [17] | Spinal stenosis diagnosis | R-CNN, RPN, ResNet50, VGG | 12,018 MRI images | Acc, F1 score |
Dong et al., 2018 [18] | Chest organ segmentation | ResNet18 | Chest X-ray images | IoU |
Saiz et al., 2020 [19] | Lung detection | VGG16, Fast R-CNN | 987 Chest X-ray images | Acc, Specificity, Sensitivity |
Brunese et al., 2020 [20] | COVID-19 classification | Variant VGG16, Grad-CAM | 6523 Chest X-ray images | Acc, F-measure, Specificity, Sensitivity |
Al-Kafri et al., 2019 [21] | Spinal stenosis diagnosis | SegNet, DeepLab, RefineNet, VGG16 | 48,345 MRI images of 515 subjects | IoU, Acc, BF-score |
Fan et al., 2020 [22] | Spinal stenosis diagnosis, semantic segmentation | 3D U-Net | 1681 CT images of 31 subjects | DC |
Gaonkar et al., 2019 [23] | Spinal stenosis diagnosis, disc segmentation | Deep U-Net | MRI images of 1755 subjects | Dice score, Hausdorff distance, and average surface distance |
Bharati et al., 2021 [24] | COVID-19 classification | CO-ResNet: ResNet101, ResNet50, ResNet152 | 5935 Chest X-ray images | Acc, AUC, F1-score, Precision, Recall, Sensitivity |
Nayak et al., 2021 [25] | COVID-19 detection | ResNet34, ResNet50, GoogleNet, VGG16, AlexNet, MobileNetV2, InceptionV3, SqueezeNet | 406 Chest X-ray images | Acc, AUC, F1-score, Precision, Specificity, Sensitivity |
Chen et al., 2022 [26] | Scoliosis diagnosis | Faster R-CNN, ResNet50, LBP, SVM | 3600 Spine X-ray images | AUC, Precision, Specificity, Sensitivity |
2. Related Work
With the advent of CNNs, the image classification field has developed rapidly. Starting with AlexNet [
27], the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) error decreased remarkably. Models that have performed well in the ILSVRC, such as VGGNet (VGG) [
28] and ResNet [
14], are still widely used as pre-trained models. These models are used not only for datasets similar to ImageNet but also for medical data. As presented in
Table 1, the VGG model has been widely used in COVID-19-related models that use chest X-ray images after the outbreak of COVID-19 [
19,
20,
24,
25]. In addition, the ResNet model has been applied to various types of medical images [
17,
29,
30] including X-ray images [
18,
24,
25,
26].
According to
Table 1 and Qu et al. [
31], a paper on current development and prospects of deep learning in spine images, most of the spine-related studies [
16,
17,
21,
22,
23] so far have proposed models using MRI or CT images. Furthermore, these studies considered the lumbar spine, not the cervical spine. Among these studies, VGG and ResNet were used in Won et al. [
17]. Other spine-related studies [
21,
22,
23] addressed segmentation rather than classification. Therefore, a U-Net-based model was used in these studies [
21,
22,
23]. SpineGEM [
6] is similar to this study in that the proposed model classifies spine diseases based on the VGG-M model. However, SpineGEM used MRI images, not X-ray images, to classify diseases, and they classified the diseases of the lumbar spine, not the cervical spine. Another previous study related to spine diseases [
8] also detects foraminal stenosis, as this study does, but uses MRI and targets the lumbar spine, not the cervical spine. In another previous study [
9], an MRI image of the lumbar spine was used, and the accuracy of the lumbar spine disc state classification was increased to
by applying the ROI method and fine-tuning several pre-trained models. Most of the X-ray-related studies [
18,
19,
20,
24,
25] in
Table 1 are studies using chest X-rays and ResNet-based models. In [
32], spine X-ray images were used to detect scoliosis. The authors experimented with a variety of models, namely ResNet34, ResNet50, GoogleNet, VGG16, AlexNet, MobileNetV2, InceptionV3, and SqueezeNet, to obtain the scoliosis classification model. The best-performing model of [
32] is a ResNet-based model. In addition, the ResNet50 model used as the base has parameters trained on the ImageNet dataset. However, there is a difference between the ImageNet dataset and the X-ray dataset used in this study. In a study [
33] that performed gender detection using cervical spine X-rays, the model was trained by fine-tuning a pre-trained model. This study likewise fine-tunes the ResNet50 model to adapt its parameters to the X-ray dataset. Therefore, this paper proposes a novel ResNet50-based model with transfer learning and fine-tuning using a pre-trained model’s parameters, because only a small amount of training data is available.
YOLOv5 [
10] achieves high accuracy in object detection, and many studies [
33,
34,
35] apply YOLOv5 for object detection. Consequently, this paper employed YOLOv5 to crop the ROI from cervical spine X-ray images and thereby remove regions that are unnecessary for training the proposed model.
A study on X-ray image preprocessing [
11] suggests that the Histogram Equalization-applied dataset’s accuracy was
higher than that of the dataset without Histogram Equalization. For this reason, this paper adopts Histogram Equalization as a preprocessing method to improve the performance of the proposed model.
Thus, transfer learning, fine-tuning, and ROI methods were applied using the pre-trained model to increase the accuracy of our proposed model.
5. Discussion
This paper proposes a novel model to classify the presence or absence of foraminal stenosis, a diagnostic component of cervical radiculopathy, using X-ray images. It also suggests effective methods for preprocessing and augmentation to overcome the challenges arising from the limited number of X-ray images available for training. The accuracy of the best-performing model is approximately
. In addition, fine-tuning and transfer learning are suitable when a pre-trained model is applied to a distinctive domain such as medical data. We demonstrate that Histogram Equalization (HE) and STN are the most effective preprocessing methods for X-ray images, as summarized in
Table 5. HE increases the contrast between bone and non-bone parts in X-ray images, which improves the performance of the model. STN learns the spatial features of the slope of the cervical spine and aligns this slope, which varies from patient to patient, thereby reducing the geometric variation in the input to the CNN-based model and improving its performance. We also suggest that Flip is a suitable method for overcoming the lack of cervical spine X-ray data in this study. Flip is an augmentation method tailored to the characteristics of cervical spine X-ray data, as shown in
Table 5. The proposed model can serve as a reference in the clinical judgment process for cervical root disease, including the decision on whether to perform MRI or a diagnostic root block. In addition, we expect that the proposed model can help reduce the cost of expensive examinations by detecting foraminal stenosis using X-ray images only rather than MRIs or CTs. It is expected that the proposed model, as an auxiliary tool, will help experts diagnose foraminal stenosis more consistently. While it is difficult for a physician to diagnose foraminal stenosis based on oblique radiographs alone, a deep learning model may detect features that are not easily recognized by the human eye. Recent clinical studies [
39] have also suggested that deep learning models could give feedback to physicians regarding radiograph interpretation and that clinicians could learn from deep learning models. As its accuracy increases, the proposed model can be expected to find further uses, such as automating the diagnosis of foraminal stenosis. However, a major limitation of this study is the limited amount of labeled cervical spine X-ray data. Therefore, further research is needed to overcome this limitation and to improve the performance of the proposed model by applying a more effective attention module [
40] in the future. While we applied Flip to double the amount of data and STN to align the slope of the cervical spine, future research can use other augmentation methods that take various angles of the cervical spine into account to further enlarge the dataset. We then plan to compare which method is more suitable for the cervical spine X-ray image dataset in terms of improving the performance of the foraminal stenosis classification model. Furthermore, the self-supervised learning method [
32,
41,
42,
43,
44] can be applied to unlabeled data to increase the amount of usable data. We also plan to apply contrastive learning [
45] to improve the metrics of the proposed model in order to strengthen the classification of the normal class.