2. Related Work
In recent literature, numerous researchers have investigated disease diagnostics using deep learning and machine learning to detect knee OA according to the Kellgren and Lawrence (KL) grading system. Yang et al. [9] developed an automated KOA diagnostic model, RefineDet, that can be deployed on a mobile device. This model consists of two connected modules: the Anchor Refinement Module (ARM), used for region of interest (ROI) localization, and the Object Detection Module (ODM), used for KOA classification. A Transfer Connection Block (TCB) was introduced to share information between the two modules. The model was trained and validated on 2579 X-ray images of the posterior-anterior (PA) view of knees collected from the General Hospital of the People's Liberation Army in China; in this approach, the images were captured using an iPhone held at a distance of 40 cm. The network was trained on 2499 images, validated on 263 images, and tested on 941 images, and the method achieved an accuracy of 95.7%.
Dalia et al. [10] aimed to detect knee osteoarthritis from radiographic images using VGG16, ResNet-152, and DenseNet-201. The proposed models were developed and compared using 8892 radiographic knee images from the OAI dataset, with the ROI detected using YOLOv5. The best results were obtained with transfer learning of VGG16, adding two fully connected (FC) layers with 4096 units each, a third FC layer with 1000 units, and a softmax activation function. The VGG16 model achieved an accuracy of 69.8%.
Wahyuningrum et al. [11] developed an end-to-end supervised Deep Convolutional Neural Network (DCNN) to classify knee OA using three-fold cross-validation. The proposed model consists of five convolution blocks, each including a Rectified Linear Unit (ReLU) layer, a max-pooling layer, a flatten layer, a fully connected layer, and a dropout layer. The model achieved an average accuracy of 77.24% when tested on the three folds containing 527, 501, and 528 images.
Yong et al. [12] proposed an Ordinal Regression Module (ORM) to classify knee OA using VGG, ResNet, ResNeXt, GoogLeNet, DenseNet, and MobileNet. The proposed ORM splits the probability space of the scalar output into K classes using K−1 cut-points to perform ordinal regression. The neural networks were optimized using a Cumulative Link (CL) loss function and trained on 4130 X-ray images from the OAI dataset. The best result was obtained with DenseNet-161, with an accuracy of 88.09%.
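The cut-point idea above can be illustrated with a minimal numpy sketch of a cumulative-link head: a single scalar output is compared against K−1 ordered thresholds, and adjacent cumulative probabilities are differenced to obtain K class probabilities. The function name, the sigmoid link, and the example cut-point values below are illustrative assumptions, not the cited authors' exact implementation.

```python
import numpy as np

def ordinal_probs(z, cutpoints):
    """Split a scalar network output z into K class probabilities
    using K-1 ordered cut-points (cumulative-link with a sigmoid)."""
    cutpoints = np.sort(np.asarray(cutpoints, dtype=float))
    # P(y <= k) = sigmoid(theta_k - z) for each cut-point theta_k
    cum = 1.0 / (1.0 + np.exp(-(cutpoints - z)))
    cum = np.concatenate(([0.0], cum, [1.0]))  # anchor the two ends
    return np.diff(cum)                        # P(y = k) = P(y<=k) - P(y<=k-1)

# Example: five KL grades (K = 5) need four cut-points.
probs = ordinal_probs(z=1.2, cutpoints=[-1.5, -0.5, 0.5, 1.5])
```

Because the cut-points are ordered, a larger scalar output monotonically shifts probability mass toward higher grades, which is exactly what makes the formulation ordinal rather than nominal.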
Ruikar et al. [13] developed an automated computer-aided diagnosis system, the Osteoarthritis network (OACnet), based on a deep neural network for reliable detection of knee OA. The model was trained on 4746 radiological images acquired from the OAI dataset. It was built from scratch and combined with hand-crafted features (joint space narrowing, bone spurs, sclerosis, and deformation). The model achieved an accuracy of 83.74%, improved to 92.7% when combined with the hand-crafted features.
Wahyuningrum et al. [14] developed a CNN-based system using ResNet, VGGNet, and DenseNet to diagnose knee OA. The proposed system was implemented with a Long Short-Term Memory (LSTM) network, a special kind of recurrent neural network that can memorize information and store it in complex network elements over a long period. The system was trained and validated using 5148 X-ray images obtained from the OAI dataset. The LSTM combined with VGG16 achieved the highest accuracy of 75.28%.
Both Chen et al. [15] and Wani and Saini [16] proposed a novel adjustable ordinal loss in place of the cross-entropy loss for the detection of knee OA using VGG, ResNet, DenseNet, and InceptionV3. To develop and compare the proposed models, 8260 knee X-ray images were used in [15] and 1656 X-ray images in [16], all collected from the OAI dataset. The VGG19 model with the proposed ordinal loss obtained the highest knee severity grading accuracy of 70.4% in [15] and 96.7% in [16]. The ordinal loss-based approach was also used by Jain et al. [17] to develop an automated method for detecting knee osteoarthritis from X-ray images, named High-Resolution Network (HRNet), combined with a Convolutional Block Attention Module (CBAM). HRNet is a multi-resolution deep CNN consisting of a 2D convolution layer followed by layers that gradually add high-to-low resolution streams and then merge the multi-resolution representations in parallel for information exchange. The model was built on 8260 knee X-ray images from the OAI dataset and achieved an accuracy of 71.74% with a mean absolute error (MAE) of 0.311.
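The intuition behind an ordinal loss, as opposed to plain cross-entropy, is that predicting KL-4 for a KL-0 knee should cost more than predicting KL-1. The sketch below is one common distance-aware formulation in numpy; the function name, the additive penalty term, and the `alpha` knob are assumptions standing in for the adjustable weights of the cited papers, not their exact loss.

```python
import numpy as np

def ordinal_loss(probs, true_grade, alpha=1.0):
    """Distance-aware loss: cross-entropy on the true KL grade plus a
    penalty proportional to probability mass placed on distant grades.
    `alpha` controls how fast the penalty grows with grade distance."""
    probs = np.asarray(probs, dtype=float)
    dist = np.abs(np.arange(len(probs)) - true_grade)
    ce = -np.log(probs[true_grade] + 1e-12)   # standard cross-entropy term
    penalty = alpha * np.sum(dist * probs)    # mass far from the truth costs more
    return float(ce + penalty)
```

With `alpha=0` the loss reduces to ordinary cross-entropy, so the ordinal behaviour is a tunable add-on rather than a replacement of the classification objective.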
Yunus et al. [18] applied an approach based on Darknet-53 and AlexNet combined with the local binary pattern (LBP) to extract deep features and identify knee OA severity from radiological images. The final classification was performed with a support vector machine (SVM) and K-nearest neighbors (KNN). The classified images were then localized using a combination of YOLOv2 and an Open Neural Network Exchange (ONNX) model built with 24 layers: (i) an input layer, (ii) two element-wise affine layers, (iii) four convolutional layers, (iv) four batch normalization (BN) layers, (v) three max-pooling layers, and (vi) four activation layers, while YOLOv2 was built using three convolutional layers, two BN layers, and two ReLU layers. This approach was developed using 3795 X-ray images from the public OAI dataset, and the model achieved an accuracy of 90.6%.
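The LBP descriptor mentioned above encodes local texture by thresholding each pixel's 8 neighbours against the centre pixel and packing the comparisons into an 8-bit code. A minimal numpy version for the basic 3×3 case is sketched below; the neighbour ordering is an arbitrary choice, and production work would typically use a library implementation with rotation-invariant variants.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour local binary pattern on a 2-D grayscale array.
    Each interior pixel receives an 8-bit code: one bit per neighbour
    whose intensity is >= the centre pixel's intensity."""
    img = np.asarray(img, dtype=float)
    c = img[1:-1, 1:-1]  # centre pixels (borders have no full neighbourhood)
    # clockwise neighbour offsets starting at the top-left corner
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```

A histogram of these codes over an image region then serves as the texture feature vector fed to a classifier such as the SVM or KNN used in [18].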
Hu et al. [19] developed a novel deep learning architecture, the Adversarial Evolving Neural Network (A-ENN), to model the longitudinal progression of knee OA severity over 4 years. The model was built on ResNet-18 with three classifiers: VGG19, ResNet50, and a Vision Transformer (ViT). It was trained and tested on 3294 labeled knee X-ray images from the OAI dataset. Combined with VGG19, the model achieved its best accuracies of 64.6%, 63.9%, 63.2%, 61.8%, and 60.2% for progression at baseline, 12 months, 24 months, 36 months, and 48 months, respectively.
Raisuddin et al. [20] proposed and evaluated a Deep Active Learning (DAL) framework designed to classify knee OA severity. The model was built as a Semi-Supervised Learning (SSL) deep Siamese network using VGG and a Consistency Regularization (CR) approach, which enforces the model's stability under input noise. The model was trained and validated using 8953 knee X-ray images from the OAI dataset. The developed DAL achieved a balanced accuracy of 64.13%.
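Consistency regularization can be sketched in a few lines: the model is penalized whenever its prediction for an input differs from its prediction for a noise-perturbed copy of the same input. The toy linear "model" and squared-error consistency term below are illustrative assumptions; the cited work applies the idea inside a deep Siamese network, not to a linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def consistency_loss(predict, x, noise_std=0.05):
    """Penalty for prediction drift between an input and a
    Gaussian-noise-perturbed copy of it (mean squared difference)."""
    p_clean = predict(x)
    p_noisy = predict(x + rng.normal(0.0, noise_std, size=x.shape))
    return float(np.mean((p_clean - p_noisy) ** 2))

# Toy stand-in for one Siamese branch: a fixed linear classifier.
W = rng.normal(size=(3, 8))
predict = lambda x: softmax(W @ x)
x = rng.normal(size=8)
loss = consistency_loss(predict, x)
```

During training this term is added to the supervised loss, so unlabeled images also contribute gradient signal, which is what makes the approach semi-supervised.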
Huu et al. [21] applied transfer learning of VGG16 for the automated binary classification of KOA severity using a deep Siamese convolutional neural network. The proposed model consists of six convolutional layers with a stride of 1, three convolutional layers with a stride of 2, three dropout layers, a Separable Adaptive Max-pooling (SAM) layer, and a fully connected layer. The model was built using 2874 X-ray images collected from the OAI dataset. The updated VGG16 model achieved an accuracy of 89%.
Yifan et al. [22] presented a knee OA classifier using transfer learning of ResNet34 and DenseNet121 combined with a novel learning scheme that splits the data into two categories based on reliability. The two models were developed using 8302 X-ray images from the OAI dataset and a hybrid loss function to handle the lower-reliability sets. DenseNet121 and ResNet34 achieved accuracies of 70.13% and 68.32%, respectively.
More recently, Alshamrani et al. [23] proposed transfer learning models based on sequential CNNs, VGG16, and ResNet-50 to distinguish normal from abnormal knees in X-ray images. The proposed models were trained using 3836 X-ray images collected from Kaggle. The best developed model was VGG16, which achieved a training accuracy of 99% and a testing accuracy of 92%.
Mohammed et al. [24] proposed both binary and multiclass classification of KOA severity from radiographic images. This approach was built using six pre-trained DNN models: ResNet101, MobileNetV2, VGG16, VGG19, InceptionResNetV2, and DenseNet121. The designed models were trained and tested on 9786 knee images taken from the OAI. The best-performing model was the pre-trained ResNet101, with accuracies of 89% for three classes and 69% for five classes.
Pi et al. [25] presented an ensemble network based on DenseNet-161, EfficientNet-b5, EfficientNet-V2-s, RegNet-Y-8GF, ResNet-101, ResNeXt-50-32×4d, Wide-ResNet-50-2, and ShuffleNet-V2-×2-0. The proposed method was implemented using 8260 images from the Osteoarthritis Initiative, with image sizes optimized for training the various deep learning models. The optimized ensemble network achieved an accuracy of 76.93%.
Table 1 summarizes the binary and multiclass classification techniques utilized in recent publications based on internal validation.
5. Discussion
In this work, an automated approach to classify the severity of knee osteoarthritis from plain radiographic images according to the KL grade is presented and implemented in a desktop application using Tkinter and a DL model based on Xception. The pre-trained Xception was chosen for its superior performance on the OAI dataset compared with other models evaluated during software development, such as EfficientNetV2M (93.15%) and MobileNetV2 (75.60%).
Table 4 gives the multiclass classification values in comparison with similar works. The developed method achieved a validation multiclass accuracy of 99.39% and a test accuracy of 97.20%, a high performance compared with [11,12,14]. To validate our work, we tested the model on the Medical Expert database and a local database. The model achieved an accuracy of 95.20% on Medical Expert-I and 94.94% on Medical Expert-II; this minimal difference confirms the results obtained in [34]. To further validate the results, the model was tested on 30 images and compared with the diagnosis of a rheumatologist, with a radiologist consulted in case of disagreement. Out of the 30 images, 28 were correctly identified (93.3%). It is worth noting that when four images were presented twice to the same doctor, the latter failed to give the same diagnosis for two of them. However, our software's diagnosis coincided with the doctor's second diagnosis for one of these images, which was then revised. Furthermore, it is often challenging to differentiate between images of grades KL-0 and KL-1, which explains the discrepancy in identifying these two grades; however, there is little clinical benefit in distinguishing between KL grades 0 and 1. In binary classification into non-OA (KL < 2) and osteoarthritis (OA) (KL ≥ 2), the new software, MedKnee, tested on a local dataset, achieved an accuracy of 100%. Indeed, as shown in Table 5, although one knee was classified as doubtful (KL = 1) by the rheumatologist, the referee (radiologist) confirmed the software's diagnosis by validating that the knee in question had minimal osteoarthritis (KL = 2). In 2023, similar software, MediAI-OA [35], was developed using the NASNet DL model, but its accuracy is limited to 83%, significantly lower than the performance of our new software, MedKnee.
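The binary screening task discussed above reduces to a simple threshold on the predicted KL grade. The helper below is a hypothetical sketch of that mapping using the conventional KL ≥ 2 cut-off for definite OA; the function name and the string labels are illustrative, not part of the MedKnee implementation.

```python
def kl_to_binary(kl_grade):
    """Collapse the five KL grades into binary screening labels:
    KL < 2 -> non-OA, KL >= 2 -> OA (conventional definite-OA cut-off)."""
    if kl_grade not in (0, 1, 2, 3, 4):
        raise ValueError("KL grade must be an integer from 0 to 4")
    return "OA" if kl_grade >= 2 else "non-OA"
```

This also makes explicit why KL-0 versus KL-1 confusions do not affect the binary result: both grades collapse to the same non-OA label.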
However, several limitations must be noted regarding the proposed approach. First, the study of the OAI dataset did not include lateral radiographs; as noted by Ahmed and Mustapha [1], the addition of lateral radiographs would have provided additional information. Second, the radiographic images of knee osteoarthritis used to train the proposed model are pre-processed, consisting of PA radiographs with fixed flexion and identical resolution and size, whereas raw clinical images would first require pre-processing. The generalizability of the developed model to external databases is another major limitation. The proposed model is limited to pre-processed images with zooming, resizing, and region of interest (ROI) rearrangement. Since the model was developed only on the pre-processed images of the OAI dataset, its accuracy degrades in the absence of these operations. Nonetheless, the model could be generalized more broadly, without requiring significant pre-processing, if it were built using a combination of many external databases and local or other institutional datasets, including bilateral and unilateral images of varying sizes and resolutions. Furthermore, the developed application helps physicians identify knee osteoarthritis with acceptable accuracy using manual localization of the ROI; the addition of deep learning models such as YOLO or Faster R-CNN would be helpful for real-time detection of the ROI. Finally, implementing the proposed approach with enhanced computational resources, such as Nvidia GeForce GPUs, could further improve accuracy by increasing the number of training epochs and images. In the future, we expect to improve our application by using a large local dataset and introducing automatic ROI selection, with the option of adding image segmentation. Nevertheless, the developed software achieved a high level of accuracy and can help physicians predict the exact severity grade of knee OA from radiographic images.