1. Introduction
Images have become a vital channel for conveying information due to the rapid advancement of digital technology. This has led to an immersive user experience and an exponential accumulation of digital images in large data volumes. The quality of a digital image can decline at every stage of its life cycle: acquisition, compression, processing, and transmission to the storage point. This inevitable degradation undermines the immersive user experience, producing poor image quality and thus creating the need for an image quality assessment (IQA) approach [
1]. Moreover, the ability to robustly assess image quality directly affects various computer vision applications, including bio-medical imaging, self-driving vehicles, and object detection, to name a few [
2,
3].
Among the two quality assessment approaches, subjective and objective IQA, the latter is widely utilized, even though subjective IQA is a definitive and nearly error-free approach. The reliance on human evaluation in the subjective approach makes it a time-intensive and tedious task, thus giving priority to objective IQA, which automatically deduces image quality by deploying computational methods capable of estimating image quality in a manner consistent with the quality scores of human subjects [
4]. Objective IQA is sub-classed into full-reference IQA, reduced-reference IQA, and no-reference IQA, widely labeled as blind IQA (BIQA). The three types are distinguished by the availability of distortion-free versions of the images, termed pristine images. The reference or pristine image is unavailable during BIQA, making it a noteworthy assessment approach among the three objective IQA methods [
4,
5].
Moreover, BIQA is further classified into distortion-specific and general-purpose image quality assessment approaches. The latter approach is designed with the fact in mind that multiple distortions can occur in a digital image, making it superior to the distortion-specific approach. As the general-purpose approach is not limited to a specific type of image distortion, its ability to assess image quality relies heavily on obtaining features that capture the essential information of multiple image distortions [
6]. Conventional BIQA methods utilizing hand-crafted features are based on frameworks such as the human visual system (HVS) and natural scene statistics (NSS). However, because BIQA must closely match human perception, compiling hand-crafted features that can effectively capture and represent image quality degradation levels becomes a tedious task [
7].
Improper lighting, constraints on imaging lenses, and sensor limitations can induce distortions of a more complex nature, and such cases usually occur in real-world digital images. Traditional quality assessment methods have usually focused on artificially distorted images with distortions such as JPEG compression, Gaussian noise, and blur, to name a few [
8]. However, the bottleneck in these datasets is that they contain a limited quantity of images that are less diverse in terms of distortions. Furthermore, as multiple distortions may coexist in an image in real-world cases, such artificially distorted images do not emulate the complex nature of distortions that can occur in real-world images [
9].
Traditional BIQA models extract low-level features from distorted images and predict the image quality by applying machine learning regression models [
10,
11]. This way of applying NSS to represent image quality does not capture the local distortions in an image. The recent adoption of deep learning methods in IQA research prompted the use of convolutional neural networks (CNNs) to extract image features in order to highlight degradation regions in an image. A major issue is that deep-learning-based IQA methods are usually trained on smaller, artificially distorted image datasets [
12,
13]. Moreover, CNNs are also deployed to extract features of distorted images to replace the conventional approach of hand-crafted features for IQA research [
1]. CNN-based IQA methods should be capable of capturing two quantities: distortion and image visual content. This requirement arises from the fact that most CNN-based IQA methods are oriented to image visual content, which emulates the process of image recognition [
14]. The designed IQA method should be capable of generalization, as various distortions from a vast set of image distortions can coexist in an image. In addition, deep IQA methods require a large quantity of images to train a neural network, usually more than one million [
15].
In recent times, deep neural networks have shown great improvement in different research fields such as bioinformatics [
16,
17,
18], agriculture [
19,
20], and medical imaging [
21,
22]. The deep IQA methods utilize convolutional operations by deploying deep neural networks to extract features from distorted regions of the image [
1]. These IQA methods cannot yield accurate predictions on reality-based or authentic datasets, as such methods cannot learn the patterns and feature representations that occur in these distorted images. The distortions induced in artificial datasets include Gaussian blur, Gaussian noise, JPEG compression, etc. [
15]. In contrast, distortions in authentic datasets are generally caused by naturally occurring conditions. This can be observed by examining the gradient maps of both types of datasets. The high-frequency components evident in the gradient map are object-oriented in the case of a naturally distorted image, while such components are distributed across the whole image for artificially distorted images [
23].
A no-reference IQA approach that can handle both types of distorted images is an effective solution because it is more versatile in nature. Multiple such IQA approaches address this issue, but these methods still have various shortcomings, i.e., they rely on high-level semantic features to predict image quality. Furthermore, such approaches try to combine the effects of both types of distorted images by using a bilinear pooling operation, which is inadequate for capturing the relationship between the two [
24].
2. Datasets
The TID2013 dataset comprises 25 reference images and 3000 distorted images created with 24 different forms of distortion, such as JPEG compression, Gaussian blur, and white noise. The collection also contains subjective quality scores for each distorted image, gathered from human observers during a subjective experiment. The subjective quality scores range from 0 to 9, with 0 representing the poorest quality and 9 the best.
The CSIQ dataset contains 30 reference images and 866 distorted images created with six distinct forms of distortion, including JPEG2000 compression, JPEG compression, and Gaussian blur. The subjective quality scores for each distorted image were also collected from human observers in a subjective experiment; they range from 1 to 5, with 1 being the poorest quality and 5 the greatest.
The LIVE dataset contains 29 reference images and 779 distorted images made using five different forms of distortion, including JPEG compression, Gaussian blur, and white noise. The subjective quality scores for each distorted image were determined by human observers in a subjective experiment; they range from 0 to 100, with 0 being the lowest quality and 100 the greatest.
While these datasets have been frequently used to test the effectiveness of blind image quality evaluation algorithms in the literature, they do have certain drawbacks. For example, the distortion types contained in these datasets may not completely reflect the spectrum of real-world image distortions. Nonetheless, these datasets serve as a valuable standard for comparing the performance of various approaches in the field of blind image quality evaluation.
Figure 1 represents the sample images from the datasets.
3. Proposed Methodology
Figure 2 represents the overall framework of the Blind Image Quality Assessment system used in this research paper. The framework consists of three main stages: Feature Extraction, Feature Selection, and Quality Prediction.
In the Feature Extraction stage, the reference and distorted images are preprocessed using several well-known image processing techniques to extract relevant features. The extracted features include Haar wavelet coefficients, Prewitt and Sobel edge detection, and Log filters.
In the Feature Selection stage, the extracted features are passed through the WEKA machine learning tool, which selects the most relevant features using the Information Gain attribute algorithm, aiming to reduce the dimensionality of the feature map and improve the efficiency of the proposed model. The selected features are supplied to the Support Vector Machine (SVM) regression model in the Quality Prediction step to forecast the quality of the distorted images.
3.1. Feature Extraction
Feature extraction is an important step in Blind Image Quality Assessment because it allows vital details to be extracted from distorted images. Numerous well-known image processing algorithms were utilized in this study to extract characteristics from reference and distorted images. The images were processed using the following filters:
3.1.1. Wavelet Transform
The Wavelet Transform is a common image processing approach for extracting features. It divides the image into frequency sub-bands that capture varying levels of detail and texture. The Haar wavelet transform was utilized in this study to extract characteristics from reference and distorted images. The Haar wavelet transform is defined as

$$W_{l,m} = \sum_{n} I(n)\, h_{l,m}(n),$$

where $W_{l,m}$ denotes the wavelet coefficient at scale $l$ and position $m$, $h_{l,m}$ denotes the Haar wavelet filter, and $I(n)$ denotes the input image pixel value.
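As an illustration, a single-level 2D Haar decomposition can be sketched in Python with NumPy. This is a minimal sketch, not the paper's implementation: the averaging/differencing normalization, the function name, and the constant test image are all illustrative assumptions.

```python
import numpy as np

def haar_decompose(img):
    """Single-level 2D Haar decomposition into LL, LH, HL, HH sub-bands.

    `img` is a hypothetical grayscale image (2D array with even dimensions).
    """
    img = img.astype(float)
    # Pairwise average (low-pass) and difference (high-pass) along rows.
    lo_r = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi_r = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Repeat along columns to obtain the four sub-bands.
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return ll, lh, hl, hh

# A constant image has all of its detail (high-frequency) coefficients at zero.
flat = np.ones((4, 4))
ll, lh, hl, hh = haar_decompose(flat)
```

The detail sub-bands (LH, HL, HH) are the ones that react to texture and distortion, which is why wavelet coefficients are useful quality features.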
3.1.2. Prewitt and Gaussian
The Prewitt filter is an edge detection filter that enhances the image’s edges. In this study, the Prewitt filter was applied to the pristine and distorted images, followed by a Gaussian filter to minimize noise. The Prewitt filter can be defined by the following equations,

$$G_x = P_x * I, \qquad G_y = P_y * I, \qquad G = \sqrt{G_x^2 + G_y^2},$$

where $P_x$ and $P_y$ are the Prewitt filters for identifying vertical and horizontal edges, respectively, $*$ denotes convolution, and $I$ is the input image. The Gaussian filter is defined as follows,

$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right),$$

where $G_\sigma(x, y)$ denotes the Gaussian filter at location $(x, y)$, and $\sigma$ denotes the standard deviation of the filter.
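A minimal sketch of this filtering step, assuming SciPy's `ndimage` module as the filter implementation; the function name, the `sigma` value, and the toy step-edge image are illustrative, not the paper's code.

```python
import numpy as np
from scipy import ndimage

def prewitt_gaussian_features(img, sigma=1.0):
    """Prewitt edge magnitude followed by Gaussian smoothing.

    `img` is a hypothetical grayscale image array; `sigma` is the standard
    deviation of the Gaussian used to suppress noise.
    """
    gx = ndimage.prewitt(img.astype(float), axis=1)  # vertical edges
    gy = ndimage.prewitt(img.astype(float), axis=0)  # horizontal edges
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return ndimage.gaussian_filter(magnitude, sigma=sigma)

# A vertical step edge produces a nonzero response; a flat region produces none.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
response = prewitt_gaussian_features(step)
```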
3.1.3. Log and Gaussian
The Log filter is used to identify edges and minimize image noise. In this study, the reference and distorted images were subjected to a Log filter, which was then followed by a Gaussian filter to minimize noise. The Log filter may be defined in the following way,

$$\mathrm{LoG}_\sigma(x, y) = -\frac{1}{\pi\sigma^4}\left(1 - \frac{x^2 + y^2}{2\sigma^2}\right)\exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right),$$

where $\mathrm{LoG}_\sigma(x, y)$ denotes the Log filter at location $(x, y)$, and $\sigma$ denotes the standard deviation of the filter.
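A short sketch of this step, under the assumption that the paper's "Log filter" is the standard Laplacian of Gaussian and that SciPy's `gaussian_laplace` is an acceptable stand-in; the `sigma` value and impulse test image are illustrative.

```python
import numpy as np
from scipy import ndimage

def log_features(img, sigma=1.5):
    """Laplacian-of-Gaussian (LoG) response, then Gaussian smoothing.

    `img` is a hypothetical grayscale image; `sigma` is illustrative.
    """
    log = ndimage.gaussian_laplace(img.astype(float), sigma=sigma)
    return ndimage.gaussian_filter(np.abs(log), sigma=sigma)

# The LoG response of a flat image is zero; an impulse yields a response.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
resp = log_features(impulse)
```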
3.1.4. Prewitt, Sobel, and Gaussian
Another edge detection filter that enhances image edges is the Sobel filter. In this study, Prewitt and Sobel filters were applied to the reference and distorted images, followed by a Gaussian filter to remove noise.
3.2. Feature Selection
The proposed Blind Image Quality Assessment system relies on feature selection to reduce the complexity of the feature vector and increase the model’s performance. In this study, the WEKA machine learning tool is utilized to select features.
The Information Gain attribute feature selector ranks attributes based on their capacity to convey details regarding the overall quality of the distorted images. By employing a feature as a split attribute, the evaluator calculates the reduction in entropy of the quality scores. The Information Gain score $IG(Q, f)$ is derived using the following equation for feature set $F$ and quality scores $Q$:

$$IG(Q, f) = H(Q) - \sum_{v \in \mathrm{values}(f)} \frac{|Q_v|}{|Q|}\, H(Q_v)$$

In this equation, $H(Q)$ denotes the entropy of the quality scores, $f \in F$ denotes an attribute, and $v$ denotes a value that the feature $f$ can take. $|Q_v|$ is the number of images in $Q$ for which the feature $f$ takes the value $v$, and $H(Q_v)$ is the entropy of the quality ratings of those images.
The Information Gain metric represents the quantity of information obtained regarding image quality by incorporating a feature into the model. The greater the Information Gain score, the more essential the attribute is for quality forecasting, thus increasing its position in the process of choosing the feature.
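As a concrete sketch of this computation (using hypothetical, already-discretized feature and quality-bin vectors, not WEKA's actual internals):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, quality):
    """IG(Q, f) = H(Q) - sum_v |Q_v|/|Q| * H(Q_v).

    `feature` holds discretized feature values and `quality` holds
    discretized quality bins; both are illustrative stand-ins.
    """
    total = entropy(quality)
    feature = np.asarray(feature)
    quality = np.asarray(quality)
    cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        cond += mask.mean() * entropy(quality[mask])  # weighted child entropy
    return total - cond

# A feature that perfectly separates the quality bins earns IG = H(Q) = 1 bit.
ig = information_gain([0, 0, 1, 1], [0, 0, 1, 1])
```

A feature statistically independent of the quality bins scores an Information Gain of zero, which is why low-ranked attributes can be dropped with little loss.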
4. Quality Prediction Stage
In the final assessment stage, the selected characteristics are used to construct a regression model that is capable of predicting the score for the quality of the distorted images. Due to its capacity to handle feature vectors with high dimensionality and nonlinear correlations between features and quality ratings, Support Vector Machine (SVM) regression is employed as the prediction model in this research work. We extract numerous image characteristics from the proposed system, such as wavelet coefficients, edge detection findings, and filter responses. These characteristics contribute to a multidimensional feature space. The capacity of SVM to handle such complicated and high-dimensional data makes it ideal for our needs. The strength of SVM is its capacity to deal with non-linear connections between characteristics and target variables. The connection between image attributes and quality scores in image quality evaluation can be non-linear and complicated. The kernel-based trick of SVM allows it to implicitly translate characteristics to a higher-dimensional space, allowing for the successful modeling of non-linear connections. Moreover, SVM offers high generalization characteristics and is less susceptible to overfitting, both of which are important when constructing a robust and trustworthy BIQA model.
The SVM regression model establishes the relationship between the set of features $F$ and the quality ratings $Q$. A set of training data $D$ of $N$ labeled image samples is used to train the regression function $f(p)$, which is defined as

$$f(p) = \sum_{i=1}^{N} \alpha_i y_i K(p_i, p) + b$$

In this formula, $p$ is an image sample expressed as a feature vector, $\alpha_i$ and $y_i$ are the training samples’ Lagrange multipliers and labels, respectively, and $b$ is the bias term. The kernel function $K(p_i, p)$ maps the feature vector $p$ to a higher-dimensional feature space, allowing for nonlinear decision boundaries. The radial basis function (RBF) kernel is utilized in this research paper,

$$K(p_i, p) = \exp\!\left(-\gamma \left\| p_i - p \right\|^2\right),$$

where $\gamma$ denotes the kernel parameter.
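The regression function and RBF kernel above can be evaluated directly. The sketch below uses made-up support vectors, multipliers, labels, and bias purely to demonstrate the formula, not coefficients learnt on any real dataset:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_predict(p, support_vectors, alphas, labels, bias, gamma=0.5):
    """f(p) = sum_i alpha_i * y_i * K(p_i, p) + b, as in the formula above."""
    return sum(a * y * rbf_kernel(sv, p, gamma)
               for a, y, sv in zip(alphas, labels, support_vectors)) + bias

# Toy model with two support vectors and hand-picked coefficients.
svs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
score = svm_predict(np.array([0.0, 0.0]), svs,
                    alphas=[1.0, 1.0], labels=[1.0, -1.0], bias=0.0)
```

Note that $K(p, p) = 1$ for any $p$, so each support vector contributes most strongly to predictions near itself, which is what lets the RBF kernel model non-linear feature-to-quality relationships.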
The following optimization problem is solved to learn the regression coefficients $\alpha_i$ and the bias term $b$:

$$\min_{w,\, b}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^*\right)$$

$C$ is a hyperparameter in this optimization problem that determines the tradeoff between maximizing the margin and reducing the training error, and $\xi_i$, $\xi_i^*$ are slack variables measuring the training error on the $i$th sample. The optimization problem is tackled iteratively using Sequential Minimal Optimization (SMO). Once trained on the training set, the SVM regression model can be used to predict the quality scores of fresh, unseen image samples from their associated feature vectors via the learnt regression function $f(p)$.
5. Evaluation Metrics
Several metrics are used to compare predicted quality scores with ground-truth quality scores to evaluate the performance of BIQA models. These metrics assess the relationship between predicted and actual quality ratings. To evaluate the proposed framework, we have used four of the most commonly used evaluation metrics, which are discussed below.
5.1. Spearman Rank-Order Correlation Coefficient (SROCC)
The monotonic relationship between two ranked variables is measured by SROCC. It is used in BIQA to assess the relationship between predicted and actual quality ratings. The equation of SROCC is defined as,

$$\mathrm{SROCC} = 1 - \frac{6 \sum_{l=1}^{m} d_l^2}{m\left(m^2 - 1\right)},$$

where $m$ is the total number of images in the dataset, $r_l$ is the predicted quality rating rank of the $l$th image, $s_l$ is the ground-truth rating rank of the $l$th image, and $d_l = r_l - s_l$. SROCC has a value between $-1$ and 1, with 1 indicating an ideal positive association, 0 indicating no correlation, and $-1$ indicating a perfect negative association.
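The formula can be implemented in a few lines; this sketch assumes no tied scores, and the score arrays are illustrative.

```python
import numpy as np

def srocc(pred, truth):
    """SROCC = 1 - 6 * sum(d^2) / (m * (m^2 - 1)), d = rank differences."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    m = len(pred)
    # argsort of argsort converts scores to 0-based ranks (no ties assumed).
    r = np.argsort(np.argsort(pred))
    s = np.argsort(np.argsort(truth))
    d = r - s
    return 1 - 6 * np.sum(d ** 2) / (m * (m ** 2 - 1))

# A perfectly monotonic prediction gives SROCC = 1; a reversed one gives -1.
s_pos = srocc([0.1, 0.4, 0.5, 0.9], [10, 20, 30, 40])
s_neg = srocc([0.9, 0.5, 0.4, 0.1], [10, 20, 30, 40])
```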
5.2. Linear Correlation Coefficient (LCC)
The linear relationship between two variables is measured by LCC. It is used in BIQA to assess the relationship between predicted and actual quality ratings. LCC is defined as follows,

$$\mathrm{LCC} = \frac{\sum_{l=1}^{m} \left(p_l - \bar{p}\right)\left(q_l - \bar{q}\right)}{\sqrt{\sum_{l=1}^{m} \left(p_l - \bar{p}\right)^2}\, \sqrt{\sum_{l=1}^{m} \left(q_l - \bar{q}\right)^2}},$$

where $p_l$ and $q_l$ are the predicted and true quality ratings of the $l$th image, and $\bar{p}$ and $\bar{q}$ are the averages of the predicted and ground-truth quality scores. LCC has a value between $-1$ and 1, with 1 indicating perfect positive correlation, 0 indicating no correlation, and $-1$ indicating a perfect negative correlation.
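A direct NumPy transcription of the LCC formula (the score arrays are illustrative):

```python
import numpy as np

def lcc(pred, truth):
    """Pearson linear correlation between predicted and true quality scores."""
    p, q = np.asarray(pred, float), np.asarray(truth, float)
    pc, qc = p - p.mean(), q - q.mean()  # mean-centered scores
    return np.sum(pc * qc) / np.sqrt(np.sum(pc ** 2) * np.sum(qc ** 2))

# A perfect linear relationship yields LCC = 1.
val = lcc([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```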
5.3. Kendall Rank-Order Correlation Coefficient (KROCC)
KROCC assesses the ordinal relationship between two variables. It is used in BIQA to assess the relationship between predicted and actual quality ratings. The equation of KROCC is defined as,

$$\mathrm{KROCC} = \frac{n_c - n_d}{\tfrac{1}{2} m \left(m - 1\right)},$$

where $m$ is the total number of images within the database, and $n_c$ and $n_d$ are the numbers of concordant and discordant image pairs, respectively. A pair of images is considered concordant if their relative orderings in the predicted and actual scores agree, and discordant otherwise. KROCC has a value between $-1$ and 1, with 1 indicating perfect positive correlation, 0 indicating no correlation, and $-1$ indicating a perfect negative correlation.
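The pair-counting definition translates directly into code; this sketch ignores tie corrections, and the score lists are illustrative.

```python
from itertools import combinations

def krocc(pred, truth):
    """KROCC = (n_c - n_d) / (m * (m - 1) / 2) over all image pairs."""
    pred, truth = list(pred), list(truth)
    m = len(pred)
    nc = nd = 0
    for i, j in combinations(range(m), 2):
        s = (pred[i] - pred[j]) * (truth[i] - truth[j])
        if s > 0:
            nc += 1  # concordant: same ordering in both rankings
        elif s < 0:
            nd += 1  # discordant: opposite ordering
    return (nc - nd) / (m * (m - 1) / 2)

tau = krocc([0.2, 0.5, 0.7], [1, 2, 3])  # fully concordant example
```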
5.4. Root Mean Squared Error (RMSE)
The RMSE gauges the difference between the predicted and actual quality ratings. The RMSE is defined as,

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{l=1}^{M} \left(p_l - q_l\right)^2},$$

where $M$ is the total number of test images, $p_l$ is the predicted quality rating for image $l$, and $q_l$ is the actual quality rating for image $l$. The smaller the RMSE, the better the model’s performance.
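For completeness, the RMSE formula in NumPy (with illustrative score lists):

```python
import numpy as np

def rmse(pred, truth):
    """RMSE = sqrt(mean((p_l - q_l)^2)) over the M test images."""
    p, q = np.asarray(pred, float), np.asarray(truth, float)
    return np.sqrt(np.mean((p - q) ** 2))

err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```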
6. Results and Discussion
In this section, a comparative analysis is carried out between different features and between existing and proposed methods.
6.1. Performance Comparison of Different Features
The performance of the proposed Blind Image Quality Assessment (BIQA) methodology on three frequently used benchmark datasets is shown in
Table 1. The table displays the results produced by employing various image attributes in the suggested technique. The proposed model employed four types of image features in this study: Haar wavelet coefficients, Prewitt and Sobel edge detection, Log filters, and Prewitt, Sobel, and Gaussian filter combinations.
Table 1 also includes the results of using the Information Gain feature selection algorithm with the WEKA tool on the Prewitt and Gaussian (WEKA PG) and Prewitt, Sobel, and Gaussian (WEKA PSG) features, which achieved a performance improvement over the non-selected features.
6.2. Feature Analysis
Figure 3 depicts the Prewitt and Gaussian (PG) filter feature vectors for four distinct forms of distortion from the CSIQ dataset: JPEG compression, white noise, Gaussian blur, and JPEG 2000 compression. Each subfigure depicts the PG feature vector distribution for a distinct type of distortion.
As seen in the figure, each form of distortion has a distinct feature vector pattern. In the first subfigure, for example, the feature vector of JPEG compression shows a different pattern when compared with the other forms of distortion. This shows that the proposed technique is capable of collecting and learning the unique characteristics of each form of distortion.
Furthermore, the results show that by extracting and selecting relevant features with the Information Gain algorithm and then training the SVM regression model, the proposed framework can accurately predict the quality of distorted images. On widely used benchmark datasets such as TID2013, CSIQ, and LIVE, the approach outperforms state-of-the-art methods in terms of accuracy and robustness.
6.3. Comparison with Existing Techniques
Table 2 compares the performance of the proposed Blind Image Quality Assessment (BIQA) model to that of various leading models, including FRIQUEE, NFERM, BRISQUE, BLIINDS-II, CORNIA, and DIIVINE, in terms of LCC, SROCC, KROCC, and RMSE on all three benchmark datasets (CSIQ, LIVE, and TID2013). On all three datasets, the proposed model outperformed all other models on all evaluation metrics. On the TID2013 dataset, for example, the proposed model achieved an LCC of 0.9453, an SROCC of 0.9289, a KROCC of 0.7804, and an RMSE of 0.406, all of which are significantly better than the best results reported by the other models. Similarly, on the LIVE and CSIQ datasets, the proposed model consistently outperformed the other models across all assessment measures.
Several factors contribute to the superior performance of the proposed BIQA model. For starters, combining multiple image features such as wavelet transform, Prewitt and Sobel edge detection, and Log filters, as well as selecting them using the Information Gain attribute evaluator, allows for the capture of a wide range of relevant image attributes that are important for quality assessment. This enables the algorithm to successfully discriminate between different forms of image distortion and reliably estimate their quality.
Second, using SVM regression as the underlying machine learning algorithm has several advantages, including the ability to handle high-dimensional feature spaces, non-linear relationships between features and quality scores, and the ability to generalize well to new, previously unseen data.
Furthermore, extensive experimental evaluation and analysis of various types of features and feature selection techniques aid in the identification of the most effective combination of features for BIQA. Overall, the proposed BIQA model provides a promising method for reliably and robustly assessing the quality of distorted images, with potential applications in image compression, transmission, and restoration.
It is worth highlighting that the use of machine learning techniques, such as the SVM regression model, provides inherent computational efficiency. ML algorithms are renowned for their capacity to effectively analyze data and generate predictions based on previously learnt patterns. Furthermore, the feature selection phase in the proposed system reduces the complexity of the feature space, enhancing computational efficiency even further.
Certain restrictions and possible drawbacks of the proposed approach must be considered. First, the framework’s performance is strongly dependent on the feature extraction techniques used, such as wavelet transformation and edge detection. While these techniques have yielded promising results, their efficacy may vary based on the precise properties of the images under consideration. Furthermore, the framework’s generalizability to various image types should be examined. Although the system has been evaluated on benchmark datasets, its performance may vary when applied to other image categories or domains.
7. Conclusions
This study presented a system for Blind Image Quality Assessment (BIQA) that integrates feature extraction, feature selection, and support vector machine (SVM) regression. The incorporation of four separate types of image features, as well as the use of the Information Gain attribute approach for feature selection, contributes to the regression model’s increased performance. Using well-known benchmark datasets such as TID2013, CSIQ, and LIVE, we show that the proposed framework outperforms previous techniques in terms of accuracy and robustness. Furthermore, this study performed extensive tests to investigate the effect of various types of features and feature selection methodologies on the framework’s performance. The system improves the resilience and accuracy of the quality prediction model by including a wide set of image properties. Moreover, using the Information Gain approach for feature selection minimizes the complexity of the feature space and improves the regression model’s performance. As a whole, the proposed methodology outperforms the competition across multiple benchmark datasets, proving its efficacy and robustness in blind image quality evaluation. These findings illustrate the major benefits of the proposed methodology and help to advance the discipline. It is crucial to highlight, however, that the proposed approach has limitations, such as the dependency on certain datasets and the need for further development of feature extraction techniques. Future studies should address these constraints and investigate the framework’s use in additional image-processing tasks.