Article

A Distorted-Image Quality Assessment Algorithm Based on a Sparse Structure and Subjective Perception

College of Media Engineering, Communication University of Zhejiang, Xueyuan Street, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2024, 12(16), 2531; https://doi.org/10.3390/math12162531
Submission received: 17 July 2024 / Revised: 14 August 2024 / Accepted: 15 August 2024 / Published: 16 August 2024

Abstract

Most image quality assessment (IQA) algorithms based on sparse representation primarily focus on amplitude information, often overlooking the structural composition of images. However, structural composition is closely linked to perceived image quality, a connection that existing methods do not adequately address. To fill this gap, this paper proposes a novel distorted-image quality assessment algorithm based on a sparse structure and subjective perception (IQA-SSSP). This algorithm evaluates the quality of distorted images by measuring the sparse structure similarity between the reference and distorted images. The proposed method has several advantages. First, the sparse structure algorithm operates with reduced computational complexity, leading to faster processing speeds, which makes it suitable for practical applications. Additionally, it efficiently handles large-scale data, further enhancing the assessment process. Experimental results validate the effectiveness of the algorithm, showing that it achieves a high correlation with human visual perception, as reflected in both objective and subjective evaluations. Specifically, the algorithm yielded a Pearson correlation coefficient of 0.929 and a mean squared error of 8.003, demonstrating its robustness and efficiency. By addressing the limitations of existing IQA methods and introducing a more holistic approach, this paper offers new perspectives on IQA. The proposed algorithm not only provides reliable quality assessment results but also closely aligns with human visual experience, thereby enhancing both the objectivity and accuracy of image quality evaluations. This research offers significant theoretical support for the advancement of sparse representation in IQA.

1. Introduction

The mainstream full-reference image quality assessment (IQA) typically evaluates the quality of distorted images by measuring the differences or similarities between the distorted images and their reference counterparts [1,2]. However, this purely objective approach may not always align with human subjective visual perception [3,4]. To bridge this gap, it is crucial to explore how the physical differences in images can be translated into differences in human visual perception.
In pursuit of this objective, researchers have developed various algorithms designed to simulate the characteristics or mimic the functions of the human visual system. These algorithms aim to create mathematical models that convert physical image differences into perceptual differences as experienced by the human visual system. By simulating how the human eye perceives factors like color, brightness, contrast, and texture, these models provide a more accurate assessment of image quality. For instance, some models account for how the human visual system responds differently in high-contrast versus low-contrast environments [5], or how it is sensitive to details under varying brightness and color conditions [6]. Through these advanced models, researchers can better understand and quantify the impact of physical image differences on human perception, thereby enhancing the accuracy and reliability of IQA. Consequently, the essence of IQA lies in effectively simulating the human visual system to convert physical image differences into perceptual differences that align with human visual experience. The framework of this approach is illustrated in Figure 1.
First, in the realm of full-reference IQA, error sensitivity methods are primarily focused on modeling the characteristics of the front-end visual system, specifically the primary visual processing components of the human eye. However, these methods tend to overlook the critical role of the visual cortex, the primary unit responsible for higher-level visual analysis and processing. Neglecting the visual cortex can lead to suboptimal performance, as it fails to capture the complexity of the human visual system. For instance, [7] introduced an error sensitivity model that evaluates the impact of single packet loss and leverages the spatiotemporal correlation in video to analyze features that directly influence perceived video quality. However, this model is limited to predicting error sensitivity in localized areas. Similarly, [8] demonstrated the feasibility of an objective model for assessing video quality in face recognition tasks, trained and validated on representative image sequences. Yet, this model is not applicable to video sequences used for target recognition tasks, highlighting the challenge of assessing video processing pipelines for both manual and computer vision recognition tasks. In addition, [9] proposed an improved congestion prevention method based on LwIP for embedded machine vision systems, which enhances video control error suppression without compromising system stability.
Second, information theory methods measure the perceptual difference between distorted and reference images by analyzing variations in the amount of information perceived by the observer. These methods attempt to quantify changes in the informational content of images, using it as a metric. However, a significant drawback of these methods is their inability to provide image quality scores within a closed interval, which can lead to inaccuracies in certain IQA scenarios. For example, [10] addressed image contrast enhancement as an optimization problem, employing a new metaheuristic algorithm, the Barnacles Mating Optimizer, to find the optimal solution, though it works optimally only for images using grayscale mapping. Moreover, [11] constructed a novel dataset with labeled and unlabeled real-world images, where their proposed algorithm outperforms state-of-the-art UDC image enhancement methods in terms of accuracy and speed but is limited to ultra-high-definition images [12].
Finally, structural similarity methods operate under the assumption that the human visual system can adaptively extract structural information from images and that this structural information is closely linked to quality perception. These methods use a mathematical definition to establish the concept of image structural similarity, demonstrating its effectiveness in IQA. However, they often lack a detailed discussion of the relationship between these metrics and the human visual system, as well as physiological and psychological support. For example, [13] provided a comprehensive outline of deep learning-based MRI image post-processing methods aimed at improving image quality and correcting artifacts, yet it lacks physiological state support. Additionally, [14] performed a comparative analysis of three specific filters (a non-local mean filter, a median filter, and an adaptive median filter) for denoising under various noise densities, but the study lacks psychological considerations.
This paper is structured as follows: The introduction sets the stage by outlining the research background, identifying key gaps, and detailing the study’s contributions. The Research Design section is divided into two main parts: Dictionary Learning, which includes image grayscale conversion, building and standardizing the training matrix, and developing an overcomplete dictionary; and Quality Assessment, which covers techniques for dividing images into tiles, selecting salient blocks, evaluating sparse structural similarity, and assessing brightness similarity. The Experiment and Comparison section details the design of performance metrics, compares the proposed algorithm’s overall and feature-specific performance, and evaluates algorithm efficiency. Finally, the Conclusion summarizes the study’s findings, discusses their implications, and suggests future research directions. The framework of this paper is shown in Figure 2.
In general, the effective utilization of the visual cortex’s functional characteristics, combined with the advantages of sparse representation models and the concept of a sparse structure, can more accurately simulate the human visual system’s perception of image quality, thereby enhancing the objectivity and reliability of IQA. The contributions of this paper are as follows:
  • This paper identifies critical gaps in current IQA methods, particularly the neglect of structural composition and the visual cortex’s role in image quality assessment.
  • This paper proposes a new image quality assessment algorithm based on a sparse structure and subjective perception (IQA-SSSP), which effectively translates physical image differences into perceptual differences aligned with human visual experience.
  • By integrating insights from visual physiology and psychology, this paper improves the accuracy and objectivity of image quality assessments, offering a more reliable reflection of human perceptual experience.

2. Related Works

In recent years, significant progress has been made in the field of image quality assessment (IQA), with an increasing emphasis on incorporating insights from visual physiology and psychology. This interdisciplinary approach has led to the development of advanced IQA methods that more accurately simulate the human visual system, thereby improving their alignment with subjective human perceptions. Central to these advancements is the application of models based on sparse coding and visual cortex characteristics, which have demonstrated considerable success across various image processing tasks, including face recognition, super-resolution reconstruction, and image fusion. This section reviews the relevant literature, focusing on the integration of visual perception theories and sparse representation models in enhancing IQA performance.
As research in visual physiology and psychology continues to advance, there is an increasing focus on leveraging the characteristics of visual perception to enhance the performance of image quality assessment (IQA) methods. The integration of the latest findings from these fields allows for the development of more accurate simulations of the human visual system, which is essential for improving the objectivity of IQA outcomes. The visual cortex, a crucial component of the human visual system, plays a pivotal role in the recognition and interpretation of visual signals [15,16]. Understanding and effectively utilizing the properties of the visual cortex can significantly enhance the consistency between IQA results and human subjective perceptions.
Recent studies have demonstrated the potential of sparse coding models and visual cortex-based sparse representation models in various image processing tasks. These models have been successfully applied to areas such as face recognition, where they contribute to the accuracy and robustness of recognition systems [17,18]. Additionally, they have shown promise in the domain of super-resolution reconstruction, where joint sparse representation and multi-morphological sparse regularization techniques have been employed to improve the quality and resolution of images, particularly in applications related to remote sensing and other high-resolution imaging tasks [19,20,21].
In the context of IQA, these models are also proving to be valuable. For instance, the use of online convolutional sparse coding has facilitated advancements in image fusion processes, resulting in improved image quality [22]. Moreover, the integration of deep learning and sparse representation approaches has led to the development of more sophisticated blind IQA algorithms, which are capable of evaluating image quality without reference images. Such approaches include methods that incorporate multilevel feature fusion and transformer-based evaluators, which have shown potential in aligning algorithmic assessments more closely with human visual perceptions [23,24].

3. Research Design

The human visual system has about 1 million ganglion cells in the retina and lateral geniculate body and about 50 million nerve cells in the fourth layer of the V1 area of the visual cortex. Therefore, when the bioelectric signal is transmitted to V1, only a few nerve cells are activated, and most of the others are in an inhibited state. Based on this, sparse coding models can describe the response of visual cortical cells to external visual stimuli [25], and sparse representation models can describe image signals [26]. In sparse representation models, any signal can be represented as a linear combination of several basic atoms, and the set of these basic atoms is called a dictionary. As shown in Figure 3, the signal sig can be decomposed into the product of the dictionary dic and the sparse representation coefficient coe. The orange column vectors in the dictionary dic represent the atoms selected to represent the signal sig. The red cells in the sparse representation coefficient coe correspond to the non-zero coefficients of the selected atoms, while the rest are zero.
As can be seen from the figure above, images can be decomposed into the product of the overcomplete dictionary and the sparse representation coefficient through the sparse representation model. When the sparse representation model is used for image representation, the differences between different image quality features are mainly reflected in the sparse representation coefficient.
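To make the decomposition in Figure 3 concrete, the following minimal Python sketch builds a random dictionary, synthesizes a signal from five atoms, and recovers a sparse code. Orthogonal matching pursuit is used here as one common solver, since the paper does not prescribe a specific pursuit algorithm; all dimensions and names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

# Toy illustration of sig = dic @ coe: a 64-dimensional signal represented
# as a sparse combination of 5 atoms from a 256-atom dictionary.
rng = np.random.default_rng(0)
dic = rng.standard_normal((64, 256))
dic /= np.linalg.norm(dic, axis=0)              # unit-norm atoms

true_coe = np.zeros(256)
true_coe[[3, 7, 41, 100, 187]] = rng.standard_normal(5)  # 5 active atoms
sig = dic @ true_coe                            # synthesize the signal

# Recover a sparse code with orthogonal matching pursuit (one common solver).
coe = orthogonal_mp(dic, sig, n_nonzero_coefs=5)
print("active atoms:", np.flatnonzero(coe))     # indices of selected atoms
print("reconstruction error:", np.linalg.norm(sig - dic @ coe))
```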
The sparse representation coefficient information mainly includes two parts: amplitude information and structural information. At present, most of the image quality assessment methods based on sparse representation only use the amplitude information of the sparse representation coefficients while ignoring the structural information. However, when performing image quality analysis, using structural information has two significant advantages over amplitude information: (1) The physiological meaning is clear. Compared with activity levels, differences in cell activation states can more clearly reflect differences in the human visual system’s response to different visual stimuli. (2) Calculation is convenient. When solving the difference in image atomic combinations, using structural information does not require complex numerical multiplication operations, which helps reduce the computational complexity of the algorithm and improve the practicality of the algorithm.
In order to study the ability of sparse representation coefficient structure information to describe the perceived quality of images, this paper defines the sparse structure of images and predicts the quality of distorted images by measuring the similarity of image sparse structures. The main framework of the algorithm is shown in Figure 4, which is divided into two stages: dictionary learning and quality assessment.
In dictionary learning, a large amount of training data is used to generate a dictionary that can effectively capture image features so that it can show good sparse representation capabilities in the subsequent quality evaluation process.
In quality assessment, the quality of the distorted images is comprehensively evaluated by calculating the sparse structure similarity and brightness similarity of the salient areas of the images. First, salient areas containing the main content and key information of the images are extracted from the distorted images and the reference images. Second, the overcomplete dictionary generated in dictionary learning is used to perform sparse decomposition of the salient areas, and a sparse representation coefficient containing detailed information about the image structure is obtained. Then, the sparse structure similarity and brightness similarity of the distorted images and the reference images are calculated. Finally, the sparse structure similarity and brightness similarity are combined to comprehensively assess the quality of the distorted images.
In summary, this algorithm provides a new approach to IQA by defining the sparse structure of images and measuring their similarity. Compared with traditional methods, this method not only considers the amplitude information of the sparse representation coefficients but also introduces structural information. The research methodology is divided into two main phases: dictionary learning and quality assessment. In the dictionary learning phase, the process begins with image grayscale conversion to standardize the input data. Following this, a training matrix is built and standardized to prepare the data for dictionary learning. An overcomplete dictionary is then constructed, which serves as the foundation for sparse representation. In the quality assessment phase, the image is divided into tiles, and highly salient image blocks are selected to focus on the most critical areas. The assessment incorporates sparse structural similarity to measure structural fidelity, alongside brightness similarity to evaluate the consistency of brightness, culminating in a comprehensive quality evaluation.

3.1. Dictionary Learning

In dictionary learning, it is necessary to learn a set of basic structural atoms that can represent natural scene images from a training set of natural scene images. The experiment selected the public dataset Landscape-Dataset, which contains more than 7000 photos of different types of natural scenery and landscapes, covering a wide range of geographical environments and weather conditions. Such diversity makes the dataset highly versatile and representative. The experiment selected 2000 images with different scene content as training images, and some of the training images are shown in Figure 5.
Dictionary learning mainly consists of four parts: image grayscale conversion, training matrix construction, standardization, and dictionary training. First, the image is converted from the original RGB to a grayscale image to better extract image features. Next, the image is divided into multiple small blocks, which serve as the basic units of dictionary learning. Then, the image blocks are preprocessed, including denoising, normalization, and other operations, to improve the quality and consistency of the data. Finally, the dictionary is initialized through the random selection of small blocks from the training images or the use of predefined image blocks as initial atoms to better represent the training images.

3.1.1. Image Grayscale Conversion

This algorithm mainly focuses on the brightness information of an image and requires the training images to be converted from the RGB color space to the grayscale space, which simplifies data processing and focuses the analysis on the brightness characteristics of images [27].
The training image data are read from the RGB color space, the standard color conversion formula is applied, and RGB images are converted into grayscale images. These grayscale images only contain brightness information and remove color information, thereby simplifying the complexity of data processing. This paper stipulates that the value of parameter α is 299, the value of parameter β is 587, the value of parameter γ is 114, and the value of parameter θ is 1000, as shown in Equation (1).
$$i_{gray} = \frac{\alpha \cdot i_{red} + \beta \cdot i_{green} + \gamma \cdot i_{blue}}{\theta} \quad (1)$$
Among them, $i_{gray}$ represents the grayscale image of image $i$, $i_{red}$ represents the red channel value of image $i$, $i_{green}$ represents the green channel value of image $i$, and $i_{blue}$ represents the blue channel value of image $i$.
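As an illustration, Equation (1) can be implemented in a few lines. The NumPy-based function below is a sketch; the function name and array layout are our own assumptions.

```python
import numpy as np

def to_gray(img_rgb):
    """Equation (1) with alpha=299, beta=587, gamma=114, theta=1000.
    img_rgb: H x W x 3 array with channels ordered R, G, B."""
    i_red = img_rgb[..., 0].astype(np.float64)
    i_green = img_rgb[..., 1].astype(np.float64)
    i_blue = img_rgb[..., 2].astype(np.float64)
    return (299 * i_red + 587 * i_green + 114 * i_blue) / 1000
```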

3.1.2. Building Training Matrix and Standardization

The sparse representation model mainly extracts the local structural information of the images, so the training images need to be processed in blocks. This algorithm adopts the random sampling criterion and randomly collects 100 image blocks from each training image.
First, each training image is divided into several small blocks of size r × r . Second, 100 image blocks in each training image are randomly selected to ensure that the selected image blocks can represent different areas of the entire image. Then, each collected image block is straightened by column and converted into a one-dimensional vector. Finally, all the straightened image blocks are combined into a training data matrix with r rows and 200,000 columns. Each column of the matrix represents an image block, and each row of the matrix is the brightness value of a pixel position in an image block.
This training data matrix will be used as input data in the dictionary learning stage. Through the learning and optimization process, an over-complete dictionary will be generated that can effectively represent the local features of natural scene images for subsequent sparse representation and image quality evaluation.
The training data are converted into a uniform scale to eliminate the deviation caused by brightness and contrast differences between different image blocks. First, each image block is mean-normalized. By calculating the mean of each image block and subtracting the mean from each pixel value in the image block, the pixel values of each image block are distributed in a relatively concentrated range.
After mean normalization, variance normalization is also required for each image block. By calculating the standard deviation of each image block and dividing each pixel value in the image block by this standard deviation, the pixel values of each image block have the same variance. The standardization process is shown in Equations (2) and (3).
$$col_x^{s} = \frac{col_x - \frac{1}{r}\sum_{y=1}^{r} col_{xy}}{\sqrt{\sum_{y=1}^{r} col_{xy}^{2}}} \quad (2)$$

$$col_x \in M^{r} \quad (3)$$
Among them, $x$ represents the x-th column in the matrix, $col_x^{s}$ represents the standardized vector, $col_x$ represents the unstandardized vector, $y$ represents the y-th row in the matrix, $r$ represents the number of rows in the matrix, $col_{xy}$ represents the y-th element in the vector $col_x$, and $M^{r}$ represents the training data matrix.
Through the above standardization processing steps, the training data can be converted to a uniform scale so that different image blocks have consistency in brightness and contrast.
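A minimal sketch of the block sampling and standardization steps is given below, assuming the scaling in Equation (2) acts as a root-sum-of-squares normalization of the mean-removed block (proportional to dividing by the standard deviation, as the text describes); the function name, block size default, and random seed are illustrative.

```python
import numpy as np

def build_training_matrix(images, r=8, blocks_per_image=100, seed=0):
    """Randomly sample r x r blocks from each grayscale image (100 per image,
    per the paper), straighten each block by column, and standardize each
    column following Equations (2)-(3)."""
    rng = np.random.default_rng(seed)
    cols = []
    for img in images:
        h, w = img.shape
        for _ in range(blocks_per_image):
            y = rng.integers(0, h - r + 1)
            x = rng.integers(0, w - r + 1)
            block = img[y:y + r, x:x + r]
            cols.append(block.flatten(order="F"))   # column-wise straightening
    m = np.stack(cols, axis=1).astype(np.float64)   # one block per column
    m -= m.mean(axis=0, keepdims=True)              # mean normalization
    norm = np.sqrt((m ** 2).sum(axis=0, keepdims=True))
    m /= np.maximum(norm, 1e-12)                    # variance normalization
    return m
```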

3.1.3. Overcomplete Dictionary

Through an iterative refinement of the initial dictionary and the updating of the sparse representation coefficients, a dictionary capable of effectively capturing the local structure of the image can be learned from the standardized training data matrix [28]. The process begins with the construction of an initial dictionary derived directly from the standardized training data. This initial dictionary is typically a coarse approximation, but it serves as the foundation for subsequent optimization.
During the iterative optimization process, each atom in the dictionary undergoes continuous adjustment. The goal is to minimize the reconstruction error associated with the current sparse representation, ensuring that the dictionary becomes increasingly tailored to the underlying structure of the data. As each iteration progresses, the dictionary becomes more precise, and the accuracy of the sparse representation improves correspondingly. This iterative refinement continues until the changes in the dictionary stabilize, indicating convergence, or until the process reaches a predefined number of iterations.
The overall dictionary training process involves the updating of both the dictionary and the sparse representation coefficients. These equations detail the optimization steps taken to iteratively enhance the dictionary’s ability to represent the image data accurately, and they are mathematically formalized in Equations (4)–(6).
$$Dic = \arg\min_{Dic,\, coe} \left\| col^{s} - Dic \cdot coe \right\|_2^2 + pun \left\| coe \right\|_1 \quad (4)$$

$$\left\| Dic_i \right\|_2^2 \le 1, \quad i = 1, 2, \ldots, c \quad (5)$$

$$Dic \in M^{r \times c} \quad (6)$$
In these equations, several key variables and symbols are used to describe the dictionary learning process. Specifically, $Dic$ denotes the dictionary, which is a collection of atoms or basis vectors that represent the local structures within the image data. The term $\arg\min(\cdot)$ refers to the process of finding the value of the independent variable that minimizes the objective, thereby minimizing the error in the reconstruction process. The variable $coe$ stands for the sparse representation coefficient, which indicates the degree to which each atom in the dictionary contributes to the representation of the data.
The symbol $col^{s}$ represents the standardized training data matrix, which contains the image data that have been preprocessed so that they have consistent statistical properties, such as their mean and variance. This matrix serves as the input from which the dictionary is learned. The parameter $pun$ refers to the sparsity penalty, a regularization term that encourages sparsity in the representation, meaning that only a few atoms from the dictionary are used to represent any given data point. The term $\|\cdot\|_1$ denotes the $l_1$ norm of a vector, which is commonly used in sparse coding to enforce this sparsity by summing the absolute values of the coefficients. Finally, $M^{r \times c}$ represents a matrix with $r$ rows and $c$ columns, providing a general framework for the data structures involved.
This iterative updating method leverages the full extent of the information embedded within the standardized training data matrix, allowing the dictionary to become finely tuned to the inherent characteristics of natural scene images. By iteratively optimizing both the dictionary and the sparse representation coefficients, the learned dictionary becomes increasingly adept at capturing and representing the essential features of the image data, leading to more accurate and efficient representations.
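The paper does not name a specific solver for Equations (4)-(6). As one illustrative substitute, scikit-learn's MiniBatchDictionaryLearning optimizes the same l1-penalized reconstruction objective with norm-constrained atoms; the sketch below is an assumption about a workable setup, not the authors' exact procedure.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_dictionary(training_matrix, n_atoms=256, pun=1.0):
    """Learn an overcomplete dictionary Dic (r x c) from the standardized
    training matrix (one straightened block per column), per Equations (4)-(6).
    alpha plays the role of the sparsity penalty pun."""
    # scikit-learn expects samples in rows, so transpose the column-wise matrix.
    learner = MiniBatchDictionaryLearning(
        n_components=n_atoms, alpha=pun, random_state=0)
    learner.fit(training_matrix.T)
    return learner.components_.T        # columns are the learned atoms
```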

3.2. Quality Assessment

This algorithm evaluates the similarity between distorted and reference images by considering both sparse structure similarity and image brightness similarity. Sparse structure similarity: Initially, the algorithm employs a sparse representation model to decompose both the distorted and reference images into the product of sparse representation coefficients and a dictionary. By comparing the sparse representation coefficients of the two images, the algorithm calculates the sparse structure similarity. This metric assesses how closely the local structural features of the distorted image match those of the reference image. Brightness similarity: Subsequently, the images are converted from the RGB color space to grayscale. This conversion simplifies the images and allows for the extraction of brightness information. The algorithm then calculates the brightness similarity by comparing the extracted brightness levels of the distorted and reference images. This step evaluates the similarity in overall brightness between the two images.
After obtaining both the sparse structure similarity and the brightness similarity, the algorithm combines these metrics into a comprehensive similarity index. The combination of these two similarities is achieved through a reasonable weighting scheme, which reflects their relative importance. This comprehensive similarity index represents the overall fidelity of the distorted image in relation to the reference image.
The quality assessment of the distorted images is based on this comprehensive similarity index. A high index value indicates a high degree of fidelity between the distorted and reference images, suggesting minimal distortion and good image quality. Conversely, a low index value signifies poor image quality, as it reflects a greater degree of distortion.

3.2.1. Dividing Tiles and Selecting Highly Salient Image Blocks

In this experiment, the reference image was defined as $i^{ref}$, which was converted into a grayscale image $i_{grey}^{ref}$. The distorted image was defined as $i^{dis}$, which was converted into a grayscale image $i_{grey}^{dis}$, as shown in Equations (7) and (8).
$$i_{grey}^{ref} = \frac{299\, i_{red}^{ref} + 587\, i_{green}^{ref} + 114\, i_{blue}^{ref}}{1000} \quad (7)$$

$$i_{grey}^{dis} = \frac{299\, i_{red}^{dis} + 587\, i_{green}^{dis} + 114\, i_{blue}^{dis}}{1000} \quad (8)$$
In dictionary construction, this algorithm adopts a random sampling strategy that collects multiple image blocks from each training image. For quality assessment, in order to ensure that no important image information is lost, the algorithm adopts a more precise window sampling strategy [29]. This strategy divides the entire image into non-overlapping image blocks of size $r \times r$ so that all image areas are taken into account. Because each image is precisely divided into several image blocks, the algorithm can not only accurately evaluate the image quality of each local area but also comprehensively judge the quality level of the entire distorted image from these local evaluation results.
Traditional structural-similarity algorithms calculate the overall quality of an image by using all image blocks. However, using only some of the image blocks with high salience to assess the overall quality of the image can effectively improve the performance of the algorithm [30,31]. In order to reduce computational complexity while ensuring algorithm performance, this paper proposes a method to estimate the salience of image blocks based on image contrast, and it calculates the overall quality score of an image by selecting high-salience image blocks.
First, this algorithm uses image variance to measure the contrast of all reference image blocks; image blocks with high contrast usually have large variance. All image blocks are sorted from largest to smallest variance, and an image block screening threshold $thr$ is set. This threshold is used to screen out image blocks with high contrast, thereby reducing the amount of calculation while retaining the parts most important for quality evaluation. After screening, the selected reference image blocks and distorted image blocks are straightened by column and combined into the reference image matrix $col^{ref} \in M^{r \times sum}$ and the distorted image matrix $col^{dis} \in M^{r \times sum}$, where $sum$ represents the number of selected image blocks. This method not only reduces the computational complexity but also more accurately reflects the perception of image quality by the human visual system.
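A sketch of the tiling and salience screening described above is shown below; the 60% retention ratio follows the threshold found optimal in Section 4.3.3, and the function name and interface are our own.

```python
import numpy as np

def select_salient_blocks(ref_gray, dis_gray, r=8, keep_ratio=0.6):
    """Divide both images into non-overlapping r x r tiles, rank reference
    tiles by variance (a contrast proxy), and keep the top fraction for
    evaluation. Returns (col_ref, col_dis), one straightened block per column."""
    h, w = ref_gray.shape
    ref_cols, dis_cols = [], []
    for y in range(0, h - r + 1, r):
        for x in range(0, w - r + 1, r):
            ref_cols.append(ref_gray[y:y + r, x:x + r].flatten(order="F"))
            dis_cols.append(dis_gray[y:y + r, x:x + r].flatten(order="F"))
    ref_m = np.stack(ref_cols, axis=1)
    dis_m = np.stack(dis_cols, axis=1)
    variances = ref_m.var(axis=0)
    order = np.argsort(variances)[::-1]            # largest variance first
    keep = order[: max(1, int(keep_ratio * len(order)))]
    return ref_m[:, keep], dis_m[:, keep]
```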

3.2.2. Sparse Structural Similarity

When the human visual system receives a visual stimulus, only some neurons in the primary visual cortex are significantly activated [32]. If the reference image and the distorted image activate the same neurons in the visual cortex, the visual perceptions they cause will be closer. In the sparse representation model, each atom in the dictionary can be regarded as a neuron in the visual cortex. When the atoms used to represent the distorted image and the reference image are more similar, the quality perception of the distorted image will be closer to that of the reference image; the similarity of two signals is measured as shown in Figure 6.
In the figure above, the sparse representation model is used to decompose the signal so that the combination of dictionary atoms can reflect the activation state of neurons in the visual cortex. By comparing the sparse representation coefficients of the distorted image and the reference image, the similarity of the two images in activating neurons in the visual cortex can be calculated. Among them, the reference signal can be represented as a combination of atoms ( 3 , 6 , 8 , 10 , 11 ) in the dictionary, and the distorted signal can be represented as a combination of atoms ( 3 , 7 , 8 , 10 , 11 ) in the dictionary. The sparse structure similarity of the distorted signal to the reference signal is 80%.
For the reference image matrix $col^{ref} \in M^{r \times sum}$ and the distorted image matrix $col^{dis} \in M^{r \times sum}$, the pre-trained dictionary $dic$ is first used to calculate the sparse representation coefficients, as shown in Equations (9) and (10).
$$\min_{coe_i^{ref}} \left\| col_i^{ref} - dic \cdot coe_i^{ref} \right\|_2, \quad \left\| coe_i^{ref} \right\|_0 \le lim, \quad i = 1, 2, \ldots, sum \quad (9)$$

$$\min_{coe_i^{dis}} \left\| col_i^{dis} - dic \cdot coe_i^{dis} \right\|_2, \quad \left\| coe_i^{dis} \right\|_0 \le lim, \quad i = 1, 2, \ldots, sum \quad (10)$$
Among them, $col_i^{ref}$ represents the i-th column in the reference image matrix $col^{ref}$, $col_i^{dis}$ represents the i-th column in the distorted image matrix $col^{dis}$, $coe_i^{ref}$ represents the sparse representation coefficient corresponding to $col_i^{ref}$, $coe_i^{dis}$ represents the sparse representation coefficient corresponding to $col_i^{dis}$, $lim$ represents the sparsity constraint, $\|\cdot\|_2$ represents the $l_2$ norm of a vector, and $\|\cdot\|_0$ represents the $l_0$ norm of a vector.
The sparse structure similarity of each pair of "reference-distortion" image blocks is calculated using the sparse representation coefficients $coe^{ref} = [coe_1^{ref}, coe_2^{ref}, \ldots, coe_{sum}^{ref}]$ and $coe^{dis} = [coe_1^{dis}, coe_2^{dis}, \ldots, coe_{sum}^{dis}]$, as shown in Equation (11).
$$sim_i = \frac{\left\| coe_i^{ref} \cdot coe_i^{dis} \right\|_0 + c}{\left\| coe_i^{ref} \right\|_0 + c}, \quad i = 1, 2, \ldots, sum \quad (11)$$
Among them, $sim_i$ represents the sparse structural similarity of the i-th pair of "reference-distortion" image blocks, the product $coe_i^{ref} \cdot coe_i^{dis}$ is taken element-wise (so the numerator counts the atoms activated by both images), and $c$ represents a constant that maintains computational stability.
This algorithm uses an average pooling strategy to summarize the sparse structure similarity results of all high-salience image blocks and then average them. The average pooling strategy not only simplifies the calculation process but also effectively smooths local errors and improves the stability and reliability of the overall evaluation [33], as shown in Equation (12).
$$Sim(col^{ref}, col^{dis}) = \frac{\sum_{i=1}^{sum} sim_i}{sum} \quad (12)$$
Among them, $Sim(\cdot)$ represents the overall sparse structural similarity of the distorted image. This paper stipulates that the similarity is bounded, that is, $Sim(col^{ref}, col^{dis}) \in [0, 1]$, and that it takes the value of 1 if and only if the distorted image is exactly the same as the reference image.
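Putting Equations (9)-(12) together, a minimal sketch is given below; orthogonal matching pursuit again stands in for the unspecified l0-constrained solver, and the sparsity limit and stabilizer values are assumptions. Note that for the Figure 6 example (four of five atoms shared), the formula yields approximately 0.8, matching the 80% similarity stated above.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparse_structure_similarity(col_ref, col_dis, dic, lim=5, c=1e-3):
    """Equations (9)-(12): sparse-code each block pair under the shared
    dictionary, compare which atoms are active, and average-pool."""
    coe_ref = orthogonal_mp(dic, col_ref, n_nonzero_coefs=lim)
    coe_dis = orthogonal_mp(dic, col_dis, n_nonzero_coefs=lim)
    # ||coe_ref * coe_dis||_0 counts atoms active in BOTH representations.
    shared = np.count_nonzero(coe_ref * coe_dis, axis=0)
    active_ref = np.count_nonzero(coe_ref, axis=0)
    sim = (shared + c) / (active_ref + c)          # Equation (11), per block
    return sim.mean()                              # Equation (12), average pooling
```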

3.2.3. Brightness Similarity and Quality Evaluation

The amplitude information of the sparse representation coefficient is ignored when the sparse structure similarity is calculated, which makes the sparse structure similarity feature unable to accurately reflect the subtle differences in small-scale brightness changes in the image, leading to a certain degree of information loss and evaluation bias. This algorithm further proposes to use image brightness similarity as a supplement.
Image brightness similarity evaluates the quality of an image by directly comparing the difference in brightness information between the distorted image and the reference image. By combining sparse structure similarity and brightness similarity, the algorithm can more comprehensively reflect the overall quality of the image. By calculating the similarity between the two images in brightness, the image brightness similarity index is obtained. Finally, the sparse structure similarity and brightness similarity are combined to form a comprehensive similarity index for evaluating the quality of the distorted image. The brightness similarity between the reference image matrix $col^{ref} \in M^{r \times sum}$ and the distorted image matrix $col^{dis} \in M^{r \times sum}$ is shown in Equations (13)-(15).
$$Sim_L(col^{ref}, col^{dis}) = \frac{\sum_{i=1}^{sum} \left( (col_i^{ref} - avg(col^{ref})) \cdot (col_i^{dis} - avg(col^{dis})) \right) + c}{\sqrt{\sum_{i=1}^{sum} (col_i^{ref} - avg(col^{ref}))^2 \cdot \sum_{i=1}^{sum} (col_i^{dis} - avg(col^{dis}))^2} + c} \quad (13)$$

$$avg(col^{ref}) = \frac{\sum_{j=1}^{r} ele_{ij}^{ref}}{r} \quad (14)$$

$$avg(col^{dis}) = \frac{\sum_{j=1}^{r} ele_{ij}^{dis}}{r} \quad (15)$$
Among them, $Sim_L(\cdot)$ represents the brightness similarity between the reference image and the distorted image, $avg(\cdot)$ represents the column-wise averaging of the matrix, $c$ represents a constant to maintain computational stability, $ele_{ij}^{ref}$ represents the j-th element in the i-th column of the reference image matrix, and $ele_{ij}^{dis}$ represents the j-th element in the i-th column of the distorted image matrix.
Through the above calculations, this paper obtains the image sparse structure similarity $Sim(col^{ref}, col^{dis})$ and the image brightness similarity $Sim_L(col^{ref}, col^{dis})$, from which the overall similarity quality between the distorted image and the reference image can be derived, as shown in Equation (16).
$$Quality(col^{ref}, col^{dis}) = \alpha \cdot Sim(col^{ref}, col^{dis}) + \beta \cdot Sim_L(col^{ref}, col^{dis}), \quad \alpha + \beta = 1 \quad (16)$$
Among them, $\alpha$ and $\beta$ are weighting parameters that balance the contributions of the sparse structure similarity and the brightness similarity.
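The brightness similarity and the final fusion of Equation (16) can be sketched as follows. The reading of Equations (13)-(15) as a normalized cross-correlation over mean-removed columns is our interpretation, and the equal weights alpha = beta = 0.5 are a placeholder, since the paper does not report its chosen values.

```python
import numpy as np

def brightness_similarity(col_ref, col_dis, c=1e-3):
    """Equations (13)-(15): remove each block's column mean avg(.), then
    compute a normalized cross-correlation across all selected blocks."""
    d_ref = col_ref - col_ref.mean(axis=0, keepdims=True)   # Equation (14)
    d_dis = col_dis - col_dis.mean(axis=0, keepdims=True)   # Equation (15)
    num = (d_ref * d_dis).sum() + c
    den = np.sqrt((d_ref ** 2).sum() * (d_dis ** 2).sum()) + c
    return num / den

def overall_quality(sim_struct, sim_bright, alpha=0.5, beta=0.5):
    """Equation (16): weighted combination with alpha + beta = 1."""
    assert abs(alpha + beta - 1.0) < 1e-9
    return alpha * sim_struct + beta * sim_bright
```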

4. Experiment and Comparison

To assess the effectiveness and practicality of the IQA-SSSP proposed in this paper, we conducted a series of comparative experiments using the natural scene IQA Landscape-Dataset. This dataset was selected to evaluate the proposed method against a range of existing algorithms. The comparison included methods based on both manually extracted image features and deep learning techniques. Specifically, we compared the proposed method with manually extracted feature-based algorithms such as the ECGQA algorithm by [34], the DLBF algorithm by [35], and the FDD algorithm by [36]. Additionally, we included deep learning-based methods, including the CELL algorithm by [37], the PDAA algorithm by [38], and the CFFA algorithm by [39].
To thoroughly examine the performance of the proposed IQA-SSSP method relative to these comparison algorithms, we employed five key evaluation metrics. These metrics include the Pearson linear correlation coefficient (PLCC) to measure accuracy, the Spearman rank correlation coefficient (SRCC), and the Kendall rank correlation coefficient (KRCC) to assess monotonicity, as well as the mean absolute error (MAE) and root mean square error (RMSE) to evaluate consistency. Each metric provides insights into different aspects of algorithm performance, such as how well the predicted scores align with subjective assessments and the consistency of the scoring.
To ensure a fair comparison across different algorithms, we used a logistic regression function to map all image quality scores to a unified scale. This normalization step allowed us to compare scores within the same range, facilitating a consistent and accurate evaluation of the proposed method against its counterparts. Through this comprehensive experimental approach, the effectiveness of the IQA-SSSP method was rigorously tested and validated.

4.1. Algorithm Performance Index Design

The PLCC is used to evaluate the linear correlation between the predicted score and the subjective score; the SRCC and the KRCC are used to evaluate the consistency between the algorithm's output ranking and the subjective score ranking; the MAE is used to evaluate the average magnitude of the algorithm's prediction error; and the RMSE is used to evaluate the root mean square of the prediction error.

4.1.1. PLCC

The calculation method for the Pearson linear correlation coefficient (PLCC) in this experiment is as follows. The PLCC measures the linear correlation between the predicted image quality scores and the subjective quality scores provided by human observers, as detailed in Equation (17) [40].
$$PLCC = \frac{\sum_{i=1}^{n} (qua_i - \overline{qua})(eva_i - \overline{eva})}{\sqrt{\sum_{i=1}^{n} (qua_i - \overline{qua})^2} \sqrt{\sum_{i=1}^{n} (eva_i - \overline{eva})^2}} \quad (17)$$
Among them, $qua_i$ represents the quality label of the i-th test image, $\overline{qua}$ represents the mean of the quality labels of all test images, $eva_i$ represents the objective quality evaluation score of the i-th test image, $\overline{eva}$ represents the mean of the objective quality evaluation scores of all test images, and $n$ represents the total number of test images. The closer the value of the PLCC is to 1, the stronger the linear relationship between the predicted score and the subjective score.

4.1.2. SRCC

The calculation method for the Spearman rank correlation coefficient (SRCC) in this experiment is as follows. SRCC measures the monotonic relationship between predicted image quality scores and subjective quality scores by comparing their ranks, as provided in Equation (18) [41].
$$SRCC = 1 - \frac{6 \sum_{i=1}^{n} (N_{qua_i} - N_{eva_i})^2}{n(n^2 - 1)} \quad (18)$$
Among them, $N_{qua_i}$ represents the rank of the i-th test image after the quality labels of all test images are sorted from smallest to largest, and $N_{eva_i}$ represents the rank of the i-th test image after the objective quality evaluation scores of all test images are sorted from smallest to largest. The SRCC reflects the ability of the algorithm to maintain the order of scores by calculating the rank difference between the predicted score and the actual score. An SRCC value of 1 indicates that the predicted scores have exactly the same order as the subjective scores, which effectively evaluates the prediction monotonicity of the algorithm in IQA.

4.1.3. KRCC

The calculation method for the Kendall rank correlation coefficient (KRCC) in this experiment is as follows. KRCC evaluates the concordance between the rankings of predicted image quality scores and subjective quality scores, as detailed in Equation (19) [42].
$$KRCC = \frac{2(C_{pairs} - D_{pairs})}{n(n - 1)} \quad (19)$$
Among them, $C_{pairs}$ represents the number of concordant pairs between the sorted test image quality labels and the objective quality evaluation scores, and $D_{pairs}$ represents the number of discordant pairs. The KRCC determines whether the ranking relationships of two variables are similar by comparing the ranking consistency between sample pairs. A KRCC value of 1 indicates completely consistent rankings.

4.1.4. RMSE

The calculation method for the root mean square error (RMSE) in this experiment is as follows. RMSE measures the average magnitude of the differences between predicted image quality scores and subjective quality scores, providing an indication of the prediction accuracy, as outlined in Equation (20) [43].
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (qua_i - eva_i)^2}{n}} \quad (20)$$
The RMSE reflects the degree of deviation of the predicted value by calculating the error between the predicted value and the actual value. The smaller the RMSE value, the closer the predicted result of the algorithm is to the actual result, and the better the prediction consistency.

4.1.5. MAE

The calculation method for the mean absolute error (MAE) in this experiment is as follows. MAE measures the average magnitude of the errors between predicted image quality scores and subjective quality scores, reflecting the accuracy of the predictions, as provided in Equation (21) [44].
$$MAE = \frac{\sum_{i=1}^{n} \left| qua_i - eva_i \right|}{n} \quad (21)$$
The MAE measures the prediction consistency of the algorithm by calculating the average of the absolute errors between the algorithm's predicted values and the actual values. A lower MAE value indicates that the algorithm has higher prediction consistency and accuracy.
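For reference, all five indicators of Equations (17)-(21) can be computed with standard SciPy/NumPy routines; this sketch assumes the rank-based coefficients are evaluated via SciPy's built-in implementations rather than the explicit rank formulas above, which give the same values in the absence of ties.

```python
import numpy as np
from scipy import stats

def evaluate_metrics(qua, eva):
    """qua: subjective quality labels; eva: objective scores from an IQA model."""
    qua, eva = np.asarray(qua, float), np.asarray(eva, float)
    n = len(qua)
    plcc = stats.pearsonr(qua, eva)[0]                 # Equation (17)
    srcc = stats.spearmanr(qua, eva)[0]                # Equation (18)
    krcc = stats.kendalltau(qua, eva)[0]               # Equation (19)
    rmse = np.sqrt(((qua - eva) ** 2).sum() / n)       # Equation (20)
    mae = np.abs(qua - eva).sum() / n                  # Equation (21)
    return dict(PLCC=plcc, SRCC=srcc, KRCC=krcc, RMSE=rmse, MAE=mae)
```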
The PLCC, the RMSE, and the MAE are all influenced by the range of objective quality evaluation scores. Since different image quality assessment algorithms produce objective quality scores with varying ranges, it becomes challenging to directly compare these metrics across different algorithms. This variation in score ranges can lead to misleading conclusions if the scores are not normalized to a common scale.
To address this issue, this paper employs a five-parameter logistic regression equation for nonlinear mapping [45]. This approach introduces five parameters to transform the objective quality scores from each algorithm into a unified range. Through the application of this transformation, the scores are standardized, allowing for a direct and meaningful comparison of performance metrics between different algorithms. The normalization process helps eliminate discrepancies caused by varying score ranges and ensures that the evaluation indicators are comparable across algorithms. The specific logistic regression formula used for this transformation is presented in Equation (22).
$$S_{qua} = \alpha_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\alpha_2 (S_{qua}^{o} - \alpha_3)}} \right) + \alpha_4 \cdot S_{qua}^{o} + \alpha_5 \quad (22)$$
Among them, $S_{qua}$ represents the quality score after mapping, which is used to verify the algorithm's performance; $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ represent the five parameters that need to be introduced; and $S_{qua}^{o}$ represents the objective quality evaluation score.
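A sketch of fitting and applying the five-parameter mapping of Equation (22) is given below; the initial parameter guesses are heuristic assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic5(s, a1, a2, a3, a4, a5):
    """Equation (22): five-parameter logistic mapping of a raw score s."""
    return a1 * (0.5 - 1.0 / (1.0 + np.exp(a2 * (s - a3)))) + a4 * s + a5

def map_scores(objective, subjective):
    """Fit the five parameters against subjective labels, then map the raw
    objective scores onto the common scale."""
    objective = np.asarray(objective, float)
    p0 = [np.max(subjective), 1.0, np.mean(objective), 1.0, 0.0]  # heuristic
    params, _ = curve_fit(logistic5, objective, subjective, p0=p0, maxfev=10000)
    return logistic5(objective, *params)
```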

4.2. Overall Performance Comparison

This experiment tests the overall performance of this algorithm and various comparison algorithms on the Landscape-Dataset. Table 1 summarizes the evaluation results, covering the scores of various evaluation indicators.
In order to clearly show the comparison results of the various algorithms, the best result for each evaluation indicator is shown in boldface. The experimental results show that this algorithm can effectively measure the quality changes of natural scene images and that its evaluation performance is stable, with strong generalization ability.
In our experiments conducted on the Landscape-Dataset, we performed a comparative analysis of the proposed IQA-SSSP algorithm against several established benchmark algorithms, including ECGQA, DLBF, FDD, CELL, PDAA, and CFFA. This comparison was designed to assess the performance of IQA-SSSP across a range of evaluation metrics. The results revealed that the CFFA algorithm achieved exceptional performance in terms of the Pearson linear correlation coefficient (PLCC) and Kendall’s rank correlation coefficient (KRCC). This strong performance can be attributed to CFFA’s extensive training with a large and diverse dataset of natural images, which has optimized its capability of handling such datasets effectively. However, the IQA-SSSP algorithm demonstrated superior performance across all other metrics evaluated. This includes metrics beyond PLCC and KRCC, highlighting its ability to deliver consistent and reliable assessments of image quality across various dimensions. The results suggest that IQA-SSSP provides a more balanced and robust approach compared to the other algorithms in the study. Its overall effectiveness and reliability in image quality assessment are underscored by its performance across multiple metrics, making it a promising tool for comprehensive image quality evaluations.
In order to intuitively demonstrate the overall performance of this algorithm and the comparison algorithms on the dataset, this paper presents scatter plots of objective evaluation values against subjective evaluation values. These scatter plots intuitively reflect the performance of different algorithms in image quality evaluation. In each scatter plot, every point represents an image in the database; the horizontal axis represents the normalized objective quality evaluation value, and the vertical axis represents the normalized subjective quality evaluation value, as shown in Figure 7.
The red line in the figure represents the logistic function fitting curve, which is used to assess the relationship between objective and subjective evaluation values. Better monotonicity of the fitting curve indicates higher ranking consistency, meaning that the rankings produced by the algorithm align well with subjective quality judgments. Additionally, when the scatter points lie close to the fitting curve, the algorithm's quality evaluation is more accurate.
The experimental results reveal that the points in the scatter plot for the IQA-SSSP algorithm are highly concentrated and predominantly lie near the fitting curve. This suggests that the objective evaluation values provided by the IQA-SSSP algorithm closely match the subjective perception of image quality, indicating that the algorithm effectively reflects human visual system responses. Moreover, the strong concentration of points in the scatter plot demonstrates that the IQA-SSSP algorithm exhibits stable performance across the Landscape-Dataset, further confirming its reliability and robustness in predicting image quality.

4.3. Image Feature Comparison

This experiment verified the influence of image features and some parameters on the performance of IQA-SSSP, and it combined sparse structure similarity and brightness similarity to jointly evaluate the quality of distorted images.

4.3.1. Based on Similarity Comparison

In order to test the impact of the two image similarities on IQA-SSSP performance, this experiment used sparse structure similarity and brightness similarity to measure image quality. The method using only sparse structure similarity is denoted as IQA-SSSP (S), and the method using only brightness similarity is denoted as IQA-SSSP (L). The experimental results compared the performance differences of the three methods in image quality evaluation so that the contribution of the two similarity features in the comprehensive evaluation could be analyzed. Table 2 shows the performance comparison results.
From the above table, the following conclusions can be drawn: Firstly, the method utilizing sparse structural similarity alone demonstrated better performance compared to the method using brightness similarity alone. This suggests that the sparse structural similarity feature proposed in this paper effectively captures changes in image quality, with notable advantages in detailing image structure and preserving intricate features. Secondly, the incorporation of brightness similarity proves to be complementary to sparse structural similarity, enhancing its overall performance. The algorithm combining both features achieves the highest evaluation performance, indicating that brightness similarity successfully compensates for the limitations of sparse structural similarity in handling brightness variations. This combination provides a more robust and comprehensive assessment of image quality by addressing both structural and brightness-related aspects.
In the similarity comparison experiment, the proposed IQA-SSSP algorithm demonstrated superior performance across several key metrics, including PLCC, SRCC, KRCC, MAE, and RMSE, when compared to both IQA-SSSP(S) and IQA-SSSP(L). Notably, the IQA-SSSP algorithm excelled particularly in the KRCC metric, underscoring its exceptional performance in assessing structural similarity.
This standout performance in KRCC indicates that the IQA-SSSP algorithm is highly effective at capturing and reflecting structural similarities between images. The algorithm’s ability to outperform its counterparts in various evaluation metrics highlights its overall robustness and accuracy. The significant improvement in KRCC demonstrates that the proposed method provides a more precise and reliable measure of image quality, aligning closely with human visual perception and setting a new standard for structural similarity assessment.

4.3.2. Based on Image Block Comparison

The image block size is an important parameter that affects the performance of the sparse representation model. In order to verify the impact of the image block size on the performance of this algorithm, this section designed multiple sets of comparative experiments, gradually increasing the image block size from 4 × 4 to 12 × 12 and recording and analyzing the performance changes of the algorithm under different image block sizes. The experimental results are shown in Table 3.
From the table, it is evident that the algorithm performs optimally when the image block size is set to 8 × 8. This indicates that an 8 × 8 block size provides the most effective balance for capturing image quality within the sparse representation model. The results demonstrate that this block size offers the best capability for describing image quality by efficiently capturing sufficient structural details and maintaining a manageable level of computational complexity. In contrast, when the image block size is either smaller or larger than 8 × 8, the algorithm’s performance gradually declines. Smaller blocks fail to capture adequate structural information, leading to a less accurate depiction of image quality. On the other hand, larger blocks increase computational complexity and may introduce redundant information, which negatively impacts performance. Therefore, the chosen block size of 8 × 8 is optimal, as it achieves the best trade-off between descriptive accuracy and computational efficiency, ensuring both effective image quality assessment and practical processing requirements.
In the experiment analyzing image block size, the 8 × 8 image block configuration proposed in this paper proved to be the most effective. This block size strikes an optimal balance, avoiding both data redundancy and a loss of precision. Specifically, smaller blocks such as 4 × 4 tend to introduce redundant data, while larger blocks like 12 × 12 may result in decreased accuracy due to the larger area being analyzed. The 8 × 8 block size minimizes redundant information and maintains high data precision, thereby enhancing the overall performance of the image quality assessment. The experimental results support the choice of this block size, as it effectively captures the relevant details without compromising the accuracy of the evaluation. Consequently, 8 × 8 has been established as the optimal block size for the image quality assessment experiments conducted in this study.

4.3.3. Comparison Based on High-Salience Image Thresholds

To investigate the impact of the threshold used for selecting high-saliency image blocks on algorithm performance, this experiment varied the threshold T by setting it to the 20–80% percentiles of the variance values of all image blocks, sorted from largest to smallest. This approach allows for a comprehensive evaluation of how different threshold settings affect the accuracy and stability of the image quality assessment.
The experimental results, as shown in Table 4, revealed that the algorithm performs optimally when the threshold T is set to the 60% percentile of the variance values. This choice means that the first 60% of image blocks with the highest variance are used for image quality evaluation. This setting strikes an effective balance between computational complexity and evaluation accuracy. Using a threshold that includes the top 60% of high-variance image blocks ensures that the algorithm focuses on the most informative regions of the image while avoiding the excessive computational burden and potential inaccuracies associated with including too many low-variance blocks. Consequently, setting the threshold T to the 60% percentile of variance values provides a practical compromise that enhances both the performance and efficiency of the algorithm, making it suitable for real-world applications.

4.4. Algorithm Efficiency Comparison

The algorithm running time is an important indicator for measuring the practicality of the image quality evaluation algorithm. This experiment compared the running time of this algorithm with that of the comparison algorithm, and the experimental results are shown in Table 5.
The experimental results demonstrate that this algorithm achieved a shorter running time compared to all other algorithms, with the exception of ECGQA, while still maintaining high evaluation performance. This improved efficiency can be largely attributed to the algorithm’s effective optimization strategies in both sparse representation and brightness similarity calculations. By carefully refining and optimizing these processes, the algorithm reduces computational overhead and accelerates processing times without compromising the quality of the evaluation. The optimization strategies include streamlined procedures for calculating sparse representations and brightness similarities, which minimize redundant computations and enhance the overall processing speed. These refinements allow the algorithm to handle large datasets and complex image quality assessments more efficiently. As a result, the algorithm provides a practical balance between rapid execution and robust performance, making it a valuable tool for real-world applications where both speed and accuracy are critical.
Furthermore, the algorithm reduces computational demands by selectively evaluating high-saliency image blocks. This targeted approach ensures that only the most relevant image regions are processed, minimizing unnecessary calculations and further streamlining the evaluation process. The combined effect of these optimizations not only accelerates the algorithm’s performance but also maintains its accuracy, making it a competitive choice for practical image quality assessment tasks.

5. Conclusions

The innovation of the proposed IQA-SSSP algorithm lies in evaluating the quality gap between distorted and reference images through a sparse structure and subjective perception: physical differences between images are translated into a perceptual quality gap by mapping the input visual stimulus (the image) to the visual cortex response signal (the sparse representation coefficients). Traditional sparse-representation-based quality assessment methods focus solely on the amplitude of the sparse coefficients and neglect their structural context. IQA-SSSP addresses this gap with the concept of sparse structure similarity, complemented by an image brightness similarity term that compensates for the sparse structure's insensitivity to brightness variations. A simplified sketch of these two components is given below.
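The sketch assumes that sparse structure similarity compares which dictionary atoms are active (the supports of the sparse codes) and that brightness similarity is an SSIM-style luminance term; the exact formulas used by IQA-SSSP are defined earlier in the paper, and these helpers are illustrative only.

```python
# Hedged sketch of the two components discussed above, assuming a
# support-overlap reading of sparse structure similarity and an SSIM-style
# luminance term for brightness; not the paper's exact formulas.
import numpy as np

def sparse_structure_similarity(alpha_ref: np.ndarray, alpha_dist: np.ndarray) -> float:
    """Jaccard-style overlap of the active supports of two sparse codes."""
    s_ref, s_dist = alpha_ref != 0, alpha_dist != 0
    union = np.logical_or(s_ref, s_dist).sum()
    if union == 0:
        return 1.0  # both codes empty: identical (trivial) structure
    return np.logical_and(s_ref, s_dist).sum() / union

def brightness_similarity(block_ref: np.ndarray, block_dist: np.ndarray, c: float = 1e-3) -> float:
    """SSIM-style luminance comparison on mean block intensities."""
    mu_r, mu_d = block_ref.mean(), block_dist.mean()
    return (2 * mu_r * mu_d + c) / (mu_r**2 + mu_d**2 + c)
```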
The experimental results strongly support this approach. IQA-SSSP consistently outperformed the other evaluated methods on the PLCC, SRCC, KRCC, MAE, and RMSE metrics, demonstrating a balanced and robust assessment that aligns closely with human subjective perception. The results indicate that the algorithm captures both the structural and the brightness-related aspects of image quality, while its lower computational complexity relative to other mainstream evaluation algorithms underscores its practical efficiency. Overall, IQA-SSSP advances image quality evaluation by effectively combining sparse structure and brightness similarity, achieving both high accuracy and computational efficiency. For reference, the five agreement metrics can be computed as in the sketch below.
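In this sketch, objective_scores and mos are hypothetical arrays of predicted quality scores and subjective mean opinion scores; in practice, PLCC, MAE, and RMSE are often computed after a nonlinear regression step, which is omitted here for brevity.

```python
# Sketch of the five metrics reported in Tables 1-4; the regression step
# commonly applied before PLCC/MAE/RMSE is omitted.
import numpy as np
from scipy import stats

def iqa_metrics(objective_scores, mos):
    x = np.asarray(objective_scores, dtype=float)
    y = np.asarray(mos, dtype=float)
    plcc, _ = stats.pearsonr(x, y)
    srcc, _ = stats.spearmanr(x, y)
    krcc, _ = stats.kendalltau(x, y)
    err = x - y
    return {"PLCC": plcc, "SRCC": srcc, "KRCC": krcc,
            "MAE": float(np.abs(err).mean()),
            "RMSE": float(np.sqrt((err ** 2).mean()))}
```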

Author Contributions

Conceptualization, Y.Y. and D.Y.; methodology, Y.Y.; software, C.L.; validation, Y.Y. and H.W.; formal analysis, C.L.; investigation, D.Y.; resources, H.W.; data curation, C.L.; writing—original draft preparation, Y.Y.; writing—review and editing, D.Y.; visualization, Y.Y.; supervision, D.Y.; project administration, D.Y.; funding acquisition, D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Fund of China (grant no. 22BSH025), the National Natural Science Foundation of China (grant no. 62206241), the Key Research and Development Program of Zhejiang Province, China (grant no. 2021C03138), and the Medium- and Long-Term Science and Technology Plan for Radio, Television, and Online Audiovisuals (grant no. 2022AD0400).

Data Availability Statement

This paper uses the public dataset Landscape-Dataset; all readers can access it for free on GitHub, https://github.com/koishi70/Landscape-Dataset, accessed on 10 January 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IQA        Image quality assessment
IQA-SSSP   Distorted-image quality assessment algorithm based on a sparse structure and subjective perception
UDC        Under-display camera
PLCC       Pearson linear correlation coefficient
SRCC       Spearman rank correlation coefficient
KRCC       Kendall rank correlation coefficient
MAE        Mean absolute error
RMSE       Root mean square error

References

1. Xian, W.; Chen, B.; Fang, B.; Guo, K.; Liu, J.; Shi, Y.; Wei, X. Effects of Different Full-Reference Quality Assessment Metrics in End-to-End Deep Video Coding. Electronics 2023, 12, 3036.
2. Meynet, G.; Nehmé, Y.; Digne, J.; Lavoué, G. PCQM: A full-reference quality metric for colored 3D point clouds. In Proceedings of the 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), Athlone, Ireland, 26–28 May 2020; Volume 1, pp. 1–6.
3. Duanmu, Z.; Liu, W.; Wang, Z.; Wang, Z. Quantifying visual image quality: A Bayesian view. Annu. Rev. Vis. Sci. 2021, 7, 437–464.
4. Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; Volume 1, pp. 5148–5157.
5. Vinke, L.N.; Bloem, I.M.; Ling, S. Saturating nonlinearities of contrast response in human visual cortex. J. Neurosci. 2022, 42, 1292–1302.
6. Van Der Kooi, C.J.; Stavenga, D.G.; Arikawa, K.; Belušič, G.; Kelber, A. Evolution of insect color vision: From spectral sensitivity to visual ecology. Annu. Rev. Entomol. 2021, 66, 435–461.
7. Ma, R.; Li, T.; Bo, D.; Wu, Q.; An, P. Error sensitivity model based on spatial and temporal features. Multimed. Tools Appl. 2020, 79, 31913–31930.
8. Leszczuk, M.; Janowski, L.; Nawała, J.; Boev, A. Objective video quality assessment method for face recognition tasks. Electronics 2022, 11, 1167.
9. Song, Z.; Yao, J.; Hao, H. Design and implementation of video processing controller for pipeline robot based on embedded machine vision. Neural Comput. Appl. 2022, 34, 2707–2718.
10. Ahmed, S.; Ghosh, K.K.; Bera, S.K.; Schwenker, F.; Sarkar, R. Gray level image contrast enhancement using barnacles mating optimizer. IEEE Access 2020, 8, 169196–169214.
11. Luo, J.; Ren, W.; Wang, T.; Li, C.; Cao, X. Under-display camera image enhancement via cascaded curve estimation. IEEE Trans. Image Process. 2022, 31, 4856–4868.
12. Conde, M.V.; Vasluianu, F.; Vazquez-Corral, J.; Timofte, R. Perceptual image enhancement for smartphone real-time applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; Volume 1, pp. 1848–1858.
13. Chen, Z.; Pawar, K.; Ekanayake, M.; Pain, C.; Zhong, S.; Egan, G.F. Deep learning for image enhancement and correction in magnetic resonance imaging—State-of-the-art and challenges. J. Digit. Imaging 2023, 36, 204–230.
14. Ashraf, R.; Nisha, R.; Shamim, F.; Shams, S. Cutting through the Noise: A Three-Way Comparison of Median, Adaptive Median, and Non-Local Means Filter for MRI Images. Sir Syed Univ. Res. J. Eng. Technol. 2024, 14, 1–6.
15. White, A.L.; Kay, K.N.; Tang, K.A.; Yeatman, J.D. Engaging in word recognition elicits highly specific modulations in visual cortex. Curr. Biol. 2023, 33, 1308–1320.
16. Ayzenberg, V.; Simmons, C.; Behrmann, M. Temporal asymmetries and interactions between dorsal and ventral visual pathways during object recognition. Cereb. Cortex Commun. 2023, 4, 3–15.
17. Liu, F.; Chen, D.; Wang, F.; Li, Z.; Xu, F. Deep learning based single sample face recognition: A survey. Artif. Intell. Rev. 2023, 56, 2723–2748.
18. Pan, Y.; Lan, T.; Xu, C.; Zhang, C.; Feng, Z. Recent advances via convolutional sparse representation model for pixel-level image fusion. Multimed. Tools Appl. 2024, 83, 52899–52930.
19. Deka, B.; Mullah, H.U.; Barman, T.; Datta, S. Joint sparse representation-based single image super-resolution for remote sensing applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2352–2365.
20. Zhang, J.; Tang, J.; Feng, X. Multi Morphological Sparse Regularized Image Super-Resolution Reconstruction Based on Machine Learning Algorithm. IAENG Int. J. Appl. Math. 2023, 53, 1–8.
21. Ismail, I.; Eltoukhy, M.M.; Eltaweel, G. Super-Resolution Based on Curvelet Transform and Sparse Representation. Comput. Syst. Sci. Eng. 2023, 45, 167–181.
22. Zhang, C.; Zhang, Z.; Feng, Z. Image fusion using online convolutional sparse coding. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 13559–13570.
23. Shi, Y.; Xia, W.; Wang, G.; Mou, X. Blind CT image quality assessment using DDPM-derived content and transformer-based evaluator. IEEE Trans. Med. Imaging 2024, 1, 1–15.
24. Lan, X.; Zhou, M.; Xu, X.; Wei, X.; Liao, X.; Pu, H.; Shang, Z. Multilevel feature fusion for end-to-end blind image quality assessment. IEEE Trans. Broadcast. 2023, 69, 801–811.
25. Sarfraz, F.; Arani, E.; Zonooz, B. Sparse coding in a dual memory system for lifelong learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 9714–9722.
26. Gkillas, A.; Ampeliotis, D.; Berberidis, K. Connections between deep equilibrium and sparse representation models with application to hyperspectral image denoising. IEEE Trans. Image Process. 2023, 32, 1513–1528.
27. Kamal, S.T.; Hosny, K.M.; Elgindy, T.M.; Darwish, M.M.; Fouda, M.M. A new image encryption algorithm for grey and color medical images. IEEE Access 2021, 9, 37855–37865.
28. Li, C.; Zhu, D.; Wu, C.; Du, B.; Zhang, L. Global Overcomplete Dictionary-Based Sparse and Nonnegative Collaborative Representation for Hyperspectral Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 1, 1–14.
29. Nguyen, V.L.; Shaker, M.H.; Hüllermeier, E. How to measure uncertainty in uncertainty sampling for active learning. Mach. Learn. 2022, 111, 89–122.
30. Patel, Y.; Appalaraju, S.; Manmatha, R. Saliency driven perceptual image compression. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; Volume 1, pp. 227–236.
31. Zhang, P.; Liu, W.; Zeng, Y.; Lei, Y.; Lu, H. Looking for the detail and context devils: High-resolution salient object detection. IEEE Trans. Image Process. 2021, 30, 3204–3216.
32. Oldenburg, I.A.; Hendricks, W.D.; Handy, G.; Shamardani, K.; Bounds, H.A.; Doiron, B.; Adesnik, H. The logic of recurrent circuits in the primary visual cortex. Nat. Neurosci. 2024, 27, 137–147.
33. Malla, P.P.; Sahu, S.; Alutaibi, A.I. Classification of tumor in brain MR images using deep convolutional neural network and global average pooling. Processes 2023, 11, 679.
34. Liu, G.; Han, X.; Tian, L.; Zhou, W.; Liu, H. ECG quality assessment based on hand-crafted statistics and deep-learned S-transform spectrogram features. Comput. Methods Programs Biomed. 2021, 208, 106269–106277.
35. Mahum, R.; Aladhadh, S. Skin lesion detection using hand-crafted and DL-based features fusion and LSTM. Diagnostics 2022, 12, 2974.
36. Chen, L.; Rottensteiner, F.; Heipke, C. Feature detection and description for image matching: From hand-crafted design to deep learning. Geo-Spat. Inf. Sci. 2021, 24, 58–74.
37. Rasheed, M.T.; Shi, D.; Khan, H. A comprehensive experiment-based review of low-light image enhancement methods and benchmarking low-light image quality assessment. Signal Process. 2023, 204, 108821–108832.
38. Valicharla, S.K.; Li, X.; Greenleaf, J.; Turcotte, R.; Hayes, C.; Park, Y.L. Precision detection and assessment of ash death and decline caused by the emerald ash borer using drones and deep learning. Plants 2023, 12, 798.
39. König, M.; Seeböck, P.; Gerendas, B.S.; Mylonas, G.; Winklhofer, R.; Dimakopoulou, I.; Schmidt-Erfurth, U.M. Quality assessment of colour fundus and fluorescein angiography images using deep learning. Br. J. Ophthalmol. 2024, 108, 98–104.
40. Song, H.Y.; Park, S. An Analysis of Correlation between Personality and Visiting Place using Spearman's Rank Correlation Coefficient. KSII Trans. Internet Inf. Syst. 2020, 14, 1102–1113.
41. Li, H.; Cao, Y.; Su, L. Pythagorean fuzzy multi-criteria decision-making approach based on Spearman rank correlation coefficient. Soft Comput. 2022, 26, 3001–3012.
42. Chen, J.; Yang, M.; Gong, W.; Yu, Y. Multi-neighborhood guided Kendall rank correlation coefficient for feature matching. IEEE Trans. Multimed. 2022, 25, 7113–7127.
43. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 24, 5481–5487.
44. Robeson, S.M. Decomposition of the mean absolute error (MAE) into systematic and unsystematic components. PLoS ONE 2023, 18, e0279774.
45. Li, Z.; Pang, S.; Qu, H.; Lian, W. Logistic regression prediction models and key influencing factors analysis of diabetes based on algorithm design. Neural Comput. Appl. 2023, 35, 25249–25261.
Figure 1. Image quality assessment framework.
Figure 2. Framework of this paper.
Figure 3. Sparse representation of the signal.
Figure 4. Algorithm structure.
Figure 5. Partial training images.
Figure 6. Measuring the similarity of two signals.
Figure 7. Subjective and objective quality scores.
Table 1. Assessment results of all images in the Landscape-Dataset.

Algorithm    PLCC    SRCC    KRCC    MAE      RMSE
ECGQA        0.863   0.817   0.684   10.564   12.786
DLBF         0.901   0.923   0.783   9.478    9.378
FDD          0.887   0.922   0.846   9.736    9.678
CELL         0.921   0.917   0.868   8.746    9.536
PDAA         0.917   0.967   0.894   9.278    8.746
CFFA         0.929   0.936   0.916   8.367    8.947
IQA-SSSP     0.929   0.970   0.901   7.994    8.003
Table 2. Performance comparison of IQA-SSSP(S), IQA-SSSP(L), and IQA-SSSP.

Algorithm      PLCC    SRCC    KRCC    MAE     RMSE
IQA-SSSP(S)    0.917   0.913   0.855   8.378   9.101
IQA-SSSP(L)    0.901   0.923   0.783   9.478   9.378
IQA-SSSP       0.929   0.970   0.901   7.994   8.003
Table 3. The impact of image block size on the IQA-SSSP.

Block Size    PLCC    SRCC    KRCC    MAE     RMSE
4 × 4         0.897   0.943   0.798   8.477   9.342
6 × 6         0.877   0.932   0.748   9.109   9.482
8 × 8         0.929   0.970   0.961   7.994   8.003
10 × 10       0.902   0.963   0.933   8.955   9.400
12 × 12       0.914   0.939   0.946   8.573   9.501
Table 4. The impact of the image block selection threshold T.

Threshold T    PLCC    SRCC    KRCC    MAE      RMSE
20%            0.902   0.939   0.877   9.101    10.110
40%            0.924   0.950   0.895   9.722    9.997
60%            0.929   0.970   0.961   7.994    8.003
80%            0.919   0.922   0.955   10.128   8.962
Table 5. Running time of the algorithms.

Algorithm    Running Time (s)
ECGQA        0.0365
DLBF         0.0762
FDD          0.1763
CELL         0.1688
CFFA         1.6725
PDAA         3.8745
IQA-SSSP     0.0392
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
