Peer-Review Record

Improved Image Quality Assessment by Utilizing Pre-Trained Architecture Features with Unified Learning Mechanism

Appl. Sci. 2023, 13(4), 2682; https://doi.org/10.3390/app13042682
by Jihyoung Ryu
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 16 January 2023 / Revised: 6 February 2023 / Accepted: 13 February 2023 / Published: 19 February 2023

Round 1

Reviewer 1 Report

To address the No-Reference Image Quality Assessment (NR-IQA) problem, this paper proposes a deep learning method that uses a pre-trained CNN model and a mechanism that extracts local and non-local features of the input patch. The method has some novelty, but the paper needs to be revised in several respects.

 

1. There are several issues with the figures in this paper. The resolution of Figures 2 and 3 is not high enough; the original figures should be replaced with high-resolution versions. In addition, the fonts in the figures should be unified.

 

2. There are some minor formatting issues in the manuscript. For example, in Section 1, line 28, there are two full stops at the end of the sentence.

 

3. In Section 2, Figure 1 is not referenced in the text, which made it difficult for me to identify the categories of the images. The author should use this figure to explain what exactly a distorted image’s category is.

 

4. In the experimental part, the author uses only two figures and one table to present the results. The author should consider how to analyze the experimental results more clearly and comprehensively; in addition, more diagrams would better illustrate the results.

 

5. There are no training details in this paper. The author should provide a more detailed description of the training parameters, such as the batch size, learning rate, and loss function. The author should also describe the experimental environment.

 

6. The proposed method is somewhat novel. Some other methods could be added to the “Introduction” section; I would suggest adding some recent literature, for example:

(1)   https://doi.org/10.1109/TCSVT.2022.3213592

(2)   https://doi.org/10.1109/TETCI.2022.3205384

Author Response

To address the No-Reference Image Quality Assessment (NR-IQA) problem, this paper proposes a deep learning method that uses a pre-trained CNN model and a mechanism that extracts local and non-local features of the input patch. The method has some novelty, but the paper needs to be revised in several respects.

1. There are several issues with the figures in this paper. The resolution of Figures 2 and 3 is not high enough; the original figures should be replaced with high-resolution versions. In addition, the fonts in the figures should be unified.

Response:

Respected Reviewer, we have replaced the figures with high-resolution (600 dpi) versions.

2. There are some minor formatting issues in the manuscript. For example, in Section 1, line 28, there are two full stops at the end of the sentence.

Response:

Respected Reviewer, we have resolved all formatting issues; moreover, we have used a professional proofreading service.

3. In Section 2, Figure 1 is not referenced in the text, which made it difficult for me to identify the categories of the images. The author should use this figure to explain what exactly a distorted image’s category is.

Response:

Respected Reviewer, thank you for identifying this mistake. We have updated the manuscript. Figure 1 shows six sample images drawn from the 25 different distortion types used to prepare the dataset.

4. In the experimental part, the author uses only two figures and one table to present the results. The author should consider how to analyze the experimental results more clearly and comprehensively; in addition, more diagrams would better illustrate the results.

Response:

Respected Reviewer, based on your comment we have added a radar chart for a competitive analysis between the proposed and existing techniques; Figure 4 shows this chart. The proposed architecture exhibits stronger peaks than the existing techniques, which suggests that the spinal network gives a more meaningful representation of the extracted feature maps.

5. There are no training details in this paper. The author should provide a more detailed description of the training parameters, such as the batch size, learning rate, and loss function. The author should also describe the experimental environment.

 

Response:

Respected Reviewer, thank you for identifying this gap. The model is trained for 100 epochs with a batch size of 8 across the training, validation, and test phases. Mean squared error is used as the loss function, and the Adam optimizer with a learning rate of 0.00001 is used. The manuscript has been updated with these details.
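The training configuration described in this response can be sketched as follows. This is an illustrative summary only; the `CONFIG` dictionary and `mse_loss` helper are hypothetical stand-ins, not code from the manuscript, and the actual model was trained with a deep learning framework rather than plain Python.

```python
# Hyperparameters as stated in the response: 100 epochs, batch size 8,
# Adam optimizer with a learning rate of 0.00001, MSE loss.
CONFIG = {
    "epochs": 100,
    "batch_size": 8,
    "optimizer": "adam",
    "learning_rate": 1e-5,
}

def mse_loss(predictions, targets):
    """Mean squared error between predicted and ground-truth quality scores."""
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
```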

 

6. The proposed method is somewhat novel. Some other methods could be added to the “Introduction” section; I would suggest adding some recent literature, for example:

(1)   https://doi.org/10.1109/TCSVT.2022.3213592

(2)   https://doi.org/10.1109/TETCI.2022.3205384

Response:

Respected Reviewer, we have updated the manuscript with the suggested references.

 

 

Reviewer 2 Report

The author aims to improve no-reference image quality assessment by applying a deep learning framework, specifically the Inception-ResNet-V2 architecture, on a popular training dataset. The work looks interesting, since image quality evaluation is indeed sometimes controversial even though it is a classical topic. However, the paper suffers from several issues, and the novelty of this work is limited. First, the presentation style should be improved, and quite a lot of grammar issues/errors should be addressed, e.g., "be expresses as". Second, the motivation for using the Inception-ResNet-V2 architecture rather than others is not fully clear. Also, does this pre-trained architecture introduce a certain bias when assessing image quality? If so, how is this bias avoided? Although KADID-10k is an exceptionally large dataset, it may still miss some types of distortions from real applications. How would this be resolved? In the experiments, the partition of the dataset seems ad hoc and may need more explanation, along with details about how the metrics are implemented for this large dataset. Lastly, it would be great to see comparisons between the proposed assessment tool and traditional image quality evaluation metrics, such as PSNR and SSIM, when the reference image is provided.

Author Response

The author aims to improve no-reference image quality assessment by applying a deep learning framework, specifically the Inception-ResNet-V2 architecture, on a popular training dataset. The work looks interesting, since image quality evaluation is indeed sometimes controversial even though it is a classical topic. However, the paper suffers from several issues, and the novelty of this work is limited. First, the presentation style should be improved, and quite a lot of grammar issues/errors should be addressed, e.g., "be expresses as".

Response:

Respected Reviewer, we have resolved all formatting issues; moreover, we have used a professional proofreading service.

 

Second, the motivation for using the Inception-ResNet-V2 architecture rather than others is not fully clear. Also, does this pre-trained architecture introduce a certain bias when assessing image quality? If so, how is this bias avoided?

Response:

Respected Reviewer, we used the Inception-ResNet-V2 architecture because the baseline paper for KADID-10k used the same architecture. The proposed framework explores the role of the spinal network; adding the spinal network to an existing technique shows how much it improves the results when combined with a specific architecture. Moreover, the pre-trained architecture does not carry a bias toward assessing image quality, because it was pre-trained for object classification, whereas the problem we address is a regression task. The key benefit of the pre-trained network is simply that training on a large dataset has given it weights that produce diverse, useful feature maps.
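The input-splitting idea behind the spinal network mentioned above can be illustrated roughly as follows. This is a simplified sketch of the SpinalNet-style slicing pattern only, not the architecture actually used in the manuscript; the function name and layer count are hypothetical.

```python
def spinal_slices(feature_dim, num_layers=4):
    """Simplified SpinalNet-style slicing: return the (start, end) range of
    the input feature vector consumed by each spinal sub-layer. Sub-layers
    alternate between the two halves of the input; in the real network each
    sub-layer additionally receives the previous sub-layer's output."""
    half = feature_dim // 2
    return [(0, half) if i % 2 == 0 else (half, 2 * half)
            for i in range(num_layers)]
```

For a backbone producing an 8-dimensional feature vector, the four sub-layers would see slices (0, 4), (4, 8), (0, 4), (4, 8) in turn, so every part of the feature map is revisited with gradually accumulated context.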

 

Although KADID-10k is an exceptionally large dataset, it may still miss some types of distortions from real applications. How would this be resolved?

Response:

Respected Reviewer, KADID-10k applies 25 different distortion types, which cover the most common distortions found in real applications. If any other distortion is identified in the future, the model can be trained on that distortion and then used in real time. Furthermore, the strong performance of the proposed model on a dataset containing 25 different distortion types indicates that it should also perform well on distortions beyond those considered in KADID-10k.

 

In the experiments, the partition of the dataset seems ad hoc and may need more explanation, along with details about how the metrics are implemented for this large dataset.

Response:

Respected Reviewer, we have updated the manuscript with more explanation of the experimental procedure used to obtain the results.

 

Lastly, it would be great to see the comparisons between the proposed assessment tool and the traditional image quality evaluation metrics, such as PSNR, SSIM and so on, when the reference image is provided. 

Response:

Respected Reviewer, in this manuscript we address no-reference image quality assessment, whereas metrics such as PSNR and SSIM are full-reference techniques, so a direct comparison would not be fair. However, we have compared our technique with a number of existing no-reference image quality assessment techniques. We have also added a radar chart for a competitive analysis between the proposed and existing techniques; Figure 4 shows this chart. The proposed architecture exhibits stronger peaks than the existing techniques, which suggests that the spinal network gives a more meaningful representation of the extracted feature maps.
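The full-reference vs. no-reference distinction in this response can be made concrete with a minimal PSNR computation. The sketch below is illustrative only (flattened pixel lists rather than real images); its point is that PSNR cannot even be evaluated without the pristine reference image, which is exactly what is unavailable in the NR-IQA setting.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio over flattened pixel sequences.
    Requires the reference image -- unavailable in the NR-IQA setting."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```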

Round 2

Reviewer 2 Report

There are no more comments. 
