Article
Peer-Review Record

Cyclic Learning-Based Lightweight Network for Inverse Tone Mapping

Electronics 2022, 11(15), 2436; https://doi.org/10.3390/electronics11152436
by Jiyun Park 1,2 and Byung Cheol Song 2,*
Submission received: 28 June 2022 / Revised: 30 July 2022 / Accepted: 1 August 2022 / Published: 4 August 2022
(This article belongs to the Topic Computer Vision and Image Processing)

Round 1

Reviewer 1 Report

The article can be accepted in its present form. 

Author Response

Many thanks for your kind review.

Reviewer 2 Report

It is interesting to generate a stack of differently exposed low dynamic range (LDR) images from a single LDR image using the concept of cycle consistency in [14]. However, intrinsic properties of differently exposed LDR images are ignored in the current version. Please consider improving the current version by taking the following comments into consideration:

1) Due to possible color distortion and noise amplification, it is challenging to restore under-exposed and over-exposed regions of an LDR by using the concept of cycle consistency. Unfortunately, there are no details on how the under-exposed and over-exposed regions are addressed in this submission. Please provide more details in the revised version. 

2) Besides generating an HDR image as in this paper, a stack of multiple differently exposed images can also be fused together as in the following paper:

Zheng et al., Single image brightening via multi-scale exposure fusion with hybrid learning, IEEE Trans. on Circuits and Systems for Video Technology, 2021.

Please highlight the pros and cons of the proposed algorithm with respect to the algorithm in the above paper.

3) The proposed algorithm can be extended to study the exposure interpolation problem which was studied in the following paper:

Yang et al., Multi-scale fusion of two large-exposure-ratio images, IEEE Signal Processing Letters,  2018.

Please discuss the application in the future works. 
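
Comment 1 can be illustrated with a minimal sketch. It is not the authors' networks: `ev_up` and `ev_down` are hypothetical toy operators on linear pixel values in [0, 1], chosen only to show why a cycle (raise the exposure, then lower it again) is easy to satisfy for well-exposed pixels and lossy once clipping occurs.

```python
def ev_up(pixel):
    """Toy +1 EV operator (assumption): double the linear value, clip at 1.0."""
    return min(1.0, 2.0 * pixel)


def ev_down(pixel):
    """Toy -1 EV operator (assumption): halve the linear value."""
    return 0.5 * pixel


def cycle_l1(pixels):
    """Mean absolute error between each pixel and its up-then-down round trip."""
    return sum(abs(ev_down(ev_up(p)) - p) for p in pixels) / len(pixels)


well_exposed = [0.1, 0.25, 0.4]    # stays below the clipping point after +1 EV
near_saturated = [0.6, 0.8, 1.0]   # clips to 1.0 after +1 EV

print(cycle_l1(well_exposed))      # round trip is exact for these values
print(cycle_l1(near_saturated))    # clipping makes the round trip lossy
```

In this toy setting the cycle error is nonzero only where information was destroyed by saturation, which is the region the comment asks the authors to explain.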

Author Response

Many thanks for the reviewer's kind comments.

I enclose the responses to the reviewer's comments. 

Author Response File: Author Response.pdf

Reviewer 3 Report

 

  1. The abstract section is missing more details regarding the proposed method and novelties. E.g., “The proposed method enables very efficient learning in terms of memory.”, how was this achieved?
  2. What exactly is the idea behind the “light-weight iTM network”? Please highlight the general concept proposed here.
  3. Please avoid the use of “[…] in spite of […]”.
  4. “HDR-VDP Q score”, the abstract section is not the correct place to introduce a metric. Please highlight why the achieved results are better.
  5. I would recommend a total revision of the abstract section. Please clearly highlight the proposed novelties and why this manuscript achieves better results.
  6. Keywords section: please use a comma (“,”) to separate the keywords. “light-weight”? Missing a noun?
  7. The introduction section is missing an introduction. The way it is written now, it looks more like a literature review section. Please add one or two paragraphs where the authors first explain the problem and why it is important to research a solution. Please provide a motivation for this manuscript.
  8. [25-29], [19-21], then [1-7]. The authors propose a very strange order for the list of references. Always add a space between the last word and the added citation (“proposed[25-29]”).
  9. Usually the abbreviations introduced in the abstract section must be reintroduced in the introduction section.
  10. “[…] than conventional massive-sized networks.” When using such an expression, the authors must also first demonstrate that the SOTA methods use deep neural networks. Also, please define the concept of a “massive-size” network.
  11. “The proposed light-weight network has only a tiny size of 1/100 compared to the SOTA network.” This is not a good example of professional academic writing. Please introduce the concept of “tiny size”. Which SOTA network, more exactly?
  12. “[…] without unnecessary waste of memory.” Please first demonstrate that there is a waste of memory use before claiming this.
  13. The two bullets in the main contributions are not sufficient. Please clarify what exactly is the novelty introduced in this manuscript. Please don’t use general statements.
  14. “Section 2 describes the related work to understand this paper.” That is not the purpose of the related work section. A background section can be used to understand the problem, but the purpose of the related work section is to prove that the authors have reviewed all the state-of-the-art methods in this domain and now propose a new method based on their observations; a method different from the state-of-the-art and based on novel ideas. Please clarify what the novel ideas proposed in this manuscript are!
  15. “[…] proves the superiority of the proposed method […]”. The authors tend to use very pompous statements.
  16. “SingleHDR [7] dissected the LDR generation process”. Please explain the “dissection” process (?). Again, there is a general problem of writing a professional academic research manuscript.
  17. “However, the previous indirect iTM methods did not achieve satisfactory HDR reconstruction performance in spite of considerable parameter sizes.” Please define satisfactory HDR reconstruction performance. Please avoid using “in spite of”.
  18. Section 3 is ambiguous. Please use an introduction to the section to explain the main concepts of the proposed method and then explain each concept in turn. The section is missing an explanation of why a cyclic learning-based solution must be employed and not a different approach.
  19. Line 151-161. Maybe a mathematical explanation might clarify the design decisions. It feels like there is no logical connection between the ideas.
  20. Figure 3 caption: please define the “phenomenon that can occur during re-estimation”. What exactly should one notice in the image? What exactly do the zoom-in images on the right show? The authors are missing some explanations.
  21. Figure 4 is blurry.
  22. Lines 169-173, please define all abbreviations before usage.
  23. Section 3.2. The removal of the weight normalization layer cannot be considered as a contribution. Indeed, its removal will heavily reduce the training and inference runtime; however, the authors should already know the importance of using such layers. Please provide a detailed argument on why the removal of this layer will not affect the network convergence and how the proposed architecture avoids over-fitting.
  24. Section 4.1, please provide more details regarding the training dataset. Please provide a short explanation on why the beta_2 parameters were modified compared with the default specifications.
  25. “Batch size and epoch were set to 2 and 10, respectively.” So, the batch size is set to 2, but what does “epoch = 10” mean? The number of training epochs? How many images were used for training, validation, and testing?
  26. In Table 1, the numerical values of PSNR show that the results in general are very bad. Such low values prove that there is quite a lot of distortion present in the images. Please provide an argumentation regarding this issue.
  27. Figure 6. In general, the best two results are shown next to each other. It’s not clear to me whether the color-coded rectangle presentation is a better way of presenting the visual results. One can simply add the reference names again under each rectangle. The role of such an image is to show zoom-in areas of an image where one can clearly notice some differences. Since the rectangles are just slightly enlarged, it’s hard to notice which method provides better results. Please also provide the numerical results for these images.
  28. Similar problems in Figure 7: the visual results are not well presented. What exactly is the purpose of the figure?
  29. Table 2 is hard to read. It’s not clear what exactly are the numerical results of the anchor version and of the different variations.
  30. The experimental section is missing a time complexity section. If the architecture is so “tiny”, how fast is it compared with the state-of-the-art designs?
  31. The Appendix B section is a bit long and contains too many visual results, i.e., it represents around 31% of the entire manuscript.
  32. The Reference section is missing a lot of information and is not written according to the journal specifications.

 

General problems:

  1. It’s not clear what is the proposed novelty.
  2. There are many algorithmic details missing.
  3. The manuscript quality must be highly improved. There are many parts that must be further polished or simply rewritten in a professional academic way.
  4. It feels like the article is a bit short and it was expanded with appendix sections.

 

 

Author Response

Many thanks for the kind comments.

I enclose the responses to reviewers' comments. 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Please check your conclusion part carefully.

Author Response

We attached the point-by-point response to the reviewer’s comments. 

Author Response File: Author Response.pdf

Reviewer 3 Report

 

  1.  My comments 1 and 2 were not answered. Please further polish the abstract section.
  2. Answer to my comment 18. The added paragraph remains ambiguous. Please add an explanation of why you need to train 4 networks and use only 2. How does the proposed method manage to train only 2 networks?
  3. Answer to my comment 4. Please revise the entire sentence, not only part of it. (“[…] and as […] and […]).
  4. Answer to my comment 5. Please further revise the abstract section.
  5. Answer to my comment 7. The answer is not enough. The introduction section is missing an introduction. Please revise the section.
  6. My comment 10 was not answered. Please avoid using the expression “massive size”.
  7. Answer to my comment 20. Please revise the caption. “with the most naïve way”? The sub-captions should be “. (left) ” text. “(right)” text.
  8. Answer to my comment 23. Why not try to add in the manuscript the answer?
  9. Answer to my comment 25. Please clarify this in the manuscript. That is why I am raising this issue.
  10. Answer to my comment 26. Please add a few sentences in the manuscript regarding this issue.
  11. My comment 29 was not answered. Please add the difference to the right of the Q score and update the head of the table for that column with “(gain)” or “(difference)”
  12. My comment 30 was not answered. Please add a discussion in the manuscript regarding this issue.
  13. Answer to my comment 31. Please revise the two appendices and avoid using multi-page figures, as they are hard to read. There is still a weak motivation for adding them.

As shown above there are many, many issues that were not properly addressed.

The initial quality of the manuscript was very poor. The current quality must be further improved.

 

Author Response

We attached the point-by-point response to the reviewer’s comments. 

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

The authors modified the manuscript and improved the quality of the presentation. However, there are still a few typos present in the manuscript, e.g., in the punctuation after each equation.

After quite an intensive effort to review this manuscript, I can now recommend its acceptance.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

This paper proposes new cyclic learning based on cycle consistency to improve the performance of multi-EVs stack generation. The proposed method enables very efficient learning in terms of memory. In addition, this paper presents a lightweight iTM network that dramatically reduces the massive network sizes of the existing networks. The proposed lightweight network requires only a small parameter size of 1/100 compared to the state-of-the-art (SOTA) method. The lightweight network contributes to the practical use of iTM. Experimental results show that the proposed method achieves SOTA performance in terms of HDR-VDP Q score, i.e., a representative metric of evaluating HDR reconstruction performance, and is qualitatively comparable to conventional indirect iTM methods. The idea proposed is quite interesting, but I can find some flaws that need to be rectified.

  1. The Introduction section can be elaborated to represent the motivations behind this research.
  2. The organization of the paper is missing and needs to be included at the end of the introduction section.
  3. The Related works section can be materials and methods as it elaborates the significant methods used in the proposed system namely, direct vs indirect and cycle consistency.
  4. Moreover, the related works section needs to explain the major problems identified in the existing state-of-the-art systems; a table can also be given as a summary of related works.
  5. There is no separate proposed architecture section and it should be defined. The existing section 3 Methods need to be refined into a readable form.
  6. The tricks behind the proposed system need to be elaborated with detailed steps or pseudocode. The article should also give a mathematical expression, but I found no such pseudocode in it.
  7. Section 4.1 can be named as an experimental setup.
  8. Why the mentioned references in Table 1 have been taken for comparisons needs to be explained.
  9. More recent and relevant references need to be included.
  10. What are the inferences and significance of Figures 6- 8?
  11. The abstract and conclusion need greater improvement and should highlight the significance of the proposed system with the metrics and future works.
  12. The spelling and grammar need your attention.

Overall, it can be a good article once it undergoes a major revision.

Author Response

I enclose the replies to the kind reviewers' comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

RELATED WORK:

The related work is very brief without citing classic and seminal works. Furthermore, there are many deep learning methods missing.

NTIRE 2021 challenge and papers should be cited.

METHOD:

-The luminance compensation module has a very brief explanation. What is the justification for why it should work?

-The training using cycle for inverse tone mapping has already emerged in the literature. For example in this paper:

https://arxiv.org/abs/2202.05522

TRAINING:

-Training time is not reported. How long does it take to train? How many epochs?

-The hardware used for the experiments is not reported.

-How long does it take at inference time by varying image resolution?

-How were training/evaluation/test sets divided?

EVALUATION:

- It is not clear if the authors used scene-referred or display-referred image values in the evaluation. Display-referred image values should be used because scene-referred values need calibration. Furthermore, the black and the peak values of the monitor should be reported.

-Table 1 should report the first author's name because only the reference makes it difficult for the reader to understand the comparisons.

-The use of a tone mapping operator (Reinhard or Kim+Kautz) for evaluating PSNR and SSIM is not sound because they introduce distortions. This evaluation should be performed using PU encoding:

https://github.com/gfxdisp/pu21/

Please check these papers on how to run a proper evaluation:
https://arxiv.org/abs/2108.08713

https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr-benchmark/

-Visual comparisons are tone mapped making it difficult to judge. The authors should show a slice of information (gamma + exposure) without tone mapping.

-Which set do the shown images come from? Training? Evaluation? Test?

 


MISSING REFERENCES:

Important state of the art is missing; these papers need to be added to the comparisons:

https://marcelsan.github.io/SIGGRAPH2020/

https://github.com/alex04072000/SingleHDR

The seminal work on inverse tone mapping is missing:

@inproceedings{DBLP:conf/graphite/BanterleLDC06,
  author    = {Francesco Banterle and Patrick Ledda and Kurt Debattista and Alan Chalmers},
  editor    = {Y. T. Lee and Siti Mariyam Hj. Shamsuddin and Diego Gutierrez and Norhaida Mohd. Suaib},
  title     = {Inverse tone mapping},
  booktitle = {Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia 2006, Kuala Lumpur, Malaysia, November 29 - December 2, 2006},
  pages     = {349--356},
  publisher = {{ACM}},
  year      = {2006},
  url       = {https://doi.org/10.1145/1174429.1174489},
  doi       = {10.1145/1174429.1174489},
}

Classic methods should be cited, please check/cite references from:

@article{DBLP:journals/cgf/BanterleDAPMLC09,
  author  = {Francesco Banterle and Kurt Debattista and Alessandro Artusi and Sumanta N. Pattanaik and Karol Myszkowski and Patrick Ledda and Alan Chalmers},
  title   = {High Dynamic Range Imaging and Low Dynamic Range Expansion for Generating {HDR} Content},
  journal = {Comput. Graph. Forum},
  volume  = {28},
  number  = {8},
  pages   = {2343--2367},
  year    = {2009},
  url     = {https://doi.org/10.1111/j.1467-8659.2009.01541.x},
  doi     = {10.1111/j.1467-8659.2009.01541.x},
}

 

 

 

Author Response

I enclose the replies to the kind reviewers' comments as a pdf file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

In this round of revision, the authors have addressed all the queries and can be considered for publication.

Author Response

Many thanks again for the reviewer's kind comments. We think the quality of the paper has improved a lot with your help.

Reviewer 2 Report

The changes compared to the previous version are minimal.

The evaluation is still the same as the previous one; which is not acceptable.

The paper does not cite/discuss this paper that proposes a similar idea: https://arxiv.org/abs/2202.05522

Author Response

First of all, many thanks for the reviewer's careful comments. We are sorry that some points of the last revision (especially the evaluation) did not satisfy the reviewer. However, please understand that the authors have done their best to follow the global standard, that is, the evaluation method of papers in their field. 

1. The paper does not cite/discuss this paper that proposes a similar idea: https://arxiv.org/abs/2202.05522.

Answer) In the introduction section of the revised paper, the following paragraph has been added along with the citation of the suggested article, i.e., https://arxiv.org/abs/2202.05522.

“In addition, [30] proposed to generate HDR images using SDR video sequence information. What all the techniques mentioned so far have in common is that information on various EV images is required to generate an HDR image from a single LDR. We agree on the importance of multi-EVs stack generation as in previous studies. However, the network size to create a multi-EVs stack from a single LDR must be a problem to be solved…”

 

2. The use of a tone mapping operator (Reinhard or Kim+Kautz) for evaluating PSNR and SSIM is not sound because they introduce distortions. This evaluation should be performed using PU encoding: https://github.com/gfxdisp/pu21/

Answer) Most of the related papers or works adopt the tone-mapping operator when they use PSNR and SSIM as quantitative evaluation indicators of iTM. So, we are doing this in the same way for a fair comparison with the previous works. Of course, if more papers disclose their codes in the future, we can use PSNR and SSIM through PU encoding, which may be our further work. 
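
The disagreement above is about the domain in which PSNR is computed. The following is only a sketch of that dependence: `gamma_encode` is a simple stand-in encoding introduced here for illustration, and it is neither the Reinhard or Kim and Kautz operators nor PU21 encoding, none of which are reproduced. The `encode` parameter is a hypothetical hook where such an encoding would be plugged in.

```python
import math


def gamma_encode(value, gamma=2.2):
    """Stand-in encoding (assumption): clip to [0, 1], then apply a 1/gamma power curve."""
    return min(1.0, max(0.0, value)) ** (1.0 / gamma)


def psnr(reference, estimate, encode=gamma_encode, peak=1.0):
    """PSNR in dB between two equal-length pixel lists, measured after `encode`."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    squared_errors = [(encode(r) - encode(e)) ** 2 for r, e in zip(reference, estimate)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0.0:
        return math.inf  # identical signals after encoding
    return 10.0 * math.log10(peak ** 2 / mse)


# Made-up linear pixel values, used only to show that the score depends on the encoding.
reference = [0.01, 0.2, 0.9]
estimate = [0.02, 0.2, 0.85]
print(psnr(reference, estimate))                        # gamma-encoded domain
print(psnr(reference, estimate, encode=lambda v: v))    # linear domain
```

The same pair of images yields different PSNR values in the two domains, so scores are only comparable between methods when the encoding is identical.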

 

3. Visual comparisons are tone mapped making it difficult to judge. The authors should show a slice of information (gamma + exposure) without tone mapping.

Answer) Actually, the gamma value used for tone mapping was set to 2.2, and we indicated this in the revised manuscript. Also, in section 4.2, we added the following sentence. 

“…In addition, PSNRs and SSIMs of the images tone mapped(γ=2.2) by [9] and [22] were compared…”

In addition, note that we obtained images having EVs of -3, -2, -1, +1, +2, +3 from the EV0 LDR image, defined as “the image with the most evenly distributed histogram of the images of the stack” in deep chain HDRI [5]. Then, the seven images with exposure values of -3, -2, -1, 0, 1, 2, 3 were merged into an HDR image using Debevec’s method [8] in OpenCV.
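
The merging step can be sketched for a single pixel as follows. This is not the OpenCV implementation used in the response: it assumes a linear camera response (no response-curve recovery), pixel values normalized to [0, 1], and a relative exposure time of 2^EV with respect to EV0. The names `hat_weight` and `merge_pixel` are hypothetical.

```python
import math

EVS = [-3, -2, -1, 0, 1, 2, 3]  # exposure values of the seven-image stack


def hat_weight(z):
    """Triangle weight: zero at the clipped ends (0 and 1), largest at mid-gray."""
    return min(z, 1.0 - z)


def merge_pixel(values, evs=EVS):
    """Weighted log-average radiance for one pixel, assuming a linear response."""
    numerator = 0.0
    denominator = 0.0
    for z, ev in zip(values, evs):
        w = hat_weight(z)
        if w <= 0.0:
            continue  # fully clipped samples carry no radiance information
        # ln(radiance) = ln(pixel value) - ln(relative exposure time), time = 2**ev
        numerator += w * (math.log(z) - ev * math.log(2.0))
        denominator += w
    if denominator == 0.0:
        raise ValueError("pixel is clipped in every exposure")
    return math.exp(numerator / denominator)


def simulate_stack(radiance, evs=EVS):
    """Synthetic stack for testing: scale by 2**ev and clip at 1.0."""
    return [min(1.0, radiance * 2.0 ** ev) for ev in evs]


print(merge_pixel(simulate_stack(0.1)))   # no exposure clips for this radiance
print(merge_pixel(simulate_stack(0.5)))   # the +1, +2, +3 EV samples clip and are skipped
```

Because clipped samples receive zero weight, the estimate relies on the exposures in which the pixel is well exposed, which is why the stack needs both negative and positive EVs.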

Round 3

Reviewer 2 Report

The evaluation is still using PSNR and SSIM in the tone mapped domain. Although some papers use it, this does not mean it is correct. This is WRONG because it introduces a lot of distortions.

PU encoding has been widely available, with source code, in several places since 2008:

https://resources.mpi-inf.mpg.de/hdr/fulldr_extension/
https://github.com/gfxdisp/pu21

Another issue with the paper is that none of the images shown keep the aspect ratio of the original ones.

Regarding visual quality, it is not clear why the method should provide an advantage. This is not clear from Figure 7 (where ground-truth is missing).

Text: it is not clear if images shown in the visual comparisons are from the test set or not.

Author Response

  1. PU encoding issue: Because the quantitative results of many techniques are not disclosed, it is practically difficult to grasp the overall trend through PU encoding. Nevertheless, we downloaded the experimental results of SingleHDR, which have only now become available, and then applied PU encoding. The PSNR and SSIM results were as follows.

 

Method       PSNR (mean/std)    SSIM (mean/std)
Ours         39.13 / 4.50       0.8560 / 0.1294
SingleHDR    36.23 / 6.03       0.7773 / 0.1896

 

2. Aspect ratio & resolution issue:  In the proposed method as well as in existing methods, all images have 8-bit resolution. Also, as in previous papers, there is only a marginal tolerance to the extent that the difference in spatial resolution does not affect the evaluation of HDR performance.

 

3. Ground-truth issue: For the LDR stack included in the HDR-eye dataset, the EV gap mentioned in the paper does not correspond to 1. So, GT in Fig. 7 doesn't exist.

4. Test set issue: All images shown in this paper belong to the test set. 
