Article
Peer-Review Record

Parallel Image Completion with Edge and Color Map

Appl. Sci. 2019, 9(18), 3856; https://doi.org/10.3390/app9183856
by Dan Zhao, Baolong Guo * and Yunyi Yan
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 5 August 2019 / Revised: 25 August 2019 / Accepted: 10 September 2019 / Published: 13 September 2019
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

I have read the paper and I have enjoyed it much more than a standard NN or CNN paper. The authors do not blindly attempt to fill up a network with neurons and layers with no clear understanding of what or why. Here, at least, the authors attempt to create some interpretation of the image, which can further provide insights into the way things work and, eventually, why they don't work. I liked the so-called "edge path" and "color path", so as to feed the final network with more than just the image.

However, the paper also shows some of the problems of (C-)NN papers, which I would like to see corrected. I detail them further below, but they center on the absence of science here. We have the goal, the references, the notation and the experiments. Where is the development here? People seem to have forgotten to do theory and to think over experiments. There is no questioning. No hypothesis to test. No corner-filling questions. Just batch experiments. No surprising findings. Just reporting. All these papers will be gone with the wind as soon as another network with more neurons comes up. And that's going to be next year. Some papers will survive. Take an example: Chen, Ding, Chin, Marculescu, ICDMW 2019. Not my favorite paper. I see a bunch of flaws in it, with all due respect to the authors. But it is a paper that intends to go deep on NNs. To produce knowledge. A paramount example of this is "Deep Neural Networks Are Easily Fooled", by Nguyen, Yosinski and Clune. Those papers will last. Not those simply stating "I tried this and it worked".

This said, I repeat that the paper at least attempts to make something other than a plug-and-play NN, and that is to be appreciated.

Before detailing some comments, I would like to stress that I do not understand why this paper fits the scope of the journal. If it were addressed at specific tasks (e.g. customizing the method for better use in bioimagery, or medical imagery), I would understand why it is submitted here. However, it does not. So, avoiding the big question of whether NNs are science or engineering (a tool to fix), I still hardly consider this Applied Sciences. It is certainly a methodological paper on how to configure an existing tool for a context-generic task. I do understand that answering this question belongs to the editor, but I wanted to stress this point.

- I would like to see the conditions under which the proposed network does not work. We have seen recent papers on precisely that. For example, for image classification we see the images which can fool the network, specifically seeking the cases in which the "confidence" of the labelling is high. I expect the same here. Otherwise, it is a super complex network of weights which works on a list of displayed images.

- These papers have little or no mathematics. This one is above average, especially due to the color and edge representation. But over-expanded maths (as in Eq (1)) is not too good. Those lines could be reduced to "M is a mask with M(x,y)=1 representing a missing pixel". By the way, if "k" is the pixel, the notation "k_x, k_y" is much more intuitive than "x_k, y_k". If it's a matter of personal preference, fine. But it's harder to read.

- Tables such as Table 1 are the problem in NN papers. Ok, there are the configurations. And? I mean, why? What do they mean? It feels like me cooking: I just put in what I think is going to work and repeat forever if successful, because I have no idea about cooking. So why those kernels? Is it robust to changes? Are there configurations which don't work? Is it the fastest you could find with good results? Is it worth using larger operators? Smaller operators, maybe? I never find these answers, nor these questions, in NN papers. By the way, this continues over again in the experimental section.

- I dislike the authors' use of hand-made sub-subsections, as on the pages around 10. A paper should be an article, not a list of items titled by hand-made bold-faced text. Please remove all that text in boldface, and write prose so we can read it at once, not part-by-part.

Author Response

Dear reviewer:

Thank you for your comments on our manuscript entitled "Parallel Image Completion with Edge and Color Map" (Manuscript ID: applsci-578335). These comments are very helpful for revising and improving our paper and provide important guidance for related research. We have studied the comments carefully and made corrections which we hope will meet with your approval. All the revisions are highlighted in blue in the revised manuscript. Our responses to the comments are as follows.

Point 1: I would like to see the conditions under which the proposed network does not work. We've seen recently papers on precisely that. For example, for image classification we see the images which can fool the network, specifically seeking the cases in which  the "confidence" of the labelling is high. I expect the same. Otherwise, it is a  super complex network of weights which work in a list of images displayed. 

Response 1: In experimental Section 4.8, the limitations of PIC-EC are analyzed. We have carefully studied the paper "Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images" and found that deep neural networks are easily fooled by adversarial examples. To explore the impact of such examples on PIC-EC, we generated some adversarial examples by adding Gaussian noise. As Figure 10 suggests, the performance of PIC-EC is easily affected by noise in textured regions, such as the hair and eyes, and the image tones become darker. Since PIC-EC must learn image features from the known region, adversarial examples degrade the quality of the learned features and thus the performance of PIC-EC.
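The Gaussian-noise perturbation described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in the paper; in particular, the noise level `sigma=0.1` and the constant stand-in image are assumptions.

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Perturb a float image in [0, 1] with zero-mean Gaussian noise,
    clipping the result back into the valid range."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Stand-in for a 256x256 RGB test image (mid-gray everywhere).
img = np.full((256, 256, 3), 0.5)
noisy = add_gaussian_noise(img, sigma=0.1)
```

Feeding such perturbed inputs to the completion network probes how sensitive the learned features are to noise in the known region.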

PIC-EC can also be fooled by misleading edge and color priors. The failure cases illustrated in Figure 11 show that PIC-EC cannot handle some unaligned faces well, especially side-face images. In these cases, the edge maps in the missing region are not fully restored, so there is not enough edge prior to construct the structure of the missing region, and the color prior dominates the completion, leading to blurry content; the color maps mainly contain low-frequency information, which is less affected. This issue may be alleviated with 3D data augmentation.

 

Point 2: These papers have little or no mathematics. This one is above average, especially due to the color and edge representation. But over-expanded maths (as in Eq (1)) is not too good. Those lines could be reduced to "M is a mask with M(x,y)=1 representing a missing pixel". By the way, if "k" is the pixel, the notation "k_x, k_y" is much more intuitive than "x_k, y_k". If it's a matter of personal preference, fine. But it's harder to read.

Response 2: The image completion task is a linear inverse problem that tends to be ill-posed. Therefore, it can be formulated as the constrained optimization problem shown in Eq (2) in Section 3.1. However, because of the complexity of natural images, it is difficult to solve this optimization problem directly with deep neural networks and no priors. The image priors serve as regularizations that stabilize the degradation inversion and direct the outcomes toward more plausible results. The image completion problem can therefore be decoupled into three relatively easy sub-problems: the edge map is obtained from the constrained optimization problem of Eq (3), and the color map from Eq (4). These two optimization problems are modeled by the edge and color paths in PIC-EC. The edge and color maps are then fed to the image completion network as priors to solve the constrained optimization problem of Eq (5).

In addition, we have rewritten Eq (1) for notational simplicity, following the reviewer's suggestion.
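As a minimal sketch of the simplified mask notation the reviewer suggested (the hole position and the 128×128 size here are illustrative, borrowed from the experimental setup rather than from Eq (1) itself):

```python
import numpy as np

# M is a binary mask with M[x, y] == 1 marking a missing pixel.
image = np.random.rand(256, 256, 3)
M = np.zeros((256, 256), dtype=np.uint8)
M[64:192, 64:192] = 1                      # a centered 128x128 hole

# The degraded input keeps only the known region; the network
# must fill in the pixels where M == 1.
degraded = image * (1 - M)[..., None]
```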

 

Point 3: Tables as Table 1 are the problem in NN papers. Ok, there are the configurations.  And? I mean, why? What do they mean? It feels like me cooking, I just put what I think it's gonna work and repeat forever if successful. Because I have no idea on cooking.  So why those kernels? Is it robust to changes? Are there configurations which don't work? Is it the fastest you could find with good results? Is it worth using larger operators? Smaller operators maybe? I never find these answers, nor these questions, in the NN papers. By the way, this continues over again in experimental section.

Response 3: In Sections 3.2.1 and 3.2.2, we explain the network configurations. The generator configuration is designed mainly according to the receptive field. We want the receptive field of each neuron in the last layer to be as large as the input size of 256×256, so that the neurons in the last layer can see the entire input image and more information can be used to generate content; this is very important for the image completion problem. The receptive field of the configuration listed in Table 1 is 248×248, which is very close to the input size, and the receptive fields of the neurons inside the generator are also relatively large. We found that the receptive fields of the other configurations we tried were smaller than 248×248. In general, the kernel sizes applied to the larger feature maps should be larger, because the features in those maps tend to be larger. The encoder-decoder architecture reduces memory usage and computation time by decreasing the resolution before further processing the image, and the residual blocks in the generator make it easy for the network to learn the identity function, an appealing property for image completion since the output image should share structure with the input image. Therefore, the generator is designed to fulfill these conditions.
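The receptive-field reasoning above follows the standard recurrence rf ← rf + (k − 1)·jump, jump ← jump·s applied layer by layer. A small sketch; the example layer list is hypothetical and does not reproduce the exact Table 1 configuration or its 248×248 result:

```python
def receptive_field(layers):
    """Receptive field of one last-layer neuron in a conv stack.

    layers: list of (kernel_size, stride) pairs, input-to-output order.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1)*jump
        jump *= s              # stride multiplies the sampling step
    return rf

# Hypothetical encoder + residual blocks + decoder-style stack.
example = [(7, 1), (4, 2), (4, 2)] + [(3, 1)] * 8 + [(4, 1), (4, 1), (7, 1)]
rf = receptive_field(example)  # 128 for this illustrative stack
```

Running such a calculation over candidate configurations is how one checks whether the last-layer neurons can see (nearly) the whole input.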

As shown in Table 2, the discriminator outputs a score map instead of a single score; the receptive field of each neuron in the score map is 70×70, and together the neurons cover the entire input image. In order to reduce the computational cost while maintaining the receptive field, the kernel size is set to 4×4.

In experimental Sections 4.2 and 4.3, we explain the reasons for the parameter settings. Stability is very important during training. In the edge and color models, the weight of the feature matching loss should be set larger than the others, since the feature matching loss stabilizes the training process. The training of the image completion network is more stable because it has more information, including the edge and color priors, guiding it; thus, we set the weight of the perceptual loss to 1.0. All the adversarial weights should initially be set to very small values, because overly large values tend to cause instability in the early stage. As training progresses, the adversarial weights are gradually increased up to 1.0 to recover more details.
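The adversarial-weight warm-up described above can be sketched as a simple linear ramp. Both the linear shape and the `warmup_steps` parameter are illustrative assumptions, not the exact schedule used in the paper:

```python
def adversarial_weight(step, warmup_steps=10000, final=1.0):
    """Ramp the adversarial loss weight linearly from 0 up to `final`,
    keeping it small during the unstable early stage of training."""
    return final * min(step / warmup_steps, 1.0)

# Schematic usage inside a training loop (names are hypothetical):
# total_loss = rec_loss + lambda_fm * fm_loss \
#            + adversarial_weight(step) * adv_loss
```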

 

Point 4: I dislike authors using hand-made sub-subsections, as in pages around 10. A paper should be  an article, not a list of itemizes titled by hand-made bold-faced text. Please remove all that text in boldface. And write prose so we can read it at once, not part-by-part.

Response 4: We have removed all the boldface sub-subsection titles in Section 3.3 and rewritten the text as continuous prose, so that it can be read at once rather than part-by-part.

We have also improved the English language and style and added more background and relevant references.

Reviewer 2 Report

In this paper a novel image completion framework with edge and color priors is presented. It is a well written paper which includes a complete state of the art with conclusions supported by the results. If possible, not only mask size information, but also a comparison of the computation time should be included in the results.

Author Response

Dear reviewer:

Thank you for your comments on our manuscript entitled "Parallel Image Completion with Edge and Color Map" (Manuscript ID: applsci-578335). These comments are very helpful for revising and improving our paper and provide important guidance for related research. We have studied the comments carefully and made corrections which we hope will meet with your approval. All the revisions are highlighted in blue in the revised manuscript. Our responses to the comments are as follows.

Point 1: A comparison of the computation time should be included in the results.

Response 1: In experimental Section 4.6, we compare the average feed-forward inference time of the different methods. All methods were evaluated on a machine with an Intel Xeon E5-2640 v3 CPU and a TITAN X (Pascal) GPU, processing 1000 256×256 images with 128×128 holes. The results are listed in Table 5. Because PIC-EC has to obtain the edge and color priors before the image completion, it takes slightly more computation time.

We have also added more background and relevant references. The English language and style have been checked.

Round 2

Reviewer 1 Report

I'm satisfied with the manuscript in its current version.
