Article
Peer-Review Record

Unsupervised Object Segmentation Based on Bi-Partitioning Image Model Integrated with Classification

Electronics 2021, 10(18), 2296; https://doi.org/10.3390/electronics10182296
by Hyun-Tae Choi and Byung-Woo Hong *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 July 2021 / Revised: 6 September 2021 / Accepted: 15 September 2021 / Published: 18 September 2021

Round 1

Reviewer 1 Report

1. The paper requires significant polishing of the language style. Here are examples:

Line 8: “ on the segmentation models(Mumford Shah model and Chan-Vese model) and classification. The first stage(i.e., Mumford-Shah model or Chan-Vese model) is to find a base image mask for classification”

 --> First an OR, then an AND of the two methods? Please decide between or/and!

Line 15: “can segment the input image to the foreground and the background only with the class labels” ?

--> Segment an image with class labels only? Please clarify this!

 

Line 41: “algorithms minimize their objective functions with a curve in a level-set method” ?

-->Please clarify this procedure!

 

Line 56: “We use the two-stage model, which has been mainly used in object detection, to the image segmentation domain.” 

--> … in the image segmentation problem OR … to solve the image segmentation problem.

 

Line 59: “ … a simple loss to get the background, “ 

--> … a loss function for classifier training …

 

2. You could make it clear early on whether your segmentation algorithm takes the form of classifier training applied to a single image, or how you combine unsupervised learning of the pixel-wise classifier with a supervised one. Otherwise, it is difficult to follow your explanation.

3. Using the classifier network as an image segmentation tool requires an iterative minimization of the loss functional used in unsupervised learning, but the classification process is a single-pass process only. How can this contradiction be resolved?

 

4. Line 67: The term “image segmentation” is much wider than your use of it. Image segmentation can detect not only regions but also keypoints, edges, and textures. Your approach concerns image region segmentation for the purpose of foreground/background detection, based on an active contour (level-set method) and pixel classification.

 

5. Subsec. 2.3: the state of the art in deep learning-based semantic image segmentation is the work of Girshick et al., i.e., Fast R-CNN, Faster R-CNN, and Mask R-CNN. You should pay more attention to it than simply citing it. You could also explain the different purpose of your method and the reason why you refer to these semantic segmentation networks.

 

6. Line below eq. (7): "In (6), the first term and the second term divide the input image to the multiple regions."?

--> The result of (6) is an overall loss function, so there are no multiple regions. The function operates on image pixels classified as either inner or outer pixels of a region. The division into multiple regions must be initialized somehow before the minimization of the loss function starts.

 

7. Line over eq. (8): “is the ground-truth labels” ?

--> are …

 

8. Line 179: “… either the Chan-Vese algorithm or the piecewise-smooth Mumford-Shah functional.”

--> Please define what your algorithm exactly corresponds to – the unsupervised training of a classification net with a loss function represented by the given functional, or the supervised training of a classification net with a cross-entropy loss. An algorithm cannot be at the same time an “algorithm” and a “functional”.

 

9. In equation (12), what is log4b? If 4 is the logarithm’s base, use subscript notation. Why should it be 4 and not 2 or 10?

 

10. Line 136: “image attribution problem”?

--> I have not heard about this – you probably mean the ROI detection or feature extraction problems?

 

11. It is unclear why you do not compare your approach with any classic image segmentation method, especially the baseline Chan-Vese method.

 

12. Finally, a weakness of your method is that the initialization of the foreground region is unknown and the number of regions must be known in advance.

 

Author Response

Please see the attachment.

We will upload the revised version addressing your points, if you allow us.

Author Response File: Author Response.pdf

Reviewer 2 Report

Due to the high quality of the papers published in the Electronics journal, I kindly ask the authors to make the following improvements.

The authors should consider answering the following questions to avoid this issue and to identify the “generalization” traits of their study: What new findings does their study provide compared to other similar recent studies? What lessons could be learned from their study? What are the rationale of the study, its limitations, and the future applications of the results? The originality of the paper is not properly presented.

What is original is not clearly demonstrated as a degree of novelty compared to what has already been published in the field. The contribution of the study is fuzzy. Please provide a list that clearly states the actual contributions of the paper (to be introduced in the Introduction chapter).

The methodology chapter can confuse the reader. It seems to present an important contribution of the authors. However, this chapter should be reconfigured with a clear separation of the elements already known from the literature (an appendix may be added) and the original elements brought by the authors.

Figures 3, 4, and 5 do not show the true value of the methodology used: what is represented in these figures can be obtained with any image processing software package. Thus, I consider that the results of this paper are not well described. Chapter 4, “Experimental Results”, does not contain enough discussion. The Discussions and Conclusions chapter also does not present enough discussion and is very short.

Author Response

Please see the attachment.

We will upload the revised version addressing your points, if you allow us.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

  1. The term “two-stage image segmentation” does not properly reflect your approach. It is too general and irrelevant, if not misleading, to the reader. A “two-step” is a technical issue that characterizes neither the problem nor the type of your solution. First, you do machine learning – you propose a deep neural network instead of a classic image segmentation based on level-set theory. Second, you do foreground/background region detection and classification, i.e., a segmentation of an input image into a two-class labeled image. Third, “a two-step image segmentation” says virtually nothing about the approach, but combining a region proposal network for one class (the foreground and the background class regions are dual problems) with an image classification step is a very simplified case of “semantic image segmentation”. Fourth, you do weakly supervised learning, where an unsupervised encoder-decoder network is supported for image labeling by a supervised binary classifier. None of these characteristic features is reflected by the title and the term “two-step image segmentation”.
  2. Other main problems are the limited novelty of your approach and the lack of experimental comparison with similar methods. There is a rich literature on classic and neural approaches to image segmentation. You mentioned neural methods in refs. [24]-[33] and the classic use of the Mumford-Shah or Chan-Vese methods in refs. [5]-[7]. Can you compare the results of some of the nearest methods, as well as classic solutions, on the same datasets as your implementation? In this way, the novel proposed features will be highlighted and verified.
  3. The experimental section should give detailed information about the network implementation, i.e., any image preprocessing, input size, network layers, sizes, types of layers, activation functions, etc. Otherwise, the results will hardly be reproducible.
  4. It is unclear how you obtained the results in the “classifier only” case, as the classifier does not produce a mask – it works like an encoder of the image, generating and classifying image features.
  5. You have adopted the network structure given in ref. [32], fig. 1, but without any explanation of the difference between the learning and segmentation operation modes. Your network structures in figures 1 and 2 represent the learning mode. Explain or draw the structure needed, after learning, for the active image segmentation mode of the network.
  6. Equation (1) can hardly be called the Mumford-Shah functional (piecewise smooth), as the original functional does not consider the size of a region (your second term) in the model but asks for homogeneous regions, i.e., it penalizes the total variability of the model regions (in the piecewise smooth case), while the variability is zero in the piecewise constant model. What you introduce in (1) is the Chan-Vese functional for the binary (two-phase) piecewise constant case, which includes an additional penalization of the model region size (a reference form of this energy is written out after this list). This difference is not clearly stated in your paper.
  7. When introducing the role of phi(x, y), you could explain that it is a distance function (representing a contour) and show how, in particular, it represents a discrete 2D contour.
  8. Line 193, you write: “We use the sigmoid function instead of the Heavy Side function (2), so that the mask has only 0,1 values”? -> H in (2) takes only the values {0, 1}, but sigmoid values come from the dense interval [0.0, 1.0]; see the short sketch after this list.
  9. c1 and c2 in (1) and (3) should not be called region centers, because for a 2D image region this term typically denotes the 2D location of the region’s centroid, not the mean (average) or expected value of the image function over the region (as it should be). Equation (4) defines c1 as the mean value of the image function over the foreground region, while c2 is the mean of the image function over the background region.
  10. It is unclear whether the network weights leading to an optimal loss value, established for a given training dataset, can be valid for other datasets, especially with different lighting conditions. There must be some image preprocessing stage that normalizes global image intensity, or can you show that this problem is solved by the network itself?
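
For reference regarding point 6, a standard form of the two-phase piecewise constant Chan-Vese energy, including the area penalization, is written out below; the paper’s equation (1) may of course weight or arrange the terms differently:

$$
E(c_1, c_2, C) = \mu \,\operatorname{Length}(C) + \nu \,\operatorname{Area}\big(\operatorname{inside}(C)\big)
+ \lambda_1 \int_{\operatorname{inside}(C)} \lvert u_0(x,y) - c_1 \rvert^2 \, dx\, dy
+ \lambda_2 \int_{\operatorname{outside}(C)} \lvert u_0(x,y) - c_2 \rvert^2 \, dx\, dy
$$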
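To illustrate points 8 and 9, here is a minimal NumPy sketch (the array names u0 and phi and the steepness parameter k are hypothetical, not taken from the paper) showing that a sigmoid yields a soft mask with values in the dense interval (0, 1) rather than the hard {0, 1} of the Heaviside function, and that c1 and c2 are then mask-weighted mean intensities of the image, not geometric centers:

```python
import numpy as np

def heaviside(phi):
    # Hard Heaviside: output values are exactly 0 or 1, non-differentiable at 0.
    return (phi >= 0).astype(float)

def sigmoid(phi, k=1.0):
    # Smooth relaxation: output fills the dense interval (0, 1) and is differentiable.
    return 1.0 / (1.0 + np.exp(-k * phi))

# u0: a grayscale image; phi: a level-set (or network) output of the same shape.
rng = np.random.default_rng(0)
u0 = rng.random((64, 64))
phi = rng.standard_normal((64, 64))

hard = heaviside(phi)   # only the values {0, 1} appear here
m = sigmoid(phi)        # soft foreground mask with values in (0, 1)

# c1, c2 as mask-weighted mean image intensities (not region centroids).
c1 = (m * u0).sum() / (m.sum() + 1e-8)              # mean intensity of the foreground
c2 = ((1 - m) * u0).sum() / ((1 - m).sum() + 1e-8)  # mean intensity of the background
print(np.unique(hard), c1, c2)
```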

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Thanks to the authors for the cover letter!

The authors wrote to me very decisively and some sentences from the response letter could be improved and introduced into the paper to clarify certain aspects.

Further, the original aspects are not clearly shown.

The authors wrote to me that “it is difficult to compare our results to other weakly-supervised segmentation” and that they use the “Chan-Vese and Mumford-Shah to make the boundary precisely”. The results and discussions of this work are not clearly related to other similar studies. I ask the authors to highlight their results by comparison with studies that use the Chan-Vese and Mumford-Shah models, such as this one: https://journals.sagepub.com/doi/pdf/10.1177/1748301816668025. This study is not mine; several can be found with a simple search in the scientific databases.

I apologize to the authors if I was not clear the first time, but I ask them to bring discussions of their method and results into comparison with other recent studies or models. I understand that the authors present new data, but it is still not clear to me how this study contributes to the area. Perhaps it would be better for the authors to re-organize this work and re-submit it as a review paper, where the results presented here could be exposed and correlated.

The article has been slightly improved, but I still do not think it is ready to be accepted for publication.

The language style used in some places is not scientific.

I kindly ask the authors to avoid the repetitive word "so".

Please avoid the word “try”.

Line 82: if the classical variational methods try to solve the problem with clustering and yet do not solve it, I kindly ask the authors to provide the motivation here too, even if it is repeated.

Figures 3, 4, and 5 are described in the text in the order 3, 5, 4. Please resolve this.

I kindly ask the authors to provide more discussion of figure 4, as they did for figures 3 and 5.

The figures must be moved before the Discussions and Conclusions chapter because their discussion is in chapter 4.

Regarding eq. (1), how do the constants c1 and c2 depend on C? I understand that “the third and fourth terms are used to control the difference between the piece-wise constant model’s result and the input image u0”, but I observe that c1 and c2 are calculated from eqs. (4) and (7). Please clarify this with a short discussion; a reference form of this dependence is sketched below.
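
For reference, in the standard Chan-Vese formulation (which the paper’s eqs. (4) and (7) presumably follow), c1 and c2 depend on C as the mean intensities of u0 inside and outside the contour, so they are re-estimated whenever C changes:

$$
c_1(C) = \frac{\int_{\operatorname{inside}(C)} u_0(x,y)\, dx\, dy}{\int_{\operatorname{inside}(C)} dx\, dy},
\qquad
c_2(C) = \frac{\int_{\operatorname{outside}(C)} u_0(x,y)\, dx\, dy}{\int_{\operatorname{outside}(C)} dx\, dy}
$$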

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report

Thank you for your response to my comments. I consider this submission to be much improved and more easily accessible to the reader.

Still, I could argue with some of your choices (experimental validation), but the paper is generally acceptable for publication. The title, changed to “Unsupervised Object Segmentation based on Bi-partitioning Image Model integrated with Classification”, matches the content of this paper better than “two-step segmentation”, although I did not request that the Mumford-Shah and Chan-Vese names be removed from the title.

In my view, the novelty is in the use of the piecewise smooth Mumford-Shah and the piecewise constant binary Chan-Vese functionals as alternative loss functions for training an image segmentation network, and in the integration of a classifier, giving a weakly supervised machine learning approach to binary image segmentation. Some terminology has also been corrected, increasing the clarity of the presentation.

The experimental section could have been extended by a comparison with more classic and neural techniques of binary image segmentation. 

Reviewer 2 Report

I thank the authors for considering my recommendations.

In this form, the authors have improved the article and it can be accepted for publication.
