Article
Peer-Review Record

A Framework Using Contrastive Learning for Classification with Noisy Labels

by Madalina Ciortan *,†, Romain Dupuis and Thomas Peel
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 29 April 2021 / Revised: 3 June 2021 / Accepted: 5 June 2021 / Published: 9 June 2021
(This article belongs to the Special Issue Machine Learning with Label Noise)

Round 1

Reviewer 1 Report

A framework using contrastive learning for classification with noisy labels

Abstract:

Major contributions:
1. The contrastive pre-training increases the robustness of any loss function to noisy labels.
2. An additional fine-tuning phase can further improve accuracy.

Introduction:

Line 33 seems confusing: when you mention contrastive learning here, do you mean supervised or unsupervised contrastive learning? In line 31 you discuss the advantages of self-supervised contrastive learning.

Related work

Adding more recent works on contrastive learning, such as SwAV, BYOL, and SimSiam, is important. Alternatively, you can cite some of the works that provide extensive reviews of the state of the art, such as:

1. Jaiswal, A., Babu, A. R., Zadeh, M. Z., Banerjee, D., & Makedon, F. (2021). A survey on contrastive self-supervised learning. Technologies, 9(1), 2.
2. Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., ... & Krishnan, D. (2020). Supervised contrastive learning. arXiv preprint arXiv:2004.11362.

Preliminaries

Figure 1: the label should be NFL+RCE, right?

Line 91: it is not clear what you mean when you say the images x are of size d. Does this mean your input is a one-dimensional vector?

Sections 3.1 and 3.2 are well explained.

Line 127: Is your claim correct that the pseudo-labels generated with self-supervised learning are more accurate? Can this statement be generalized to all problems? How do you evaluate the performance of your pretext task (pre-training with contrastive learning)? This assumption can be made only if you are sure that the model has learnt meaningful features during the pretext task, which is not valid for all problems.
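To make the question concrete, here is a minimal sketch of one common way pseudo-labels are derived from a contrastive encoder, by clustering its embeddings with k-means. This is an illustration of the general idea only; the helper `pseudo_labels_from_embeddings` is hypothetical and not the manuscript's actual procedure.

```python
# Minimal sketch (not the manuscript's exact procedure): deriving pseudo-labels
# from a pre-trained contrastive encoder by clustering its embeddings with k-means.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels_from_embeddings(embeddings, n_classes, seed=0):
    """Cluster L2-normalized embeddings; cluster ids serve as pseudo-labels."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return KMeans(n_clusters=n_classes, random_state=seed, n_init=10).fit_predict(normed)

# Stand-in embeddings; in practice these would come from the frozen pre-trained encoder.
fake_embeddings = np.random.randn(1000, 128)
pseudo_labels = pseudo_labels_from_embeddings(fake_embeddings, n_classes=10)
```

The quality of such pseudo-labels depends entirely on how well the pretext task has structured the embedding space, which is the point of the question above.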

In the pre-training phase of Figure 2, you are generating pseudo-labels for the training set. Why is there a connection from the noisy labels there? What is the purpose of merging the noisy labels with the output of the unsupervised contrastive block?

Check lines 128 and 129: "improves the performance the classifier"? What you are trying to convey is not clear.


Section 4.1 is well explained.

Section 4.2:

Please explain the temperature coefficient in the loss function.
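For context on what such an explanation might cover, below is a minimal sketch of the standard NT-Xent (SimCLR-style) loss showing where the temperature tau enters; this is the commonly used formulation and may differ in its details from the loss actually used in the manuscript.

```python
# Minimal sketch of the standard NT-Xent (SimCLR-style) loss, showing where the
# temperature tau enters; the manuscript's loss may differ in its details.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (N, d) projections of two augmented views of the same N images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / tau                                 # cosine similarities scaled by temperature
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))            # exclude self-similarity
    # Positive pairs: row i of z1 matches row i of z2 and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# A smaller tau sharpens the distribution over similarities and weights hard
# negatives more heavily; a larger tau smooths it.
loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128), tau=0.1)
```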


Section 5 Experiments:

Line 195: ReLu -> ReLU

Explaining the dimension of the projection head is good for reproducibility.

If using SimCLR, what batch size was used? Did that have any influence on the learning?

Overall comment:

The paper is overall well explained, and it is a good contribution. Please try to address the above concerns in the related work and the other sections.

Section 6 Results:
Well organized and written.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This work investigates the use of contrastive learning as a pre-training task to perform image classification in the presence of noisy labels.

It presents an extensive empirical study showing that a contrastive pre-training step brings a significant performance gain when using different loss functions: non-robust, robust, and regularized by pre-training.

Your work prompts a few substantive observations:

  1. In your manuscript, the cross-entropy penalized by forward learning regularization seems to give the best results. However, the learning phases remain sensitive to the configuration of the inputs. How do you explain this underperformance?
  2. The problem of overfitting always arises if ad hoc stopping criteria are not set. In your case, what criteria were used?
  3. It would be interesting to extend your references by analyzing and citing the following key works involving loss functions of the cross-entropy type:

Gorgi Zadeh and M. Schmid, "Bias in Cross-Entropy-Based Training of Deep Survival Networks," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.2979450.

Ouahabi, A.; Taleb-Ahmed, A. Deep learning for real-time semantic segmentation: Application in ultrasound imaging. Pattern Recognition Letters 2021, 144, 27–34.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors should strictly follow our recommendations and should cite and discuss Section 3.2, "Loss function optimization", of the reference entitled "Deep learning for real-time semantic segmentation: Application in ultrasound imaging". This section shows the importance of the choice of the loss function in the context of application-oriented deep learning.

Author Response

The authors have added the reference.
