Article
Peer-Review Record

Soft Contrastive Cross-Modal Retrieval

Appl. Sci. 2024, 14(5), 1944; https://doi.org/10.3390/app14051944
by Jiayu Song 1, Yuxuan Hu 1,*, Lei Zhu 2, Chengyuan Zhang 3, Jian Zhang 1 and Shichao Zhang 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 6 December 2023 / Revised: 10 January 2024 / Accepted: 23 February 2024 / Published: 27 February 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Title: Soft Contrastive Cross-Modal Retrieval

Review Report: The manuscript presents a novel approach, Soft Contrastive Cross-Modal Retrieval (SCCMR), addressing issues related to the sharp embedding boundaries and lack of robustness in existing cross-modal retrieval methods. The integration of deep cross-modal models with soft contrastive learning and smooth label cross-entropy learning is a commendable effort to enhance common subspace embedding and improve model generalizability and robustness. The experimental results, comparing SCCMR with 12 state-of-the-art methods on two multi-modal datasets, demonstrate its superior performance in image-text retrieval. In my opinion, the manuscript is well-structured and clearly articulates the motivation, proposed method, experimental setup, and results. The following comments and suggestions are provided to help improve the manuscript:

1. The manuscript is generally well-written, but some sentences could be rephrased for improved clarity. For instance, consider revisiting the sentence "Although most existing cross-modal retrieval methods have achieved remarkable performance, the embedding boundary becomes sharp and tortuous with the upgrade of model complexity." to enhance clarity and readability.

2. Provide more details on the datasets used in the experiments, including any specific characteristics, challenges, or biases they may have. This information is crucial for a comprehensive evaluation of the proposed method.

3. While the manuscript mentions outperforming 12 state-of-the-art methods, a more in-depth comparative analysis would strengthen the paper. Provide insights into specific scenarios or aspects where SCCMR excels, and discuss cases where it may be less effective.

4. The authors are advised to consider citing more recent publications to reinforce the novelty and relevance of their work. Additionally, it would be beneficial to include references to recent advancements in cross-modal retrieval or related areas.
Overall, the Soft Contrastive Cross-Modal Retrieval (SCCMR) method shows promise in addressing existing challenges in cross-modal retrieval. With some improvements in clarity, experimental details, and comparative analysis, this manuscript could significantly contribute to the field. I recommend major revisions to address the points raised in this review. Additionally, please consider incorporating recent relevant literature to strengthen the manuscript's theoretical and contextual foundations.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Dear Reviewer 1,

The response has been attached. Please see the attachment.

Thank you again for your comments.

Warm regards,

Jiayu Song

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a Soft Contrastive Cross-Modal Retrieval (SCCMR) to improve the generalizability and robustness of the model. However, the motivation and novelty of this work are limited. The detailed comments are as follows:

 - The equations should be explained in more detail. For example, what is the objective of Eq. 1? The set-builder form in Eqs. 2 and 3 is not appropriate. 

- The authors claim that the existing contrastive learning approach has limitations, such as a lack of robustness, but this is not analyzed sufficiently.

- Comparisons with more recent methods, and results on well-known benchmark datasets such as MSCOCO, should be added.

- Combining soft contrastive learning and label smoothing in the multimodal domain seems to lack novelty.

Comments on the Quality of English Language

Well written.

Author Response

Dear Reviewer 2,

The response has been attached. Please see the attachment.

Thank you again for your comments.

Warm regards,

Jiayu Song

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors propose a novel end-to-end cross-modal retrieval model, named Soft Contrastive Cross-Modal Retrieval (SCCMR), which combines soft contrastive learning with cross-modal learning. The paper integrates soft contrastive learning into cross-modal retrieval to improve multimodal feature embedding. 

Tests were performed on two benchmark multimedia datasets, Wikipedia and NUS-WIDE, which showed that the SCCMR method is effective. 

I recommend a more extensive description in the conclusions, detailing why the proposed method performs better than the similar methods with which it was compared.

Author Response

Dear Reviewer 3,

The response has been attached. Please see the attachment.

Thank you again for your comments.

Warm regards,

Jiayu Song

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised manuscript has been considerably improved. Hence, I recommend this paper for publication in this reputable journal. 
