Article
Peer-Review Record

A Commodity Classification Framework Based on Machine Learning for Analysis of Trade Declaration

Symmetry 2021, 13(6), 964; https://doi.org/10.3390/sym13060964
by Mingshu He 1, Xiaojuan Wang 1,*, Chundong Zou 1, Bingying Dai 2 and Lei Jin 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 10 April 2021 / Revised: 14 May 2021 / Accepted: 21 May 2021 / Published: 28 May 2021
(This article belongs to the Section Computer)

Round 1

Reviewer 1 Report

Dear Authors,

The paper studies an interesting topic, but the following issues should be considered:

1. The whole paper, from abstract to conclusion, needs English proofreading. Academic English should be used.

2. Section two, should be a literature review and should have some recent ones.

3. The methodology and results are well written and described.

4. Conclusion section should be enhanced in light of the findings.

5. Policy implications and recommendations for future studies should be added to the conclusion section.

best of luck

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. Your constructive comments will further improve our work. According to the comments, we revised the paper; the details are described below.

 

[Comment] 1. The whole paper, from abstract to conclusion, needs English proofreading. Academic English should be used.

[Response] Thank you very much for your valuable suggestion. We modified several descriptions to improve the quality. Some of them are shown below:

Revised content (the line numbers refer to the revised manuscript):

  • Line 1
    Before revision: Through the understanding of these contents,
    After revision: By understanding these contents,

  • Lines 2–4
    Before revision: This paper focuses on the process of commodity trade declaration process and identify the commodity categories based on text information on customs declarations.
    After revision: This paper focuses on commodity category recognition based on text information in the process of commodity trade declaration.

  • Lines 10–11
    Before revision: make some improvements,
    After revision: some improvements have been made,

  • Line 11
    Before revision: In two datasets
    After revision: In the two datasets used in this paper

  • Line 32
    Before revision: if the commodity occurs the trade of their history
    After revision: if the commodity occurs during the historical trade

  • Line 35
    Before revision: computer version (CV)
    After revision: computer vision (CV)

  • Line 39
    Before revision: Sarker et al. formulate the problem of building a context-aware predictive model based on Decision Tree (DT) for predicting user diverse behavioral activities with smartphones [5]
    After revision: For instance, Sarker et al. [5] formulate the problem of building a context-aware predictive model based on Decision Tree (DT) for predicting user diverse behavioral activities with smartphones.

  • Line 41
    Before revision: Zeng et al. proposed an ML framework for predicting users' behavior interests [6].
    After revision: Zeng et al. [6] proposed an ML framework for predicting users' behavior interests.

  • Line 42
    Before revision: make us
    After revision: inspire us to

  • Lines 71–93
    We added some necessary references to the related work and renamed the section from "Related work" to "Literature review and related work".

  • Line 119
    Before revision: Each word is given multiple weights, and these weights are applied to the word embedding, and transformed features are input into the multi-channel CNN model to predict the label of the sentence
    After revision: Each word is given multiple weights that are applied to the word embedding, and transformed features are input into the multi-channel CNN model to predict the label of the sentence.

  • Lines 135–156
    We described Fig. 1 in more detail in the section "Overview of the framework".

  • Line 182
    Before revision: Bert is a multi-layer bidirectional Transformer [31] based language representation model, so it cannot obtain the sequence information of the token. Position Embedding is introduced to add location information to make up for that.
    After revision: Bert is a multi-layer bidirectional Transformer [31] based language representation model, which means it cannot obtain the sequence information of the token. Therefore, Position Embedding is introduced to add location information to make up for that.

  • Lines 190–200
    We added some necessary explanations from line 190 to 200 to make the description of the Transformer Encoder clearer.

  • Line 227
    Before revision: The HSNet model's structure design are as Fig. 4
    After revision: The HSNet model's structure design is shown in Fig. \ref{fig3}

  • Line 241
    Before revision: which demands us to consider whether we can get better results from the fusion of the two models
    After revision: which enlightens us to consider whether we can get better results from the fusion of the two models

  • Line 267
    Before revision: HS-Dataset1 contains 226,528 samples collected from some cooperative companies and website.
    After revision: HS-Dataset1 contains 226,528 samples collected from some cooperative companies and websites.

  • Line 286
    Before revision: and then set a Tanh function
    After revision: and then sets a Tanh function

  • Line 326
    Before revision: The proportion of correct samples predicted by model among the total samples
    After revision: The proportion of the correct samples predicted by model among the total samples

  • Line 345
    Before revision: The experiment on HS-Dataset2 indicates a better performance that can reach a high index about 99%.
    After revision: The experiment on HS-Dataset2 indicates a better performance that every metric except Averaged-F1 can reach a high index about 99% in each level.

  • Line 365
    Before revision: Then, in order to observe the HS-code classification results output from each model and compare their difference
    After revision: Then, in order to observe the HS-code classification results output from each model and compare their differences.

  • Line 406
    Before revision: As for testing process,
    After revision: During the test,

  • Line 409
    We revised the conclusions from line 409 to the end.

 

 

[Comment] 2. Section two, should be a literature review and should have some recent ones.

[Response] Thank you very much for your valuable suggestion. We renamed "Related Work" to "Literature Review and Related Work". In this section, we make an in-depth literature review covering the past five years and then introduce some related technology from other similar fields. Some new references have been added. Lines 70 to 93 are the revised version.

 

[Comment] 3. The methodology and results are well written and described.

[Response] Thank you very much for your valuable suggestion. And we are pleased to get your recognition of our work.

 

[Comment] 4. Conclusion section should be enhanced in light of the findings.

[Response] Thank you very much for your valuable suggestion. We added some descriptions in the conclusion section from line 410 to line 441.

 

[Comment] 5. Policy implications and recommendations for future studies should be added to the conclusion section.

[Response] Thank you very much for your valuable suggestion. We added some future work to the conclusion section from line 410 to line 441.

 

Thanks again for your efforts. In addition to the above revisions, more than 20 modifications have been made to improve the quality of this paper. For easy review, we highlighted the changes in the PDF. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

 

Author Response File: Author Response.pdf

Reviewer 2 Report

The approach used in this paper is interesting, but there are some considerations:

  • Instead of "Related work", I propose an in-depth literature review showing that there are few works on similar topics
  • poor literature
  • In the References list there is no paper from Symmetry. Why do the Authors want to publish an article in this journal?

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. Your constructive comments will further improve our work. According to the comments, we revised the paper; the details are described below.

 

 

[Comment] Instead of "Related work", I propose an in-depth literature review showing that there are few works on similar topics; poor literature.

[Response] Thank you very much for your valuable suggestion. We renamed "Related Work" to "Literature Review and Related Work". In this section, we make an in-depth literature review covering the past five years and then introduce some related technology from other similar fields. Some new references have been added. Lines 1 to 2 are the revised version.

 

 

[Comment] In the References list no paper from Symmetry. Why do Authors want to publish an article in this journal?

[Response] Thank you very much for your valuable suggestion. We found some instructive research in Symmetry and added the most relevant paper (Ref. 26) to the section "Literature Review and Related Work". Besides, we think this study suits the theme of "Computer and Engineering Science and Symmetry" in Symmetry. The paper uses big data and machine learning technology to solve the HS-code declaration problem, which is a natural language processing task in computer science and engineering. To date, few studies have focused on HS-code classification, and we hope that researchers will pay more attention to this field since commodity trade is going on all the time.

 

Thanks again for your efforts. In addition to the above revisions, more than 20 modifications have been made to improve the quality of this paper. For easy review, we highlighted the changes in the PDF. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

Author Response File: Author Response.pdf

Reviewer 3 Report

The idea of the paper is very interesting, and the model seems to be even too complex as a result of the fusion of two models. In my opinion, the results obtained with HSBert are already good enough considering the accuracy levels it reaches; in any case, HSBert performs much better than HSCNN. Overall, I consider the paper very interesting and well written.

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. Your constructive comments will further improve our work. According to the comments, we revised the paper; the details are described below.

 

 

[Comment] The idea of the paper is very interesting, and the model seems to be even too complex as a result of the fusion of two models. In my opinion, the results obtained with HSBert are already good enough considering the accuracy levels it reaches; in any case, HSBert performs much better than HSCNN. Overall, I consider the paper very interesting and well written.

[Response] Thank you very much for your valuable suggestion. We are pleased to get your recognition of our work. As for the fusion model, it stems from the low averaged F1 values of HSBert and HSCNN on the two datasets. As marked by the red box in the figure, we hope to improve the averaged F1 and reduce the impact of data imbalance. Hence, we attempted to add some convolutional layers to HSBert and built the HSNet model. We found that HSNet has a brilliant performance on dataset 2 but is a little worse than HSBert on dataset 1. In this instance, we considered building a fusion model between HSBert and HSNet, and the results show that the fusion makes sense. Although the fusion model costs more resources to improve accuracy, it is of great significance for unbalanced datasets, and a suitable model can be chosen in application.
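The kind of probability-level fusion discussed here can be sketched as a simple weighted average of two classifiers' class probabilities (an illustrative ensemble only; the function name, weights, and sample vectors are hypothetical, and the paper's actual fusion strategy may differ):

```python
def fuse_probabilities(probs_a, probs_b, weight=0.5):
    """Combine two models' class-probability vectors by a weighted
    average; the fused prediction is the argmax of the combined scores."""
    fused = [weight * a + (1 - weight) * b for a, b in zip(probs_a, probs_b)]
    pred = max(range(len(fused)), key=fused.__getitem__)
    return fused, pred

# hypothetical outputs of two classifiers over 3 classes
fused, pred = fuse_probabilities([0.2, 0.5, 0.3], [0.1, 0.3, 0.6])
```

Equal weighting is only a starting point; in practice the weight could be tuned on a validation set so the model that handles minority classes better contributes more.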

 

Thanks again for your efforts. In addition to the above revisions, more than 20 modifications have been made to improve the quality of this paper. For easy review, we highlighted the changes in the PDF. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

Author Response File: Author Response.pdf

Reviewer 4 Report

The aim of the paper is to introduce a ML-based classification framework for commodity trade declaration.

This paper deals with an interesting subject. The methodology is clearly described. The provided experiments seem to prove good performances of the proposed approach.

However, some minor corrections should be made, as follows:

  • Line 33 – CV stands for computer vision
  • The authors should provide further explanations regarding Figure 1
  • Figure 2 - explain the Transformer Encoder representation in terms of operations (2) and (3)
  • Figure 3 – only one convolution filter size is specified.

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. The instrumental comments will further improve our work. According the comments, we revised the paper and the details are described here.

 

 

[Comment] The aim of the paper is to introduce a ML-based classification framework for commodity trade declaration. This paper deals with an interesting subject. The methodology is clearly described. The provided experiments seem to prove good performances of the proposed approach. However, some minor corrections should be made, as follows:

Line 33 – CV stands for computer vision

[Response] Thank you very much for your valuable suggestion. We revised it into “computer vision”.

 

[Comment] The authors should provide further explanations regarding Figure 1

[Response] Thank you very much for your valuable suggestion. We added some sub-tags (A, B…) to the figure and added descriptions for each part. Here are the revised figure and description.

 

This paper aims to help people complete the trade declaration automatically. Fig. 1 depicts the declaration process with data processing, modeling and classification strategy. As shown in the figure, part A is the original data, which consists of training data and their labels. Part B describes the process of data processing. In parts C and D, we define the declaration process as a classification task. Our aim is to get the correct 10-digit HS-code. Since the number of HS-codes is too large and the data distribution in each code is unbalanced, we split the code into multiple parts. According to the results, the split method improves the separability of a single model. From a business perspective, the HS-code is built up through chapters, sections and items encoded by its digits. HS-codes are divided into 22 categories and 98 chapters by the first 2 digits. The third and fourth digits determine the section, and the fifth and sixth give the item. The remaining digits are classification criteria for goods defined by individual countries. The original Chinese text data are transformed into vectors and sent to the ML-based classifiers. Some traditional ML models, neural-network-based models and fusion models are used in this paper. Then, the final intact HS-code is obtained by multi-layer superposition.

As for the trade declaration process, our framework completely replaces the manual retrieval process, shown as the red dotted box in parts E and F. When handling commodity declaration data that did not occur before, we do not need to consult reference materials again. The experiments proved that the automatic classification process can improve identification efficiency and classification accuracy compared with other existing methods. The accuracy can reach over 99% on our dataset.
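The hierarchical digit structure described above can be sketched as follows (a minimal illustration with hypothetical function names and a made-up sample code; in the actual framework each level is predicted by an ML classifier rather than read from an existing code):

```python
def split_hs_code(hs_code: str) -> dict:
    """Split a 10-digit HS-code into the hierarchical levels described
    in the response: digits 1-2 give the chapter, digits 3-4 the
    section, digits 5-6 the item, and the remaining digits are the
    country-specific classification criteria."""
    assert len(hs_code) == 10 and hs_code.isdigit()
    return {
        "chapter": hs_code[0:2],
        "section": hs_code[2:4],
        "item": hs_code[4:6],
        "national": hs_code[6:10],
    }

def compose_hs_code(levels: dict) -> str:
    """Recompose the full code by multi-layer superposition of the
    per-level outputs."""
    return levels["chapter"] + levels["section"] + levels["item"] + levels["national"]

parts = split_hs_code("8471300000")  # hypothetical sample code
assert compose_hs_code(parts) == "8471300000"
```

Splitting the label this way reduces each classifier's output space, which is what the response credits for the improved separability of a single model.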

 

[Comment] Figure 2 - explain the Transformer Encoder representation in terms of operations (2) and (3)

[Response] Thank you very much for your valuable suggestion. We added some necessary explanations from line 190 to 200 to make the description of the Transformer Encoder clearer. The revised content is as follows.

The Bert Encoder layer is composed of a stack of N identical Transformer blocks. We denote the Transformer block as T(h), in which h represents the hidden vector. A mini batch is usually used in the process of self-attention calculation. The model's input dimension is B×S, where B is the size of a batch and S is the length of a sentence. The mini batch is composed of sentences with different lengths, so we set a max sequence length parameter. If a sentence in a batch exceeds the max sequence length, the excess length is cut off. Similarly, sentences that are not long enough are filled with 0. This process is called padding. However, the part filled with 0 would also participate in self-attention, so an attention mask is used to solve this problem. The attention mask keeps the invalid positions from participating in the calculation. The detailed operations of the Bert Encoder layer are as follows.
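The truncation, padding and attention-mask step described above can be sketched as follows (function and variable names are our own illustration; a real Bert implementation performs this inside the tokenizer and applies the mask within the attention layers):

```python
def pad_batch(token_ids_batch, max_seq_len, pad_id=0):
    """Truncate sentences longer than max_seq_len and pad shorter ones
    with pad_id, producing a rectangular B x S batch. The attention
    mask holds 1 for real tokens and 0 for padding, so the padded
    positions do not take part in self-attention."""
    padded, mask = [], []
    for ids in token_ids_batch:
        ids = ids[:max_seq_len]                  # cut off the excess length
        pad_len = max_seq_len - len(ids)
        padded.append(ids + [pad_id] * pad_len)  # fill the rest with 0
        mask.append([1] * len(ids) + [0] * pad_len)
    return padded, mask

# two sentences of different lengths, padded to S = 5
batch, mask = pad_batch([[101, 7, 8, 102], [101, 5, 102]], max_seq_len=5)
```

Inside the encoder, positions where the mask is 0 are typically given a large negative attention score before the softmax, which drives their attention weights to effectively zero.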

 

 

[Comment] Figure 3 – only one convolution filter size is specified.

[Response] Thank you very much for your valuable suggestion. We revised this figure and added all convolution filter sizes to it. The revised figure is shown as follows:

 

 

Thanks again for your efforts. In addition to the above revisions, more than 20 modifications have been made to improve the quality of this paper. For easy review, we highlighted the changes in the PDF. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

 

Author Response File: Author Response.pdf
