Peer-Review Record

A Multi-Model Approach for User Portrait

Future Internet 2021, 13(6), 147; https://doi.org/10.3390/fi13060147
by Yanbo Chen *, Jingsha He, Wei Wei, Nafei Zhu and Cong Yu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 April 2021 / Revised: 26 May 2021 / Accepted: 26 May 2021 / Published: 31 May 2021
(This article belongs to the Section Big Data and Augmented Intelligence)

Round 1

Reviewer 1 Report

-Some figures are not clear at all; for example, the text in all figures is blurry. Try to improve the quality of the figures. The same goes for the equations: Eq. 1 looks like it was copied from other papers as a screenshot.

-Be more specific and avoid the use of "etc.", especially in the abstract. Please refer to the following article about the usage of "etc." in scientific articles: https://www.grammarly.com/blog/et-cetera-etc/.

-Add the following works to the related works section:
(user demographic detection)
[1] "Prediction of user demographics from music listening habits." Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. 2017.
[2] "Predicting Twitter user demographics using distant supervision from website traffic data." Journal of Artificial Intelligence Research 55 (2016): 389-408.

(User Interest and user personality portrait/profiling).
[1] Mining user interest based on personality-aware hybrid filtering in social networks. Knowledge-Based Systems 206, 106227. 2020.

Author Response

Response to Reviewer 1’s Comments

Thank you for your valuable feedback, and we appreciate you pointing out specific details for our paper. We’ve made adjustments to the manuscript.

Point 1: Some figures are not clear at all; for example, the text in all figures is blurry. Try to improve the quality of the figures. The same goes for the equations: Eq. 1 looks like it was copied from other papers as a screenshot.

Response 1: We have modified all the figures and equations that were not clear.

Point 2: Be more specific and avoid the use of "etc.", especially in the abstract. Please refer to the following article about the usage of "etc." in scientific articles: https://www.grammarly.com/blog/et-cetera-etc/.

Response 2: The word "etc" in the abstract has been changed to "and so on".  The modified sentence is “Age, gender, education stage and so on are the most basic attributes to identify and portray users.”

Point 3: Add the following works to the related works section:

(user demographic detection)

[1] "Prediction of user demographics from music listening habits." Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. 2017.

[2] "Predicting Twitter user demographics using distant supervision from website traffic data." Journal of Artificial Intelligence Research 55 (2016): 389-408.

(User Interest and user personality portrait/profiling).

[1] Mining user interest based on personality-aware hybrid filtering in social networks. Knowledge-Based Systems 206, 106227. 2020.

Response 3: The above articles have been added to the related works section and the reference list, and they are cited in the manuscript.

Author Response File: Author Response.docx

Reviewer 2 Report

The research question, as well as the proposed methodology, could potentially be of interest to the journal. In its current form, however, the quality of the work is not adequate. It is very important to have a clear presentation of the topic, the method, and the results.

The goal of the proposed method is not appropriately stated.

Therefore, I suggest considering the paper an initial draft and improving all the sections.

Line 19:

"Experiment results show that our multi-model method is indeed more accurate than single-model methods." This statement does not add anything of interest to the abstract. It would be much better to summarise the performance of the multi-model method proposed in the study.

Line 44: In general, a method such as XGBoost needs to be described at its first use, at least with a short introductory statement and a reference.

The related work does not present a clear state of the art for the proposed topic, and I advise including more results published in journals. There is a clear lack of references in the method section.

The authors are not proposing the theory of SVM or of the other classifiers used; therefore, it is important to cite the most relevant references for those classifiers. Furthermore, many acronyms are used without any explanation at their first use, e.g., Support Vector Machine (SVM). In particular, Section 3.2 discusses many models without references or definitions.

Most of the paper consists of descriptions of the classifiers without references, whereas a short description, good references, and a description of the parameters used for the optimization would be enough.

It is important to give more details about Section 3.4, Integration of the models. The description is written very quickly and is not clear. This is the real contribution of the work, and it requires a proper description.

Results and analysis: it is necessary to write an extensive statistical analysis of the results on the test set, using k-fold cross-validation and reporting the average of the results with the standard deviation.
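A minimal sketch of the reporting requested here: given one accuracy value per fold, report the mean together with the sample standard deviation. The fold scores below are hypothetical placeholders, not results from the study.

```python
import statistics


def summarize_folds(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    mean = statistics.mean(fold_accuracies)
    std = statistics.stdev(fold_accuracies)  # sample std, n - 1 denominator
    return mean, std


# Hypothetical 5-fold accuracies, for illustration only.
scores = [0.80, 0.82, 0.78, 0.81, 0.79]
mean, std = summarize_folds(scores)
print(f"accuracy = {mean:.3f} +/- {std:.3f}")  # accuracy = 0.800 +/- 0.016
```

`statistics.stdev` uses the n - 1 denominator, which is the usual choice when the folds are treated as a sample; `statistics.pstdev` would give the population value instead.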

The analysis needs to include a strong discussion of the performance achieved.

Line 390: "so next we will try to adjust the parameters or other methods to see if the accuracy of the prediction can be further improved." This seems to indicate that the results of the analysis were only preliminary and not well tested.

"Figure 8. The accuracy rate of PV-DBOW in the training process" is not useful. Please insert results on the test set (blind test).

Figures: use descriptive captions. There is just a title, and it is necessary to guide the reader to what is relevant in the figure. Improve the quality of the pictures, which is very low.

A number of typos are present, and in the conclusion, line 383 reads "According to my findings," which is odd for a group of authors.

Author Response

Response to Reviewer 2’s Comments

Thank you for your valuable feedback, and we appreciate you pointing out specific details for our paper. We’ve made adjustments to the manuscript.

Point 1: The research question, as well as the proposed methodology, could potentially be of interest to the journal. In its current form, however, the quality of the work is not adequate. It is very important to have a clear presentation of the topic, the method, and the results.

Response 1: The summary of the article, the description of the integration model, and the parameter results have all been modified. For example, we added the following text to the description of the integration model: “The method used to combine individual learners is called the associative strategy. For the classification problem, we can use the voting method to select the class with the most output. For regression problems, we can average the output of the classifier. The above voting and averaging methods are both very effective combination strategies, and one combination strategy uses another machine-learning algorithm to combine individual machine learning results, namely stacking.”

Point 2: The goal of the proposed method is not appropriately stated. Therefore, I suggest considering the paper an initial draft and improving all the sections.

Response 2: Various parts of the article have been revised where necessary in response to the comment. For example, in the summary we changed “More detailed labels, such as hobby, behavior and spending power, can further enhance the effectiveness of user portrait.” to “Based on these three attributes, it is possible to conduct in-depth mining analysis and high-level prediction recommendations on the user's preferences and personality, thereby enhancing the user's online experience, which is of great significance.”

Point 3: Line 19: “Experiment results show that our multi-model method is indeed more accurate than single-model methods.” This statement does not add anything of interest to the abstract. It would be much better to summarise the performance of the multi-model method proposed in the study.

Response 3: To summarise the performance of the proposed multi-model method, a supporting sentence has been added: “The multi-model method is more accurate than single models (such as the TF-IDF + SVM model) in predicting the attributes of users’ gender, age, and educational background.”

Point 4: Line 44: In general, a method such as XGBoost needs to be described at its first use, at least with a short introductory statement and a reference.

Response 4: The sentence has been changed to “This model integrates multiple machine learning and deep learning models and then uses the output of the above models as the input of XGBoost (a scalable machine learning system for tree boosting) [2] for further training.”, where the relevant reference is “Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.”

Point 5: The related work does not present a clear state of the art for the proposed topic, and I advise including more results published in journals. There is a clear lack of references in the method section.

Response 5: The references have been expanded. For example: “By analyzing information about users' social properties and behaviors [4-5], a user portrait can be constructed to provide an important data basis for further accurate and rapid analysis of the user's behaviors and habits [6], which can help enterprises quickly find classified user groups and users' current needs and let users gain a profound understanding of themselves.”

  1. Krismayer, T.; Schedl, M.; Knees, P.; Rabiser, R. Prediction of user demographics from music listening habits. Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, 2017.
  2. Culotta, A.; Ravi, N.K.; Cutler, J. Predicting Twitter user demographics using distant supervision from website traffic data. Journal of Artificial Intelligence Research 2016, 55, 389-408.
  3. Dhelim, S.; Aung, N.; Ning, H. Mining user interest based on personality-aware hybrid filtering in social networks. Knowledge-Based Systems 2020, 206, 106227.

Point 6: The authors are not proposing the theory of SVM or of the other classifiers used; therefore, it is important to cite the most relevant references for those classifiers. Furthermore, many acronyms are used without any explanation at their first use, e.g., Support Vector Machine (SVM). In particular, Section 3.2 discusses many models without references or definitions.

Response 6: Relevant definitions and references have been added into the article. For example, “term frequency-inverse document frequency (TF-IDF)”, “Support Vector Machine (SVM)”, “Distributed Memory Model of Paragraph Vectors (PV-DM) [28]”, and “Distributed Bag of Words version of Paragraph Vector (PV-DBOW) [28]”.
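To make the TF-IDF definition concrete, the sketch below computes the textbook weighting tf(t, d) × log(N / df(t)), where df(t) is the number of documents containing term t. This is an illustration only: the exact TF-IDF variant used in the manuscript is not stated here, and libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed idf and L2 normalisation by default, so their numbers differ. The toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), with df(t) = number of documents containing t
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical tokenized texts, not data from the study.
docs = [
    ["movie", "action", "ticket"],
    ["movie", "exam", "school"],
    ["exam", "school", "school"],
]
w = tf_idf(docs)
print(round(w[0]["action"], 3), round(w[0]["movie"], 3))
```

Terms that occur in fewer documents (here "action") receive higher weights than terms shared across documents (here "movie"); a term present in every document would get weight 0.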

Point 7: Most of the paper consists of descriptions of the classifiers without references, whereas a short description, good references, and a description of the parameters used for the optimization would be enough.

Response 7: The article has been revised in response to the comments. For example: “The Convolutional Neural Network (CNN) [27] can extract local features such as words, n-grams, and phrases in clauses.”
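The idea that a CNN extracts n-gram features can be illustrated with a single filter. In the sketch below, a filter of width 2 slides over consecutive word vectors, so each position scores one bigram; a ReLU and max-over-time pooling then keep the strongest bigram response. This is a simplified illustration, not the authors' network: a real CNN learns many filters from data, whereas the 2-dimensional embeddings and filter values here are hand-set, hypothetical numbers.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide a width-w filter over a sentence; return (max activation, all activations).

    embeddings: one word vector (list of floats) per token
    kernel:     w filter rows, each the same length as a word vector
    """
    width = len(kernel)
    activations = []
    for i in range(len(embeddings) - width + 1):
        window = embeddings[i:i + width]  # w consecutive tokens form one n-gram
        score = bias + sum(
            k * x
            for k_row, x_row in zip(kernel, window)
            for k, x in zip(k_row, x_row)
        )
        activations.append(max(0.0, score))  # ReLU non-linearity
    return max(activations), activations  # max-over-time pooling


# Hypothetical 2-d embeddings for the tokens "I like this movie".
emb = [[0.0, 0.1], [1.0, 0.0], [0.0, 0.2], [0.0, 1.0]]
# Hand-set bigram filter: responds when dimension 0 is high in the first
# token of the window and dimension 1 is high in the second token.
kernel = [[1.0, 0.0], [0.0, 1.0]]
best, acts = conv1d_max_pool(emb, kernel)
print(best, acts)  # 1.2 [0.0, 1.2, 1.0]
```

Max pooling makes the resulting feature independent of where in the sentence the strongest bigram occurs, which is why this design suits variable-length text.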

Point 8: It is important to give more details about Section 3.4, Integration of the models. The description is written very quickly and is not clear. This is the real contribution of the work, and it requires a proper description.

Response 8: We added the following text to the description of the integration model: “The method used to combine individual learners is called the associative strategy. For the classification problem, we can use the voting method to select the class with the most output. For regression problems, we can average the output of the classifier.

The above voting and averaging methods are both very effective combination strategies, and one combination strategy uses another machine-learning algorithm to combine individual machine learning results, namely Stacking.”
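The voting and averaging strategies quoted in Response 8 are simple enough to sketch directly. The Python below is illustrative only: the labels and values are hypothetical, and the tie-breaking rule (the label encountered first wins, which is how `collections.Counter.most_common` orders equal counts) is an assumption, since the quoted manuscript text does not specify one.

```python
from collections import Counter
from statistics import mean


def hard_vote(labels):
    """Majority vote over the class labels predicted by several classifiers.

    Ties go to the label encountered first, because Counter.most_common
    lists equal counts in first-insertion order.
    """
    return Counter(labels).most_common(1)[0][0]


def average(outputs):
    """Average the numeric outputs of several regressors."""
    return mean(outputs)


# Hypothetical outputs of three base models for one user.
print(hard_vote(["female", "male", "female"]))  # female
print(average([24.0, 26.0, 25.0]))              # 25.0
```

Stacking differs from both: instead of a fixed rule, a second-level model learns how to combine the base models' outputs.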

Point 9: Results and analysis: it is necessary to write an extensive statistical analysis of the results on the test set, using k-fold cross-validation and reporting the average of the results with the standard deviation.

Response 9: In the part on the integration model, the k-fold cross-validation method was added and the related integration model was enriched. The section on experimental results was then changed accordingly.

“In the training process of each base model, the training set is divided into 5 parts using 5-fold cross-validation; in turn, 4 parts are used to train the model and 1 part is used to validate it.

Finally, the prediction results of each base model are spliced together and used as the input of the second-layer XGBoost model.”
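The 5-fold procedure quoted above (often called out-of-fold stacking) can be sketched as follows. This is a hedged illustration, not the authors' code: the two toy base models (`majority_class`, `threshold_model`) are hypothetical stand-ins for the paper's base models, folds are taken as contiguous blocks without shuffling for readability, and the second-layer XGBoost model is not shown; its training input would be the spliced rows returned by `stack_features`. Predictions for the held-out test set, which a full pipeline also needs, are omitted.

```python
from collections import Counter


def out_of_fold_predictions(X, y, fit_predict, k=5):
    """Return one prediction per training sample, made by a model that never saw it.

    fit_predict(train_X, train_y, val_X) must return a list of predictions for val_X.
    """
    n = len(X)
    oof = [None] * n
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if not start <= i < start + size]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in val_idx])
        for i, p in zip(val_idx, preds):
            oof[i] = p
        start += size
    return oof


def stack_features(*oof_columns):
    """Splice the base models' out-of-fold predictions: one row per sample, one column per model."""
    return [list(row) for row in zip(*oof_columns)]


# Hypothetical base models on 1-d inputs.
def majority_class(train_X, train_y, val_X):
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(val_X)


def threshold_model(train_X, train_y, val_X):
    cut = sum(train_X) / len(train_X)  # predict 1 above the training mean
    return [1 if x > cut else 0 for x in val_X]


X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
meta = stack_features(out_of_fold_predictions(X, y, majority_class),
                      out_of_fold_predictions(X, y, threshold_model))
print(meta[0], meta[5])  # [1, 0] [0, 1]
```

Using only out-of-fold predictions as meta-features keeps the second-layer model from learning on base-model outputs that were fitted to the same samples, which would otherwise overstate how reliable each base model is.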

Point 10: The analysis needs to include a strong discussion of the performance achieved.

Response 10: First, graphs showing how the accuracy for gender, age, and education changes with the number of iterations have been included. Second, the accuracy of the gender, age, and education predictions is now reported using k-fold cross-validation.

Point 11: Line 390: "so next we will try to adjust the parameters or other methods to see if the accuracy of the prediction can be further improved." This seems to indicate that the results of the analysis were only preliminary and not well tested.

Response 11: By adjusting "the parameters or other methods" we meant exploring other techniques or methods to improve the model's accuracy; the original wording may have been misleading. The sentence has been modified to “so next we will try to adjust the models or other techniques to see if the accuracy of the prediction can be further improved.”

Point 12: "Figure 8. The accuracy rate of PV-DBOW in the training process" is not useful. Please insert results on the test set (blind test).

Response 12: In the results and analysis section, graphs showing how the accuracy for gender, age, and education changes with the number of iterations have been included.

Point 13: Figures: use descriptive captions. There is just a title, and it is necessary to guide the reader to what is relevant in the figure. Improve the quality of the pictures, which is very low.

Response 13: The pictures in the article have been replaced and the figure captions have been revised. For example, we changed the caption of Figure 4 from “Structure of PV-DM model” to “Structure of PV-DM training vector”.

Point 14: A number of typos are present, and in the conclusion, line 383 reads "According to my findings," which is odd for a group of authors.

Response 14: We have removed such expressions.

Author Response File: Author Response.docx

Reviewer 3 Report

The work requires the following corrections:
- First, the mathematical notation needs to be corrected: sometimes the subscripts are not applied, so it needs careful checking.
- Second, the presented results are not sufficient; visualised data from the learning process should be presented, including the effectiveness of problem solving on the validation data in each training epoch.
- All parameters and libraries used should be specified in table form.
- Next, the metrics for assessing the quality of the model should be standardised. The authors presented Precision in terms of TP and FP, while the accuracy of each class was not defined in a similar way. Precision, as defined, should of course be assigned to a specific decision class.
- Although certain abbreviations are well known in the ML community, it cannot be assumed that every reader will be familiar with them. At the first mention of each technique, it is necessary to write its full name, give the abbreviation, and cite the original reference.
- I see typos in the work, especially in the diagrams showing the models; please check carefully.
- Each element of the presented schemes should be precisely described.
- Finally, the code of the presented model and the anonymised data used in the study should be made publicly available to provide a reference point for other authors.

Author Response

Response to Reviewer 3’s Comments

Thank you for your valuable feedback, and we appreciate you pointing out specific details for our paper. We’ve made adjustments to the manuscript.

Point 1: First, the mathematical notation needs to be corrected: sometimes the subscripts are not applied, so it needs careful checking.

Response 1: The article has been carefully examined and revised. For example, we changed "(x1,y1),(x2,y2)" to "$(x_1, y_1), (x_2, y_2)$".

Point 2: Second, the presented results are not sufficient; visualised data from the learning process should be presented, including the effectiveness of problem solving on the validation data in each training epoch.

Response 2: In the results and analysis section, graphs showing how the accuracy for gender, age, and education changes with the number of iterations have been included.

In the experimental setup and results section, the training parameters of the corresponding models were added. For example:

parameter name | parameter value | parameter description
kernel         | linear          | Kernel function; the linear kernel is selected
C              | 3               | Penalty coefficient
probability    | True            | Probability estimation

Point 3: All parameters and libraries used should be specified in table form.

Response 3: Changes have been made. For example, parameters were specified in table form.

Point 4: Next, the metrics for assessing the quality of the model should be standardised. The authors presented Precision in terms of TP and FP, while the accuracy of each class was not defined in a similar way. Precision, as defined, should of course be assigned to a specific decision class.

Response 4: The text has been modified to read “True positives (TP) is the number of samples that are correctly classified as positive, i.e. the number of samples that are actually positive and classified as positive by the classifier; False positives (FP) is the number of samples incorrectly classified as positive, i.e. the number of samples that are actually negative but classified as positive by the classifier; False negatives (FN) is the number of samples that are actually positive but classified as negative by the classifier; True negatives (TN) is the number of samples that are correctly classified as negative, that is, the number of samples that are actually negative and classified as negative by the classifier.”
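To tie Precision to a specific decision class, as the reviewer requests, the four counts defined above can be computed one-vs-rest, treating each class in turn as "positive". The Python sketch below assumes this one-vs-rest convention; the labels are hypothetical, and the manuscript's exact metric definitions may differ.

```python
def per_class_counts(y_true, y_pred, positive):
    """One-vs-rest (TP, FP, FN, TN) for the class `positive`."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:  # predicted positive, actually another class
            fp += 1
        elif t == positive:  # actually positive, predicted another class
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive):
    """Precision and recall of one class; 0.0 when the denominator is empty."""
    tp, fp, fn, _ = per_class_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical gender labels: "m" and "f".
y_true = ["m", "f", "f", "m", "f"]
y_pred = ["m", "f", "m", "m", "f"]
print(per_class_counts(y_true, y_pred, "m"))  # (2, 1, 0, 2)
print(precision_recall(y_true, y_pred, "f"))
print(accuracy(y_true, y_pred))               # 0.8
```

Reporting the per-class values separately (and, if desired, their average over classes) makes it clear which decision class each Precision figure refers to.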

Point 5: Although certain abbreviations are well known in the ML community, it cannot be assumed that every reader will be familiar with them. At the first mention of each technique, it is necessary to write its full name, give the abbreviation, and cite the original reference.

Response 5: Relevant definitions and references have been added into the article. For example, “term frequency-inverse document frequency (TF-IDF)”, “Support Vector Machine (SVM)”, “Distributed Memory Model of Paragraph Vectors (PV-DM) [28]”, and “Distributed Bag of Words version of Paragraph Vector (PV-DBOW) [28]”.

Point 6: I see typos in the work, especially in the diagrams showing the models; please check carefully.

Response 6: Changes have been made. For example, we changed the Chinese text “我非常喜欢这部电影” (“I like this movie very much”) to “I like this action movie very much”.

Point 7: Each element of the presented schemes should be precisely described.

Response 7: Changes have been made. For example: “True positives (TP) is the number of samples that are correctly classified as positive, i.e. the number of samples that are actually positive and classified as positive by the classifier.”

Point 8: Finally, the code of the presented model and the anonymised data used in the study should be made publicly available to provide a reference point for other authors.

Response 8: We will make the core code and the data set publicly available as soon as possible.


Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have improved the paper. However, the main contribution to knowledge, in terms of an original method as opposed to a simple combination of classifiers solving a specific task, is still not evident.

A robust literature review can help to identify similar works for a proper comparison of the results (not necessarily based on the same data).

Line 443: "Based on my findings, a more efficient algorithm was proposed, which integrates". The word "my" is still present, despite the earlier comments.

Author Response

Response to Reviewer 2’s Comments

Thank you for your valuable feedback, and we appreciate you pointing out specific details for our paper. We’ve made adjustments to the manuscript.

Point 1: The authors have improved the paper. However, the main contribution to knowledge, in terms of an original method as opposed to a simple combination of classifiers solving a specific task, is still not evident.

Response 1: This study is not simply a fusion of classifiers. In our proposed scheme, weights are assigned by summarizing the performance of different classifiers and models on different attributes, and the model or classifier with the higher weight for a given attribute is used as a filter during data pre-processing to reduce noise. The processed data set is then fed to the base models, and their predictions on the validation folds of 5-fold cross-validation are retained. Finally, the results of the three base models are vertically spliced as the input of the second-layer XGBoost model, and a fusion model with high accuracy is obtained by locating the optimal parameters through grid search.
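The final step in Response 1, choosing the second-layer model's parameters by grid search, amounts to evaluating every combination in a parameter grid and keeping the best-scoring one. In the sketch below, `max_depth` and `learning_rate` are real XGBoost parameter names used only as an example grid, and `toy_score` is a hypothetical stand-in for the cross-validated accuracy of the second-layer model; the grid actually searched in the study is not given here.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; return (best_params, best_score).

    param_grid: dict mapping a parameter name to a list of candidate values
    score_fn:   callable taking a params dict, returning a score (higher is better)
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict '>' keeps the first best on ties
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical score surface peaking at max_depth=4, learning_rate=0.1.
    return -(params["max_depth"] - 4) ** 2 - 100 * (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 4, 5, 6], "learning_rate": [0.05, 0.1, 0.3]}
print(grid_search(grid, toy_score))  # ({'max_depth': 4, 'learning_rate': 0.1}, 0.0)
```

The cost grows multiplicatively with the number of candidate values per parameter (here 4 × 3 = 12 evaluations), which is why grids are usually kept small or replaced by random search for larger spaces.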

Point 2: A robust literature review can help to identify similar works for a proper comparison of the results (not necessarily based on the same data).

Response 2: To the best of our knowledge, we have included most of the major work in our literature review and compared our work with some closely related work. For instance, the models for user preference prediction and social recommendation proposed by Wu et al., which describe user profiles and users' historical preferences in social networking services (SNS) under a dynamic social network structure, are referenced and carefully studied, and the results of our proposed model are compared with those of this work to demonstrate the advantages of our approach.

Point 3: Line 443: "Based on my findings, a more efficient algorithm was proposed, which integrates". The word "my" is still present, despite the earlier comments.

Response 3: We realized the confusion of the statement and have corrected it. The sentence now is “A more efficient algorithm was proposed, which integrates ...”.

Line 415 was specifically modified in the text.

Author Response File: Author Response.pdf

Reviewer 3 Report

Most of my comments have been taken into account, I recommend acceptance of this work.

Author Response

Response to Reviewer 3’s Comments

Point 1: Most of my comments have been taken into account, I recommend acceptance of this work.

Response 1: Thank you for the positive feedback. We really appreciate all the comments, which were very helpful for improving our paper, and we also appreciate the reviewer's recognition of our work. Once again, thank you very much!
