The authors conduct a comprehensive study on the measurement and evaluation of open data using various single-model and multi-model (stacking ensemble) machine learning methods. This study contributes to the utilization of open data and demonstrates its intrinsic value to society. Overall, the manuscript is well structured, the relevant background is provided, and the motivation for the proposed approach and the challenges it addresses are clearly stated. I recommend it for publication after the authors consider the following revisions:

In this work, the authors use four machine learning (ML) algorithms (Random Forest, XGBoost, LightGBM, and CatBoost). However, many studies report that neural network (NN) algorithms achieve very good prediction performance. Why were they not considered in this work?
What is the computational efficiency (e.g., training and prediction time) of each ML model? This information should be provided.
Figure 5 can be improved.
How was the data divided into training and test sets—was it done manually or using an algorithm?
The authors are encouraged to conduct a SHAP (SHapley Additive exPlanations) analysis to gain insight into the influence of the different input variables.
The authors are encouraged to provide an application example of the developed ML models to show readers how these models can be used in practice.
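Regarding the SHAP suggestion, the minimal Python sketch below illustrates what such an analysis attributes by computing exact Shapley values through enumeration of feature coalitions. It is not the `shap` library and not the authors' models: `exact_shapley` and `toy_model` are hypothetical names, and features absent from a coalition are replaced by a single baseline point rather than averaged over a background dataset, which is a simplification. Dedicated libraries use far faster algorithms for tree ensembles such as those in the manuscript; this brute-force version only suits a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition take their baseline value (a single
    reference point, not a background dataset). Cost is O(n * 2^n) model
    calls, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Coalition members take their value from x; the rest from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy model with one additive term and one interaction term.
    def toy_model(z: Sequence[float]) -> float:
        return 2.0 * z[0] + 0.5 * z[1] * z[2]

    phi = exact_shapley(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
    print([round(v, 6) for v in phi])  # → [2.0, 1.5, 1.5]
```

The attributions sum to predict(x) - predict(baseline) (the efficiency property), and the interaction term is split equally between the two features that form it.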

Author Response

Thank you for reviewing our paper. I will submit the revised response incorporating your feedback.

We sincerely thank you again for all the comments and for your coordination.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1. The authors should strengthen the connection to sustainability by explicitly discussing how their findings could contribute to achieving sustainability goals, specifically: how open data utilization can contribute to achieving specific Sustainable Development Goals (SDGs); how better evaluation of open data utilization can lead to more targeted policy interventions that promote sustainability; and how their findings can inform the development of data governance frameworks that prioritize sustainability.
2. The manuscript would benefit from clarifying the link between open data utilization and sustainability throughout the paper, not only in the introduction.
3. The authors mention removing duplicates and missing values but provide little information about the specific techniques used, even though these choices may affect model performance.
4. Details of the hyperparameter tuning for each model are not provided, although tuning could influence the results and warrants further investigation.
5. The authors do not discuss any feature engineering techniques employed, although feature engineering can contribute significantly to model performance.
6. The manuscript mentions using the integrated and field-specific datasets but does not describe the validation methods (e.g., cross-validation or holdout sets) used to evaluate the models' generalization ability.
7. The authors focus only on tree-based algorithms, potentially overlooking other promising machine learning models.
8. The results presentation is clear and effective but could benefit from improved graphical representations and further statistical insights. For example, the authors could use additional plots (e.g., box plots and scatter plots) to visualize the distribution of prediction errors, compare model performance, and reveal potential relationships between data attributes and model performance.
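To illustrate the preprocessing, validation, and error-distribution points above, the following Python sketch (standard library only) shows one possible pipeline under stated assumptions: incomplete records are dropped by listwise deletion rather than imputed, exact duplicates are removed, the data are split into k folds with a seeded shuffle so that the split is algorithmic and reproducible, and the per-fold mean absolute errors are summarized as box-plot statistics. All function names and the synthetic data are hypothetical, and `fit_linear_1d` merely stands in for the tree-based models used in the manuscript.

```python
import random
import statistics
from typing import Callable, Optional, Sequence

Row = tuple[list[Optional[float]], Optional[float]]
Predictor = Callable[[list[float]], float]


def drop_duplicates_and_missing(rows: Sequence[Row]) -> list[tuple[list[float], float]]:
    """Listwise deletion of incomplete records, then removal of exact duplicates."""
    seen = set()
    cleaned = []
    for features, target in rows:
        if target is None or any(v is None for v in features):
            continue  # incomplete record: dropped rather than imputed
        key = (tuple(features), target)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append((list(features), target))
    return cleaned


def k_fold_indices(n: int, k: int, seed: int) -> list[list[int]]:
    """Shuffle indices with a fixed seed, then deal them into k disjoint folds (assumes k <= n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(
    fit: Callable[[list[list[float]], list[float]], Predictor],
    X: list[list[float]],
    y: list[float],
    k: int = 5,
    seed: int = 42,
) -> list[float]:
    """Return the mean absolute error (MAE) on each held-out fold."""
    fold_mae = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Fit only on the training folds so the held-out fold stays unseen.
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors = [abs(predict(X[i]) - y[i]) for i in test_idx]
        fold_mae.append(statistics.fmean(errors))
    return fold_mae


def box_plot_summary(values: Sequence[float]) -> dict[str, float]:
    """Five-number summary (min, quartiles, max) suitable for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def fit_linear_1d(X_train: list[list[float]], y_train: list[float]) -> Predictor:
    """Hypothetical stand-in model: ordinary least squares on the first feature."""
    xs = [row[0] for row in X_train]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(y_train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y_train)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda row: intercept + slope * row[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only.
    raw = [([float(i)], 3.0 * i + rng.gauss(0.0, 1.0)) for i in range(50)]
    raw += [raw[0], ([None], 1.0)]  # one duplicate and one incomplete record
    data = drop_duplicates_and_missing(raw)
    X, y = [f for f, _ in data], [t for _, t in data]
    maes = cross_validate(fit_linear_1d, X, y, k=5, seed=42)
    print(box_plot_summary(maes))
```

If hyperparameters are tuned, the search should run inside each training split (nested cross-validation) so that the held-out fold does not influence model selection.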

Author Response

Thank you for reviewing our paper. I will submit the revised response incorporating your feedback.

We sincerely thank you again for all the comments and for your coordination.

Author Response File: Author Response.pdf

Proposing Machine Learning Models Suitable for Predicting Open Data Utilization
