Article
Peer-Review Record

Using Machine Learning to Predict Visitors to Totally Protected Areas in Sarawak, Malaysia

Sustainability 2022, 14(5), 2735; https://doi.org/10.3390/su14052735
by Abang Zainoren Abang Abdurahman 1, Wan Fairos Wan Yaacob 2,3,*, Syerina Azlin Md Nasir 2, Serah Jaya 1 and Suhaili Mokhtar 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 25 January 2022 / Revised: 21 February 2022 / Accepted: 23 February 2022 / Published: 25 February 2022

Round 1

Reviewer 1 Report

This article applies machine learning algorithms to an interesting topic. However, the manuscript has several problems:

  1. What is the reason for using the chosen algorithm? What were your criteria for choosing it?
  2. What is the purpose of using this case study?
  3. The quality of the figures and tables is low.
  4. Please add a paragraph in the introduction by relying on and citing important articles on the importance and benefits of machine learning. Important articles are as follows:

https://doi.org/10.1007/s11269-021-02913-4

https://doi.org/10.1007/S11600-020-00446-9

https://doi.org/10.3390/hydrology8010025

  5. In Figure 2, the geographical locations of the case study are not clear.
  6. Give more details about the KNN algorithm. It is suggested to provide a form of algorithm structure.

Author Response

Response to Reviewer 1 Comments

 

Point 1:                What is the reason for using the chosen algorithm? What were your criteria for choosing it?

Response 1: The algorithm was chosen because it fits a classification problem posed for prediction purposes. Justification has been added (lines 198-199).

 

Point 2:                What is the purpose of using this case study?

Response 2: The purpose of this case study is to identify the factors that may affect visits to national parks, nature reserves, and wildlife sanctuaries, in support of sustaining these TPAs. Insights from visitor data can aid decision-making related to exhibitions, marketing operations, resource planning, and revenue optimization. Thus, this study aims to understand the natural factors that affect visitor attendance at totally protected areas in Sarawak and to investigate the comparable impact of other effects. With the advancement of predictive models such as machine learning techniques, findings from this study can guide relevant parties in a range of planning tasks. Justification has been added (lines 56-64).

 

Point 3:                The quality of the figures and tables is low.

Response 3: The quality of the figures and tables has been improved. (lines 219, 315, 368 and 422)

 

Point 4:                Please add a paragraph in the introduction by relying on and citing important articles on the importance and benefits of machine learning. Important articles are as follows: https://doi.org/10.1007/s11269-021-02913-4

https://doi.org/10.1007/S11600-020-00446-9

https://doi.org/10.3390/hydrology8010025

Response 4: A paragraph with important articles has been added in the introduction. (lines 64 – 70).

 

Point 5:                In figure 2 geographical locations of the case study are not clear.

Response 5: The depiction of the geographical locations in Figure 2 has been improved (lines 219-227).

 

Point 6:                Give more details about the KNN algorithm. It is suggested to provide a form of algorithm structure.

Response 6: Details about the KNN algorithm have been added (lines 309-316).
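
For readers of this record, the kind of "algorithm structure" requested in Point 6 can be sketched as follows. This is a minimal illustration of a generic k-nearest-neighbours classifier, not the authors' implementation: the function name knn_predict, the toy feature values, and the class labels are all hypothetical, and the record does not state which distance metric or value of k the manuscript uses (Euclidean distance and k = 3 are assumed here).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training point (assumed metric).
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 2: keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: the most frequent label among those neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (distance to nearest city, park size), arbitrary units.
train_X = [(10, 5), (12, 6), (80, 50), (90, 60), (85, 55)]
train_y = ["high", "high", "low", "low", "low"]

prediction = knn_predict(train_X, train_y, (15, 8), k=3)
print(prediction)
```

In practice the features would usually be rescaled before computing distances, since a variable with a large numeric range would otherwise dominate the vote.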

Reviewer 2 Report

 

  1. The authors do not give a specific reason for splitting the data 70:30 into training and testing sets.
  2. The parameters used are also not clearly explained. This article does not have enough novelty, and many papers have covered the same topic. The authors need to explain what the highlights of this research are. What is the critical variable of this research?
  3. It is recommended to add some accuracy tests such as AUC and ROC.

 

Update reference line 120.

 

A critical issue in data mining and machine learning is underfitting and overfitting of the training data. In line with this, we should find out how to obtain the best model parameters. Previous researchers have used various heuristic and metaheuristic optimization methods for this purpose [1–4]. In addition, the split between training and testing data, for example 90:10, 80:20, 70:30, 60:40, or 50:50, also influences the accuracy of the model [5–8]. Alternatively, the training and testing data can be separated using K-fold cross-validation [9,10].
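
The two ways of separating data mentioned in the paragraph above, a fixed-ratio hold-out split and K-fold, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the helper names train_test_split and k_fold_indices, the seed, and the 20-row toy dataset are hypothetical and do not come from the manuscript or the reviewer's references.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and cut them once, e.g. 70% training and 30% testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs so that every index is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

rows = list(range(20))  # hypothetical dataset of 20 observations
train, test = train_test_split(rows, train_fraction=0.7)
folds = list(k_fold_indices(len(rows), k=5))
print(len(train), len(test), len(folds))
```

A single hold-out split evaluates the model on one fixed test set, whereas K-fold rotates the test set so that every observation is used for testing once, which generally gives a less split-dependent accuracy estimate on small datasets.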

Additional references

  1. Alfares, H.K.; Nazeeruddin, M. Electric load forecasting: Literature survey and classification of methods. Int. J. Syst. Sci. 2002, doi:10.1080/00207720110067421.
  2. Caraka, R.E.; Yasin, H.; Chen, R.C.; Goldameir, N.E.; Supatmanto, B.D.; Toharudin, T.; Basyuni, M.; Gio, P.U. Evolving Hybrid Cascade Neural Network Genetic Algorithm Space-Time Forecasting. Symmetry (Basel). 2021, 13, 1–20.
  3. Santra, A.K.; Christy, C.J. Genetic Algorithm and Confusion Matrix for Document Clustering. Int. J. Comput. Sci. 2012, 9, 322–328.
  4. AgaAzizi, S.; Rasekh, M.; Abbaspour-Gilandeh, Y.; Kianmehr, M.H. Identification of impurity in wheat mass based on video processing using artificial neural network and PSO algorithm. J. Food Process. Preserv. 2021, 45, 1–13, doi:10.1111/jfpp.15067.
  5. Zeinalnezhad, M.; Chofreh, A.G.; Goni, F.A.; Klemeš, J.J. Air pollution prediction using semi-experimental regression model and Adaptive Neuro-Fuzzy Inference System. J. Clean. Prod. 2020, 261, doi:10.1016/j.jclepro.2020.121218.
  6. Härdle, W.K.; Prastyo, D.D.; Hafner, C.M. Support Vector Machines with Evolutionary Feature Selection for Default Prediction. SSRN Electron. J. 2017, doi:10.2139/ssrn.2894201.
  7. Caraka, R.E.; Hudaefi, F.A.; Ugiana, P.; Toharudin, T.; Tyasti, A.E.; Goldameir, N.E.; Chen, R.C. Indonesian Islamic moral incentives in credit card debt repayment: A feature selection using various data mining. Int. J. Islam. Middle East. Financ. Manag. 2021, Early cite, doi:10.1108/IMEFM-08-2020-0408.
  8. Nayak, J.; Naik, B.; Behera, H.S. A Comprehensive Survey on Support Vector Machine in Data Mining Tasks: Applications & Challenges. Int. J. Database Theory Appl. 2015, 8, 169–186, doi:10.14257/ijdta.2015.8.1.18.
  9. Sani, N.S.; Rahman, M.A.; Bakar, A.A.; Sahran, S.; Sarim, H.M. Machine learning approach for Bottom 40 Percent Households (B40) poverty classification. Int. J. Adv. Sci. Eng. Inf. Technol. 2018, doi:10.18517/ijaseit.8.4-2.6829.
  10. Zhao, D.; Huang, C.; Wei, Y.; Yu, F.; Wang, M.; Chen, H. An Effective Computational Model for Bankruptcy Prediction Using Kernel Extreme Learning Machine Approach. Comput. Econ. 2017, 49, 325–341, doi:10.1007/s10614-016-9562-7.

 

Author Response

Response to Reviewer 2 Comments

 

Point 1:                The authors do not give a specific reason for splitting the data 70:30 into training and testing sets.

Response 1: The reason for the 70:30 training and testing split has been addressed (lines 186-188).

 

Point 2:                The parameters used are also not clearly explained. This article does not have enough novelty, and many papers have covered the same topic. The author needs to explain what the highlights of this research are. What is the critical variable of this research?

Response 2: The parameters used have been explained (lines 259-261). The results show that, for local visitors, distance to the nearest city is the most important variable determining the number of visitors, followed by the size of the park (lines 372-374). For foreign visitors, the age of the park is the most important predictor of the number of visitors to the park (lines 413-417).

 

Point 3:                It is recommended to add some accuracy tests such as AUC and ROC.

Response 3: AUC and ROC are frequently used for evaluating the performance of binary classification algorithms. In this study the classification problem involved more than two categories.  Thus, both approaches cannot be carried out and are not suitable to be used to measure accuracy.
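
To make the binary framing in Response 3 concrete, the sketch below computes AUC in its basic two-class form: the fraction of (positive, negative) pairs in which the positive example receives the higher score. It is an illustration only, with a hypothetical function name and made-up labels and scores; it is not taken from the manuscript. Multi-class variants that average one-vs-rest AUC values do exist, but the record gives no indication that they were applied in this study.

```python
def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = positive class) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = binary_auc(labels, scores)
print(auc)
```

The definition depends on exactly two groups, positives and negatives, which is why a problem with more than two categories has to be recast before AUC can be reported.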

 

Point 4:                Update reference line 120.

A critical issue in data mining and machine learning is underfitting and overfitting of the training data. In line with this, we should find out how to obtain the best model parameters. Previous researchers have used various heuristic and metaheuristic optimization methods for this purpose [1–4]. In addition, the split between training and testing data, for example 90:10, 80:20, 70:30, 60:40, or 50:50, also influences the accuracy of the model [5–8]. Alternatively, the training and testing data can be separated using K-fold cross-validation [9,10].

Response 4: The references at line 120 have been updated (lines 128-135).

 

Round 2

Reviewer 1 Report

Accept

Reviewer 2 Report

The authors have already addressed my questions.
