Review
Peer-Review Record

A Review on Federated Learning and Machine Learning Approaches: Categorization, Application Areas, and Blockchain Technology

Information 2022, 13(5), 263; https://doi.org/10.3390/info13050263
by Roseline Oluwaseun Ogundokun 1, Sanjay Misra 2,*, Rytis Maskeliunas 1 and Robertas Damasevicius 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 March 2022 / Revised: 30 April 2022 / Accepted: 18 May 2022 / Published: 23 May 2022
(This article belongs to the Special Issue Foundations and Challenges of Interpretable ML)

Round 1

Reviewer 1 Report

I am fine with the revisions. It would be better if the author proofreads the whole review.

Author Response


Author Response File: Author Response.pdf

Reviewer 2 Report

Federated learning has recently gained a lot of research interest, and, therefore, a systematic review of the existing solutions is really valuable. However, this paper does not provide such a systematic analysis and review of the challenges facing researchers in federated learning.

First of all, the paper is badly structured, which perhaps complicates its comprehension.

  • As it is a review, the Related Works section has to provide information on related reviews, highlighting the differences from the existing ones. The current version of the Related Works section contains brief reviews of some papers; it is not clear how they were selected. The authors did add a comparative review of papers, but these mostly relate to blockchain and federated learning. There are other interesting reviews in the field of federated learning:
  • Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection. IEEE Trans. Knowl. Data Eng. 2021, 1.
  • Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19.
  • Kholod, I.; Yanaki, E.; Fomichev, D.; Shalugin, E.; Novikova, E.; Filippov, E.; Nordlund, M. Open-Source Federated Learning Frameworks for IoT: A Comparative Review and Analysis. Sensors 2021, 21, 167.

And perhaps one of the most comprehensive works on federated learning: Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning. In Foundations and Trends® in Machine Learning; Now Publishers Inc.: Boston, MA, USA, 2021.

 

If the authors want to focus only on blockchain and federated learning, then they need to specify their research questions more precisely.

 

  • The second issue relates to the description of the analysis workflow. It is necessary to specify the research questions at the beginning, not in the middle of the paper as is done now. The authors refer to some research questions in Section 3.5 but give them only in Section 4.1. Please also check: the research questions do not match the “application area” in Table 4.
  • Another critical issue that needs to be explained: the search criteria. The authors analyze only articles and preprints. Usually, preprints are not peer-reviewed. So what is the reason for including preprints and excluding peer-reviewed conference papers? By doing so, the authors also excluded papers presented at highly ranked conferences such as Advances in Neural Information Processing Systems (formerly NIPS).
  • The definitions of horizontally partitioned and vertically partitioned data should be checked. How could parties in the vertically partitioned case have homogeneous data? I strongly recommend referring to Kairouz et al., 2021 in order to check all terms and definitions.
  • What is the difference between Tables 2 and 3?
  • The description of the whole analysis workflow should be given before presenting the search results, because currently Figures 1 and 2 are confusing.
  • When categorizing the approaches presented in the articles (Fig. 5), what is the difference between “ML approach applicable for FL” and “Implementation of ML approaches for FL”?
  • The analysis of the results for research questions RQ1 and RQ2 is quite strange. The RQ1 section is not about what ML models could be implemented in FL; the summary of this section is a table with properties of some ML models. How does this table relate to FL? The RQ2 section deals with types of FL depending on data distribution; in this sense, transfer learning using federated learning looks strange, so this decision needs to be clarified. It would be more interesting to the reader to know what FL solutions exist for the different data-partition cases and for which application domains. In fact, the RQ3 section is the most interesting and relevant to the purpose of the article. The RQ4 and RQ5 sections seem very closely related, and I suggest uniting them while paying more attention to existing schemes and approaches that use both blockchain and FL.

 

In general, the paper lacks a summary of the analyzed papers; it reads more like a brief description of papers without any attempt to analyze and systematize them.

 

Author Response

As it is a review, the Related Works section has to provide information on related reviews, highlighting the differences from the existing ones. The current version of the Related Works section contains brief reviews of some papers; it is not clear how they were selected. The authors did add a comparative review of papers, but these mostly relate to blockchain and federated learning. There are other interesting reviews in the field of federated learning.

The Related Works section has now been revised to present reviews on federated learning and to explain how those reviews differ from ours (see pages 3 to 5).

If the authors want to focus only on blockchain and federated learning, then they need to specify their research questions more precisely.

The focus of the review is on federated learning and machine learning, which is why the RQs mostly concern machine learning and federated learning. The research questions are derived from the objective of the review.

The second issue relates to the description of the analysis workflow. It is necessary to specify the research questions at the beginning, not in the middle of the paper as it is done now.

The research questions have been moved to Section 1 of the manuscript, the Introduction (see page 1).

The authors refer to some research questions in Section 3.5 but give them only in Section 4.1.

Section 3.5 only states that paper selection was based on the formulated research questions, which are discussed later in Section 4, the Results and Discussion section.

Section 3 points to Section 4 and states only that Tables 4-10 summarize some of the studies chosen based on the formulated research questions.

Please also check: the research questions do not match the “application area” in Table 4.

This has been looked into and corrected (page 10).

Another critical issue that needs to be explained: the search criteria. The authors analyze only articles and preprints. Usually, preprints are not peer-reviewed. So what is the reason for including preprints and excluding peer-reviewed conference papers? By doing so, the authors also excluded papers presented at highly ranked conferences such as Advances in Neural Information Processing Systems (formerly NIPS).

Yes, preprints as such are not peer-reviewed, but the papers in question have either already been peer-reviewed in journals or are currently under review. Preprints were included because we noticed that many journals upload, or ask authors to upload, their work to the arXiv database; most of these preprints have therefore already been reviewed and accepted for publication and were posted as preprints so that they would not be submitted to another journal or plagiarized.

 

Conference papers were excluded during the screening stage of the SLR because their scopes were not relevant to this study. The database searches returned only a few conference papers, and during screening by title, keywords, and full text, these were screened out for one reason or another.

The definitions of horizontally partitioned and vertically partitioned data should be checked. How could parties in the vertically partitioned case have homogeneous data? I strongly recommend referring to Kairouz et al., 2021 in order to check all terms and definitions.

This observation has been noted and rechecked, but horizontally partitioned data is not mentioned or discussed anywhere in the study. Vertically partitioned data is mentioned only in line 376, which states that Cheng et al., 2021 developed SecureBoost, a secure framework for vertically partitioned data sets.
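For readers of this exchange, the distinction the reviewer raises can be made concrete with a small illustrative sketch. This is not taken from the manuscript or the reviews; the dataset and party names are hypothetical, and the split follows the standard definitions in Kairouz et al., 2021.

```python
# Illustrative sketch (hypothetical data): horizontal vs. vertical partitioning.
# Rows are samples, columns are features; column layout is [age, income, label].

full_data = [
    [25, 40_000, 0],
    [37, 55_000, 1],
    [52, 72_000, 1],
    [41, 48_000, 0],
]

# Horizontal partitioning: parties share the SAME feature space but hold
# DIFFERENT samples (e.g. two hospitals with disjoint patient sets).
party_a_h = full_data[:2]   # first two samples, all features
party_b_h = full_data[2:]   # remaining samples, all features

# Vertical partitioning: parties share the SAME samples but hold DIFFERENT
# features (e.g. a bank and a retailer with overlapping customers); this is
# the setting SecureBoost (Cheng et al., 2021) targets.
party_a_v = [row[:1] for row in full_data]   # only the age column
party_b_v = [row[1:] for row in full_data]   # income and label columns
```

In the horizontal case the parties' data are homogeneous (same schema); in the vertical case they are heterogeneous by construction, which is the point of the reviewer's question.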

What is the difference between Tables 2 and 3?

The difference between Tables 2 and 3 is that Table 2 presents the keyword search string used for each of the databases, while Table 3 presents the general keyword search string. The search string in Table 3 is generalized.

The description of the whole analysis workflow should be given before presenting the search results, because currently Figures 1 and 2 are confusing.

Section 3 discusses the full methodology used for the SLR, with figures to explain the process, while Section 4 presents the results obtained from the methods and process carried out in Section 3.

 

From page 5, the flow of the paper has been rearranged to start with the results of the search using PRISMA.

 

When categorizing the approaches presented in the articles (Fig. 5), what is the difference between “ML approach applicable for FL” and “Implementation of ML approaches for FL”?

“ML approaches (now changed to ‘methods’) applicable for FL” refers to the classes of methods that FL can support, while “implementation of ML approaches (now changed to ‘algorithms’) for FL” refers to the ML algorithms that have actually been utilized in FL.

The analysis of the results for research questions RQ1 and RQ2 is quite strange. The RQ1 section is not about what ML models could be implemented in FL; the summary of this section is a table with properties of some ML models. How does this table relate to FL?

RQ1 is just about the classes of ML methods that FL can support.

The RQ presents FL and then shifts to the application of ML methods to ensure privacy and efficiency in FL.
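To make the RQ1 discussion concrete for readers of this record, the training loop that any FL-supported ML method plugs into can be sketched with minimal federated averaging (FedAvg, McMahan et al., 2017). This is an illustrative sketch, not code from the manuscript; the 1-D linear model, learning rate, and client data below are hypothetical.

```python
# Minimal FedAvg sketch (hypothetical setup): clients train a shared model on
# private data; only model updates, never raw data, reach the server.

def local_step(w, data, lr=0.1):
    """One local gradient step for a 1-D linear model y = w * x (squared loss)."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def fedavg_round(w_global, client_datasets):
    """Each client updates locally; the server averages weighted by data size."""
    updates = [(local_step(w_global, d), len(d)) for d in client_datasets]
    total = sum(n for _, n in updates)
    return sum(w_local * n for w_local, n in updates) / total

# Two clients whose private (x, y) pairs all satisfy y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]

w = 0.0
for _ in range(50):
    w = fedavg_round(w, clients)
# w converges toward the true slope 2.0 without the server ever pooling raw data.
```

The same skeleton accommodates any of the ML method classes discussed under RQ1: only `local_step` changes with the model family.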

The RQ2 section deals with types of FL depending on data distribution. In this sense, transfer learning using federated learning looks strange, so this decision needs to be clarified. It would be more interesting to the reader to know what FL solutions exist for the different data-partition cases and for which application domains.

RQ2 simply describes how FL is classified based on the distribution characteristics of the data.

Surveying the FL solutions that exist for the different data-partition cases and their application domains is now suggested as future work, since including it at this stage would mean restarting the entire SLR process. Thank you for your understanding (page 29).

In fact, the RQ3 section is the most interesting and relevant to the purpose of the article. The RQ4 and RQ5 sections seem very closely related, and I suggest uniting them while paying more attention to existing schemes and approaches that use both blockchain and FL.

RQ4 and RQ5 have been united as suggested.
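The blockchain-FL combination at issue in this exchange can be illustrated with a minimal hash-chained log of global-model updates: each FL round commits a digest of the new model to a block linked to its predecessor, so clients can detect tampering with the training history. This is a hypothetical sketch of the general idea, not a scheme from any of the reviewed papers; the block layout is invented for illustration.

```python
# Hypothetical sketch: a hash-chained ledger of FL model updates.
import hashlib
import json

def make_block(prev_hash, round_no, model_digest):
    """Build a block committing to one round's global model digest."""
    header = {"prev": prev_hash, "round": round_no, "model": model_digest}
    block_hash = hashlib.sha256(
        json.dumps(header, sort_keys=True).encode()).hexdigest()
    return {**header, "hash": block_hash}

def chain_is_valid(chain):
    """Verify every block's hash and its link to the previous block."""
    for i, block in enumerate(chain):
        header = {k: block[k] for k in ("prev", "round", "model")}
        expected = hashlib.sha256(
            json.dumps(header, sort_keys=True).encode()).hexdigest()
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

# One block per FL round, each committing to the serialized global model.
chain = [make_block("0" * 64, 0, hashlib.sha256(b"w=0.0").hexdigest())]
chain.append(make_block(chain[-1]["hash"], 1,
                        hashlib.sha256(b"w=1.9").hexdigest()))
```

Real blockchain-FL schemes add consensus, incentives, and update validation on top of this linkage, which is what the merged RQ4/RQ5 section surveys.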

In general, the paper lacks a summary of the analyzed papers; it reads more like a brief description of papers without any attempt to analyze and systematize them.

 

The gathered papers were analyzed according to the formulated RQs. It is only that we did not include empirical studies or case-study papers.

Reviewer 2

Comments

Response

I am fine with the revisions. It would be better if the author proofreads the whole review.

We have used Grammarly and Pubsure (researcher.life) software to check grammar, punctuation, and other errors in the paper. We have also engaged an English-language expert to assist with proofreading.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors addressed all the issues and recommendations. I think that the paper could be accepted in the present form.
