Peer-Review Record

A High-Performance Computing Implementation of Iterative Random Forest for the Creation of Predictive Expression Networks

Genes 2019, 10(12), 996; https://doi.org/10.3390/genes10120996
by Ashley Cliff 1,2, Jonathon Romero 1,2, David Kainer 2, Angelica Walker 1,2, Anna Furches 1,2 and Daniel Jacobson 1,2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 21 October 2019 / Revised: 23 November 2019 / Accepted: 26 November 2019 / Published: 2 December 2019
(This article belongs to the Special Issue Impact of Parallel and High-Performance Computing in Genomics)

Round 1

Reviewer 1 Report

PDF attached.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors developed a fast implementation of iterative random forest (iRF), as well as a parallel-computing version, based on an open-source C++ codebase. I feel this paper is easy to follow and practically useful, especially for large-scale industrial applications. I have only two concerns:

1. Indeed, the only currently available open-source implementation of iRF is based on R, and it suffers from several limitations, e.g., (1) it is hard to scale to large data sets; (2) it cannot be deployed on distributed systems. The authors address both limitations. However, is it fair to compare an R implementation with a C++ implementation? At the very least, the reported speed improvement may be attributable to many factors beyond the algorithm itself.

2. The authors mention that iRF can automatically determine the importance of features and even infer their interactions. On the other hand, they also mention that the Pearson correlation coefficient, mutual information, and sequential feature selection can do the same. Could the authors perform a simple comparison against the correlation coefficient or a mutual-information-based method? Such a comparison would give more insight into the pros and cons of iRF and the new C++ implementation.
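For concreteness, the kind of baseline comparison being suggested could look like the minimal sketch below: rank features by absolute Pearson correlation and by estimated mutual information, then contrast those rankings with forest-derived importances. The synthetic data, the use of scikit-learn/SciPy, and the plain random forest standing in for iRF are all illustrative assumptions, not part of the authors' C++ implementation.

```python
# Hypothetical sketch of the requested baseline comparison:
# Pearson correlation and mutual information rankings vs. forest importances.
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

X, y = make_regression(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Baseline 1: absolute Pearson correlation of each feature with the response.
corr_scores = np.array([abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])])

# Baseline 2: estimated mutual information between each feature and the response.
mi_scores = mutual_info_regression(X, y, random_state=0)

# Reference: impurity-based importances from an ordinary random forest
# (a stand-in here for iRF's iteratively re-weighted importances).
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
rf_scores = rf.feature_importances_

for name, s in [("pearson", corr_scores), ("mutual_info", mi_scores), ("rf_importance", rf_scores)]:
    top5 = np.argsort(s)[::-1][:5]
    print(f"{name:>13}: top features {top5.tolist()}")
```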

Finally, it should be pointed out that most mutual-information-based feature selection methods belong to the sequential feature selection family of algorithms. See, for example, the references below (a minimal sketch of such a greedy scheme follows them):
[1] Battiti, Roberto. "Using mutual information for selecting features in supervised neural net learning." IEEE Transactions on Neural Networks 5, no. 4 (1994): 537-550.
[2] Yu, Shujian, Luis Gonzalo Sanchez Giraldo, Robert Jenssen, and Jose C. Principe. "Multivariate Extension of Matrix-Based Rényi's α-Order Entropy Functional." IEEE Transactions on Pattern Analysis and Machine Intelligence (2019).
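To make that connection concrete, here is a minimal sketch of a Battiti-style (MIFS-like) greedy sequential selection loop: at each step, the candidate feature with the highest relevance to the target, penalized by its estimated redundancy with already-selected features, is added. The estimator choices, the beta penalty value, and the synthetic data are illustrative assumptions, not the original implementations of the cited methods.

```python
# Sketch of a MIFS-style greedy sequential feature selection loop
# (after Battiti, 1994): score(j) = I(x_j; y) - beta * sum_s I(x_j; x_s)
# over already-selected features s. Estimators and data are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import mutual_info_regression

def mifs_select(X, y, k=5, beta=0.5, seed=0):
    n_features = X.shape[1]
    relevance = mutual_info_regression(X, y, random_state=seed)  # I(x_j; y)
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in remaining:
            # Redundancy: mutual information between candidate j and each selected feature.
            redundancy = 0.0
            if selected:
                redundancy = mutual_info_regression(
                    X[:, selected], X[:, j], random_state=seed
                ).sum()
            score = relevance[j] - beta * redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

X, y = make_regression(n_samples=300, n_features=15, n_informative=4, random_state=0)
print("Greedy MIFS-style selection:", mifs_select(X, y, k=4))
```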

Author Response

Please see the attachment. 

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have addressed all my concerns well. I recommend acceptance of this manuscript.
