Article
Peer-Review Record

Genomic Marks Associated with Chromatin Compartments in the CTCF, RNAPII Loop and Genomic Windows

Int. J. Mol. Sci. 2021, 22(21), 11591; https://doi.org/10.3390/ijms222111591
by Teresa Szczepińska 1,2, Ayatullah Faruk Mollah 1,3 and Dariusz Plewczynski 1,4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 24 September 2021 / Revised: 21 October 2021 / Accepted: 22 October 2021 / Published: 27 October 2021
(This article belongs to the Special Issue Functions of Non-coding DNA Regions)

Round 1

Reviewer 1 Report

The submitted paper describes a feature selection approach based on machine learning to classify the compartment labels derived from Hi-C experiments.

Feature extraction is a very useful approach to extract biological traces; however, it is necessary to clearly explain the created dataset. 

  • Please explain more clearly the columns in the dataset (supplementary data). For each column, it should be clear where the values come from and how they are processed.
  • Please explain more about the "Labels" for the classification. The authors mentioned that they are compartment information from Hi-C data; however, I couldn't find them in the supplementary data (I may have missed them, but they need to be clearly provided as a text file or as a column in the dataset).
  • Also, a count plot of the labels is necessary. How many samples are labeled as class A and how many as class B? If the counts are not reasonably even between A and B, then there should be some strategy to handle that.
  • Even though the purpose is feature selection, the dataset should be split into train and test sets and the evaluation should take place on the test set. Please state clearly how the authors did this.
  • For the feature selection, the random forest that the authors used also provides feature importance. It can be used for the feature ranking.

Author Response

Response to Reviewer 1 Comments

The submitted paper describes a feature selection approach based on machine learning to classify the compartment labels provided from Hi-C experiment. 

Feature extraction is a very useful approach to extract biological traces; however, it is necessary to clearly explain the created dataset. 

Point 1: Please explain more clearly the columns in the dataset (supplementary data). For each column, it should be clear where the values come from and how they are processed.

 

Response 1: The description of the columns in the dataset has been extended in the Supplementary Information document. The meaning of each id extension is described in the “Creation of the features” paragraph. The list of features created from each experiment is now given explicitly after the description of each dataset and its source.

 

Point 2: Please explain more about the "Labels" for the classification. The authors mentioned that they are compartment information from Hi-C data; however, I couldn't find them in the supplementary data (I may have missed them, but they need to be clearly provided as a text file or as a column in the dataset).

 

Response 2: The description of the labels is in the Materials and Methods section of the main manuscript, in the “Compartment association” paragraph. It has also been added to the Supplementary Information document as suggested.

 

Point 3: Also, a count plot of the labels is necessary. How many samples are labeled as class A and how many as class B? If the counts are not reasonably even between A and B, then there should be some strategy to handle that.

 

Response 3: Thank you for your valuable comment. In the revised Supplementary Information, we have included a table showing the number of loops or genomic windows falling in compartment A or compartment B. For your quick reference, the table is also given below. The classes are only mildly imbalanced. As a rule of thumb, imbalance becomes a concern that needs to be addressed when the smaller class accounts for 10% or less of the samples. In the present case, the most imbalanced dataset (RNA Pol II loops) still has 17.7% of its samples in the smaller class, which is well above 10%. In addition, we have adopted a wide range of performance evaluation metrics that can reveal biased training induced by class imbalance, if any. For instance, unlike accuracy, which can mask such bias, metrics such as recall, precision, and F-scores are sensitive to it. The results presented in the manuscript show that accuracy, recall, precision, and F-scores are consistent across multiple cross-validation experiments with multiple classification models, which indicates that the mild class imbalance did not affect the results.

Supplementary Table. Distribution of loops of different types and genomic windows into compartment A or compartment B. Ambiguous loops/windows are those that are partially covered by both compartments; the unambiguous loops/windows are the remaining ones, each assigned to compartment A or compartment B.

| Loop type / window | Total number of loops / windows | Number of unambiguous loops / windows | Compartment A | Compartment B |
|---|---|---|---|---|
| CTCF convergent loops | 22,709 | 15,388 | 10,099 | 5,289 |
| CTCF tandem loops | 11,674 | 8,925 | 6,734 | 2,191 |
| RNA Pol II loops | 71,371 | 69,681 | 57,349 | 12,332 |
| Genomic windows | 30,376 | 26,336 | 9,820 | 16,516 |
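
The class balance discussed in Response 3 can be checked directly from the counts in the table above. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the "baseline" is an illustrative classifier that always predicts the larger class. It shows why accuracy alone can look acceptable under imbalance while minority-class recall exposes the problem.

```python
# Unambiguous counts (compartment A, compartment B) copied from the table above.
counts = {
    "CTCF convergent loops": (10099, 5289),
    "CTCF tandem loops": (6734, 2191),
    "RNA Pol II loops": (57349, 12332),
    "Genomic windows": (9820, 16516),
}


def minority_share(a, b):
    """Fraction of samples that belong to the smaller class."""
    return min(a, b) / (a + b)


def majority_baseline(a, b):
    """Accuracy and minority-class recall of a hypothetical classifier
    that always predicts the larger class."""
    accuracy = max(a, b) / (a + b)
    minority_recall = 0.0  # no minority sample is ever predicted
    return accuracy, minority_recall


for name, (a, b) in counts.items():
    acc, rec = majority_baseline(a, b)
    print(
        f"{name}: minority share {minority_share(a, b):.1%}, "
        f"baseline accuracy {acc:.1%}, baseline minority recall {rec:.0%}"
    )
```

For RNA Pol II loops the smaller class holds 12,332 of 69,681 samples, which is the 17.7% quoted in the response; the always-majority baseline would reach about 82% accuracy there while recovering none of the compartment B loops.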

 

Point 4: Even though the purpose is feature selection, the dataset should be split into train and test sets and the evaluation should take place on the test set. Please state clearly how the authors did this.

Response 4: Kindly note that we applied 10-fold stratified cross-validation to evaluate the features in all experiments carried out in this work (please refer to Section 4, Materials and Methods, last paragraph). In this approach, a dataset is split into 10 folds in a stratified way; 9 folds are used for training and the remaining fold for testing, and this is repeated 10 times so that every sample is used for testing exactly once and for training in the other nine rounds. For each fold we thus obtain values of the standard performance metrics, and the mean of these values across the folds is taken as the final evaluation measure. Experiments with this approach are therefore more robust than experiments with a single train/test partition.

 

Point 5: For the feature selection, the random forest that the authors used also provides feature importance. It can be used for the feature ranking.

Response 5: Thank you for your insightful comment. We agree that the classification performance of random forest may indicate feature importance as well. In fact, we did consider it in analysing feature importance and consequently in identifying important genomic marks associated with compartmentalization. This is indicated in Figure 1 as well as in the last paragraph of Section 4. Please note that the feature ranking is based on the consensus of two standard feature selection schemes, i.e., MCFS RI and the ANOVA F-statistic, and classification performance is not mixed into it. However, classification performance measures are undoubtedly useful, and they are therefore also used, in addition to the obtained feature ranking, in analysing and identifying important genomic marks.
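
Response 5 does not state how the consensus of the two rankings is formed. One common option, shown here purely as an assumption, is to rank the features under each score and order them by mean rank. The function names, feature names, and score values below are hypothetical placeholders, not results from the study.

```python
def ranks(scores):
    """Map each feature to its rank (1 = highest score).
    Ties are broken alphabetically by feature name."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: r for r, f in enumerate(ordered, start=1)}


def consensus_ranking(scores_a, scores_b):
    """Order features by the mean of their ranks under two scoring schemes.
    Both dictionaries must contain the same feature names."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    mean_rank = {f: (ra[f] + rb[f]) / 2 for f in ra}
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))


# Placeholder scores for illustration only.
mcfs_ri = {"feature_x": 0.9, "feature_y": 0.5, "feature_z": 0.1}
anova_f = {"feature_x": 120.0, "feature_y": 300.0, "feature_z": 10.0}
print(consensus_ranking(mcfs_ri, anova_f))
```

Working on ranks rather than raw scores avoids having to put the two statistics on a common scale, which is why rank aggregation is a natural choice when two selection schemes use different units.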

Reviewer 2 Report

This is a very well designed study which gives important input to the question about chromatin compartments and how to computationally predict those. This gives new insights to the difficulties in predicting with only a few features. 

A description of the cells from which the features are taken, ENCODE and GM12878, would give a further dimension and should be added.

The English should be edited such that it sounds less informal.

Author Response

Response to Reviewer 2 Comments

This is a very well designed study which gives important input to the question about chromatin compartments and how to computationally predict those. This gives new insights to the difficulties in predicting with only a few features. 

Point 1: A description of the cells from which the features are taken, ENCODE and GM12878, would give a further dimension and should be added.

Response 1: The following text describing the GM12878 cell line has been added to the manuscript (in Section 4 Materials and Methods).

“GM12878 is a lymphoblastoid cell line produced from the blood of a female donor with northern and western European ancestry by EBV transformation. It is one of the Tier 1 ENCODE cell lines, so the vast amount of sequencing data, including transcriptome, chromatin immunoprecipitation-sequencing for histone marks, and transcription factors are available for this line together with genome regulatory segmentation. This cell line has a relatively normal karyotype and it represents the mesoderm cell lineage.”

 

Point 2: The English should be edited such that it sounds less informal.

Response 2: The manuscript has been proofread by a professional English correction service, and all suggested corrections regarding grammar, punctuation, and other language issues have been carefully incorporated. The service included a check of the scientific writing by a native speaker. The text is now more formal and suitable for scientific publication.



Reviewer 3 Report

The authors used a classification approach to rank genetic markers associated with compartmentalization. They quantified a variety of markers in GM12878 cells, including GC content, histone modifications, DNA binding proteins, open chromatin, transcription, and genome regulatory segmentation. Methods for predicting the chromatin state that identify active elements such as promoters, enhancers, or heterochromatin improve the prediction of loop segregation into compartments. In comparison, the histone modifications H3K9me3, H4K20me1, and H3K27me3 and GC levels do not reliably indicate compartments. Therefore, I recommend publication of the paper after major revision according to my comments.

  • The abstract is not clear. Please add the aim and objective of the review.
  • The study's background should be clearly stated. Describe the introduction and review of the work.
  • Please speculate on the results. The discussion must improve.
  • In Conclusion, the authors should add the significance of this research, and its potential practical application.
  • The MS English needs to be improved. The article's English must be carefully checked for grammatical errors.

Author Response

Response to Reviewer 3 Comments

The authors used a classification approach to rank genetic markers associated with compartmentalization. They quantified a variety of markers in GM12878 cells, including GC content, histone modifications, DNA binding proteins, open chromatin, transcription, and genome regulatory segmentation. Methods for predicting the chromatin state that identify active elements such as promoters, enhancers, or heterochromatin improve the prediction of loop segregation into compartments. In comparison, the histone modifications H3K9me3, H4K20me1, and H3K27me3 and GC levels do not reliably indicate compartments. Therefore, I recommend publication of the paper after major revision according to my comments.

Point 1: The abstract is not clear. Please add the aim and objective of the review.

 

Response 1: As per your suggestion, we have carefully checked and revised the abstract. The aim and objective of the research carried out is now clearly indicated. The abstract section is also structured into four parts, (1) Background, (2) Methods, (3) Results, and (4) Conclusions, as recommended for this journal.

 

Point 2: The study's background should be clearly stated. Describe the introduction and review of the work.

 

Response 2: Please note that the background of the study was stated in the manuscript (Paragraphs 1-2 of the Introduction). However, as per your comment, we have thoroughly checked and revised the Introduction to better reflect the review of related works, the background, and the motivation of the current study. For your quick reference, we have extended the background description with information about the mechanisms for the formation of compartments that have been proposed so far (Paragraph 2 of the Introduction).



Point 3: Please speculate on the results. The discussion must improve. 

 

Response 3: The discussion section has been extended with a reference to the previous study by Falk et al. (Nature 2019). In that study, the authors indicate the importance of heterochromatin–heterochromatin interactions for compartmentalization, based on modeling of Hi-C data and microscopic data from conventional nuclei and inverted rod nuclei. Our study is in agreement with this finding. However, among the candidate mediators of heterochromatin–heterochromatin interactions, the authors suggested modified histones, which is not supported by our analysis.

 

Point 4: In Conclusion, the authors should add the significance of this research, and its potential practical application.

 

Response 4: We have extended the Conclusion paragraph by including the importance of this research for further studies on compartmentalization and its potential practical applications. In particular:

  1. The ranking of the genomic marks presented here is a useful guide for further studies on the factors that are responsible for compartmentalization.
  2. It is also a helpful hint about chromatin modifications that can be used as compartment indicators in microscopic studies.

 

Point 5: The MS English needs to be improved. The article's English must be carefully checked for grammatical errors.


Response 5: The manuscript has been proofread by a professional English correction service, and all suggested corrections regarding grammar, punctuation, and other language issues have been carefully incorporated. The service included a check of the scientific writing by a native speaker. The text is now more formal and suitable for scientific publication.

Round 2

Reviewer 3 Report

Requested corrections were completed.
