Article
Peer-Review Record

Genome-Wide Association Study (GWAS) of the Agronomic Traits and Phenolic Content in Sorghum (Sorghum bicolor L.) Genotypes

Agronomy 2023, 13(6), 1449; https://doi.org/10.3390/agronomy13061449
by Ye-Jin Lee 1,2, Baul Yang 1,2, Woon Ji Kim 1, Juyoung Kim 1, Soon-Jae Kwon 1, Jae Hoon Kim 1, Joon-Woo Ahn 1, Sang Hoon Kim 1, Eui-Shik Rha 2, Bo-Keun Ha 3, Chang-Hyu Bae 2,* and Jaihyunk Ryu 1,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 26 April 2023 / Revised: 19 May 2023 / Accepted: 22 May 2023 / Published: 24 May 2023
(This article belongs to the Special Issue Advances in Plant Genetic Breeding and Molecular Biology)

Round 1

Reviewer 1 Report

This article describes a genome-wide association study (GWAS) of agronomic traits and phenolic content in 96 sorghum varieties to identify significant SNPs associated with the phenotypic traits of this crop that can increase its value in the bioindustry for the production of biofuels and bioplastics.

The article complies with the journal's requirements and reports a well-designed and well-conducted experiment. In my opinion, the article is well organized and the proposed methods are of value for this line of research.

After reading the article, I have some comments and suggestions for the authors.

1. The "keywords" section should be given more broadly, adding such terms as "agronomic characteristics of sorghum", "phenolic compounds", for example.

2. In table 1, the column "Origin" contains data that are not correlated with each other: for some samples, the country of origin is indicated, for others it is not indicated. Authors need to bring the data into a uniform form, or mark «*» and make a note to the table that the marked data originate from the country of Korea, if the samples analyzed by them have the same country of origin as the authors.

3. The authors indicate that they used data from the NCBI database. These accessions should be given an Accession number in GeneBank, in case readers wish to use the same genetic data as the authors of this study.

4. In Table 1, you also need to indicate what the given figures are for the parameters "heading date", "plant height", "soluble solids content" "dry yield", "total phenolic content". The authors need to explain what these numbers are: average values, minimum values? It may be worthwhile for the authors to provide a range of values of their choice.

5. Also in Table 1, the column titled "Treatment (part)" can be deleted, and for the corresponding samples, a note and a footnote can be made in the note.

6. In section 2.3. "Ultra-High-Performance Liquid Chromatography (UPLC) Analysis" it is necessary to give an explanation on the basis of which database the peaks of the components were determined.

7. In section 2.5, “Genome-Wide Association Study (GWAS) with Agronomic Traits and Phenolic Compounds”, the authors should explain why they chose such a lenient significance threshold (“p-value was set to 0.0001”); in the GWAS literature, the threshold is most often set at P = 5 × 10⁻⁸. The authors need to make it clear that, with their chosen parameters, "they do not exclude the presence of false positives". This is very important, since the levels chosen for this study are "not rigorous enough". Although the authors refer to another study (A genome-wide association study of seed protein and oil content in soybean), it does not provide any statistical justification for this level of significance.

8. In tables 2,3, etc., it is desirable to make all the explanations related to the use of abbreviated terms: skew, kurt, CV, etc., despite the fact that they may occur in the text.

9. In figure 1, the authors need to indicate the units of measurement on the axes of the graphs, either in a note to the figure or in the figure itself.

10. Figure 2 is very small and unreadable. Authors should present it in a different format (for example, not for individual genetic lines, but for the predominant phenolic component - in the form of a bar chart or other type of graphs), or submit it as an appendix with good resolution so that any reader can familiarize themselves with it in more detail.

11. Lines 230-231 and 453-454:

The authors write "The strong positive correlations were shown between PH and DY (r = 0.60, P < 0.001), and HD and PH (r = 0.40, P < 0.001)", but this is not entirely true. While one can still agree with the first statement, values of r < 0.5 cannot be regarded as a strong correlation, and the authors should revise the wording. Different authors consider different correlation values to be strong; however, under any of these definitions, a strong correlation requires r > 0.6-0.8, depending on the source.

12. In conclusion, the authors should add a few sentences about which candidate genes (names of enzymes, without specialized abbreviations like SbRio.07G123800) in their opinion represent the greatest prospects for research. Similar abbreviations should also be changed in the abstract at the beginning of the article.
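
As a rough illustration of comment 7, the sketch below compares a Bonferroni-corrected threshold with the fixed cutoff of 0.0001 and estimates how many markers would pass that cutoff by chance alone. The marker count `N_MARKERS` is a hypothetical placeholder, not a figure from the manuscript, and the function names are illustrative. The calculation also assumes independent tests, which linked SNPs do not strictly satisfy.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_false_positives(p_threshold: float, n_tests: int) -> float:
    """Expected number of null markers passing the cutoff, assuming independent tests."""
    return p_threshold * n_tests


# Hypothetical number of tested SNPs (assumption, not taken from the manuscript).
N_MARKERS = 100_000
ALPHA = 0.05
FIXED_CUTOFF = 1e-4  # the cutoff quoted in comment 7

if __name__ == "__main__":
    print("Bonferroni cutoff:", bonferroni_threshold(ALPHA, N_MARKERS))
    print("Expected chance hits at 1e-4:", expected_false_positives(FIXED_CUTOFF, N_MARKERS))
```

With one million independent tests, the Bonferroni cutoff at α = 0.05 equals the conventional 5 × 10⁻⁸. A fixed cutoff of 0.0001 lets the expected number of chance hits grow in proportion to the number of markers tested.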

Author Response

Author's Responses to Reviewer's Comments
Ms Number: Agronomy-2394025
Dear editor and reviewers
On behalf of my co-authors, I thank you for giving us the opportunity to revise our manuscript. 
We appreciate the positive and constructive comments and suggestions from the editor and 
reviewers on our manuscript entitled “Genome-Wide Association Study (GWAS) of the 
Agronomic Traits and Phenolic Content in Sorghum (Sorghum bicolor L.) Genotypes”.
We have reviewed our manuscript carefully and revised it based on the editor's and 
reviewers’ comments. We marked the changes in red in the revised paper. We appreciate the 
editor's and reviewers’ kind work and hope that the corrections will meet with approval. 
The main corrections in our manuscript and our responses to the reviewers’ comments are 
as follows:
REVIEWER COMMENTS: 
Reviewer 1
1. The "keywords" section should be given more broadly, adding such terms as "agronomic characteristics of 
sorghum", "phenolic compounds", for example.
[Response] As suggested by the reviewer, we've added terms to the "Keywords" section.
- Keywords: Sorghum; Biomass yield; Phenolic compounds; Mutation breeding; Genome-wide 
association studies (GWAS); Single-nucleotide polymorphisms (SNP) (Lines 36 to 37)
2. In table 1, the column "Origin" contains data that are not correlated with each other: for some samples, the 
country of origin is indicated, for others it is not indicated. Authors need to bring the data into a uniform 
form, or mark «*» and make a note to the table that the marked data originate from the country of Korea, if 
the samples analyzed by them have the same country of origin as the authors.
3. The authors indicate that they used data from the NCBI database. These accessions should be given an 
Accession number in GeneBank, in case readers wish to use the same genetic data as the authors of this 
study.
- [Response] Unfortunately, some resources had no origin information in the Korea GeneBank. As 
suggested, we have removed the uncorrelated data from Table 1 and restated the origin of the samples.
- We added GeneBank accession numbers to Table 1.
4. In Table 1, you also need to indicate what the given figures are for the parameters "heading date", "plant 
height", "soluble solids content" "dry yield", "total phenolic content". The authors need to explain what these 
numbers are: average values, minimum values? It may be worthwhile for the authors to provide a range of 
values of their choice.
- [Response] The results in Table 1 (section 3.1) show the minimum, maximum, and average values 
for the 96 individuals. We modified the sentence to include information about the values to improve 
readers' comprehension.
- HD ranged from a minimum of 58.0 days (Banwoldang-7) to a maximum of 115.0 days 
(Pioneer-931), with an average of 87.8 days. PH varied from a minimum of 89.0 cm (Dansusu 2-8) to a 
maximum of 465.0 cm (DINE-A-MITE-1), with an average of 282.0 cm. SC ranged from a minimum 
of 5.0 °Brix (IS5718 and Dansusu 2-8) to a maximum of 18.8 °Brix (Dansusu1 and Dansusu4), with 
an average of 13.4 °Brix. DY ranged from a minimum of 2.4 tons/ha (Dansusu 2-8) to a maximum of 
26.1 tons/ha (IS645-3 and Moktak-2), with an average of 13.0 tons/ha. (Lines 187 to 193)
5. Also in Table 1, the column titled "Treatment (part)" can be deleted, and for the corresponding samples, a 
note and a footnote can be made in the note.
- [Response] As suggested, we have modified Table 1 to remove the treatment column and added a 
footnote to provide information about the treatment.
6. In section 2.3. "Ultra-High-Performance Liquid Chromatography (UPLC) Analysis" it is necessary to give 
an explanation on the basis of which database the peaks of the components were determined.
- [Response] We added a reference in section 2.3: “Luteolinidin diglucoside, luteolin glucoside, apigeninidin 
glucoside, luteolinidin, apigeninidin, and 5-O-Me-luteolinidin were identified using the method described in 
a previous study [31].”
7. In section 2.5. “Genome-Wide Association Study (GWAS) with Agronomic Traits and Phenolic 
Compounds”, the authors should explain why they chose such a high level of significance “p-value was set 
to 0.0001”, in the literature on GWAS, significance levels are most often set (P = 5 × 10⁻⁸ threshold). The 
authors need to make it clear that with their chosen parameters, "they do not exclude the presence of false 
positives" - this is very important, since the levels chosen for this study are "not rigorous enough". Although 
the authors refer to another study (A genome-wide association study of seed protein and oil content in 
soybean), it does not provide any statistical justification for this level of significance.
- [Response] We revised the reference to provide statistical justification (Lines 733 to 735; Reference 39). 
We tried to set the p-value threshold using the Bonferroni method, but this method was too conservative to 
detect SNPs associated with the traits. Therefore, we arbitrarily set the p-value threshold (0.0001) based on 
the type I error rate to select SNPs associated with the traits.
8. In tables 2,3, etc., it is desirable to make all the explanations related to the use of abbreviated terms: skew, 
kurt, CV, etc., despite the fact that they may occur in the text.
- [Response] As suggested, we've added abbreviated terms to all tables in the footnotes.
9. In figure 1, the authors need to indicate the units of measurement on the axes of the graphs, either in a 
note to the figure or in the figure itself.
- [Response] We revised Figure 1 to indicate the units of measurement and the chemical structures of 
the sorghum phenolics.
10. Figure 2 is very small and unreadable. Authors should present it in a different format (for example, not 
for individual genetic lines, but for the predominant phenolic component - in the form of a bar chart or other 
type of graphs), or submit it as an appendix with good resolution so that any reader can familiarize 
themselves with it in more detail.
- [Response] We revised Figure 2 to show two selected genotypes (maximum and minimum) and added 
a figure for total phenolic compounds.
11. Lines 230-231 and 453-454:
The authors write "The strong positive correlations were shown between PH and DY (r = 0.60, P < 0.001), 
and HD and PH (r = 0.40, P < 0.001)", but this is not entirely true. If one can still agree with the first 
statement, then the values of r < 0.5 cannot be recognized as a strong correlation, the authors should redo the 
wording. The fact is that different authors consider different values of the correlation as strong, however, 
with any of the formulations, a strong correlation is r > 0.6-0.8 according to different sources.
[Response] We have removed data with low correlation coefficient from the sentence.
12. In conclusion, the authors should add a few sentences about which candidate genes (names of enzymes, 
without specialized abbreviations like SbRio.07G123800) in their opinion represent the greatest prospects for 
research. Similar abbreviations should also be changed in the abstract at the beginning of the article.
[Response] As suggested, we added the annotated gene name.
- GWAS analysis showed that SbRio.10G099600 (FUT1) was associated with heading date, 
SbRio.09G149200 with plant height, SbRio.06G211400 (MAFB) with dry yield, SbRio.04G259800 
(PDHA1) with total phenolic content and luteolinidin diglucoside, and SbRio.02G343600 
(LeETR4) with total phenolic content and luteolinidin, suggesting that these genes could play a 
key role in sorghum. (Lines 29 to 33)
- A total of 40 SNPs were identified as highly associated with the investigated traits. From these 
40 significant SNPs, we selected five strong candidate genes [HD: SbRio.10G099600 (FUT1); 
PH: SbRio.09G149200; DY: SbRio.06G211400 (MAFB); TPC and luteolinidin: SbRio.02G343600 
(LeETR4); TPC and luteolinidin diglucoside: SbRio.04G259800 (PDHA1)] that are thought to be 
closely related to each trait. (Lines 622 to 627)

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript “Genome-Wide Association Study (GWAS) of the Agronomic Traits and Phenolic Content in Sorghum (Sorghum bicolor L.) Genotypes” examines the association of mutations (specifically SNPs) induced by gamma rays, as well as variation observed among existing cultivars, with several commercially important traits in sorghum.

In general, the relevance of the study is clear and well-justified. The methods and experiments were done in a logical order, but the manuscript needs work: many details regarding the methods and QC protocols need to be included before the manuscript can be considered acceptable for publication. Results (like QQ plots) should be included, and supplementary tables should be provided for complete review. The manuscript's grammar needs to be improved as well (minor improvements).

Major and minor problems found in the manuscript are described below.

Abstract:

Line 21: The expression “Our detected six phenolic compounds (luteolinidin diglucoside, luteolin glucoside, apigeninidin glucoside, luteolinidin, apigeninidin, and 5-O-Me luteolinidin), with luteolinidin being the major phenolic in all genotypes.” needs clarification and grammar correction.

Line 27 to 30: The authors indicate that “SbRio.10G099600 was associated with heading date, SbRio.09G149200 with plant height, SbRio.06G211400 with dry yield, SbRio.04G259800 with total phenolic content and luteolinidin diglucoside, and SbRio.02G343600 with total phenolic content and luteolinidin”. This is very interesting. To the knowledge of the authors, are any of those genes annotated? If so, what are their predicted functions based on their annotation?  I think that detail for the genes found could be included briefly in the abstract too.

Keywords: Maybe adding biomass to the keywords since the traits that were studied are associated with biomass. That will give the study more visibility when published.

Introduction:

Line 51 to 53. The expression “While natural mutations have a very low mutation rate of 10⁻⁵ to 10⁻⁸, ionizing radiation can increase the mutation rate by about 1,000 to 1 million-fold compared to natural mutations [10-12].” needs clarification and grammar correction.

Lines 49 to 57: I recommend moving the aim of the study to the end of the introduction and explaining whether the authors used ionizing radiation to generate the genotypes analyzed in this study. Otherwise, it seems odd to mention facts about ionizing radiation and leave the reader to wonder why they are mentioned until the end. It also seems odd to have the aim of the study mentioned twice (lines 82 to 85).

Methods:

Line 119 and 120: Table 1 appears before the authors mention “heading date (HD), plant height (PH), fresh yield (FY), dry yield (DY), and soluble solids content (SC)” in the methods. I recommend that the table appear after the authors have introduced the abbreviations of each trait, so the reader can look at the table and understand the abbreviations without having to refer “back and forth” to the end of the table every time they want to look at a specific trait. Also, the authors should make sure that the abbreviations used throughout the manuscript are consistent with standard abbreviations used for the same traits in previous literature. For example, in rice literature, SES (Standard Evaluation System for Rice) established that the abbreviation for plant height is “Ht”, not “PH”, given that “PH” could be confused with “pH”. Using a standard abbreviation system helps readers understand the abbreviations without having to look up what the authors’ own abbreviation stands for. So, I recommend that the authors make sure they are using a standard abbreviation system for their traits.

Lines 120 to 129: The description of each trait (when and how each trait was measured) needs clarification. If the heading date is at 50% heading, then indicate that it corresponds to days between sowing and 50% heading. The way traits are described is very confusing. Is the fresh and dry yield just the fresh weight and dry weight (tons) divided by value? What is the value? Not clear from this description what that value was. For example, what do the authors mean by dry matter ratio? Authors need to improve this, otherwise, the reader has no idea of what each trait is.

Lines 135 to 136: The expression “The filtered extracts were transferred to 2 ml vials and analyzed using the UPLC system (CBM-20A, Shimadzu Co., Kyoto, Japan). Each sample extract was analyzed using a UPLC system coupled with a photodiode array detector (DAD; Agilent 1260 series; Agilent Technologies, Santa Clara, CA, USA)…” is redundant. Please improve the paragraph to remove the redundancy.

Lines 141 to 142: Authors indicate that “The mobile phase consisted of 0.05% formic acid in water (A) and acetonitrile (B) with 0.05% formic acid.” Is the last just a mixture of formic acid in acetonitrile? So, it would be 0.05% formic acid in acetonitrile. Or is this a ratio of acetonitrile mixed with diluted 0.05% formic acid in water? Needs clarification.

Line 161: Which version of the sorghum genome was used? In which version of Phytozome was it found? There are plenty of different versions of Sorghum bicolor genomes in Phytozome and there are plenty of Phytozome databases. This is an important detail to specify. I see that the Sorghum bicolor Rio_v2 genotype is mentioned, but versions normally have a second number, like Sorghum bicolor Rio_v2.1. Please fix this every time the Sorghum bicolor Rio_v2 genotype is mentioned.

Line 164 to 167: What do the authors mean by “Common SNPs from the reference sequence were selected to separate genotypes in the SNP matrix, and polymorphic SNPs were selected by comparing the common SNPs with the base sequence of each Sorghum bicolor Rio_v2 genotype.”? This idea needs clarification. I would think that all SNPs found need to be used for the analysis. Some of them would be specific to the cultivar (because it differs from the reference used for aligning reads), but others would be induced by the ionizing radiation treatment applied to the cultivars. If that is the case, the SNPs induced by the treatment in each cultivar should be identified by comparing the SNPs found in the untreated cultivar with those found in the treated one. That way, the authors would know whether an SNP associated with a trait was obtained through the ionizing treatment (an induced mutation) or was already present in the cultivar genotype.

Line 169 to 178: How do the authors guarantee to the readers that the SNPs found are not due to sequencing errors or contamination? Were replicates of each DNA sample sent for GBS? What was the original sequencing depth, and was any depth threshold set during the analysis to call a genotype (for example, genotypes with a depth lower than 5 would be considered missing)? Those are important details that are not mentioned in the methods at all. Illumina sequencing has a low error rate, but errors still occur. How do the authors make sure an SNP was not a sequencing error, besides filtering the VCF file? Please indicate in the methods which quality control parameters were used during the analysis to make sure the SNPs obtained were not due to error. For example, what was defined as a low-quality read? What thresholds were used for trimming adapters from the reads? Or was this left for the sequencing company to decide?
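
The depth threshold mentioned above can be sketched as a simple per-genotype mask. This is a minimal illustration, assuming each call is available as a (genotype, read depth) pair; the cutoff of 5 is the example value given in the comment, and `mask_low_depth` and `missing_rate` are hypothetical helper names, not part of the authors' pipeline.

```python
from typing import List, Optional, Tuple

# Example cutoff from the comment above: calls with depth below 5 become missing.
MIN_DEPTH = 5

Call = Tuple[str, int]  # (genotype string such as "0/1", read depth)


def mask_low_depth(calls: List[Call], min_depth: int = MIN_DEPTH) -> List[Optional[str]]:
    """Return genotypes, with None wherever the read depth is below min_depth."""
    return [gt if depth >= min_depth else None for gt, depth in calls]


def missing_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of samples whose genotype at this site is missing."""
    return sum(g is None for g in genotypes) / len(genotypes)


if __name__ == "__main__":
    # Made-up calls for one SNP across three samples.
    site = [("0/1", 12), ("1/1", 3), ("0/0", 5)]
    masked = mask_low_depth(site)
    print(masked, missing_rate(masked))
```

Reporting the cutoff together with the resulting per-site missing rate would let readers judge how many calls rest on only a few reads.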

Results:

Figure 1. The Y-axis should be given the same height in each panel of this figure, so the reader can clearly see from the different peak heights that the phenolic content of the highest TPC line (B) is very different from that of the original sorghum cultivar (A). Each figure in a manuscript should speak for itself at first glance, without the reader having to read any description. So, I recommend setting the Y-axis to the same range, from 0 to 400 mAU, and naming the compounds in the figure instead of using numbers. By the way, there are two numbers “1” of different sizes in panel A. That would be avoided if names were used.

Figure 2. This figure is impossible to read. I would not use it unless the reader can clearly read the text in the figure. Make the text bigger or lose the figure. It does not tell me anything about your results the way it is now because it is not readable at all. Look for a better way to represent what you want to represent here, but do not use this figure the way it is now, it is not informative. A better way to represent the author’s idea is to use 2D graphs with the same x-axis and y-axis coordinates for each of the varieties, like in Figure 1, or make box plots with the concentration of each of the six compounds found in each of the varieties analyzed and make a composite figure.

Figure 3. Would recommend making the font size bigger for the legend and the foot note.

Lines 253 to 254. The authors indicate that “The total length of the mapped region was 4,968.2 Mb, with an average of 51.7 Mb per sample, which covered approximately 7.09% of the reference genome sequence. Among the 96 lines, the average depth of the mapped region ranged from 4.00 to 16.37 (Table S1).” If the 7.09% is true, this means that the sequencing coverage was very low, not GBS but more like skim sequencing, and conclusions from this study should be interpreted carefully. Check the calculations for coverage, because 7.09% seems really low, yet the number of SNPs per sample is huge. So, what could be expected for the other 93% of the genome? I think the calculations might be wrong. The sorghum genome size is 729,379,862 bp or ~730 Mb. If the total length of the mapped region is 4,968.2 Mb, and the average per sample is 51.7 Mb, the coverage of the reference genome is bigger than 7%. I recommend checking all calculations again.
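
The arithmetic behind this comment can be checked directly. The sketch below uses only the figures quoted above (total mapped length, number of lines, per-sample average, and the genome size of 729,379,862 bp). It assumes that 1 Mb = 10^6 bp and that a coverage percentage is the mapped length divided by the genome size; both are assumptions, since the manuscript's own definition of coverage is not given here.

```python
GENOME_BP = 729_379_862        # sorghum genome size quoted in the comment
TOTAL_MAPPED_MB = 4968.2       # total mapped length over all lines, as quoted
MEAN_MAPPED_MB = 51.7          # reported per-sample average
N_LINES = 96                   # number of lines in the study


def percent_of_genome(mapped_mb: float, genome_bp: int = GENOME_BP) -> float:
    """Mapped length as a percentage of the genome, assuming 1 Mb = 1e6 bp."""
    return 100.0 * mapped_mb * 1e6 / genome_bp


per_sample_mb = TOTAL_MAPPED_MB / N_LINES
pct_per_sample = percent_of_genome(MEAN_MAPPED_MB)
pct_summed = percent_of_genome(TOTAL_MAPPED_MB)

if __name__ == "__main__":
    print(f"mean mapped length per line: {per_sample_mb:.2f} Mb")
    print(f"per-sample share of genome:  {pct_per_sample:.2f} %")
    print(f"summed share of genome:      {pct_summed:.1f} %")
```

Under these assumptions, the per-sample average corresponds to roughly 7.09% of the genome, whereas the length summed across all 96 lines exceeds the genome size several times over. The two readings of "mapped region" therefore give very different coverage figures, and the manuscript should state which one is meant.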

Table 5. The unit symbol should be Kb, not Kbs.

Line 280: How was the LOD threshold selected? Was this an arbitrarily selected threshold? Please indicate, it is an important detail.

Figure 4. I believe that this figure should include the SNPs found in Table 6, since several models support them as well. What do the colors mean? There is no description of the colors in the legend. Why was just one SNP selected for each trait? Again, figures should speak for themselves. Think about that when making a figure for a reader to understand at a single glance. I do not completely understand why these SNPs are shown and others are not. If space is lacking, then include the figure in the supplementary materials, but use all supported SNPs and organize them by trait and p-value. In your figure, the way it is right now, you are showing an SNP that has no statistically significant differences between genotypes for the green trait.

Figure 5. Same comments as for Figure 4. What do the colors mean? There is no legend. Why was just one SNP selected for each trait? Which panel corresponds to which trait? Again, figures should speak for themselves. Think about that when making a figure for a reader to understand at a single glance. I do not completely understand why these SNPs are shown and others are not. If space is lacking, then include the figure in the supplementary materials, but use all supported SNPs and organize them by trait and p-value.

No QQ plot is shown for the GWAS results. Needs to include QQ plots for the reader to evaluate whether there were confounders that affected the study or not. For example, how do the authors check that the existence of population structure is not affecting the GWAS results? No description was found in the methods for this quality control step.
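
One common way to summarize what a QQ plot shows is the genomic inflation factor λ, the median observed test statistic divided by its expected median under the null. The sketch below is a minimal illustration using only the Python standard library; it assumes 1-df chi-square association tests and p-values strictly greater than 0, and the function names are hypothetical rather than taken from any GWAS package.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_p(p: float) -> float:
    """1-df chi-square statistic whose upper-tail probability equals p (0 < p <= 1)."""
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Lambda: median observed chi-square divided by the null median."""
    return median(chi2_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """(expected, observed) -log10(p) pairs, ordered from most to least significant."""
    n = len(p_values)
    observed = sorted(p_values)
    return [
        (-math.log10((i + 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed)
    ]


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated null scan.
    null_like = [(i + 0.5) / 1000 for i in range(1000)]
    print("lambda for null-like p-values:", round(genomic_inflation(null_like), 3))
```

Values of λ close to 1 suggest little inflation, while values well above 1 can indicate uncorrected population structure or other confounding. The pairs returned by `qq_points` are the coordinates that would be drawn in the QQ plot itself.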

Discussion:

Lines 391-394: Authors indicate that “Previous studies on mutation breeding have demonstrated that gamma rays can successfully induce mutations in quantitative traits of sorghum, such as increased grain and biomass yields, as well as improved nutritional value for food and fodder quality”. What kind of mutations are the most commonly induced (SNPs, translocations, deletions)?

Lines 412 to 414: The idea seems incomplete in the following sentence: “Grain sorghums that have juicy stems usually vary in their final grain yields, but it is not clear whether this is related to dry [45].” What do the authors mean by “dry”? Dry yield?

Lines 414 to 416: The authors introduced the idea of a stay-green trait. But there is no description of this trait in the current study until the discussion, which seems confusing. What is the purpose of the description of this trait here? Please explain or clarify in the discussion.

Line 419: The authors write “an FNE-mapping population”. Isn't it “fine-mapping” instead? Please correct it.

Lines 466 to 474: The paragraph seems to repeat information that was already presented in the results. Please do not include results in the discussion section. Explain how a higher number of reads is beneficial for detecting significant SNPs in this case. The idea is not clear and does not seem relevant to this discussion.

Line 494. Is there any annotation for the gene SbRio.10G099600? If there is, please indicate it in the discussion since it seems to be the most relevant finding from the study.

Line 596 to 598. Authors indicate “The study demonstrated that GWAS is a powerful tool for identifying the genetic factors that contribute to these important traits”. I suggest that they modify the sentence to: “The study demonstrated that GWAS is a powerful tool for identifying potential genetic factors that contribute to these important traits”. Because there is no validation study made specifically on the genes to conclude those candidates are responsible for the phenotypes observed. The coverage of the genome according to the results was low and there could be other mutations present in the genome (caused by the treatment) that could not be detected by RAD-Seq and cannot be excluded by the authors.

 

No supplementary materials were provided, and therefore I could not examine the supplementary tables for this manuscript. I am not sure why, since many tables are cited in the text as supplementary.

Needs improvement. A native English speaker should read the manuscript and help to improve cohesion and coherence.

Author Response

Author's Responses to Reviewer's Comments
Ms Number: Agronomy-2394025
Dear editor and reviewers
On behalf of my co-authors, I thank you for giving us the opportunity to revise our manuscript. We 
appreciate the positive and constructive comments and suggestions from the editor and reviewers on our 
manuscript entitled “Genome-Wide Association Study (GWAS) of the Agronomic Traits and Phenolic 
Content in Sorghum (Sorghum bicolor L.) Genotypes”.
We have reviewed our manuscript carefully and revised it based on the editor's and reviewers’ 
comments. We marked the changes in red in the revised paper. We appreciate the editor's and 
reviewers’ kind work and hope that the corrections will meet with approval. The main corrections 
in our manuscript and our responses to the reviewers’ comments are as follows:
REVIEWER COMMENTS: 
Reviewer 2
1. Line 21: The expression “Our detected six phenolic compounds (luteolinidin diglucoside, 
luteolin glucoside, apigeninidin glucoside, luteolinidin, apigeninidin, and 5-O-Me luteolinidin), 
with luteolinidin being the major phenolic in all genotypes.” Needs clarification and grammar 
correction.
[Response] As suggested by the reviewer, we have modified the sentence.
- Six phenolic compounds, including luteolinidin diglucoside, luteolin glucoside, apigeninidin 
glucoside, luteolinidin, apigeninidin, and 5-O-Me luteolinidin and luteolinidin was found to 
be the major phenolic compound in all genotypes (Lines 21 to 23).
2. Line 27 to 30: The authors indicate that “SbRio.10G099600 was associated with heading date, 
SbRio.09G149200 with plant height, SbRio.06G211400 with dry yield, SbRio.04G259800 with 
total phenolic content and luteolinidin diglucoside, and SbRio.02G343600 with total phenolic 
content and luteolinidin”. This is very interesting. To the knowledge of the authors, are any 
of those genes annotated? If so, what are their predicted functions based on their 
annotation? I think that detail for the genes found could be included briefly in the abstract too.
[Response] We added the annotated gene names.
- GWAS analysis showed that SbRio.10G099600 (FUT1) was associated with heading date, 
SbRio.09G149200 with plant height, SbRio.06G211400 (MAFB) with dry yield, SbRio.04G259800 
(PDHA1) with total phenolic content and luteolinidin diglucoside, and SbRio.02G343600 
(LeETR4) with total phenolic content and luteolinidin, suggesting that these genes could play a 
key role in sorghum. (Lines 29 to 33)
3. Keywords: Maybe adding biomass to the keywords since the traits that were studied are 
associated with biomass. That will give the study more visibility when published.
[Response] As suggested by the reviewer, we've added terms to the "Keywords" section.
- We changed it; Keywords: Sorghum; Biomass yield; Phenolic compounds; Mutation breeding; 
Genome-wide association studies (GWAS); Single-nucleotide polymorphisms (SNP) (Lines 36 
to 37)
4. Line 51 to 53. The expression “While natural mutations have a very low mutation rate of 
10⁻⁵ to 10⁻⁸, ionizing radiation can increase the mutation rate by about 1,000 to 1 million-fold 
compared to natural mutations [10-12].” needs clarification and grammar correction.
[Response] As suggested by the reviewer, we have modified the sentence.
- Natural mutations occur at a very low rate (10⁻⁵ to 10⁻⁸), whereas ionizing radiation can 
increase the mutation rate by approximately 1,000 to 1 million times compared to natural 
mutations [10-12]. Ionizing radiation is a simple, economical, eco-friendly, and convenient process 
that can be used under safe, well-defined, and controlled operating parameters [13,14].
5. Lines 49 to 57: I will recommend moving the aim of the study to the end of the introducti
on and explaining whether the authors used ionizing radiation to generate the genotypes an
alyzed in this study. Otherwise, it seems odd to mention facts about ionizing radiation and 
leave it to the reader to wonder why they are mentioning this until reading it at the end. It
also seems odd to have the aim of the study mentioned twice (lines 82 to 85).
[Response] As suggested by the reviewer, we have modified the sentence.
- We changed it; Mutation breeding techniques increase the probability of a mutation occurring in 
nature. Natural mutations occur at a very low rate (10⁻⁵ to 10⁻⁸), whereas ionizing radiation can
increase the mutation rate by approximately 1,000 to 1 million times compared to natural mutations 
[10-12]. Ionizing radiation is a simple, economical, eco-friendly, and convenient process that can be 
used under safe, well-defined, and controlled operating parameters [13,14]. The advantage of 
mutation breeding is that only a subset of the original traits can be modified, and it is particularly 
effective for changes in chemical compound compositions [15]. To date, radiation breeding has 
resulted in the development of over 210 species and around 3,402 varieties, including 20 sorghum 
varieties [16]. This approach is widely employed in breeding programs due to its ability to rapidly 
enhance crops and increase genetic diversity.
- The aim of this study is to investigate the genetic variability of sorghum using high-density 
SNP data from a sorghum population consisting of 59 radiation-induced mutant lines and 3
7 sorghum genetic resources and to detect candidate genes for key bioindustry-related traits 
that may affect biomass yield and chemical treatment through GWAS analysis. (Lines 87 to 
91)
Methods:
6. Line 119 and 120: Table 1 appears before the authors mention “heading date (HD), plant
height (PH), fresh yield (FY), dry yield (DY), and soluble solids content (SC)” in the
methods. I recommend that the table appear after the authors have introduced the
abbreviations of each trait, so the reader can look at the table and understand the
abbreviations without having to refer to the end of the table every time they want to
look at a specific trait. This will make it easier for the reader. Also, the authors should
make sure that the abbreviations used throughout the manuscript are consistent with the
standard abbreviations used for the same traits in previous literature. For example, in
rice literature, SES (Standard Evaluation System for Rice) established that the
abbreviation for plant height is “Ht”, not “PH”, given that “PH” could be confused with
“pH”. Using a standard abbreviation system helps readers understand the abbreviations
without even having to look up what the authors' made-up abbreviation stands for. So, I
recommend that the authors make sure they are using a standard abbreviation system for
their traits.
[Response] Thank you for your comments. Below we cite papers that use the same abbreviations for
agricultural traits as this study. We have repositioned Table 1 so that it follows the definitions of the
abbreviations. Ji et al (2022) used the abbreviations 'heading date (HD)' and 'plant height (PH)', and Zou et al
(2011) used the abbreviations 'heading date, days (HD)', 'plant height, cm (PH)', and 'sugar concentration,
brix (SC)'. Dalla Marta et al (2014) also used the abbreviation 'sugar concentration, brix (SC)'. Habyarimana
et al (2018) used the abbreviation 'dry biomass yield, t∙ha⁻¹ (DY)', and Hussain et al (2007) used the
abbreviation 'dry matter yield, t/ha (DY)'. The abbreviations we used are commonly used in the literature,
and therefore the abbreviations shown in the text are unlikely to confuse the reader.
Ji, G.; Wang, J.; Zhang, Z.; Shi, Y.; Du, R.; Jiang, Y.; Liu, S.; Wang, X.; Sun, A.; Wang, X. Identification of QTLs 
associated with multiple agronomic traits in Sorghum. Euphytica 2022, 218, 140.
Zou, G.; Yan, S.; Zhai, G.; Zhang, Z.; Zou, J.; Tao, Y. Genetic variability and correlation of stalk yield-related 
traits and sugar concentration of stalk juice in a sweet sorghum (Sorghum bicolor (L.) Moench) population. 
Australian Journal of Crop Science 2011, 5, 1232-1238.
Dalla Marta, A.; Mancini, M.; Orlando, F.; Natali, F.; Capecchi, L.; Orlandini, S. Sweet sorghum for 
bioethanol production: Crop responses to different water stress levels. Biomass Bioenergy 2014, 64, 211-219.
Habyarimana, E.; Lorenzoni, C.; Redaelli, R.; Alfieri, M.; Amaducci, S.; Cox, S. Towards a perennial biomass 
sorghum crop: A comparative investigation of biomass yields and overwintering of Sorghum bicolor x S. 
halepense lines relative to long term S. bicolor trials in northern Italy. Biomass Bioenergy 2018, 111, 187-195.
Hussain, A.; Khan, S.; Sultani, M.; Mohammad, D. Locational variation in green fodder
yield, dry matter yield, and forage quality of sorghum. Pakistan J. Agric. Res. 2007, 20
We look forward to your review. Nevertheless, we will change the abbreviations if necessary.
7. Lines 120 to 129: The description of each trait (when and how each trait was measured) nee
ds clarification. If the heading date is at 50% heading, then indicate that it corresponds to d
ays between sowing and 50% heading. The way traits are described is very confusing. Is th
e fresh and dry yield just the fresh weight and dry weight (tons) divided by value? What i
s the value? Not clear from this description what that value was. For example, what do the 
authors mean by dry matter ratio? Authors need to improve this, otherwise, the reader has 
no idea of what each trait is.
[Response] We have provided the information in Section 2.2:
“The heading date was scored as the number of days between sowing and 50% heading. The plant height, 
soluble solids content, and fresh yield were measured at the seed harvest dates of each genotype. The soluble 
solids content (°Brix) was determined using a hand-held refractometer (OPT-I, Bellingham & Stanley Ltd., 
England) measured from the juice of the main stem at 15 cm above the ground. The fresh weight of the 
whole plant, except for the panicle, was measured for each individual. Fresh yield was then determined by 
multiplying the fresh weight yield per linear meter (6 m²) by the total linear meters grown per hectare. 
Subsequently, the dry yield was calculated mechanically by multiplying the fresh yield by the average 
percentage of dry matter. Dry yield was surveyed on 96 genotypes during two generations. Fertilizer (N:P:K 
4:2:2 w/w/w) was applied at 500 kg∙ha⁻¹ pre-sowing, and the plants were not fertilized after sowing.”
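The yield definitions quoted in this response amount to two multiplications. A minimal Python sketch follows, assuming fresh weight is recorded in kg per linear meter and yields are reported in t per hectare. The units, function names, and example numbers are illustrative assumptions, not values from the manuscript.

```python
def fresh_yield_t_per_ha(fresh_weight_kg_per_linear_m: float,
                         linear_m_per_ha: float) -> float:
    """Fresh yield (t/ha) = fresh weight per linear meter x linear meters per hectare."""
    # Divide by 1000 to convert kg to metric tons (assumed units).
    return fresh_weight_kg_per_linear_m * linear_m_per_ha / 1000.0


def dry_yield_t_per_ha(fresh_yield: float, dry_matter_pct: float) -> float:
    """Dry yield (t/ha) = fresh yield x average percentage of dry matter."""
    return fresh_yield * dry_matter_pct / 100.0


# Hypothetical example values, chosen only to show the arithmetic.
fresh = fresh_yield_t_per_ha(4.5, 10_000)
dry = dry_yield_t_per_ha(fresh, 30.0)
print(f"fresh yield: {fresh:.1f} t/ha, dry yield: {dry:.1f} t/ha")
```

Writing the two steps as separate functions makes the reviewer's question explicit: the "value" is a row-length scaling factor in the first step and a dry matter percentage in the second.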
8. Lines 135 to 136: The expression “The filtered extracts were transferred to 2 ml vials and
analyzed using the UPLC system (CBM-20A, Shimadzu Co., Kyoto, Japan). Each sample
extract was analyzed using a UPLC system coupled with a photodiode array detector (DAD;
Agilent 1260 series; Agilent Technologies, Santa Clara, CA, USA)…” is redundant.
Please improve the paragraph to remove redundancy.
9. Lines 141 to 142: Authors indicate that “The mobile phase consisted of 0.05% formic acid in 
water (A) and acetonitrile (B) with 0.05% formic acid.” Is the last just a mixture of formic a
cid in acetonitrile? So, it would be 0.05% formic acid in acetonitrile. Or is this a ratio of ac
etonitrile mixed with diluted 0.05% formic acid in water? Needs clarification.
[Response] We revised Section 2.3: The filtered samples were analyzed using a UPLC and a photodiode 
array detector (DAD; Agilent 1260 series; Agilent Technologies, Santa Clara, CA, USA) and a quadrupole 
liquid chromatograph/mass spectrometer (Agilent 6130; Agilent Technologies, Santa Clara, CA, USA) 
equipped with a XR-ODS column (3.0 × 100 mm, 1.8 μm, Shimadzu, Japan) and a compatible C18 guard 
column (4 × 3 mm i.d.; 3 μm particle size; Phenomenex, Torrance, CA, USA). The mobile phase was composed of 
water (solvent A, containing 0.05% formic acid) and acetonitrile (solvent B, containing 0.05% formic acid). 
The gradient program was 0–3 min, 95% A and 5% B; 3–8 min, 100% B; and 18–24 min, 100% A. The flow rate 
of the mobile phase was adjusted to 0.5 mL/min and the column temperature was set to 40 °C. The injection 
volume was 10 μL. The optimal atmospheric pressure ionization–electrospray ionization parameters. The 
detection of total phenolic content was performed at 320 nm. For the quantification of total phenolic content, 
a standard compound (luteolinidin; Sigma, USA) was dissolved in 80% ethanol (v/v). Luteolinidin 
diglucoside, luteolin glucoside, apigeninidin glucoside, luteolinidin, apigeninidin, and 5-O-Me-luteolinidin 
were identified using the method described in a previous study [31].
10. Line 161: Which version of the sorghum genome was used? In which version of Phytozome
was it found? There are plenty of different versions of Sorghum bicolor genomes in
Phytozome and there are plenty of Phytozome databases. This is an important detail to
specify. I see that the Sorghum bicolor Rio_v2 genotype is mentioned, but normally versions
have a second number, like Sorghum bicolor Rio_v2.1. Please fix this every time the
Sorghum bicolor Rio_v2 genotype is mentioned.
[Response] We found a mistake. The reference genome we used is Sorghum bicolor Rio.v2.1. 
11. Line 164 to 167: What do the authors mean by “Common SNPs from the reference sequen
ce were selected to separate genotypes in the SNP matrix, and polymorphic SNPs were sele
cted by comparing the common SNPs with the base sequence of each Sorghum bicolor Rio_
v2 genotype.” This idea needs clarification. I would think that the SNPs that needed to be 
used for the analysis are all SNPs found. Some of them would be specific to the cultivar (b
ecause it is different from the reference used for aligning reads), but others would be induc
ed by the ionizing radiation treatment applied to the cultivars. If that is the case, the SNPs 
induced by the treatments on each cultivar should be identified by comparing the SNPs fou
nd for the same cultivar (that was not treated) with the treated one. That way authors woul
d know whether the SNP associated with the trait was obtained with the ionizing treatment 
(an induced mutation) or it was just a SNP present in the cultivar genotype already.
[Response] Your suggestion is essential for detecting radiation-induced variants. To detect
radiation-induced variation in various crops, we have analyzed mutant lines obtained from the same
original cultivar. The sorghum resources used in this study, however, derive from multiple original
cultivars, which were grouped to ensure diversity in the target traits for GWAS. This study focused
on detecting SNPs associated with the target traits. Unfortunately, we did not have a population with
diverse target-trait variants from a single origin, so we analyzed the lines as a mixed population.
The strength of this study is that we had sufficient SNP information for our target traits.
12. Line 169 to 178: How do the authors guarantee to the readers that the SNPs found are not
due to sequencing errors or contamination? Were replicates of each DNA sample sent
for GBS? What was the sequencing depth originally, and was any depth threshold set
during the analysis to be able to call a genotype (for example, genotypes with a depth
lower than 5 would be considered missing)? Those are important details that are not
mentioned in the methods at all. Illumina sequencing has a low error rate, but it still
exists. How do the authors make sure a SNP was not a sequencing error, besides filtering
the vcf file? Please indicate in the methods what Quality Control parameters were used
during the analysis to make sure the SNPs obtained were not due to error. For example,
what was defined as a low-quality read? What were the thresholds used for trimming
adapters from the reads, etc.? Or was this left for the sequencing company to decide?
[Response] The GBS library was selected after repetition. We therefore applied a selection step to
identify SNPs that are polymorphic relative to the reference. We have checked the depth
information, and we would like to point out that the depth of GBS is low compared to whole-genome
sequencing; the result reflects our best efforts, including comparison with reference sequences.
The GBS technique is widely used in crops, where its low sequence coverage is not a
drawback for calling genotypes because the lines are almost homozygous. We received the data from
an analysis company (Seeders) and checked its quality.
Results:
13. Figure 1. The Y-axis should be made the same height in this figure, so the reader can
clearly see that the phenolic content is very different between the highest TPC line (B)
and the original sorghum cultivar (A), given the different heights of the peaks. Each
figure in a manuscript should speak for itself when first looking at it, without having
to read any description. So, I recommend making the Y-axis the same height, from 0 to
400 mAU, and naming the compounds in the figure instead of adding numbers. By the way,
there are two numbers “1” of different sizes in panel A. That would be avoided if names
are used.
[Response] We thank the reviewer for this comment. We have added it
14. Figure 2. This figure is impossible to read. I would not use it unless the reader can clearly 
read the text in the figure. Make the text bigger or lose the figure. It does not tell me anyt
hing about your results the way it is now because it is not readable at all. Look for a bette
r way to represent what you want to represent here, but do not use this figure the way it 
is now, it is not informative. A better way to represent the author’s idea is to use 2D grap
hs with the same x-axis and y-axis coordinates for each of the varieties, like in Figure 1, or 
make box plots with the concentration of each of the six compounds found in each of the v
arieties analyzed and make a composite figure.
[Response] In response to the reviewer’s opinion, we have changed Figure 2.
15. Figure 3. Would recommend making the font size bigger for the legend and the foot note.
[Response] As suggested, we have modified Figure 3.
16. Lines 253 to 254. The authors indicate that “The total length of the mapped region was 4,96
8.2 Mb, with an average of 51.7 Mb per sample, which covered approximately 7.09% of the 
reference genome sequence. Among the 96 lines, the average depth of the mapped region ra
nged from 4.00 to 16.37 (Table S1).” If the 7.09% is true, this means that the sequencing de
pth was very low, not GBS but more like skim sequencing, and conclusions should be inter
preted carefully from this study. Check the calculations for coverage, because a 7.09% seems
really low, but the number of SNP per sample is huge. So, what could be expected for the
other 93% of the genome? I think the calculations might be wrong. The sorghum genome s
ize is 729,379,862 bp or ~730 Mb. If the total length of the mapped region is 4,968.2 Mb, an
d the average per sample is 51.7 Mb, the coverage of the reference genome is bigger than 
7%. I will recommend checking all calculations again.
[Response] We apologize for the confusion; we found a mistake. "Depth" was written incorrectly and has 
been corrected. 4,968.2 Mb is the mapped region across the entire population (96 lines). For each individual 
line, the mapped region ranged from a minimum of 7,920,298 bp to a maximum of 83,766,015 bp, 
with an average of 51,752,780 bp. Coverage ranged from a minimum of 1.0859% to a maximum of 11.4846%, 
with an average of 7.0954%. 
- Among the 96 lines, the average depth of the mapped region ranged from 5.00 to 30.17, wit
h an average of 10.27 (Table S1). (Lines 271 to 272)
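The coverage figures in this response can be checked with a short calculation. The sketch below is not part of the authors' pipeline. It assumes that coverage is the mapped length divided by the genome size quoted by the reviewer (729,379,862 bp); the function name is illustrative.

```python
GENOME_SIZE_BP = 729_379_862  # sorghum genome size as quoted by the reviewer


def coverage_percent(mapped_bp: int, genome_bp: int = GENOME_SIZE_BP) -> float:
    """Percentage of the reference genome covered by the mapped region."""
    return 100.0 * mapped_bp / genome_bp


# Per-line mapped lengths reported in the response (minimum, average, maximum).
reported = {"min": 7_920_298, "mean": 51_752_780, "max": 83_766_015}
for label, mapped_bp in reported.items():
    print(f"{label}: {coverage_percent(mapped_bp):.4f}%")
```

Under that assumption the three ratios agree with the reported 1.0859%, 7.0954%, and 11.4846% to within rounding. This suggests the roughly 7% figure is a per-line average of genome fraction covered, not a sequencing depth.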
17. Table 5. The symbol is Kb, not Kbs.
[Response] We apologize for the confusion; we found a mistake. We corrected the unit.
18. Line 280: How was the LOD threshold selected? Was this an arbitrarily selected threshold? 
Please indicate, it is an important detail.
[Response] As mentioned in Materials and Methods, we set the p-value threshold to 0.0001. We tried to set
the p-value threshold using the Bonferroni method, but this method was too conservative to detect SNPs
associated with the traits. Therefore, we arbitrarily set the p-value (0.0001) based on the type I error to
select SNPs associated with the traits.
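To illustrate why a Bonferroni cutoff is stricter than the fixed p-value of 0.0001, the following Python sketch compares the two on the −log10 scale. The SNP count used here is a hypothetical placeholder, since the number of tested SNPs is not stated in this exchange, and the function names are illustrative.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def neg_log10(p: float) -> float:
    """Convert a p-value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p)


N_SNPS = 100_000  # hypothetical number of tested SNPs, for illustration only

fixed_cutoff = 1e-4  # the p-value threshold stated in the response
bonf_cutoff = bonferroni_threshold(0.05, N_SNPS)

print(f"fixed:      p < {fixed_cutoff:g}  (-log10 p > {neg_log10(fixed_cutoff):.2f})")
print(f"Bonferroni: p < {bonf_cutoff:g}  (-log10 p > {neg_log10(bonf_cutoff):.2f})")
```

With any SNP count above 500, a Bonferroni cutoff at alpha = 0.05 is smaller than 0.0001, so fewer SNPs pass it than pass the fixed threshold.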
19. Figure 4. I believe that this figure should include the SNPs found in Table 6, since several
models support them as well. What do the colors mean? There is no description of the colors
in the legend. Why was just one SNP selected for each trait? Again, figures should speak for
themselves. Think about that when making a figure for a reader to understand in one single
look. I do not completely understand why these SNPs are shown and others are not.
If space is lacking, then include the figure in the supplemental materials, but use all
supported SNPs and organize them by trait and p-value. In your figure, the way it is right
now, you are showing an SNP that has no statistically significant differences between
genotypes for the green trait.
[Response] The reason for presenting one boxplot per trait was to effectively show the SNPs with the highest 
variation among the selected SNPs. We also made the colors different for different traits. As suggested by the 
reviewer, we added a legend for the colors and created a box plot of the selected SNPs, which we have 
included in the Supplementary Material (Figure S3). 
20. Figure 5. Same comments as for Figure 4. What do the colors mean? There is no legend.
Why was just one SNP selected for each trait? Which one is which trait? Again, figures
should speak for themselves. Think about that when making a figure for a reader to
understand in one single look. I do not completely understand why these SNPs are shown
and others are not. If space is lacking, then include the figure in the supplemental
materials, but use all supported SNPs and organize them by trait and p-value.
[Response] As mentioned, the reason for presenting one boxplot per trait was to effectively show the SNPs 
with the highest variation among the selected SNPs. We also made the colors different for different traits. As 
suggested by the reviewer, we added a legend for the colors and created a box plot of the selected SNPs, 
which we have included in the Supplementary Material (Figure S4). 
21. No QQ plot is shown for the GWAS results. Needs to include QQ plots for the reader to e
valuate whether there were confounders that affected the study or not. For example, how d
o the authors check that the existence of population structure is not affecting the GWAS res
ults? No description was found in the methods for this quality control step.
[Response] We apologize for the confusion. We had attached the Manhattan plots and QQ plots of the
GWAS results for each model as Supplementary Material, but they were not uploaded because the files
were large. We sent them to the editor, but the data appear to be missing. The Manhattan plots are
presented in the Supplementary Material as Figures S1 and S2.
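For context on the reviewer's request, a QQ plot compares observed against expected −log10(p) values, and the genomic inflation factor (λ) summarizes the departure in one number. The sketch below is a generic illustration using only the Python standard library. It assumes two-sided p-values from 1-df tests, it is not the authors' analysis, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Expected quantiles of n uniform p-values, using the (rank - 0.5) / n convention.
    expected = [-math.log10((rank - 0.5) / n) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


def genomic_inflation(p_values):
    """Median observed chi-square statistic (1 df) divided by its null median."""
    # A two-sided p-value p corresponds to the statistic z**2 with z = inv_cdf(p / 2).
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A λ close to 1 with points following the diagonal indicates that population structure is adequately controlled, while λ well above 1 points to inflation from confounders, which is the check the reviewer asks for.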
Discussion:
22. Lines 391-394: Authors indicate that “Previous studies on mutation breeding have demonstrat
ed that gamma rays can successfully induce mutations in quantitative traits of sorghum, suc
h as increased grain and biomass yields, as well as improved nutritional value for food and
fodder quality”. What kind of mutations are the most commonly induced (SNPs, translocati
ons, deletions)?
[Response] Mutations induced by ionizing radiation range from simple base substitutions to single- and 
double-strand DNA breaks. Radiation-induced variants are classified as InDels, transitions, and 
transversions. The type of mutation depends on the radiation source used. Ion beams show high 
linear energy transfer (LET), and their use is recognized as a more powerful technique for efficiently 
inducing DNA double-strand breaks and producing new mutations than other mutagenesis techniques. 
23. Lines 412 to 414: The idea seems incomplete in the following sentence: “Grain sorghums tha
t have juicy stems usually vary in their final grain yields, but it is not clear whether this is 
related to dry [45].” What do the authors mean by dry? Dry yield?
[Response] We were mentioning the connection between a dry environment and final grain yield.
24. Lines 414 to 416: The authors introduced the idea of a stay-green trait. But there is no desc
ription of this trait in the current study until the discussion, which seems confusing. What i
s the purpose of the description of this trait here? Please explain or clarify in the discussio
n.
[Response] You're right, late flowering doesn't necessarily equate to staying green. We've removed that 
word. 
25. Line 419: The authors indicate “an FNE-mapping population”. Isn't it “fine-mapping” instead?
Please correct it.
[Response] We apologize for the confusion. We corrected the word.
26. Lines 466 to 474: The paragraph seems to repeat information that was already presented in 
the results. Please do not include results in the discussion section. Explain how a higher
number of reads is beneficial to detecting significant SNPs in this case. The idea is not
clear and seems not to be relevant for this discussion.
[Response] In Next-Generation Sequencing (NGS) technology, samples are treated with restriction 
enzymes to fragment DNA sequences into shorter pieces. The assembled reads are mapped to the 
reference genome, and during this mapping process, SNPs and indels can be identified. The more reads 
in a sample, the more accurate the sequence will be. So, I think it's important to have a high number of 
reads. Gim, J.A.; Kim, H.S. Development of an Economic-trait Genetic Marker by Applying Next-
generation Sequencing Technologies in a Whole Genome. J. Life Sci 2014, 24, 1258-1267.
27. Line 494. Is there any annotation for the gene SbRio.10G099600? If there is, please indicate i
t in the discussion since it seems to be the most relevant finding from the study.
[Response] As suggested, we added the annotation for the gene SbRio.10G099600.
- SbRio.10G099600 encodes the galactoside 2-alpha-L-fucosyltransferase (FUT1) gene. FUT1 is in
volved in cell wall biosynthesis, which plays a crucial role in plant development, disease res
istance, and signal transduction [56]. (Lines 513 to 515)
28. Line 596 to 598. Authors indicate “The study demonstrated that GWAS is a powerful tool fo
r identifying the genetic factors that contribute to these important traits”. I suggest that they
modify the sentence to: “The study demonstrated that GWAS is a powerful tool for identif
ying potential genetic factors that contribute to these important traits”. Because there is no v
alidation study made specifically on the genes to conclude those candidates are responsible f
or the phenotypes observed. The coverage of the genome according to the results was low a
nd there could be other mutations present in the genome (caused by the treatment) that co
uld not be detected by RAD-Seq and cannot be excluded by the authors.
[Response] As suggested by the reviewer, we have modified the sentence.
- The study demonstrated that GWAS is a powerful tool for identifying potential genetic facto
rs that contribute to these important traits. (Lines 617 to 619)
No supplementary materials were provided and therefore I could not examine the supplementary tables 
from this manuscript. Not sure why, since there are many tables cited in the text as supplementary.
[Response] We sent an e-mail to the editorial office on May 2 because our supplementary materials had not been uploaded.

Author Response File: Author Response.pdf
