Article
Peer-Review Record

Outliers in Spectral Time Lag-Selected Gamma Ray Bursts

Universe 2022, 8(10), 521; https://doi.org/10.3390/universe8100521
by Fei-Fei Wang¹ and Yuan-Chuan Zou²,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 21 August 2022 / Revised: 2 October 2022 / Accepted: 4 October 2022 / Published: 8 October 2022
(This article belongs to the Special Issue Advances in Astrophysics and Cosmology – in Memory of Prof. Tan Lu)

Round 1

Reviewer 1 Report

In this manuscript, the authors used the Partitioning Around Medoids (PAM) method to search for outliers in the gamma-ray burst sample, from which two significant outliers, GRB 980425B and GRB 030528A, were found. Correlations between different parameter combinations are then explored in the sample without the outliers. I think this manuscript is worth publishing after the authors address the points I raise below.

 

1. As mentioned in the Introduction, giant flares from soft gamma-ray repeaters (SGRs) may be present in samples of conventional GRBs from compact binary mergers or the collapse of massive stars. Indeed, some signals have been confirmed to be associated with SGRs. It is natural to ask whether the PAM method, or other classification methods, could reliably identify such giant flares in the sample. The authors are encouraged to discuss this issue.

 

2. In line 205 on page 11, GRBs 980425B and 030528A are always outliers for the parameter combinations that show correlations. To make this more accessible to readers, the authors should at least give a brief reason for, or comment on, this result, e.g., the particular features of these bursts or observational effects of the instrument.

 

Minor points:

 

1. “Component 1/2” are used as axis labels of most figures. Please define or clarify it in the text or figure caption. 

 

2. In the caption of Figure 10, the citation of another figure is missing. Please also go through the whole paper to remove similar omissions.

Author Response

In this manuscript, the authors used the Partitioning Around Medoids (PAM) method to search for outliers in the gamma-ray burst sample, from which two significant outliers, GRB 980425B and GRB 030528A, were found. Correlations between different parameter combinations are then explored in the sample without the outliers. I think this manuscript is worth publishing after the authors address the points I raise below.

Reply: We thank the reviewer very much for the comments. We have revised the manuscript according to the comments, and the modifications are marked in boldface.
 
----------------
1. As mentioned in the Introduction, giant flares from soft gamma-ray repeaters (SGRs) may be present in samples of conventional GRBs from compact binary mergers or the collapse of massive stars. Indeed, some signals have been confirmed to be associated with SGRs. It is natural to ask whether the PAM method, or other classification methods, could reliably identify such giant flares in the sample. The authors are encouraged to discuss this issue.

Reply: Many thanks for the suggestion. We believe that SGRs could be classified as outliers in the GRB sample. However, this may not be possible with the present data sample, which is dominated by tau_{lag,i}. It would very likely be possible with a data sample containing T90, E_p, alpha, beta, TR_45/T90, etc. We have added this discussion to the text (last paragraph).
 
----------------
2. In line 205 on page 11, GRBs 980425B and 030528A are always outliers for the parameter combinations that show correlations. To make this more accessible to readers, the authors should at least give a brief reason for, or comment on, this result, e.g., the particular features of these bursts or observational effects of the instrument.

Reply: GRB 980425B is a low-luminosity gamma-ray burst, and GRB 030528A is an X-ray-rich burst. We have described these particular features in the text (2nd paragraph of section 4).

----------------
Minor points:
 

1. “Component 1/2” are used as axis labels of most figures. Please define or clarify it in the text or figure caption. 

Reply: “Component 1/2” are the two principal components. We have added this clarification to the caption of Figure 1.
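For readers unfamiliar with such cluster plots, the two axes can be illustrated with a short Python sketch. It assumes, following the reply, that "Component 1/2" are the first two principal components of the (already standardized) data; the function names are hypothetical, and power iteration with deflation is just one simple way to obtain the components, not necessarily what the plotting software used in the paper does.

```python
import math


def covariance(columns):
    """Sample covariance matrix of a list of equal-length columns."""
    n, d = len(columns[0]), len(columns)
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((columns[i][k] - means[i]) * (columns[j][k] - means[j]) for k in range(n)) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def top_eigenvectors(matrix, count, iterations=500):
    """Leading eigenvectors of a symmetric matrix by power iteration with deflation."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    vectors = []
    for _ in range(count):
        # Uneven start vector, so it is unlikely to be orthogonal to the target eigenvector.
        v = [1.0 + 0.1 * i for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        vectors.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vectors


def principal_scores(columns, n_components=2):
    """Coordinates of each point on the first principal components (signs are arbitrary)."""
    n, d = len(columns[0]), len(columns)
    means = [sum(col) / n for col in columns]
    axes = top_eigenvectors(covariance(columns), n_components)
    return [
        [sum(axis[j] * (columns[j][k] - means[j]) for j in range(d)) for axis in axes]
        for k in range(n)
    ]
```

Each returned pair is the (Component 1, Component 2) position of one burst; the first component always carries the largest share of the variance.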

---------------- 
2. In the caption of Figure 10, the citation of another figure is missing. Please also go through the whole paper to remove similar omissions.

Reply: Thanks. We have corrected the mistake in the caption of Figure 10 and checked the whole paper.

Reviewer 2 Report

I read the manuscript entitled “Outliers in spectral time lag selected gamma-ray bursts” submitted for publication in Universe by Wang and Zou. The paper describes a parameter space exploration of the properties of a large sample of Gamma-Ray Bursts (GRBs) using the Partitioning Around Medoids (PAM) method, with the aim of identifying outliers and testing the statistical significance of the correlations in the sample. The results of this type of study can have important implications for the identification of general relations that could help to obtain statistical estimates of important parameters, or to uncover the presence of features that affect the GRB classification, possibly pointing to the discovery of new GRB classes. All the questions above have a fundamental role in Astrophysics and Cosmology and the results of the study present some degree of novelty. In spite of this, I think that the paper has some serious issues that need to be addressed before I can recommend its publication. In particular, I see that the sample and the methodology descriptions are not complete and sometimes unclear. This results in some major concerns and in a list of minor issues that I report in the following. I also think that the text needs moderate language revision, though I could generally understand what the authors are implying.

MAIN ISSUES

  • My first problem concerns the description of the sample selection. The authors refer to a table of 6289 GRBs, described in Wang et al. (2020, ApJ, 893, 77), from which they extract a set of observed and rest-frame parameters. Something is missing in this description, because rest-frame parameters can only be obtained from sources with at least a redshift estimate (573 entries in the table), or by assuming some hypothesis. My understanding is that the authors use, in every test, only the subset of entries that have all the parameters on which they are working. This implies that the sample changes for every correlation study and that, in any case, the starting sample does not consist of 6289 events (some of which do not even have reported coordinates). If the authors are using only events with a measured spectral lag (as apparently implied by line 84), there would be only 428 entries and only 165 of these have a measured redshift.

  • A second important problem is the missing or unclear definition of some critical parameters. The authors use the spectral lag in all their correlation tests, but the parameter is not explicitly defined. In Section 3.1 it is hinted that it is obtained by dividing the lag between the peak arrival times at different energies by the energy separation of the bands. This definition needs an equation, to clarify how the parameter was converted to the rest frame.

  • A third important problem is the lack of a proper definition of the concept of outlier itself. Though rather intuitive, outliers can be considered as such if they fail to meet some cluster membership rules. The text should explain under what conditions the algorithm decides cluster membership. It is clear that points are tested for Euclidean distance from a medoid, but what are the membership criteria? The ellipses drawn in the lower left panel of Figure 1 (and in similar panels of other figures) suggest that the algorithm decides if a point is in or out of a group based on the value of its “components”. The algorithm described in lines 106-123 can, in principle, assign every point to a group. So the authors need to explain whether their definition of outliers corresponds to the points of the smallest group or whether there are rules that the points fail to meet in order to fall into any of the existing groups.

  • The last major issue concerns the figures and is closely related to the previous point. Referring to Figure 1, the caption states that the components illustrated on the cluster plot “do not represent any combination of the three parameters.” This opens two issues. First, the caption should explain what the plots are, not what they are not. Second, what is the advantage of using PAM, if the existence of two clear outliers (and one border-line point) could be equivalently obtained by looking at the lower right panel? As the text currently reports, the contribution of PAM is only to decide which points should be considered as members of a group, but this is obtained based on undefined criteria. It would make much more sense to test the partitioning on the correlation parameter space or, alternatively, to describe which criteria the PAM considers more powerful to discriminate among cluster members. The text does not even mention how the correlations illustrated in Figures 1-8 were derived. Were they identified by the algorithm or are they just the best linear relationship among the tested parameters? And why do the relations presented in Figures 9-19 not show any corresponding correlation?



In addition to the above problems, I report below a list of some minor issues that I spotted throughout the text:

  • Line 16: “Some the GRBs could not…” → “Some GRBs could not …”

  • Lines 36-37: “The positive spectral lag is that the high-energy photons arrives before the low-energy photons, the negative spectral lag is on the contrary.” → “A positive spectral lag is when the high-energy photons arrive before the low-energy ones, a negative spectral lag is the opposite.”

  • All references to the “Band model” and the “Band function” should be given with capital B (possibly including the subscript Band specification for the spectral parameters), since they refer to “David Band” (and not to a general “spectral band”). This applies to lines 74-76, lines 90-94 and the captions of Figures 1, 2, 6, 7, 9, 10 and 15.

  • Line 85: It is claimed that GRB 060218 is disregarded, because “it is a clear outlier”, but GRB 980425B and 030528A, which have “much larger” spectral lags than the other GRBs, are kept in the sample (lines 196-197). The authors need to clarify why one burst was removed from the sample and the other two left in.

  • Lines 91-95: the discussion on the spectral index convention is presented in a very confusing way. The text should just report the equations of the power law model, the cut-off power law and the Band function with the proper signs for the values extracted from the sample table. For example, if the table reports alpha=2.0 for a power-law N(E) = A E^-2, then the power-law convention is N(E) = A E^-alpha. Similar considerations apply to other spectral forms.

  • Line 106: how is the best cluster number K calculated? If it is a result of the Nbclust process, the text should at least describe the approach used by the algorithm to obtain this value.

  • Caption of Figure 1: the text should define R^2. Is it the square of the Pearson’s linear correlation parameter? If so, there should also be a discussion on what is meant by stating that “only 8 combinations” pass the hypothesis testing (line 137).

  • Lines 146-147: “Therefore, these two analysis …” → “Therefore, these two analyses …”

  • Line 151: “by the difference of two energy band central value …” → “by the difference of two energy band central values …”

  • Lines 176-178: According to this statement, Figure 11 and some following figures should motivate the existence of strong correlations after removing the outliers. Why is it not possible to extract such correlations and plot them?

  • Line 187: “To shown the distribution …” → “To show the distribution …”

  • Line 189: The statement “Though there are very few outlier-like…” is highly questionable. The only rare GRBs appear to be those with very negative spectral lags. Though rare, there is no reason to expect them to be “outliers” in terms of a correlation, since correlations can in principle extend to the negative spectral lag domain and find these objects close to the expected correlation lag.

  • Caption of Figure 10: there is a wrong reference to another figure, likely Fig. 9.

  • Line 197: “the most large spectral …” → “the largest spectral …”

  • Line 200: “As there is no strong correlations …” → “As there are no strong correlations”

  • Lines 202-204: “Once we find tighter correlations …” is a very important part of the motivation of this study. The importance of looking for reliable relations to be used as standard candle calibration methods should be explicitly mentioned in the abstract and introduction.

  • Line 216: “… remove it from the whole sample.” → “… remove them from the whole sample.”

 

Author Response

I read the manuscript entitled “Outliers in spectral time lag selected gamma-ray bursts” submitted for publication in Universe by Wang and Zou. The paper describes a parameter space exploration of the properties of a large sample of Gamma-Ray Bursts (GRBs) using the Partitioning Around Medoids (PAM) method, with the aim of identifying outliers and testing the statistical significance of the correlations in the sample. The results of this type of study can have important implications for the identification of general relations that could help to obtain statistical estimates of important parameters, or to uncover the presence of features that affect the GRB classification, possibly pointing to the discovery of new GRB classes. All the questions above have a fundamental role in Astrophysics and Cosmology and the results of the study present some degree of novelty. In spite of this, I think that the paper has some serious issues that need to be addressed before I can recommend its publication. In particular, I see that the sample and the methodology descriptions are not complete and sometimes unclear. This results in some major concerns and in a list of minor issues that I report in the following. I also think that the text needs moderate language revision, though I could generally understand what the authors are implying.

Reply: We are sincerely grateful for the reviewer's valuable and critical comments. We have carefully modified the draft according to the reviewer's comments. All the modifications are marked in boldface in the text. The details are listed point by point below.

----------------
MAIN ISSUES

My first problem concerns the description of the sample selection. The authors refer to a table of 6289 GRBs, described in Wang et al. (2020, ApJ, 893, 77), from which they extract a set of observed and rest-frame parameters. Something is missing in this description, because rest-frame parameters can only be obtained from sources with at least a redshift estimate (573 entries in the table), or by assuming some hypothesis. My understanding is that the authors use, in every test, only the subset of entries that have all the parameters on which they are working. This implies that the sample changes for every correlation study and that, in any case, the starting sample does not consist of 6289 events (some of which do not even have reported coordinates). If the authors are using only events with a measured spectral lag (as apparently implied by line 84), there would be only 428 entries and only 165 of these have a measured redshift.

Reply: Many thanks for the comment. Yes, we only used the GRBs with both a measured spectral lag and a measured redshift, so the sample changes for every correlation study. We have clarified these issues in the text (section 2, 1st paragraph, 2nd sentence).
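The point that the sample changes from one combination to the next can be made concrete with a minimal Python sketch. It assumes catalogue rows stored as dictionaries, with a missing measurement recorded either as `None` or as NaN; the field names and values below are hypothetical and are not taken from the Wang et al. (2020) table.

```python
import math


def is_measured(value):
    """True when a catalogue entry holds a usable number (not None and not NaN)."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def complete_cases(records, params):
    """Keep only the bursts that have a measured value for every parameter in `params`."""
    return [r for r in records if all(is_measured(r.get(p)) for p in params)]


def sample_sizes(records, combos):
    """Number of usable bursts for each parameter combination."""
    return {combo: len(complete_cases(records, combo)) for combo in combos}


# Hypothetical toy catalogue; keys and values are illustrative only.
grbs = [
    {"name": "grb_a", "lag": 0.10, "z": 1.2, "Ep": 300.0},
    {"name": "grb_b", "lag": 0.30, "z": None, "Ep": 150.0},
    {"name": "grb_c", "lag": float("nan"), "z": 0.5, "Ep": 80.0},
    {"name": "grb_d", "lag": 0.05, "z": 2.0},
]
print(sample_sizes(grbs, [("lag", "z"), ("lag", "Ep"), ("lag", "z", "Ep")]))
# {('lag', 'z'): 2, ('lag', 'Ep'): 2, ('lag', 'z', 'Ep'): 1}
```

Each combination is therefore tested on its own subset, and the subset shrinks as parameters are added.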

----------------
A second important problem is the missing or unclear definition of some critical parameters. The authors use the spectral lag in all their correlation tests, but the parameter is not explicitly defined. In Section 3.1 it is hinted that it is obtained by dividing the lag between the peak arrival times at different energies by the energy separation of the bands. This definition needs an equation, to clarify how the parameter was converted to the rest frame.

Reply: Thanks. We have clarified the origin of the spectral time lag we used, and added GRB 980425B as an example for a clearer explanation in the text (section 2, 1st paragraph, last few sentences).
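As an illustration of the sign convention and of the per-energy normalization described in the reviewer's reading of Section 3.1 (peak-time difference divided by the separation of the band central energies), here is a minimal Python sketch. The rest-frame conversion is deliberately left out because this record does not state how it was done; the function names and the example band energies are hypothetical.

```python
def spectral_lag(t_peak_low, t_peak_high):
    """Lag in seconds: positive when the high-energy light curve peaks first."""
    return t_peak_low - t_peak_high


def lag_per_energy(t_peak_low, t_peak_high, e_low_center, e_high_center):
    """Lag divided by the separation of the two band central energies (s/keV)."""
    if e_high_center <= e_low_center:
        raise ValueError("the high-energy band must have the larger central energy")
    return spectral_lag(t_peak_low, t_peak_high) / (e_high_center - e_low_center)


# Hypothetical example: the high-energy peak arrives 0.3 s earlier,
# and the two bands are centered at 50 keV and 200 keV.
print(lag_per_energy(1.5, 1.2, 50.0, 200.0))
```

With this convention a negative value simply means the low-energy peak arrived first.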

----------------
A third important problem is the lack of a proper definition of the concept of outlier itself. Though rather intuitive, outliers can be considered as such if they fail to meet some cluster membership rules. The text should explain under what conditions the algorithm decides cluster membership. It is clear that points are tested for Euclidean distance from a medoid, but what are the membership criteria? The ellipses drawn in the lower left panel of Figure 1 (and in similar panels of other figures) suggest that the algorithm decides if a point is in or out of a group based on the value of its “components”. The algorithm described in lines 106-123 can, in principle, assign every point to a group. So the authors need to explain whether their definition of outliers corresponds to the points of the smallest group or whether there are rules that the points fail to meet in order to fall into any of the existing groups.

Reply: Thanks for pointing this out. Outliers are defined as the points of the smallest group; in this paper, the smallest group contains only a few points. We have stated this criterion in the 4th paragraph of section 2.
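A minimal, dependency-free Python sketch of this criterion: partition the points with PAM (a greedy BUILD step followed by a simple first-improvement SWAP step, with Euclidean distance) and flag the members of the smallest cluster as outliers. It is not the implementation used in the paper, the function names are hypothetical, and ties between equally small clusters are broken by taking the first one.

```python
import math
from itertools import product


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_cost(dist, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(points, k):
    """Partition `points` into k clusters around medoids; returns (medoids, labels)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # BUILD: start from the most central point, then greedily add the best medoid.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        best = min(
            (c for c in range(n) if c not in medoids),
            key=lambda c: total_cost(dist, medoids + [c]),
        )
        medoids.append(best)
    # SWAP: accept any medoid/non-medoid exchange that lowers the total cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for slot, candidate in product(range(k), range(n)):
            if candidate in medoids:
                continue
            trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < cost - 1e-12:
                medoids, cost, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def smallest_cluster_outliers(points, k):
    """Indices of the points in the smallest PAM cluster (the outlier criterion)."""
    _, labels = pam(points, k)
    sizes = [labels.count(j) for j in range(k)]
    smallest = sizes.index(min(sizes))
    return [i for i, label in enumerate(labels) if label == smallest]


points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2), (-0.3, -0.1),
          (0.2, 0.2), (10.0, 10.0), (10.5, 9.5)]
print(smallest_cluster_outliers(points, 2))  # [6, 7]
```

Because every point is assigned to some medoid, "outlier" here is purely a statement about cluster size, which is why the choice of K matters.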

----------------
The last major issue concerns the figures and is closely related to the previous point. Referring to Figure 1, the caption states that the components illustrated on the cluster plot “do not represent any combination of the three parameters.” This opens two issues. First, the caption should explain what the plots are, not what they are not. Second, what is the advantage of using PAM, if the existence of two clear outliers (and one border-line point) could be equivalently obtained by looking at the lower right panel? As the text currently reports, the contribution of PAM is only to decide which points should be considered as members of a group, but this is obtained based on undefined criteria. It would make much more sense to test the partitioning on the correlation parameter space or, alternatively, to describe which criteria the PAM considers more powerful to discriminate among cluster members. The text does not even mention how the correlations illustrated in Figures 1-8 were derived. Were they identified by the algorithm or are they just the best linear relationship among the tested parameters? And why do the relations presented in Figures 9-19 not show any corresponding correlation?

Reply: (1) The two components are the two principal components. (2) PAM can identify the existence of clear outliers. (3) We have described the PAM procedure in Section 2. (4) We tried all the combinations of parameters with the spectral time lag. For every combination, we first identified the outliers, and then tested whether the linear regression without the outliers is significant. Figures 1-8 show the combinations that pass the hypothesis test. (5) Figures 9-19 show the combinations that do not pass the hypothesis test. These points are stated in the 1st paragraph of section 3 (see the boldface text).
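The screening procedure in point (4) can be outlined as a loop in Python. This is a sketch under stated assumptions: each combination is taken to be the lag plus `size` other parameters (the record does not say exactly how the 191 combinations were formed), and the outlier finder and significance test are passed in as functions, for instance a smallest-cluster rule and a p < 0.05 regression test. All names are hypothetical.

```python
from itertools import combinations


def screen_combinations(records, other_params, size, find_outliers, is_significant, lag_key="lag"):
    """Return the combinations whose regression stays significant once outliers are removed.

    find_outliers(rows, cols) -> indices into `rows` to drop
    is_significant(rows, cols) -> bool
    """
    passed = []
    for combo in combinations(other_params, size):
        cols = (lag_key,) + combo
        # Each combination uses only the bursts that have all of its parameters.
        rows = [r for r in records if all(r.get(c) is not None for c in cols)]
        dropped = set(find_outliers(rows, cols))
        kept = [r for i, r in enumerate(rows) if i not in dropped]
        if is_significant(kept, cols):
            passed.append(combo)
    return passed


def never_outlier(rows, cols):
    """Stub outlier finder for the example: drops nothing."""
    return []


def enough_points(rows, cols):
    """Stub significance test for the example: needs at least three bursts."""
    return len(rows) >= 3


data = [
    {"lag": 0.1, "a": 1.0, "b": 2.0},
    {"lag": 0.2, "a": 2.0, "b": 3.0},
    {"lag": 0.3, "a": 3.0, "b": 1.0, "c": 5.0},
]
print(screen_combinations(data, ["a", "b", "c"], 1, never_outlier, enough_points))
# [('a',), ('b',)]
```

Keeping the outlier step and the test as separate arguments mirrors the order described in the reply: outliers are removed first, and only then is significance judged.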

----------------
In addition to the above problems, I report below a list of some minor issues that I spotted throughout the text:

Line 16: “Some the GRBs could not…” → “Some GRBs could not …”

Reply: Thanks, we have corrected it.

----------------
Lines 36-37: “The positive spectral lag is that the high-energy photons arrives before the low-energy photons, the negative spectral lag is on the contrary.” → “A positive spectral lag is when the high-energy photons arrive before the low-energy ones, a negative spectral lag is the opposite.”

Reply: Thanks, corrected.

----------------
All references to the “Band model” and the “Band function” should be given with capital B (possibly including the subscript Band specification for the spectral parameters), since they refer to “David Band” (and not to a general “spectral band”). This applies to lines 74-76, lines 90-94 and the captions of Figures 1, 2, 6, 7, 9, 10 and 15.

Reply: Thanks, corrected.

----------------
Line 85: It is claimed that GRB 060218 is disregarded, because “it is a clear outlier”, but GRB 980425B and 030528A, which have “much larger” spectral lags than the other GRBs, are kept in the sample (lines 196-197). The authors need to clarify why one burst was removed from the sample and the other two left in.

Reply: GRB 060218 is actually an X-ray flash, which is not a typical GRB. Many of its properties differ markedly from those of a normal GRB. Therefore, it is natural to remove it, following Foley et al. 2008. We have clarified this in the text.

----------------
Lines 91-95: the discussion on the spectral index convention is presented in a very confusing way. The text should just report the equations of the power law model, the cut-off power law and the Band function with the proper signs for the values extracted from the sample table. For example, if the table reports alpha=2.0 for a power-law N(E) = A E^-2, then the power-law convention is N(E) = A E^-alpha. Similar considerations apply to other spectral forms.

Reply: The definition is N(E) = A E^alpha, where alpha is generally around -2, i.e., mostly a negative number. Therefore, in the statistics we use -alpha and -beta (which are positive numbers). We prefer to keep the original formula, which is also consistent with our previous work. Indeed, different definitions are used in the literature, such as N(E) = A E^{-alpha}, E^{-beta}; N(E) = A E^{alpha}, E^{-beta}; N(E) = A E^{alpha}, E^{beta}; or even f_nu = B nu^alpha. For uniformity, in Wang et al. 2020 we chose a single definition, which is also used in this paper, as shown in eq. (1). We have also added some explanation in the 2nd paragraph following eq. (1) to make it less confusing.
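To illustrate this sign convention, the sketch below writes the widely used Band (1993) photon spectrum with N(E) ∝ E^alpha below the break and N(E) ∝ E^beta above it, so both indices are normally negative and -alpha, -beta are the positive quantities entering the statistics. The 100 keV pivot and the E_peak parametrization are standard choices assumed here; eq. (1) of the paper is not reproduced in this record, so this is an illustration, not the authors' formula.

```python
import math


def band(energy, amplitude, alpha, beta, e_peak, pivot=100.0):
    """Band photon spectrum with N ~ E**alpha (low energies) and N ~ E**beta (high energies).

    Energies in keV. Requires alpha > beta and alpha > -2 so that E_peak is defined.
    """
    if not (alpha > beta and alpha > -2.0):
        raise ValueError("need alpha > beta and alpha > -2")
    e0 = e_peak / (2.0 + alpha)       # e-folding energy of the low-energy branch
    e_break = (alpha - beta) * e0     # where the two branches join continuously
    if energy < e_break:
        return amplitude * (energy / pivot) ** alpha * math.exp(-energy / e0)
    return (amplitude * (e_break / pivot) ** (alpha - beta) * math.exp(beta - alpha)
            * (energy / pivot) ** beta)


def statistic_indices(alpha, beta):
    """Positive versions of the indices used in the statistics: (-alpha, -beta)."""
    return -alpha, -beta


print(statistic_indices(-1.0, -2.3))  # (1.0, 2.3)
```

In this parametrization E^2 N(E) peaks at E_peak, which is why alpha must exceed -2.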

----------------
Line 106: how is the best cluster number K calculated? If it is a result of the Nbclust process, the text should at least describe the approach used by the algorithm to obtain this value.

Reply: The NbClust package provides 30 indices for determining the number of clusters; Charrad et al. 2014 give more details about the package. We have added a description to the text.
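NbClust evaluates many validity indices for each candidate K and reports the K recommended by the largest number of indices (a majority rule). The Python sketch below mimics that idea on a small scale: it implements one such index, the average silhouette width, and a majority vote over per-index recommendations. It is not a port of the R package, and the index names and votes in the example are hypothetical.

```python
import math
from collections import Counter


def average_silhouette(points, labels):
    """Mean silhouette width; needs at least two clusters. Singletons count as 0."""
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: silhouette defined as 0
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def majority_rule(recommendations):
    """Pick the K proposed by the most indices (ties go to the first K seen)."""
    return Counter(recommendations.values()).most_common(1)[0][0]


# Hypothetical votes from three indices.
print(majority_rule({"silhouette": 2, "index_b": 2, "index_c": 3}))  # 2
```

A well-separated partition gives an average silhouette close to 1, while a poor one gives values near or below 0.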

----------------
Caption of Figure 1: the text should define R^2. Is it the square of the Pearson’s linear correlation parameter? If so, there should also be a discussion on what is meant by stating that “only 8 combinations” pass the hypothesis testing (line 137).

Reply: We use the adjusted R^2 to measure the goodness of fit of the regression model. It is the fraction of variance explained, adjusted for the number of free parameters. We tried linear regression for the 191 combinations after removing the outliers, and only 8 combinations passed the hypothesis test. We have also added this description to the text (section 3, 1st paragraph, 4th-5th lines from the end).
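For reference, the adjusted R^2 is commonly defined as R^2_adj = 1 - (1 - R^2)(n - 1)/(n - p - 1) for n data points and p predictors. The Python sketch below computes it for an ordinary least-squares line with a single predictor (p = 1); that restriction and the function names are assumptions made for brevity.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def adjusted_r2_from(r2, n, p):
    """Penalize R^2 for the p fitted predictors (requires n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def adjusted_r2(x, y):
    """Adjusted R^2 of a straight-line fit (one predictor, so p = 1)."""
    slope, intercept = linear_fit(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return adjusted_r2_from(1.0 - ss_res / ss_tot, len(x), 1)
```

For example, R^2 = 0.5 from 11 points with one predictor gives an adjusted value of 1 - 0.5 × 10/9 ≈ 0.44, slightly below the raw R^2.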

----------------
Lines 146-147: “Therefore, these two analysis …” → “Therefore, these two analyses …”

Reply: Corrected.

----------------
Line 151: “by the difference of two energy band central value …” → “by the difference of two energy band central values …”

Reply: Corrected.

----------------
Lines 176-178: According to this statement, Figure 11 and some following figures should motivate the existence of strong correlations after removing the outliers. Why is it not possible to extract such correlations and plot them?

Reply: To avoid the influence of different units, we apply data standardization to all the parameters. Figure 11 and the following figures show the PAM results, not plots of the original data, and we did not find correlations in Figure 11 or the following figures. The original lines 176-177 read: "In some of these figures (e.g., figures 11, ), one can see the sample without outliers appear kind of linear correlations." This sentence was only meant to caution the reader that an apparent linear correlation in a PAM diagram is not necessarily real; only linear regression can establish a true correlation.
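The point about units can be checked with a small Python sketch of z-score standardization (subtract the mean, divide by the sample standard deviation). This is one common form of data standardization; that the paper used exactly this form is an assumption. Expressing the same hypothetical lags in seconds or in milliseconds gives identical standardized values, so Euclidean distances computed by PAM no longer depend on the choice of unit.

```python
import statistics


def zscores(values):
    """Standardize to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


lags_s = [0.01, 0.05, 0.2, 1.3]            # hypothetical lags in seconds
lags_ms = [1000.0 * v for v in lags_s]     # the same lags in milliseconds
print(all(abs(a - b) < 1e-9 for a, b in zip(zscores(lags_s), zscores(lags_ms))))  # True
```

The flip side is that axes in the standardized PAM plots have no physical units, so a visual trend there cannot be read directly as a correlation between the original parameters.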

----------------
Line 187: “To shown the distribution …” → “To show the distribution …”

Reply: Corrected.

----------------
Line 189: The statement “Though there are very few outlier-like…” is highly questionable. The only rare GRBs appear to be those with very negative spectral lags. Though rare, there is no reason to expect them to be “outliers” in terms of a correlation, since correlations can in principle extend to the negative spectral lag domain and find these objects close to the expected correlation lag.

Reply: Yes, that statement is quite confusing and not really relevant to the overall result. We have deleted that sentence.

----------------
Caption of Figure 10: there is a wrong reference to another figure, likely Fig. 9.

Reply: Corrected.

----------------
Line 197: “the most large spectral …” → “the largest spectral …”

Reply: Corrected.

----------------
Line 200: “As there is no strong correlations …” → “As there are no strong correlations”

Reply: Corrected.

----------------
Lines 202-204: “Once we find tighter correlations …” is a very important part of the motivation of this study. The importance of looking for reliable relations to be used as standard candle calibration methods should be explicitly mentioned in the abstract and introduction.

Reply: We have added this discussion to the abstract and introduction.

----------------
Line 216: “… remove it from the whole sample.” → “… remove them from the whole sample.”

Reply: Corrected.

Round 2

Reviewer 2 Report

I read the revised manuscript and the replies to my first round of comments and I see that the authors introduced proper clarifications and amendments, making the text much clearer. I think that the paper is now in an acceptable form, even though it still needs some degree of language revision and, possibly, the inclusion of one missing piece of information. Specifically, the authors need to clarify, with an explicit threshold or at least an appropriate reference, which parameter, and which corresponding value, they use to decide whether a correlation is acceptable. My understanding is that the decision depends on the value of R^2, as hinted by lines 198-199 in section 3.1. If this is the case, an explicit indication of the minimum value of R^2 for an acceptable correlation would improve the objectivity of the discussion.


I took note of some parts of the text that need language corrections (see comments on the attached pdf file). More complete adjustments can be made at the type-setting stage and I do not think it is necessary to further review the manuscript.

 

Comments for author File: Comments.pdf

Author Response

I read the revised manuscript and the replies to my first round of comments and I see that the authors introduced proper clarifications and amendments, making the text much clearer. I think that the paper is now in an acceptable form, even though it still needs some degree of language revision and, possibly, the inclusion of one missing piece of information. Specifically, the authors need to clarify, with an explicit threshold or at least an appropriate reference, which parameter, and which corresponding value, they use to decide whether a correlation is acceptable. My understanding is that the decision depends on the value of R^2, as hinted by lines 198-199 in section 3.1. If this is the case, an explicit indication of the minimum value of R^2 for an acceptable correlation would improve the objectivity of the discussion.

Reply: Many thanks for the valuable comment. Whether a combination of parameters is placed in section 3.1 or in section 3.2 is decided by the p-value: if the p-value is less than 0.05, we consider the linear correlation significant and place it in section 3.1; otherwise, it is placed in section 3.2. We have added an explicit statement as the 1st sentence of section 3.2.
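The p < 0.05 decision rule can be sketched in Python. The record does not say which test produced the p-value; a common choice for a straight-line fit is the t-test on the slope, which is equivalent to testing Pearson's r. To keep the sketch dependency-free, it swaps in a permutation test on Pearson's r instead, so the p-values it returns will be close to, but not identical to, those of a t-test. Function names are hypothetical.

```python
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value for a linear correlation, obtained by shuffling y against x."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


def correlation_is_significant(x, y, threshold=0.05):
    """Decision rule from the reply: significant (section 3.1) when p < 0.05."""
    return permutation_p_value(x, y) < threshold
```

The seeded generator makes the result reproducible; with 2000 permutations the smallest attainable p-value is about 0.0005, comfortably below the 0.05 threshold.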

----------------------

I took note of some parts of the text that need language corrections (see comments on the attached pdf file). More complete adjustments can be made at the type-setting stage and I do not think it is necessary to further review the manuscript.

Reply: We are very grateful for the reviewer's careful reading and detailed corrections. We have made these corrections and marked them in boldface.

Author Response File: Author Response.pdf
