Article
Peer-Review Record

A Statistical Analysis of Death Rates in Italy for the Years 2015–2020 and a Comparison with the Casualties Reported from the COVID-19 Pandemic

Infect. Dis. Rep. 2021, 13(2), 285-301; https://doi.org/10.3390/idr13020030
by Gianluca Bonifazi 1,2, Luca Lista 3,4, Dario Menasce 5,*, Mauro Mezzetto 6, Alberto Oliva 2, Daniele Pedrini 5, Roberto Spighi 2 and Antonio Zoccoli 2,7
Reviewer 1:
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 7 February 2021 / Revised: 22 March 2021 / Accepted: 23 March 2021 / Published: 1 April 2021

Round 1

Reviewer 1 Report

This is a well-written article focusing on modeling for the COVID crisis in Italy, comparing two data sets. The level of detail, especially for model fit and subgroup analyses such as age and gender, is notable and makes the article relevant for other epidemiologists and statisticians, as well as for those who would use it for teaching purposes. The authors could improve on the application of their article to different audiences and future researchers. Specifically, the introduction could allude to the impact of such high excess deaths. The conclusion could point out that COVID consequences such as the burden on hospitals in Northern Italy must have been astounding, compounding problems with the distribution of needed equipment. Perhaps this inequality in the burden of care in some areas contributed to further deaths. Other studies could address this. The authors should also add directions for future research for others modeling infectious disease, and discuss the importance of using two data sets for identifying and addressing discrepancies.

Author Response

Dear reviewer, we thank you for your constructive and well-thought-out comments. Below we provide detailed answers to each individual point you raised and describe how we changed our manuscript text, wherever appropriate, to clarify the issues at stake and improve its readability.

Issue: The authors could improve on the application of their article to different audiences and future researchers. 

Answer: the main focus of this article is the statistical methodology adopted to extract the excess of deaths corresponding to the COVID-19 pandemic and to other peaks, presumably due to seasonal flu, from the background rate of mortality due to other sources. It is not meant to discuss the causes of the excesses or the effects of the epidemic, since this is outside the expertise of the authors. The adopted methodology is customarily used to extract meaningful information, according to a model, from an ensemble contaminated by a background whose shape can be derived directly from the data.

To this extent, we think we are already addressing such audiences and researchers with our description of the adopted methodology. 

We’ll describe this point in the introduction.

Issue: Specifically, the introduction could allude to the impact of such high excess deaths. 

Answer: as described in the above comment, we would prefer to abstain from commenting on the causes and impact of such excess deaths, since we have no data to perform deeper analyses nor a model for the consequences of such an impact. Our work only deals with how to extract the numerical values with the highest possible accuracy with a correct statistical approach.

Issue: The conclusion could point out that COVID consequences such as burden on hospitals in Northern Italy must have been astounding compounding problems with distribution of needed equipment. 

Answer: once more, we do not feel that our quantitative analysis enables us to point to specific inefficiencies or problems of the National Health Care System. We limit ourselves to extracting the numbers and the corresponding uncertainty ranges, and leave it to specialists to use these numbers to draw conclusions concerning the distribution of needed equipment. Furthermore, we do not have access to data adequate for specific studies of the consequences of the burden on the National Health Care System.

Issue: the authors should add directions for future research also for others modeling infectious disease and the importance of using two data sets for identifying and addressing discrepancies.

 

Answer: our methodology is widely used in several research fields in Physics. The adopted models are based on a heuristic approach: the model of 13 Gaussians, plus a sinusoid and a Gompertz, provides a fit with a reasonable goodness-of-fit, as determined by the chi-square estimator, and the distribution of pulls (shown in the text) demonstrates that this model is able to provide an interpolation with good levels of quality and accuracy.
As for the usefulness of using two data sets for comparison, this is certainly a good thing (when data are available), but in order to draw conclusions about the reason for the discrepancy (which we refrain from indicating), richer samples, with more qualifiers, are needed. 

A commonly used criterion in statistics to establish how well a model describes the data is a careful check of the pull distribution. The fact that this distribution turns out to be centered on zero with a very small error is a remarkable indicator that our choice of model to describe the data is justified and accurate.
We added a sentence in the conclusion in this regard.
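
As an illustration of the pull check described in this answer, the following is a minimal sketch on synthetic data, not the ISTAT series and not the authors' code. The curve, the Poisson-like uncertainties and all variable names are assumptions made only for this example. A pull is the difference between an observed and a fitted value divided by the uncertainty; when the model describes the data, the pulls should be centered on zero with a width close to one.

```python
import math
import random
import statistics

def pulls(observed, fitted, sigma):
    """Return (observed - fitted) / sigma for each data point."""
    return [(o - f) / s for o, f, s in zip(observed, fitted, sigma)]

# Synthetic illustration (an assumption, not the real data):
# a known seasonal curve plus Gaussian noise of known size.
random.seed(0)
n_points = 2000
fitted = [1700.0 + 200.0 * math.sin(2.0 * math.pi * t / 365.25)
          for t in range(n_points)]
sigma = [math.sqrt(f) for f in fitted]  # Poisson-like uncertainty (assumed)
observed = [random.gauss(f, s) for f, s in zip(fitted, sigma)]

p = pulls(observed, fitted, sigma)
pull_mean = statistics.fmean(p)
pull_width = statistics.pstdev(p)
mean_error = pull_width / math.sqrt(len(p))  # standard error of the mean

print(f"pull mean  = {pull_mean:.3f} +/- {mean_error:.3f}")
print(f"pull width = {pull_width:.3f}")
```

A mean far from zero in units of its error, or a width far from one, would point to a mis-modelled shape or to wrongly estimated uncertainties.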

Reviewer 2 Report

In the manuscript entitled “A statistical analysis of death rates in Italy for the years 2015–2020 and a comparison with the casualties reported for the COVID-19 pandemic”, the authors attempt to evaluate the trend of casualties in Italy during seasonal flu epidemics, coinciding with the ongoing COVID-19 pandemic. The authors use Gaussian distributions to evaluate statistical data publicly available from the Italian National Institute of Statistics (ISTAT) for the period 01-01-2015 to 30-09-2020. With the same Gaussian model, they also compare the COVID-19-related casualty data obtained from the Italian Department for Civil Protection (DPC) with the data obtained from ISTAT. The manuscript contains very rich information, including COVID-19-related deaths and their coincidence with seasonal flu epidemics, which seems to be the core of the manuscript. However, the authors fail to bring out this significance in both the introduction and the results sections. The manuscript is instead presented in a way that justifies the application of the Gaussian distribution model to the analysis and interpretation of these data, so there is some confusion about its objective. The legends given with the graphs and tables contain very limited or partial information. The data presented in the tables are mostly crude and could be refined to make them easier to understand and to hold the interest of readers.

 

Minor comments:

  1. Title of the manuscript needs to be updated with a suitable one to match the given information.
  2. Equation 2 and Equation 3 mentioned in Table 2 and line 124 should be marked properly.
  3. Line 41-42 is repeated with the Figure 2 legend.
  4. Add more discussion to the manuscript and change ‘Results’ to ‘Results and discussion’.
  5. In Figure 4, peaks are labelled as g1-g13. No information is given in the legend about such labeling, and similarly Table 1 is only partially explained. Such examples are many and repetitive in the manuscript.

Author Response

Dear reviewer, we thank you for your constructive and well-thought-out comments. Below we provide detailed answers to each individual point you raised and describe how we changed our manuscript text, wherever appropriate, to clarify the issues at stake and improve its readability.

Issue: ...however, authors failed to bring out such significances in both introduction and result sections of the manuscript. 

Answer: We acknowledge this suggestion and will provide an expanded introduction and conclusions section to this regard.

Issue: the manuscript is rather presented in a way to justify the application of Gaussian distribution model for the analysis and interpretation of these data. So, there are some confusion related to the objective of this manuscript. 

Answer: We disagree with this statement. The analysis presented in the paper aims at establishing a statistically solid methodology to extract quantitative values from the data distributions. To obtain this, a model is necessary, and we chose the simplest model possible (Gaussian, sinusoid and Gompertz): the good values of the estimators returned by the fit confirm that the chosen model represents the data well, within the accuracy dictated by the available data sample. We added a sentence in the introduction in this regard.
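
To make the structure of such a composite model concrete, here is a minimal sketch. The paper's Equations 2 and 3 are not reproduced in this record, so the parametrization below (a four-parameter sinusoid, three-parameter Gaussians normalized to their area, and the time derivative of a three-parameter Gompertz curve) is an assumption, and every numerical value is made up for illustration.

```python
import math

def sinusoid(t, offset, amplitude, period, phase):
    """Seasonal background: constant level plus a sine wave."""
    return offset + amplitude * math.sin(2.0 * math.pi * t / period + phase)

def gaussian(t, area, mean, width):
    """Gaussian peak normalized so that its integral over t equals `area`."""
    norm = area / (width * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((t - mean) / width) ** 2)

def gompertz_rate(t, total, shift, rate):
    """Time derivative of the curve total * exp(-exp(-rate * (t - shift)))."""
    u = math.exp(-rate * (t - shift))
    return total * rate * u * math.exp(-u)

def model(t, background, flu_peaks, covid_peak):
    """Sum of the background, all Gaussian peaks and one Gompertz peak."""
    value = sinusoid(t, *background)
    value += sum(gaussian(t, *peak) for peak in flu_peaks)
    value += gompertz_rate(t, *covid_peak)
    return value

# Hypothetical parameter values, chosen only to make the sketch runnable.
background = (1700.0, 200.0, 365.25, 0.0)
flu_peaks = [(5000.0, 20.0 + 160.0 * k, 15.0) for k in range(13)]
covid_peak = (45000.0, 1900.0, 0.08)

n_params = len(background) + sum(len(pk) for pk in flu_peaks) + len(covid_peak)
print(n_params, round(model(1900.0, background, flu_peaks, covid_peak), 1))
```

Because each peak is normalized to its area, the `area` and `total` parameters are directly the number of excess deaths attributed to that peak, and a least-squares fit returns them together with their uncertainties. Under this assumed parametrization the count is 4 + 13 × 3 + 3 = 46 parameters.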

Issue: legends given with the graphs and tables are containing a very limited or partial information. 

Answer: We have improved on this point in the revised version of the paper, but most of the information that can be extracted from the plots has already been reported and commented on in the text.

Issue: data presented in tables are mostly crude which can be more refined to make it easier to understand and maintain interest of readers.

Answer: it is not clear to us what kind of additional information can be added to the tables. The exact meaning of each value has been described in the text. We’ll try to improve on this point, nevertheless.

Minor comments:

  1. Title of the manuscript needs to be updated with a suitable one to match the given information.

We have already submitted the paper to arXiv and are therefore reluctant to change the title. Moreover, in our humble opinion, the title already reflects the main point raised in the paper.

  2. Equation 2 and Equation 3 mentioned in Table 2 and line 124 should be marked properly.

Sorry, we are not sure we fully understand the meaning of this suggestion. Just in case, we modified the captions of Table 2, adding more complete descriptions of the parameters shown.

  3. Line 41-42 is repeated with the Figure 2 legend.

Noted: we changed the text accordingly.

  4. Add more discussion to the manuscript and change ‘Results’ to ‘Results and discussion’.

We did so, as per your request.

  5. In Figure 4, peaks are labelled as g1-g13. No information is given in the legend about such labeling, and similarly Table 1 is only partially explained. Such examples are many and repetitive in the manuscript.

Since the peaks are represented by Gaussians (as explained in the text) and a Gaussian is defined in Equation 3 by the symbol g_i, we thought it was clear what the symbol means in the pictures. We’ll clarify this in the updated version of the paper.

Reviewer 3 Report

I found the article very interesting. The presentation of some data is very interesting, which makes it exceptionally legible. Despite many advantages, however, I have suggestions:
- to move the tables from the Conclusions part to the Discussion part,
- the article lacks a clear literature review referring to similar analyses of COVID-19 in the world.
Good job :)

Author Response

Dear reviewer, we thank you for your constructive and well-thought-out comments. Below we provide detailed answers to each individual point you raised and describe how we changed our manuscript text, wherever appropriate, to clarify the issues at stake and improve its readability.

Issue: to move the tables from the Conclusions part to the Discussion part

Answer: We tried our best to improve the structure of the document following your suggestion.

Issue: the article lacks a clear literature review referring to similar analyses of COVID-19 in the world.

Answer: We realize this is a good suggestion. We tried to improve on this.

 

Reviewer 4 Report

I have several doubts about this manuscript that can be serious impediments to the publication, if they are not solved adequately.

We can begin with the absence of a clearly defined research hypothesis, and a final scientific target declared and discussed in both the abstract and the Introduction.

What do the authors want to demonstrate with their paper? Not only, but what do they add to what we already know from the already published literature?

My simple guess is that they want to confirm the role played by Covid-19 in Italy during the 2020 spring in terms of excess deaths, yet this is already evident from the plots provided by the Italian Institute of Statistics, and discussed as well in many published papers.

More interesting, and difficult to achieve, instead, would be trying to estimate the magnitude of those excess deaths, by also avoiding all the bias of this sad phenomenon (such as death misclassifications and so on).

Well, if this is the scientific target of this paper (i.e., count how many excess deaths are due to Covid) I have many difficulties with the statistical methods they have decided to adopt.

In fact, with the aforementioned target in mind, it would be reasonable to proceed with some other alternative methodology, able to estimate the deviation of total number of deaths in 2020 from the expected baseline levels of mortality from previous years.

This could be done in several different ways, even if in the end what many have done has been to fit some autoregressive quasi-Poisson or autoregressive negative binomial regression models, adapted for time series analysis from previous years, to forecast the expected number of deaths in the absence of the Covid disease. Excess deaths were detected when observed mortality was above some given upper bound, within a predefined confidence interval.
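
The detection rule described here can be sketched in a deliberately simplified form. The sketch below does not implement an autoregressive quasi-Poisson or negative binomial regression: as a stand-in, it takes the mean of previous years as the expected baseline and the mean plus `z` sample standard deviations as the upper bound. The function names, the value of `z` and the toy counts are all assumptions.

```python
import statistics

def baseline_and_upper_bound(history, z=1.96):
    """history: one list of counts per previous year, aligned by period.

    Returns the expected count and an upper bound for each period.
    """
    expected, upper = [], []
    for values in zip(*history):
        mean = statistics.fmean(values)
        spread = statistics.stdev(values)  # sample standard deviation
        expected.append(mean)
        upper.append(mean + z * spread)
    return expected, upper

def excess_deaths(observed, expected, upper):
    """Sum observed - expected over periods where observed exceeds the bound."""
    return sum(o - e for o, e, u in zip(observed, expected, upper) if o > u)

# Toy counts (made up): three previous years, three periods each.
history = [
    [100, 110, 105],
    [98, 112, 103],
    [102, 108, 107],
]
observed = [101, 150, 104]

expected, upper = baseline_and_upper_bound(history)
excess = excess_deaths(observed, expected, upper)
print(expected, excess)
```

Only the second period exceeds its upper bound in this toy input, so the excess is the difference between 150 and the expected 110 for that period.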

In this paper, instead, an extravagant model has been adopted with the idea of concatenating all subsequent years yielding a sinusoidal (!) behavior of the deaths across different years and then sub(modeling) the peaks, either with Gaussians (for flu) or with a Gompertz (for Covid 19). 

Further, why don't they discuss the use of a Gompertz curve more carefully? Usually, a logistic or a Gompertz curve is used to describe a fast growth and a slower decrease. Maybe this has been true for the peak of spring 2020, but we have not seen an exponential decrease in the deaths of the second wave in Italy (winter 2020).

In any case, my opinion is that this model is both naive and unusual, and it has the defect that in the end the authors only yield just another description of this phenomenon, but without any possibility of estimating what we are really interested in: the magnitude of the excess deaths.

My proposal is that they provide a serious revision of their paper, with more attention to all these factors:

  • Intro is too brief
  • A comparative review of recent literature is missing
  • a reflection on the used models is in order, also with the aim to explain why to use this extravagant method in contrast with more effective alternatives, and not only with reference to death but also to infections
  • a numerical comparison with results from alternatives would be desirable (expressed in terms of how many excess deaths are due to Covid)
  • the Reference section is scarce and should be supplemented.

A (non exhaustive) list of papers to read and cite could be as follows:

Excess mortality during the COVID-19 outbreak in Italy: a two-stage interrupted time-series analysis. Matteo Scortichini, Rochelle Schneider dos Santos, Francesca De’ Donato, Manuela De Sario, Paola Michelozzi, Marina Davoli, Pierre Masselot, Francesco Sera, Antonio Gasparrini. International Journal of Epidemiology, 2020

Estimating the burden of SARS-CoV-2 in France. Salje H., Kiem C.T., Lefrancq N., Courtejoie N., Bosetti P., Paireau J., Andronico A., Hozé N., Richet J., Dubost C.-L., Strat Y.L., Lessler J., Levy-Bruhl D., Fontanet A., Opatowski L., Boelle P.-Y., Cauchemez S. Science, 2020

Temporal dynamics in total excess mortality and COVID-19 deaths in Italian cities. Michelozzi P., de’Donato F., Scortichini M., Pezzotti P., Stafoggia M., Sario M.D., Costa G., Noccioli F., Riccardo F., Bella A., Demaria M., Rossi P., Brusaferro S., Rezza G., Davoli M. BMC Public Health, 2020

Estimation of Excess Deaths Associated With the COVID-19 Pandemic in the United States, March to May 2020. Weinberger D.M., Chen J., Cohen T., Crawford F.W., Mostashari F., Olson D., Pitzer V.E., Reich N.G., Russi M., Simonsen L., Watkins A., Viboud C. JAMA Intern Med, 2020

A Cross-Regional Analysis of the COVID-19 Spread during the 2020 Italian Vacation Period: Results from Three Computational Models Are Compared. Luca Casini, Marco Roccetti. Sensors, 2020

Empirical model for short-time prediction of COVID-19 spreading. Martí Català, Sergio Alonso, Enrique Alvarez-Lacalle, Daniel López, Pere-Joan Cardona, Clara Prats. PLOS Computational Biology, 2020

Monitoring for outbreak-associated excess mortality in an African city: Detection limits in Antananarivo, Madagascar. Fidisoa Rasambainarivo, Anjarasoa Rasoanomenjanahary, Joelinotahiana Hasina Rabarison, Jean Michel Heraud, C. Jessica E. Metcalf, Benjamin L. Rice. International Journal of Infectious Diseases, 2020

 

Author Response

Dear reviewer, we thank you for your constructive and well-thought-out comments. Below we provide detailed answers to each individual point you raised and describe how we changed our manuscript text, wherever appropriate, to clarify the issues at stake and improve its readability.

Issue: We can begin with the absence of a clearly defined research hypothesis, and a final scientific target declared and discussed in both the abstract and the Introduction.

Answer: The subject of our paper is not a research hypothesis. Rather, it is a practical application of a statistical methodology widely used in scientific research to quantify excesses with respect to a background, by determining simultaneously the shape of the peaks and of the background from the data, while providing a mathematically correct estimate of the relative uncertainties. We’ll modify the text to clarify this point.

Issue: What do the authors want to demonstrate with their paper? Not only, but what do they add to what we already know from the already published literature?

Answer: we are not trying to demonstrate something new. Our aim is to apply an effective and proven statistical methodology to the data set of deceased people. The novelty of this approach is that the alternative methods discussed in the literature provide less precise values for the excesses extracted from the background and, moreover, do not provide the relative uncertainties.

Issue: My simple guess is that they want to confirm the role played by Covid-19 in Italy during the 2020 spring in terms of excess deaths, yet this is already evident from the plots provided by the Italian Institute of Statistics, and discussed as well in many published papers.

Answer: The plots certainly provide visual evidence of the role played by Covid-19 in Italy, but we additionally want to provide a statistically solid method (whose validity has been amply proven by many decades of use in physical science) to extract the most accurate values for the death excesses and the relative uncertainties, as already explained.

Issue: More interesting, and difficult to achieve, instead, would be trying to estimate the magnitude of those excess deaths, by also avoiding all the bias of this sad phenomenon (such as death misclassifications and so on).

Answer: the only quantity we can compute, given the nature of the data at our disposal, is just the crude magnitude of the excess peaks parametrized as Gaussians or a Gompertz. The effect of misclassification or, worse, the effects of overburdened hospitals (which tended to increase the number of casualties), along with the effect of reduced deaths due to the lockdown restrictions (which saved an undetermined number of lives by preventing deaths from work- or traffic-related accidents), cannot be estimated from the data at our disposal, since the necessary details are not recorded in them.

We are aware that other effects might have contributed to the build-up of the COVID-19 peak: overburdened hospitals, unable to provide emergency help to other kinds of patients (who therefore died), also contribute to the excess peak. Conversely, the lower-than-average number of deaths due to the reduced mobility caused by the lockdown, and the reduced number of deaths from work-related activities, tend to diminish the amplitude of the wave with respect to previous years. Nevertheless, the method to compute the area of the peak with respect to the sinusoid baseline remains valid. This method simply provides the excess of deaths in that period, not the excess specifically due to COVID-19. In fact, we compare this bare number with the one provided by DPC, which counts the deaths directly due to COVID, finding a large discrepancy due to a host of reasons we do not comment on, since we do not have the necessary information to do so.

Issue: Well, if this is the scientific target of this paper (i.e., count how many excess deaths are due to Covid) I have many difficulties with the statistical methods they have decided to adopt.

Answer: we are surprised by this comment. The methodology indicated in our paper is definitely not new: it is described in countless statistics textbooks (reference: James). Whenever there is a data distribution which can be modeled by a certain number of reasonably chosen functions, the least squares approach is statistically proven to be one of the best methods available to compute the values of the excesses with respect to what is considered a background with a known shape. In addition, it provides numerical estimators to assess how well the model describes the data.

Issue: In fact, with the aforementioned target in mind, it would be reasonable to proceed with some other alternative methodology, able to estimate the deviation of total number of deaths in 2020 from the expected baseline levels of mortality from previous years.

Answer: in fact our aim is the reverse of what you propose. We did explore the available literature (in particular the one you suggested) and verified that the method we propose has never been used to tackle this problem. Alternative methods do not provide both the results AND the associated uncertainty (one computed with a mathematically correct method like the least squares method). Moreover, the simultaneous determination of the magnitude of the excesses due to the COVID-19 pandemic and to the individual seasonal flu epidemics, together with the shape of the background, avoids systematic biases introduced by other methods, for instance when averaging the number of deaths in previous years in order to determine the baseline to be subtracted from the COVID-19 pandemic excess.

Issue: This could be done in several different ways, even if in the end what many have done has been to fit some autoregressive quasi-Poisson or autoregressive negative binomial regression models, adapted for time series analysis from previous years, to forecast the expected number of deaths in the absence of the Covid disease. Excess deaths were detected when observed mortality was above some given upper bound, within a predefined confidence interval.

Answer: we are convinced that the method we propose is more robust than an autoregressive quasi-Poisson method, for the reasons explained before, which we took care to add in the introduction and in the body of the paper in order to address your point. (Add a comment on the fact that an autoregressive Poisson model cannot be compared to a fit) 

The statistical properties and the advantages of maximum-likelihood, and hence minimum chi-square, estimation methods are well documented in the literature and unanimously recognized by statisticians, and have not been a matter of debate for many decades.

Issue: In this paper, instead, an extravagant model has been adopted with the idea of concatenating all subsequent years yielding a sinusoidal (!) behavior of the deaths across different years and then sub(modeling) the peaks, either with Gaussians (for flu) or with a Gompertz (for Covid 19).

Answer: we consider the adjective “extravagant” highly inappropriate and somewhat derogatory: in the literature there is ample use of Gaussian models to represent the seasonal excess of deaths with respect to a background modeled by a sinusoidal wave (for example: https://gis.cdc.gov/grasp/fluview/mortality.html). Specifically, the organization officially devoted to monitoring mortality in Europe and the yearly excesses due to influenza also adopts a sinusoidal function as the background (see https://www.euromomo.eu/). In particular, we show how well the fit pulls behave, an overwhelming indication that the chosen model represents the data well. To address these points we added bibliographical references in the text.

Issue: Further, why don't they discuss the use of a Gompertz curve more carefully? Usually, a logistic or a Gompertz curve is used to describe a fast growth and a slower decrease. Maybe this has been true for the peak of spring 2020, but we have not seen an exponential decrease in the deaths of the second wave in Italy (winter 2020).

Answer: the first wave is very well represented by a Gompertz, as can be gleaned from the good chi-square of the fit and from the very good behaviour of the pulls. We do not analyze the second wave in our analysis, but in general, if there are additional contributing factors in the development of the pandemic that tend to distort the Gompertz, these must certainly be taken into account. For the first wave the pandemic peak does not show any sign of additional factors distorting it, and we therefore conclude that a simple Gompertz model is adequate to describe the data.

Issue: In any case, my opinion is that this model is both naive and unusual, and it has the defect that in the end the authors only yield just another description of this phenomenon, but without any possibility of estimating what we are really interested in: the magnitude of the excess deaths.

Answer: we are at a loss here: in all tables featured in the article we provide exactly the magnitudes we are looking for, with the correctly estimated errors. The fact that the pull distribution turns out to be centered on zero with a very small error is a remarkable indicator that our choice of model to describe the data is justified and accurate. Moreover, the time series consists of almost 2000 points while the model function has 46 parameters: this means that the fit is largely over-constrained (with a very large number of degrees of freedom). Any deviation of the data from the assumed model would result in a very large chi-square value, and the pull distribution would definitely not feature a Gaussian shape, nor would it have a null mean value. All these are indications that our model represents the data well, therefore justifying our claim that we can extract excess values with a robust technique.
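
The over-constraint argument can be made concrete with a short sketch. It uses the approximate figures quoted in this answer (about 2000 points, 46 parameters); the exact number of points is an assumption, and the function names are hypothetical. For a model that describes the data, the chi-square is expected to be close to the number of degrees of freedom, with fluctuations of order the square root of twice that number.

```python
import math

def chi_square(observed, fitted, sigma):
    """Sum of squared pulls over all data points."""
    return sum(((o - f) / s) ** 2 for o, f, s in zip(observed, fitted, sigma))

def degrees_of_freedom(n_points, n_params):
    """Number of data points minus number of fitted parameters."""
    return n_points - n_params

# Approximate figures from the response; 2000 stands in for "almost 2000".
n_points, n_params = 2000, 46
ndf = degrees_of_freedom(n_points, n_params)
expected_spread = math.sqrt(2 * ndf)  # standard deviation of a chi-square

print(f"ndf = {ndf}, expected chi-square = {ndf} +/- {expected_spread:.0f}")

# Tiny hand-checkable example: pulls of 0 and 2 give a chi-square of 4.
toy = chi_square([1.0, 2.0], [1.0, 1.0], [1.0, 0.5])
print(toy)
```

With these figures a chi-square far above roughly 1954 plus a few times 63 would signal that the model does not describe the data.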

My proposal is that they provide a serious revision of their paper, with more attention to all these factors:

  • Intro is too brief

We will certainly improve on this point in the revised manuscript.

  • A comparative review of recent literature is missing

A good suggestion: we’ll provide this review.

  • a reflection on the used models is in order, also with the aim to explain why to use this extravagant method in contrast with more effective alternatives, and not only with reference to death but also to infections

As declared above, we continue to consider the term “extravagant method” unfair. It is not a method we invented; it is a method widely and extensively used in the scientific literature. Furthermore, it has been demonstrated to be the most effective way to extract quantitative values in a statistically correct manner when compared with alternative approaches. In addition, we clearly stated that our methodology only deals with a correct extraction of the numerical values of the excess peaks, refraining from commenting on the causes and implications of these excesses. The study of infections is out of the scope of this paper, essentially because we do not have all the necessary data in our sample.

  • a numerical comparison with results from alternatives would be desirable (expressed in terms of how many excess deaths are due to Covid)

It is not clear how we could compute these values, given that the data source at our disposal does not provide the necessary kind of information.

  • the Reference section is scarce and should be supplemented.

We’ll provide an expanded list.

Round 2

Reviewer 1 Report

The paper is improved and the method is solid. It will be better if the authors can make other requested changes.

Author Response

Dear reviewer, we thank you for this second round of suggestions.

You commented: "The paper is improved and the method is solid. It will be better if the authors can make other requested changes."

Our answer is the following:

Dear reviewer,

We are somewhat confused by your reply. It is not clear to us which changes you are requesting besides those we have implemented following your first suggestions. 

Could you please be more specific about the requested changes you consider to be still missing? 

Moreover, we kindly ask you to clarify why, given your initial statement “The paper is improved and the method is solid”, you significantly lowered the overall score of the paper with respect to the first version.

Please let us know.

Respectfully

Reviewer 4 Report

I am afraid I must confirm my negative evaluation of this paper.

Just for a moment, I want to set aside all the methodological concerns I had raised, which received only evasive answers from the authors.

I go straight to the main issue: I cannot ignore the two following negative facts:

1 this paper, in the end, does not add anything more or new with respect to the reports published by the Italian institute of statistics;

2 the authors have told the reviewer that:

  • the revised manuscript has a new Introduction, improved with a more extended comparison with the current literature. Despite their formal declaration, this has not been done. The first version of the paper contained an Introduction of some 20 lines; the current one is 28 lines long!
  • the revised manuscript has an extended References Section. This is not true, again. The former version of the paper had 18 references; the current version has 19 references!

I think that this paper has little potential for publication if the authors systematically ignore, neglect or evade the comments and suggestions provided by the Reviewers.

 

 

Author Response

Dear reviewer, we thank you for your second round of suggestions.

Following is your main remark:

This paper, at the end, does not add anything more or new with respect to the reports published by the Italian institute of statistics

This is our answer:

Dear reviewer, we disagree with your point: we significantly changed the text in several places to indicate why the method we employ significantly improves on the results obtained by ISTAT, Euromomo and other measurements of this kind found in the literature. 

The introduction has been extensively revised and improved. 

This time we have added more references, according to your request and suggestion.

Additionally, the bibliography now contains review articles which themselves provide an extensive number of references.

Thank you
