A New Zero-Inflated Negative Binomial Multilevel Model for Forecasting the Demand of Disaster Relief Supplies in the State of Sao Paulo, Brazil

Yale, Camila Pareja; Yoshizaki, Hugo Tsugunobu Yoshida; Fávero, Luiz Paulo

doi:10.3390/math10224352

by Camila Pareja Yale ^1,*,†, Hugo Tsugunobu Yoshida Yoshizaki ^1,*,† and Luiz Paulo Fávero ^2,*,†

¹ Production Engineering Department, Polytechnic School, University of São Paulo (USP), São Paulo 05508-010, Brazil
² Accounting Department, School of Economics, Business and Accounting, University of São Paulo (USP), São Paulo 05508-010, Brazil
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.

Mathematics 2022, 10(22), 4352; https://doi.org/10.3390/math10224352

Submission received: 30 September 2022 / Revised: 13 November 2022 / Accepted: 15 November 2022 / Published: 19 November 2022

(This article belongs to the Special Issue Advancements in Machine Learning and Statistical Modeling, and Real-World Applications)

Abstract

This article presents a forecasting model that predicts the relief materials needed, in order to support decisions made before natural disasters, thus filling a gap in the exploration of Generalized Linear Mixed Models (GLMM) in a humanitarian context. Demand information from the State of Sao Paulo, Brazil was used to develop the Zero-Inflated Negative Binomial Multilevel (ZINBM) model, which handles the excess of zeros in the count data and accounts for the nested structure of the data set. Predictor variables were selected based on an understanding of the needs for relief supplies; consequently, they were derived from vulnerability indicators, demographic factors, and occurrences of climatic anomalies. The model presents coefficients that are statistically significant, and the results show the importance of considering the nested structure of the data and the zero-inflated nature of the outcome variable. To validate the fit of the ZINBM model, it was compared against the Poisson, Negative Binomial (NB), Zero-Inflated Poisson (ZIP), and Zero-Inflated Negative Binomial (ZINB) models.

Keywords: zero-inflated models; count data models; multilevel models; hierarchical models; random effects; nested models; relief demand forecasting model; humanitarian logistics

MSC: 62J12

1. Introduction

In the aftermath of a disaster, scarcity or mismanagement of relief supplies can compromise the emergency response. Therefore, precise information about relief supplies is highly important for humanitarian operations and for the reduction of post-disaster losses.

However, few works address demand forecasting in the humanitarian area [1,2]; according to [3], possible causes include social organizations' low awareness and understanding of the value of demand prediction.

Most previous studies focused on forecasting food demand after a disaster. Ref. [4] proposed a model to calculate the probability of starvation. Ref. [5] predicted food donation by type of donor and location. Refs. [6,7] estimated the food demand after an earthquake. In addition, Ref. [8] focused on the demand of agricultural products.

Another group of studies addressed the prediction of relief materials demand. Ref. [9] analyzed demand by type of material. Ref. [10] evaluated demand by type of donor. In addition, Ref. [11] predicted the overall demand.

Finally, Refs. [12,13] developed models to predict the number of people affected by earthquakes. Refs. [14,15] predicted the blood demand and fuel demand after a disaster, respectively.

Earthquakes were the most addressed disaster type; hurricanes, floods, avalanches, and other sudden onset disasters were less common. Furthermore, the preference to apply Artificial Intelligence (AI) models was evident. Support Vector Machine (SVM), Neural Network (NN), Case-Based Reasoning (CBR), Principal Component Analysis (PCA) and hybrid models were employed in ten of fourteen papers. Four papers proposed time series based models, the Autoregressive integrated moving average (ARIMA) model being the most common.

All of these past studies addressed demand forecasting in the response phase of the disaster, so they took the existence of demand for relief supplies as given. When that demand is predicted before a disaster over a large, country-sized area, for each of its many counties or provinces, historical data show that many of the outcomes are zero, as most sudden-onset disasters have limited geographical scope. This adds the difficulty of recognizing the presence (or absence) of demand.

With this motivation, the present study aims to implement a relief supply demand forecasting model using actual data from the state of Sao Paulo, Brazil, which is about the same size as the United Kingdom. Although the state of Sao Paulo is not susceptible to large-scale disasters, smaller-scale disasters are frequent, especially in vulnerable regions with extreme social inequalities [16,17].

As demand values present an excessive number of zeros, Zero-Inflated (ZI) and hurdle models can be appropriate for the study's goal [18,19]. Hurdle models assume that all zeros are generated by a structural source, which imposes a strong restriction on the model. In ZI models, on the other hand, zeros can be generated either by a structural source or by a sampling source [20]. For relief supply demand, the structural source of zeros is the absence of a disaster. However, sampling zeros can still be obtained when a disaster occurs.

Thus, the main contribution of this study is to approach a little-explored phenomenon, supply demand forecasting for disaster response, by employing an innovative Zero-Inflated Negative Binomial Multilevel (ZINBM) model, which considers not only the structural and sampling zeros, but also the hierarchical groupings of the dataset.

One of the key features of this study is that it accounts for the hierarchical character of the municipalities of Sao Paulo through a multilevel model. Such a model combines information at the individual and group levels, allowing greater efficiency when estimating coefficients for specific groups or identifying effects at the aggregate level, as evidenced in [21]. Another important contribution is that this is the first study in the humanitarian area to address demand forecasting from an accumulative perspective, which requires dealing with the excess of zeros in the data.

To validate the suitability of the ZINBM model, a Poisson model was implemented at a first stage to determine whether the data presented over-dispersion; finally, the performance of the proposed model was compared against some base models: Negative Binomial (NB), Zero-Inflated Poisson (ZIP), and Zero-Inflated Negative Binomial (ZINB).

The principal challenges in the study were the limited amount of data, the complexity of the predictive features, and the uncertainty typical of these kinds of models.

2. Materials and Methods

2.1. Data

The data employed for this study include the aggregated demand, in kilograms, for relief supplies in the municipalities of Sao Paulo from 2015 to 2020. The data set comprised 443 records, each with the name of the municipality, the year of the event, and the amount of the demand. Each year in which a municipality did not require relief supplies was recorded as zero demand. Of the 645 municipalities of the state, only 161 needed supplies at some point in the five years. These data were obtained from the Coordenadoria Estadual de Proteção e Defesa Civil do Estado de São Paulo (CEPDEC-SP), the Sao Paulo Civil Defense Coordination.

More than 80% of the demand is composed of basic food baskets and cleaning kits. Other products were requested in smaller proportions, as shown in Figure 1.

Figure 2 displays the number of requests by municipality in the five-year period. It shows the large number of zero values and the over-dispersion in the demand, factors that made it necessary to apply Generalized Linear Models (GLMs) for count data.

2.2. Predictor Variables

The predictor variables were selected considering the vulnerability of the municipalities (IPVS), demographic aspects, and climatic anomalies caused by the El Niño phenomenon. Table 1 describes each variable.

The IPVS captures the inequality within the municipalities and the location of their poverty areas. The factors considered by the IPVS include income, literacy, health, job opportunities, access to services offered by the state, and social mobility opportunities. The seven vulnerability levels of the IPVS were aggregated into three groups (very high, high, and moderate vulnerability) to improve the statistical significance of the predictor variables in the model.

The climatic anomaly variable is related to the occurrence of the El Niño phenomenon, so it is mapped as a categorical variable with values “yes” and “no”. This value was taken from the ENSO forecast system, which annually publishes an alert for the phenomenon.

The Brazilian Institute of Geography and Statistics (IBGE) issues each year a population estimate for each municipality. The estimation methodology is based on the population growth trend observed in the last two censuses for each municipality. Due to the scale of the populations in the municipalities, the final predictor was obtained by taking the logarithm of the value.

To determine the levels in the ZINBM model, a categorical variable was generated by applying the K-means clustering method [22]. The disaster susceptibility of each municipality was the only variable used to calculate the distance among observations and generate the clusters. Different numbers of clusters were tested and scored with the Silhouette Coefficient [23]; Figure 3 shows the silhouette values, which indicate that the optimal number of clusters is three.
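The cluster-selection step can be illustrated with a minimal Python sketch, assuming a one-dimensional susceptibility measure. The `disaster_counts` values, the function names, and the deterministic initialization are hypothetical choices made for illustration; they are not the CEPDEC-SP data or the authors' implementation.

```python
from statistics import mean

def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on one-dimensional data; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread initial centers across the sorted data.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels

def silhouette(values, labels):
    """Mean silhouette coefficient using absolute distance."""
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        own = [abs(v - w) for j, (w, m) in enumerate(zip(values, labels))
               if m == lab and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(       # mean distance to the nearest other cluster
            mean([abs(v - w) for w, m in zip(values, labels) if m == other])
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Hypothetical disaster counts per municipality (illustrative only).
disaster_counts = [0, 1, 1, 2, 2, 3, 10, 11, 12, 13, 30, 32, 35]
scores = {k: silhouette(disaster_counts, kmeans_1d(disaster_counts, k))
          for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On this toy input the silhouette score peaks at three clusters, mirroring the kind of comparison summarized in Figure 3.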

Table 2 shows descriptive statistics for quantitative and qualitative variables.

2.3. Methodology

The predictor variables and the demand show nonlinear relationships, which points to the use of GLMs, whose parameters are estimated by maximum likelihood. As the demand is a non-negative integer, Poisson regression and NB models were considered. Both models belong to a subgroup of GLMs known as count data models [24].

To select the appropriate count model, it is necessary to characterize the dispersion of the data. Over-dispersion was identified in the outcome variable through the test proposed by [25], making the NB model more suitable than the Poisson model. The result is shown in Table 3.
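A regression-based over-dispersion check in the spirit of [25] can be sketched in a few lines of Python. The sketch assumes the NB2-type alternative Var(y) = μ + αμ², uses an intercept-only Poisson fit (so every fitted mean equals the sample mean), and runs on hypothetical demand values; none of these choices are taken from the study itself.

```python
import math

def overdispersion_test(y, mu):
    """Auxiliary OLS without intercept: ((y - mu)^2 - y) / mu = alpha * mu + error.

    Returns the estimate of alpha and its t statistic; alpha > 0 points to
    over-dispersion relative to the Poisson assumption Var(y) = mu.
    """
    n = len(y)
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    sum_mu_sq = sum(mi ** 2 for mi in mu)
    alpha = sum(mi * zi for mi, zi in zip(mu, z)) / sum_mu_sq
    # Residual variance of the auxiliary regression (one estimated parameter).
    residual_var = sum((zi - alpha * mi) ** 2 for mi, zi in zip(mu, z)) / (n - 1)
    t_stat = alpha / math.sqrt(residual_var / sum_mu_sq)
    return alpha, t_stat

# Hypothetical demand counts with many zeros and a few large values.
demand = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 15, 40]
# Intercept-only Poisson model: the fitted mean is the sample mean for every unit.
fitted = [sum(demand) / len(demand)] * len(demand)
alpha, t_stat = overdispersion_test(demand, fitted)
print(alpha, t_stat)
```

In practice the fitted means would come from the full Poisson regression, and the t statistic would be compared against a one-sided critical value.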

To confirm the excess of zeros in the outcome variable, the Vuong test [26] was applied, which compares the likelihood functions of the zero-inflated model and its traditional counterpart. Table 4 displays the result of the Vuong test, together with the corrections proposed in [27], when comparing the NB model with the ZINB model.

The Vuong test statistic is z = 40.6094, and the Akaike Information Criterion (AIC)-corrected and Bayesian Information Criterion (BIC)-corrected statistics are z = 40.6088 and z = 40.6073, respectively. All the results have p-values lower than 0.05. Therefore, the ZINB model is more suitable than the NB model.
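The structure of the Vuong statistic and its two corrections can be sketched as follows. The pointwise log-likelihood values and the parameter counts (`k1`, `k2`) are hypothetical; the corrections follow the usual form in which the summed log-likelihood differences are reduced by (k1 − k2) for AIC and by (k1 − k2)·ln(n)/2 for BIC.

```python
import math
from statistics import stdev

def vuong(loglik_1, loglik_2, k1, k2):
    """Vuong z statistics (raw, AIC-corrected, BIC-corrected) for model 1 vs model 2.

    loglik_1 and loglik_2 hold the log-likelihood contribution of each observation
    under each model; k1 and k2 are the numbers of estimated parameters.
    """
    n = len(loglik_1)
    m = [a - b for a, b in zip(loglik_1, loglik_2)]  # pointwise differences
    scale = stdev(m) * math.sqrt(n)
    total = sum(m)
    return {
        "raw": total / scale,
        "aic": (total - (k1 - k2)) / scale,
        "bic": (total - (k1 - k2) * math.log(n) / 2) / scale,
    }

# Hypothetical pointwise log-likelihoods (not values from the study).
ll_zinb = [-0.2, -0.3, -0.1, -2.5, -3.0, -0.2, -0.4, -2.8, -0.3, -0.2]
ll_nb = [-0.9, -1.1, -0.8, -3.4, -3.3, -1.0, -1.2, -3.9, -0.7, -1.0]
z = vuong(ll_zinb, ll_nb, k1=7, k2=4)
print(z)
```

Because the zero-inflated model has more parameters, both corrections pull the statistic down slightly, the same ordering seen in Table 4 (40.6094 > 40.6088 > 40.6073).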

The ZI models combine a binary model with a count data model. The binary component determines whether an observation is a structural zero. The count component models the magnitude of the phenomenon and can itself produce a zero response, known as a sampling zero [18].
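The mixture described above can be written out as a probability mass function. This is a minimal sketch assuming the common NB2 parameterization (mean `mu`, dispersion `theta`, variance mu + mu²/theta) and a structural-zero probability `pi`; the parameter values below are illustrative, not estimates from the study.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial (NB2) probability of observing the count y."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def zinb_pmf(y, pi, mu, theta):
    """Zero-inflated NB: structural zero with probability pi, otherwise an NB draw."""
    count_part = (1.0 - pi) * nb_pmf(y, mu, theta)
    # A zero can be structural (pi) or a sampling zero from the count process.
    return pi + count_part if y == 0 else count_part

pi, mu, theta = 0.6, 5.0, 0.5  # illustrative values only
total = sum(zinb_pmf(y, pi, mu, theta) for y in range(500))
mean_demand = sum(y * zinb_pmf(y, pi, mu, theta) for y in range(500))
print(zinb_pmf(0, pi, mu, theta), nb_pmf(0, mu, theta), total, mean_demand)
```

The probabilities sum to one, the probability of zero exceeds that of the plain NB model, and the mean is (1 − pi)·mu, which is the form the fitted equations take.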

The municipalities differ in vulnerability, owing to geographic, social, and economic factors that reduce the opportunities for preventing natural disasters, which suggests a nested structure in the data. This characteristic limits the fit of GLMs and creates the need for Generalized Linear Mixed Models (GLMM), also called “multilevel” models, which take into consideration the dependence among observations of the same group [28].

Consequently, for the purpose of the study, a ZINBM model was implemented for the supply demand forecasting because it is capable of handling count data with the excess of zeros, taking into account the hierarchical structure of the dataset.

The Poisson, NB, ZIP, and ZINB models were also implemented, to compare against and validate the fit of the ZINBM model.

In this work, all estimates were obtained using R version 4.0.4, with the MASS package for the Poisson and NB models, the pscl package for the ZINB model, and the glmmTMB package for the ZINBM model.

3. Results

The results of the estimations of the models NB, ZINB, and ZINBM are presented in Table 5.

Low p-values (<0.01) indicate that the coefficients are statistically different from zero at the 1% significance level, that is, each predictor variable contributes to explaining the response variable.

The logistic component of the ZINB and ZINBM models gives a structural-zero probability of about 0.97 for a municipality with moderate vulnerability, e^{1.74 + 1.62} / (1 + e^{1.74 + 1.62}) = 0.97, which leaves only a 3% chance that its demand comes from the count process. The corresponding chance is 15% for high vulnerability and 42% for very high vulnerability, therefore confirming that the vulnerability variable plays a preponderant role in the absence of demand for relief supplies in the municipalities. On the other hand, the NB model is not capable of identifying the effect of vulnerability on the demand. The ZINBM model also estimates random effects for the three clusters generated for the municipalities of Sao Paulo. Thus, its random-effects part contains a different error term in the intercept for each group of municipalities, and these groups characterize the hierarchical levels.
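The percentages quoted above can be checked directly from the logistic coefficients of the ZINB model. The sketch below assumes that high vulnerability is the reference level (it has no dummy variable in the fitted equation, so its effect sits in the intercept); that reading is an inference from the equations, not a statement made by the authors.

```python
import math

def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Logistic-component coefficients of the ZINB model.
INTERCEPT = 1.74095   # assumed to represent the reference level, high vulnerability
MODERATE = 1.62079
VERY_HIGH = -1.43442

p_zero = {
    "moderate": logistic(INTERCEPT + MODERATE),
    "high": logistic(INTERCEPT),
    "very high": logistic(INTERCEPT + VERY_HIGH),
}
for level, p in p_zero.items():
    print(f"{level}: P(structural zero) = {p:.2f}, P(count process) = {1 - p:.2f}")
```

The complements come out at roughly 0.03, 0.15, and 0.42, matching the 3%, 15%, and 42% figures in the text: the lower the vulnerability, the more likely a structural zero.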

Equation (1) corresponds to the NB model. The ZINB model is represented by Equation (2), showing both the logistic and the count component. Finally, Equation (3) displays the ZINBM model, its logistic and count components, and the random error terms for the intercepts (v_{0j}):

d_{i} = -1.04968 + 0.60127 \, \mathrm{climate\_anomaly}_{yes_{i}} + 0.59088 \, \mathrm{population\_log}_{i}, \qquad (1)

d_{i} = \left\{ 1 - \frac{1}{1 + e^{-(1.74095 + 1.62079 \, \mathrm{vul}_{moderate_{i}} - 1.43442 \, \mathrm{vul}_{veryhigh_{i}})}} \right\} \cdot e^{2.98137 + 1.26171 \, \mathrm{climate\_anomaly}_{yes_{i}} - 0.34713 \, \mathrm{population\_log}_{i}}, \qquad (2)

d_{ij} = \left\{ 1 - \frac{1}{1 + e^{-(1.74001 + 1.6208 \, \mathrm{vul}_{moderate_{ij}} - 1.43433 \, \mathrm{vul}_{veryhigh_{ij}})}} \right\} \cdot e^{3.95972 + 1.99652 \, \mathrm{climate\_anomaly}_{yes_{ij}} - 0.26204 \, \mathrm{population\_log}_{ij} + v_{0j}}, \qquad (3)
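Equation (3) can be turned into a small prediction function. The sketch below copies the coefficients as printed in Equation (3) and takes the random intercepts v_{0j} from Table 5; the function name, argument names, and the example inputs are hypothetical.

```python
import math

# Coefficients copied from Equation (3) as printed; random intercepts from Table 5.
ZERO_INTERCEPT = 1.74001
ZERO_VULNERABILITY = {"moderate": 1.6208, "high": 0.0, "very_high": -1.43433}
COUNT_INTERCEPT = 3.95972
COUNT_CLIMATE_ANOMALY = 1.99652
COUNT_POPULATION_LOG = -0.26204
RANDOM_INTERCEPT = {"a": 1.31, "b": -1.21, "c": 0.19}

def expected_demand(vulnerability, climate_anomaly, population_log, cluster):
    """Expected demand (kg) for one municipality-year under Equation (3)."""
    # Logistic component: probability that the observation is NOT a structural zero.
    eta_zero = ZERO_INTERCEPT + ZERO_VULNERABILITY[vulnerability]
    p_count = 1.0 - 1.0 / (1.0 + math.exp(-eta_zero))
    # Count component with the cluster-specific random intercept v_0j.
    eta_count = (COUNT_INTERCEPT
                 + COUNT_CLIMATE_ANOMALY * (1.0 if climate_anomaly else 0.0)
                 + COUNT_POPULATION_LOG * population_log
                 + RANDOM_INTERCEPT[cluster])
    return p_count * math.exp(eta_count)

# Hypothetical municipality: population logarithm 9.7 (the sample average), cluster "a".
with_anomaly = expected_demand("high", True, 9.7, "a")
without_anomaly = expected_demand("high", False, 9.7, "a")
print(with_anomaly, without_anomaly)
```

Holding everything else fixed, a climatic anomaly multiplies the expected demand by e^{1.99652}, and the expected demand rises with the vulnerability level because the structural-zero probability falls.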

The fit of the models was evaluated through AIC and log-likelihood. The AIC penalizes the maximized log-likelihood by the number of estimated parameters k, AIC = 2k − 2 ln(L). Therefore, the model that reaches a high likelihood with few parameters obtains a lower AIC and is considered the best fit [29].
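The AIC values reported for the two zero-inflated NB models can be reproduced from their log-likelihoods. The parameter counts used here (7 for ZINB, 8 for ZINBM) are assumptions inferred from the model structure: three count coefficients, three logistic coefficients, one dispersion parameter, and for ZINBM one random-intercept variance.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Log-likelihoods from Table 7; parameter counts are assumed (see the text above).
aic_zinb = aic(-1964.1, 7)
aic_zinbm = aic(-1954.7, 8)
print(round(aic_zinb), round(aic_zinbm))
```

Under these assumptions the rounded values agree with the 3942 and 3925 listed in Table 6, and the extra parameter of the multilevel model is more than offset by its higher log-likelihood.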

For comparison purposes, in addition to the ZINBM model, the following models are presented: Poisson, ZIP, NB, and ZINB. Table 6 presents the value of AIC and the log-likelihood associated with each model.

The Likelihood Ratio Test (LRT) is applied to compare the goodness of fit of two nested statistical models [30]. Table 7 shows the result of the test, comparing the ZINBM model (Model 1) with the ZINB model (Model 2). The p-value is lower than the 5% significance level, which indicates that Model 1 offers a significant improvement in fit over Model 2.
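The LRT statistic and p-value in Table 7 can be checked with the standard library alone. The sketch assumes one degree of freedom (the two models differ by one parameter) and uses the identity that the upper tail of a chi-square variable with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def lrt_p_value_df1(chi_sq):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))

# Statistic recomputed from the rounded log-likelihoods in Table 7.
stat_from_loglik = 2 * (-1954.7 - (-1964.1))
# p-value for the statistic reported in Table 7 (18.833, from unrounded values).
p_value = lrt_p_value_df1(18.833)
print(stat_from_loglik, p_value)
```

The rounded log-likelihoods give a statistic of about 18.8, close to the reported 18.833, and the p-value is about 1.43 × 10^{-5}, in line with Table 7.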

Figure 4 presents demand histograms and estimates from the models ZINB and ZINBM. Both models were able to explain the behavior of excess zeros in demand. However, high demand values, which can be inferred as a result of disaster occurrences, are better estimated by the ZINBM model.

In some cases the ZINBM model overestimates demand, which can translate into higher inventory costs. However, in the context of disasters and emergencies, the priority is to avoid a lack of relief supplies, which can mean human loss and suffering.

Figure 5 shows the demand geographically, where each column corresponds to a demand interval, and each point represents the demand of a specific municipality.

Both models, ZINB and ZINBM, concentrate their estimates in the first range, from 0 to 999 kg, which is caused by the excess of zeros in the real demand data. In the range from 1000 to 1999 kg, the ZINBM model overestimates the demand for some municipalities, and the ZINB model can be considered to perform better. In the ranges from 2000 to 2999 kg and above 3000 kg, the ZINBM model generates estimates closer to the real demand than the ZINB model.

4. Discussion and Conclusions

The present study is the first to address demand forecasting for relief supplies from a disaster preparedness perspective. We present and evaluate three models, NB, ZINB, and ZINBM, using exactly the same variables.

Strategies for selecting predictor variables were based on the understanding of the needs for relief supplies. The predictor variables are derived from vulnerability indicators, demographic factors, and occurrences of climatic anomalies related to the El Niño phenomenon. To establish the levels of the hierarchical model, the number of disasters was considered as the only dimension.

Compared with the ZINB and ZINBM models, the NB model underestimates the climatic anomaly parameter and overestimates the parameter of the estimated population logarithm. The NB model is biased because it does not consider the excess of zeros in the outcome variable, and it does not identify the effect of vulnerability. The logistic component of the ZINB and ZINBM models shows the effect of vulnerability on structural zeros, confirming that the vulnerability variable plays a preponderant role in the absence of demand for relief supplies: the lower the vulnerability level of a given municipality, the greater the probability of zero demand.

Even though the ZINB model takes zero inflation into account, it fails to capture the natural nesting of data between municipalities. This bias is not evident in the logistic component. The random effects of each group of municipalities, which characterize the hierarchical levels of the ZINBM model, showed different error terms for the intercepts, in addition to a slight smoothing of the coefficient of the estimated population logarithm. The evaluation of the models through AIC and log-likelihood, together with the comparison of estimated and real demand, showed a better fit for the ZINBM model, which reached the lowest AIC and the highest log-likelihood and generated the estimates closest to the real demand among all the models developed. This demonstrates the superiority of GLMMs over GLMs when there is natural nesting in the data.

The proposed model can be easily and regularly updated with new estimated population information and ENSO predictions. Such a forecasting model can prove to be a valuable tool for raising awareness of demand for supplies during the disaster preparedness phase.

Future research may apply this approach to the estimation of zero-inflated generalized linear mixed models in other regions that present data with similar characteristics, that is, cases where the outcome variable presents an excess of zeros and the data set has a multilevel structure that requires random effects.

Author Contributions

Conceptualization, C.P.Y., H.T.Y.Y. and L.P.F.; Data curation, C.P.Y., H.T.Y.Y. and L.P.F.; Formal analysis, C.P.Y., H.T.Y.Y. and L.P.F.; Methodology, C.P.Y., H.T.Y.Y. and L.P.F.; Validation, C.P.Y., H.T.Y.Y. and L.P.F.; Writing—original draft, C.P.Y., H.T.Y.Y. and L.P.F.; Writing—review and editing, C.P.Y., H.T.Y.Y. and L.P.F.; Supervision, C.P.Y., H.T.Y.Y. and L.P.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by CAPES Foundation contract No. 88887.387760/2019-00, and CNPq Grant No. 313687/2019-6.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original population, IPVS, and ENSO datasets are available at the following links: https://www.ibge.gov.br/estatisticas/sociais/populacao/9109-projecao-da-populacao.html?=&t=notas-tecnicas, https://ipvs.seade.gov.br, https://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_advisory/ensodisc.shtml, respectively (accessed on 28 November 2021). The historical demand dataset was provided by the CEPDEC-SP and is not publicly available.

Acknowledgments

The authors appreciate the support by the Sao Paulo Coordination of Civil Defence (CEPDEC-SP) for the data and the support during the research.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GLM	Generalized Linear Models
GLMM	Generalized Linear Mixed Models
ZI	Zero Inflated
NB	Negative Binomial
ZINB	Zero-Inflated Negative Binomial
ZIP	Zero-Inflated Poisson
ZINBM	Zero-Inflated Negative Binomial Multilevel
CEPDEC-SP	Coordenadoria Estadual de Proteção e Defesa Civil do Estado de São Paulo
IBGE	Instituto Brasileiro de Geografia e Estatística
ENSO	El Niño Southern Oscillation
IPVS	Índice Paulista de Vulnerabilidade Social
LRT	Likelihood Ratio Test
EWMA	Exponentially Weighted Moving Average
IF	Intuitionistic Fuzzy
CBR	Case-Based Reasoning
PCA	Principal Component Analysis
SVM	Support Vector Machine
SVR	Support Vector Regression
MA	Moving Average
ETS	Exponential Smoothing
EM	Ensemble Model
LSTM	Long Short-Term Memory
RPCA	Robust Principal Component Analysis
AI	Artificial Intelligence
NN	Neural Network
RBF	Radial Basis Function
ARIMA	Autoregressive integrated moving average
PCR	Principal Component Regression

References

1. Zhu, X.; Zhang, G.; Sun, B. A comprehensive literature review of the demand forecasting methods of emergency resources from the perspective of artificial intelligence. Nat. Hazards 2019, 97, 65–82.
2. Altay, N.; Narayanan, A. Forecasting in humanitarian operations: Literature review and research needs. Int. J. Forecast. 2020, 38, 1234–1244.
3. Rostami-Tabar, B.; Ali, M.M.; Hong, T.; Hyndman, R.J.; Porter, M.D.; Syntetos, A. Forecasting for social good. Int. J. Forecast. 2022, 38, 1245–1257.
4. Mude, A.G.; Barrett, C.B.; McPeak, J.G.; Kaitho, R.; Kristjanson, P. Empirical forecasting of slow-onset disasters for improved emergency response: An application to Kenya’s arid north. Food Policy 2009, 34, 329–339.
5. Davis, L.B.; Jiang, S.X.; Morgan, S.D.; Nuamah, I.A.; Terry, J.R. Analysis and prediction of food donation behavior for a domestic hunger relief organization. Int. J. Prod. Econ. 2016, 182, 26–37.
6. Mohammadi, R.; Ghomi, S.F.; Zeinali, F. A new hybrid evolutionary based RBF networks method for forecasting time series: A case study of forecasting emergency supply demand time series. Eng. Appl. Artif. Intell. 2014, 36, 204–214.
7. Basu, S.; Roy, S.; DasBit, S. A post-disaster demand forecasting system using principal component regression analysis and case-based reasoning over smartphone-based DTN. IEEE Trans. Eng. Manag. 2018, 66, 224–239.
8. Xu, X.; Qi, Y.; Hua, Z. Forecasting demand of commodities after natural disasters. Expert Syst. Appl. 2010, 37, 4313–4317.
9. Holguín-Veras, J.; Jaller, M. Immediate resource requirements after hurricane Katrina. Nat. Hazards Rev. 2012, 13, 117–131.
10. Paul, S.; Davis, L.B. An ensemble forecasting model for predicting contribution of food donors based on supply behavior. Ann. Oper. Res. 2021, 1–29.
11. Shao, J.; Liang, C.; Liu, Y.; Xu, J.; Zhao, S. Relief demand forecasting based on intuitionistic fuzzy case-based reasoning. Socio-Econ. Plan. Sci. 2020, 74, 100932.
12. Florez, J.V.; Lauras, M.; Dupont, L.; Charles, A. Towards a demand forecast methodology for recurrent disasters. WIT Trans. Built Environ. 2013, 133, 99–110.
13. Xing, H.; Zhonglin, Z.; Shaoyu, W. The prediction model of earthquake casuailty based on robust wavelet v-SVM. Nat. Hazards 2015, 77, 717–732.
14. Jiang, P.; Liu, X.; Zheng, M. Emergency Blood Demand Forecasting after Earthquakes. IFAC-PapersOnLine 2019, 52, 773–777.
15. Fuqua, D.; Hespeler, S. Commodity demand forecasting using modulated rank reduction for humanitarian logistics planning. Expert Syst. Appl. 2022, 206, 117753.
16. Brollo, M.; Ferreira, C. Indicadores de desastres naturais no Estado de São Paulo. Simpósio de Geologia do Sudeste XI Águas de São Pedro SP 2009, 14, 125.
17. SEADE. Índice Paulista de Vulnerabilidade Social. 2010. Available online: http://ipvs.seade.gov.br (accessed on 29 September 2022).
18. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14.
19. Mullahy, J. Specification and testing of some modified count data models. J. Econom. 1986, 33, 341–365.
20. Feng, C.X. A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. J. Stat. Distrib. Appl. 2021, 8, 1–19.
21. Fávero, L.P.; Hair, J.F.; Souza, R.d.F.; Albergaria, M.; Brugni, T.V. Zero-Inflated Generalized Linear Mixed Models: A Better Way to Understand Data Relationships. Mathematics 2021, 9, 1100.
22. Hartigan, J.A. Clustering Algorithms, 99th ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1975.
23. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
24. Nelder, J.A.; Wedderburn, R.W. Generalized linear models. J. R. Stat. Soc. Ser. A (Gen.) 1972, 135, 370–384.
25. Cameron, A.C.; Trivedi, P.K. Regression-based tests for overdispersion in the Poisson model. J. Econom. 1990, 46, 347–364.
26. Vuong, Q.H. Likelihood ratio tests for model selection and non-nested hypotheses. Econom. J. Econom. Soc. 1989, 57, 307–333.
27. Desmarais, B.A.; Harden, J.J. Testing for zero inflation in count models: Bias correction for the Vuong test. Stata J. 2013, 13, 810–835.
28. Hall, D.B. Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics 2000, 56, 1030–1039.
29. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
30. Wilks, S.S. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 1938, 9, 60–62.

Figure 1. Composition of the demand by type of relief supply.

Figure 2. Histogram of the average demand for relief supplies.

Figure 3. Silhouette-coefficient.

Figure 4. Histogram of actual demand × estimated demand.

Figure 5. Geographic view of actual demand vs. estimated demand.

Table 1. Description of the predictor variables.

Predictor Variable	Description	Source
Índice Paulista de Vulnerabilidade Social (IPVS)	Composite indicator of socioeconomic characterization of census sectors in the State of São Paulo	Governo Aberto SP
Climatic anomaly	Monitoring variations in temperature, precipitation, air pressure, and atmospheric circulation in the equatorial Pacific Ocean	El Niño Southern Oscillation (ENSO)
Estimated population	Population projections based on the 2010 Demographic Census	Instituto Brasileiro de Geografia e Estatística (IBGE)

Table 2. Descriptive statistics or frequency of variables involved in the model.

Demand	Population Logarithm	Climate Anomaly	Vulnerability	Clusters
Min.: 0.0	Min.: 6.7
1Q.: 0.0	1Q.: 8.7		Freq.“moderate”: 270	Freq.“a”: 15
Median: 0.0	Median: 9.6	Freq.“no”: 645	Freq.“high”: 942	Freq.“b”: 1124
3Q.: 0.0	3Q.: 10.7	Freq.“yes”: 645	Freq.“very high”: 78	Freq.“c”: 151
Max.: 22,847	Max.: 16.3
Average: 241.9	Average: 9.7

Note 1: Q. stands for quartile. Note 2: Freq. stands for Frequency.

Table 3. Results of the test for over-dispersion verification.

Lambda t Test Score	p-Value
4.4755	8.3 × 10^{-6}

Table 4. Results of the Vuong test NB × ZINB.

	Vuong z-Statistic	p-Value
original	40.6094	0.0000
Akaike Information Criterion (AIC)-corrected	40.6088	0.0000
BIC-corrected	40.6073	0.0000

Table 5. Estimations of NB, ZINB, and ZINBM.

	NB	ZINB	ZINBM
Fixed Effects
Intercept	−1.05 ¹	2.98 ¹	3.96 ¹
	(0.0000)	(0.0000)	(5 × 10^{-6})
Climate anomaly	0.60 ¹	1.26 ¹	1.99 ¹
	(0.0000)	(0.0000)	(0.0000)
Population logarithm	0.59 ¹	0.35 ¹	0.26 ¹
	(0.0000)	(0.0000)	(0.0000)
Logistic Component
High vulnerability	-	1.74 ¹	1.74 ¹
	-	(4 × 10^{-6})	(4 × 10^{-6})
Moderate vulnerability	-	1.62 ¹	1.62 ¹
	-	(0.0000)	(0.0000)
Very high vulnerability	-	−1.43 ¹	−1.43 ¹
	-	(0.0000)	(0.0000)
Random Effects
cluster a	-	-	1.31
cluster b	-	-	−1.21
cluster c	-	-	0.19
Observations	1290	1290	1290
Log-likelihood	−191,674	−1964	−1954

Note: p-values are in parentheses. ¹ Statistically significantly different from zero at 99% confidence.

Table 6. Comparison of fit measures of models.

Models	Log-Likelihood	AIC
Poisson	−708,513	1,417,033
NB	−708,058	1,416,125
ZIP	−191,674	383,361
ZINB	−1964	3942
ZINBM	−1954	3925

Table 7. Results of the Likelihood Ratio Test (LRT).

	Log-likelihood	Df	Chisq	Pr(>Chisq)
Model 1	−1954.7
Model 2	−1964.1	−1	18.833	1.43 × 10^{-5}

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
