Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19
Abstract
1. Introduction
2. Related Work
2.1. Common Methodology Weaknesses
2.1.1. Ignoring Testing
2.1.2. Ignoring Countermeasures
2.1.3. Limited Data and Misleading Conclusions
3. Dataset
3.1. Classes
3.1.1. Daily Number of Infections
3.1.2. Reproductive Rate
3.1.3. Exponential Shape
3.1.4. Class Comparison
3.2. Country and Time Selection
3.2.1. Selection Procedure
- Having the start and end date for each country, as defined by the testing rate and countermeasures, we selected the countries for which these dates form an interval. In some cases, the interval cannot be formed because a good testing policy was never established (no start date), countermeasures were never established (no end date), or they came earlier than good testing (end date before the start date).
- For the selected countries, we calculated the average interval duration and the average number of infections at the start of the interval. In addition, we defined each class as positive if the corresponding metric (Section 3.1) exceeded its median value, and we stored this threshold value. This way, half the countries were considered positive and half negative. The exception to this rule was the Exponential class, which did not use a threshold value but simply checked whether the exponential fit was better than the linear one.
- Countries with no end date defined by the start of countermeasures were added to the list of selected countries, and their interval length was set to the average interval length calculated in the previous step. This ensured that these countries had interval lengths similar to the others.
- For each country with no start date defined by testing (or where testing came too late), we created an interval that started on the day when the number of infections matched the average starting number of infections in the other selected countries. Classes were defined using the precalculated threshold values for each metric, and from them we calculated the majority class over the three class definitions. If the majority class was positive, the country was selected for further analysis; otherwise, it was rejected. The logic is as follows: if testing was inadequate during an interval but enough infections were still recorded for the class to be positive, it is reasonable to assume that the class would also be positive with more testing, since more infected people would be found. On the other hand, if testing was inadequate and the class was negative, we cannot say anything reliable about the infection in that country.
- The rest of the time intervals (and corresponding countries) were excluded from the list of selections.
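The selection steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the data layout (`start`, `end`, `cases`), the function name `select_intervals`, and the callables `metric_fns` and `is_exponential` are all hypothetical. It also assumes that intervals created in the fourth step use the average interval length, which the text does not state explicitly.

```python
from statistics import mean, median


def select_intervals(countries, metric_fns, is_exponential):
    """Sketch of the country/interval selection procedure.

    countries: name -> {"start": int|None, "end": int|None, "cases": list}
        "cases" holds cumulative infections indexed by day (hypothetical layout).
    metric_fns: metric name -> f(window) -> float, thresholded at the median.
    is_exponential: f(window) -> bool, the threshold-free Exponential class.
    Returns name -> (start_day, end_day) for the selected countries.
    """
    # Step 1: keep countries whose start and end dates form an interval.
    selected = {
        name: (c["start"], c["end"])
        for name, c in countries.items()
        if c["start"] is not None and c["end"] is not None
        and c["start"] < c["end"]
    }

    # Step 2: average interval length, average infections at the start,
    # and the median threshold of every metric over the selected countries.
    avg_len = round(mean(e - s for s, e in selected.values()))
    avg_start_cases = mean(
        countries[n]["cases"][s] for n, (s, _) in selected.items()
    )
    thresholds = {
        m: median(f(countries[n]["cases"][s:e + 1])
                  for n, (s, e) in selected.items())
        for m, f in metric_fns.items()
    }

    for name, c in countries.items():
        if name in selected:
            continue
        cases = c["cases"]
        last_day = len(cases) - 1

        # Step 3: good testing but no countermeasures: use the average length.
        if c["start"] is not None and c["end"] is None:
            selected[name] = (c["start"], min(c["start"] + avg_len, last_day))
            continue

        # Step 4: no usable start date. Start where infections reach the
        # average starting count (ASSUMPTION: interval has the average length).
        start = next(
            (d for d, v in enumerate(cases) if v >= avg_start_cases), None
        )
        if start is None:
            continue  # Step 5: excluded
        end = min(start + avg_len, last_day)
        window = cases[start:end + 1]
        votes = [f(window) > thresholds[m] for m, f in metric_fns.items()]
        votes.append(is_exponential(window))
        if sum(votes) > len(votes) / 2:  # majority class is positive
            selected[name] = (start, end)
        # otherwise the country is rejected (Step 5)

    return selected
```

Passing the metrics in as callables keeps the sketch independent of how the daily average, the reproductive rate, and the exponential fit are actually computed in Section 3.1.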
3.2.2. Alternative Country Sets
3.3. Features
3.3.1. Weather
3.3.2. Culture
3.3.3. Travel
3.3.4. Health
3.3.5. Economy
3.3.6. Development
3.3.7. Geography
3.3.8. Countermeasures
4. Method
4.1. Feature Significance
4.1.1. Statistical Significance
4.1.2. ML Significance
4.2. Creating Feature Categories
4.2.1. Manual Categorizing
4.2.2. K-Means Clustering
4.2.3. Hierarchical Clustering
4.3. Classification
4.3.1. Algorithms
4.3.2. Evaluation Scheme
4.3.3. Data and Feature Selection
- RF feature importance: We sorted the features by RF feature importance and then took the best 20 features for the classification. This value was chosen because lower values (10 or fewer) did not achieve high classification accuracy.
- FS with a Wrapper method: In addition to using the previously mentioned out-of-the-box RF feature selection, we also investigated a custom FS algorithm similar to the one used in our related work [72]. First, the features were sorted using RF feature importance as before. Then, if two features were correlated (Pearson coefficient ), we discarded the lower-ranking one. We started by using only the best feature for the classification. Then, we iteratively added the next best one but only kept it if it did not decrease the classification accuracy by more than two percentage points. To determine the classification accuracy, the whole LOIO procedure was repeated. This Wrapper procedure was then repeated with the next best feature, etc. The whole experiment was also conducted with slightly different threshold values, but by doing so, we achieved very similar results.
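The wrapper procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: `wrapper_select`, `pearson`, `columns`, and `evaluate` are hypothetical names. The feature ranking (RF feature importance in the text) and the accuracy estimate (the full LOIO procedure in the text) are passed in by the caller. The correlation threshold is not recoverable from the text, so it is a required parameter. The sketch also assumes that the reference accuracy is updated each time a feature is kept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def wrapper_select(ranked, columns, evaluate, corr_threshold, tolerance=2.0):
    """Greedy wrapper feature selection (illustrative sketch).

    ranked: feature names sorted from best to worst (e.g., by RF importance).
    columns: feature name -> list of values, one per instance.
    evaluate: f(list of feature names) -> accuracy in percent
        (stands in for repeating the whole evaluation procedure).
    corr_threshold: absolute Pearson value above which two features count
        as correlated (value not given here; supplied by the caller).
    tolerance: allowed accuracy drop in percentage points.
    """
    # Correlation filter: drop a feature if it is correlated with any
    # higher-ranking feature that was kept.
    kept = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[g])) <= corr_threshold
               for g in kept):
            kept.append(f)

    # Forward pass: start with the best feature, then try the next best one
    # and keep it unless accuracy drops by more than `tolerance` points.
    chosen = [kept[0]]
    accuracy = evaluate(chosen)
    for f in kept[1:]:
        candidate_accuracy = evaluate(chosen + [f])
        if candidate_accuracy >= accuracy - tolerance:
            chosen.append(f)
            accuracy = candidate_accuracy  # ASSUMPTION: reference is updated
    return chosen, accuracy
```

Because `evaluate` is called once per candidate feature, the cost is dominated by the evaluation procedure itself, which is why the procedure starts from an already ranked and de-correlated list.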
4.3.4. Hyper-Parameter Tuning
4.4. Generated Rules
Algorithm 1. Pseudo-code of the rule discovery algorithm.
5. Results
5.1. Correlations between Classes
5.2. Correlations between Features
5.3. Significance of Individual Features
5.4. Significance of Feature Groups
5.5. Machine Learning
6. Feature Discussion
Limitations of the Study
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ML | Machine Learning |
RF | Random Forest |
FS | Feature Selection |
Appendix A
Clusters | Features |
---|---|
Cluster 1 | Temperature high, Pressure, Visibility, Emotional stability, Power distance, Masculinity, Uncertainty avoidance, Social distance, Tightness, Plane passengers normalized, GDP growth rate (%), 5 Year GDP growth rate (%), Govt spending, Tax burden, Corporate tax rate (%), Vitamin D, BCG adjusted |
Cluster 2 | Wind gust, Precip probability, Climate, Conscientiousness, Individualism, Future orientation, Net migration, Mobility-walking, Region prosperity score, GDP per capita (PPP), Phones (per 1000), Tax burden % of GDP, Arable (%), Death rate, Region: Western Europe, Diabetes, Respiratory disease, Cardiovascular disease, ACE II, Chronic kidney disease |
Cluster 3 | Humidity, Cloud cover, Extraversion, Agreeableness, Openness, Indulgence, Country prosperity score, Literacy (%), Judicial effectiveness, Service, Monetary freedom, Labor freedom, Financial freedom, Trade freedom, Government efficiency, Fiscal health, Urban population (%), Obesity |
Cluster 4 | Precip. intensity, Google Trends, Plane passengers, Tourists, Plane passengers/population, Tourists/population, Mobility-driving, GDP (Billions, PPP), Unemployment (%), FDI Inflow (Millions), Infant mortality (per 1000 births), Agriculture, Industry, Tariff rate (%), Public debt (% of GDP), Inflation (%), Population, Area (sq. mi.), Pop. Density (per sq. mi.), Coastline (coast/area ratio), Crops (%), Birthrate, PM2.5, Region: Asia (ex. near east), Region: Baltics, Region: C.W. of Ind. states, Region: Eastern Europe, Region: Latin Amer. Carib, Region: Near East, Region: Northern Africa, Region: Northern America, Region: Oceania, Region: Sub-Saharan Africa, Smoking, Blood O |
References
- Meyerowitz-Katz, G.; Merone, L. A systematic review and meta-analysis of published research data on covid-19 infection-fatality rates. Int. J. Infect. Dis. 2020, 101, 138–148.
- Alimohamadi, Y.; Taghdir, M.; Sepandi, M. The estimate of the basic reproduction number for novel coronavirus disease (Covid-19): A systematic review and meta-analysis. J. Prev. Med. Public Health 2020, 53, 151–157.
- Biggerstaff, M.; Cauchemez, S.; Reed, C.; Gambhir, M.; Finelli, L. Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: A systematic review of the literature. BMC Infect. Dis. 2014, 14, 480.
- Bullock, J.; Luccioni, A.; Pham, K.H.; Lam, C.S.N.; Luengo-Oroz, M. Mapping the landscape of artificial intelligence applications against Covid-19. J. Artif. Intell. Res. 2020, 69, 807–845.
- Wynants, L.; Van Calster, B.; Bonten, M.M.; Collins, G.S.; Debray, T.P.; De Vos, M.; Haller, M.C.; Heinze, G.; Moons, K.G.; Riley, R.D.; et al. Prediction models for diagnosis and prognosis of COVID-19 infection: Systematic review and critical appraisal. BMJ 2020, 369.
- Carrillo-Larco, R.M.; Castillo-Cara, M. Using country-level variables to classify countries according to the number of confirmed Covid-19 cases: An unsupervised machine learning approach. Wellcome Open Res. 2020, 5.
- Malki, Z.; Atlam, E.-S.; Hassanien, A.E.; Dagnew, G.; Elhosseini, M.A.; Gad, I. Association between weather data and Covid-19 pandemic predicting mortality rate: Machine learning approaches. Chaos Solitons Fractals 2020, 138.
- Mogi, R.; Spijker, J. The influence of social and economic ties to the spread of COVID-19 in Europe. SocArXiv 2020. Available online: https://osf.io/preprints/socarxiv/sb8xn/ (accessed on 20 June 2021).
- Nazrul, I.; Bukhari, Q.; Jameel, Y.; Shabnam, S.; Erzurumluoglu, A.M.; Siddique, M.A.; Massaro, J.M.; D’Agostino, R.B., Sr. COVID-19 and climatic factors: A global analysis. Environ. Res. 2021, 193.
- Jinjarak, Y.; Ahmed, R.; Nair-Desai, S.; Xin, W.; Aizenman, J. Accounting for global COVID-19 diffusion patterns, January–April 2020. Econ. Disasters Clim. Chang. 2020, 4, 515–559.
- Staszkiewicz, P.; Chomiak-Orsa, I.; Staszkiewicz, I. Dynamics of the COVID-19 contagion and mortality: Country factors, social media, and market response evidence from a global panel analysis. IEEE Access 2020, 8, 106009–106022.
- Gupta, A.; Gharehgozli, A. Developing a Machine Learning Framework to Determine the Spread of COVID-19. SSRN 3635211. 2020. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3635211 (accessed on 20 June 2021).
- Xu, R.; Rahmandad, H.; Gupta, M.; DiGennaro, C.; Ghaffarzadegan, N.; Amini, H.; Jalali, M.S. The modest impact of weather and air pollution on COVID-19 transmission. medRxiv 2020. Available online: https://www.medrxiv.org/content/10.1101/2020.05.05.20092627v3 (accessed on 20 June 2021).
- Cobb, J.; Seale, M. Examining the effect of social distancing on the compound growth rate of COVID-19 at the county level (United States) using statistical analyses and a random forest machine learning model. Public Health 2020, 185, 27–29.
- Lakshmi Priyadarsini, S.; Suresh, M. Factors influencing the epidemiological characteristics of pandemic COVID 19: A TISM approach. Int. J. Healthc. Manag. 2020, 13, 89–98.
- Qiu, Y.; Chen, X.; Shi, W. Impacts of social and economic factors on the transmission of coronavirus disease 2019 (COVID-19) in China. J. Popul. Econ. 2020, 33, 1127–1172.
- Demongeot, J.; Flet-Berliac, Y.; Seligmann, H. Temperature decreases spread parameters of the new Covid-19 case dynamics. Biology 2020, 9, 94.
- Siddiqui, M.K.; Morales-Menendez, R.; Gupta, P.K.; Iqbal, H.; Hussain, F.; Khatoon, K.; Ahmad, S. Correlation between temperature and COVID-19 (suspected, confirmed and death) cases based on machine learning analysis. J. Pure Appl. Microbiol. 2020, 14 (Suppl. 1), 1017–1024.
- Goumenou, M.; Sarigiannis, D.; Tsatsakis, A.; Anesti, O.; Docea, A.O.; Petrakis, D.; Tsoukalas, D.; Kostoff, R.; Rakitskii, V.; Spandidos, D.A.; et al. COVID-19 in northern Italy: An integrative overview of factors possibly influencing the sharp increase of the outbreak. Mol. Med. Rep. 2020, 22, 20–32.
- Liotta, G.; Marazzi, M.C.; Orlando, S.; Palombi, L. Is social connectedness a risk factor for the spreading of COVID-19 among older adults? The Italian paradox. PLoS ONE 2020, 15.
- Grosshans, H.; Slack, F.J. Micro-RNAs: Small is plentiful. J. Cell Biol. 2002, 156, 17–22.
- Gelfand, M.J.; Jackson, J.C.; Pan, X.; Nau, D.; Dagher, M.; Van Lange, P.; Chiu, C.-Y. The importance of cultural tightness and government efficiency for understanding COVID-19 growth and death rates. PsyArXiv 2020.
- Jacqueline, D.; Bragazzi, N.; Kong, J.D. The impact of non-pharmaceutical interventions, demographic, social, and climatic factors on the initial growth rate of COVID-19: A cross-country study. Sci. Total. Environ. 2021, 760.
- Pal, R.; Sekh, A.A.; Kar, S.; Prasad, D.K. Neural network based country wise risk prediction of COVID-19. Appl. Sci. 2020, 10, 6448.
- Chimmula, V.K.R.; Zhang, L. Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals 2020, 135.
- Khan, H.R.; Hossain, A. Countries are clustered but number of tests is not vital to predict global COVID-19 confirmed cases: A machine learning approach. medRxiv 2020. Available online: https://www.medrxiv.org/content/10.1101/2020.04.24.20078238v1 (accessed on 20 June 2021).
- Gola, A.; Arya, R.K.; Dugh, R. Review of forecasting models for coronavirus (COVID-19) pandemic in India during country-wise lockdown. medRxiv 2020. Available online: https://www.medrxiv.org/content/10.1101/2020.08.03.20167254v1 (accessed on 20 June 2021).
- Wu, Y.; Jing, W.; Liu, J.; Ma, Q.; Yuan, J.; Wang, Y.; Du, M.; Liu, M. Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries. Sci. Total. Environ. 2020, 725.
- Rapid Expert Consultations on the COVID-19 Pandemic: 14 March 2020–8 April 2020. Available online: https://www.nap.edu/catalog/25784/rapid-expert-consultations-on-the-covid-19-pandemic-march-14 (accessed on 20 June 2021).
- O’Reilly, K.M.; Auzenbergs, M.; Jafari, Y.; Liu, Y.; Flasche, S.; Lowe, R. Effective transmission across the globe: The role of climate in COVID-19 mitigation strategies. Lancet Planet. Health 2020, 4.
- Xie, J.; Zhu, Y. Association between ambient temperature and COVID-19 infection in 122 cities from China. Sci. Total. Environ. 2020, 724.
- Vaid, S.; Cakan, C.; Bhandari, M. Using machine learning to estimate unobserved COVID-19 infections in North America. J. Bone Jt. Surgery. Am. Vol. 2020, 102.
- Magal, P.; Webb, G. Predicting the number of reported and unreported cases for the COVID-19 epidemic in South Korea, Italy, France and Germany. medRxiv 2020. Available online: https://www.medrxiv.org/content/10.1101/2020.03.21.20040154v1 (accessed on 20 June 2021).
- Repository for the Presented Data and Code. Available online: https://repo.ijs.si/vitojanko/covid-from-scratch (accessed on 31 August 2020).
- Dietz, K. The estimation of the basic reproduction number for infectious diseases. Stat. Methods Med Res. 1993, 2, 23–41.
- Smith, D.; Moore, L. The SIR Model for Spread of Disease: The Differential Equation Model. Available online: https://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-the-differential-equation-model (accessed on 16 February 2021).
- Ardabili, S.F.; Mosavi, A.; Ghamisi, P.; Ferdinand, F.; Varkonyi-Koczy, A.R.; Reuter, U.; Rabczu, T.; Atkinson, P.M. COVID-19 outbreak prediction with machine learning. medRxiv 2020. Available online: https://www.medrxiv.org/content/10.1101/2020.04.17.20070094v1 (accessed on 20 June 2021).
- WHO Testing Rate Recommendations. Available online: https://www.who.int/docs/default-source/coronaviruse/transcripts/who-audio-emergencies-coronavirus-press-conference-full-30mar2020.pdf?sfvrsn=6b68bc4a_2 (accessed on 10 May 2020).
- Countermeasure Data for Each Country. Available online: https://github.com/OxCGRT/covid-policy-tracker (accessed on 10 May 2020).
- Lauer, S.A.; Grantz, Q.B.; Jones, F.K.; Zheng, Q.; Meredith, H.R.; Azman, A.S.; Reich, N.G.; Lessler, J. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application. Ann. Intern. Med. 2020, 172, 577–582.
- CIA, The World Factbook. Available online: https://www.cia.gov/library/publications/the-world-factbook/appendix/appendix-b.html (accessed on 15 April 2020).
- Dark Sky. Available online: https://darksky.net (accessed on 10 August 2020).
- Hibbs, D.A.; Olsson, O. Geography, biogeography, and why some countries are rich and others are poor. Proc. Natl. Acad. Sci. USA 2014, 101, 3715–3720.
- Open-Source Psychometrics Project. Available online: https://openpsychometrics.org/ (accessed on 10 April 2020).
- Geert Hofstede Dimension Data Matrix. Available online: https://geerthofstede.com/research-and-vsm/dimension-data-matrix/ (accessed on 10 May 2020).
- Sorokowska, A.; Sorokowski, P.; Hilpert, P.; Cantarero, K.; Frackowiak, T.; Ahmadi, K.; Alghraibeh, A.M.; Aryeetey, R.; Bertoni, A.; Bettache, K.; et al. Preferred interpersonal distances: A global comparison. J. Cross Cult. Psychol. 2017, 48, 577–592.
- Farzanegan, M.R.; Gholipour, H.F.; Feizi, M.; Nunkoo, R.; Andargoli, A.E. International tourism and outbreak of coronavirus (COVID-19): A cross-country analysis. J. Travel Res. 2020.
- The World Bank—Air Travel. Available online: https://data.worldbank.org/indicator/is.air.psgr (accessed on 15 April 2020).
- The World Bank—International Tourism. Available online: https://data.worldbank.org/indicator/ST.INT.ARVL (accessed on 15 April 2020).
- Apple COVID-19 Mobility Trend Reports. Available online: https://covid19.apple.com/mobility (accessed on 7 July 2020).
- Zheng, Z.; Peng, F.; Xu, B.; Zhao, J.; Liu, H.; Peng, J.; Li, Q.; Jiang, C.; Zhou, Y.; Liu, S.M.; et al. Risk factors of critical & mortal COVID-19 cases: A systematic literature review and meta-analysis. J. Infect. 2020, 81, e16–e25.
- Petrilli, C.M.; Jones, S.A.; Yang, J.; Rajagopalan, H.; O’Donnell, L.F.; Chernyak, Y.; Tobin, K.; Cerfolio, R.J.; Francois, F.; Horwitz, L.I. Factors associated with hospitalization and critical illness among 4103 patients with COVID-19 disease in New York city. medRxiv 2020. Available online: https://www.medrxiv.org/content/10.1101/2020.04.08.20057794v1 (accessed on 20 June 2021).
- Zhao, H.; Lu, X.; Deng, Y.; Tang, Y.; Lu, J. COVID-19: Asymptomatic carrier transmission is an underestimated problem. Epidemiol. Infect. 2020, 148.
- Global Health Data Exchange. Available online: http://ghdx.healthdata.org/ (accessed on 4 April 2020).
- CIA the World Factbook—Obesity. Available online: https://www.cia.gov/library/publications/the-world-factbook/fields/367rank.html (accessed on 25 April 2020).
- The Tobacco Atlas Consumption. Available online: https://tobaccoatlas.org/topic/consumption/ (accessed on 25 April 2020).
- CIA the World Factbook—Median. Available online: https://www.cia.gov/library/publications/resources/the-world-factbook/fields/343rank.html (accessed on 10 April 2020).
- ArcGIS: Demographics and Lifestyle Data. Available online: https://developers.arcgis.com/features/demographics/ (accessed on 17 April 2020).
- Delanghe, J.R.; Speeckaert, M.M.; De Buyzere, M.L. COVID-19 infections are also affected by human ACE1 D/I polymorphism. Clin. Chem. Lab. Med. (CCLM) 2020, 58, 1125–1126.
- Saab, Y.; Gard, P.; Overall, A. The geographic distribution of the ACE II genotype: A novel finding. Genet. Res. 2007, 89, 259–267.
- Zietz, M.; Tatonetti, N.P. Testing the association between blood type and Covid-19 infection, intubation, and death. MedRxiv 2020.
- Wikipedia: Blood Type Distribution by Country. Available online: https://en.wikipedia.org/wiki/Blood_type_distribution_by_country (accessed on 25 April 2020).
- Spolaore, E.; Wacziarg, R. Ancestry and development: New evidence. J. Appl. Econom. 2018, 33, 748–762.
- Martineau, A.R.; Forouhi, N.G. Vitamin D for COVID-19: A case to answer? Lancet. Diabetes Endocrinol. 2020, 8, 735–736.
- Palacios, C.; Gonzalez, L. Is vitamin D deficiency a major global public health problem? J. Steroid Biochem. Mol. Biol. 2014, 145, 138–145.
- Escobar, L.E.; Molina-Cruz, A.; Barillas-Mury, C. BCG vaccine protection from severe coronavirus disease 2019 (COVID-19). Proc. Natl. Acad. Sci. USA 2020, 117, 17720–17726.
- UNICEF: Immunization. Available online: https://data.unicef.org/topic/child-health/immunization/ (accessed on 12 May 2020).
- Kaggle: Countries of the World. Available online: https://www.kaggle.com/fernandol/countries-of-the-world (accessed on 17 April 2020).
- Epidemic Forecasting: Dataset of Covid-19 Containment and Mitigation Measures. Available online: http://epidemicforecasting.org/datasets (accessed on 20 May 2020).
- Google Trends: Coronavirus Search Trends. Available online: https://trends.google.com/trends/story/US_cu_4Rjdh3ABAABMHM_en (accessed on 15 August 2020).
- Scikit-Learn. Available online: https://scikit-learn.org/stable/ (accessed on 15 April 2020).
- Gjoreski, M.; Janko, V.; Slapničar, G.; Mlakar, M.; Reščič, N.; Bizjak, J.; Drobnič, V.; Marinko, M.; Mlakar, N.; Luštrek, M.; et al. Classical and deep learning methods for recognizing human activities and modes of transportation with smartphone sensors. Inf. Fusion 2020, 62, 47–62.
- Kursa, M.B.; Rudnicki, W.R. Feature selection with the boruta package. J. Stat. Softw. 2010, 36.
- Casiraghi, E.; Malchiodi, D.; Trucco, G.; Frasca, M.; Cappelletti, L.; Fontana, T.; Esposito, A.A.; Avola, E.; Jachetti, A.; Reese, J.; et al. Explainable machine learning for early assessment of COVID-19 risk prediction in emergency departments. IEEE Access 2020, 8, 196299–196325.
- Vidulin, V.; Bohanec, M.; Gams, M. Combining human analysis and machine data mining to obtain credible data relations. Inf. Sci. 2014, 288, 254–278.
- Second Place in the Pandemic Response Challenge. Available online: https://www.xprize.org/challenge/pandemicresponse/articles/pandemic-response-challenge-winners (accessed on 31 March 2020).
Class | All | Selected | Developed | Semi-Selected | Semi-Developed |
---|---|---|---|---|---|
Daily avg. | 23 | 49 | 71 | 30 | 50 |
Repr. rate | 30 | 61 | 46 | 30 | 64 |
Exponential | 39 | 68 | 64 | 35 | 64 |
# Countries | 149 | 59 | 35 | 60 | 36 |
Class | Avg. daily | Repr. rate | Exponential |
---|---|---|---|
Avg. daily | 1 | 0.12 | 0.12 |
Repr. rate | 0.12 | 1 | 0.42 |
Exponential | 0.12 | 0.42 | 1 |
Class | Positive countries: before | Positive countries: after |
---|---|---|
Repr. rate | 36 | 0 |
Exponential | 40 | 12 |
Avg. daily | 29 | 23 |
Avg. daily [ext] | 29 | 17 |
Category | Features |
---|---|
Weather (10) | Temperature high, Humidity, Pressure, Wind gust, Cloud cover, Precip. intensity, Precip probability, Visibility, Climate, PM2.5, Temperature Low, Temperature max, Temperature min, Apparent temperature high, Apparent temperature low, Apparent temperature max, Apparent temperature min, Dew point, Ozone, Wind speed, UV Index, Precip. intensity max |
Culture (14) | Extraversion, Emotional stability, Agreeableness, Conscientiousness, Openness, Power distance, Individualism, Masculinity, Uncertainty avoidance, Future orientation, Indulgence, Social distance, Tightness, Completed tightness |
Travel (8) | Plane passengers, Tourists, Net migration, Plane passengers/population, Tourists/population, Mobility-driving, Mobility-walking, FDI/GDP, Tourists normalized, FDI Inflow (Millions), Plane passengers normalized |
Health (11) | Diabetes, Respiratory disease, Cardiovascular disease, Obesity, Smoking, ACE II, Blood O, Chronic kidney disease, Vitamin D, Tuberculosis immunization, Death rate, Dementia, Cancer, Median age, Birthrate |
Economy (13) | GDP (Billions, PPP), GDP per capita (PPP), GDP growth rate (%), 5 year GDP growth rate (%), Unemployment (%), Tax burden, Tax burden % of GDP, Tariff rate (%), Corporate tax rate (%), Public debt (% of GDP), Fiscal health, Inflation (%), Gov’t spending, GDP ($ per capita), Income tax rate (%) |
Development (15) | Developed, Country prosperity score, Region prosperity score, Phones (per 1000), Literacy (%), Infant mortality (per 1000 births), Agriculture, Industry, Service, Monetary freedom, Labor freedom, Financial freedom, Trade freedom, Government efficiency, World rank, Property rights, Business freedom, Judicial effectiveness, Government integrity, Government effectiveness index, Education |
Geography (19) | Population, Area (sq. mi.), Pop. Density (per sq. mi.), Coastline (coast/area ratio), Arable (%), Crops (%), Urban population (%), Region: Asia (ex. near east), Region: Baltics, Region: C.W. of Ind. states, Region: Eastern Europe, Region: Latin Amer. Carib, Region: Near East, Region: Northern Africa, Region: Northern America, Region: Oceania, Region: Sub-Saharan Africa, Region: Western Europe, Other (%) |
Countermeasures (2) | Eventual countermeasures, COVID awareness |
Class | All: base | All: corrected | Selected: base | Selected: corrected | Developed: base | Developed: corrected |
---|---|---|---|---|---|---|
Avg. daily | 61 | 37 | 40 | 9 | 7 | 0 |
Repr. rate | 48 | 31 | 4 | 0 | 6 | 0 |
Exponential | 26 | 5 | 26 | 1 | 4 | 0 |
Class | All | Selected | Developed | Semi-Selected | Semi-Developed |
---|---|---|---|---|---|
Avg. daily | 85 | 76 | 68 | 87 | 89 |
Repr. rate | 74 | 68 | 46 | 77 | 46 |
Exponential | 63 | 66 | 39 | 62 | 71 |
Avg. Daily | Repr. Rate | Exponential | |||||||
---|---|---|---|---|---|---|---|---|---|
Weather | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Temperature high | ✓ | ||||||||
PM2.5 | ✓ | ✓ | ✓ | ||||||
Culture | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Extraversion | ✓ | ✓ | |||||||
Emotional stability | ✓ | ✓ | ✓ | ||||||
Agreeableness | ✓ | ||||||||
Conscientiousness | ✓ | ✓ | |||||||
Openness | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
Power distance | ✓ | ✓ | ✓ | ||||||
Individualism | ✓ | ✓ | ✓ | ||||||
Social distance | ✓ | ✓ | |||||||
Tightness | ✓ | ✓ | ✓ | ||||||
Travel | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Net migration | ✓ | ✓ | |||||||
Plane passengers/pop. | ✓ | ✓ | |||||||
Tourists/pop. | ✓ | ✓ | |||||||
Mobility-driving | ✓ | ||||||||
Mobility-walking | ✓ | ||||||||
Plane passengers norm. | ✓ | ✓ | |||||||
Health | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Blood O | ✓ | ||||||||
Chronic kidney disease | ✓ | ✓ | ✓ | ✓ | |||||
Vitamin D | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Tuberculosis immun. | ✓ | ✓ | |||||||
Economy | ST | RF | WR | ST | RF | WR | ST | RF | WR |
GDP per capita (PPP) | ✓ | ✓ | ✓ | ||||||
Unemployment (%) | ✓ | ✓ | ✓ | ✓ | |||||
Tax burden | ✓ | ✓ | ✓ | ||||||
Tariff rate (%) | ✓ | ||||||||
Fiscal health | ✓ | ✓ | ✓ | ||||||
Gov’t spending | ✓ | ✓ | |||||||
Development | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Developed | ✓ | ||||||||
Country prosperity | ✓ | ||||||||
Phones (per 1000) | ✓ | ✓ | ✓ | ||||||
Agriculture | ✓ | ||||||||
Industry | ✓ | ||||||||
Trade freedom | ✓ | ||||||||
Government efficiency | ✓ | ||||||||
Judicial effectiveness | ✓ | ||||||||
Geography | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Population | ✓ | ✓ | |||||||
Coastline (coast/area) | ✓ | ✓ | |||||||
Region: Western Europe | ✓ | ✓ | ✓ | ||||||
Countermeasures | ST | RF | WR | ST | RF | WR | ST | RF | WR |
Eventual counterm. | ✓ | ||||||||
COVID awareness | ✓ |
RF Feature Score | Daily Average | Reproduction Rate | Exponential | Average |
---|---|---|---|---|
Weather | 0.09 | 0.09 | 0.08 | 0.09 |
Culture | 0.14 | 0.18 | 0.21 | 0.18 |
Travel | 0.18 | 0.12 | 0.08 | 0.13 |
Economy | 0.09 | 0.15 | 0.13 | 0.12 |
Development | 0.12 | 0.16 | 0.18 | 0.18 |
Geography | 0.11 | 0.12 | 0.06 | 0.10 |
Health | 0.11 | 0.11 | 0.19 | 0.14 |
Countermeasures | 0.06 | 0.04 | 0.02 | 0.04 |
Accuracy | Daily Average | Reproduction Rate | Exponential | Average |
---|---|---|---|---|
Weather | 71 | 54 | 54 | 60 |
Culture | 74 | 61 | 73 | 69 |
Travel | 74 | 69 | 68 | 70 |
Economy | 61 | 59 | 68 | 63 |
Development | 68 | 52 | 59 | 60 |
Geography | 78 | 63 | 58 | 66 |
Countermeasures | 59 | 58 | 58 | 58 |
Health | 75 | 56 | 59 | 63 |
Baseline | 49 | 61 | 67 | 59 |
Category | Weat. | Cult. | Trav. | Econ. | Devel. | Geo. | Count. | Health |
---|---|---|---|---|---|---|---|---|
Weat. | 1.0 | 0.35 | 0.22 | 0.59 | 0.56 | 0.52 | 0.13 | 0.58 |
Cult. | 0.35 | 1.0 | 0.23 | 0.54 | 0.64 | 0.55 | 0.01 | 0.62 |
Trav. | 0.22 | 0.23 | 1.0 | 0.12 | 0.16 | 0.19 | 0.04 | 0.12 |
Econ. | 0.59 | 0.54 | 0.12 | 1.0 | 0.70 | 0.80 | 0.08 | 0.77 |
Devel. | 0.56 | 0.64 | 0.16 | 0.70 | 1.0 | 0.65 | 0.06 | 0.87 |
Geo. | 0.52 | 0.55 | 0.19 | 0.80 | 0.65 | 1.0 | 0.20 | 0.75 |
Count. | 0.13 | 0.01 | 0.04 | 0.08 | 0.06 | 0.20 | 1.0 | 0.02 |
Health | 0.58 | 0.62 | 0.12 | 0.77 | 0.87 | 0.75 | 0.02 | 1.0 |
Cluster number | 9 | 9 | 4 | 4 | 8 | 8 | 8 | 8 |
---|---|---|---|---|---|---|---|---|
Test type | B | C | B | C | B | C | B | C |
Avg. daily | 7 | 5 | 4 | 3 | 7 | 7 | 6 | 5 |
Repr. rate | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Exponential | 5 | 0 | 4 | 0 | 8 | 1 | 4 | 0 |
Class | Baseline | kNN | DT | RF | XGB | SVC | ADA |
---|---|---|---|---|---|---|---|
Avg. daily | 51 | 69 | 76 | 76 | 71 | 63 | 76 |
Repr. rate | 61 | 61 | 59 | 61 | 61 | 61 | 63 |
Exponential | 68 | 58 | 76 | 73 | 68 | 59 | 69 |
Class | Baseline | DT | RF | ADA |
---|---|---|---|---|
Avg. daily | 51 | 86 | 80 | 83 |
Repr. rate | 61 | 71 | 70 | 66 |
Exponential | 68 | 66 | 70 | 70 |
Class | DT | RF | ADA |
---|---|---|---|
Avg. daily | 73 | 86 | 81 |
Repr. rate | 72 | 75 | 78 |
Exponential | 74 | 78 | 73 |
Feature | Category | Correlated |
---|---|---|
GDP per Capita (PPP) | Development | Net migration, Developed, Agriculture, Government efficiency, Phones (per 1000), Judicial effectiveness, GDP ($ per capita), Government integrity, Property rights |
Blood O | Health | / |
Tuberculosis immunization | Health | GDP ($ per capita), Individualism, Developed, Phones (per 1000), Power distance, Net migration |
Vitamin D | Health | / |
Openness | Culture | Region: Asia (ex. near east), Tax burden % of GDP |
Individualism | Culture | Power distance, Cancer |
PM2.5 | Weather | Literacy (%) |
Population | Geography | / |
Region: Western Europe | Geography | Developed, Tax Burden, Gov’t Spending, Respiratory disease, GDP ($ per capita) |
COVID Awareness | Countermeas. | / |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Janko, V.; Slapničar, G.; Dovgan, E.; Reščič, N.; Kolenik, T.; Gjoreski, M.; Smerkol, M.; Gams, M.; Luštrek, M. Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19. Int. J. Environ. Res. Public Health 2021, 18, 6750. https://doi.org/10.3390/ijerph18136750