An Asymmetric Ensemble Method for Determining the Importance of Individual Factors of a Univariate Problem
Abstract
1. Introduction
- An innovative generic optimization procedure is proposed. It achieves very good values of classification quality measures and can be used both for classic prediction problems and for discriminative classification, which in the general case amounts to determining the importance of individual factors in a multivariate problem.
- The proposed algorithm belongs to the class of generic algorithms, which allows it to be applied to a wide range of problems. More generally, generic modeling could represent a further development of the concept of a model library.
- A modern multi-agent application is developed for a specific problem: assessing the influence of certain factors on the success of hospital treatment. The application is publicly available for use and further development, and it can also be applied to similar problems in healthcare and in other fields of human activity.
2. Background Review
2.1. Literature Review of Different Methodologies That Deal with Patient Treatment
2.2. State of the Art
2.2.1. ML-Based Classification Method
2.2.2. ML-Based Feature Selection Techniques
- Filtering methods, of which the best known are Infogain and Gainratio;
- Wrapping methods, of which the most representative ones are BestFirst and LinearForwardSelection;
- Embedding methods that include different types of decision tree algorithms, such as J48 and PART.
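To make the filtering idea concrete, the following minimal sketch computes information gain and gain ratio for binary attributes in plain Python. It is an illustration of the general technique, not the Weka implementation; the toy data and the function names `entropy`, `info_gain`, and `gain_ratio` are assumptions introduced here.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def info_gain(feature, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data (not taken from the study): 1/0 attributes and outcomes.
hospital_type = [1, 1, 1, 1, 0, 0, 0, 0]
gender = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [True, True, True, True, False, False, True, False]

# Rank the attributes from most to least informative.
scores = {
    "HospitalType": info_gain(hospital_type, outcome),
    "Gender": info_gain(gender, outcome),
}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A filter of this kind scores each attribute independently of any classifier, which is what distinguishes it from the wrapping and embedding methods listed above.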
3. Materials and Methods
3.1. Materials
Data Acquired during the period 2006 to 2009 by the Institute of Public Health in Nis
- Education has the value of one for a high education level of patients;
- HospitalType has a value of one for treatment at the Clinical Center—Nis and a value of zero for all other hospitals;
- Gender has a value of one for female patients and a value of zero for male patients;
- Age is the patient's age in years; an age above 50 is coded as the high level;
- DaysofTreatment is the number of days of a patient's hospital stay; a stay longer than 15 days is coded as the high level;
- UrbanHousing has a value of one for patients living in the city;
- Outcome has a value of "true" for a positive outcome of a patient's treatment and a value of "false" for a negative outcome.
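The coding rules above can be sketched as a small function that turns one raw patient record into the seven Boolean variables. The raw field names (`high_education`, `age_years`, and so on) are hypothetical, since the source describes only the encoded variables, and treating every non-listed case as zero is an assumption.

```python
def encode_patient(raw):
    """Map a raw record (hypothetical field names) to the encoded variables."""
    return {
        "Education": int(raw["high_education"]),
        "HospitalType": int(raw["treated_at_clinical_center_nis"]),
        "Gender": int(raw["is_female"]),
        "Age": int(raw["age_years"] > 50),                       # high level: older than 50
        "DaysofTreatment": int(raw["days_of_treatment"] > 15),   # high level: longer than 15 days
        "UrbanHousing": int(raw["lives_in_city"]),
        "Outcome": bool(raw["positive_outcome"]),
    }

# Illustrative record, not taken from the dataset.
example = encode_patient({
    "high_education": False,
    "treated_at_clinical_center_nis": True,
    "is_female": True,
    "age_years": 62,
    "days_of_treatment": 9,
    "lives_in_city": True,
    "positive_outcome": True,
})
```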
3.2. Methods
- Handling null/missing values;
- Data visualization;
- Feature selection and scaling;
- Use of ensemble and boosting algorithms;
- Hyper-parameter tuning.
- The type of problem we are solving;
- The characteristics of the set of attributes (features);
- The volume of data available.
3.2.1. Ensemble Prediction Methods
- Bootstrap aggregating (bagging);
- Boosting;
- Stacking.
- There are three main types of ensemble learning methods: bagging, boosting, and stacking. Ensemble learning combines multiple ML models into a single model, with the aim of increasing the performance of the model. Bagging aims to decrease variance, boosting aims to decrease bias, and stacking aims to improve prediction accuracy.
- Computing the prediction of an ensemble usually requires more computation than evaluating a single model. Ensemble learning can therefore be seen as a way to compensate for weak learning algorithms by performing a lot of extra computation; the alternative is to invest that effort in additional learning within one non-ensemble system. For the same increase in computation, storage, or communication resources, an ensemble that spreads the increase over two or more methods may improve overall accuracy more than a single method given the same resources. It should be stressed that many problems, including the case study examined in this paper, have no real-time requirements, so the additional computation is acceptable.
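As an illustration of the first ensemble type, the sketch below implements bagging from scratch: each base model is a one-attribute decision stump trained on a bootstrap sample, and the ensemble predicts by majority vote. This is a generic textbook construction under simplifying assumptions (binary attributes, toy data); it is not the ensemble procedure proposed in this paper, and all names are introduced here for illustration.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Pick the single binary attribute (and polarity) with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for polarity in (0, 1):
            # The stump predicts True when attribute j equals `polarity`.
            errors = sum((row[j] == polarity) != label for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, j, polarity)
    _, j, polarity = best
    return lambda row: row[j] == polarity

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict(models, row):
    """Majority vote over the base models."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the label depends only on the first attribute.
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y = [True, True, False, False] * 5
models = fit_bagging(X, y)
```

The extra cost mentioned above is visible here: prediction evaluates all 25 stumps rather than one model.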
3.2.2. Ensemble Prediction Method of Selected Factor Effect on Inpatient Treatment Quality
Algorithm 1: Determining the importance of predictors for successful inpatient treatment.
- Identifying and handling the missing values.
- Encoding the categorical data.
- Splitting the dataset.
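The three preprocessing steps above can be sketched in plain Python as follows. The imputation rule (most frequent value), the 0/1 mapping, and the split fraction are assumptions chosen for illustration; the source does not state which choices were used.

```python
import random
from collections import Counter

def impute_missing(rows, columns):
    """Replace None with the most frequent observed value of each column."""
    filled = [dict(r) for r in rows]
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = mode
    return filled

def encode_categorical(rows, mapping):
    """Map categorical values to codes, e.g. {"Gender": {"female": 1, "male": 0}}."""
    return [{k: mapping.get(k, {}).get(v, v) for k, v in r.items()} for r in rows]

def split_dataset(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy records with missing values.
raw_rows = [
    {"Gender": "female", "Age": 1},
    {"Gender": None, "Age": 0},
    {"Gender": "female", "Age": None},
    {"Gender": "male", "Age": 1},
]
filled = impute_missing(raw_rows, ["Gender", "Age"])
encoded = encode_categorical(filled, {"Gender": {"female": 1, "male": 0}})
train, test = split_dataset(encoded)
```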
4. Results and Discussion
4.1. Input Data for Considered Case Study
4.2. Using Logistic Regression Analysis and Classification Algorithms
4.2.1. Using Logistic Regression
- UrbanHousing (1) = urban place of housing of patients;
- Education (1) = high level of education of patients;
- HospitalType (1) = treatment at the Clinical Center—Nis;
- Gender (1) = female gender;
- Age (1) = patient age above 50 years (old patients);
- DaysOfTreatment (1) = length of hospital stay above 15 days;
- Outcome (1/0) = positive/negative outcome of treatment.
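As a worked illustration, the fitted model can be evaluated by hand. The sketch below plugs the B coefficients reported in the "Variables in the Equation" table into the standard logistic function p = 1 / (1 + exp(-logit)). The function name `predict_positive_outcome` and the example patients are illustrative assumptions, not part of the original analysis.

```python
from math import exp

# B coefficients as reported in the "Variables in the Equation" table.
INTERCEPT = 1.796
COEFFICIENTS = {
    "HospitalType": 2.030,
    "UrbanHousing": -1.214,
    "Education": -0.093,
    "Gender": 0.102,
    "Age": 0.289,
    "DaysOfTreatment": 0.823,
}

def predict_positive_outcome(patient):
    """Estimated probability of a positive outcome for 0/1-coded predictors."""
    logit = INTERCEPT + sum(b * patient[name] for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + exp(-logit))

# Illustrative patients: all predictors at zero, then one predictor switched on.
baseline = {name: 0 for name in COEFFICIENTS}
p_baseline = predict_positive_outcome(baseline)
p_clinical_center = predict_positive_outcome(dict(baseline, HospitalType=1))
p_urban = predict_positive_outcome(dict(baseline, UrbanHousing=1))
```

Under this model, a positive coefficient (HospitalType) raises the estimated probability relative to the baseline and a negative one (UrbanHousing) lowers it.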
4.2.2. Using Classification Algorithms
4.2.3. Check Fulfillment of Set Conditions
4.3. Using Feature Selection
4.4. Decision Block
4.5. Discussion
5. Technical Solution—Code Implementation and Real-Life Software Platform Usage
- Export the model: Export the trained ML model into a file format that can be used by other software applications. This could be a serialized object or an ML library-specific format.
- Set up a server: Create a server to host the model and handle incoming requests from users. This server could be a cloud-based service like Amazon Web Services (AWS) or Microsoft Azure, or it could be set up on a local machine using software like Flask or Django.
- Create an API: Create an application programming interface (API) that will handle requests from clients and return responses with predictions from the model. This API can be created using a web framework like Flask or Django, and it will typically use HTTP requests to send and receive data.
- Create a client application: Create a client application that can be used to interface with the API. This client application can be a web application or a mobile application, and it will typically use HTTP requests to send data to the API and receive predictions from the model.
- Test and deploy: Test the deployed model using sample data to ensure that it is working as expected. Once testing is complete, deploy the model in a production environment where it can be accessed by users.
- Monitor and update: Monitor the deployed model to ensure that it is performing as expected and update it as needed with new data or changes to the model itself.
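The export and API steps above can be sketched without any web framework. In the minimal example below, the model is exported as a JSON string and a plain function plays the role of the request handler that a Flask or Django route would wrap. The JSON layout, the function names, and the two-coefficient model are assumptions made for illustration and do not describe the deployed application.

```python
import json
from math import exp

def export_model(coefficients, intercept):
    """Serialize model parameters to a JSON string (one possible export format)."""
    return json.dumps({"intercept": intercept, "coefficients": coefficients})

def load_model(serialized):
    """Restore the parameters on the server side."""
    return json.loads(serialized)

def handle_predict_request(model, request_body):
    """Take a JSON request body and return (HTTP status code, JSON response body)."""
    try:
        features = json.loads(request_body)
        logit = model["intercept"] + sum(
            weight * float(features[name])
            for name, weight in model["coefficients"].items()
        )
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing feature, or a non-numeric value.
        return 400, json.dumps({"error": "expected a JSON object with every model feature"})
    probability = 1.0 / (1.0 + exp(-logit))
    return 200, json.dumps({"probability": probability, "outcome": probability >= 0.5})

# Hypothetical model with made-up parameters.
model = load_model(export_model({"HospitalType": 2.0, "UrbanHousing": -1.2}, 1.8))
status, body = handle_predict_request(model, '{"HospitalType": 1, "UrbanHousing": 0}')
```

Keeping the handler a pure function of the request body makes it straightforward to test with sample data before it is placed behind a real server.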
- Electronic Health Record (EHR): This is the source of data for the ML model. It could be a database or other storage mechanism that contains information about patients and their treatments.
- ML Model: This is the core of the solution, which determines the importance of non-medical factors affecting successful inpatient treatment. The model could be developed using various ML algorithms and techniques, depending on the specifics of the problem.
- Flask API Server: This component serves as the interface between the ML model and the client application. It provides a RESTful API that receives input data, performs model prediction, and returns output data.
- Web-based Client Application: This component provides a user interface for interacting with the ML model. It could be a web application that allows users to input data, view model predictions, and take actions based on the predictions.
- Export the model: We exported the trained ML model into a file format that other software applications can use.
- Set up a server: We developed a server app to host the model and handle incoming requests from users.
- Create an API: We implemented an application programming interface (API) that handles requests from clients and returns responses with predictions from the model. The API uses HTTP requests to send and receive data.
- Create a client application: We created a client application that can be used to interface with the API. This client application is a web application, and it uses HTTP requests to send data to the API and receive predictions from the model.
- Test and deploy: The deployed model has been tested using sample data to ensure that it is working as expected. After we finished testing, the model was deployed in a production environment where real-life users could access it.
- Monitor and update: The deployed model is monitored to ensure that it is performing as expected and updated as needed with new data or changes to the model itself, which has also been supported in this technical solution.
6. Conclusions
- From a scientific point of view, the authors propose an efficient generic optimization procedure with very good values of classification quality measures. It can be used both for classic prediction problems and for discriminative classification, which in the general case amounts to determining the importance of individual factors in a multivariate problem.
- The proposed algorithm belongs to the family of generic algorithms, which allows its application to a wide range of different problems; in general, generic modeling could represent a further development of the concept of a model library.
- From a professional point of view, the authors have developed a modern multi-agent application for the specific problem of assessing the influence of certain factors on the success of hospital treatment and have made it publicly available for use and further development. The application can also be used as is for other, similar problems in healthcare and in other fields of human activity.
- Thereby, using the proposed procedure, the authors have also positively answered both sets of hypotheses, basic and fundamental.
- There is a difference between factors in their impact on the outcome depending on a particular process. The conducted analysis has shown that from the analyzed factors, the most important individual factors in successful treatment are hospital type and the number of days of treatment.
- It is possible to aggregate other types of algorithms to construct an ensemble procedure that has better characteristics than each of the included algorithms individually and also better characteristics than the existing ensemble methods.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World Health Assembly Resolution WHA51.7. 1998 Health for all Policy for the Twenty-First Century Geneva: World Health Organization. Available online: http://legacy.library.ucsf.edu/documentStore/g/w/o/gwo93a99/Sgwo93a99.pdf (accessed on 12 August 2023).
- Health21: The Health for all Policy Framework for the WHO European Region 1999 (European Health for All series; no. 6.) Copenhagen: World Health Organization Regional Office for Europe. Available online: http://www.euro.who.int/_data/assets/pdf_file/0010/98398/wa540ga199heeng.pdf (accessed on 12 August 2023).
- Plan Zdravstvene Zastite iz Obaveznog Zdravstvenog Osiguranja u Republici Srbiji za 2012. Available online: https://www.rfzo.rs/download/plan%20zz/planZZ-2012.pdf (accessed on 12 August 2023).
- Zakon o Zdravstvenoj Zastiti Republike Srbije. Available online: http://www.zdravlje.gov.rs/tmpmzadmin/downloads/zakoni1/zakon_zdravstvena_zastita.pdf (accessed on 12 August 2023).
- Uredba o Nacionalnom Programu Prevencije, Lecenja i Kontrole Kardiovaskularnih Bolesti u Republici Srbiji do 2020. Available online: https://www.pravno-informacionisistem.rs/SlGlasnikPortal/eli/rep/sgrs/vlada/uredba/2010/11/5 (accessed on 12 August 2023).
- Meijden, V.D.; Tange, M.J.; Troost, H.J.; Hasman, J.A. Determinants of success of inpatient clinical information systems: A literature review. J. Am. Med. Inform. Assoc. 2003, 10, 235–243. [Google Scholar] [CrossRef]
- Non-Medical Determinants of Health. Available online: https://meteor.aihw.gov.au/content/392618 (accessed on 20 September 2023).
- Social Determinants of Health (SDOH) and PLACES Data. Available online: https://www.cdc.gov/about/sdoh/index.html (accessed on 12 August 2023).
- Valaitis, R.; Meagher-Stewart, D.; Martin-Misener, R.; Wong, S.T.; MacDonald, M.; O’Mara, L.; The Strengthening Primary Health Care through Primary Care and Public Health Collaboration Team. Organizational factors influencing successful primary care and public health collaboration. BMC Health Serv Res. 2018, 18, 420. [Google Scholar] [CrossRef] [PubMed]
- Mosadeghrad, A.M. Factors influencing healthcare service quality. Int J Health Policy Manag. 2014, 3, 77–89. [Google Scholar] [CrossRef]
- Truglio-Londrigan, M.; Slyer, J.T.; Singleton, J.K.; Worral, P. A qualitative systematic review of internal and external influences on shared decision making in all health care settings. JBI Database Syst. Rev. Implement. Rep. 2014, 12, 121–194. [Google Scholar] [CrossRef]
- Marmot, M.G.; Ruth, B. Action on health disparities in the United States: Commission on Social Determinants of Health. J. Am. Med. Assoc. 2009, 301, 1169–1171. [Google Scholar] [CrossRef]
- The Impact of Political, Economic, Socio-Cultural, Environmental and Other External Influences. Available online: https://www.healthknowledge.org.uk/public-health-textbook/organisation-management/5b-understanding-ofs/assessing-impact-external-influences (accessed on 20 August 2023).
- Lewis Hunter, A.E.; Spatz, E.S.; Rosenthal, M.S. Factors influencing hospital admission of non-critically ill patients presenting to the emergency department: A cross-sectional study. J. Gen. Intern. Med. 2016, 31, 37–44. [Google Scholar] [CrossRef] [PubMed]
- Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39. [Google Scholar] [CrossRef]
- Advantages and Disadvantages of Logistic Regression. Available online: https://www.geeksforgeeks.org/advantages-and-disadvantages-of-logistic-regression/ (accessed on 20 August 2023).
- Opitz, D.; Maclin, R. Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 1999, 11, 169–198. [Google Scholar] [CrossRef]
- Nguyen, D.K.; Lan, C.H.; Chan, C.L. Deep ensemble learning approaches in healthcare to enhance the prediction and diagnosing performance: The workflows, deployments, and surveys on the statistical, image-based, and sequential datasets. Int. J. Environ. Res. Public Health 2021, 18, 10811. [Google Scholar] [CrossRef]
- Alekhya, B.; Sasikumar, R. An ensemble approach for healthcare application and diagnosis using natural language processing. Cogn. Neurodyn. 2022, 16, 1203–1220. [Google Scholar] [CrossRef]
- Breiman, L. Stacked regression. Mach. Learn. 1996, 24, 49–64. [Google Scholar] [CrossRef]
- Smyth, P.; Wolpert, D.H. Linearly combining density estimators via stacking. Mach. Learn. J. 1999, 36, 59–83. [Google Scholar] [CrossRef]
- Faltin, F.W.; Kenett, R.S.; Ruggeri, F. Statistical Methods in Healthcare; Wiley: Hoboken, NJ, USA, 2012; ISBN 978-0-470-67015-6. [Google Scholar]
- El-Sappagh, S.H.; El-Masri, S.; Riad, A.M.; Elmogy, M. Data mining and knowledge discovery: Applications, techniques, challenges and process models in healthcare. Int. J. Eng. Res. Appl. 2013, 3, 900–906. [Google Scholar]
- Bahel, V.; Pillai, S.; Malhotra, M. A Comparative Study on Various Binary Classification Algorithms and their Improved Variant for Optimal Performance. In Proceedings of the 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, 5–7 June 2020; pp. 495–498. [Google Scholar] [CrossRef]
- Bzovsky, S.; Phillips, M.R.; Guymer, R.H.; Wykoff, C.C.; Thabane, L.; Bhandari, M. The clinician’s guide to interpreting a regression analysis. Eye 2022, 36, 1715–1717. [Google Scholar] [CrossRef] [PubMed]
- Wilhelmsen, L.; Wedel, H.; Tibblin, G. Multivariate analysis of risk factors for coronary heart disease. Circulation 1973, 48, 950–958. [Google Scholar] [CrossRef] [PubMed]
- Silver, M.; Sakata, T.; Su, H.C.; Herman, C.; Dolins, S.B.; OShea, M.J. Case study: How to apply data mining techniques in a healthcare data warehouse. J. Healthc. Inf. Manag. 2001, 15, 155–164. [Google Scholar] [PubMed]
- Koh, H.C.; Tan, G. Data mining applications in healthcare. J. Healthc. Inf. Manag. 2005, 19, 64–72. [Google Scholar]
- Milley, A. Healthcare and data mining. Health Manag. Technol. 2000, 21, 44–47. [Google Scholar]
- Saini, A.; Meitei, A.J.; Singh, J. Machine learning in healthcare: A review. In Proceedings of the International Conference on Innovative Computing & Communication (ICICC), University of Delhi, Delhi, India, 20–21 February 2021; Available online: https://ssrn.com/abstract=3834096 (accessed on 20 August 2023).
- Toh, C.; Brody, J. Applications of machine learning in healthcare. In Smart Manufacturing: When Artificial Intelligence Meets the Internet of Things; Intechopen: London, UK, 2021. [Google Scholar] [CrossRef]
- Yan, L. The Effect of Risk Factors on Coronary Heart Disease: An Age-Relevant Multivariate Meta Analysis. Ph.D. Thesis, Florida State University, Tallahassee, FL, USA, August 2010. Available online: http://diginole.lib.fsu.edu/etd/1428 (accessed on 12 August 2023).
- Shouman, M.; Turner, T.; Stocker, R. Using data mining techniques in heart disease diagnosis and treatment. In Proceedings of the Conference on Electronics, Communications and Computers, Alexandria, Egypt, 6–9 March 2012; pp. 173–177. [Google Scholar] [CrossRef]
- Tang, J.W.; Caniza, M.A.; Dinn, M.; Dwyer, D.E.; Heraud, J.M.; Jennings, L.C.; Zaidi, S.K. An exploration of the political, social, economic and cultural factors affecting how different global regions initially reacted to the COVID-19 pandemic. Interface Focus 2022, 12, 20210079. [Google Scholar] [CrossRef]
- Rezaei, P.; Hachesu, P.R.; Ahmadi, M.; Alizadeh, S.; Sadoughi, F. Use of data mining techniques to determine and predict length of stay of cardiac patients. Healthc. Inform. Res. 2013, 19, 121–129. [Google Scholar] [CrossRef]
- Chen, H.; Poon, J.; Poon, S.K.; Cui, L.; Fan, K.; Sze, D.M.Y. Ensemble learning for prediction of the bioactivity capacity of herbal medicines from chromatographic fingerprints. BMC Bioinform. 2015, 16 (Suppl. 12), S4. [Google Scholar] [CrossRef] [PubMed]
- Rahmani, A.M.; Yousefpoor, E.; Yousefpoor, M.S.; Mehmood, Z.; Haider, A.; Hosseinzadeh, M.; Ali Naqvi, R. Machine learning in medicine: Review, applications, and challenges. Mathematics 2021, 9, 2970. [Google Scholar] [CrossRef]
- Panagiotis, P.; Livieris, I.E. Special issue on ensemble learning and applications. Algorithms 2020, 13, 140. [Google Scholar] [CrossRef]
- Ren, Y.; Zhang, L.; Suganthan, P.N. Ensemble classification and regression-recent developments, applications and future directions. IEEE Comput. Intell. Mag. 2016, 11, 41–53. [Google Scholar] [CrossRef]
- Jazieh, A.R. Quality measures: Types, selection, and application in health care quality improvement projects. Glob. J. Qual. Saf. Healthc. 2020, 3, 144–146. [Google Scholar] [CrossRef] [PubMed]
- Donabedian, A. Evaluating the quality of medical care. Milbank Q. 2005, 83, 691–729. [Google Scholar] [CrossRef] [PubMed]
- Khan, H.; Srivastav, A.; Mishra, A.K. Use of classification algorithms in health care. In Big Data Analytics and Intelligence: A Perspective for Health Care; Tanwar, P., Jain, V., Liu, C.M., Goyal, V., Eds.; Emerald Publishing Limited: Bingley, UK, 2020; pp. 31–54. [Google Scholar] [CrossRef]
- Zikos, D.; Zikos, D.; Tsiakas, K.; Qudah, F.; Athitsos, V.; Makedon, F. Evaluation of classification methods for the prediction of hospital length of stay using medicare claims data. In Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA), Rhodes, Greece, 29–31 May 2013. [Google Scholar] [CrossRef]
- Mantas, J.; Zikos, D.; Diomidous, M. Exploring the potential of an electronic documentation system to reduce length of stay. In Proceedings of the 14th World Congress on Medical and Health Informatics, MEDINFO 2013, Copenhagen, Denmark, 20–23 August 2013. [Google Scholar] [CrossRef]
- Fontalvo-Herrera, T.; Delahoz-Dominguez, E.; Fontalvo, O. Methodology of classification, forecast and prediction of healthcare providers accredited in high quality in Colombia. Int. J. Product. Qual. Manag. 2021, 33, 1–20. Available online: https://repositorio.utb.edu.co/bitstream/handle/20.500.12585/10351/2021_IJPQM-27920_PPV%20%282%29_oz%20De%20la%20Hoz%20Domingu.pdf?sequence=1&isAllowed=y (accessed on 20 September 2023). [CrossRef]
- Mahesh, V.; Mudlappa, M. An ensemble classification based approach for breast cancer prediction. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1065, 012049. [Google Scholar] [CrossRef]
- Brandt, P.; Moodley, D.; Pillay, A.W.; Seebregts, C.J.; de Oliveira, T. An investigation of classification algorithms for predicting HIV drug resistance without genotype resistance testing. In Foundations of Health Information Engineering and Systems; Lecture Notes in Computer Science; Springer: Berlin, Germany, 2014; Volume 8315, pp. 236–253. [Google Scholar] [CrossRef]
- Rodrigues, D.S.; Nastri, A.C.S.; Magri, M.M.; Oliveira, M.S.D.; Sabino, E.C.; Figueiredo, P.H.; Ferreira, J.E. Predicting the outcome for COVID-19 patients by applying time series classification to electronic health records. BMC Med. Inform. Decis. Mak. 2022, 22, 187. [Google Scholar] [CrossRef]
- Sahoo, A.K.; Parida, P.; Muralibabu, K.; Dash, S. Efficient simultaneous segmentation and classification of brain tumors from MRI scans using deep learning. Biocybern. Biomed. Eng. 2023, 43, 616–633. [Google Scholar] [CrossRef]
- Ahmad, R.; Akhtar, N.; Choubey, N.S. Applications of Artificial Bee Colony Algorithms and its variants in Health care. Biochem. Ind. J. 2017, 11, 110. Available online: https://www.tsijournals.com/articles/applications-of-artificial-bee-colony-algorithms-and-its-variants-in-health-care.pdf (accessed on 20 September 2023).
- Zhang, Z.; Wang, H.; Zhang, W.; Cui, Z. Cooperative-competitive two-stage game mechanism assisted many-objective evolutionary algorithm. Inf. Sci. 2023, 647, 119559. [Google Scholar] [CrossRef]
- Rylan, H.; Caldeira, A.; Gnanavelbabu, A. Pareto based discrete Jaya algorithm for multi-objective flexible job shop scheduling problem. Expert Syst. Appl. 2021, 170, 114567. [Google Scholar] [CrossRef]
- Heydarpoor, F.; Karbassi, S.M.; Bidabadi, N.; Ebadi, M.J. Solving multi-objective functions for cancer treatment by using Metaheuristic Algorithms. Int. J. Comb. Optim. Probl. Inform. 2020, 11, 61–75. [Google Scholar]
- Beheshtifar, S.; Alimohammadi, A. Multi-objective evolutionary algorithm for modeling of site suitability for health-care facilities. Health Sci. J. 2013, 7, 209. [Google Scholar]
- AbdelAziz, A.M.; Alarabi, L.; Basalamah, S.; Hendawi, A. Multi-Objective Optimization Method for Hospital Admission Problem-A Case Study on Covid-19 Patients. Algorithms 2021, 14, 38. [Google Scholar] [CrossRef]
- Ansarifar, J.; Tavakkoli-Moghaddam, R.; Akhavizadegan, F.; Hassanzadeh Amin, S. Multi-objective integrated planning and scheduling model for operating rooms under uncertainty. Proc. IMechE Part H J. Eng. Med. 2018, 232, 930–948. [Google Scholar] [CrossRef]
- Ghaheri, A.; Shoar, S.; Naderan, M.; Hoseini, S.S. The Applications of Genetic Algorithms in Medicine. Oman Med. J. 2015, 30, 406–416. [Google Scholar] [CrossRef]
- Espinosa, R.; Jiménez, F.; Palma, J. Multi-surrogate assisted multi-objective evolutionary algorithms for feature selection in regression and classification problems with time series data. Inf. Sci. 2023, 622, 1064–1091. [Google Scholar] [CrossRef]
- Hosmer, D.W.; Hosmer, T.; Le Cessie, S.; Lemeshow, S. A comparison of goodness of fit tests for the logistic regression model. Stat. Med. 1997, 16, 965–980. [Google Scholar] [CrossRef]
- Hosmer, D.W.; Lemeshow, S. A goodness of fit test for the multiple logistic regression model. Commun. Stat. 1980, 9, 1043–1069. [Google Scholar] [CrossRef]
- How to Improve the Accuracy of a Regression Model. Available online: https://towardsdatascience.com/how-to-improve-the-accuracy-of-a-regression-model-3517accf8604 (accessed on 20 August 2023).
- Fawcett, T. ROC Graphs: Notes and Practical Considerations for Data Mining Researchers; Technical Report; HP Laboratories: Palo Alto, CA, USA, 2003; Available online: https://www.hpl.hp.com/techreports/2003/HPL-2003-4.pdf (accessed on 1 September 2023).
- Vuk, M.; Curk, T. ROC curve, lift chart and calibration plot. Metod. Zv. 2006, 3, 89–108. [Google Scholar] [CrossRef]
- Dimić, G.; Prokin, D.; Kuk, K.; Micalović, M. Primena decision trees i naive Bayes klasifikatora na skup podataka izdvojen iz Moodle kursa. In Proceedings of the Conference INFOTEH, Jahorina, Bosnia and Herzegovina, 21–23 March 2012; Volume 11, pp. 877–882. [Google Scholar]
- Witten, I.H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed.; Morgan Kaufmann: San Francisco, CA, USA, 2005. [Google Scholar]
- Benoît, G. Data Mining. Annu. Rev. Inf. Sci. Technol. 2002, 36, 265–310. [Google Scholar] [CrossRef]
- Romero, C.; Ventura, S.; Espejo, P.G.; Hervás, C. Data mining algorithms to classify students. In Proceedings of the 1st IC on Educational Data Mining (EDM08), Montreal, QC, Canada, 20–21 June 2008; pp. 20–21. [Google Scholar]
- Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann: San Francisco, CA, USA, 1988. [Google Scholar]
- Zhang, H. The optimality of naive Bayes. In Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, Miami Beach, FL, USA, 17–19 May 2004; pp. 562–567. [Google Scholar]
- Rokach, L.; Maimon, O. Decision trees. In The Data Mining and Knowledge Discovery Handbook; Springer: Berlin/Heidelberg, Germany, 2005; pp. 165–192. [Google Scholar] [CrossRef]
- Xiaohu, W.; Lele, W.; Nianfeng, L. An application of decision tree based on ID3. Phys. Procedia 2012, 25, 1017–1021. [Google Scholar] [CrossRef]
- Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1993. [Google Scholar]
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting. Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
- Bella, A. Calibration of machine learning models. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques; IGI Global: Hershey, PA, USA, 2009; pp. 128–146. [Google Scholar] [CrossRef]
- Park, H.A. An introduction to logistic regression: From basic concepts to interpretation with particular attention to nursing domain. J. Korean Acad. Nurs. 2013, 43, 154–164. [Google Scholar] [CrossRef] [PubMed]
- Rajendra, P.; Latifi, S. Prediction of diabetes using logistic regression and ensemble techniques. Comput. Methods Programs Biomed. Update 2021, 1, 100032. [Google Scholar] [CrossRef]
- IBM SPSS Statistics. Available online: https://www.ibm.com/products/spss-statistics (accessed on 15 August 2023).
- Zadrozny, B.; Elkan, C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on machine learning, ICML 2001, Williamstown, MA, USA, 28 June–1 July 2001; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2001; pp. 609–616. [Google Scholar]
- Weka (University of Waikato: New Zealand). Available online: http://www.cs.waikato.ac.nz/ml/weka (accessed on 20 August 2023).
- Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining; Kluwer Academic Publishers: New York, NY, USA, 1998. [Google Scholar]
- Hall, M.A.; Smith, L.A. Practical feature subset selection for machine learning. In Proceedings of the Computer Science ’98—21st Australasian Computer Science Conference ACSC’98, Perth, Australia, 4–6 February 1998; pp. 181–191. [Google Scholar]
- Moriwal, R.; Prakash, V. An efficient info-gain algorithm for finding frequent sequential traversal patterns from web logs based on dynamic weight constraint. In Proceedings of the International Information Technology Conference CUBE ’12, Pune, India, 3–5 September 2012; pp. 718–723, ISBN 978-1-4503-1185-4. [Google Scholar]
- Pravena, R.; Valamathi, M.; Sivakumari, S. Gain ratio based feature selection method for privacy preservation. ICTACT J. Soft Comput. 2011, 1, 201–205. [Google Scholar] [CrossRef]
- Turhan, N.S. Karl Pearson’s chi-square tests. Educ. Res. Rev. 2020, 15, 575–580. [Google Scholar] [CrossRef]
- Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69. [Google Scholar] [CrossRef]
- Xie, Y.; Li, D.; Zhang, D.; Shuang, H. An Improved Multi-Label Relief Feature Selection Algorithm for Unbalanced Datasets. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2018; pp. 141–151. [Google Scholar] [CrossRef]
- Harrell, F. Hosmer-Lemeshow vs. AIC for Logistic Regression. Available online: https://stats.stackexchange.com/q/18772 (accessed on 20 August 2023).
- Steyerberg, E.W.; Vickers, A.J.; Cook, N.R.; Gerds, T.; Gonen, M.; Obuchowski, N.; Pencina, M.J.; Kattan, M.W. Assessing the performance of prediction models A framework for traditional and novel measures. Epidemiology 2010, 21, 128–138. [Google Scholar] [CrossRef]
- Arshed, N.; Pancholi, J. Porter’s generic strategies. In Enterprise and Its Business Environment; Arshed, N., McFarlane, J., Eds.; Goodfellow Publishers Ltd.: Oxford, UK, 2016; ISBN 978-1-910158-78-4. [Google Scholar]
- Vahdati, H.; Nejad, S.H.M.; Shahsia, N. Generic competitive strategies toward achieving sustainable and dynamic competitive advantage. Rev. Espac. 2018, 39, 25. Available online: https://www.revistaespacios.com/a18v39n13/18391325.html (accessed on 20 September 2023).
- Chikhachev, S.A. Generic models. Algebra Log. 1975, 14, 214–218. [Google Scholar] [CrossRef]
- Shelah, S. A note on model complete models and generic models. Proc. Am. Math. Soc. 1972, 34, 509–514. [Google Scholar] [CrossRef]
- Mienye, D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
- Bennett, K.P.; Mangasarian, O.L. Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw. 1992, 1, 23–34. [Google Scholar] [CrossRef]
- Scikit Learn. Available online: https://scikit-learn.org/stable/modules/ensemble.html#stacking (accessed on 20 September 2023).
- Wolpert, D. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
- Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4, 40–79. [Google Scholar] [CrossRef]
- Jabbar, H.K.; Khan, R.Z. Methods to avoid over-fitting and under-fitting in supervised machine learning (Comparative study). Comput. Sci. Commun. Instrum. Devices 2015, 12, 978–981. [Google Scholar] [CrossRef]
- Aleksić, A.; Nedeljković, S.; Jovanović, M.; Ranđelović, M.; Vuković, M.; Stojanović, V.; Radovanović, R.; Ranđelović, M.; Ranđelović, D. Prediction of important factors for bleeding in liver cirrhosis disease using ensemble data mining approach. Mathematics 2020, 8, 1887. [Google Scholar] [CrossRef]
- Ranđelović, D.; Ranđelović, M.; Čabarkapa, M. Using machine learning in the prediction of the influence of atmospheric parameters on health. Mathematics 2022, 10, 3043. [Google Scholar] [CrossRef]
- Aleksić, A.; Ranđelović, M.; Ranđelović, D. Using machine learning in predicting the impact of meteorological parameters on traffic incidents. Mathematics 2023, 11, 479. [Google Scholar] [CrossRef]
- Ranđelović, M.; Aleksić, A.; Radovanović, R.; Stojanović, V.; Čabarkapa, M.; Ranđelović, D. One aggregated approach in multidisciplinary based modeling to predict further students’ education. Mathematics 2022, 10, 2381. [Google Scholar] [CrossRef]
Reference Citations | Organizational Factors | Socio-Economic Factors | Descriptive Statistics | Logistic Regression | ML and Data Mining | Ensemble Methods | Other Strategies |
---|---|---|---|---|---|---|---|
[9] | * | * | |||||
[10] | * | * | |||||
[11] | * | * | |||||
[12] | * | * | |||||
[13] | * | * | |||||
[14] | * | * | |||||
[24] | * | * | |||||
[25] | * | * | |||||
[26] | * | * | |||||
[27] | * | * | |||||
[28] | * | * | |||||
[29] | * | * | |||||
[30] | * | * | |||||
[31] | * | * | |||||
[32] | * | * | |||||
[33] | * | ||||||
[34] | * | ||||||
[35] | * | * | |||||
[36] | * | * | |||||
[37] | * | * | * | ||||
[38] | * | * | |||||
[39] | * | * | * | ||||
[40] | * | * | * | * | |||
[41] | * | * | |||||
[42] | * | * | |||||
[43] | * | * | |||||
[44] | * | * | |||||
[45] | * | * | |||||
[46] | * | * | |||||
[47] | * | * | |||||
[48] | * | * | |||||
[49] | * | * | |||||
[50] | * | * | |||||
[51] | * | * | |||||
[52] | * | * | |||||
[53] | * | * | |||||
[54] | * | * | |||||
[55] | * | * | |||||
[56] | * | * | |||||
[57] | * | * | |||||
[58] | * | * | * | * |
Actual Label | Predicted Label: Positive | Predicted Label: Negative |
---|---|---|
Positive | TP | FN |
Negative | FP | TN |
Variable’s Serial Number | Non-Medical Factor | Data Type |
---|---|---|
1 | Education | Boolean |
2 | HospitalType | Boolean |
3 | Gender | Boolean |
4 | Age | Boolean |
5 | DaysOfTreatment | Boolean |
6 | UrbanHousing | Boolean |
7 | Outcome | Boolean |
Variables in the Equation (Step 1 a) | B | S.E. | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B): Lower | 95% CI for Exp(B): Upper |
---|---|---|---|---|---|---|---|---|
HospitalType | 2.030 | 0.596 | 11.610 | 1 | 0.001 | 7.612 | 2.368 | 24.464 |
UrbanHousing | −1.214 | 0.055 | 492.897 | 1 | 0.000 | 0.297 | 0.267 | 0.331 |
Education | −0.093 | 0.112 | 0.690 | 1 | 0.406 | 0.911 | 0.731 | 1.135 |
Gender | 0.102 | 0.054 | 3.540 | 1 | 0.060 | 1.107 | 0.996 | 1.231 |
Age | 0.289 | 0.071 | 16.524 | 1 | 0.000 | 1.335 | 1.162 | 1.535 |
DaysOfTreatment | 0.823 | 0.107 | 58.707 | 1 | 0.000 | 2.277 | 1.845 | 2.811 |
Constant | 1.796 | 0.077 | 548.193 | 1 | 0.000 | 6.027 | | |
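
The columns of the table above are linked by the standard logistic regression identities: the Wald statistic is (B / S.E.)², Exp(B) is the odds ratio e^B, and the 95% confidence interval is approximately exp(B ± 1.96 · S.E.). The following minimal sketch recomputes these quantities from the rounded B and S.E. values; the function name `logit_summary` and the normal-approximation multiplier 1.96 are assumptions made for illustration, so the results agree with the tabulated values only up to rounding.

```python
import math


def logit_summary(b, se, z=1.96):
    """Return (Wald, odds ratio, CI lower, CI upper) for one coefficient.

    Assumes a Wald-type interval on the log-odds scale; z=1.96 is the
    usual normal quantile for a 95% interval.
    """
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return wald, odds_ratio, lower, upper


# Rounded B and S.E. values copied from the table above.
coefficients = {
    "HospitalType": (2.030, 0.596),
    "UrbanHousing": (-1.214, 0.055),
    "Education": (-0.093, 0.112),
    "Gender": (0.102, 0.054),
    "Age": (0.289, 0.071),
    "DaysOfTreatment": (0.823, 0.107),
}

for name, (b, se) in coefficients.items():
    wald, odds, lo, hi = logit_summary(b, se)
    print(f"{name:16s} Wald={wald:8.2f} Exp(B)={odds:6.3f} CI=({lo:.3f}, {hi:.3f})")
```

An interval that contains 1, as for Education and Gender in this table, corresponds to a coefficient that is not significant at the 0.05 level.
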
Hosmer and Lemeshow Test | Chi-square | df | Sig. |
---|---|---|---|
Step 1 | 11.001 | 6 | 0.088 |

Classification Table a | Predicted Outcome: 0 | Predicted Outcome: 1 | Percentage Correct |
---|---|---|---|
Observed Outcome: 0 | 0 | 1735 | 0.0 |
Observed Outcome: 1 | 0 | 10,098 | 100.0 |
Overall percentage | | | 85.3 |
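
As a worked check, the counts in the classification table above can be turned into the usual quality measures. The sketch below assumes that precision, recall, and F1 are support-weighted averages over the two classes and that an undefined per-class precision (no cases predicted for that class) is counted as 0; both conventions, and the helper name `weighted_metrics`, are assumptions for illustration rather than details stated in the text.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall, F1 and plain accuracy.

    `confusion[actual][predicted]` holds the case counts.
    """
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    precision = recall = f1 = 0.0
    for c in classes:
        tp = confusion[c][c]
        support = sum(confusion[c].values())               # observed cases of class c
        predicted = sum(confusion[a][c] for a in classes)  # cases predicted as class c
        p = tp / predicted if predicted else 0.0           # undefined precision counted as 0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(confusion[c][c] for c in classes) / total
    return precision, recall, f1, accuracy


# Rows are observed outcomes, columns are predicted outcomes (counts from the table above).
table = {0: {0: 0, 1: 1735}, 1: {0: 0, 1: 10098}}
precision, recall, f1, accuracy = weighted_metrics(table)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"F1={f1:.3f} accuracy={100 * accuracy:.4f}%")
```

Under these assumptions the counts give precision 0.728, recall 0.853, F1 0.786, and accuracy 85.3376%. Because every case is assigned to outcome 1, these four measures depend only on the class proportions, which leaves AUC as the measure that can separate models with this prediction pattern.
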
Classifier | Configuration | Precision | Recall | F1 Measure | Accuracy (%) | AUC |
---|---|---|---|---|---|---|
Naive Bayes | Default | 0.728 | 0.853 | 0.786 | 85.3376 | 0.669 |
Logit Boost | Default | 0.728 | 0.853 | 0.786 | 85.3376 | 0.670 |
J48 Decision Tree | Default | 0.728 | 0.853 | 0.786 | 85.3376 | 0.499 |
Serial Number | Attribute Name | GainRatio Ranking (GR) | ChiSquared Ranking (CHI) | ReliefF Ranking (REL) |
---|---|---|---|---|
1 | HospitalType | 2 | 4 | 1 |
2 | UrbanHousing | 1 | 1 | 5 |
3 | Education | 6 | 6 | 3 |
4 | Gender | 5 | 5 | 6 |
5 | Age | 4 | 3 | 4 |
6 | DaysOfTreatment | 3 | 2 | 2 |
Algorithm/Number of Factors | 6 | 5 | 4 | 3 | 2 | 1 |
---|---|---|---|---|---|---|
GR | 0.671 | 0.670 | 0.661 | 0.651 | 0.634 | 0.631 |
CHI | 0.671 | 0.671 | 0.661 | 0.658 | 0.649 | 0.631 |
REL | 0.671 | 0.663 | 0.560 | 0.530 | 0.538 | 0.501 |
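
One way to read the two tables above together is to look, for each filter, for the smallest number of factors that still reaches that filter's best AUC, and then to keep the top-ranked attributes of the most economical filter. The sketch below is an illustrative selection rule under that assumption, not necessarily the exact procedure used in this work; the helper names `smallest_subset` and `top_factors` are hypothetical.

```python
# AUC by number of retained factors (6 down to 1), copied from the table above.
factor_counts = [6, 5, 4, 3, 2, 1]
auc = {
    "GR":  [0.671, 0.670, 0.661, 0.651, 0.634, 0.631],
    "CHI": [0.671, 0.671, 0.661, 0.658, 0.649, 0.631],
    "REL": [0.671, 0.663, 0.560, 0.530, 0.538, 0.501],
}

# Attribute ranks per filter (1 = most important), copied from the ranking table.
ranking = {
    "GR":  {"HospitalType": 2, "UrbanHousing": 1, "Education": 6,
            "Gender": 5, "Age": 4, "DaysOfTreatment": 3},
    "CHI": {"HospitalType": 4, "UrbanHousing": 1, "Education": 6,
            "Gender": 5, "Age": 3, "DaysOfTreatment": 2},
    "REL": {"HospitalType": 1, "UrbanHousing": 5, "Education": 3,
            "Gender": 6, "Age": 4, "DaysOfTreatment": 2},
}


def smallest_subset(auc_row, counts, tolerance=0.0):
    """Smallest factor count whose AUC is within `tolerance` of the row maximum."""
    best = max(auc_row)
    return min(k for k, a in zip(counts, auc_row) if best - a <= tolerance)


def top_factors(rank, k):
    """The k attributes with the best (lowest) rank."""
    return sorted(rank, key=rank.get)[:k]


# Smallest subset size per filter that keeps the filter's best AUC.
choices = {name: smallest_subset(row, factor_counts) for name, row in auc.items()}
best_filter = min(choices, key=choices.get)
selected = top_factors(ranking[best_filter], choices[best_filter])
print(choices, best_filter, selected)
```

With the tabulated values, only the ChiSquared ranking keeps the maximum AUC of 0.671 with five factors, and the attribute it ranks last is Education.
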
Number of Factors | AUC Value |
---|---|
6 factors | 0.670 |
5 factors | 0.671 |
Hosmer and Lemeshow Test | Chi-square | df | Sig. |
---|---|---|---|
Step 1 | 9.606 | 4 | 0.058 |

Classification Table a | Predicted Outcome: 0 | Predicted Outcome: 1 | Percentage Correct |
---|---|---|---|
Observed Outcome: 0 | 0 | 1735 | 0.0 |
Observed Outcome: 1 | 0 | 10,098 | 100.0 |
Overall percentage | | | 85.3 |
Variables in the Equation (Step 1 a) | B | S.E. | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B): Lower | 95% CI for Exp(B): Upper |
---|---|---|---|---|---|---|---|---|
HospitalType | 2.039 | 0.596 | 11.727 | 1 | 0.001 | 7.687 | 2.392 | 24.698 |
UrbanHousing | −1.209 | 0.054 | 494.128 | 1 | 0.000 | 0.298 | 0.268 | 0.332 |
Gender | 0.108 | 0.054 | 4.012 | 1 | 0.045 | 1.114 | 1.002 | 1.237 |
Age | 0.294 | 0.071 | 17.189 | 1 | 0.000 | 1.342 | 1.168 | 1.542 |
DaysOfTreatment | 0.824 | 0.107 | 58.885 | 1 | 0.000 | 2.280 | 1.847 | 2.814 |
Constant | 1.782 | 0.075 | 570.063 | 1 | 0.000 | 5.940 | | |
Methodology | Precision | Recall | F1 Measure | Accuracy (%) | AUC |
---|---|---|---|---|---|
Proposed model | 0.728 | 0.853 | 0.786 | 85.3376 | 0.671 |
Ada Boost | 0.728 | 0.853 | 0.786 | 85.3376 | 0.670 |
Bagging | 0.728 | 0.853 | 0.786 | 85.3376 | 0.640 |
Random Forest | 0.728 | 0.853 | 0.787 | 85.303 | 0.670 |
Methodology | AUC |
---|---|
Proposed model | 0.671 |
Ada Boost | 0.670 |
Bagging | 0.640 |
Random Forest | 0.670 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mišić, J.; Kemiveš, A.; Ranđelović, M.; Ranđelović, D. An Asymmetric Ensemble Method for Determining the Importance of Individual Factors of a Univariate Problem. Symmetry 2023, 15, 2050. https://doi.org/10.3390/sym15112050