Next Issue
Volume 3, June
Previous Issue
Volume 2, December
 
 

Analytics, Volume 3, Issue 1 (March 2024) – 9 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both, html and pdf forms. To view the papers in pdf format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
Order results
Result details
Select all
Export citation of selected articles as:
13 pages, 281 KiB  
Article
Optimal Matching with Matching Priority
by Massimo Cannas and Emiliano Sironi
Analytics 2024, 3(1), 165-177; https://doi.org/10.3390/analytics3010009 - 19 Mar 2024
Viewed by 381
Abstract
Matching algorithms are commonly used to build comparable subsets (matchings) in observational studies. When a complete matching is not possible, some units must necessarily be excluded from the final matching. This may bias the final estimates comparing the two populations, and thus it [...] Read more.
Matching algorithms are commonly used to build comparable subsets (matchings) in observational studies. When a complete matching is not possible, some units must necessarily be excluded from the final matching. This may bias the final estimates comparing the two populations, and thus it is important to reduce the number of drops to avoid unsatisfactory results. Greedy matching algorithms may not reach the maximum matching size, thus dropping more units than necessary. Optimal matching algorithms do ensure a maximum matching size, but they implicitly assume that all units have the same matching priority. In this paper, we propose a matching strategy which is order optimal in the sense that it finds a maximum matching size which is consistent with a given matching priority. The strategy is based on an order-optimal matching algorithm originally proposed in connection with assignment problems by D. Gale. When a matching priority is given, the algorithm ensures that the discarded units have the lowest possible matching priority. We discuss the algorithm’s complexity and its relation with classic optimal matching. We illustrate its use with a problem in a case study concerning a comparison of female and male executives and a simulation. Full article
Show Figures

Figure 1

25 pages, 1197 KiB  
Review
Artificial Intelligence and Sustainability—A Review
by Rachit Dhiman, Sofia Miteff, Yuancheng Wang, Shih-Chi Ma, Ramila Amirikas and Benjamin Fabian
Analytics 2024, 3(1), 140-164; https://doi.org/10.3390/analytics3010008 - 01 Mar 2024
Viewed by 1322
Abstract
In recent decades, artificial intelligence has undergone transformative advancements, reshaping diverse sectors such as healthcare, transport, agriculture, energy, and the media. Despite the enthusiasm surrounding AI’s potential, concerns persist about its potential negative impacts, including substantial energy consumption and ethical challenges. This paper [...] Read more.
In recent decades, artificial intelligence has undergone transformative advancements, reshaping diverse sectors such as healthcare, transport, agriculture, energy, and the media. Despite the enthusiasm surrounding AI’s potential, concerns persist about its potential negative impacts, including substantial energy consumption and ethical challenges. This paper critically reviews the evolving landscape of AI sustainability, addressing economic, social, and environmental dimensions. The literature is systematically categorized into “Sustainability of AI” and “AI for Sustainability”, revealing a balanced perspective between the two. The study also identifies a notable trend towards holistic approaches, with a surge in publications and empirical studies since 2019, signaling the field’s maturity. Future research directions emphasize delving into the relatively under-explored economic dimension, aligning with the United Nations’ Sustainable Development Goals (SDGs), and addressing stakeholders’ influence. Full article
(This article belongs to the Special Issue Business Analytics and Applications)
Show Figures

Figure 1

24 pages, 8409 KiB  
Article
Visual Analytics for Robust Investigations of Placental Aquaporin Gene Expression in Response to Maternal SARS-CoV-2 Infection
by Raphael D. Isokpehi, Amos O. Abioye, Rickeisha S. Hamilton, Jasmin C. Fryer, Antoinesha L. Hollman, Antoinette M. Destefano, Kehinde B. Ezekiel, Tyrese L. Taylor, Shawna F. Brooks, Matilda O. Johnson, Olubukola Smile, Shirma Ramroop-Butts, Angela U. Makolo and Albert G. Hayward II
Analytics 2024, 3(1), 116-139; https://doi.org/10.3390/analytics3010007 - 05 Feb 2024
Viewed by 778
Abstract
The human placenta is a multifunctional, disc-shaped temporary fetal organ that develops in the uterus during pregnancy, connecting the mother and the fetus. The availability of large-scale datasets on the gene expression of placental cell types and scholarly articles documenting adverse pregnancy outcomes [...] Read more.
The human placenta is a multifunctional, disc-shaped temporary fetal organ that develops in the uterus during pregnancy, connecting the mother and the fetus. The availability of large-scale datasets on the gene expression of placental cell types and scholarly articles documenting adverse pregnancy outcomes from maternal infection warrants the use of computational resources to aid in knowledge generation from disparate data sources. Using maternal Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection as a case study in microbial infection, we constructed integrated datasets and implemented visual analytics resources to facilitate robust investigations of placental gene expression data in the dimensions of flow, curation, and analytics. The visual analytics resources and associated datasets can support a greater understanding of SARS-CoV-2-induced changes to the human placental expression levels of 18,882 protein-coding genes and at least 1233 human gene groups/families. We focus this report on the human aquaporin gene family that encodes small integral membrane proteins initially studied for their roles in water transport across cell membranes. Aquaporin-9 (AQP9) was the only aquaporin downregulated in term placental villi from SARS-CoV-2-positive mothers. Previous studies have found that (1) oxygen signaling modulates placental development; (2) oxygen tension could modulate AQP9 expression in the human placenta; and (3) SARS-CoV-2 can disrupt the formation of oxygen-carrying red blood cells in the placenta. Thus, future research could be performed on microbial infection-induced changes to (1) the placental hematopoietic stem and progenitor cells; and (2) placental expression of human aquaporin genes, especially AQP9. Full article
(This article belongs to the Special Issue Visual Analytics: Techniques and Applications)
Show Figures

Figure 1

32 pages, 7519 KiB  
Article
Interoperable Information Flow as Enabler for Efficient Predictive Maintenance
by Marco Franke, Quan Deng, Zisis Kyroudis, Maria Psarodimou, Jovana Milenkovic, Ioannis Meintanis, Dimitris Lokas, Stefano Borgia and Klaus-Dieter Thoben
Analytics 2024, 3(1), 84-115; https://doi.org/10.3390/analytics3010006 - 01 Feb 2024
Viewed by 790
Abstract
Industry 4.0 enables the modernisation of machines and opens up the digitalisation of processes in the manufacturing industry. As a result, these machines are ready for predictive maintenance as part of Industry 4.0 services. The benefit of predictive maintenance is that it can [...] Read more.
Industry 4.0 enables the modernisation of machines and opens up the digitalisation of processes in the manufacturing industry. As a result, these machines are ready for predictive maintenance as part of Industry 4.0 services. The benefit of predictive maintenance is that it can significantly extend the life of machines. The integration of predictive maintenance into existing production environments faces challenges in terms of data understanding and data preparation for machines and legacy systems. Current AI frameworks lack adequate support for the ongoing task of data integration. In this context, adequate support means that the data analyst does not need to know the technical background of the pilot’s data sources in terms of data formats and schemas. It should be possible to perform data analyses without knowing the characteristics of the pilot’s specific data sources. The aim is to achieve a seamless integration of data as information for predictive maintenance. For this purpose, the developed data-sharing infrastructure enables automatic data acquisition and data integration for AI frameworks using interoperability methods. The evaluation, based on two pilot projects, shows that the step of data understanding and data preparation for predictive maintenance is simplified and that the solution is applicable for new pilot projects. Full article
Show Figures

Figure 1

21 pages, 2471 KiB  
Article
Analysing the Influence of Macroeconomic Factors on Credit Risk in the UK Banking Sector
by Hemlata Sharma, Aparna Andhalkar, Oluwaseun Ajao and Bayode Ogunleye
Analytics 2024, 3(1), 63-83; https://doi.org/10.3390/analytics3010005 - 26 Jan 2024
Viewed by 985
Abstract
Macroeconomic factors have a critical impact on banking credit risk, which cannot be directly controlled by banks, and therefore, there is a need for an early credit risk warning system based on the macroeconomy. By comparing different predictive models (traditional statistical and machine [...] Read more.
Macroeconomic factors have a critical impact on banking credit risk, which cannot be directly controlled by banks, and therefore, there is a need for an early credit risk warning system based on the macroeconomy. By comparing different predictive models (traditional statistical and machine learning algorithms), this study aims to examine the macroeconomic determinants’ impact on the UK banking credit risk and assess the most accurate credit risk estimate using predictive analytics. This study found that the variance-based multi-split decision tree algorithm is the most precise predictive model with interpretable, reliable, and robust results. Our model performance achieved 95% accuracy and evidenced that unemployment and inflation rate are significant credit risk predictors in the UK banking context. Our findings provided valuable insights such as a positive association between credit risk and inflation, the unemployment rate, and national savings, as well as a negative relationship between credit risk and national debt, total trade deficit, and national income. In addition, we empirically showed the relationship between national savings and non-performing loans, thus proving the “paradox of thrift”. These findings benefit the credit risk management team in monitoring the macroeconomic factors’ thresholds and implementing critical reforms to mitigate credit risk. Full article
Show Figures

Figure 1

17 pages, 1756 KiB  
Article
Code Plagiarism Checking Function and Its Application for Code Writing Problem in Java Programming Learning Assistant System
by Ei Ei Htet, Khaing Hsu Wai, Soe Thandar Aung, Nobuo Funabiki, Xiqin Lu, Htoo Htoo Sandi Kyaw and Wen-Chung Kao
Analytics 2024, 3(1), 46-62; https://doi.org/10.3390/analytics3010004 - 17 Jan 2024
Viewed by 782
Abstract
A web-based Java programming learning assistant system (JPLAS) has been developed for novice students to study Java programming by themselves while enhancing code reading and code writing skills. One type of the implemented exercise problem is code writing problem (CWP), which asks [...] Read more.
A web-based Java programming learning assistant system (JPLAS) has been developed for novice students to study Java programming by themselves while enhancing code reading and code writing skills. One type of the implemented exercise problem is code writing problem (CWP), which asks students to create a source code that can pass the given test code. The correctness of this answer code is validated by running them on JUnit. In previous works, a Python-based answer code validation program was implemented to assist teachers. It automatically verifies the source codes from all the students for one test code, and reports the number of passed test cases by each code in the CSV file. While this program plays a crucial role in checking the correctness of code behaviors, it cannot detect code plagiarism that can often happen in programming courses. In this paper, we implement a code plagiarism checking function in the answer code validation program, and present its application results to a Java programming course at Okayama University, Japan. This function first removes the whitespace characters and the comments using the regular expressions. Next, it calculates the Levenshtein distance and similarity score for each pair of source codes from different students in the class. If the score is larger than a given threshold, they are regarded as plagiarism. Finally, it outputs the scores as a CSV file with the student IDs. For evaluations, we applied the proposed function to a total of 877 source codes for 45 CWP assignments submitted from 9 to 39 students and analyzed the results. It was found that (1) CWP assignments asking for shorter source codes generate higher scores than those for longer codes due to the use of test codes, (2) proper thresholds are different by assignments, and (3) some students often copied source codes from certain students. Full article
(This article belongs to the Special Issue New Insights in Learning Analytics)
Show Figures

Figure 1

16 pages, 1349 KiB  
Article
An Optimal House Price Prediction Algorithm: XGBoost
by Hemlata Sharma, Hitesh Harsora and Bayode Ogunleye
Analytics 2024, 3(1), 30-45; https://doi.org/10.3390/analytics3010003 - 02 Jan 2024
Viewed by 2509
Abstract
An accurate prediction of house prices is a fundamental requirement for various sectors, including real estate and mortgage lending. It is widely recognized that a property’s value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighborhood. [...] Read more.
An accurate prediction of house prices is a fundamental requirement for various sectors, including real estate and mortgage lending. It is widely recognized that a property’s value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighborhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning (ML) techniques capable of expressing the significance of independent variables. We made use of the housing dataset of Ames City in Iowa, USA to compare XGBoost, support vector regressor, random forest regressor, multilayer perceptron, and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best performing model for house price prediction. Our findings present valuable insights and tools for stakeholders, facilitating more accurate property price estimates and, in turn, enabling more informed decision making to meet the housing needs of diverse populations while considering budget constraints. Full article
Show Figures

Figure 1

16 pages, 3053 KiB  
Article
Exploring Infant Physical Activity Using a Population-Based Network Analysis Approach
by Rama Krishna Thelagathoti, Priyanka Chaudhary, Brian Knarr, Michaela Schenkelberg, Hesham H. Ali and Danae Dinkel
Analytics 2024, 3(1), 14-29; https://doi.org/10.3390/analytics3010002 - 31 Dec 2023
Viewed by 789
Abstract
Background: Physical activity (PA) is an important aspect of infant development and has been shown to have long-term effects on health and well-being. Accurate analysis of infant PA is crucial for understanding their physical development, monitoring health and wellness, as well as identifying [...] Read more.
Background: Physical activity (PA) is an important aspect of infant development and has been shown to have long-term effects on health and well-being. Accurate analysis of infant PA is crucial for understanding their physical development, monitoring health and wellness, as well as identifying areas for improvement. However, individual analysis of infant PA can be challenging and often leads to biased results due to an infant’s inability to self-report and constantly changing posture and movement. This manuscript explores a population-based network analysis approach to study infants’ PA. The network analysis approach allows us to draw conclusions that are generalizable to the entire population and to identify trends and patterns in PA levels. Methods: This study aims to analyze the PA of infants aged 6–15 months using accelerometer data. A total of 20 infants from different types of childcare settings were recruited, including home-based and center-based care. Each infant wore an accelerometer for four days (2 weekdays, 2 weekend days). Data were analyzed using a network analysis approach, exploring the relationship between PA and various demographic and social factors. Results: The results showed that infants in center-based care have significantly higher levels of PA than those in home-based care. Moreover, the ankle acceleration was much higher than the waist acceleration, and activity patterns differed on weekdays and weekends. Conclusions: This study highlights the need for further research to explore the factors contributing to disparities in PA levels among infants in different childcare settings. Additionally, there is a need to develop effective strategies to promote PA among infants, considering the findings from the network analysis approach. Such efforts can contribute to enhancing infant health and well-being through targeted interventions aimed at increasing PA levels. Full article
(This article belongs to the Special Issue Feature Papers in Analytics)
Show Figures

Figure 1

13 pages, 4697 KiB  
Article
Does Part of Speech Have an Influence on Cyberbullying Detection?
by Jingxiu Huang, Ruofei Ding, Yunxiang Zheng, Xiaomin Wu, Shumin Chen and Xiunan Jin
Analytics 2024, 3(1), 1-13; https://doi.org/10.3390/analytics3010001 - 21 Dec 2023
Viewed by 650
Abstract
With the development of the Internet, the issue of cyberbullying on social media has gained significant attention. Cyberbullying is often expressed in text. Methods of identifying such text via machine learning have been growing, most of which rely on the extraction of part-of-speech [...] Read more.
With the development of the Internet, the issue of cyberbullying on social media has gained significant attention. Cyberbullying is often expressed in text. Methods of identifying such text via machine learning have been growing, most of which rely on the extraction of part-of-speech (POS) tags to improve their performance. However, the current study only arbitrarily used part-of-speech labels that it considered reasonable, without investigating whether the chosen part-of-speech labels can better enhance the effectiveness of the cyberbullying detection task. In other words, the effectiveness of different part-of-speech labels in the automatic cyberbullying detection task was not proven. This study aimed to investigate the part of speech in statements related to cyberbullying and explore how three classification models (random forest, naïve Bayes, and support vector machine) are sensitive to parts of speech in detecting cyberbullying. We also examined which part-of-speech combinations are most appropriate for the models mentioned above. The results of our experiments showed that the predictive performance of different models differs when using different part-of-speech tags as inputs. Random forest showed the best predictive performance, and naive Bayes and support vector machine followed, respectively. Meanwhile, across the different models, the sensitivity to different part-of-speech tags was consistent, with greater sensitivity shown towards nouns, verbs, and measure words, and lower sensitivity shown towards adjectives and pronouns. We also found that the combination of different parts of speech as inputs had an influence on the predictive performance of the models. This study will help researchers to determine which combination of part-of-speech categories is appropriate to improve the accuracy of cyberbullying detection. Full article
Show Figures

Figure 1

Previous Issue
Next Issue
Back to TopTop