Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

15 pages, 1210 KiB  
Article
Text Classification Based on the Heterogeneous Graph Considering the Relationships between Documents
by Hiromu Nakajima and Minoru Sasaki
Big Data Cogn. Comput. 2023, 7(4), 181; https://doi.org/10.3390/bdcc7040181 - 13 Dec 2023
Cited by 1 | Viewed by 2060
Abstract
Text classification is the task of estimating the genre of a document based on information such as word co-occurrence and frequency of occurrence. Text classification has been studied by various approaches. In this study, we focused on text classification using graph structure data. Conventional graph-based methods express relationships between words and relationships between words and documents as weights between nodes. Then, a graph neural network is used for learning. However, conventional methods cannot represent the relationships between documents on the graph. In this paper, we propose a graph structure that considers the relationships between documents. In the proposed method, the cosine similarity of document vectors is set as the weight between document nodes. This completes a graph that considers the relationships between documents. The graph is then input into a graph convolutional neural network for training. The aim of this study is to improve the text classification performance of conventional methods by using this graph that considers the relationships between document nodes. In this study, we conducted evaluation experiments using five different corpora of English documents. The results showed that the proposed method outperformed the conventional method by up to 1.19%, indicating that the use of relationships between documents is effective. In addition, the proposed method was shown to be particularly effective in classifying long documents.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
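The abstract above describes setting the cosine similarity of document vectors as the weight between document nodes. A minimal sketch of that step follows, assuming documents are already represented as equal-length numeric vectors. The function names, the `threshold` parameter, and the toy vectors are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_edges(doc_vectors, threshold=0.0):
    """Return {(i, j): weight} for document pairs whose similarity exceeds threshold.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    edges = {}
    for i, j in combinations(range(len(doc_vectors)), 2):
        weight = cosine_similarity(doc_vectors[i], doc_vectors[j])
        if weight > threshold:
            edges[(i, j)] = weight
    return edges


# Toy document vectors (hypothetical, e.g. term counts over a 3-word vocabulary).
docs = [[1.0, 0.0, 1.0], [2.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
edges = document_edges(docs)
```

In this toy input the first two documents point in the same direction and get an edge with weight close to 1, while the third shares no terms with them and gets no edge.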

16 pages, 756 KiB  
Article
A New Approach to Data Analysis Using Machine Learning for Cybersecurity
by Shivashankar Hiremath, Eeshan Shetty, Allam Jaya Prakash, Suraj Prakash Sahoo, Kiran Kumar Patro, Kandala N. V. P. S. Rajesh and Paweł Pławiak
Big Data Cogn. Comput. 2023, 7(4), 176; https://doi.org/10.3390/bdcc7040176 - 21 Nov 2023
Cited by 5 | Viewed by 5532
Abstract
The internet has become an indispensable tool for organizations, permeating every facet of their operations. Virtually all companies leverage Internet services for diverse purposes, including the digital storage of data in databases and cloud platforms. Furthermore, the rising demand for software and applications has led to a widespread shift toward computer-based activities within the corporate landscape. However, this digital transformation has exposed the information technology (IT) infrastructures of these organizations to a heightened risk of cyber-attacks, endangering sensitive data. Consequently, organizations must identify and address vulnerabilities within their systems, with a primary focus on scrutinizing customer-facing websites and applications. This work aims to tackle this pressing issue by employing data analysis tools, such as Power BI, to assess vulnerabilities within a client’s application or website. Through a rigorous analysis of data, valuable insights and information will be provided, which are necessary to formulate effective remedial measures against potential attacks. Ultimately, the central goal of this research is to demonstrate that clients can establish a secure environment, shielding their digital assets from potential attackers. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)

16 pages, 382 KiB  
Article
ZeroTrustBlock: Enhancing Security, Privacy, and Interoperability of Sensitive Data through ZeroTrust Permissioned Blockchain
by Pratik Thantharate and Anurag Thantharate
Big Data Cogn. Comput. 2023, 7(4), 165; https://doi.org/10.3390/bdcc7040165 - 17 Oct 2023
Cited by 17 | Viewed by 3160
Abstract
With the digitization of healthcare, an immense amount of sensitive medical data are generated and shared between various healthcare stakeholders—however, traditional health data management mechanisms present interoperability, security, and privacy challenges. The centralized nature of current health information systems leads to single points of failure, making the data vulnerable to cyberattacks. Patients also have little control over their medical records, raising privacy concerns. Blockchain technology presents a promising solution to these challenges through its decentralized, transparent, and immutable properties. This research proposes ZeroTrustBlock, a comprehensive blockchain framework for secure and private health information exchange. The decentralized ledger enhances integrity, while permissioned access and smart contracts enable patient-centric control over medical data sharing. A hybrid on-chain and off-chain storage model balances transparency with confidentiality. Integration gateways bridge ZeroTrustBlock protocols with existing systems like EHRs. Implemented on Hyperledger Fabric, ZeroTrustBlock demonstrates substantial security improvements over mainstream databases via cryptographic mechanisms, formal privacy-preserving protocols, and access policies enacting patient consent. Results validate the architecture’s effectiveness in achieving 14,200 TPS average throughput, 480 ms average latency for 100,000 concurrent transactions, and linear scalability up to 20 nodes. However, enhancements around performance, advanced cryptography, and real-world pilots are future work. Overall, ZeroTrustBlock provides a robust application of blockchain capabilities to transform security, privacy, interoperability, and patient agency in health data management. Full article
(This article belongs to the Special Issue Big Data in Health Care Information Systems)

21 pages, 5814 KiB  
Article
Intelligent Method for Classifying the Level of Anthropogenic Disasters
by Khrystyna Lipianina-Honcharenko, Carsten Wolff, Anatoliy Sachenko, Ivan Kit and Diana Zahorodnia
Big Data Cogn. Comput. 2023, 7(3), 157; https://doi.org/10.3390/bdcc7030157 - 21 Sep 2023
Cited by 2 | Viewed by 1994
Abstract
Anthropogenic disasters pose a challenge to management in the modern world. At the same time, it is important to have accurate and timely information to assess the level of danger and take appropriate measures to eliminate disasters. Therefore, the purpose of the paper is to develop an effective method for assessing the level of anthropogenic disasters based on information from witnesses to the event. For this purpose, a conceptual model for assessing the consequences of anthropogenic disasters is proposed, the main components of which are the following ones: the analysis of collected data, modeling and assessment of their consequences. The main characteristics of the intelligent method for classifying the level of anthropogenic disasters are considered, in particular, exploratory data analysis using the EDA method, classification based on textual data using SMOTE, and data classification by the ensemble method of machine learning using boosting. The experimental results confirmed that for textual data, the best classification is at level V and level I with an error of 0.97 and 0.94, respectively, and the average error estimate is 0.68. For quantitative data, the classification accuracy of Potential Accident Level relative to Industry Sector is 77%, and the f1-score is 0.88, which indicates a fairly high accuracy of the model. The architecture of a mobile application for classifying the level of anthropogenic disasters has been developed, which reduces the time required to assess consequences of danger in the region. In addition, the proposed approach ensures interaction with dynamic and uncertain environments, which makes it an effective tool for classifying. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)
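The abstract above names SMOTE as part of the textual-data pipeline. As a rough illustration of the interpolation step at the core of SMOTE, here is a minimal sketch that assumes the text has already been turned into numeric feature vectors. The function name `smote_sample`, the parameter `k`, and the toy points are hypothetical, and nothing here reproduces the authors' configuration.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a random minority sample and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Hypothetical 2-D feature vectors for an under-represented class.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
synthetic = [smote_sample(minority, rng=random.Random(seed)) for seed in range(5)]
```

Each synthetic point lies on the line segment between two existing minority samples, which is why the new points stay inside the region the minority class already occupies.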

17 pages, 3217 KiB  
Article
Implementing a Synchronization Method between a Relational and a Non-Relational Database
by Cornelia A. Győrödi, Tudor Turtureanu, Robert Ş. Győrödi and Doina R. Zmaranda
Big Data Cogn. Comput. 2023, 7(3), 153; https://doi.org/10.3390/bdcc7030153 - 18 Sep 2023
Cited by 1 | Viewed by 3580
Abstract
The accelerating pace of application development requires more frequent database switching, as technological advancements demand agile adaptation. The growth in data volume and, at the same time, in the number of transactions has led some applications to migrate from one database to another, especially from a relational database to a non-relational (NoSQL) alternative. In this transition phase, the coexistence of both databases becomes necessary. In addition, certain users choose to keep both databases permanently updated to exploit the individual strengths of each database in order to streamline operations. Existing solutions mainly focus on replication, failing to adequately address the management of synchronization between a relational and a non-relational (NoSQL) database. This paper proposes a practical IT approach to this problem and tests the feasibility of the proposed solution by developing an application that maintains the synchronization between a MySQL database as a relational database and MongoDB as a non-relational database. The performance and capabilities of the solution are analyzed to ensure data consistency and correctness. In addition, problems that arose during the development of the application are highlighted, and solutions to them are proposed.

27 pages, 3194 KiB  
Article
Predicting Forex Currency Fluctuations Using a Novel Bio-Inspired Modular Neural Network
by Christos Bormpotsis, Mohamed Sedky and Asma Patel
Big Data Cogn. Comput. 2023, 7(3), 152; https://doi.org/10.3390/bdcc7030152 - 15 Sep 2023
Cited by 2 | Viewed by 7347
Abstract
In the realm of foreign exchange (Forex) market predictions, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been commonly employed. However, these models often exhibit instability due to vulnerability to data perturbations attributed to their monolithic architecture. Hence, this study proposes a novel neuroscience-informed modular network that harnesses closing prices and sentiments from Yahoo Finance and Twitter APIs. Compared to monolithic methods, the objective is to advance the effectiveness of predicting price fluctuations in Euro to British Pound Sterling (EUR/GBP). The proposed model offers a unique methodology based on a reinvigorated modular CNN, replacing pooling layers with orthogonal kernel initialisation RNNs coupled with Monte Carlo Dropout (MCoRNNMCD). It integrates two pivotal modules: a convolutional simple RNN and a convolutional Gated Recurrent Unit (GRU). These modules incorporate orthogonal kernel initialisation and Monte Carlo Dropout techniques to mitigate overfitting, assessing each module’s uncertainty. The synthesis of these parallel feature extraction modules culminates in a three-layer Artificial Neural Network (ANN) decision-making module. Established on objective metrics like the Mean Square Error (MSE), rigorous evaluation underscores the proposed MCoRNNMCD–ANN’s exceptional performance. MCoRNNMCD–ANN surpasses single CNNs, LSTMs, GRUs, and the state-of-the-art hybrid BiCuDNNLSTM, CLSTM, CNN–LSTM, and LSTM–GRU in predicting hourly EUR/GBP closing price fluctuations. Full article

28 pages, 4173 KiB  
Review
Innovative Robotic Technologies and Artificial Intelligence in Pharmacy and Medicine: Paving the Way for the Future of Health Care—A Review
by Maryna Stasevych and Viktor Zvarych
Big Data Cogn. Comput. 2023, 7(3), 147; https://doi.org/10.3390/bdcc7030147 - 30 Aug 2023
Cited by 22 | Viewed by 12747
Abstract
The future of innovative robotic technologies and artificial intelligence (AI) in pharmacy and medicine is promising, with the potential to revolutionize various aspects of health care. These advances aim to increase efficiency, improve patient outcomes, and reduce costs while addressing pressing challenges such as personalized medicine and the need for more effective therapies. This review examines the major advances in robotics and AI in the pharmaceutical and medical fields, analyzing the advantages, obstacles, and potential implications for future health care. In addition, prominent organizations and research institutions leading the way in these technological advancements are highlighted, showcasing their pioneering efforts in creating and utilizing state-of-the-art robotic solutions in pharmacy and medicine. By thoroughly analyzing the current state of robotic technologies in health care and exploring the possibilities for further progress, this work aims to provide readers with a comprehensive understanding of the transformative power of robotics and AI in the evolution of the healthcare sector. Striking a balance between embracing technology and preserving the human touch, investing in R&D, and establishing regulatory frameworks within ethical guidelines will shape a future for robotics and AI systems. The future of pharmacy and medicine is in the seamless integration of robotics and AI systems to benefit patients and healthcare providers. Full article

22 pages, 3764 KiB  
Article
A Guide to Data Collection for Computation and Monitoring of Node Energy Consumption
by Alberto del Rio, Giuseppe Conti, Sandra Castano-Solis, Javier Serrano, David Jimenez and Jesus Fraile-Ardanuy
Big Data Cogn. Comput. 2023, 7(3), 130; https://doi.org/10.3390/bdcc7030130 - 11 Jul 2023
Viewed by 2034
Abstract
The digital transition that drives the new industrial revolution is largely driven by the application of intelligence and data. This boost leads to an increase in energy consumption, much of it associated with computing in data centers. This fact clashes with the growing need to save and improve energy efficiency and requires a more optimized use of resources. The deployment of new services in edge and cloud computing, virtualization, and software-defined networks requires a better understanding of consumption patterns aimed at more efficient and sustainable models and a reduction in carbon footprints. These patterns are suitable to be exploited by machine, deep, and reinforced learning techniques in pursuit of energy consumption optimization, which can ideally improve the energy efficiency of data centers and big computing servers providing these kinds of services. For the application of these techniques, it is essential to investigate data collection processes to create initial information points. Datasets also need to be created to analyze how to diagnose systems and sort out new ways of optimization. This work describes a data collection methodology used to create datasets that collect consumption data from a real-world work environment dedicated to data centers, server farms, or similar architectures. Specifically, it covers the entire process of energy stimuli generation, data extraction, and data preprocessing. The evaluation and reproduction of this method is offered to the scientific community through an online repository created for this work, which hosts all the code available for its download. Full article

17 pages, 4460 KiB  
Article
An End-to-End Online Traffic-Risk Incident Prediction in First-Person Dash Camera Videos
by Hilmil Pradana
Big Data Cogn. Comput. 2023, 7(3), 129; https://doi.org/10.3390/bdcc7030129 - 6 Jul 2023
Cited by 4 | Viewed by 2257
Abstract
Predicting traffic-risk incidents from a first-person view helps to ensure that a safe reaction can occur before the incident happens across a wide range of driving scenarios and conditions. One challenge in building advanced driver assistance systems is to create an early warning system that lets the driver react safely and accurately while handling the diversity of traffic-risk predictions in real-world applications. In this paper, we aim to bridge the gap by investigating two key research questions regarding the driver’s current driving status through online videos and the types of other moving objects that lead to dangerous situations. To address these problems, we propose an end-to-end two-stage architecture: in the first stage, unsupervised learning is applied to collect all suspicious events in actual driving; in the second stage, supervised learning is used to classify all suspicious events from the first stage into a common event type. To enrich the classification type, the metadata from the result of the first stage is sent to the second stage to handle the data limitation while training our classification model. In the online setting, our method runs at 9.60 fps on average with a standard deviation of 1.44 fps. Our quantitative evaluation shows that our method reaches average F1-scores of 81.87% and 73.43% on the labeled data of the CST-S3D and real driving datasets, respectively. Furthermore, the proposed method has the potential to assist distribution companies in evaluating the driving performance of their drivers by automatically monitoring near-miss events and analyzing driving patterns for training programs to reduce future accidents.
(This article belongs to the Special Issue Deep Network Learning and Its Applications)

24 pages, 4621 KiB  
Article
Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students
by Katherine Abramski, Salvatore Citraro, Luigi Lombardi, Giulio Rossetti and Massimo Stella
Big Data Cogn. Comput. 2023, 7(3), 124; https://doi.org/10.3390/bdcc7030124 - 27 Jun 2023
Cited by 21 | Viewed by 7275
Abstract
Large Language Models (LLMs) are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency that has harmful negative effects is the global phenomenon of anxiety toward math and STEM subjects. In this study, we introduce a novel application of network science and cognitive psychology to understand biases towards math and STEM fields in LLMs from ChatGPT, such as GPT-3, GPT-3.5, and GPT-4. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have negative perceptions of math and STEM fields, associating math with negative concepts in 6 cases out of 10. We observe significant differences across OpenAI’s models: newer versions (i.e., GPT-4) produce 5× semantically richer, more emotionally polarized perceptions with fewer negative associations compared to older versions and N=159 high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could even perhaps someday aid in reducing harmful stereotypes in society rather than perpetuating them. Full article

24 pages, 1122 KiB  
Article
Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level
by Karima Khettabi, Zineddine Kouahla, Brahim Farou, Hamid Seridi and Mohamed Amine Ferrag
Big Data Cogn. Comput. 2023, 7(2), 119; https://doi.org/10.3390/bdcc7020119 - 14 Jun 2023
Cited by 1 | Viewed by 2033
Abstract
Internet of Things (IoT) systems include many smart devices that continuously generate massive spatio-temporal data, which can be difficult to process. These continuous data streams need to be stored smartly so that query searches are efficient. In this work, we propose an efficient method, in the fog-cloud computing architecture, to index continuous and heterogeneous data streams in metric space. This method divides the fog layer into three levels: clustering, clusters processing and indexing. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group the data from each stream into homogeneous clusters at the clustering fog level. Each cluster in the first data stream is stored in the clusters processing fog level and indexed directly in the indexing fog level in a Binary tree with Hyperplane (BH tree). The indexing of clusters in the subsequent data stream is determined by the coefficient of variation (CV) value of the union of the new cluster with the existing clusters in the cluster processing fog layer. An analysis and comparison of our experimental results with other results in the literature demonstrated the effectiveness of the CV method in reducing energy consumption during BH tree construction, as well as reducing the search time and energy consumption during a k Nearest Neighbor (kNN) parallel query search. Full article
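The abstract above says that indexing of later clusters depends on the coefficient of variation (CV) of the union of a new cluster with existing clusters. A minimal sketch of that quantity follows, assuming one-dimensional values and a simple "merge if the union stays homogeneous" rule. The `should_merge` rule, the `max_cv` threshold, and the sample values are hypothetical illustrations and not the paper's actual decision procedure.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / abs(mean)


def should_merge(new_cluster, existing_cluster, max_cv=0.2):
    """Hypothetical rule: merge when the union stays homogeneous (CV <= max_cv)."""
    return coefficient_of_variation(existing_cluster + new_cluster) <= max_cv


# Made-up one-dimensional readings for illustration only.
existing = [10.0, 11.0, 9.0, 10.0]
close = [10.5, 9.5]
far = [30.0, 31.0]
```

With these values, `should_merge(close, existing)` holds because the union remains tightly grouped, while `should_merge(far, existing)` does not because the distant readings inflate the spread relative to the mean.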

21 pages, 2309 KiB  
Communication
Sentiment Analysis and Text Analysis of the Public Discourse on Twitter about COVID-19 and MPox
by Nirmalya Thakur
Big Data Cogn. Comput. 2023, 7(2), 116; https://doi.org/10.3390/bdcc7020116 - 9 Jun 2023
Cited by 21 | Viewed by 4048
Abstract
Mining and analysis of the big data of Twitter conversations have been of significant interest to the scientific community in the fields of healthcare, epidemiology, big data, data science, computer science, and their related areas, as can be seen from several works in the last few years that focused on sentiment analysis and other forms of text analysis of tweets related to Ebola, E-Coli, Dengue, Human Papillomavirus (HPV), Middle East Respiratory Syndrome (MERS), Measles, Zika virus, H1N1, influenza-like illness, swine flu, flu, Cholera, Listeriosis, cancer, Liver Disease, Inflammatory Bowel Disease, kidney disease, lupus, Parkinson’s, Diphtheria, and West Nile virus. The recent outbreaks of COVID-19 and MPox have served as “catalysts” for Twitter usage related to seeking and sharing information, views, opinions, and sentiments involving both of these viruses. None of the prior works in this field analyzed tweets focusing on both COVID-19 and MPox simultaneously. To address this research gap, a total of 61,862 tweets that focused on MPox and COVID-19 simultaneously, posted between 7 May 2022 and 3 March 2023, were studied. The findings and contributions of this study are manifold. First, the results of sentiment analysis using the VADER (Valence Aware Dictionary for sEntiment Reasoning) approach shows that nearly half the tweets (46.88%) had a negative sentiment. It was followed by tweets that had a positive sentiment (31.97%) and tweets that had a neutral sentiment (21.14%), respectively. Second, this paper presents the top 50 hashtags used in these tweets. Third, it presents the top 100 most frequently used words in these tweets after performing tokenization, removal of stopwords, and word frequency analysis. The findings indicate that tweets in this context included a high level of interest regarding COVID-19, MPox and other viruses, President Biden, and Ukraine. 
Finally, a comprehensive comparative study that compares the contributions of this paper with 49 prior works in this field is presented to further uphold the relevance and novelty of this work.
(This article belongs to the Special Issue Machine Learning in Data Mining for Knowledge Discovery)
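The abstract above mentions tokenization, removal of stopwords, and word frequency analysis. A minimal sketch of those three steps follows, using only the Python standard library. The stopword list, the tokenizing regular expression, and the example tweets are assumptions made for illustration and are not data or settings from the study.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use larger lists).
STOPWORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in", "about", "for"}


def top_words(tweets, n=3):
    """Tokenize, drop stopwords, and return the n most frequent remaining words."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase, then keep alphanumeric tokens that may contain hyphens.
        tokens = re.findall(r"[a-z0-9][a-z0-9-]*", tweet.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Made-up example tweets, not data from the study.
tweets = [
    "Worried about mpox and covid-19 cases in the city",
    "New covid-19 guidance is out; mpox vaccine news too",
    "The mpox vaccine rollout is slow",
]
result = top_words(tweets, n=3)
```

Keeping hyphens inside tokens is a deliberate choice here so that a term such as "covid-19" is counted as one word instead of being split in two.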

21 pages, 7048 KiB  
Article
Molecular Structure-Based Prediction of Absorption Maxima of Dyes Using ANN Model
by Neeraj Tomar, Geeta Rani, Vijaypal Singh Dhaka, Praveen K. Surolia, Kalpit Gupta, Eugenio Vocaturo and Ester Zumpano
Big Data Cogn. Comput. 2023, 7(2), 115; https://doi.org/10.3390/bdcc7020115 - 8 Jun 2023
Cited by 3 | Viewed by 2530
Abstract
The exponentially growing energy requirements and, in turn, extensive depletion of non-restorable sources of energy are a major cause of concern. Restorable energy sources such as solar cells can be used as an alternative. However, their low efficiency is a barrier to their practical use. This provokes the research community to design efficient solar cells. Based on the study of efficacy, design feasibility, and cost of fabrication, DSSC shows supremacy over other photovoltaic solar cells. However, fabricating DSSC in a laboratory and then assessing their characteristics is a costly affair. The researchers applied techniques of computational chemistry such as Time-Dependent Density Functional Theory, and an ab initio method for defining the structure and electronic properties of dyes without synthesizing them. However, the inability of descriptors to provide an intuitive physical depiction of the effect of all parameters is a limitation of the proposed approaches. The proven potential of neural network models in data analysis, pattern recognition, and object detection motivated researchers to extend their applicability for predicting the absorption maxima (λmax) of dye. The objective of this research is to develop an ANN-based QSPR model for correctly predicting the value of λmax for inorganic ruthenium complex dyes used in DSSC. Furthermore, it demonstrates the impact of different activation functions, optimizers, and loss functions on the prediction accuracy of λmax. Moreover, this research showcases the impact of atomic weight, types of bonds between constituents of the dye molecule, and the molecular weight of the dye molecule on the value of λmax. The experimental results proved that the value of λmax varies with changes in constituent atoms and types of bonds in a dye molecule. In addition, the model minimizes the difference in the experimental and calculated values of absorption maxima. 
The comparison with the existing models proved the dominance of the proposed model.
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

17 pages, 617 KiB  
Article
Breaking Barriers: Unveiling Factors Influencing the Adoption of Artificial Intelligence by Healthcare Providers
by BM Zeeshan Hameed, Nithesh Naik, Sufyan Ibrahim, Nisha S. Tatkar, Milap J. Shah, Dharini Prasad, Prithvi Hegde, Piotr Chlosta, Bhavan Prasad Rai and Bhaskar K Somani
Big Data Cogn. Comput. 2023, 7(2), 105; https://doi.org/10.3390/bdcc7020105 - 30 May 2023
Cited by 10 | Viewed by 4689
Abstract
Artificial intelligence (AI) is an emerging technological system that provides a platform to manage and analyze data by emulating human cognitive functions with greater accuracy, revolutionizing patient care and introducing a paradigm shift to the healthcare industry. The purpose of this study is to identify the underlying factors that affect the adoption of artificial intelligence in healthcare (AIH) by healthcare providers and to understand “What are the factors that influence healthcare providers’ behavioral intentions to adopt AIH in their routine practice?” An integrated survey was conducted among healthcare providers, including consultants, residents/students, and nurses. The survey included items related to performance expectancy, effort expectancy, initial trust, personal innovativeness, task complexity, and technology characteristics. The collected data were analyzed using structural equation modeling. A total of 392 healthcare professionals participated in the survey, with 72.4% being male and 50.7% being 30 years old or younger. The results showed that performance expectancy, effort expectancy, and initial trust have a positive influence on the behavioral intentions of healthcare providers to use AIH. Personal innovativeness was found to have a positive influence on effort expectancy, while task complexity and technology characteristics have a positive influence on effort expectancy for AIH. The study’s empirically validated model sheds light on healthcare providers’ intention to adopt AIH, while the study’s findings can be used to develop strategies to encourage this adoption. However, further investigation is necessary to understand the individual factors affecting the adoption of AIH by healthcare providers. Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)

23 pages, 1484 KiB  
Article
Unsupervised Deep Learning for Structural Health Monitoring
by Roberto Boccagna, Maurizio Bottini, Massimo Petracca, Alessia Amelio and Guido Camata
Big Data Cogn. Comput. 2023, 7(2), 99; https://doi.org/10.3390/bdcc7020099 - 17 May 2023
Cited by 3 | Viewed by 3006
Abstract
In the last few decades, structural health monitoring has gained relevance in the context of civil engineering, and much effort has been made to automate the process of data acquisition and analysis through the use of data-driven methods. Currently, the main issues arising in automated monitoring processing regard the establishment of a robust approach that covers all intermediate steps from data acquisition to output production and interpretation. To overcome this limitation, we introduce a dedicated artificial-intelligence-based monitoring approach for the assessment of the health conditions of structures in near-real time. The proposed approach is based on the construction of an unsupervised deep learning algorithm, with the aim of establishing a reliable method of anomaly detection for data acquired from sensors positioned on buildings. After preprocessing, the data are fed into various types of artificial neural network autoencoders, which are trained to produce outputs as close as possible to the inputs. We tested the proposed approach on data generated from an OpenSees numerical model of a railway bridge and data acquired from physical sensors positioned on the Historical Tower of Ravenna (Italy). The results show that the approach actually flags the data produced when damage scenarios are activated in the OpenSees model as coming from a damaged structure. The proposed method is also able to reliably detect anomalous structural behaviors of the tower, preventing critical scenarios. Compared to other state-of-the-art methods for anomaly detection, the proposed approach shows very promising results. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
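The abstract above describes autoencoders trained to reproduce their inputs, used for anomaly detection. A common way to turn such a model into a detector is to threshold its reconstruction error. The sketch below illustrates only that thresholding step and is not the authors' method: the error values are made up, and the function names and the mean-plus-k-standard-deviations rule are assumptions for illustration.

```python
from statistics import mean, stdev


def fit_threshold(healthy_errors, k=3.0):
    """Return mean + k * std of reconstruction errors measured on healthy data.

    The k-sigma rule is an assumed, commonly used heuristic.
    """
    return mean(healthy_errors) + k * stdev(healthy_errors)


def flag_anomalies(errors, threshold):
    """Mark each sample whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors from an autoencoder on healthy-structure data.
healthy_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08]
threshold = fit_threshold(healthy_errors)

# Hypothetical errors on newly acquired sensor windows.
new_errors = [0.11, 0.45, 0.09]
flags = flag_anomalies(new_errors, threshold)
print(flags)  # [False, True, False]
```

A larger `k` makes the detector more conservative; in practice the value would be tuned on held-out healthy data.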

13 pages, 749 KiB  
Article
Massive Parallel Alignment of RNA-seq Reads in Serverless Computing
by Pietro Cinaglia, José Luis Vázquez-Poletti and Mario Cannataro
Big Data Cogn. Comput. 2023, 7(2), 98; https://doi.org/10.3390/bdcc7020098 - 15 May 2023
Cited by 4 | Viewed by 2208
Abstract
In recent years, the use of Cloud infrastructures for data processing has proven useful, with a computing potential that is not affected by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model due to its auto-scaling methodologies, reliability, and fault tolerance. We present a solution based on in-house Serverless infrastructure, which is able to perform large-scale RNA-seq data analysis focused on the mapping of sequencing reads to a reference genome. The main contribution was bringing the computation of genomic data into serverless computing, focusing on RNA-seq read-mapping to a reference genome, as this is the most time-consuming task for some pipelines. The proposed solution handles massive parallel instances to maximize the efficiency in terms of running time. We evaluated the performance of our solution by performing two main tests, both based on the mapping of RNA-seq reads to Human GRCh38. Our experiments demonstrated a reduction in running time of 79.838%, 90.079%, and 96.382%, compared to the local environments with 16, 8, and 4 virtual cores, respectively. Furthermore, serverless limitations were investigated. Full article
(This article belongs to the Special Issue Data-Based Bioinformatics and Applications)
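The percentage reductions quoted in the abstract above can be restated as speedup factors. The sketch below assumes the percentages refer to running time, which the abstract suggests but does not state outright; the function name is hypothetical.

```python
def speedup_from_reduction(reduction_pct):
    """Convert a percentage reduction in running time into a speedup factor.

    If the new time is (1 - r) of the old time, the speedup is 1 / (1 - r).
    """
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


# (virtual cores in the local environment, reported reduction in percent)
reported = [(16, 79.838), (8, 90.079), (4, 96.382)]
for cores, pct in reported:
    print(f"vs. {cores} local cores: about {speedup_from_reduction(pct):.2f}x faster")
```

Under that assumption, the three figures correspond to roughly 4.96x, 10.08x, and 27.64x.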

22 pages, 930 KiB  
Systematic Review
A Systematic Review of Blockchain Technology Adoption Barriers and Enablers for Smart and Sustainable Agriculture
by Gopi Krishna Akella, Santoso Wibowo, Srimannarayana Grandhi and Sameera Mubarak
Big Data Cogn. Comput. 2023, 7(2), 86; https://doi.org/10.3390/bdcc7020086 - 4 May 2023
Cited by 22 | Viewed by 5085
Abstract
Smart and sustainable agricultural practices are more complex than those in other industries, as production depends on many pre- and post-harvesting factors that are difficult to predict and control. Previous studies have shown that technologies such as blockchain, along with sustainable practices, can achieve smart and sustainable agriculture. These studies state that there is a need for a reliable and trustworthy environment among the intermediaries throughout the agrifood supply chain to achieve sustainability. However, there are limited studies on blockchain technology adoption for smart and sustainable agriculture. Therefore, this systematic review uses the PRISMA technique to explore the barriers and enablers of blockchain adoption for smart and sustainable agriculture. Data was collected using exhaustive selection criteria and filters to evaluate the barriers and enablers of blockchain technology for smart and sustainable agriculture. The results identify, on the one hand, adoption enablers such as stakeholder collaboration, enhanced customer trust, and democratization and, on the other hand, barriers such as the lack of global standards, industry-level best practices, and policies for blockchain adoption in the agrifood sector. The outcome of this review highlights the adoption barriers over enablers of blockchain technology for smart and sustainable agriculture. Furthermore, several recommendations and implications are presented for addressing knowledge gaps for successful implementation. Full article

20 pages, 3726 KiB  
Article
DLBCNet: A Deep Learning Network for Classifying Blood Cells
by Ziquan Zhu, Zeyu Ren, Siyuan Lu, Shuihua Wang and Yudong Zhang
Big Data Cogn. Comput. 2023, 7(2), 75; https://doi.org/10.3390/bdcc7020075 - 14 Apr 2023
Cited by 7 | Viewed by 3314
Abstract
Background: Blood is responsible for delivering nutrients to various organs, which store important health information about the human body. Therefore, the diagnosis of blood can indirectly help doctors judge a person’s physical state. Recently, researchers have applied deep learning (DL) to the automatic analysis of blood cells. However, there are still some deficiencies in these models. Methods: To cope with these issues, we propose a novel network for the multi-classification of blood cells, which is called DLBCNet. A new model specific to blood cells (BCGAN) is designed to generate synthetic images. The pre-trained ResNet50 is implemented as the backbone model, which serves as the feature extractor. The extracted features are fed to the proposed ETRN to improve the multi-classification performance of blood cells. Results: The average accuracy, average sensitivity, average precision, average specificity, and average f1-score of the proposed model are 95.05%, 93.25%, 97.75%, 93.72%, and 95.38%, respectively. Conclusions: The performance of the proposed model surpasses other state-of-the-art methods in reported classification results. Full article

26 pages, 895 KiB  
Review
Predicting Colorectal Cancer Using Machine and Deep Learning Algorithms: Challenges and Opportunities
by Dabiah Alboaneen, Razan Alqarni, Sheikah Alqahtani, Maha Alrashidi, Rawan Alhuda, Eyman Alyahyan and Turki Alshammari
Big Data Cogn. Comput. 2023, 7(2), 74; https://doi.org/10.3390/bdcc7020074 - 13 Apr 2023
Cited by 19 | Viewed by 7627
Abstract
One of the three most serious and deadly cancers in the world is colorectal cancer. The most crucial stage, as with any cancer, is early diagnosis. In the medical industry, artificial intelligence (AI) has recently made tremendous strides and is showing promise for clinical applications. Machine learning (ML) and deep learning (DL) applications have recently gained popularity in the analysis of medical texts and images due to the benefits and achievements they have made in the early diagnosis of cancerous tissues and organs. In this paper, we intend to systematically review the state-of-the-art research on AI-based ML and DL techniques applied to the modeling of colorectal cancer. All research papers in the field of colorectal cancer are collected based on ML and DL techniques, and they are then classified into three categories: the aim of the prediction, the method of the prediction, and data samples. Following that, a thorough summary and a list of the studies gathered under each topic are provided. We conclude our study with a critical discussion of the challenges and opportunities in colorectal cancer prediction using ML and DL techniques by concentrating on the technical and medical points of view. Finally, we believe that our study will be helpful to scientists who are considering employing ML and DL methods to diagnose colorectal cancer. Full article

22 pages, 2726 KiB  
Article
Parallelization Strategies for Graph-Code-Based Similarity Search
by Patrick Steinert, Stefan Wagenpfeil, Paul Mc Kevitt, Ingo Frommholz and Matthias Hemmje
Big Data Cogn. Comput. 2023, 7(2), 70; https://doi.org/10.3390/bdcc7020070 - 6 Apr 2023
Cited by 1 | Viewed by 2165
Abstract
The volume of multimedia assets in collections is growing exponentially, and the retrieval of information is becoming more complex. The indexing and retrieval of multimedia content is generally implemented by employing feature graphs. Feature graphs contain semantic information on multimedia assets. Machine learning can produce detailed semantic information on multimedia assets, reflected in a high volume of nodes and edges in the feature graphs. While increasing the effectiveness of the information retrieval results, the high level of detail and also the growing collections increase the processing time. Addressing this problem, Multimedia Feature Graphs (MMFGs) and Graph Codes (GCs) have been proven to be fast and effective structures for information retrieval. However, the huge volume of data requires more processing time. As Graph Code algorithms were designed to be parallelizable, different paths of parallelization can be employed to prove or evaluate the scalability options of Graph Code processing. These include horizontal and vertical scaling with the use of Graphic Processing Units (GPUs), Multicore Central Processing Units (CPUs), and distributed computing. In this paper, we show how different parallelization strategies based on Graph Codes can be combined to provide a significant improvement in efficiency. Our modeling work shows excellent scalability with a theoretical speedup of 16,711 on a top-of-the-line Nvidia H100 GPU with 16,896 cores. Our experiments with a mediocre GPU show that a speedup of 225 can be achieved and give credence to the theoretical speedup. Thus, Graph Codes provide fast and effective multimedia indexing and retrieval, even in billion-scale use cases. Full article
(This article belongs to the Special Issue Multimedia Systems for Multimedia Big Data)
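To put the theoretical figure quoted in the abstract above in context, the reported speedup can be compared with the core count as a parallel efficiency. The sketch below also includes Amdahl's law as a general reference formula. Neither function comes from the paper, and the abstract does not say which scaling model the authors used.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved: speedup divided by core count."""
    return speedup / cores


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Figures quoted in the abstract: theoretical speedup 16,711 on 16,896 cores.
efficiency = parallel_efficiency(16711, 16896)
print(f"theoretical parallel efficiency: {efficiency:.1%}")

# Illustration only: even 1% serial work caps speedup far below the core count.
print(f"Amdahl speedup with 99% parallel work: {amdahl_speedup(0.99, 16896):.1f}")
```

An efficiency close to 1 means the modeled workload is almost entirely parallelizable. The experimental speedup of 225 cannot be converted the same way because the abstract does not give that GPU's core count.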

19 pages, 960 KiB  
Review
An Overview on the Challenges and Limitations Using Cloud Computing in Healthcare Corporations
by Giuseppe Agapito and Mario Cannataro
Big Data Cogn. Comput. 2023, 7(2), 68; https://doi.org/10.3390/bdcc7020068 - 6 Apr 2023
Cited by 16 | Viewed by 4861
Abstract
Technological advances in high-throughput platforms for biological systems enable the cost-efficient production of massive amounts of data, leading life science to the Big Data era. The availability of Big Data provides new opportunities and challenges for data analysis. Cloud Computing is ideal for digging into Big Data in omics sciences because it makes data analysis, sharing, access, and storage effective and able to scale when the amount of data increases. However, Cloud Computing presents several issues regarding the security and privacy of data that are particularly important when analyzing patients’ data, such as in personalized medicine. The objective of the present study is to highlight the challenges, security issues, and impediments that restrict the widespread adoption of Cloud Computing in healthcare corporations. Full article

23 pages, 6473 KiB  
Review
Enhancing Digital Health Services with Big Data Analytics
by Nisrine Berros, Fatna El Mendili, Youness Filaly and Younes El Bouzekri El Idrissi
Big Data Cogn. Comput. 2023, 7(2), 64; https://doi.org/10.3390/bdcc7020064 - 30 Mar 2023
Cited by 15 | Viewed by 7673
Abstract
Medicine is constantly generating new imaging data, including data from basic research, clinical research, and epidemiology, from health administration and insurance organizations, public health services, and non-conventional data sources such as social media, Internet applications, etc. Healthcare professionals have gained from the integration of big data in many ways, including new tools for decision support, improved clinical research methodologies, treatment efficacy, and personalized care. Finally, there are significant advantages in saving resources and reallocating them to increase productivity and rationalization. In this paper, we will explore how big data can be applied to the field of digital health. We will explain the features of health data, its particularities, and the tools available to use it. In addition, a particular focus is placed on the latest research work that addresses big data analysis in the health domain, as well as the technical and organizational challenges that have been discussed. Finally, we propose a general strategy for medical organizations looking to adopt or leverage big data analytics. Through this study, healthcare organizations and institutions considering the use of big data analytics technology, as well as those already using it, can gain a thorough and comprehensive understanding of the potential use, effective targeting, and expected impact. Full article

16 pages, 355 KiB  
Article
The Role of ChatGPT in Data Science: How AI-Assisted Conversational Interfaces Are Revolutionizing the Field
by Hossein Hassani and Emmanuel Sirmal Silva
Big Data Cogn. Comput. 2023, 7(2), 62; https://doi.org/10.3390/bdcc7020062 - 27 Mar 2023
Cited by 128 | Viewed by 43264
Abstract
ChatGPT, a conversational AI interface that utilizes natural language processing and machine learning algorithms, is taking the world by storm and is the buzzword across many sectors today. Given the likely impact of this model on data science, through this perspective article, we seek to provide an overview of the potential opportunities and challenges associated with using ChatGPT in data science, provide readers with a snapshot of its advantages, and stimulate interest in its use for data science projects. The paper discusses how ChatGPT can assist data scientists in automating various aspects of their workflow, including data cleaning and preprocessing, model training, and result interpretation. It also highlights how ChatGPT has the potential to provide new insights and improve decision-making processes by analyzing unstructured data. We then examine the advantages of ChatGPT’s architecture, including its ability to be fine-tuned for a wide range of language-related tasks and generate synthetic data. Limitations and issues are also addressed, particularly around concerns about bias and plagiarism when using ChatGPT. Overall, the paper concludes that the benefits outweigh the costs and ChatGPT has the potential to greatly enhance the productivity and accuracy of data science workflows and is likely to become an increasingly important tool for intelligence augmentation in the field of data science. ChatGPT can assist with a wide range of natural language processing tasks in data science, including language translation, sentiment analysis, and text classification. However, while ChatGPT can save time and resources compared to training a model from scratch, and can be fine-tuned for specific use cases, it may not perform well on certain tasks if it has not been specifically trained for them. Additionally, the output of ChatGPT may be difficult to interpret, which could pose challenges for decision-making in data science applications. Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
33 pages, 12116 KiB  
Article
MalBERTv2: Code Aware BERT-Based Model for Malware Identification
by Abir Rahali and Moulay A. Akhloufi
Big Data Cogn. Comput. 2023, 7(2), 60; https://doi.org/10.3390/bdcc7020060 - 24 Mar 2023
Cited by 13 | Viewed by 6213
Abstract
To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-malware software, as well as firewalls, require frequent updates and proactive implementation. However, processing the vast amounts of dataset examples can be overwhelming when relying solely on traditional methods. In cybersecurity workflows, recent advances in natural language processing (NLP) models can aid in proactively detecting various threats. In this paper, we present a novel approach for representing the relevance and significance of the Malware/Goodware (MG) datasets, through the use of a pre-trained language model called MalBERTv2. Our model is trained on publicly available datasets, with a focus on the source code of the apps by extracting the top-ranked files that present the most relevant information. These files are then passed through a pre-tokenization feature generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply a classifier using bidirectional encoder representations from transformers (BERT) as a layer within the model pipeline. The performance of our model is evaluated on different datasets, achieving a weighted f1 score ranging from 82% to 99%. Our results demonstrate the effectiveness of our approach for proactively detecting malware threats using NLP techniques. Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)

19 pages, 10184 KiB  
Article
Recognizing Road Surface Traffic Signs Based on Yolo Models Considering Image Flips
by Christine Dewi, Rung-Ching Chen, Yong-Cun Zhuang, Xiaoyi Jiang and Hui Yu
Big Data Cogn. Comput. 2023, 7(1), 54; https://doi.org/10.3390/bdcc7010054 - 22 Mar 2023
Cited by 12 | Viewed by 3994
Abstract
In recent years, there have been significant advances in deep learning and road marking recognition due to machine learning and artificial intelligence. Despite significant progress, road marking recognition often relies heavily on unrepresentative datasets and limited situations. Drivers and advanced driver assistance systems rely on road markings to help them better understand their environment on the street. Road markings, also known as pavement markings, are signs and texts painted on the road surface, including directional arrows, pedestrian crossings, speed limit signs, zebra crossings, and other equivalent signs and texts. Our experiments briefly discuss convolutional neural network (CNN)-based object detection algorithms, specifically for Yolo V2, Yolo V3, Yolo V4, and Yolo V4-tiny. In our experiments, we built the Taiwan Road Marking Sign Dataset (TRMSD) and made it a public dataset so other researchers could use it. Further, we train the model to distinguish left and right objects into separate classes; for this reason, Yolo V4 and Yolo V4-tiny results can benefit from the “No Flip” setting. The best model in the experiment is Yolo V4 (No Flip), with a test accuracy of 95.43% and an IoU of 66.12%. In this study, Yolo V4 (without flipping) outperforms state-of-the-art schemes, achieving 81.22% training accuracy and 95.34% testing accuracy on the TRMSD dataset. Full article
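The abstract above reports an IoU (intersection over union) figure. As a reminder of what that metric measures, here is a minimal sketch for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name are assumptions for illustration and are not taken from the paper or from any Yolo implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(iou(predicted, ground_truth))  # overlap 1, union 7, so 1/7
```

An IoU of 1 means the predicted and ground-truth boxes coincide, and 0 means they do not overlap at all.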

24 pages, 7711 KiB  
Article
Analysis of the Numerical Solutions of the Elder Problem Using Big Data and Machine Learning
by Roman Khotyachuk and Klaus Johannsen
Big Data Cogn. Comput. 2023, 7(1), 52; https://doi.org/10.3390/bdcc7010052 - 20 Mar 2023
Viewed by 1999
Abstract
In this study, the numerical solutions to the Elder problem are analyzed using Big Data technologies and data-driven approaches. The steady-state solutions to the Elder problem are investigated with regard to Rayleigh numbers (Ra), grid sizes, perturbations, and other parameters of the system studied. The complexity analysis is carried out for the datasets containing different solutions to the Elder problem, and the time of the highest complexity of numerical solutions is estimated. An approach to the identification of transient fingers and the visualization of large ensembles of solutions is proposed. Predictive models are developed to forecast steady states based on early-time observations. These models are classified into three possible types depending on the features (predictors) used in a model. The numerical results of the prediction accuracy are given, including the estimated confidence intervals for the accuracy, and the estimated time of 95% predictability. Different solutions, their averages, principal components, and other parameters are visualized. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)

16 pages, 2013 KiB  
Article
A Hybrid Deep Learning Framework with Decision-Level Fusion for Breast Cancer Survival Prediction
by Nermin Abdelhakim Othman, Manal A. Abdel-Fattah and Ahlam Talaat Ali
Big Data Cogn. Comput. 2023, 7(1), 50; https://doi.org/10.3390/bdcc7010050 - 16 Mar 2023
Cited by 12 | Viewed by 3665
Abstract
Because of technological advancements and their use in the medical area, many new methods and strategies have been developed to address complex real-life challenges. Breast cancer, a particular kind of tumor that arises in breast cells, is one of the most prevalent types of cancer in women. Early breast cancer detection and classification are crucial. Early detection considerably increases the likelihood of survival, which motivates us to contribute to different detection techniques from a technical standpoint. Additionally, manual detection requires a lot of time and effort and carries the risk of pathologist error and inaccurate classification. To address these problems, in this study, a hybrid deep learning model that enables decision making based on data from multiple data sources is proposed and used with two different classifiers. By incorporating multi-omics data (clinical data, gene expression data, and copy number alteration data) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, the accuracy of patient survival predictions is expected to be improved relative to prediction utilizing only one modality of data. A convolutional neural network (CNN) architecture is used for feature extraction. LSTM and GRU are used as classifiers. The accuracy achieved by LSTM is 97.0%, and that achieved by GRU is 97.5%, while using decision fusion (LSTM and GRU) achieves the best accuracy of 98.0%. The prediction performance assessed using various performance indicators demonstrates that our model outperforms currently used methodologies. Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
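Decision-level fusion, as mentioned in the abstract above, combines the outputs of separate classifiers rather than their features. The abstract does not state which fusion rule the authors used, so the sketch below shows one common option, weighted averaging of predicted probabilities (soft voting). The probabilities, names, and 0.5 cut-off are hypothetical.

```python
def fuse_decisions(probs_a, probs_b, weight_a=0.5, cutoff=0.5):
    """Soft-voting fusion of two classifiers' positive-class probabilities.

    Returns (labels, fused_probs). Soft voting is an assumed rule here,
    not necessarily the one used in the paper.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same samples")
    weight_b = 1.0 - weight_a
    fused = [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]
    labels = [1 if p >= cutoff else 0 for p in fused]
    return labels, fused


# Hypothetical survival probabilities for three patients from two classifiers.
lstm_probs = [0.9, 0.4, 0.2]
gru_probs = [0.8, 0.7, 0.1]

labels, fused = fuse_decisions(lstm_probs, gru_probs)
print(labels)  # [1, 1, 0]
```

In the second sample the two classifiers disagree at the 0.5 cut-off, and the fused score resolves the disagreement. This is the situation in which fusion can outperform either classifier alone.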

17 pages, 12591 KiB  
Article
Real-Time Attention Monitoring System for Classroom: A Deep Learning Approach for Student’s Behavior Recognition
by Zouheir Trabelsi, Fady Alnajjar, Medha Mohan Ambali Parambil, Munkhjargal Gochoo and Luqman Ali
Big Data Cogn. Comput. 2023, 7(1), 48; https://doi.org/10.3390/bdcc7010048 - 9 Mar 2023
Cited by 29 | Viewed by 20327
Abstract
Effective classroom instruction requires monitoring student participation and interaction during class, identifying cues to stimulate their attention. The ability of teachers to analyze and evaluate students’ classroom behavior is becoming a crucial criterion for quality teaching. Artificial intelligence (AI)-based behavior recognition techniques can help evaluate students’ attention and engagement during classroom sessions. With rapid digitalization, the global education system is adapting and exploring emerging technological innovations, such as AI, the Internet of Things, and big data analytics, to improve education systems. In educational institutions, modern classroom systems are supplemented with the latest technologies to make them more interactive, student centered, and customized. However, it is difficult for instructors to assess students’ interest and attention levels even with these technologies. This study harnesses modern technology to introduce an intelligent real-time vision-based classroom to monitor students’ emotions, attendance, and attention levels even when they have face masks on. We used a machine learning approach to train students’ behavior recognition models, including identifying facial expressions, to identify students’ attention/non-attention in a classroom. The attention/non-attention dataset is collected based on nine categories. The models are trained on this dataset starting from YOLOv5 pre-trained weights. For validation, the performance of various versions of the YOLOv5 model (v5m, v5n, v5l, v5s, and v5x) is compared based on different evaluation measures (precision, recall, mAP, and F1 score). Our results show that all models show promising performance with 76% average accuracy. Applying the developed model can enable instructors to visualize students’ behavior and emotional states at different levels, allowing them to appropriately manage teaching sessions by considering student-centered learning scenarios. Overall, the proposed model will enhance the performance of instructors and students at an academic level. Full article
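The abstract above compares models on precision, recall, and F1 score. For readers less familiar with these measures, the sketch below computes them from raw detection counts. The counts are invented for illustration and do not come from the paper; mAP is omitted because it requires ranked detections rather than simple counts.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from detection counts.

    Each measure falls back to 0.0 when its denominator is zero.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one behavior class.
precision, recall, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.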

18 pages, 2033 KiB  
Article
Machine Learning-Based Identifications of COVID-19 Fake News Using Biomedical Information Extraction
by Faizi Fifita, Jordan Smith, Melissa B. Hanzsek-Brill, Xiaoyin Li and Mengshi Zhou
Big Data Cogn. Comput. 2023, 7(1), 46; https://doi.org/10.3390/bdcc7010046 - 7 Mar 2023
Cited by 8 | Viewed by 4600
Abstract
The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher than that of models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance, and thus could be a valuable tool in the fight against the COVID-19 infodemic. Full article

19 pages, 800 KiB  
Systematic Review
Disclosing Edge Intelligence: A Systematic Meta-Survey
by Vincenzo Barbuto, Claudio Savaglio, Min Chen and Giancarlo Fortino
Big Data Cogn. Comput. 2023, 7(1), 44; https://doi.org/10.3390/bdcc7010044 - 2 Mar 2023
Cited by 32 | Viewed by 4449
Abstract
The Edge Intelligence (EI) paradigm has recently emerged as a promising solution to overcome the inherent limitations of cloud computing (latency, autonomy, cost, etc.) in the development and provision of next-generation Internet of Things (IoT) services. Therefore, motivated by its increasing popularity, relevant research effort was expended in order to explore, from different perspectives and at different degrees of detail, the many facets of EI. In such a context, the aim of this paper was to analyze the wide landscape on EI by providing a systematic analysis of the state-of-the-art manuscripts in the form of a tertiary study (i.e., a review of literature reviews, surveys, and mapping studies) and according to the guidelines of the PRISMA methodology. A comparison framework is, hence, provided and sound research questions outlined, aimed at exploring (for the benefit of both experts and beginners) the past, present, and future directions of the EI paradigm and its relationships with the IoT and the cloud computing worlds. Full article

16 pages, 3458 KiB  
Article
An Obstacle-Finding Approach for Autonomous Mobile Robots Using 2D LiDAR Data
by Lesia Mochurad, Yaroslav Hladun and Roman Tkachenko
Big Data Cogn. Comput. 2023, 7(1), 43; https://doi.org/10.3390/bdcc7010043 - 1 Mar 2023
Cited by 15 | Viewed by 3716
Abstract
Obstacle detection is crucial for the navigation of autonomous mobile robots: it is necessary to determine the presence of obstacles as accurately as possible and to find their position relative to the robot. Autonomous mobile robots for indoor navigation purposes use several special sensors for various tasks. One such task is localizing the robot in space. In most cases, the LiDAR sensor is employed to solve this problem. In addition, the data from this sensor are critical, as the sensor directly measures the distance to objects and obstacles surrounding the robot, so LiDAR data can be used for detection. This article is devoted to developing an obstacle detection algorithm based on 2D LiDAR sensor data. We propose a parallelization method to speed up this algorithm while processing big data. The result is an algorithm that finds obstacles and objects with high accuracy and speed: it receives a set of points from the sensor and data about the robot’s movements. It outputs a set of line segments, where each group of such line segments describes an object. Accuracy was assessed with two proposed metrics, and both averages are high: 86% and 91% for the first and second metrics, respectively. The proposed method is flexible enough to be optimized for a specific configuration of the LiDAR sensor. Four hyperparameters are experimentally found for a given sensor configuration to maximize the correspondence between real and found objects. The proposed algorithm has been carefully tested on simulated and actual data. The authors also investigated the relationship between the selected hyperparameters’ values and the algorithm’s efficiency. Potential applications, limitations, and opportunities for future research are discussed. Full article
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)

10 pages, 1071 KiB  
Article
Adoption Case of IIoT and Machine Learning to Improve Energy Consumption at a Process Manufacturing Firm, under Industry 5.0 Model
by Andrés Redchuk, Federico Walas Mateo, Guadalupe Pascal and Julian Eloy Tornillo
Big Data Cogn. Comput. 2023, 7(1), 42; https://doi.org/10.3390/bdcc7010042 - 24 Feb 2023
Cited by 5 | Viewed by 3144
Abstract
Considering the novel concept of Industry 5.0 model, where sustainability is aimed together with integration in the value chain and centrality of people in the production environment, this article focuses on a case where energy efficiency is achieved. The work presents a food industry case where a low-code AI platform was adopted to improve the efficiency and lower environmental footprint impact of its operations. The paper describes the adoption process of the solution integrated with an IIoT architecture that generates data to achieve process optimization. The case shows how a low-code AI platform can ease energy efficiency, considering people in the process, empowering them, and giving a central role in the improvement opportunity. The paper includes a conceptual framework on issues related to Industry 5.0 model, the food industry, IIoT, and machine learning. The adoption case’s relevancy is marked by how the business model looks to democratize artificial intelligence in industrial firms. The proposed model delivers value to ease traditional industries to obtain better operational results and contribute to a better use of resources. Finally, the work intends to go through opportunities that arise around artificial intelligence as a driver for new business and operating models considering the role of people in the process. By empowering industrial engineers with data driven solutions, organizations can ensure that their domain expertise can be applied to data insights to achieve better outcomes. Full article

18 pages, 5142 KiB  
Article
Analyzing the Performance of Transformers for the Prediction of the Blood Glucose Level Considering Imputation and Smoothing
by Edgar Acuna, Roxana Aparicio and Velcy Palomino
Big Data Cogn. Comput. 2023, 7(1), 41; https://doi.org/10.3390/bdcc7010041 - 23 Feb 2023
Cited by 3 | Viewed by 2919
Abstract
In this paper we investigate the effect of two preprocessing techniques, data imputation and smoothing, in the prediction of blood glucose level in type 1 diabetes patients, using a novel deep learning model called Transformer. We train three models: XGBoost, a one-dimensional convolutional neural network (1D-CNN), and the Transformer model to predict future blood glucose levels for a 30-min horizon using a 60-min time series history in the OhioT1DM dataset. We also compare four methods of handling missing time series data during the model training: hourly mean, linear interpolation, cubic interpolation, and spline interpolation; and two smoothing techniques: Kalman smoothing and smoothing splines. Our experiments show that the Transformer performs better than XGBoost and 1D-CNN when only continuous glucose monitoring (CGM) is used as a predictor, and that it is very competitive against XGBoost when CGM and carbohydrate intake from the meal are used to predict blood glucose level. Overall, our results are more accurate than those appearing in the literature. Full article
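Of the four missing-data strategies named in this abstract, linear interpolation is the simplest to show in a few lines. The sketch below is an illustration under stated assumptions: readings are evenly spaced, missing values are marked with `None`, and gaps at either end of the series are filled with the nearest known reading. The function name `interpolate_linear` and the sample values are hypothetical and do not come from the OhioT1DM dataset.

```python
def interpolate_linear(series):
    """Fill None gaps in an evenly spaced series by linear interpolation.

    Interior gaps are filled on the straight line between the nearest
    known neighbours. Leading and trailing gaps copy the nearest known
    value (an assumption of this sketch, not a claim about the paper).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no known values")

    filled = list(series)

    # Edge gaps: extend the first and last known readings outward.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]

    # Interior gaps: walk consecutive pairs of known indices.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)

    return filled


# Hypothetical glucose readings in mg/dL with three missing samples.
cgm = [110.0, None, None, 140.0, 150.0, None]
filled = interpolate_linear(cgm)
print(filled)
```

Cubic and spline interpolation follow the same pattern but fit a smoother curve through the known points, which matters more when gaps are long.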

25 pages, 6265 KiB  
Article
COVID-19 Classification through Deep Learning Models with Three-Channel Grayscale CT Images
by Maisarah Mohd Sufian, Ervin Gubin Moung, Mohd Hanafi Ahmad Hijazi, Farashazillah Yahya, Jamal Ahmad Dargham, Ali Farzamnia, Florence Sia and Nur Faraha Mohd Naim
Big Data Cogn. Comput. 2023, 7(1), 36; https://doi.org/10.3390/bdcc7010036 - 16 Feb 2023
Cited by 4 | Viewed by 3760
Abstract
COVID-19, an infectious coronavirus disease, has triggered a pandemic that has claimed many lives. Clinical institutes have long considered computed tomography (CT) as an excellent and complementary screening method to reverse transcriptase-polymerase chain reaction (RT-PCR). Because of the limited dataset available on COVID-19, transfer learning-based models have become the go-to solutions for automatic COVID-19 detection. However, CT images are typically provided in grayscale, thus posing a challenge for automatic detection using pre-trained models, which were previously trained on RGB images. Several methods have been proposed in the literature for converting grayscale images to RGB (three-channel) images for use with pre-trained deep-learning models, such as pseudo-colorization, replication, and colorization. The most common method is replication, where the one-channel grayscale image is repeated in the three-channel image. While this technique is simple, it does not provide new information and can lead to poor performance due to redundant image features fed into the DL model. This study proposes a novel image pre-processing method for grayscale medical images that utilizes Histogram Equalization (HE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) to create a three-channel image representation that provides different information on each channel. The effectiveness of this method is evaluated using six pre-trained models: InceptionV3, MobileNet, ResNet50, VGG16, ViT-B16, and ViT-B32. The results show that the proposed image representation significantly improves the classification performance of the models, with the InceptionV3 model achieving an accuracy of 99.60% and a recall (also referred to as sensitivity) of 99.59%. The proposed method addresses the limitation of using grayscale medical images for COVID-19 detection and can potentially improve the early detection and control of the disease. Additionally, the proposed method can be applied to other medical imaging tasks with a grayscale image input, thus making it a generalizable solution. Full article

10 pages, 1754 KiB  
Article
“What Can ChatGPT Do?” Analyzing Early Reactions to the Innovative AI Chatbot on Twitter
by Viriya Taecharungroj
Big Data Cogn. Comput. 2023, 7(1), 35; https://doi.org/10.3390/bdcc7010035 - 16 Feb 2023
Cited by 230 | Viewed by 36644
Abstract
In this study, the author collected tweets about ChatGPT, an innovative AI chatbot, in the first month after its launch. A total of 233,914 English tweets were analyzed using the latent Dirichlet allocation (LDA) topic modeling algorithm to answer the question “what can ChatGPT do?”. The results revealed three general topics: news, technology, and reactions. The author also identified five functional domains: creative writing, essay writing, prompt writing, code writing, and answering questions. The analysis also found that ChatGPT has the potential to impact technologies and humans in both positive and negative ways. In conclusion, the author outlines four key issues that need to be addressed as a result of this AI advancement: the evolution of jobs, a new technological landscape, the quest for artificial general intelligence, and the progress-ethics conundrum. Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)

16 pages, 4313 KiB  
Article
A Novel Approach for Diabetic Retinopathy Screening Using Asymmetric Deep Learning Features
by Pradeep Kumar Jena, Bonomali Khuntia, Charulata Palai, Manjushree Nayak, Tapas Kumar Mishra and Sachi Nandan Mohanty
Big Data Cogn. Comput. 2023, 7(1), 25; https://doi.org/10.3390/bdcc7010025 - 29 Jan 2023
Cited by 52 | Viewed by 4685
Abstract
Automatic screening of diabetic retinopathy (DR) is a well-identified area of research in the domain of computer vision. It is challenging due to structural complexity and a marginal contrast difference between the retinal vessels and the background of the fundus image. As bright lesions are prominent in the green channel, we applied contrast-limited adaptive histogram equalization (CLAHE) on the green channel for image enhancement. This work proposes a novel diabetic retinopathy screening technique using an asymmetric deep learning feature. The asymmetric deep learning features are extracted using U-Net for segmentation of the optic disc and blood vessels. Then a convolutional neural network (CNN) with a support vector machine (SVM) is used for the DR lesions classification. The lesions are classified into four classes, i.e., normal, microaneurysms, hemorrhages, and exudates. The proposed method is tested with two publicly available retinal image datasets, i.e., APTOS and MESSIDOR. The accuracy achieved for non-diabetic retinopathy detection is 98.6% and 91.9% for the APTOS and MESSIDOR datasets, respectively. The accuracies of exudate detection for these two datasets are 96.9% and 98.3%, respectively. The accuracy of the DR screening system is improved due to the precise retinal image segmentation. Full article

20 pages, 3764 KiB  
Article
A Real-Time Computer Vision Based Approach to Detection and Classification of Traffic Incidents
by Mohammed Imran Basheer Ahmed, Rim Zaghdoud, Mohammed Salih Ahmed, Razan Sendi, Sarah Alsharif, Jomana Alabdulkarim, Bashayr Adnan Albin Saad, Reema Alsabt, Atta Rahman and Gomathi Krishnasamy
Big Data Cogn. Comput. 2023, 7(1), 22; https://doi.org/10.3390/bdcc7010022 - 28 Jan 2023
Cited by 53 | Viewed by 9862
Abstract
To constructively ameliorate and enhance traffic safety measures in Saudi Arabia, a prolific number of AI (Artificial Intelligence) traffic surveillance technologies have emerged, including Saher, throughout the past years. However, rapidly detecting a vehicle incident can play a cardinal role in ameliorating the response speed of incident management, which in turn minimizes road injuries that have been induced by the accident’s occurrence. To attain a permeating effect in increasing the entailed demand for road traffic security and safety, this paper presents a real-time traffic incident detection and alert system that is based on a computer vision approach. The proposed framework consists of three models, each of which is integrated within a prototype interface to fully visualize the system’s overall architecture. To begin, the vehicle detection and tracking model utilized the YOLOv5 object detector with the DeepSORT tracker to detect and track the vehicles’ movements by allocating a unique identification number (ID) to each vehicle. This model attained a mean average precision (mAP) of 99.2%. Second, a traffic accident and severity classification model attained a mAP of 83.3% while utilizing the YOLOv5 algorithm to accurately detect and classify an accident’s severity level, sending an immediate alert message to the nearest hospital if a severe accident has taken place. Finally, the ResNet152 algorithm was utilized to detect the ignition of a fire following the accident’s occurrence; this model achieved an accuracy rate of 98.9%, with an automated alert being sent to the fire station if this perilous event occurred. This study employed an innovative parallel computing technique for reducing the overall complexity and inference time of the AI-based system to run the proposed system in a concurrent and parallel manner. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)

18 pages, 3402 KiB  
Article
X-Wines: A Wine Dataset for Recommender Systems and Machine Learning
by Rogério Xavier de Azambuja, A. Jorge Morais and Vítor Filipe
Big Data Cogn. Comput. 2023, 7(1), 20; https://doi.org/10.3390/bdcc7010020 - 22 Jan 2023
Cited by 7 | Viewed by 7936
Abstract
In the current technological scenario of artificial intelligence growth, especially using machine learning, large datasets are necessary. Recommender systems appear with increasing frequency with different techniques for information filtering. Few large wine datasets are available for use with wine recommender systems. This work presents X-Wines, a new and consistent wine dataset containing 100,000 instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. They refer to the scale 1–5 ratings carried out over a period of 10 years (2012–2021) for wines produced in 62 different countries. A demonstration of some applications using X-Wines in the scope of recommender systems with deep learning algorithms is also presented. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)

16 pages, 975 KiB  
Article
Federated Learning to Safeguard Patients Data: A Medical Image Retrieval Case
by Gurtaj Singh, Vincenzo Violi and Marco Fisichella
Big Data Cogn. Comput. 2023, 7(1), 18; https://doi.org/10.3390/bdcc7010018 - 18 Jan 2023
Cited by 12 | Viewed by 3899
Abstract
Healthcare data are distributed and confidential, making it difficult to use centralized automatic diagnostic techniques. For example, different hospitals hold the electronic health records (EHRs) of different patient populations; however, transferring this data between hospitals is difficult due to the sensitive nature of the information. This presents a significant obstacle to the development of efficient and generalizable analytical methods that require a large amount of diverse Big Data. Federated learning allows multiple institutions to work together to develop a machine learning algorithm without sharing their data. We conducted a systematic study to analyze the current state of FL in the healthcare industry and explore both the limitations of this technology and its potential. Organizations share the parameters of their models with each other. This allows them to reap the benefits of a model developed with a richer data set while protecting the confidentiality of their data. Standard methods for large-scale machine learning, distributed optimization, and privacy-friendly data analytics need to be fundamentally rethought to address the new problems posed by training on diverse networks that may contain large amounts of data. In this article, we discuss the particular qualities and difficulties of federated learning, provide a comprehensive overview of current approaches, and outline several directions for future work that are relevant to a variety of research communities. These issues are important to many different research communities. Full article
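The idea that organizations share model parameters rather than data can be illustrated with weighted parameter averaging, the aggregation step commonly known as federated averaging (FedAvg). The abstract does not name a specific aggregation rule, so FedAvg is an assumption here, as are the function name `federated_average`, the two-hospital setup, and all numbers.

```python
def federated_average(client_params, client_sizes):
    """Combine locally trained parameter vectors into one global vector.

    client_params: one list of floats per participating institution.
    client_sizes:  number of local training examples per institution,
                   used as the averaging weight.
    Only parameters and counts are exchanged; raw records never leave
    the institution that holds them.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one size per client and at least one client")

    total = sum(client_sizes)
    n_params = len(client_params[0])
    averaged = [0.0] * n_params

    for params, size in zip(client_params, client_sizes):
        if len(params) != n_params:
            raise ValueError("all clients must share the same model shape")
        for j, value in enumerate(params):
            averaged[j] += value * size / total

    return averaged


# Two hypothetical hospitals with different amounts of local data.
hospital_params = [[1.0, 2.0], [3.0, 4.0]]
hospital_sizes = [100, 300]
global_params = federated_average(hospital_params, hospital_sizes)
print(global_params)
```

Weighting by local dataset size lets the larger hospital pull the global model further toward its own parameters; an unweighted mean would treat both institutions equally regardless of how much data each holds.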
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)

24 pages, 2389 KiB  
Article
The Extended Digital Maturity Model
by Tining Haryanti, Nur Aini Rakhmawati and Apol Pribadi Subriadi
Big Data Cogn. Comput. 2023, 7(1), 17; https://doi.org/10.3390/bdcc7010017 - 17 Jan 2023
Cited by 21 | Viewed by 11189
Abstract
The Digital Transformation (DX) potentially affects productivity and efficiency while offering high risks to organizations. Necessary frameworks and tools to help organizations navigate such radical changes are needed. An extended framework of DMM is presented through a comparative analysis of various digital maturity models and qualitative approaches through expert feedback. The maturity level determination uses the Emprise test of the international standard ISO/IEC Assessment known as SPICE. This research reveals seven interrelated dimensions for supporting the success of DX as a form of development of an existing Maturity Model. The DX–Self Assessment Maturity Model (DX-SAMM) is built to guide organizations by providing a broad roadmap for improving digital maturity. This article presents a digital maturity model from a holistic point of view and meets the criteria for assessment maturity. The case study results show that DX-SAMM can identify DX maturity levels while providing roadmap recommendations for increasing maturity levels in every aspect of its dimensions. It offers practical implications for improving maturity levels and the ease of real-time monitoring and evaluating digital maturity. With the development of maturity measurement, DX-SAMM contributes to the sustainability of the organization by proposing DX strategies in the future based on the current maturity achievements. Full article
(This article belongs to the Special Issue Human Factor in Information Systems Development and Management)

31 pages, 732 KiB  
Systematic Review
Bias and Unfairness in Machine Learning Models: A Systematic Review on Datasets, Tools, Fairness Metrics, and Identification and Mitigation Methods
by Tiago P. Pagano, Rafael B. Loureiro, Fernanda V. N. Lisboa, Rodrigo M. Peixoto, Guilherme A. S. Guimarães, Gustavo O. R. Cruz, Maira M. Araujo, Lucas L. Santos, Marco A. S. Cruz, Ewerton L. S. Oliveira, Ingrid Winkler and Erick G. S. Nascimento
Big Data Cogn. Comput. 2023, 7(1), 15; https://doi.org/10.3390/bdcc7010015 - 13 Jan 2023
Cited by 61 | Viewed by 26156
Abstract
One of the difficulties of artificial intelligence is to ensure that model decisions are fair and free of bias. In research, datasets, metrics, techniques, and tools are applied to detect and mitigate algorithmic unfairness and bias. This study examines the current knowledge on bias and unfairness in machine learning models. The systematic review followed the PRISMA guidelines and is registered on the OSF platform. The search was carried out between 2021 and early 2022 in the Scopus, IEEE Xplore, Web of Science, and Google Scholar knowledge bases and found 128 articles published between 2017 and 2022, of which 45 were chosen based on search string optimization and inclusion and exclusion criteria. We discovered that the majority of retrieved works focus on bias and unfairness identification and mitigation techniques, offering tools, statistical approaches, important metrics, and datasets typically used for bias experiments. In terms of the primary forms of bias, data, algorithm, and user interaction were addressed in connection to the preprocessing, in-processing, and postprocessing mitigation methods. The use of Equalized Odds, Opportunity Equality, and Demographic Parity as primary fairness metrics emphasizes the crucial role of sensitive attributes in mitigating bias. The 25 datasets chosen span a wide range of areas, including criminal justice, image enhancement, finance, education, product pricing, and health, with the majority including sensitive attributes. In terms of tools, Aequitas is the most often referenced, yet many of the tools were not employed in empirical experiments. A limitation of current research is the lack of multiclass and multimetric studies, which are found in just a few works and constrain the investigation to binary-focused methods. Furthermore, the results indicate that different fairness metrics do not present uniform results for a given use case, and that more research with varied model architectures is necessary to standardize which ones are more appropriate for a given context. We also observed that all research addressed the transparency of the algorithm, or its capacity to explain how decisions are taken. Full article
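Two of the fairness metrics named in this abstract can be computed directly from predictions and a sensitive attribute. The sketch below uses the common textbook definitions: Demographic Parity compares positive-prediction rates between groups, and Opportunity Equality (equal opportunity) compares true-positive rates. Function names, group labels, and the toy data are assumptions made for illustration and are not drawn from the reviewed studies.

```python
def positive_rate(preds):
    """Share of predictions equal to 1. Assumes preds is non-empty."""
    return sum(preds) / len(preds)


def demographic_parity_difference(y_pred, group, a, b):
    """Absolute gap in positive-prediction rate between groups a and b."""
    rate_a = positive_rate([p for p, g in zip(y_pred, group) if g == a])
    rate_b = positive_rate([p for p, g in zip(y_pred, group) if g == b])
    return abs(rate_a - rate_b)


def true_positive_rate(y_true, y_pred, group, g):
    """Recall within group g. Assumes g has at least one true positive."""
    hits = [p for t, p, gg in zip(y_true, y_pred, group) if gg == g and t == 1]
    return sum(hits) / len(hits)


def equal_opportunity_difference(y_true, y_pred, group, a, b):
    """Absolute gap in true-positive rate between groups a and b."""
    return abs(true_positive_rate(y_true, y_pred, group, a)
               - true_positive_rate(y_true, y_pred, group, b))


# Hypothetical binary labels, predictions, and a two-valued sensitive attribute.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_difference(y_pred, group, "a", "b")
eo_gap = equal_opportunity_difference(y_true, y_pred, group, "a", "b")
print(dp_gap, round(eo_gap, 3))
```

Equalized Odds extends the second check by also requiring the false-positive rates to match across groups. The two gaps computed here differ on the same predictions, which is a small instance of the abstract's observation that fairness metrics do not give uniform results for a given use case.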

28 pages, 4508 KiB  
Article
Big Data Analytics Applications in Information Management Driving Operational Efficiencies and Decision-Making: Mapping the Field of Knowledge with Bibliometric Analysis Using R
by Konstantina Ragazou, Ioannis Passas, Alexandros Garefalakis, Emilios Galariotis and Constantin Zopounidis
Big Data Cogn. Comput. 2023, 7(1), 13; https://doi.org/10.3390/bdcc7010013 - 12 Jan 2023
Cited by 19 | Viewed by 11190
Abstract
Organizations may examine both past and present data with the aid of information management, giving them access to all the knowledge they need to make sound strategic choices. For the majority of contemporary enterprises, using data to make relevant, valid, and timely choices has become a must for success. The volume and format of data have changed significantly over the past few years as a result of the development of new technologies and applications, but there are also impressive possibilities for their analysis and processing. This study offers a bibliometric analysis of 650 publications written by 1977 academics on the use of information management and big data analytics. The Bibliometrix package in R and the VOSviewer program were used to obtain the bibliographic data from the Scopus database and to analyze it. Based on citation analysis criteria, the top research journals, authors, and organizations were identified. The cooperation network at the author level reveals the connections between academics throughout the world, and Multiple Correspondence Analysis (MCA) identifies the research gaps in the area. The findings inform the recommendations for further study. Full article

17 pages, 1267 KiB  
Review
Impact of Artificial Intelligence on COVID-19 Pandemic: A Survey of Image Processing, Tracking of Disease, Prediction of Outcomes, and Computational Medicine
by Khaled H. Almotairi, Ahmad MohdAziz Hussein, Laith Abualigah, Sohaib K. M. Abujayyab, Emad Hamdi Mahmoud, Bassam Omar Ghanem and Amir H. Gandomi
Big Data Cogn. Comput. 2023, 7(1), 11; https://doi.org/10.3390/bdcc7010011 - 11 Jan 2023
Cited by 19 | Viewed by 8922
Abstract
Integrating machine learning technologies into artificial intelligence (AI) is at the forefront of the scientific and technological tools employed to combat the COVID-19 pandemic. This study assesses different uses and deployments of modern technology for combating the COVID-19 pandemic at various levels, such as image processing, tracking of disease, prediction of outcomes, and computational medicine. The results prove that computerized tomography (CT) scans help to diagnose patients infected by COVID-19. This includes two-sided, multilobar ground glass opacification (GGO) by a posterior distribution or peripheral, primarily in the lower lobes, and fewer recurrences in the intermediate lobe. An extensive search of modern technology databases relating to COVID-19 was undertaken. Subsequently, a review of the extracted information from the database search looked at how technology can be employed to tackle the pandemic. We discussed the technological advancements deployed to alleviate the communicability and effect of the pandemic. Even though there are many types of research on the use of technology in combating COVID-19, the application of technology in combating COVID-19 is still not yet fully explored. In addition, we suggested some open research issues and challenges in deploying AI technology to combat the global pandemic. Full article

20 pages, 698 KiB  
Review
Artificial Intelligence in Pharmaceutical and Healthcare Research
by Subrat Kumar Bhattamisra, Priyanka Banerjee, Pratibha Gupta, Jayashree Mayuren, Susmita Patra and Mayuren Candasamy
Big Data Cogn. Comput. 2023, 7(1), 10; https://doi.org/10.3390/bdcc7010010 - 11 Jan 2023
Cited by 50 | Viewed by 32804
Abstract
Artificial intelligence (AI) is a branch of computer science that allows machines to work efficiently and analyze complex data. The research focused on AI has increased tremendously, and its role in healthcare service and research is emerging at a greater pace. This review elaborates on the opportunities and challenges of AI in healthcare and pharmaceutical research. The literature was collected from domains such as PubMed, Science Direct and Google Scholar using specific keywords and phrases such as ‘Artificial intelligence’, ‘Pharmaceutical research’, ‘drug discovery’, ‘clinical trial’, ‘disease diagnosis’, etc. to select the research and review articles published within the last five years. The application of AI in disease diagnosis, digital therapy, personalized treatment, drug discovery and forecasting epidemics or pandemics was extensively reviewed in this article. Deep learning and neural networks are the most used AI technologies; Bayesian nonparametric models are the potential technologies for clinical trial design; natural language processing and wearable devices are used in patient identification and clinical trial monitoring. Deep learning and neural networks were applied in predicting the outbreak of seasonal influenza, Zika, Ebola, Tuberculosis and COVID-19. With the advancement of AI technologies, the scientific community may witness rapid and cost-effective healthcare and pharmaceutical research as well as provide improved service to the general public. Full article

20 pages, 1799 KiB  
Article
An Information System Supporting Insurance Use Cases by Automated Anomaly Detection
by Thoralf Reis, Alexander Kreibich, Sebastian Bruchhaus, Thomas Krause, Florian Freund, Marco X. Bornschlegl and Matthias L. Hemmje
Big Data Cogn. Comput. 2023, 7(1), 4; https://doi.org/10.3390/bdcc7010004 - 28 Dec 2022
Cited by 3 | Viewed by 3568
Abstract
The increasing availability of vast quantities of data from various sources significantly impacts the insurance industry, although this industry has always been data driven. It accelerates manual processes and enables new products or business models. On the other hand, it also burdens insurance analysts and other users who need to cope with this development parallel to other global changes. A novel information system (IS) for artificial intelligence (AI)-supported big data analysis, introduced within this paper, shall help to overcome user overload and to empower human data analysts in the insurance industry. The IS research’s focus lies neither in novel algorithms nor datasets but in concepts that combine AI and big data analysis for synergies, such as usability enhancements. For this purpose, this paper systematically designs and implements an IS that conforms to the AI2VIS4BigData reference model, automatically detects anomalies, and increases its users’ confidence and efficiency. Practical relevance is assured by an interview with an insurance analyst to verify the demand for the developed system and derive all requirements from two insurance industry user stories. A core contribution is the introduction of the IS. Another significant contribution is an extension of the AI2VIS4BigData service-based architecture and user interface (UI) concept on AI and machine learning (ML)-based user empowerment and data transformation. The implemented prototype was applied to synthetic data to enable the evaluation of the system. The quantitative and qualitative evaluations confirm the system’s usability and applicability to the insurance domain yet reveal the need for improvements toward bigger quantities of data and further evaluations with a more extensive user group.

20 pages, 1359 KiB  
Article
A Scientific Perspective on Using Artificial Intelligence in Sustainable Urban Development
by Emanuel Rieder, Matthias Schmuck and Alexandru Tugui
Big Data Cogn. Comput. 2023, 7(1), 3; https://doi.org/10.3390/bdcc7010003 - 20 Dec 2022
Cited by 12 | Viewed by 5447
Abstract
Digital transformation (or digitalization) is the process of continuous further development of digital technologies (such as smart devices, cloud services, and Big Data) that have a lasting impact on our economy and society. In this manner, digitalization is a huge driver for permanent change, even in the field of Sustainable Urban Development. In the wake of digitalization, expectations are changing, placing pressure at the societal level on the design and development of smart environments for everything that Sustainable Urban Development encompasses. In this sense, the solution is the integration of Artificial Intelligence into Sustainable Urban Development, because technology can simplify people’s lives. The aim of this paper is to ascertain which Sustainable Urban Development dimensions are taken into account when integrating Artificial Intelligence and what results can be achieved. These questions formed the basic framework for this research article. To provide a snapshot of the current state of Artificial Intelligence in Sustainable Urban Development, a systematic review of the literature published between 2012 and 2022 was conducted. The data were collected and analyzed using PRISMA. Based on the studies identified, we found a significant growth in studies, starting in 2018, and that Artificial Intelligence applications refer to the Sustainable Urban Development dimensions of environmental protection, economic development, social justice and equity, culture, and governance. The Artificial Intelligence techniques used in Sustainable Urban Development cover a broad field, such as Artificial Intelligence in general, Machine Learning, Deep Learning, Artificial Neural Networks, Operations Research, Predictive Analytics, and Data Mining. However, the integration of Artificial Intelligence in Sustainable Urban Development also raises challenges. These include responsible municipal policies, awareness of data quality, privacy and data security, the formation of partnerships among stakeholders (e.g., local citizens, civil society, industry, and various levels of government), and transparency and traceability in the implementation and rollout of Artificial Intelligence. A first step was taken towards providing an overview of the possible applications of Artificial Intelligence in Sustainable Urban Development. It was clearly shown that Artificial Intelligence is also gaining ground in this sector.

19 pages, 784 KiB  
Review
A Survey on Big Data in Pharmacology, Toxicology and Pharmaceutics
by Krithika Latha Bhaskaran, Richard Sakyi Osei, Evans Kotei, Eric Yaw Agbezuge, Carlos Ankora and Ernest D. Ganaa
Big Data Cogn. Comput. 2022, 6(4), 161; https://doi.org/10.3390/bdcc6040161 - 19 Dec 2022
Cited by 7 | Viewed by 3380
Abstract
Patients, hospitals, sensors, researchers, providers, phones, and healthcare organisations are producing enormous amounts of data in both the healthcare and drug detection sectors. The real challenge in these sectors is to find, investigate, manage, and collect information from patients in order to make their lives easier and healthier, not only in terms of formulating new therapies and understanding diseases, but also to predict the results at earlier stages and make effective decisions. The volumes of data available in the fields of pharmacology, toxicology, and pharmaceutics are constantly increasing. These increases are driven by advances in technology, which allow for the analysis of ever-larger data sets. Big Data (BD) has the potential to transform drug development and safety testing by providing new insights into the effects of drugs on human health. However, harnessing this potential involves several challenges, including the need for specialised skills and infrastructure. In this survey, we explore how BD approaches are currently being used in the pharmacology, toxicology, and pharmaceutics fields; in particular, we highlight how researchers have applied BD in pharmacology, toxicology, and pharmaceutics to address various challenges and establish solutions. A comparative analysis helps to trace the implementation of big data in the fields of pharmacology, toxicology, and pharmaceutics. Certain relevant limitations and directions for future research are emphasised. The pharmacology, toxicology, and pharmaceutics fields are still at an early stage of BD adoption, and there are many research challenges to be overcome, in order to effectively employ BD to address specific issues.

20 pages, 1982 KiB  
Article
Using an Evidence-Based Approach for Policy-Making Based on Big Data Analysis and Applying Detection Techniques on Twitter
by Somayeh Labafi, Sanee Ebrahimzadeh, Mohamad Mahdi Kavousi, Habib Abdolhossein Maregani and Samad Sepasgozar
Big Data Cogn. Comput. 2022, 6(4), 160; https://doi.org/10.3390/bdcc6040160 - 19 Dec 2022
Viewed by 3040
Abstract
Evidence-based policy seeks to use evidence in public policy in a systematic way in a bid to improve decision-making quality. Evidence-based policy cannot work properly and achieve the expected results without accurate, appropriate, and sufficient evidence. Given the prevalence of social media and intense user engagement, the question to ask is whether the data on social media can be used as evidence in the policy-making process. The question gives rise to the debate on what characteristics of data should be considered as evidence. Despite the numerous research studies carried out on social media analysis or policy-making, this domain has not been dealt with through an “evidence detection” lens. Thus, this study addresses the gap in the literature on how to analyze the big text data produced by social media and how to use it for policy-making based on evidence detection. The present paper seeks to fill the gap by developing and offering a model that can help policy-makers to distinguish “evidence” from “non-evidence”. To do so, in the first phase of the study, the researchers elicited the characteristics of the “evidence” by conducting a thematic analysis of semi-structured interviews with experts and policy-makers. In the second phase, the developed model was tested against six months of data collected from Twitter accounts. The experimental results show that the evidence detection model performed best with the decision tree (DT) algorithm, which outperformed the other algorithms with an accuracy score of 85.9%. This study shows how the model managed to fulfill the aim of the present study, which was detecting Twitter posts that can be used as evidence. This study contributes to the body of knowledge by exploring novel models of text processing and offering an efficient method for analyzing big text data. The practical implication of the study also lies in its efficiency and ease of use, which offers the required evidence for policy-makers.

13 pages, 2240 KiB  
Article
Proposal of Decentralized P2P Service Model for Transfer between Blockchain-Based Heterogeneous Cryptocurrencies and CBDCs
by Keundug Park and Heung-Youl Youm
Big Data Cogn. Comput. 2022, 6(4), 159; https://doi.org/10.3390/bdcc6040159 - 19 Dec 2022
Cited by 5 | Viewed by 3448
Abstract
This paper proposes a solution to the transfer problem between blockchain-based heterogeneous cryptocurrencies and CBDCs, with research derived from an analysis of the existing literature. Interoperability between heterogeneous blockchains has been an obstacle to service diversity and user convenience. Many types of cryptocurrencies are currently trading on the market, and many countries are researching and testing central bank digital currencies (CBDCs). In this paper, existing interoperability studies and solutions between heterogeneous blockchains and their differences from the proposed service model are described. To enhance digital financial services and improve user convenience, transfer between heterogeneous cryptocurrencies, transfer between heterogeneous CBDCs, and transfer between cryptocurrency and CBDC are required. This paper proposes an interoperable architecture between heterogeneous blockchains, and a decentralized peer-to-peer (P2P) service model based on the interoperable architecture for transferring between blockchain-based heterogeneous cryptocurrencies and CBDCs. Security threats to the proposed service model are identified and security requirements to prevent the identified security threats are specified. The mentioned security threats and security requirements should be considered when implementing the proposed service model.

23 pages, 735 KiB  
Review
Explore Big Data Analytics Applications and Opportunities: A Review
by Zaher Ali Al-Sai, Mohd Heikal Husin, Sharifah Mashita Syed-Mohamad, Rasha Moh’d Sadeq Abdin, Nour Damer, Laith Abualigah and Amir H. Gandomi
Big Data Cogn. Comput. 2022, 6(4), 157; https://doi.org/10.3390/bdcc6040157 - 14 Dec 2022
Cited by 21 | Viewed by 12125
Abstract
Big data applications and analytics are vital in proposing ultimate strategic decisions. The existing literature emphasizes that big data applications and analytics can empower those who apply Big Data Analytics during the COVID-19 pandemic. This paper reviews the existing literature specializing in big data applications pre- and peri-COVID-19. A comparison of the use of Big Data applications before and during the pandemic is presented. The comparison is expanded to four highly recognized industry fields: Healthcare, Education, Transportation, and Banking. A discussion on the effectiveness of the four major types of data analytics across the mentioned industries is highlighted. Hence, this paper provides an illustrative description of the importance of big data applications in the era of COVID-19, as well as aligning the applications to their relevant big data analytics models. This review paper concludes that applying the ultimate big data applications and their associated data analytics models can help overcome the significant limitations faced by organizations during one of the most fateful pandemics worldwide. Future work will conduct a systematic literature review and a comparative analysis of the existing Big Data Systems and models. Moreover, future work will investigate the critical challenges of Big Data Analytics and applications during the COVID-19 pandemic.