Big Data Cogn. Comput., Volume 8, Issue 9 (September 2024) – 29 articles

Cover Story: Sentiment analysis is an important task in natural language processing (NLP), enabling the extraction of opinions from user-generated content such as product reviews and social media posts. This paper presents a comparative performance study of modern sentiment classification methods, including artificial neural networks, transfer learning, and large language models, against traditional machine learning models on a large dataset of Greek product reviews from e-commerce websites. The results show that advanced models like GreekBERT and GPT-4 outperform traditional machine learning classifiers, confirming their superior effectiveness for Greek sentiment analysis. This work also provides valuable insights into the capabilities of advanced models for Greek sentiment classification.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
17 pages, 2813 KiB  
Article
Brain Tumor Detection Using Magnetic Resonance Imaging and Convolutional Neural Networks
by Rafael Martínez-Del-Río-Ortega, Javier Civit-Masot, Francisco Luna-Perejón and Manuel Domínguez-Morales
Big Data Cogn. Comput. 2024, 8(9), 123; https://doi.org/10.3390/bdcc8090123 - 21 Sep 2024
Abstract
Early and precise detection of brain tumors is critical for improving clinical outcomes and patient quality of life. This research focused on developing an image classifier using convolutional neural networks (CNNs) to detect brain tumors in magnetic resonance imaging (MRI). Brain tumors are a significant cause of morbidity and mortality worldwide, with approximately 300,000 new cases diagnosed annually. MRI offers excellent spatial resolution and soft tissue contrast, making it indispensable for identifying brain abnormalities. However, accurate interpretation of MRI scans remains challenging due to human subjectivity and variability in tumor appearance. This study employed CNNs, which have demonstrated exceptional performance in medical image analysis, to address these challenges. Various CNN architectures were implemented and evaluated to optimize brain tumor detection. The best model achieved an accuracy of 97.5%, sensitivity of 99.2%, and binary accuracy of 98.2%, surpassing previous studies. These results underscore the potential of deep learning techniques in clinical applications, significantly enhancing diagnostic accuracy and reliability.

33 pages, 918 KiB  
Article
The Relative Importance of Key Factors for Integrating Enterprise Resource Planning (ERP) Systems and Performance Management Practices in the UAE Healthcare Sector
by Karam Al-Assaf, Wadhah Alzahmi, Ryan Alshaikh, Zied Bahroun and Vian Ahmed
Big Data Cogn. Comput. 2024, 8(9), 122; https://doi.org/10.3390/bdcc8090122 - 13 Sep 2024
Abstract
This study examines integrating Enterprise Resource Planning (ERP) systems with performance management (PM) practices in the UAE healthcare sector, identifying key factors for successful adoption. It addresses a critical gap by analyzing the interplay between ERP systems and PM to enhance operational efficiency, patient care, and administrative processes. A literature review identified thirty-six critical factors, refined through expert interviews to highlight nine weak integration areas and two new factors. An online survey with 81 experts, who rated the 38 factors on a five-point Likert scale, provided data to calculate the Relative Importance Index (RII). The results reveal that employee involvement in performance metrics and effective organizational measures significantly impact system effectiveness and alignment. Mid-tier factors such as leadership and managerial support are essential for integration momentum, while foundational elements like infrastructure, scalability, security, and compliance are crucial for long-term success. The study recommends a holistic approach to these factors to maximize ERP benefits, offering insights for healthcare administrators and policymakers. Additionally, it highlights the need to address the challenges, opportunities, and ethical considerations associated with using digital health technology in healthcare. Future research should explore ERP integration challenges in public and private healthcare settings, tailoring systems to specific organizational needs.
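As an illustration of the Relative Importance Index mentioned in this abstract, the following is a minimal sketch assuming the commonly used definition RII = ΣW / (A × N), where W are the individual ratings, A is the highest possible rating (5 on a five-point Likert scale), and N is the number of respondents. The abstract does not state the exact formula the authors used, and the factor names and ratings below are hypothetical, not data from the study.

```python
def relative_importance_index(ratings, highest=5):
    """Return RII = sum(ratings) / (highest * number of respondents).

    Assumes the common RII definition; the value lies in (0, 1].
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > highest for r in ratings):
        raise ValueError("each rating must lie between 1 and `highest`")
    return sum(ratings) / (highest * len(ratings))


# Hypothetical Likert ratings from five respondents per factor (illustrative only).
factor_ratings = {
    "employee involvement": [5, 5, 4, 5, 4],
    "leadership support": [4, 3, 4, 4, 3],
    "infrastructure": [3, 3, 2, 4, 3],
}

# Rank factors from most to least important according to their RII.
ranked = sorted(
    ((name, relative_importance_index(r)) for name, r in factor_ratings.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, rii in ranked:
    print(f"{name}: RII = {rii:.3f}")
```

Ranking factors by this index is one plausible way to obtain tiers such as the "mid-tier" and "foundational" groups described above, although the study's own tier boundaries are not given here.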

28 pages, 2936 KiB  
Systematic Review
Medical IoT Record Security and Blockchain: Systematic Review of Milieu, Milestones, and Momentum
by Simeon Okechukwu Ajakwe, Igboanusi Ikechi Saviour, Vivian Ukamaka Ihekoronye, Odinachi U. Nwankwo, Mohamed Abubakar Dini, Izuazu Urslla Uchechi, Dong-Seong Kim and Jae Min Lee
Big Data Cogn. Comput. 2024, 8(9), 121; https://doi.org/10.3390/bdcc8090121 - 12 Sep 2024
Abstract
The sensitivity and exclusivity attached to personal health records make such records a prime target for cyber intruders, as unauthorized access causes unfathomable repudiation and public defamation. In reality, most medical records are micro-managed by different healthcare providers, exposing them to various security issues, especially unauthorized third-party access. Over time, substantial progress has been made in preventing unauthorized access to this critical and highly classified information. This review investigated the mainstream security challenges associated with the transmissibility of medical records, the evolutionary security strategies for maintaining confidentiality, and the existential enablers of trustworthy and transparent authorization and authentication before data transmission can be carried out. The review adopted the PRISMA-SPIDER methodology for a systematic review of 122 articles, comprising 9 surveys (7.37%) for qualitative analysis, 109 technical papers (89.34%), and 4 online reports (3.27%) for quantitative studies. The review outcome indicates that the sensitivity and confidentiality of a highly classified document, such as a medical record, demand unabridged authorization by the owner, unquestionable preservation by the host, untainted transparency in transmission, unbiased traceability, and ubiquitous security, which blockchain technology guarantees, although it is still in its infancy. Therefore, developing blockchain-assisted frameworks for digital medical record preservation and addressing inherent technological hitches in blockchain will further accelerate transparent and trustworthy preservation, user authorization, and authentication of medical records before they are transmitted by the host for third-party access.
(This article belongs to the Special Issue Research on Privacy and Data Security)

14 pages, 871 KiB  
Article
An Efficient Green AI Approach to Time Series Forecasting Based on Deep Learning
by Luis Balderas, Miguel Lastra and José M. Benítez
Big Data Cogn. Comput. 2024, 8(9), 120; https://doi.org/10.3390/bdcc8090120 - 11 Sep 2024
Abstract
Time series forecasting is undoubtedly a key area in machine learning due to the numerous fields where it is crucial to estimate future data points of sequences based on a set of previously observed values. Deep learning has been successfully applied to this area. On the other hand, growing concerns about the steady increase in the amount of resources required by deep learning-based tools have made Green AI gain traction as a move towards making machine learning more sustainable. In this paper, we present a deep learning-based time series forecasting methodology called GreeNNTSF, which aims to reduce the size of the resulting model, thereby diminishing the associated computational and energetic costs without giving up adequate forecasting performance. The methodology, based on the ODF2NNA algorithm, produces models that outperform state-of-the-art techniques not only in terms of prediction accuracy but also in terms of computational costs and memory footprint. To prove this claim, after presenting the main state-of-the-art methods that utilize deep learning for time series forecasting and introducing our methodology, we test GreeNNTSF on a selection of real-world forecasting problems that are commonly used as benchmarks, such as SARS-CoV-2 and PhysioNet (medicine), Brazilian Weather (climate), WTI and Electricity (economics), and Traffic (smart cities). The results of each experiment, conducted by rigorously following the experimentation presented in the original papers that addressed these problems, demonstrate that our method is more competitive than other state-of-the-art approaches, producing more accurate and efficient models.

18 pages, 13182 KiB  
Article
Hierarchical Progressive Image Forgery Detection and Localization Method Based on UNet
by Yang Liu, Xiaofei Li, Jun Zhang, Shuohao Li, Shengze Hu and Jun Lei
Big Data Cogn. Comput. 2024, 8(9), 119; https://doi.org/10.3390/bdcc8090119 - 10 Sep 2024
Abstract
The rapid development of generative technologies has made the production of forged products easier, and AI-generated forged images are increasingly difficult to accurately detect, posing serious privacy risks and cognitive obstacles to individuals and society. Therefore, constructing an effective method that can accurately detect and locate forged regions has become an important task. This paper proposes a hierarchical and progressive forged image detection and localization method called HPUNet. First, this method assigns more reasonable hierarchical multi-level labels to the dataset as supervisory information at different levels, following cognitive laws. Second, multiple types of features are extracted from AI-generated images for detection and localization, and the detection and localization results are combined to enhance the task-relevant features. Subsequently, HPUNet expands the obtained image features into four different resolutions and performs detection and localization at different levels in a coarse-to-fine cognitive order. To address the limited feature field of view caused by inconsistent forgery sizes, we employ three sets of densely cross-connected hierarchical networks for sufficient interaction between feature images at different resolutions. Finally, a UNet network with a soft-threshold-constrained feature enhancement module is used to achieve detection and localization at different scales, and the reliance on a progressive mechanism establishes relationships between different branches. We use ACC and F1 as evaluation metrics, and extensive experiments on our method and the baseline methods demonstrate the effectiveness of our approach.

18 pages, 1889 KiB  
Article
DBSCAN SMOTE LSTM: Effective Strategies for Distributed Denial of Service Detection in Imbalanced Network Environments
by Rissal Efendi, Teguh Wahyono and Indrastanti Ratna Widiasari
Big Data Cogn. Comput. 2024, 8(9), 118; https://doi.org/10.3390/bdcc8090118 - 10 Sep 2024
Abstract
In detecting Distributed Denial of Service (DDoS) attacks, deep learning faces challenges such as high computational demands, long training times, and complex model interpretation. This research focuses on overcoming these challenges by proposing an effective strategy for detecting DDoS attacks in imbalanced network environments. This research employed DBSCAN and SMOTE to rebalance the class distribution of the dataset, allowing LSTM-based models to learn temporal anomalies effectively when DDoS attacks occur. The experiments revealed a significant improvement in the performance of the LSTM model when integrated with DBSCAN and SMOTE. The validation loss was 0.048 for LSTM with DBSCAN and SMOTE and 0.1943 for LSTM without them, with accuracies of 99.50 and 97.50, respectively. In addition, the F1 score increased from 93.4% to 98.3%. This research showed that DBSCAN and SMOTE can be used as an effective strategy to improve model performance in detecting DDoS attacks on heterogeneous networks, as well as to increase model robustness and reliability.
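To illustrate the oversampling step mentioned in this abstract, here is a minimal sketch of the standard SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified, pure-Python illustration and not the authors' pipeline; the DBSCAN and LSTM stages are omitted, and the function name, parameters, and data points are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each new point: pick a minority sample, pick one of its k nearest
    minority neighbours, and interpolate between the two at a random position.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority-class samples (illustrative only).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
synthetic_points = smote_like_oversample(minority_points, n_new=6, k=2, seed=1)
print(len(synthetic_points), "synthetic minority samples generated")
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class, which is why this kind of oversampling is often preferred over simple duplication.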
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)

40 pages, 4095 KiB  
Article
An End-to-End Scene Text Recognition for Bilingual Text
by Bayan M. Albalawi, Amani T. Jamal, Lama A. Al Khuzayem and Olaa A. Alsaedi
Big Data Cogn. Comput. 2024, 8(9), 117; https://doi.org/10.3390/bdcc8090117 - 9 Sep 2024
Abstract
Text localization and recognition from natural scene images has gained a lot of attention recently due to its crucial role in various applications, such as autonomous driving and intelligent navigation. However, two significant gaps exist in this area: (1) prior research has primarily focused on recognizing English text, whereas Arabic text has been underrepresented, and (2) most prior research has adopted separate approaches for scene text localization and recognition, as opposed to one integrated framework. To address these gaps, we propose a novel bilingual end-to-end approach that localizes and recognizes both Arabic and English text within a single natural scene image. Specifically, our approach utilizes pre-trained CNN models (ResNet and EfficientNetV2) with kernel representation for text localization and RNN models (LSTM and BiLSTM) with an attention mechanism for text recognition. In addition, the AraElectra Arabic language model was incorporated to enhance Arabic text recognition. Experimental results on the EvArest, ICDAR2017, and ICDAR2019 datasets demonstrated that our model achieves superior performance not only in recognizing horizontally oriented text but also in recognizing multi-oriented and curved Arabic and English text in natural scene images.

23 pages, 3337 KiB  
Article
Attention-Driven Transfer Learning Model for Improved IoT Intrusion Detection
by Salma Abdelhamid, Islam Hegazy, Mostafa Aref and Mohamed Roushdy
Big Data Cogn. Comput. 2024, 8(9), 116; https://doi.org/10.3390/bdcc8090116 - 9 Sep 2024
Abstract
The proliferation of Internet of Things (IoT) devices has become inevitable in contemporary life, significantly affecting myriad applications. Nevertheless, the pervasive use of heterogeneous IoT gadgets introduces vulnerabilities to malicious cyber-attacks, resulting in data breaches that jeopardize the network’s integrity and resilience. This study proposes an Intrusion Detection System (IDS) for IoT environments that leverages Transfer Learning (TL) and the Convolutional Block Attention Module (CBAM). We extensively evaluate four prominent pre-trained models, each integrated with an independent CBAM at the uppermost layer. Our methodology is validated using the BoT-IoT dataset, which undergoes preprocessing to rectify the imbalanced data distribution, eliminate redundancy, and reduce dimensionality. Subsequently, the tabular dataset is transformed into RGB images to enhance the interpretation of complex patterns. Our evaluation results demonstrate that integrating TL models with the CBAM significantly improves classification accuracy and reduces false-positive rates. Additionally, to further enhance the system performance, we employ an Ensemble Learning (EL) technique to aggregate predictions from the two best-performing models. The final findings show that our TL-CBAM-EL model achieves superior performance, attaining an accuracy of 99.93% as well as high recall, precision, and F1-score. Hence, the proposed IDS is a robust and efficient solution for securing IoT networks.
(This article belongs to the Special Issue Advances in Intelligent Defense Systems for the Internet of Things)

15 pages, 3809 KiB  
Article
QA-RAG: Exploring LLM Reliance on External Knowledge
by Aigerim Mansurova, Aiganym Mansurova and Aliya Nugumanova
Big Data Cogn. Comput. 2024, 8(9), 115; https://doi.org/10.3390/bdcc8090115 - 9 Sep 2024
Abstract
Large language models (LLMs) can store factual knowledge within their parameters and have achieved superior results in question-answering tasks. However, challenges persist in providing provenance for their decisions and keeping their knowledge up to date. Some approaches aim to address these challenges by combining external knowledge with parametric memory. In contrast, our proposed QA-RAG solution relies solely on the data stored within an external knowledge base, specifically a dense vector index database. In this paper, we compare RAG configurations using two LLMs (Llama 2b and 13b), systematically examining their performance in three key RAG capabilities: noise robustness, knowledge gap detection, and external truth integration. The evaluation reveals that while our approach achieves an accuracy of 83.3%, showcasing its effectiveness across all baselines, the model still struggles significantly in terms of external truth integration. These findings suggest that considerable work is still required to fully leverage RAG in question-answering tasks.
(This article belongs to the Special Issue Generative AI and Large Language Models)

11 pages, 3275 KiB  
Article
Analysis of Highway Vehicle Lane Change Duration Based on Survival Model
by Sheng Zhao, Shengwen Huang, Huiying Wen and Weiming Liu
Big Data Cogn. Comput. 2024, 8(9), 114; https://doi.org/10.3390/bdcc8090114 - 6 Sep 2024
Abstract
To investigate highway vehicle lane-changing behavior, we utilized the publicly available naturalistic driving dataset, HighD, to extract the movement data of vehicles involved in lane changes and their proximate counterparts. We employed univariate and multivariate Cox proportional hazards models alongside random survival forest models to analyze the influence of various factors on lane change duration, assess their statistical significance, and compare the performance of multiple random survival forest models. Our findings indicate that several variables significantly impact lane change duration, including the standard deviation of lane-changing vehicles, lane-changing vehicle speed, distance to the following vehicle in the target lane, lane-changing vehicle length, and distance to the following vehicle in the current lane. Notably, the standard deviation and vehicle length act as protective factors, with increases in these variables correlating with longer lane change durations. Conversely, higher lane-changing vehicle speeds and shorter distances to following vehicles in both the current and target lanes are associated with shorter lane change durations, indicating their role as risk factors. According to our findings, feature variable selection did not substantially improve the training performance of the random survival forest model. However, validation set evaluation showed that careful feature variable selection can enhance model accuracy, leading to improved AUC values. These insights lay the groundwork for advancing research in predicting lane-changing behaviors, understanding lane-changing intentions, and developing pre-emptive safety measures against hazardous lane changes.
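For readers unfamiliar with the terminology, the "protective" and "risk" labels in this abstract follow from the standard textbook form of the Cox proportional hazards model, given below. The symbols are generic and are not taken from the paper: h_0(t) is the baseline hazard, x the covariate vector, β the coefficient vector, and HR_j the hazard ratio of covariate j.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

In this setting the "event" is the completion of the lane change, so the hazard is the instantaneous completion rate. A covariate with HR_j < 1 lowers that rate and is therefore associated with longer lane change durations (a protective factor), while HR_j > 1 raises the rate and is associated with shorter durations (a risk factor).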

25 pages, 632 KiB  
Article
Detection of Hate Speech, Racism and Misogyny in Digital Social Networks: Colombian Case Study
by Luis Gabriel Moreno-Sandoval, Alexandra Pomares-Quimbaya, Sergio Andres Barbosa-Sierra and Liliana Maria Pantoja-Rojas
Big Data Cogn. Comput. 2024, 8(9), 113; https://doi.org/10.3390/bdcc8090113 - 6 Sep 2024
Abstract
The growing popularity of social networking platforms worldwide has substantially increased the presence of offensive language on these platforms. To date, most of the systems developed to mitigate this challenge focus primarily on English content. However, this issue is a global concern, and therefore, other languages, such as Spanish, are involved. This article addresses the task of identifying hate speech, racism, and misogyny in Spanish within the Colombian context on social networks, and introduces a gold standard dataset specifically developed for this purpose. The experiment compares the performance of deep learning TLM models, such as BERT, RoBERTa, XLM, and BETO, adjusted to the Colombian slang domain, and then compares the best TLM model against a GPT model, which has a significant impact on achieving more accurate predictions in this task. Finally, this study provides a detailed understanding of the different components used in the system, including the architecture of the models and the selection of functions. The best results show that the BERT model achieves an accuracy of 83.6% for hate speech detection, while the GPT model achieves an accuracy of 90.8% for racism detection and 90.4% for misogyny detection.

16 pages, 840 KiB  
Article
Sentiment Informed Sentence BERT-Ensemble Algorithm for Depression Detection
by Bayode Ogunleye, Hemlata Sharma and Olamilekan Shobayo
Big Data Cogn. Comput. 2024, 8(9), 112; https://doi.org/10.3390/bdcc8090112 - 5 Sep 2024
Abstract
The World Health Organisation (WHO) reports that approximately 280 million people in the world suffer from depression. Yet, existing studies on early-stage depression detection using machine learning (ML) techniques are limited. Prior studies have applied a single stand-alone algorithm, which is unable to deal with data complexities, prone to overfitting, and limited in generalization. To this end, our paper examined the performance of several ML algorithms for early-stage depression detection using two benchmark social media datasets (D1 and D2). More specifically, we incorporated sentiment indicators to improve our model performance. Our experimental results showed that sentence bidirectional encoder representations from transformers (SBERT) numerical vectors fitted into the stacking ensemble model achieved comparable F1 scores of 69% in the dataset (D1) and 76% in the dataset (D2). Our findings suggest that utilizing sentiment indicators as an additional feature for depression detection yields an improved model performance, and thus, we recommend the development of a depressive term corpus for future work.

19 pages, 7056 KiB  
Article
A Data-Centric Approach to Understanding the 2020 U.S. Presidential Election
by Satish Mahadevan Srinivasan and Yok-Fong Paat
Big Data Cogn. Comput. 2024, 8(9), 111; https://doi.org/10.3390/bdcc8090111 - 4 Sep 2024
Abstract
The application of analytics on Twitter feeds is a very popular field for research. A tweet with a 280-character limitation can reveal a wealth of information on how individuals express their sentiments and emotions within their network or community. Upon collecting, cleaning, and mining tweets from different individuals on a particular topic, we can capture not only the sentiments and emotions of an individual but also the sentiments and emotions expressed by a larger group. Using the well-known lexicon-based NRC classifier, we classified nearly seven million tweets across seven battleground states in the U.S. to understand the emotions and sentiments expressed by U.S. citizens toward the 2020 presidential candidates. We used the emotions and sentiments expressed within these tweets as proxies for their votes and predicted the swing directions of each battleground state. When compared to the outcome of the 2020 presidential election, we were able to accurately predict the swing directions of four battleground states (Arizona, Michigan, Texas, and North Carolina), thus revealing the potential of this approach in predicting future election outcomes. The week-by-week analysis of the tweets using the NRC classifier corresponded well with the various political events that took place before the election, making it possible to understand the dynamics of the emotions and sentiments of the supporters in each camp. These research strategies and evidence-based insights may be translated into real-world settings and practical interventions to improve election outcomes.
(This article belongs to the Special Issue Machine Learning in Data Mining for Knowledge Discovery)

19 pages, 714 KiB  
Article
Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews
by Goran Mitrov, Boris Stanoev, Sonja Gievska, Georgina Mirceva and Eftim Zdravevski
Big Data Cogn. Comput. 2024, 8(9), 110; https://doi.org/10.3390/bdcc8090110 - 4 Sep 2024
Abstract
The rapid increase in scientific publications has made it challenging to keep up with the latest advancements. Conducting systematic reviews using traditional methods is both time-consuming and difficult. To address this, new review formats like rapid and scoping reviews have been introduced, reflecting an urgent need for efficient information retrieval. This challenge extends beyond academia to many organizations where numerous documents must be reviewed in relation to specific user queries. This paper focuses on improving document ranking to enhance the retrieval of relevant articles, thereby reducing the time and effort required by researchers. By applying a range of natural language processing (NLP) techniques, including rule-based matching, statistical text analysis, word embeddings, and transformer- and LLM-based approaches like Mistral LLM, we assess the articles' similarity to user-specific inputs and prioritize them according to relevance. We propose a novel methodology, Weighted Semantic Matching (WSM) + MiniLM, combining the strengths of the different methodologies. For validation, we employ global metrics such as precision at K, recall at K, average rank, median rank, and pairwise comparison metrics, including higher rank count, average rank difference, and median rank difference. Our proposed algorithm achieves optimal performance, with an average recall at 1000 of 95% and an average median rank of 185 for selected articles across the five datasets evaluated. These findings are promising for pinpointing the relevant articles and reducing the manual work.
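The global ranking metrics named in this abstract can be sketched as follows. This is a minimal illustration using the usual textbook definitions of precision at K, recall at K, and median rank; the paper's exact evaluation code is not shown here, and the document identifiers and function names are hypothetical.

```python
from statistics import median


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if k <= 0 or not relevant_ids:
        raise ValueError("k must be positive and relevant_ids non-empty")
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)


def median_rank(ranked_ids, relevant_ids):
    """Median 1-based position of the relevant documents in the ranking."""
    ranks = [
        pos for pos, doc in enumerate(ranked_ids, start=1) if doc in relevant_ids
    ]
    if not ranks:
        raise ValueError("no relevant documents found in the ranking")
    return median(ranks)


# Hypothetical ranking produced by some scoring method (best match first).
ranking = ["d3", "d1", "d7", "d2", "d5", "d4"]
relevant = {"d1", "d2", "d4"}

print("precision@3:", precision_at_k(ranking, relevant, 3))
print("recall@4:", recall_at_k(ranking, relevant, 4))
print("median rank:", median_rank(ranking, relevant))
```

In a systematic-review setting, a high recall at a large K (such as the recall at 1000 reported above) means a reviewer can stop reading after the first K ranked documents and still see most of the relevant articles.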
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

29 pages, 4437 KiB  
Article
Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems
by Atena Jalali Mojahed, Mohammad Hossein Moattar and Hamidreza Ghaffari
Big Data Cogn. Comput. 2024, 8(9), 109; https://doi.org/10.3390/bdcc8090109 - 4 Sep 2024
Abstract
Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the [...] Read more.
Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and the differences in class density strongly affect the efficiency of the classification algorithms. Therefore, the density of the classes is considered the main basis of learning the new distance metric. It is possible that the data of one class are composed of several densities, that is, the class is a combination of several normal distributions with different means and variances. In this paper, considering that classes may be multimodal, the distribution of each class is assumed in the form of a mixture of multivariate Gaussian densities. A density-based clustering algorithm is used for determining the number of components followed by the estimation of the parameters of the Gaussian components using maximum a posteriori density estimation. Then, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while decreasing the distance between the internal components. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The results of the experiments show that using the proposed method significantly improves the efficiency of the classifier in imbalance classification problems. Also, when the imbalance ratio is very high and it is not possible to correctly identify minority class samples, the proposed method still provides acceptable performance. Full article
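For readers unfamiliar with the distance being maximized, here is a minimal Python sketch of the closed-form Bhattacharyya distance between two single Gaussian components. It assumes diagonal covariance matrices to stay dependency-free; the function name is hypothetical, and the mixture-level distance and the iterative maximization described in the abstract are not reproduced here.

```python
import math

def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances.

    mu1, mu2: per-dimension means; var1, var2: per-dimension variances (> 0).
    With diagonal covariances the distance is a sum of one-dimensional terms.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)                             # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v              # separation of the means
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # mismatch of the spreads
    return dist

# Two unit-variance components whose means differ by 2 along the first axis.
print(bhattacharyya_diag([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0]))
```

The distance is zero for identical components and grows as the means move apart or the variances diverge, which is why increasing it between classes widens the margin.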

24 pages, 7001 KiB  
Article
Appendicitis Diagnosis: Ensemble Machine Learning and Explainable Artificial Intelligence-Based Comprehensive Approach
by Mohammed Gollapalli, Atta Rahman, Sheriff A. Kudos, Mohammed S. Foula, Abdullah Mahmoud Alkhalifa, Hassan Mohammed Albisher, Mohammed Taha Al-Hariri and Nazeeruddin Mohammad
Big Data Cogn. Comput. 2024, 8(9), 108; https://doi.org/10.3390/bdcc8090108 - 4 Sep 2024
Viewed by 153
Abstract
Appendicitis is a condition wherein the appendix becomes inflamed, and it can be difficult to diagnose accurately. The type of appendicitis can also be hard to determine, leading to misdiagnosis and difficulty in managing the condition. To avoid complications and reduce mortality, early diagnosis and treatment are crucial. While Alvarado’s clinical scoring system is not sufficient, ultrasound and computed tomography (CT) imaging are effective but have downsides such as operator-dependency and radiation exposure. This study proposes the use of machine learning (ML) methods and a locally collected reliable dataset to enhance the identification of acute appendicitis while detecting the differences between complicated and non-complicated appendicitis. Machine learning can help reduce diagnostic errors and improve treatment decisions. This study conducted four different experiments using various ML algorithms, including K-nearest neighbors (KNN), decision tree (DT), bagging, and stacking. The experimental results showed that the stacking model had the highest training accuracy, test set accuracy, precision, and F1 score, which were 97.51%, 92.63%, 95.29%, and 92.04%, respectively. Feature importance and explainable AI (XAI) identified neutrophils, WBC_Count, Total_LOS, P_O_LOS, and Symptoms_Days as the principal features that significantly affected the performance of the model. Based on the outcomes and feedback from medical health professionals, the scheme is promising in terms of its effectiveness in diagnosing acute appendicitis. Full article
(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)

15 pages, 450 KiB  
Article
A Comparative Study of Sentiment Classification Models for Greek Reviews
by Panagiotis D. Michailidis
Big Data Cogn. Comput. 2024, 8(9), 107; https://doi.org/10.3390/bdcc8090107 - 4 Sep 2024
Viewed by 161
Abstract
In recent years, people have expressed their opinions and sentiments about products, services, and other issues on social media platforms and review websites. These sentiments are typically classified as either positive or negative based on their text content. Research interest in sentiment analysis for text reviews written in Greek is limited compared to that in English. Existing studies conducted for the Greek language have focused more on posts collected from social media platforms rather than on consumer reviews from e-commerce websites and have primarily used traditional machine learning (ML) methods, with little to no work utilizing advanced methods like neural networks, transfer learning, and large language models. This study addresses this gap by testing the hypothesis that modern methods for sentiment classification, including artificial neural networks (ANNs), transfer learning (TL), and large language models (LLMs), perform better than traditional ML models in analyzing a Greek consumer review dataset. Several classification methods, namely, ML, ANNs, TL, and LLMs, were evaluated and compared using performance metrics on a large collection of Greek product reviews. The empirical findings showed that the GreekBERT and GPT-4 models perform significantly better than traditional ML classifiers, with BERT achieving an accuracy of 96% and GPT-4 reaching 95%, while ANNs showed similar performance to ML models. This study confirms the hypothesis, with the BERT model achieving the highest classification accuracy. Full article

18 pages, 2336 KiB  
Article
Performance and Board Diversity: A Practical AI Perspective
by Lee-Wen Yang, Thi Thanh Binh Nguyen and Wei-Ju Young
Big Data Cogn. Comput. 2024, 8(9), 106; https://doi.org/10.3390/bdcc8090106 - 4 Sep 2024
Viewed by 154
Abstract
The face of corporate governance is changing as new technologies in the scope of artificial intelligence and data analytics are used to make better future-oriented decisions on performance management. This study attempts to provide empirical results to analyze when the impact of diversity on the board of directors is most evident through the multi-breaks model and artificial neural networks. The input data for the simulation includes 853 electronic companies listed on the Taiwan Stock Exchange from 2000 to 2021. The empirical results show that the higher the percentage of female board members, the more influential the company’s performance is, which is only evident when the company is in good business condition. By integrating ANNs with multi-breakpoint regression, this study introduces a novel approach to management research, providing a detailed perspective on how board diversity impacts firm performance across different conditions. The ANN results show that using the number of business board members for predicting Return on Assets yields the highest accuracy, with female board members following closely in predictive effectiveness. The presence of women on the board contributes positively to ROA, particularly when the company is experiencing favorable business conditions and high profitability. Our analysis also reveals that a higher percentage of male board members improves company performance, but this benefit is observed only in highly favorable and unfavorable business conditions. Conversely, a higher percentage of business members tends to affect performance during periods of high profitability negatively. The power of the board of directors and significant shareholders is positively correlated with performance, whereas CEO power positively impacts performance only when it is not extremely low. Independent board members generally do not have a significant effect on profits. 
Additionally, the company’s asset value positively influences performance primarily when the return on assets is high, and increased financial leverage is associated with reduced profitability. Full article
(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)

18 pages, 723 KiB  
Article
Ethical AI in Financial Inclusion: The Role of Algorithmic Fairness on User Satisfaction and Recommendation
by Qin Yang and Young-Chan Lee
Big Data Cogn. Comput. 2024, 8(9), 105; https://doi.org/10.3390/bdcc8090105 - 3 Sep 2024
Viewed by 541
Abstract
This study investigates the impact of artificial intelligence (AI) on financial inclusion satisfaction and recommendation, with a focus on the ethical dimensions and perceived algorithmic fairness. Drawing upon organizational justice theory and the heuristic–systematic model, we examine how algorithm transparency, accountability, and legitimacy influence users’ perceptions of fairness and, subsequently, their satisfaction with and likelihood to recommend AI-driven financial inclusion services. Through a survey-based quantitative analysis of 675 users in China, our results reveal that perceived algorithmic fairness acts as a significant mediating factor between the ethical attributes of AI systems and the user responses. Specifically, higher levels of transparency, accountability, and legitimacy enhance users’ perceptions of fairness, which, in turn, significantly increases both their satisfaction with AI-facilitated financial inclusion services and their likelihood to recommend them. This research contributes to the literature on AI ethics by empirically demonstrating the critical role of transparent, accountable, and legitimate AI practices in fostering positive user outcomes. Moreover, it addresses a significant gap in the understanding of the ethical implications of AI in financial inclusion contexts, offering valuable insights for both researchers and practitioners in this rapidly evolving field. Full article

23 pages, 4464 KiB  
Article
A Hybrid Segmentation Algorithm for Rheumatoid Arthritis Diagnosis Using X-ray Images
by Govindan Rajesh, Nandagopal Malarvizhi and Man-Fai Leung
Big Data Cogn. Comput. 2024, 8(9), 104; https://doi.org/10.3390/bdcc8090104 - 2 Sep 2024
Viewed by 374
Abstract
Rheumatoid Arthritis (RA) is a chronic autoimmune illness that occurs in the joints, resulting in inflammation, pain, and stiffness. X-ray examination is one of the most common diagnostic procedures for RA, but manual X-ray image analysis is time-consuming and prone to errors. The proposed algorithm aims at stable and accurate segmentation of the carpal bones from hand bone images, which is vitally important for identifying RA. The algorithm proceeds in several stages: Carpal bone Region of Interest (CROI) specification, dynamic thresholding, and Gray Level Co-occurrence Matrix (GLCM) texture analysis. To obtain clear edges, the image is first converted to greyscale, and thresholding is carried out to separate the hand from the background. The pad region is identified and its contours are extracted, and the CROI is defined by the bounding box of the largest contour. Within the CROI, the threshold value is set dynamically so that the carpal bones can be separated from the surrounding tissue. GLCM texture analysis is then carried out, counting how often pairs of neighboring pixels with specific intensities occur in a given spatial relation. The resulting matrix is used to extract features such as contrast and energy, which are later used to categorize the images of the affected carpal bone as inflamed or normal. The proposed technique is tested on a rheumatoid arthritis image dataset, and the results show its contribution to diagnosis of the disease. The algorithm efficiently segments the carpal bones and extracts the signature parameters that are critical for correct classification of the inflammation in the cartilage images. Full article
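To make the GLCM step more concrete, the following Python sketch builds a normalized co-occurrence matrix for one pixel offset and derives contrast and energy from it. It is a simplified illustration under stated assumptions: the function names, the tiny two-level image, and the single horizontal offset are hypothetical, and energy is taken here as the sum of squared matrix entries (some libraries report the square root of this value). It is not the paper's implementation.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy).

    image: 2D list of integer grey levels in range(levels).
    Entry [i][j] is the share of pixel pairs with grey level i at the
    reference pixel and grey level j at the offset neighbor.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))  # assumes at least one valid pixel pair
    return [[n / total for n in row] for row in counts]

def contrast(p):
    """Weights each pair by its squared grey-level difference."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))

def energy(p):
    """Sum of squared entries; equals 1.0 for a perfectly uniform texture."""
    return sum(v * v for row in p for v in row)

# Toy 3x3 image with two grey levels (0 = dark, 1 = bright).
patch = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
p = glcm(patch, levels=2)
print(contrast(p), energy(p))
```

High contrast indicates many abrupt intensity changes between neighbors, while high energy indicates a uniform texture; a classifier can use such values as input features.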

21 pages, 3639 KiB  
Article
AHEAD: A Novel Technique Combining Anti-Adversarial Hierarchical Ensemble Learning with Multi-Layer Multi-Anomaly Detection for Blockchain Systems
by Muhammad Kamran, Muhammad Maaz Rehan, Wasif Nisar and Muhammad Waqas Rehan
Big Data Cogn. Comput. 2024, 8(9), 103; https://doi.org/10.3390/bdcc8090103 - 2 Sep 2024
Viewed by 332
Abstract
Blockchain technology has impacted various sectors and is transforming them through its decentralized, immutable, transparent, smart contracts (automatically executing digital agreements) and traceable attributes. Due to the adoption of blockchain technology in versatile applications, millions of transactions take place globally. These transactions are no exception to adversarial attacks which include data tampering, double spending, data corruption, Sybil attacks, eclipse attacks, DDoS attacks, P2P network partitioning, delay attacks, selfish mining, bribery, fake transactions, fake wallets or phishing, false advertising, malicious smart contracts, and initial coin offering scams. These adversarial attacks result in operational, financial, and reputational losses. Although numerous studies have proposed different blockchain anomaly detection mechanisms, challenges persist. These include detecting anomalies in just a single layer instead of multiple layers, targeting a single anomaly instead of multiple, not encountering adversarial machine learning attacks (for example, poisoning, evasion, and model extraction attacks), and inadequate handling of complex transactional data. The proposed AHEAD model solves the above problems by providing the following: (i) data aggregation transformation to detect transactional and user anomalies at the data and network layers of the blockchain, respectively, (ii) a Three-Layer Hierarchical Ensemble Learning Model (HELM) incorporating stratified random sampling to add resilience against adversarial attacks, and (iii) an advanced preprocessing technique with hybrid feature selection to handle complex transactional data. The performance analysis of the proposed AHEAD model shows that it achieves higher anti-adversarial resistance and detects multiple anomalies at the data and network layers. 
A comparison of the proposed AHEAD model with other state-of-the-art models shows that it achieves 98.85% accuracy against anomaly detection on data and network layers targeting transaction and user anomalies, along with 95.97% accuracy against adversarial machine learning attacks, which surpassed other models. Full article

27 pages, 7680 KiB  
Article
Federated Learning with Multi-Method Adaptive Aggregation for Enhanced Defect Detection in Power Systems
by Linghao Zhang, Bing Bian, Linyu Luo, Siyang Li and Hongjun Wang
Big Data Cogn. Comput. 2024, 8(9), 102; https://doi.org/10.3390/bdcc8090102 - 2 Sep 2024
Viewed by 262
Abstract
The detection and identification of defects in transmission lines using computer vision techniques is essential for maintaining the safety and reliability of power supply systems. However, existing training methods for transmission line defect detection models predominantly rely on single-node training, potentially limiting the enhancement of detection accuracy. To tackle this issue, this paper proposes a server-side adaptive parameter aggregation algorithm based on multi-method fusion (SAPAA-MMF) and formulates the corresponding objective function. Within the federated learning framework proposed in this paper, each client executes distributed synchronous training in alignment with the fundamental process of federated learning. The hierarchical difference between the global model, aggregated using the improved joint mean algorithm, and the global model from the previous iteration is computed and utilized as the pseudo-gradient for the adaptive aggregation algorithm. This enables the adaptive aggregation to produce a new global model with improved performance. To evaluate the potential of SAPAA-MMF, comprehensive experiments were conducted on five datasets, involving comparisons with several algorithms. The experimental results are analyzed independently for both the server and client sides. The findings indicate that SAPAA-MMF outperforms existing federated learning algorithms on both the server and client sides. Full article
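The pseudo-gradient idea in the abstract above can be sketched in Python as follows. This is not SAPAA-MMF itself: the weighted average stands in for the paper's improved joint mean algorithm, the server update is a generic Adam-style rule, and all function names and hyperparameter values are assumptions chosen for illustration.

```python
import math

def weighted_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)]

def adaptive_server_step(prev_global, averaged, m, v,
                         lr=0.1, beta1=0.9, beta2=0.99, eps=1e-3):
    """One Adam-style server update driven by a pseudo-gradient.

    The pseudo-gradient is the difference between the freshly averaged
    model and the previous global model; m and v are running moments.
    """
    new_global, new_m, new_v = [], [], []
    for w, a, m_i, v_i in zip(prev_global, averaged, m, v):
        delta = a - w                                   # pseudo-gradient
        m_i = beta1 * m_i + (1 - beta1) * delta         # first moment
        v_i = beta2 * v_i + (1 - beta2) * delta * delta # second moment
        new_global.append(w + lr * m_i / (math.sqrt(v_i) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_global, new_m, new_v

# Two hypothetical clients with 1 and 3 local samples.
avg = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
global_model, m, v = adaptive_server_step([0.0, 0.0], avg, [0.0, 0.0], [0.0, 0.0])
print(avg, global_model)
```

Treating the model difference as a gradient lets the server reuse adaptive optimizers, so the global model moves toward the client consensus at a rate that adapts per parameter.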

27 pages, 2771 KiB  
Article
Contextual Intelligence: An AI Approach to Manufacturing Skills’ Forecasting
by Xolani Maphisa, Mpho Nkadimeng and Arnesh Telukdarie
Big Data Cogn. Comput. 2024, 8(9), 101; https://doi.org/10.3390/bdcc8090101 - 2 Sep 2024
Viewed by 684
Abstract
The manufacturing industry is skill-intensive and plays a pivotal role in South Africa’s economy, reflecting the nation’s progress and development. The advent of technology has initiated a transformative era within the manufacturing sector. Workforce skills are at the heart of ensuring the sustained growth of the industry. This study delves into the skill-related aspects of the occupational landscape of the South African manufacturing sector, with a particular focus on two important manufacturing sectors: the food and beverage manufacturing (FoodBev) sector and the chemical manufacturing (CHIETA) sector. Leveraging the forecasting prowess of Autoregressive Integrated Moving Average (ARIMA), this paper outlines a sectorial occupational forecasting modeling exercise to reveal which job roles are poised for expansion and which are expected to decline. The approach predicted future skills’ demand with 80% accuracy for 473 out of 713 (66%) occupations for FoodBev and 474 out of 522 (91%) for CHIETA. These insights are invaluable for industry stakeholders and educational institutions, providing guidance to support the sector’s growth in an era marked by technological advancement. Full article

22 pages, 9693 KiB  
Article
A Trusted Supervision Paradigm for Autonomous Driving Based on Multimodal Data Authentication
by Tianyi Shi, Ruixiao Wu, Chuantian Zhou, Siyang Zheng, Zhu Meng, Zhe Cui, Jin Huang, Changrui Ren and Zhicheng Zhao
Big Data Cogn. Comput. 2024, 8(9), 100; https://doi.org/10.3390/bdcc8090100 - 2 Sep 2024
Viewed by 391
Abstract
At the current stage of autonomous driving, monitoring the behavior of safety stewards (drivers) is crucial to establishing liability in the event of an accident. However, there is currently no method for the quantitative assessment of safety steward behavior that is trusted by multiple stakeholders. In recent years, deep-learning-based methods can automatically detect abnormal behaviors with surveillance video, and blockchain as a decentralized and tamper-resistant distributed ledger technology is very suitable as a tool for providing evidence when determining liability. In this paper, a trusted supervision paradigm for autonomous driving (TSPAD) based on multimodal data authentication is proposed. Specifically, this paradigm consists of a deep learning model for driving abnormal behavior detection based on key frames adaptive selection and a blockchain system for multimodal data on-chaining and certificate storage. First, the deep-learning-based detection model enables the quantification of abnormal driving behavior and the selection of key frames. Second, the key frame selection and image compression coding balance the trade-off between the amount of information and efficiency in multiparty data sharing. Third, the blockchain-based data encryption sharing strategy ensures supervision and mutual trust among the regulatory authority, the logistic platform, and the enterprise in the driving process. Full article
(This article belongs to the Special Issue Big Data Analytics and Edge Computing: Recent Trends and Future)

36 pages, 4696 KiB  
Review
Review of Federated Learning and Machine Learning-Based Methods for Medical Image Analysis
by Netzahualcoyotl Hernandez-Cruz, Pramit Saha, Md Mostafa Kamal Sarker and J. Alison Noble
Big Data Cogn. Comput. 2024, 8(9), 99; https://doi.org/10.3390/bdcc8090099 - 28 Aug 2024
Viewed by 370
Abstract
Federated learning is an emerging technology that enables the decentralised training of machine learning-based methods for medical image analysis across multiple sites while ensuring privacy. This review paper thoroughly examines federated learning research applied to medical image analysis, outlining technical contributions. We followed the guidelines of Okoli and Schabram, a review methodology, to produce a comprehensive summary and discussion of the literature in information systems. Searches were conducted at leading indexing platforms: PubMed, IEEE Xplore, Scopus, ACM, and Web of Science. We found a total of 433 papers and selected 118 of them for further examination. The findings highlighted research on applying federated learning to neural network methods in cardiology, dermatology, gastroenterology, neurology, oncology, respiratory medicine, and urology. The main challenges reported were the ability of machine learning models to adapt effectively to real-world datasets and privacy preservation. We outlined two strategies to address these challenges: non-independent and identically distributed data and privacy-enhancing methods. This review paper offers a reference overview for those already working in the field and an introduction to those new to the topic. Full article

25 pages, 755 KiB  
Article
Ontology Merging Using the Weak Unification of Concepts
by Norman Kuusik and Jüri Vain
Big Data Cogn. Comput. 2024, 8(9), 98; https://doi.org/10.3390/bdcc8090098 - 27 Aug 2024
Viewed by 298
Abstract
Knowledge representation and manipulation in knowledge-based systems typically rely on ontologies. The aim of this work is to provide a novel weak unification-based method and an automatic tool for OWL ontology merging to ensure well-coordinated task completion in the context of collaborative agents. We employ a technique based on integrating string and semantic matching with the additional consideration of structural heterogeneity of concepts. The tool is implemented in Prolog and makes use of its inherent unification mechanism. Experiments were run on an OAEI data set with a matching accuracy of 60% across 42 tests. Additionally, we ran the tool on several ontologies from the domain of robotics, producing a small, but generally accurate, set of matched concepts. These results clearly show a good capability of the method and the tool to match semantically similar concepts. The results also highlight the challenges related to the evaluation of ontology-merging algorithms without a definite ground truth. Full article
(This article belongs to the Special Issue Recent Advances in Big Data-Driven Prescriptive Analytics)

15 pages, 1856 KiB  
Article
DaSAM: Disease and Spatial Attention Module-Based Explainable Model for Brain Tumor Detection
by Sara Tehsin, Inzamam Mashood Nasir, Robertas Damaševičius and Rytis Maskeliūnas
Big Data Cogn. Comput. 2024, 8(9), 97; https://doi.org/10.3390/bdcc8090097 - 25 Aug 2024
Viewed by 497
Abstract
Brain tumors are the result of irregular development of cells. They are a major cause of adult demise worldwide. Several deaths can be avoided with early brain tumor detection. Magnetic resonance imaging (MRI) for earlier brain tumor diagnosis may improve the chance of survival for patients. The most common method of diagnosing brain tumors is MRI. The improved visibility of malignancies in MRI makes therapy easier. The diagnosis and treatment of brain cancers depend on their identification and treatment. Numerous deep learning models have been proposed over the last decade, including AlexNet, VGG, Inception, ResNet, DenseNet, etc. All these models are trained on a huge dataset, ImageNet. These general models have many parameters, which become irrelevant when implementing these models for a specific problem. This study uses a custom deep-learning model for the classification of brain MRIs. The proposed Disease and Spatial Attention Model (DaSAM) has two modules: (a) the Disease Attention Module (DAM), to distinguish between disease and non-disease regions of an image, and (b) the Spatial Attention Module (SAM), to extract important features. The experiments of the proposed model are conducted on two multi-class datasets that are publicly available, the Figshare and Kaggle datasets, where it achieves precision values of 99% and 96%, respectively. The proposed model is also tested using cross-dataset validation, where it achieved 85% accuracy when trained on the Figshare dataset and validated on the Kaggle dataset. The incorporation of DAM and SAM modules enabled the functionality of feature mapping, which proved to be useful for the highlighting of important features during the decision-making process of the model. Full article

15 pages, 552 KiB  
Article
An Efficient Algorithm for Sorting and Duplicate Elimination by Using Logarithmic Prime Numbers
by Wei-Chang Yeh and Majid Forghani-elahabad
Big Data Cogn. Comput. 2024, 8(9), 96; https://doi.org/10.3390/bdcc8090096 - 23 Aug 2024
Viewed by 303
Abstract
Data structures such as sets, lists, and arrays are fundamental in mathematics and computer science, playing a crucial role in numerous real-life applications. These structures represent a variety of entities, including solutions, conditions, and objectives. In scenarios involving large datasets, eliminating duplicate elements is essential to reduce complexity and enhance performance. This paper introduces a novel algorithm that uses logarithmic prime numbers to efficiently sort data structures and remove duplicates. The algorithm is mathematically rigorous, ensuring correctness and providing a thorough analysis of its time complexity. To demonstrate its practicality and effectiveness, we compare our method with existing algorithms, highlighting its superior speed and accuracy. An extensive experimental analysis across one thousand random test problems shows that our approach significantly outperforms two alternative techniques from the literature. By discussing the potential applications of the proposed algorithm in various domains, including computer science, engineering, and data management, we illustrate its adaptability through two practical examples in which our algorithm solves the problem more than 3×10^4 and 7×10^4 times faster than the existing algorithms in the literature. The results of these examples demonstrate that the superiority of our algorithm becomes increasingly pronounced with larger problem sizes. Full article

31 pages, 2308 KiB  
Review
Data Privacy and Security in Autonomous Connected Vehicles in Smart City Environment
by Tanweer Alam
Big Data Cogn. Comput. 2024, 8(9), 95; https://doi.org/10.3390/bdcc8090095 - 23 Aug 2024
Cited by 1 | Viewed by 450
Abstract
A self-driving vehicle can navigate autonomously in smart cities without the need for human intervention. The emergence of Autonomous Connected Vehicles (ACVs) poses a substantial threat to public and passenger safety due to the possibility of cyber-attacks, which encompass remote hacking, manipulation of sensor data, and probable disablement or accidents. The sensors collect data to facilitate the network’s recognition of local landmarks, such as trees, curbs, pedestrians, signs, and traffic lights. ACVs gather vast amounts of data, encompassing the exact geographical coordinates of the vehicle, captured images, and signals received from various sensors. To create a fully autonomous system, it is imperative to intelligently integrate several technologies, such as sensors, communication, computation, machine learning (ML), data analytics, and other technologies. The primary issues in ACVs involve data privacy and security when instantaneously exchanging substantial volumes of data. This study investigates related data security and privacy research in ACVs using the Blockchain-enabled Federated Reinforcement Learning (BFRL) framework. This paper provides a literature review examining data security and privacy in ACVs and the BFRL framework that can be used to protect ACVs. This study presents the integration of FRL and Blockchain (BC) in the context of smart cities. Furthermore, the challenges and opportunities for future research on ACVs utilising BFRL frameworks are discussed. Full article