Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.2 days after submission; acceptance to publication takes 3.9 days (median values for papers published in this journal in the second half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2022)
Latest Articles
Cancer Detection Using a New Hybrid Method Based on Pattern Recognition in MicroRNAs Combining Particle Swarm Optimization Algorithm and Artificial Neural Network
Big Data Cogn. Comput. 2024, 8(3), 33; https://doi.org/10.3390/bdcc8030033 - 19 Mar 2024
Abstract
MicroRNAs (miRNAs) play a crucial role in cancer development, but not all miRNAs are equally significant in cancer detection. Traditional methods face challenges in effectively identifying cancer-associated miRNAs due to data complexity and volume. This study introduces a novel, feature-based technique for detecting attributes related to cancer-affecting microRNAs. It aims to enhance cancer diagnosis accuracy by identifying the most relevant miRNAs for various cancer types using a hybrid approach. In particular, we used a combination of particle swarm optimization (PSO) and artificial neural networks (ANNs) for this purpose. PSO was employed for feature selection, focusing on identifying the most informative miRNAs, while ANNs were used for recognizing patterns within the miRNA data. This hybrid method aims to overcome limitations in traditional miRNA analysis by reducing data redundancy and focusing on key genetic markers. The application of this method showed a significant improvement in the detection accuracy for various cancers, including breast and lung cancer and melanoma. Our approach demonstrated a higher precision in identifying relevant miRNAs compared to existing methods, as evidenced by the analysis of different datasets. The study concludes that the integration of PSO and ANNs provides a more efficient, cost-effective, and accurate method for cancer detection via miRNA analysis. This method can serve as a supplementary tool for cancer diagnosis and potentially aid in developing personalized cancer treatments.
Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
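The PSO-based feature-selection step described in this abstract can be illustrated with a toy binary particle swarm over hypothetical miRNA relevance scores. The relevance values, the fitness function, and the update probabilities below are illustrative assumptions, not details from the paper:

```python
import random

# Toy "relevance" of 8 hypothetical miRNA features (assumed values);
# the fitness rewards relevant features and penalises subset size,
# mimicking PSO-driven selection of informative miRNAs.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.85, 0.1]

def fitness(mask):
    gain = sum(r for r, m in zip(RELEVANCE, mask) if m)
    return gain - 0.15 * sum(mask)  # penalise large subsets

def binary_pso(n_particles=20, n_iter=50, seed=0):
    rng = random.Random(seed)
    dim = len(RELEVANCE)
    swarm = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    best = max(swarm, key=fitness)[:]
    for _ in range(n_iter):
        for p in swarm:
            for d in range(dim):
                if rng.random() < 0.7:
                    p[d] = best[d]       # attraction toward the global best
                elif rng.random() < 0.1:
                    p[d] = 1 - p[d]      # random exploration (bit flip)
            if fitness(p) > fitness(best):
                best = p[:]
    return best

best_mask = binary_pso()
print(best_mask, round(fitness(best_mask), 2))
```

In the paper's pipeline, the fitness would instead come from how well an ANN classifies cancer types using the selected miRNAs.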
Open Access Article
AI-Generated Text Detector for Arabic Language Using Encoder-Based Transformer Architecture
by
Hamed Alshammari, Ahmed El-Sayed and Khaled Elleithy
Big Data Cogn. Comput. 2024, 8(3), 32; https://doi.org/10.3390/bdcc8030032 - 18 Mar 2024
Abstract
The effectiveness of existing AI detectors is notably hampered when processing Arabic texts. This study introduces a novel AI text classifier designed specifically for Arabic, tackling the distinct challenges inherent in processing this language. A particular focus is placed on accurately recognizing human-written texts (HWTs), an area where existing AI detectors have demonstrated significant limitations. To achieve this goal, this paper utilized and fine-tuned two Transformer-based models, AraELECTRA and XLM-R, by training them on two distinct datasets: a large dataset comprising 43,958 examples and a custom dataset with 3078 examples that contain HWT and AI-generated texts (AIGTs) from various sources, including ChatGPT 3.5, ChatGPT-4, and BARD. The proposed architecture is adaptable to any language, but this work evaluates these models’ efficiency in recognizing HWTs versus AIGTs in Arabic as an example of Semitic languages. The performance of the proposed models has been compared against the two prominent existing AI detectors, GPTZero and OpenAI Text Classifier, particularly on the AIRABIC benchmark dataset. The results reveal that the proposed classifiers outperform both GPTZero and OpenAI Text Classifier with 81% accuracy compared to 63% and 50% for GPTZero and OpenAI Text Classifier, respectively. Furthermore, integrating a Dediacritization Layer prior to the classification model demonstrated a significant enhancement in the detection accuracy of both HWTs and AIGTs. This Dediacritization step markedly improved the classification accuracy, elevating it from 81% to as high as 99% and, in some instances, even achieving 100%.
Full article
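The Dediacritization Layer the authors insert before classification can be approximated, under the assumption that it strips Arabic diacritic marks (tashkeel), as a simple regex pre-processing step:

```python
import re

# Arabic diacritics (tashkeel) occupy the Unicode range U+064B..U+0652.
# Removing them is one plausible reading of the paper's "Dediacritization
# Layer" (an assumption about its implementation, not a confirmed detail).
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def dediacritize(text: str) -> str:
    return DIACRITICS.sub("", text)

print(dediacritize("كَتَبَ"))  # -> "كتب"
```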
Open Access Article
Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students
by
Dhiaa Musleh, Ali Alkhwaja, Ibrahim Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Mohammed Albugami, Faisal Alfawaz, Said El-Ashker and Mohammed Al-Hariri
Big Data Cogn. Comput. 2024, 8(3), 31; https://doi.org/10.3390/bdcc8030031 - 13 Mar 2024
Abstract
Obesity is increasingly becoming a prevalent health concern among adolescents, leading to significant risks like cardiometabolic diseases (CMDs). The early discovery and diagnosis of CMDs are essential for better outcomes. This study aims to build a reliable artificial intelligence model that can predict CMD using various machine learning techniques. Support vector machines (SVMs), k-nearest neighbor (KNN), logistic regression (LR), random forest (RF), and gradient boosting are the five robust classifiers compared in this study. A novel “risk level” feature, previously unused and derived through fuzzy logic applied to the Conicity Index, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed to address each gender individually. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia. The dataset comprises 121 male and 174 female students with diverse risk levels. Notably, logistic regression emerges as the top performer among males, achieving an accuracy score of 91%, while gradient boosting lags with a score of 72%. Among females, both the support vector machine and logistic regression lead with an accuracy score of 87%, while random forest performs least well with a score of 80%.
Full article
(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)
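The Conicity Index underlying the fuzzy “risk level” feature is conventionally computed from waist circumference, weight, and height. A minimal sketch using the standard formula (the input values below are illustrative, not from the study's dataset):

```python
import math

def conicity_index(waist_m, weight_kg, height_m):
    # Standard Conicity Index: waist circumference relative to the
    # circumference of a cylinder with the same weight and height.
    return waist_m / (0.109 * math.sqrt(weight_kg / height_m))

# Illustrative measurements (not from the study's 295 records):
ci = conicity_index(waist_m=0.80, weight_kg=70, height_m=1.75)
print(round(ci, 3))
```

In the paper, fuzzy membership functions over scores like this one would map each student to a graded risk level.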
Open Access Article
Proposal of a Service Model for Blockchain-Based Security Tokens
by
Keundug Park and Heung-Youl Youm
Big Data Cogn. Comput. 2024, 8(3), 30; https://doi.org/10.3390/bdcc8030030 - 12 Mar 2024
Abstract
The volume of the asset investment and trading market can be expanded through the issuance and management of blockchain-based security tokens that logically divide the value of assets and guarantee ownership. This paper proposes a service model to solve a problem with the existing investment service model, identifies security threats to the service model, and specifies security requirements countering the identified security threats for privacy protection and anti-money laundering (AML) involving security tokens. The identified security threats and specified security requirements should be taken into consideration when implementing the proposed service model. The proposed service model allows users to invest in tokenized tangible and intangible assets and trade in blockchain-based security tokens. This paper discusses considerations to prevent excessive regulation and market monopoly in the issuance of and trading in security tokens when implementing the proposed service model and concludes with future works.
Full article
(This article belongs to the Special Issue Blockchain Meets IoT for Big Data)
Open Access Article
The Distribution and Accessibility of Elements of Tourism in Historic and Cultural Cities
by
Wei-Ling Hsu, Yi-Jheng Chang, Lin Mou, Juan-Wen Huang and Hsin-Lung Liu
Big Data Cogn. Comput. 2024, 8(3), 29; https://doi.org/10.3390/bdcc8030029 - 11 Mar 2024
Abstract
Historic urban areas are the foundations of urban development. Due to rapid urbanization, the sustainable development of historic urban areas has become challenging for many cities. Elements of tourism and tourism service facilities play an important role in the sustainable development of historic areas. This study analyzed policies related to tourism in Panguifang and Meixian districts in Meizhou, Guangdong, China. Kernel density estimation was used to study the clustering characteristics of tourism elements through point of interest (POI) data, while space syntax was used to study the accessibility of roads. In addition, the Pearson correlation coefficient and regression were used to analyze the correlation between the elements and accessibility. The results show the following: (1) the overall number of tourism elements was high on the western side of the districts and low on the eastern one, and the elements were predominantly distributed along the main transportation arteries; (2) according to the integration degree and depth value, the western side was easier to access than the eastern one; and (3) the depth value of the area negatively correlated with kernel density, while the degree of integration positively correlated with it. Based on the results, the study put forward measures for optimizing the elements of tourism in Meizhou’s historic urban area to improve cultural tourism and emphasize the importance of the elements.
Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage 2nd Edition)
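The kernel density estimation and Pearson correlation steps of this study can be sketched in miniature; the POI coordinates and bandwidth below are illustrative assumptions:

```python
import math

def kernel_density(x, y, points, bandwidth=1.0):
    # 2-D Gaussian kernel density estimate at (x, y) from POI coordinates,
    # a minimal stand-in for the study's KDE over tourism elements.
    n = len(points)
    s = sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
            for px, py in points)
    return s / (n * 2 * math.pi * bandwidth ** 2)

def pearson(a, b):
    # Pearson correlation coefficient, as used to relate density
    # to the space-syntax accessibility measures.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

pois = [(0, 0), (0.5, 0.2), (3, 3)]
print(round(kernel_density(0, 0, pois), 4))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly correlated -> 1.0
```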
Open Access Article
Enhancing Supervised Model Performance in Credit Risk Classification Using Sampling Strategies and Feature Ranking
by
Niwan Wattanakitrungroj, Pimchanok Wijitkajee, Saichon Jaiyen, Sunisa Sathapornvajana and Sasiporn Tongman
Big Data Cogn. Comput. 2024, 8(3), 28; https://doi.org/10.3390/bdcc8030028 - 06 Mar 2024
Abstract
For the financial health of lenders and institutions, one important risk assessment, called credit risk, is about correctly deciding whether or not a borrower will fail to repay a loan. It not only helps in the approval or denial of loan applications but also aids in managing the non-performing loan (NPL) trend. In this study, a dataset provided by the LendingClub company, based in San Francisco, CA, USA, covering 2007 to 2020 and consisting of 2,925,492 records and 141 attributes, was experimented with. The loan status was categorized as “Good” or “Risk”. To yield highly effective results, experiments on credit risk prediction were performed using three widely adopted supervised machine learning techniques: logistic regression, random forest, and gradient boosting. In addition, to solve the imbalanced data problem, three sampling algorithms, including under-sampling, over-sampling, and combined sampling, were employed. The results show that the gradient boosting technique achieves nearly perfect values, better than 99.92%, on most of the evaluation metrics, and greater than 99.77% on the remaining one. All three imbalanced-data handling approaches enhanced the performance of models trained with the three algorithms. Moreover, the experiment of reducing the number of features based on mutual information calculation revealed slightly decreasing performance for 50 data features, with values greater than 99.86%. For 25 data features, the smallest size tested, the random forest supervised model still yielded 99.15%. Both sampling strategies and feature selection help to improve the supervised model for accurately predicting credit risk, which may be beneficial in the lending business.
Full article
(This article belongs to the Topic Big Data and Artificial Intelligence, 2nd Volume)
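Of the three sampling strategies mentioned, random under-sampling is the simplest to sketch. The records and label key below are hypothetical, standing in for the LendingClub rows:

```python
import random

def undersample(records, label_key, seed=0):
    # Random under-sampling: shrink every class to the minority-class
    # size. Over-sampling and combined sampling follow the same shape
    # but duplicate minority records instead of (or as well as) dropping.
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label_key], []).append(r)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for v in by_class.values():
        balanced.extend(rng.sample(v, n_min))
    rng.shuffle(balanced)
    return balanced

data = [{"status": "Good"}] * 90 + [{"status": "Risk"}] * 10
balanced = undersample(data, "status")
print(len(balanced))  # -> 20 (10 per class)
```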
Open Access Article
Temporal Dynamics of Citizen-Reported Urban Challenges: A Comprehensive Time Series Analysis
by
Andreas F. Gkontzis, Sotiris Kotsiantis, Georgios Feretzakis and Vassilios S. Verykios
Big Data Cogn. Comput. 2024, 8(3), 27; https://doi.org/10.3390/bdcc8030027 - 04 Mar 2024
Abstract
In an epoch characterized by the swift pace of digitalization and urbanization, the essence of community well-being hinges on the efficacy of urban management. As cities burgeon and transform, the need for astute strategies to navigate the complexities of urban life becomes increasingly paramount. This study employs time series analysis to scrutinize citizen interactions with the coordinate-based problem mapping platform in the Municipality of Patras in Greece. The research explores the temporal dynamics of reported urban issues, with a specific focus on identifying recurring patterns through the lens of seasonality. The analysis, employing the seasonal decomposition technique, dissects time series data to expose trends in reported issues and areas of the city that might be obscured in raw big data. It accentuates a distinct seasonal pattern, with concentrations peaking during the summer months. The study extends its approach to forecasting, providing insights into the anticipated evolution of urban issues over time. Projections for the coming years show a consistent upward trend in both overall city issues and those reported in specific areas, with distinct seasonal variations. This comprehensive exploration of time series analysis and seasonality provides valuable insights for city stakeholders, enabling informed decision-making and predictions regarding future urban challenges.
Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
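The seasonal decomposition idea can be illustrated with a minimal additive model over synthetic monthly issue counts; the numbers are invented to show a summer peak and are not the Patras platform data:

```python
# Minimal additive seasonal decomposition (period = 12 months): the
# seasonal component of each month is its mean deviation from the
# overall mean, in the spirit of the decomposition the study applies.
def seasonal_means(series, period=12):
    overall = sum(series) / len(series)
    seasonal = []
    for m in range(period):
        vals = series[m::period]
        seasonal.append(sum(vals) / len(vals) - overall)
    return seasonal

# Synthetic monthly issue counts over 3 years with a summer peak:
base = [10, 10, 12, 14, 16, 20, 28, 30, 22, 15, 12, 10]
series = base * 3
s = seasonal_means(series)
print(max(range(12), key=lambda m: s[m]))  # month index of the seasonal peak
```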
Open Access Article
Democratic Erosion of Data-Opolies: Decentralized Web3 Technological Paradigm Shift Amidst AI Disruption
by
Igor Calzada
Big Data Cogn. Comput. 2024, 8(3), 26; https://doi.org/10.3390/bdcc8030026 - 26 Feb 2024
Abstract
This article investigates the intricate dynamics of data monopolies, referred to as “data-opolies”, and their implications for democratic erosion. Data-opolies, typically embodied by large technology corporations, accumulate extensive datasets, affording them significant influence. The sustainability of such data practices is critically examined within the context of decentralized Web3 technologies amidst Artificial Intelligence (AI) disruption. Additionally, the article explores emancipatory datafication strategies to counterbalance the dominance of data-opolies. It presents an in-depth analysis of two emergent phenomena within the decentralized Web3 emerging landscape: People-Centered Smart Cities and Datafied Network States. The article investigates a paradigm shift in data governance and advocates for joint efforts to establish equitable data ecosystems, with an emphasis on prioritizing data sovereignty and achieving digital self-governance. It elucidates the remarkable roles of (i) blockchain, (ii) decentralized autonomous organizations (DAOs), and (iii) data cooperatives in empowering citizens to have control over their personal data. In conclusion, the article introduces a forward-looking examination of Web3 decentralized technologies, outlining a timely path toward a more transparent, inclusive, and emancipatory data-driven democracy. This approach challenges the prevailing dominance of data-opolies and offers a framework for regenerating datafied democracies through decentralized and emerging Web3 technologies.
Full article
Open Access Article
Sign-to-Text Translation from Panamanian Sign Language to Spanish in Continuous Capture Mode with Deep Neural Networks
by
Alvaro A. Teran-Quezada, Victor Lopez-Cabrera, Jose Carlos Rangel and Javier E. Sanchez-Galan
Big Data Cogn. Comput. 2024, 8(3), 25; https://doi.org/10.3390/bdcc8030025 - 26 Feb 2024
Abstract
Convolutional neural networks (CNNs) have provided great advances for the task of sign language recognition (SLR). However, recurrent neural networks (RNNs) in the form of long short-term memory (LSTM) have become a means of solving problems involving sequential data. This research proposes the development of a sign language translation system that converts Panamanian Sign Language (PSL) signs into Spanish text using an LSTM model that, among other things, makes it possible to work with non-static signs (as sequential data). The deep learning model presented focuses on action detection, in this case, the execution of the signs. This involves precisely processing the frames in which a sign language gesture is made. The proposal is a holistic solution that considers, in addition to tracking the signer's hands, face and pose determinants. These were added because, when communicating through sign languages, other visual characteristics matter beyond hand gestures. For the training of this system, a dataset of 330 videos (of 30 frames each) for five possible classes (different signs considered) was created. The model was tested, achieving an accuracy of 98.8%, making this a valuable base system for effective communication between PSL users and Spanish speakers. In conclusion, this work improves the state of the art for PSL–Spanish translation by exploiting the possibilities of translatable signs via deep learning.
Full article
(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)
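The LSTM processing of 30-frame sign clips can be sketched as a single pure-NumPy LSTM cell pass. The per-frame feature size, hidden size, and random weights are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

# One LSTM cell unrolled over a 30-frame clip of pose/hand keypoints,
# mirroring the shape of the paper's setup (30 frames per sign video).
rng = np.random.default_rng(0)
T, F, H = 30, 8, 16            # frames, features per frame, hidden units

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.normal(0, 0.1, (4 * H, F + H))   # gates: input, forget, cell, output
b = np.zeros(4 * H)
x = rng.normal(0, 1, (T, F))             # one synthetic "sign" clip

h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    z = W @ np.concatenate([x[t], h]) + b
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update cell state
    h = sigmoid(o) * np.tanh(c)                     # emit hidden state

print(h.shape)  # final hidden state, fed to a dense softmax over sign classes
```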
Open Access Article
Experimental Evaluation: Can Humans Recognise Social Media Bots?
by
Maxim Kolomeets, Olga Tushkanova, Vasily Desnitsky, Lidia Vitkova and Andrey Chechulin
Big Data Cogn. Comput. 2024, 8(3), 24; https://doi.org/10.3390/bdcc8030024 - 26 Feb 2024
Abstract
This paper aims to test the hypothesis that the quality of social media bot detection systems based on supervised machine learning may not be as accurate as researchers claim, given that bots have become increasingly sophisticated, making it difficult for human annotators to detect them better than random selection. As a result, obtaining a ground-truth dataset with human annotation is not possible, which leads to supervised machine-learning models inheriting annotation errors. To test this hypothesis, we conducted an experiment where humans were tasked with recognizing malicious bots on the VKontakte social network. We then compared the “human” answers with the “ground-truth” bot labels (‘a bot’/‘not a bot’). Based on the experiment, we evaluated the bot detection efficiency of annotators in three scenarios typical for cybersecurity but differing in their detection difficulty as follows: (1) detection among random accounts, (2) detection among accounts of a social network ‘community’, and (3) detection among verified accounts. The study showed that humans could only detect simple bots in all three scenarios but could not detect more sophisticated ones (p-value = 0.05). The study also evaluates the limits of hypothetical and existing bot detection systems that leverage non-expert-labelled datasets as follows: the balanced accuracy of such systems can drop to 0.5 and lower, depending on bot complexity and detection scenario. The paper also describes the experiment design, collected datasets, statistical evaluation, and machine learning accuracy measures applied to support the results. In the discussion, we raise the question of using human labelling in bot detection systems and its potential cybersecurity issues. We also provide open access to the datasets used, experiment results, and software code for evaluating statistical and machine learning accuracy metrics used in this paper on GitHub.
Full article
(This article belongs to the Special Issue Security, Privacy, and Trust in Artificial Intelligence Applications)
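The balanced accuracy figure the study reports (dropping to 0.5 and lower) is the mean of per-class recalls; a minimal implementation with toy bot/human labels:

```python
def balanced_accuracy(y_true, y_pred):
    # Balanced accuracy = mean of per-class recalls; with annotators
    # guessing at random in a two-class bot/not-bot task it sits near 0.5.
    classes = set(y_true)
    recalls = []
    for cls in classes:
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = ["bot", "bot", "human", "human"]
print(balanced_accuracy(y_true, ["bot", "human", "human", "bot"]))  # -> 0.5
```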
Open Access Article
Solar and Wind Data Recognition: Fourier Regression for Robust Recovery
by
Abdullah F. Al-Aboosi, Aldo Jonathan Muñoz Vazquez, Fadhil Y. Al-Aboosi, Mahmoud El-Halwagi and Wei Zhan
Big Data Cogn. Comput. 2024, 8(3), 23; https://doi.org/10.3390/bdcc8030023 - 24 Feb 2024
Abstract
Accurate prediction of renewable energy output is essential for integrating sustainable energy sources into the grid, facilitating a transition towards a more resilient energy infrastructure. Novel applications of machine learning and artificial intelligence are being leveraged to enhance forecasting methodologies, enabling more accurate predictions and optimized decision-making capabilities. Integrating these novel paradigms improves forecasting accuracy, fostering a more efficient and reliable energy grid. These advancements allow better demand management, optimize resource allocation, and improve robustness to potential disruptions. Solar intensity and wind speed data are often recorded by sensor-equipped instruments, which may encounter intermittent or permanent faults. Hence, this paper proposes a novel Fourier network regression model to process solar irradiance and wind speed data. The proposed approach enables accurate prediction of the underlying smooth components, facilitating effective reconstruction of missing data and enhancing the overall forecasting performance. The present study focuses on Midland, Texas, as a case study to assess direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and wind speed. Remarkably, the model exhibits a correlation of 1 with a minimal RMSE (root mean square error) of 0.0007555. This study leverages Fourier analysis for renewable energy applications, with the aim of establishing a methodology that can be applied to a novel geographic context.
Full article
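Fourier-based recovery of faulty sensor data can be sketched as least-squares fitting of a truncated Fourier basis; the harmonic count and the synthetic daily irradiance-like signal are assumptions, not the paper's exact model:

```python
import numpy as np

# Fit a truncated Fourier series by least squares, then use it to
# reconstruct values at time stamps where the sensor "failed".
def fourier_design(t, period, n_harmonics):
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

t = np.arange(0.0, 48.0, 0.5)                    # two days, half-hour steps
signal = 500 + 300 * np.sin(2 * np.pi * t / 24)  # synthetic daily curve
keep = np.ones_like(t, dtype=bool)
keep[40:60] = False                              # simulate a sensor outage

X = fourier_design(t[keep], period=24, n_harmonics=3)
coef, *_ = np.linalg.lstsq(X, signal[keep], rcond=None)
recovered = fourier_design(t, 24, 3) @ coef

err = float(np.max(np.abs(recovered - signal)))
print(err)  # near-zero: the gap is reconstructed almost exactly
```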
Open Access Article
Comparison of Bagging and Sparsity Methods for Connectivity Reduction in Spiking Neural Networks with Memristive Plasticity
by
Roman Rybka, Yury Davydov, Danila Vlasov, Alexey Serenko, Alexander Sboev and Vyacheslav Ilyin
Big Data Cogn. Comput. 2024, 8(3), 22; https://doi.org/10.3390/bdcc8030022 - 23 Feb 2024
Abstract
Developing a spiking neural network architecture that could prospectively be trained on energy-efficient neuromorphic hardware to solve various data analysis tasks requires satisfying the limitations of prospective analog or digital hardware, i.e., local learning and limited numbers of connections, respectively. In this work, we compare two methods of connectivity reduction that are applicable to spiking networks with local plasticity: instead of a large fully connected network (used as the baseline for comparison), we employ either an ensemble of independent small networks or a network with probabilistic sparse connectivity. We evaluate both methods with a three-layer spiking neural network applied to handwritten and spoken digit classification tasks, using two memristive plasticity models and the classical spike-timing-dependent plasticity (STDP) rule. Both methods achieve an F1-score of 0.93–0.95 on the handwritten digit recognition task and 0.85–0.93 on the spoken digit recognition task. Applying a combination of both methods made it possible to obtain highly accurate models while reducing the number of connections by more than three times compared to the basic model.
Full article
(This article belongs to the Special Issue Computational Intelligence: Spiking Neural Networks)
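The probabilistic sparse connectivity method can be sketched as a Bernoulli mask over synapses; the layer sizes and connection probability below are illustrative, chosen so the reduction lands near the reported three-fold figure:

```python
import random

def sparse_connections(n_pre, n_post, p, seed=0):
    # Probabilistic sparse connectivity: each pre->post synapse exists
    # with probability p, cutting connections roughly (1/p)-fold versus
    # a fully connected layer.
    rng = random.Random(seed)
    return [(i, j) for i in range(n_pre) for j in range(n_post)
            if rng.random() < p]

full = 784 * 100                       # fully connected baseline
conns = sparse_connections(784, 100, p=0.3)
print(full / len(conns))               # roughly a 3.3x reduction
```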
Open Access Article
Anomaly Detection of IoT Cyberattacks in Smart Cities Using Federated Learning and Split Learning
by
Ishaani Priyadarshini
Big Data Cogn. Comput. 2024, 8(3), 21; https://doi.org/10.3390/bdcc8030021 - 22 Feb 2024
Abstract
The swift proliferation of the Internet of Things (IoT) devices in smart city infrastructures has created an urgent demand for robust cybersecurity measures. These devices are susceptible to various cyberattacks that can jeopardize the security and functionality of urban systems. This research presents an innovative approach to identifying anomalies caused by IoT cyberattacks in smart cities. The proposed method harnesses federated and split learning and addresses the dual challenge of enhancing IoT network security while preserving data privacy. This study conducts extensive experiments using authentic datasets from smart cities. To compare the performance of classical machine learning algorithms and deep learning models for detecting anomalies, model effectiveness is assessed using precision, recall, F-1 score, accuracy, and training/deployment time. The findings demonstrate that federated learning and split learning have the potential to balance data privacy concerns with competitive performance, providing robust solutions for detecting IoT cyberattacks. This study contributes to the ongoing discussion about securing IoT deployments in urban settings. It lays the groundwork for scalable and privacy-conscious cybersecurity strategies. The results underscore the vital role of these techniques in fortifying smart cities and promoting the development of adaptable and resilient cybersecurity measures in the IoT era.
Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
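The federated learning side of the approach rests on aggregating client models without sharing raw data; a minimal federated-averaging (FedAvg) sketch over flat weight vectors, with toy weights and client sizes:

```python
def fed_avg(client_weights, client_sizes):
    # Federated averaging: combine per-client model weights, weighted by
    # local dataset size, so raw (privacy-sensitive) IoT data never
    # leaves the clients. Weights are flat lists here for simplicity.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]

w_global = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(w_global)  # -> [2.5, 3.5]
```

Split learning differs in that the network itself, rather than the data, is partitioned between client and server.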
Open Access Article
A Machine Learning-Based Pipeline for the Extraction of Insights from Customer Reviews
by
Róbert Lakatos, Gergő Bogacsovics, Balázs Harangi, István Lakatos, Attila Tiba, János Tóth, Marianna Szabó and András Hajdu
Big Data Cogn. Comput. 2024, 8(3), 20; https://doi.org/10.3390/bdcc8030020 - 22 Feb 2024
Abstract
The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks are still challenging, especially when considering specific domains. This paper presents a model that can extract insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing, vector-embedding-based keyword extraction, and clustering. The elements of our model have been integrated and tailored to better meet the requirements of efficient information extraction and topic modeling of the extracted information for opinion mining. Our approach was validated and compared with other state-of-the-art methods using publicly available benchmark datasets. The results show that our system performs better than existing topic modeling and keyword extraction methods in this task.
Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
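The keyword-extraction stage of such a pipeline can be reduced to a runnable toy that scores terms within a review cluster; in the paper's pipeline, transformer embeddings and clustering replace the raw counts, and the stop list and reviews below are assumptions:

```python
from collections import Counter

# Toy keyword extraction for one cluster of reviews: count terms,
# filter a stop list, report the most frequent as cluster keywords.
STOP = {"the", "a", "is", "was", "and", "it", "very"}

def keywords(reviews, k=3):
    counts = Counter(w for r in reviews
                     for w in r.lower().split() if w not in STOP)
    return [w for w, _ in counts.most_common(k)]

cluster = ["the battery life is great", "great battery and screen",
           "battery drains fast", "screen is very sharp"]
print(keywords(cluster))  # "battery" dominates this cluster
```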
Open Access Article
A Novel Algorithm for Multi-Criteria Ontology Merging through Iterative Update of RDF Graph
by
Mohammed Suleiman Mohammed Rudwan and Jean Vincent Fonou-Dombeu
Big Data Cogn. Comput. 2024, 8(3), 19; https://doi.org/10.3390/bdcc8030019 - 21 Feb 2024
Abstract
Ontology merging remains an important task in ontology engineering. However, despite the effort devoted to it, incorporating relevant ontology features such as axioms, individuals, and annotations into the output ontologies remains challenging. Consequently, existing ontology-merging solutions produce new ontologies that do not include all the relevant semantic features from the candidate ontologies. To address these limitations, this paper proposes a novel algorithm for multi-criteria ontology merging that automatically builds a new ontology from candidate ontologies by iteratively updating an RDF graph in memory. The proposed algorithm leverages state-of-the-art Natural Language Processing tools as well as a Machine Learning-based framework to assess similarities and merge various criteria into the resulting output ontology. Its key contribution lies in its ability to merge relevant features from the candidate ontologies to build a more accurate, integrated, and cohesive output ontology. The algorithm is tested with five ontologies from different computing domains and evaluated in terms of its asymptotic behavior, quality, and computational performance. The experimental results indicate that it produces output ontologies that meet the integrity, accuracy, and cohesion quality criteria better than related studies, demonstrating its effectiveness and superior capabilities. Furthermore, the algorithm enables iterative in-memory updating and building of the RDF graph of the resulting output ontology, which enhances processing speed and computational efficiency, making it well suited to big data applications.
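The iterative in-memory update idea can be sketched as follows. This is a minimal illustration under our own assumptions (triples as Python tuples, similarity reduced to lexical normalization); the paper's actual similarity assessment is NLP- and ML-based.

```python
# Hedged sketch: iteratively merge RDF-style triple sets into one in-memory
# graph, unifying terms that a (here: trivial) similarity test declares equal.

def normalize(term: str) -> str:
    """Crude lexical normalization standing in for the NLP similarity step."""
    return term.lower().replace("_", " ").strip()

def merge_ontologies(candidates):
    merged = set()            # the output RDF graph, updated iteratively
    seen = {}                 # normalized form -> first (canonical) spelling
    for graph in candidates:  # each graph is an iterable of (s, p, o) triples
        for s, p, o in graph:
            canon = lambda t: seen.setdefault(normalize(t), t)
            merged.add((canon(s), canon(p), canon(o)))
    return merged

g1 = {("Laptop", "subClassOf", "Computer")}
g2 = {("laptop", "subClassOf", "Device")}
out = merge_ontologies([g1, g2])  # "laptop" unifies with "Laptop"
```

In a real implementation the triples would live in an RDF store (e.g. an `rdflib.Graph`) and the `normalize` step would be replaced by the learned similarity model.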
Full article
Open Access Feature Paper Article
Inverse Firefly-Based Search Algorithms for Multi-Target Search Problem
by
Ouarda Zedadra, Antonio Guerrieri, Hamid Seridi, Aymen Benzaid and Giancarlo Fortino
Big Data Cogn. Comput. 2024, 8(2), 18; https://doi.org/10.3390/bdcc8020018 - 19 Feb 2024
Abstract
Efficiently searching for multiple targets in complex environments with limited perception and computational capabilities is challenging for multiple robots, which can coordinate their actions indirectly through their environment. In this context, swarm intelligence has been a source of inspiration for addressing multi-target search problems in the literature. Several algorithms have been proposed to solve this problem, and in this study we propose two novel multi-target search algorithms inspired by the Firefly algorithm. Unlike the conventional Firefly algorithm, where light is an attractor, light has a repulsive effect in our proposed algorithms: upon discovering targets, robots emit light to repel other robots from that region. This repulsive behavior serves several objectives: (1) partitioning the search space among different robots, (2) expanding the search region by avoiding areas already explored, and (3) preventing congestion among robots. The proposed algorithms, named the Global Lawnmower Firefly Algorithm (GLFA) and the Random Bounce Firefly Algorithm (RBFA), integrate inverse light-based behavior with two random walks: random bounce and global lawnmower. Both algorithms were implemented and evaluated using the ARGoS simulator, demonstrating promising performance compared to existing approaches.
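The inverse-attraction idea can be sketched in a few lines. This is our own illustrative model, not the authors' controller: step size, sensing radius, and the fallback random move are all assumptions.

```python
import math
import random

# Illustrative sketch of the inverse Firefly scheme: light emitted at
# discovered targets repels robots instead of attracting them.

def repulsion_step(robot, lights, step=1.0, radius=5.0):
    """Move `robot` one step away from the net pull of nearby lights;
    fall back to a random (bounce/lawnmower-like) move when none is sensed."""
    x, y = robot
    dx = dy = 0.0
    for lx, ly in lights:
        d = math.hypot(x - lx, y - ly)
        if 0 < d < radius:          # only lights within sensing range matter
            dx += (x - lx) / d      # unit vector pointing away from the light
            dy += (y - ly) / d
    norm = math.hypot(dx, dy)
    if norm == 0:                   # no repulsion felt: random exploration
        angle = random.uniform(0, 2 * math.pi)
        return (x + step * math.cos(angle), y + step * math.sin(angle))
    return (x + step * dx / norm, y + step * dy / norm)

# A robot at (1, 0) with a light at the origin is pushed further right.
new_pos = repulsion_step((1.0, 0.0), [(0.0, 0.0)])
```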
Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
Open Access Article
A Model for Enhancing Unstructured Big Data Warehouse Execution Time
by
Marwa Salah Farhan, Amira Youssef and Laila Abdelhamid
Big Data Cogn. Comput. 2024, 8(2), 17; https://doi.org/10.3390/bdcc8020017 - 6 Feb 2024
Abstract
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by current applications requires new data warehousing systems. In big data settings, it is important to adapt existing warehouse systems to overcome new issues and limitations. The main drawbacks of traditional Extract–Transform–Load (ETL) are that huge volumes of data cannot be processed over ETL and that execution time is very high when the data are unstructured. This paper presents a new four-layer model, Extract–Clean–Load–Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text, and aimed at reducing execution time. ECLT is implemented and tested using Apache Spark via its Python API. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s; when the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
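The ECLT ordering, with cleaning moved before loading and the costly transform deferred until the data are already in the warehouse, can be sketched as follows. The function bodies are illustrative placeholders of our own, not the paper's Spark jobs.

```python
import re

# Hedged sketch of the Extract-Clean-Load-Transform (ECLT) ordering for
# unstructured text. Each stage is a stand-in for a distributed Spark job.

def extract(sources):
    """Gather raw documents from all sources."""
    return [doc for src in sources for doc in src]

def clean(docs):
    """Drop empty documents and normalize whitespace/case before loading."""
    return [re.sub(r"\s+", " ", d).strip().lower() for d in docs if d.strip()]

def load(docs, warehouse):
    """Stand-in for a warehouse write: append cleaned documents."""
    warehouse.extend(docs)
    return warehouse

def transform(warehouse):
    """Run the expensive transformation last, over already-loaded data."""
    return [{"text": d, "tokens": d.split()} for d in warehouse]

warehouse = []
docs = extract([["  Big   Data ", ""], ["Cognitive Computing"]])
records = transform(load(clean(docs), warehouse))
```

Contrast with classic ETL, where the transform sits between extraction and loading and therefore blocks the load on the slowest stage.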
Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
Open Access Article
Fair-CMNB: Advancing Fairness-Aware Stream Learning with Naïve Bayes and Multi-Objective Optimization
by
Maryam Badar and Marco Fisichella
Big Data Cogn. Comput. 2024, 8(2), 16; https://doi.org/10.3390/bdcc8020016 - 31 Jan 2024
Abstract
Fairness-aware mining of data streams is a challenging concern in the contemporary domain of machine learning. Many stream learning algorithms are used to replace humans in critical decision-making processes, e.g., hiring staff, assessing credit risk, etc. This calls for handling massive amounts of incoming information with minimal response delay while ensuring fair and high-quality decisions. Although deep learning has achieved success in various domains, its computational complexity may hinder real-time processing, making traditional algorithms more suitable. In this context, we propose a novel adaptation of Naïve Bayes to mitigate discrimination embedded in the streams while maintaining high predictive performance through multi-objective optimization (MOO). Class imbalance is an inherent problem in discrimination-aware learning paradigms. To deal with class imbalance, we propose a dynamic instance weighting module that gives more importance to new instances and less importance to obsolete instances based on their membership in a minority or majority class. We have conducted experiments on a range of streaming and static datasets and concluded that our proposed methodology outperforms existing state-of-the-art (SoTA) fairness-aware methods in terms of both discrimination score and balanced accuracy.
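The dynamic instance-weighting idea can be made concrete with a small sketch. The decay rate and minority boost below are our own assumed parameters, not values from the paper.

```python
import math

# Illustrative sketch of dynamic instance weighting for stream learning:
# newer instances weigh more (exponential recency decay) and minority-class
# instances get a boost so the skewed class is not drowned out.

def instance_weight(age, is_minority, decay=0.1, minority_boost=2.0):
    """age: instances seen since this one arrived (0 = newest)."""
    w = math.exp(-decay * age)       # recency: obsolete instances fade away
    if is_minority:
        w *= minority_boost          # rebalance the class distribution
    return w

w_new_minority = instance_weight(0, True)    # fresh minority-class instance
w_old_majority = instance_weight(30, False)  # stale majority-class instance
```

In a Naïve Bayes learner these weights would scale each instance's contribution to the class-conditional counts.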
Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
Open Access Article
A Simultaneous Wireless Information and Power Transfer-Based Multi-Hop Uneven Clustering Routing Protocol for EH-Cognitive Radio Sensor Networks
by
Jihong Wang, Zhuo Wang and Lidong Zhang
Big Data Cogn. Comput. 2024, 8(2), 15; https://doi.org/10.3390/bdcc8020015 - 31 Jan 2024
Abstract
Clustering protocols and simultaneous wireless information and power transfer (SWIPT) technology can solve the issue of imbalanced energy consumption among nodes in energy harvesting-cognitive radio sensor networks (EH-CRSNs). However, dynamic energy changes caused by EH/SWIPT and dynamic spectrum availability prevent existing clustering routing protocols from fully leveraging the advantages of EH and SWIPT. Therefore, this paper proposes a multi-hop uneven clustering routing protocol for EH-CRSNs utilizing SWIPT technology. Specifically, an EH-based energy state function is proposed to accurately track the dynamic energy variations in nodes. Utilizing this function, dynamic spectrum availability, neighbor count, and other information are integrated to design the criteria for selecting high-quality cluster heads (CHs) and relays, thereby facilitating effective data transfer to the sink. Intra-cluster and inter-cluster SWIPT mechanisms are incorporated to allow immediate energy replenishment of CHs or relays with insufficient energy while they transmit data, preventing data transmission failures due to energy depletion. An energy status control mechanism is introduced to avoid the energy waste caused by excessive activation of the SWIPT mechanism. Simulation results indicate that the proposed protocol markedly improves the balance of energy consumption among nodes and enhances network surveillance capabilities compared to existing clustering routing protocols.
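The energy-state bookkeeping and the activation floor for SWIPT can be sketched as follows. All thresholds and the battery capacity are assumptions of ours; the paper's energy state function is richer than this.

```python
# Illustrative energy-state sketch for an EH node: residual energy rises
# with harvesting/SWIPT and falls with sensing and transmission, and SWIPT
# top-ups trigger only below a floor to avoid wasteful over-activation.

def update_energy(residual, harvested, consumed, capacity=100.0):
    """Clamp the node's residual energy to [0, capacity] after one round."""
    return min(capacity, max(0.0, residual + harvested - consumed))

def needs_swipt(residual, floor=20.0):
    """Energy status control: request SWIPT only when genuinely low."""
    return residual < floor

e = update_energy(50.0, 10.0, 15.0)  # one round: +10 harvested, -15 spent
low = needs_swipt(e)                 # above the floor, so no SWIPT request
```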
Full article
Open Access Article
Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis
by
Chao He, Xinghua Zhang, Dongqing Song, Yingshan Shen, Chengjie Mao, Huosheng Wen, Dingju Zhu and Lihua Cai
Big Data Cogn. Comput. 2024, 8(2), 14; https://doi.org/10.3390/bdcc8020014 - 29 Jan 2024
Abstract
With the popularization of better network access and the penetration of personal smartphones in today’s world, the explosion of multi-modal data, particularly opinionated video messages, has created urgent demands and immense opportunities for Multi-Modal Sentiment Analysis (MSA). Deep learning with the attention mechanism has served as the foundation technique for most state-of-the-art MSA models due to its ability to learn complex inter- and intra-relationships among different modalities embedded in video messages, both temporally and spatially. However, modal fusion is still a major challenge due to the vast feature space created by the interactions among different data modalities. To address the modal fusion challenge, we propose an MSA algorithm based on deep learning and the attention mechanism, namely the Mixture of Attention Variants for Modal Fusion (MAVMF). The MAVMF algorithm is a two-stage process: in stage one, self-attention is applied to effectively extract image and text features, and the dependency relationships in the context of video discourse are captured by a bidirectional gated recurrent neural module; in stage two, four multi-modal attention variants are leveraged to learn the emotional contributions of important features from different modalities. Our proposed approach is end-to-end and achieves superior performance to state-of-the-art algorithms when tested on the two largest public datasets, CMU-MOSI and CMU-MOSEI.
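For readers unfamiliar with the building block behind such attention variants, here is generic scaled dot-product attention in plain Python; it is only the textbook primitive, shown to make "attention" concrete, and not the MAVMF model itself.

```python
import math

# Generic scaled dot-product attention: each query produces a softmax
# distribution over keys, and the output is the weighted sum of values.

def attention(Q, K, V):
    d = len(K[0])                    # key dimension for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)              # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs: the output leans toward
# the value whose key is more similar to the query.
ctx = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```

Self-attention is the special case where Q, K, and V are all projections of the same feature sequence.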
Full article
(This article belongs to the Special Issue Research Progress in Artificial Intelligence and Social Network Analysis)
Topics
Topic in
BDCC, Economies, Information, Remote Sensing, Sustainability
Big Data and Artificial Intelligence, 2nd Volume
Topic Editors: Miltiadis D. Lytras, Andreea Claudia Serban
Deadline: 31 March 2024
Topic in
AI, Algorithms, BDCC, Future Internet, Informatics, Information, Languages, Publications
AI Chatbots: Threat or Opportunity?
Topic Editors: Antony Bryant, Roberto Montemanni, Min Chen, Paolo Bellavista, Kenji Suzuki, Jeanine Treffers-Daller
Deadline: 30 April 2024
Topic in
Algorithms, BDCC, BioMedInformatics, Information, Mathematics
Machine Learning Empowered Drug Screen
Topic Editors: Teng Zhou, Jiaqi Wang, Youyi Song
Deadline: 31 August 2024
Topic in
BDCC, Entropy, Information, MCA, Mathematics
New Advances in Granular Computing and Data Mining
Topic Editors: Xibei Yang, Bin Xie, Pingxin Wang, Hengrong Ju
Deadline: 30 October 2024
Special Issues
Special Issue in
BDCC
Privacy-Enhancing Technologies of Data for Sustainable and Secure Cooperation
Guest Editors: Yi Sun, Shujie Yang
Deadline: 30 March 2024
Special Issue in
BDCC
Smarter Healthcare via Big Data and Machine Learning
Guest Editors: Maryam S. Mirian, Abdol-Hossein Vahabie, Reyhaneh Bakhtiari
Deadline: 31 March 2024
Special Issue in
BDCC
Machine Learning for Dependable Edge Computing Systems and Services
Guest Editors: Renyu Yang, Zhenyu Wen, Xu Wang, Prosanta Gope, Bin Shi
Deadline: 30 April 2024
Special Issue in
BDCC
Multimedia Systems for Multimedia Big Data
Guest Editors: Michael Alexander Riegler, Pål Halvorsen
Deadline: 31 May 2024
Topical Collections
Topical Collection in
BDCC
Machine Learning and Artificial Intelligence for Health Applications on Social Networks
Collection Editor: Carmela Comito