Big Data Cogn. Comput., Volume 6, Issue 3 (September 2022) – 31 articles

Cover Story: Public interest in the digital twin is increasing rapidly in today's digitally expanding world. The healthcare sector, unprepared for the COVID-19 pandemic, should investigate how to embrace and incorporate this fast-emerging technology as it develops. This paper (1) investigates the history, development, and common misconceptions of the digital twin; (2) expounds a digital twin model for the healthcare context differentiated by different life stages; (3) proposes, for the first time, a paradigm of "digital twinning everything as a healthcare service" grouped by different types of physical entities; (4) explores the impacts of the digital twin in healthcare by reviewing relevant applications; and (5) discusses the strengths and challenges of the digital twin along with future research opportunities.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
23 pages, 6422 KiB  
Article
Triggers and Tweets: Implicit Aspect-Based Sentiment and Emotion Analysis of Community Chatter Relevant to Education Post-COVID-19
by Heba Ismail, Ashraf Khalil, Nada Hussein and Rawan Elabyad
Big Data Cogn. Comput. 2022, 6(3), 99; https://doi.org/10.3390/bdcc6030099 - 16 Sep 2022
Cited by 10 | Viewed by 3119
Abstract
This research proposes a well-being analytical framework using social media chatter data. The proposed framework infers analytics and provides insights into the public’s well-being relevant to education throughout and post the COVID-19 pandemic through a comprehensive Emotion and Aspect-based Sentiment Analysis (ABSA). Moreover, [...] Read more.
This research proposes a well-being analytical framework using social media chatter data. The proposed framework infers analytics and provides insights into the public's well-being relevant to education during and after the COVID-19 pandemic through a comprehensive Emotion and Aspect-based Sentiment Analysis (ABSA). Moreover, this research aims to examine the variability in the emotions of students, parents, and faculty toward the e-learning process over time and across different locations. The proposed framework curates Twitter chatter data relevant to the education sector, identifies tweets that carry sentiment, and then identifies the exact emotion and the emotional triggers associated with those feelings through implicit ABSA. The produced analytics are then factored by location and time to provide more comprehensive insights that aim to assist decision-makers and personnel in the educational sector in enhancing and adapting the educational process during and after the pandemic and into the future. The experimental results for emotion classification show that the Linear Support Vector Classifier (SVC) outperformed the other classifiers, with overall accuracy, precision, recall, and F-measure of 91%. Moreover, for aspect classification, the Logistic Regression classifier outperformed all other classifiers, with overall accuracy, recall, and F-measure of 81% and precision of 83%. In online experiments using UAE COVID-19 education-related data, the analytics show high relevance to the public concerns around the education process that were reported during the experiment's timeframe. Full article
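To make the classification step concrete, the sketch below shows a TF-IDF plus Linear Support Vector Classifier of the kind benchmarked in this paper; the toy tweets, labels, and hyperparameters are hypothetical and do not come from the study.

```python
# A minimal, illustrative sketch (not the authors' exact pipeline): a TF-IDF +
# Linear SVC emotion classifier trained on labeled tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the curated education-related tweets.
tweets = [
    "remote exams are stressing me out",
    "loved today's online lecture",
    "i miss seeing my classmates",
    "the new e-learning platform works great",
]
emotions = ["fear", "joy", "sadness", "joy"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(tweets, emotions)
print(clf.predict(["online classes make me anxious"]))
```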
15 pages, 364 KiB  
Article
Machine Learning Techniques for Chronic Kidney Disease Risk Prediction
by Elias Dritsas and Maria Trigka
Big Data Cogn. Comput. 2022, 6(3), 98; https://doi.org/10.3390/bdcc6030098 - 14 Sep 2022
Cited by 47 | Viewed by 6531
Abstract
Chronic kidney disease (CKD) is a condition characterized by progressive loss of kidney function over time. It describes a clinical entity that causes kidney damage and affects the general health of the human body. Improper diagnosis and treatment of the disease can eventually [...] Read more.
Chronic kidney disease (CKD) is a condition characterized by progressive loss of kidney function over time. It describes a clinical entity that causes kidney damage and affects the general health of the human body. Improper diagnosis and treatment of the disease can eventually lead to end-stage renal disease and ultimately to the patient's death. Machine Learning (ML) techniques have acquired an important role in disease prediction and are a useful tool in the field of medical science. In the present research work, we aim to build efficient tools for predicting CKD occurrence, following an approach that exploits ML techniques. More specifically, first, we apply class balancing in order to tackle the non-uniform distribution of the instances in the two classes, then feature ranking and analysis are performed, and finally, several ML models are trained and evaluated based on various performance metrics. The results highlighted Rotation Forest (RotF), which prevailed over the compared models with an Area Under the Curve (AUC) of 100% and Precision, Recall, F-Measure, and Accuracy equal to 99.2%. Full article
(This article belongs to the Special Issue Digital Health and Data Analytics in Public Health)
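The workflow described above (class balancing, feature ranking, model evaluation) can be sketched as follows; Rotation Forest is not available in scikit-learn, so a Random Forest stands in, and the synthetic data merely mimics an imbalanced CKD-style dataset.

```python
# Illustrative sketch of the described workflow: balance classes, rank features, evaluate a model.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the CKD dataset (imbalanced binary labels).
X, y = make_classification(n_samples=400, n_features=12, weights=[0.8, 0.2], random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)          # tackle class imbalance
ranking = np.argsort(mutual_info_classif(X_bal, y_bal))[::-1]    # rank features by relevance

model = RandomForestClassifier(n_estimators=200, random_state=0)  # stand-in for Rotation Forest
auc = cross_val_score(model, X_bal[:, ranking[:8]], y_bal, cv=5, scoring="roc_auc")
print(f"AUC: {auc.mean():.3f}")
```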
21 pages, 5657 KiB  
Article
Computational Techniques Enabling the Perception of Virtual Images Exclusive to the Retinal Afterimage
by Staas de Jong and Gerrit van der Veer
Big Data Cogn. Comput. 2022, 6(3), 97; https://doi.org/10.3390/bdcc6030097 - 13 Sep 2022
Cited by 2 | Viewed by 2080
Abstract
The retinal afterimage is a widely known effect in the human visual system, which has been studied and used in the context of a number of major art movements. Therefore, when considering the general role of computation in the visual arts, this begs [...] Read more.
The retinal afterimage is a widely known effect in the human visual system, which has been studied and used in the context of a number of major art movements. Therefore, when considering the general role of computation in the visual arts, this raises the question of whether this effect, too, may be induced using partly automated techniques. If so, it may become a computationally controllable ingredient of (interactive) visual art, and thus take its place among the many other aspects of visual perception which have already preceded it in this sense. The present moment provides additional inspiration to lay the groundwork for extending computer graphics in general with the retinal afterimage: Historically, we are in a phase where some head-mounted stereoscopic AR/VR technologies are now providing eye tracking by default, thereby allowing real-time monitoring of the processes of visual fixation that can induce the retinal afterimage. A logical starting point for general investigation is then shape display via the retinal afterimage, since shape recognition lends itself well to unambiguous reporting. Shape recognition, however, may also occur due to normal vision, which happens simultaneously. Carefully and rigorously excluding this possibility, we develop computational techniques enabling shape display exclusive to the retinal afterimage. Full article
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
21 pages, 6061 KiB  
Article
Improving Real Estate Rental Estimations with Visual Data
by Ilia Azizi and Iegor Rudnytskyi
Big Data Cogn. Comput. 2022, 6(3), 96; https://doi.org/10.3390/bdcc6030096 - 9 Sep 2022
Cited by 3 | Viewed by 2938
Abstract
Multi-modal data are widely available for online real estate listings. Announcements can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether it is [...] Read more.
Multi-modal data are widely available for online real estate listings. Announcements can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether it is possible to improve the performance of the pricing model using additional unstructured data, namely images of the property and satellite images. We compare four models based on the type of input data they use: (1) tabular data only, (2) tabular data and property images, (3) tabular data and satellite images, and (4) tabular data and a combination of property and satellite images. In a supervised context, the branches of dedicated neural networks for each data type are fused (concatenated) to predict log rental prices. The novel dataset devised for the study (SRED) consists of 11,105 flat rentals advertised over the internet in Switzerland. The results reveal that using all three sources of data generally outperforms machine learning models built on only tabular information. The findings pave the way for further research on integrating other non-structured inputs, for instance, the textual descriptions of properties. Full article
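A minimal sketch of the fusion idea, assuming a simplified architecture rather than the authors' exact networks: a tabular branch and an image branch are concatenated before a regression head that predicts the log rental price.

```python
# A minimal sketch: fusing a tabular branch and a property-image branch by concatenation
# to regress the log rental price. Branch sizes and input shapes are hypothetical.
import torch
import torch.nn as nn

class FusionRegressor(nn.Module):
    def __init__(self, n_tabular=20):          # 20 tabular features is an assumed value
        super().__init__()
        self.tab_branch = nn.Sequential(nn.Linear(n_tabular, 64), nn.ReLU())
        self.img_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten()
        )
        self.head = nn.Sequential(nn.Linear(64 + 16, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, tabular, image):
        fused = torch.cat([self.tab_branch(tabular), self.img_branch(image)], dim=1)
        return self.head(fused)                 # predicted log rent

model = FusionRegressor()
log_rent = model(torch.randn(8, 20), torch.randn(8, 3, 128, 128))
print(log_rent.shape)                           # torch.Size([8, 1])
```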
21 pages, 7858 KiB  
Article
Multimodal Emotional Classification Based on Meaningful Learning
by Hajar Filali, Jamal Riffi, Chafik Boulealam, Mohamed Adnane Mahraz and Hamid Tairi
Big Data Cogn. Comput. 2022, 6(3), 95; https://doi.org/10.3390/bdcc6030095 - 8 Sep 2022
Cited by 6 | Viewed by 2655
Abstract
Emotion recognition has become one of the most researched subjects in the scientific community, especially in the human–computer interface field. Decades of scientific research have been conducted on unimodal emotion analysis, whereas recent contributions concentrate on multimodal emotion recognition. These efforts have achieved [...] Read more.
Emotion recognition has become one of the most researched subjects in the scientific community, especially in the human–computer interface field. Decades of scientific research have been conducted on unimodal emotion analysis, whereas recent contributions concentrate on multimodal emotion recognition. These efforts have achieved great success in terms of accuracy in diverse areas of Deep Learning applications. To achieve better performance for multimodal emotion recognition systems, we exploit the effectiveness of the Meaningful Neural Network to enable emotion prediction during a conversation. We propose Deep Learning-based feature extraction methods for the text and audio modalities, and a third, bimodal representation is created by fusing the text and audio features. The feature vectors from these three modalities are fed into a Meaningful Neural Network so that each characteristic is learned separately. Its architecture consists of a set of neurons for each component of the input vector before combining them all in the last layer. Our model was evaluated on MELD, a multimodal and multiparty dataset for emotion recognition in conversation. The proposed approach reached an accuracy of 86.69%, which significantly outperforms all current multimodal systems. To sum up, several evaluation techniques applied to our work demonstrate the robustness and superiority of our model over other state-of-the-art MELD models. Full article
18 pages, 2118 KiB  
Article
Learning Performance of International Students and Students with Disabilities: Early Prediction and Feature Selection through Educational Data Mining
by Thao-Trang Huynh-Cam, Long-Sheng Chen and Khai-Vinh Huynh
Big Data Cogn. Comput. 2022, 6(3), 94; https://doi.org/10.3390/bdcc6030094 - 7 Sep 2022
Cited by 4 | Viewed by 2146
Abstract
The learning performance of international students and students with disabilities has increasingly attracted many theoretical and practical researchers. However, previous studies used questionnaires, surveys, and/or interviews to investigate factors affecting students’ learning performance. These methods cannot help universities to provide on-time support to [...] Read more.
The learning performance of international students and students with disabilities has increasingly attracted the attention of theoretical and practical researchers. However, previous studies used questionnaires, surveys, and/or interviews to investigate factors affecting students' learning performance. These methods cannot help universities to provide on-time support to excellent and poor students. Thus, this study utilized Multilayer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) algorithms to build prediction models for the academic performance of international students, students with disabilities, and local students based on students' admission profiles and their first-semester Grade Point Average results. The real samples included 4036 freshmen of a Taiwanese technical and vocational university. The experimental results showed that for international students, three models, SVM (100%), MLP (100%), and DT (100%), were significantly superior to RF (96.6%); for students with disabilities, SVM (100%) outperformed RF (98.0%), MLP (96.0%), and DT (94.0%); for local students, RF (98.6%) outperformed DT (95.2%), MLP (94.9%), and SVM (91.9%). The most important features were [numbers of required credits], [main source of living expenses], [department], [father occupations], [mother occupations], [numbers of elective credits], [parent average income per month], and [father education]. The outcomes of this study may assist academic communities in proposing preventive measures at the early stages to attract more international students and enhance school competitive advantages. Full article
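The kind of feature-importance analysis behind the reported ranking can be sketched as follows; the feature names and synthetic data are placeholders for the admission-profile variables used in the study.

```python
# Illustrative sketch (not the authors' pipeline): ranking admission-profile features by
# Random Forest importance. The real study used 4036 freshmen records; this data is synthetic.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["required_credits", "living_expense_source", "department", "father_occupation"]
X, y = make_classification(n_samples=300, n_features=4, n_informative=3, n_redundant=0,
                           random_state=1)

rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)
print(pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False))
```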
14 pages, 421 KiB  
Article
Hierarchical Co-Attention Selection Network for Interpretable Fake News Detection
by Xiaoyi Ge, Shuai Hao, Yuxiao Li, Bin Wei and Mingshu Zhang
Big Data Cogn. Comput. 2022, 6(3), 93; https://doi.org/10.3390/bdcc6030093 - 5 Sep 2022
Cited by 2 | Viewed by 3574
Abstract
Social media fake news has become a pervasive and problematic issue today with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of the news and provide explanations for the results, which have shown remarkable [...] Read more.
Social media fake news has become a pervasive and problematic issue today with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of the news and provide explanations for the results, which have shown remarkable success in interpretable fake news detection. However, individuals' judgments of news are usually hierarchical, prioritizing valuable words above essential sentences, which is neglected by existing fake news detection models. In this paper, we propose a novel, interpretable neural network-based model, the hierarchical co-attention selection network (HCSN), to predict whether the source post is fake, as well as an explanation that emphasizes important comments and particular words. The key insight of the HCSN model is to incorporate the Gumbel–Max trick in the hierarchical co-attention selection mechanism that captures sentence-level and word-level information from the source post and comments following the sequence of words–sentences–words–event. In addition, HCSN enjoys the additional benefit of interpretability: it provides an explicit explanation of how it reaches certain results by selecting comments and highlighting words. According to the experiments conducted on real-world datasets, our model outperformed state-of-the-art methods and generated reasonable explanations. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
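The selection mechanism rests on the Gumbel–Max trick; the sketch below shows its differentiable Gumbel-Softmax relaxation picking one comment out of several, which is the general idea rather than the HCSN implementation itself.

```python
# A small sketch of the selection idea: the Gumbel-Softmax relaxation of the Gumbel-Max trick
# lets a model make a hard yet differentiable choice of which comment (or word) to keep.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
comment_scores = torch.randn(1, 5, requires_grad=True)   # attention logits for 5 candidate comments

# hard=True returns a one-hot selection in the forward pass while gradients flow through
# the soft relaxation in the backward pass (straight-through estimator).
selection = F.gumbel_softmax(comment_scores, tau=0.5, hard=True)
print(selection)                                          # e.g. a one-hot vector over the 5 comments

comment_embeddings = torch.randn(1, 5, 16)                # hypothetical comment representations
selected = (selection.unsqueeze(-1) * comment_embeddings).sum(dim=1)
print(selected.shape)                                     # torch.Size([1, 16])
```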
35 pages, 1343 KiB  
Article
Predictors of Smartphone Addiction and Social Isolation among Jordanian Children and Adolescents Using SEM and ML
by Evon M. Abu-Taieh, Issam AlHadid, Khalid Kaabneh, Rami S. Alkhawaldeh, Sufian Khwaldeh, Ra’ed Masa’deh and Ala’Aldin Alrowwad
Big Data Cogn. Comput. 2022, 6(3), 92; https://doi.org/10.3390/bdcc6030092 - 2 Sep 2022
Cited by 5 | Viewed by 4169
Abstract
Smartphone addiction has become a major problem for everyone. According to recent studies, a considerable number of children and adolescents are more attracted to smartphones and exhibit addictive behavioral indicators, which are emerging as serious social problems. The main goal of this study [...] Read more.
Smartphone addiction has become a major problem for everyone. According to recent studies, a considerable number of children and adolescents are more attracted to smartphones and exhibit addictive behavioral indicators, which are emerging as serious social problems. The main goal of this study is to identify the determinants of smartphone addiction and social isolation among children and adolescents in Jordan. The theoretical foundation of the study model is based on constructs adopted from the Technology Acceptance Model (TAM) (i.e., perceived ease of use and perceived usefulness), with social influence and trust adopted from the extended TAM model along with perceived enjoyment. In terms of methodology, the study uses data from 511 parents who responded via convenience sampling; the data were collected via a survey questionnaire and used to evaluate the research model. To test the study hypotheses, the empirical validity of the research model was established, and the data were analyzed with SPSS version 21.0 and AMOS 26 software. Structural equation modeling (SEM), confirmatory factor analysis (CFA), and machine learning (ML) methods were used to test the study hypotheses and validate the properties of the instrument items. The ML methods used are support vector machine (SMO), the bagging reduced error pruning tree (REPTree), artificial neural network (ANN), and random forest. The results indicated several major findings: perceived usefulness, trust, and social influence were significant antecedents of behavioral intention to use the smartphone. The findings also show that behavioral intention has a statistically significant influence on smartphone addiction. Furthermore, the findings confirm that smartphone addiction positively influences social isolation among Jordanian children and adolescents. Yet, perceived ease of use and perceived enjoyment did not have a significant effect on behavioral intention to use the smartphone among Jordanian children and adolescents. The research contributes to the body of knowledge and literature by empirically examining and theorizing the implications of smartphone addiction for social isolation. Further details of the study's contribution, as well as future research directions and limitations, are presented in the discussion section. Full article
(This article belongs to the Special Issue Graph-Based Data Mining and Social Network Analysis)
17 pages, 715 KiB  
Article
Argumentation-Based Query Answering under Uncertainty with Application to Cybersecurity
by Mario A. Leiva, Alejandro J. García, Paulo Shakarian and Gerardo I. Simari
Big Data Cogn. Comput. 2022, 6(3), 91; https://doi.org/10.3390/bdcc6030091 - 26 Aug 2022
Cited by 6 | Viewed by 2148
Abstract
Decision support tools are key components of intelligent sociotechnical systems, and their successful implementation faces a variety of challenges, including the multiplicity of information sources, heterogeneous format, and constant changes. Handling such challenges requires the ability to analyze and process inconsistent and incomplete [...] Read more.
Decision support tools are key components of intelligent sociotechnical systems, and their successful implementation faces a variety of challenges, including the multiplicity of information sources, heterogeneous format, and constant changes. Handling such challenges requires the ability to analyze and process inconsistent and incomplete information with varying degrees of associated uncertainty. Moreover, some domains require the system’s outputs to be explainable and interpretable; an example of this is cyberthreat analysis (CTA) in cybersecurity domains. In this paper, we first present the P-DAQAP system, an extension of a recently developed query-answering platform based on defeasible logic programming (DeLP) that incorporates a probabilistic model and focuses on delivering these capabilities. After discussing the details of its design and implementation, and describing how it can be applied in a CTA use case, we report on the results of an empirical evaluation designed to explore the effectiveness and efficiency of a possible world sampling-based approximate query answering approach that addresses the intractability of exact computations. Full article
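The approximate query answering idea can be illustrated with a generic possible-world sampler; the probabilistic facts and the query check below are hypothetical stand-ins for the DeLP-based machinery in P-DAQAP.

```python
# A generic sketch of possible-world sampling for approximate probabilistic query answering.
# P-DAQAP couples this idea with defeasible logic programming; here a toy boolean check
# stands in for running the argumentation query in each sampled world.
import random

random.seed(0)
prob_facts = {"malware_detected": 0.7, "unusual_traffic": 0.4}   # independent probabilistic facts

def query_holds(world):
    # Stand-in for evaluating the query against a sampled world.
    return world["malware_detected"] and world["unusual_traffic"]

def approximate_probability(n_samples=10_000):
    hits = 0
    for _ in range(n_samples):
        world = {fact: random.random() < p for fact, p in prob_facts.items()}
        hits += query_holds(world)
    return hits / n_samples

print(approximate_probability())   # close to 0.7 * 0.4 = 0.28
```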
19 pages, 705 KiB  
Article
PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data
by Gaia Gambarelli and Aldo Gangemi
Big Data Cogn. Comput. 2022, 6(3), 90; https://doi.org/10.3390/bdcc6030090 - 26 Aug 2022
Cited by 3 | Viewed by 2657
Abstract
The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy, and avoid the leakage of private content, it is important to [...] Read more.
The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject's right to privacy, and avoid the leakage of private content, it is important to treat sensitive information. However, any such treatment first requires identifying sensitive text, which calls for appropriate techniques to do so automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANNs) or on transformer models. Our research focuses on identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach, and eventually hybridising it with ANN models. We present a frame-based knowledge graph built for personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames, and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%. By comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis, and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANN-based SID tested in a previous study by the authors. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)
19 pages, 33832 KiB  
Article
Large-Scale Oil Palm Trees Detection from High-Resolution Remote Sensing Images Using Deep Learning
by Hery Wibowo, Imas Sukaesih Sitanggang, Mushthofa Mushthofa and Hari Agung Adrianto
Big Data Cogn. Comput. 2022, 6(3), 89; https://doi.org/10.3390/bdcc6030089 - 24 Aug 2022
Cited by 10 | Viewed by 4647
Abstract
Tree counting is an important plantation practice for biological asset inventories, etc. The application of precision agriculture in counting oil palm trees can be implemented by detecting oil palm trees from aerial imagery. This research uses the deep learning approach using YOLOv3, YOLOv4, [...] Read more.
Tree counting is an important plantation practice for purposes such as biological asset inventories. The application of precision agriculture to counting oil palm trees can be implemented by detecting oil palm trees from aerial imagery. This research uses a deep learning approach, employing YOLOv3, YOLOv4, and YOLOv5m to detect oil palm trees. The dataset consists of drone images of an oil palm plantation acquired using a Fixed Wing VTOL drone with a resolution of 5 cm/pixel, covering an area of 730 ha and labeled with 56,614 instances of the oil palm class. The test dataset covers an area of 180 ha with flat and hilly conditions, with sparse, dense, and overlapping canopy, and with oil palm trees intersecting other vegetation. Model testing using images from 24 regions, each covering 12 ha with up to 1000 trees (for a total of 17,343 oil palm trees), yielded F1-scores of 97.28%, 97.74%, and 94.94%, with an average detection time of 43 s, 45 s, and 21 s for models trained with YOLOv3, YOLOv4, and YOLOv5m, respectively. This result shows that the method is sufficiently accurate and efficient in detecting oil palm trees and has the potential to be implemented in commercial applications for plantation companies. Full article
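For orientation, detection-based counting with an off-the-shelf YOLOv5 model looks roughly like the sketch below; the study trained YOLOv3/v4/v5m on its own labeled oil palm imagery, so the pretrained COCO weights and the image path here are placeholders only.

```python
# Illustrative counting sketch with an off-the-shelf YOLOv5 model loaded via torch.hub.
# A palm-trained checkpoint would replace the generic pretrained weights in practice.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)
model.conf = 0.4                                   # confidence threshold

results = model("plantation_tile.jpg")             # hypothetical drone image tile
detections = results.pandas().xyxy[0]              # one row per detected object
print(f"objects detected: {len(detections)}")      # becomes a tree count with palm-trained weights
```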
13 pages, 859 KiB  
Article
StEduCov: An Explored and Benchmarked Dataset on Stance Detection in Tweets towards Online Education during COVID-19 Pandemic
by Omama Hamad, Ali Hamdi, Sayed Hamdi and Khaled Shaban
Big Data Cogn. Comput. 2022, 6(3), 88; https://doi.org/10.3390/bdcc6030088 - 22 Aug 2022
Cited by 2 | Viewed by 2603
Abstract
In this paper, we present StEduCov, an annotated dataset for the analysis of stances toward online education during the COVID-19 pandemic. StEduCov consists of 16,572 tweets gathered over 15 months, from March 2020 to May 2021, using the Twitter API. The tweets were [...] Read more.
In this paper, we present StEduCov, an annotated dataset for the analysis of stances toward online education during the COVID-19 pandemic. StEduCov consists of 16,572 tweets gathered over 15 months, from March 2020 to May 2021, using the Twitter API. The tweets were manually annotated into the classes agree, disagree, or neutral. We performed benchmarking on the dataset using state-of-the-art and traditional machine learning models. Specifically, we trained deep learning models (bidirectional encoder representations from transformers, long short-term memory, convolutional neural networks, attention-based BiLSTM, and Naive Bayes SVM) in addition to naive Bayes, logistic regression, support vector machines, decision trees, K-nearest neighbor, and random forest. The average accuracy in the 10-fold cross-validation of these models ranged from 75% to 84.8% for binary stance classification and from 52.6% to 68% for multi-class stance classification. Performance was affected by high vocabulary overlaps between classes and by unreliable transfer learning when deep models pre-trained on general texts are applied to specific domains such as COVID-19 and distance education. Full article
(This article belongs to the Topic Machine and Deep Learning)
15 pages, 299 KiB  
Article
Topical and Non-Topical Approaches to Measure Similarity between Arabic Questions
by Mohammad Daoud
Big Data Cogn. Comput. 2022, 6(3), 87; https://doi.org/10.3390/bdcc6030087 - 22 Aug 2022
Cited by 1 | Viewed by 2620
Abstract
Questions are crucial expressions in any language. Many Natural Language Processing (NLP) or Natural Language Understanding (NLU) applications, such as question-answering computer systems, automatic chatting apps (chatbots), digital virtual assistants, and opinion mining, can benefit from accurately identifying similar questions in an effective [...] Read more.
Questions are crucial expressions in any language. Many Natural Language Processing (NLP) and Natural Language Understanding (NLU) applications, such as question-answering systems, automatic chatting apps (chatbots), digital virtual assistants, and opinion mining, can benefit from accurately and efficiently identifying similar questions. We detail methods for identifying similarities between Arabic questions that have been posted online by Internet users and organizations. Our novel approach uses a non-topical rule-based methodology and topical information (textual similarity, lexical similarity, and semantic similarity) to determine whether a pair of Arabic questions are similarly paraphrased. Our method computes the lexical and linguistic distances between each pair of questions. Additionally, it identifies questions in accordance with their format and scope using expert hypotheses (rules) that have been experimentally shown to be useful and practical. Even if there is a high degree of lexical similarity between a When question (a Timex Factoid, inquiring about time) and a Who question (an Enamex Factoid, asking about a named entity), they will not be similar. In an experiment using 2200 question pairs, our method attained an accuracy of 0.85, which is remarkable given the simplicity of the solution and the fact that we did not employ any language models or word embeddings. In order to cover common Arabic queries posted by Arabic Internet users, we gathered the questions from various online forums and resources. In this study, we describe a unique method for detecting question similarity that does not require intensive processing, a sizable linguistic corpus, or a costly semantic repository. Because there are not many rich Arabic textual resources, this is especially important for informal Arabic text processing on the Internet. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
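The combination of a non-topical rule with a lexical score can be sketched as a toy function; the interrogative-word table, threshold, and example questions below are hypothetical and much simpler than the paper's rule set.

```python
# A toy sketch of the two-signal idea: a lexical Jaccard score is overridden by a
# non-topical rule when the interrogative types differ (Timex vs. Enamex questions).
QUESTION_TYPES = {"متى": "TIMEX", "من": "ENAMEX", "أين": "LOCATION"}   # when / who / where

def question_type(question):
    first = question.split()[0]
    return QUESTION_TYPES.get(first, "OTHER")

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def similar(q1, q2, threshold=0.5):
    if question_type(q1) != question_type(q2):     # rule: different scopes are never similar
        return False
    return jaccard(q1, q2) >= threshold

q1 = "متى تأسست الجامعة الأردنية"    # "When was the University of Jordan founded?"
q2 = "من أسس الجامعة الأردنية"        # "Who founded the University of Jordan?"
print(similar(q1, q2))                 # False despite high lexical overlap
```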
30 pages, 1145 KiB  
Article
A Holistic Scalability Strategy for Time Series Databases Following Cascading Polyglot Persistence
by Carlos Garcia Calatrava, Yolanda Becerra Fontal and Fernando M. Cucchietti
Big Data Cogn. Comput. 2022, 6(3), 86; https://doi.org/10.3390/bdcc6030086 - 18 Aug 2022
Cited by 1 | Viewed by 2024
Abstract
Time series databases aim to handle big amounts of data in a fast way, both when introducing new data to the system, and when retrieving it later on. However, depending on the scenario in which these databases participate, reducing the number of requested [...] Read more.
Time series databases aim to handle large amounts of data in a fast way, both when introducing new data to the system and when retrieving it later on. However, depending on the scenario in which these databases participate, reducing the number of requested resources becomes a further requirement. Following this goal, NagareDB and its Cascading Polyglot Persistence approach were born. They were not just intended to provide a fast time series solution, but also to find a great cost-efficiency balance. However, although they provided outstanding results, they lacked a natural way of scaling out in a cluster fashion. Consequently, monolithic approaches could extract the maximum value from the solution, but distributed ones had to rely on general scalability approaches. In this research, we proposed a holistic approach specially tailored for databases following Cascading Polyglot Persistence to further maximize its inherent resource-saving goals. The proposed approach reduced the cluster size by 33% in a setup with just three ingestion nodes and by up to 50% in a setup with 10 ingestion nodes. Moreover, the evaluation shows that our scaling method is able to provide efficient cluster growth, offering scalability speedups greater than 85% in comparison to a theoretical 100% perfect scaling, while also ensuring data safety via data replication. Full article
(This article belongs to the Topic Electronic Communications, IOT and Big Data)
19 pages, 38420 KiB  
Article
Combination of Deep Cross-Stage Partial Network and Spatial Pyramid Pooling for Automatic Hand Detection
by Christine Dewi and Henoch Juli Christanto
Big Data Cogn. Comput. 2022, 6(3), 85; https://doi.org/10.3390/bdcc6030085 - 9 Aug 2022
Cited by 10 | Viewed by 3394
Abstract
The human hand is involved in many computer vision tasks, such as hand posture estimation, hand movement identification, human activity analysis, and other similar tasks, in which hand detection is an important preprocessing step. It is still difficult to correctly recognize some hands [...] Read more.
The human hand is involved in many computer vision tasks, such as hand posture estimation, hand movement identification, human activity analysis, and other similar tasks, in which hand detection is an important preprocessing step. It is still difficult to correctly recognize some hands in a cluttered environment because of the complex display variations of agile human hands and the fact that they have a wide range of motion. In this study, we provide a brief assessment of CNN-based object identification algorithms, specifically Densenet Yolo V2, Densenet Yolo V2 CSP, Densenet Yolo V2 CSP SPP, Resnet 50 Yolo V2, Resnet 50 CSP, Resnet 50 CSP SPP, Yolo V4 SPP, Yolo V4 CSP SPP, and Yolo V5. The advantages of CSP and SPP are thoroughly examined and described in detail in each algorithm. We show in our experiments that Yolo V4 CSP SPP provides the best level of precision available. The experimental results show that the CSP and SPP layers help improve the accuracy of CNN model testing performance. Our model leverages the advantages of CSP and SPP. Our proposed method Yolo V4 CSP SPP outperformed previous research results by an average of 8.88%, with an improvement from 87.6% to 96.48%. Full article
(This article belongs to the Topic Machine and Deep Learning)
26 pages, 5309 KiB  
Article
RSS-Based Wireless LAN Indoor Localization and Tracking Using Deep Architectures
by Muhammed Zahid Karakusak, Hasan Kivrak, Hasan Fehmi Ates and Mehmet Kemal Ozdemir
Big Data Cogn. Comput. 2022, 6(3), 84; https://doi.org/10.3390/bdcc6030084 - 8 Aug 2022
Cited by 9 | Viewed by 3179
Abstract
Wireless Local Area Network (WLAN) positioning is a challenging task indoors due to environmental constraints and the unpredictable behavior of signal propagation, even at a fixed location. The aim of this work is to develop deep learning-based approaches for indoor localization and tracking [...] Read more.
Wireless Local Area Network (WLAN) positioning is a challenging task indoors due to environmental constraints and the unpredictable behavior of signal propagation, even at a fixed location. The aim of this work is to develop deep learning-based approaches for indoor localization and tracking by utilizing Received Signal Strength (RSS). The study proposes Multi-Layer Perceptron (MLP), One- and Two-Dimensional Convolutional Neural Network (1D CNN and 2D CNN), and Long Short-Term Memory (LSTM) deep network architectures for WLAN indoor positioning based on data obtained from actual RSS measurements on an existing WLAN infrastructure in a mobile user scenario. Results for the different deep architectures (MLP, CNNs, and LSTMs) are presented alongside existing WLAN algorithms. The Root Mean Square Error (RMSE) is used as the assessment criterion. The proposed LSTM Model 2 achieved a dynamic positioning RMSE of 1.73 m, which outperforms probabilistic WLAN algorithms such as Memoryless Positioning (RMSE: 10.35 m) and the Nonparametric Information (NI) filter with variable acceleration (RMSE: 5.2 m) under the same experimental environment. Full article
(This article belongs to the Topic Machine and Deep Learning)
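A minimal sketch of an RSS-to-position LSTM, assuming a simplified setup (eight access points, ten scans per track) rather than the paper's exact "LSTM Model 2".

```python
# A minimal sketch: an LSTM that maps a short sequence of RSS vectors from several
# access points to a 2-D position estimate. All sizes are assumed for illustration.
import torch
import torch.nn as nn

class RssLstmLocalizer(nn.Module):
    def __init__(self, n_access_points=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_access_points, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)                 # (x, y) coordinates in meters

    def forward(self, rss_seq):                          # rss_seq: (batch, time, n_access_points)
        out, _ = self.lstm(rss_seq)
        return self.head(out[:, -1])                     # position at the last time step

model = RssLstmLocalizer()
positions = model(torch.randn(16, 10, 8))                # 16 tracks, 10 RSS scans each
print(positions.shape)                                   # torch.Size([16, 2])
```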
17 pages, 1280 KiB  
Article
Impactful Digital Twin in the Healthcare Revolution
by Hossein Hassani, Xu Huang and Steve MacFeely
Big Data Cogn. Comput. 2022, 6(3), 83; https://doi.org/10.3390/bdcc6030083 - 8 Aug 2022
Cited by 63 | Viewed by 8774
Abstract
Over the last few decades, our digitally expanding world has experienced another significant digitalization boost because of the COVID-19 pandemic. Digital transformations are changing every aspect of this world. New technological innovations are springing up continuously, attracting increasing attention and investments. Digital twin, [...] Read more.
Over the last few decades, our digitally expanding world has experienced another significant digitalization boost because of the COVID-19 pandemic. Digital transformations are changing every aspect of this world. New technological innovations are springing up continuously, attracting increasing attention and investments. Digital twin, one of the highest trending technologies of recent years, is now joining forces with the healthcare sector, which has been under the spotlight since the outbreak of COVID-19. This paper sets out to promote a better understanding of digital twin technology, clarify some common misconceptions, and review the current trajectory of digital twin applications in healthcare. Furthermore, the functionalities of the digital twin in different life stages are summarized in the context of a digital twin model in healthcare. Following the Internet of Things as a service concept and the digital twinning as a service model supporting Industry 4.0, we propose a paradigm of digital twinning everything as a healthcare service, and different groups of physical entities are also clarified to provide a clear reference for digital twin architecture in healthcare. This research discusses the value of digital twin technology in healthcare, as well as current challenges and insights for future research. Full article
20 pages, 7999 KiB  
Article
Multi-State Synchronization of Chaotic Systems with Distributed Fractional Order Derivatives and Its Application in Secure Communications
by Ali Akbar Kekha Javan, Assef Zare and Roohallah Alizadehsani
Big Data Cogn. Comput. 2022, 6(3), 82; https://doi.org/10.3390/bdcc6030082 - 27 Jul 2022
Cited by 2 | Viewed by 1877
Abstract
This study investigates multiple synchronizations of distributed fractional-order chaotic systems. These systems consider unknown parameters, disturbance, and time delays. A robust adaptive control method is designed for multistage distributed fractional-order chaotic systems. In this paper, system parameters are changed step by step. Using [...] Read more.
This study investigates multiple synchronizations of distributed fractional-order chaotic systems. These systems involve unknown parameters, disturbances, and time delays. A robust adaptive control method is designed for multistage distributed fractional-order chaotic systems. In this paper, system parameters are changed step by step. Using a Lyapunov function, adaptive rules are designed to estimate the parameters while guaranteeing that the synchronization error converges to zero. Then, a secure communication scheme is proposed using a new chaotic masking method. Finally, simulations are performed on a distributed-order fractional Duffing chaotic system. The results show the high efficiency of the proposed synchronization scheme using robust adaptive control, despite the parametric uncertainties, external disturbances, and variable and unknown time delays. Simulations were then performed on sinusoidal message signals in the secure communication application. The results showed the success of the proposed masking scheme with synchronization in coding and decoding information. Full article
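For context, distributed-order systems weight Caputo fractional derivatives over a range of orders; a common textbook form (the notation below is assumed here, not taken from the paper) is:

```latex
D^{\omega} x(t) = \int_{0}^{1} \omega(q)\, {}^{C}\!D^{q} x(t)\, \mathrm{d}q,
\qquad
{}^{C}\!D^{q} x(t) = \frac{1}{\Gamma(1-q)} \int_{0}^{t} (t-\tau)^{-q}\, x'(\tau)\, \mathrm{d}\tau,
\quad 0 < q < 1,
```

where the weight function satisfies ω(q) ≥ 0 and determines how the different derivative orders contribute to the dynamics.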
35 pages, 4319 KiB  
Article
An Evaluation of Key Adoption Factors towards Using the Fog Technology
by Omar Ali, Anup Shrestha, Ashraf Jaradat and Ahmad Al-Ahmad
Big Data Cogn. Comput. 2022, 6(3), 81; https://doi.org/10.3390/bdcc6030081 - 26 Jul 2022
Cited by 6 | Viewed by 2935
Abstract
Fog technology is one of the recent improvements in cloud technology that is designed to reduce some of its drawbacks. Fog technology architecture is often widely distributed to minimize the time required for data processing and enable Internet of Things (IoT) innovations. The [...] Read more.
Fog technology is one of the recent improvements in cloud technology that is designed to reduce some of its drawbacks. Fog technology architecture is often widely distributed to minimize the time required for data processing and enable Internet of Things (IoT) innovations. The purpose of this paper is to evaluate the main factors that might influence the adoption of fog technology. This paper offers a combined framework that addresses fog technology adoption based on the technology adoption perspective, which has been comprehensively researched in the information systems discipline. The proposed integrated framework combines the technology acceptance model (TAM) and diffusion of innovation (DOI) theory to develop a holistic perspective on the adoption of fog technology. The factors that might affect the adoption of fog technology are analyzed from the results of an online survey in 43 different organizations across a wide range of industries. These factors are observed based on data collected from 216 participants, including professional IT staff and senior business executives. This analysis was conducted by using structural equation modeling (SEM). The research results identified nine factors with a statistically significant impact on the adoption of fog technology, and these factors included relative advantage, compatibility, awareness, cost-effectiveness, security, infrastructure, ease of use, usefulness, and location. The findings from this research offer insight to organizations looking to implement fog technology to enable IoT and tap into the digital transformation opportunities presented by this new digital economy. Full article
10 pages, 554 KiB  
Article
How Does AR Technology Adoption and Involvement Behavior Affect Overseas Residents’ Life Satisfaction?
by Nargis Dewan, Md Billal Hossain, Gwi-Gon Kim, Anna Dunay and Csaba Bálint Illés
Big Data Cogn. Comput. 2022, 6(3), 80; https://doi.org/10.3390/bdcc6030080 - 25 Jul 2022
Viewed by 1926
Abstract
This study aims to better understand foreign residents’ life satisfaction by exploring residents’ AR technology adoption behavior (a combination of transportation applications’ usefulness and ease of use) and travel involvement. Data were collected from 400 respondents randomly through a questionnaire-based survey. SPSS and [...] Read more.
This study aims to better understand foreign residents' life satisfaction by exploring residents' AR technology adoption behavior (a combination of transportation applications' usefulness and ease of use) and travel involvement. Data were collected randomly from 400 respondents through a questionnaire-based survey, and SPSS and AMOS were used for the analysis. The study treats overall life satisfaction as the operationalized dependent variable measuring a traveler's sense of satisfaction, while a traveler's involvement and AR adoption of necessary transportation apps are constructed as independent variables. The model was proposed to explore the impact of travel satisfaction on overall life satisfaction, focusing on the role of traveling involvement as the first variable in exploring the impact of travel satisfaction on overall quality of life. Furthermore, AR technology adoption behavior refers to people using traveling apps before and during traveling to fulfill travel needs, obtain details about locations, make proper arrangements, and access other facilities. Transportation apps and travelers' involvement both played significant roles in travel-satisfaction development and overall life satisfaction; both variables had a positive effect on travel satisfaction and life satisfaction. The results also revealed that AR mobile travel applications combined with traveler involvement could help improve individual overseas residents' travel satisfaction, and travel satisfaction in turn provides stronger feelings of satisfaction with life in South Korea. Full article
(This article belongs to the Special Issue Virtual Reality, Augmented Reality, and Human-Computer Interaction)
17 pages, 26907 KiB  
Article
Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation
by Konlakorn Wongpatikaseree, Sattaya Singkul, Narit Hnoohom and Sumeth Yuenyong
Big Data Cogn. Comput. 2022, 6(3), 79; https://doi.org/10.3390/bdcc6030079 - 15 Jul 2022
Cited by 8 | Viewed by 4756
Abstract
Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language that has a smaller data size than high-resource languages such as German. This paper describes the framework of using a pretrained-model-based front-end and back-end network to [...] Read more.
Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language that has a smaller data size than high-resource languages such as German. This paper describes a framework that uses a pretrained-model-based front-end and back-end network to adapt feature spaces from the speech recognition domain to the speech emotion classification domain. It consists of two parts: a speech recognition front-end network and a speech emotion recognition back-end network. For speech recognition, Wav2Vec2 is the state-of-the-art for high-resource languages, while XLSR is used for low-resource languages. Wav2Vec2 and XLSR provide generalized end-to-end learning for speech understanding, producing feature space representations from feature encoding in the speech recognition domain; this is one reason why they were selected as the pretrained front-end networks. The pre-trained Wav2Vec2 and XLSR models are used as front-end networks and fine-tuned for specific languages using the Common Voice 7.0 dataset. The feature vectors of the front-end network are then input to the back-end networks, which include convolution time reduction (CTR) and linear mean encoding transformation (LMET). Experiments using two different datasets show that our proposed framework can outperform the baselines in terms of unweighted and weighted accuracies. Full article
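A minimal sketch of the front-end idea, assuming placeholder model names and random audio: a pretrained Wav2Vec2 encoder turns raw speech into frame-level features that a back-end emotion classifier (CTR/LMET in the paper) would consume; an XLSR checkpoint such as facebook/wav2vec2-large-xlsr-53 would play the same role for a low-resource language.

```python
# A minimal front-end sketch (not the full framework): extract Wav2Vec2 features from raw audio.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-base-960h"                   # placeholder checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
encoder = Wav2Vec2Model.from_pretrained(name)

waveform = torch.randn(16000)                          # 1 s of fake 16 kHz audio
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    features = encoder(**inputs).last_hidden_state     # (1, frames, 768) front-end features
print(features.shape)                                  # these would feed the emotion back-end
```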
21 pages, 1199 KiB  
Article
Enhancing Marketing Provision through Increased Online Safety That Imbues Consumer Confidence: Coupling AI and ML with the AIDA Model
by Yang-Im Lee and Peter R. J. Trim
Big Data Cogn. Comput. 2022, 6(3), 78; https://doi.org/10.3390/bdcc6030078 - 12 Jul 2022
Cited by 5 | Viewed by 7711
Abstract
To enhance the effectiveness of artificial intelligence (AI) and machine learning (ML) in online retail operations and avoid succumbing to digital myopia, marketers need to be aware of the different approaches to utilizing AI/ML in terms of the information they make available to [...] Read more.
To enhance the effectiveness of artificial intelligence (AI) and machine learning (ML) in online retail operations and avoid succumbing to digital myopia, marketers need to be aware of the different approaches to utilizing AI/ML in terms of the information they make available to appropriate groups of consumers. This can be viewed as utilizing AI/ML to improve the customer journey experience. Reflecting on this, the main question to be addressed is: how can retailers utilize big data through the implementation of AI/ML to improve the efficiency of their marketing operations so that customers feel safe buying online? To answer this question, we conducted a systematic literature review and posed several subquestions that resulted in insights into why marketers need to pay specific attention to AI/ML capability. We explain how different AI/ML tools/functionalities can be related to different stages of the AIDA (Awareness, Interest, Desire, and Action) model, which in turn helps retailers to recognize potential opportunities as well as increase consumer confidence. We outline how digital myopia can be reduced by focusing on human inputs. Although challenges still exist, it is clear that retailers need to identify the boundaries in terms of AI/ML’s ability to enhance the company’s business model. Full article
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)
22 pages, 1108 KiB  
Article
We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model
by Lihardo Faisal Simanjuntak, Rahmad Mahendra and Evi Yulianti
Big Data Cogn. Comput. 2022, 6(3), 77; https://doi.org/10.3390/bdcc6030077 - 7 Jul 2022
Cited by 15 | Viewed by 3665
Abstract
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works [...] Read more.
Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works attempted to predict location from English-language tweets. In this study, we attempted to predict the location of Indonesian tweets. We utilized machine learning approaches, i.e., long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT), to infer Twitter users' home locations using the display name in the profile, the user description, and the user's tweets. By concatenating the display name, description, and aggregated tweets, the model achieved its best accuracy of 0.77. The IndoBERT model outperformed several baseline models. Full article
(This article belongs to the Topic Machine and Deep Learning)
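The setup can be sketched as a standard sequence-classification fine-tune; the label set, example text, and the IndoBERT checkpoint name below are assumptions for illustration, not details taken from the paper.

```python
# A condensed sketch: concatenate display name, description, and aggregated tweets into one
# input and fine-tune an Indonesian BERT encoder to predict a home-location class.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LOCATIONS = ["Jakarta", "Bali", "Yogyakarta"]            # toy label set
name = "indobenchmark/indobert-base-p1"                  # assumed IndoBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=len(LOCATIONS))

text = "Rina | pecinta pantai dan kopi | menikmati sunset di Kuta lagi"  # name + bio + tweets
batch = tokenizer(text, truncation=True, return_tensors="pt")
logits = model(**batch).logits                           # fine-tuning on labeled users comes next
print(LOCATIONS[int(logits.argmax())])
```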
13 pages, 1267 KiB  
Article
Optimizing Operation Room Utilization—A Prediction Model
by Benyamine Abbou, Orna Tal, Gil Frenkel, Robyn Rubin and Nadav Rappoport
Big Data Cogn. Comput. 2022, 6(3), 76; https://doi.org/10.3390/bdcc6030076 - 6 Jul 2022
Cited by 8 | Viewed by 5186
Abstract
Background: Operating rooms are the core of hospitals. They are a primary source of revenue and are often seen as one of the bottlenecks in the medical system. Many efforts are made to increase throughput, reduce costs, and maximize incomes, as well as [...] Read more.
Background: Operating rooms are the core of hospitals. They are a primary source of revenue and are often seen as one of the bottlenecks in the medical system. Many efforts are made to increase throughput, reduce costs, and maximize incomes, as well as optimize clinical outcomes and patient satisfaction. We trained a predictive model on the length of surgeries to improve the productivity and utility of operative rooms in general hospitals. Methods: We collected clinical and administrative data for the last 10 years from two large general public hospitals in Israel. We trained a machine learning model to give the expected length of surgery using pre-operative data. These data included diagnoses, laboratory tests, risk factors, demographics, procedures, anesthesia type, and the main surgeon’s level of experience. We compared our model to a naïve model that represented current practice. Findings: Our prediction model achieved better performance than the naïve model and explained almost 70% of the variance in surgery durations. Interpretation: A machine learning-based model can be a useful approach for increasing operating room utilization. Among the most important factors were the type of procedures and the main surgeon’s level of experience. The model enables the harmonizing of hospital productivity through wise scheduling and matching suitable teams for a variety of clinical procedures for the benefit of the individual patient and the system as a whole. Full article
(This article belongs to the Special Issue Data Science in Health Care)
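An illustrative sketch of the prediction task, assuming synthetic pre-operative features and a gradient-boosting regressor rather than the hospitals' actual model and data:

```python
# Regress surgery duration on pre-operative features and report R^2 (explained variance).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 30, n),        # procedure code (hypothetical encoding)
    rng.integers(0, 4, n),         # anesthesia type
    rng.integers(1, 25, n),        # main surgeon's years of experience
])
minutes = 40 + 6 * X[:, 0] + 10 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 20, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, minutes, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(f"explained variance (R^2): {r2_score(y_te, model.predict(X_te)):.2f}")
```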
9 pages, 258 KiB  
Opinion
Environmental Justice and the Use of Artificial Intelligence in Urban Air Pollution Monitoring
by Tatyana G. Krupnova, Olga V. Rakova, Kirill A. Bondarenko and Valeria D. Tretyakova
Big Data Cogn. Comput. 2022, 6(3), 75; https://doi.org/10.3390/bdcc6030075 - 5 Jul 2022
Cited by 9 | Viewed by 4273
Abstract
The main aims of urban air pollution monitoring are to optimize the interaction between humanity and nature, to combine and integrate environmental databases, and to develop sustainable approaches to the production and the organization of the urban environment. One of the main applications [...] Read more.
The main aims of urban air pollution monitoring are to optimize the interaction between humanity and nature, to combine and integrate environmental databases, and to develop sustainable approaches to the production and organization of the urban environment. One of the main applications of urban air pollution monitoring is exposure assessment and public health studies. Artificial intelligence (AI) and machine learning (ML) approaches can be used to build air pollution models to predict pollutant concentrations and assess environmental and health risks. Air pollution data can be uploaded into AI/ML models to estimate different exposure levels within different communities. The correlation between exposure estimates and public health surveys is important for assessing health risks. These aspects are critical where environmental injustice is concerned. Computational approaches should efficiently manage, visualize, and integrate large datasets. Effective data integration and management are key to the successful application of computational intelligence approaches in ecology. In this paper, we consider some of these constraints and discuss possible ways to overcome current problems and environmental injustice. The most successful global approach is the development of the smart city; however, such an approach can only increase environmental injustice, as not all regions have access to AI/ML technologies. It is challenging to develop successful regional projects for the analysis of environmental data under the current complicated operating conditions, while taking into account the time, computing power, and other constraints in the context of environmental injustice. Full article
(This article belongs to the Special Issue Big Data and Internet of Things)
15 pages, 695 KiB  
Article
Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets
by Ran Deng and Fedor Duzhin
Big Data Cogn. Comput. 2022, 6(3), 74; https://doi.org/10.3390/bdcc6030074 - 5 Jul 2022
Cited by 6 | Viewed by 4249
Abstract
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural [...] Read more.
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We have found that deep learning models are more accurate in this task than topological data analysis. However, assembling a deep learning model with topological data analysis significantly improves the model’s accuracy if the available training set is very small. Full article
(This article belongs to the Topic Machine and Deep Learning)
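A rough sketch of the general recipe, under the assumption that a document is represented as a point cloud of word embeddings: persistent homology summaries are computed with the ripser package and appended to whatever features the deep model produces. This is the idea in outline, not the paper's pipeline.

```python
# Summarize a document's word-embedding point cloud with persistent homology and build
# a small topological feature vector to concatenate with deep-model features.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(60, 50))         # hypothetical: 60 word vectors of dimension 50

diagrams = ripser(doc_embeddings, maxdim=1)["dgms"]
h1 = diagrams[1]                                    # 1-dimensional homology (loops)
finite = h1[np.isfinite(h1[:, 1])]
total_persistence = float(np.sum(finite[:, 1] - finite[:, 0]))

topo_features = np.array([len(h1), total_persistence])
print(topo_features)                                # concatenated with the deep model's features
```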
19 pages, 491 KiB  
Article
Digital Technologies and the Role of Data in Cultural Heritage: The Past, the Present, and the Future
by Vassilis Poulopoulos and Manolis Wallace
Big Data Cogn. Comput. 2022, 6(3), 73; https://doi.org/10.3390/bdcc6030073 - 4 Jul 2022
Cited by 19 | Viewed by 8019
Abstract
Is culture considered to be our past, our roots, ancient ruins, or an old piece of art? Culture is all the factors that define who we are, how we act and interact in our world, in our daily activities, in our personal and [...] Read more.
Is culture considered to be our past, our roots, ancient ruins, or an old piece of art? Culture is all the factors that define who we are, how we act and interact in our world, in our daily activities, in our personal and public relations, in our life. Culture is all the things we are not obliged to do. However, today, we live in a mixed environment, an environment that is a combination of the "offline" and the online, digital world. In this mixed environment, it is technology that defines our behaviour, technology that unites people in a large world and that ultimately defines a status of "monoculture". In this article, we examine the role of technology, and especially big data, in relation to culture. We present the advances that led to paradigm shifts in the research area of cultural informatics, and forecast the future of culture as it will be defined in this mixed world. Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage)
16 pages, 4437 KiB  
Article
Lightweight AI Framework for Industry 4.0 Case Study: Water Meter Recognition
by Jalel Ktari, Tarek Frikha, Monia Hamdi, Hela Elmannai and Habib Hmam
Big Data Cogn. Comput. 2022, 6(3), 72; https://doi.org/10.3390/bdcc6030072 - 1 Jul 2022
Cited by 21 | Viewed by 3999
Abstract
The evolution of applications in telecommunication, network, computing, and embedded systems has led to the emergence of the Internet of Things and Artificial Intelligence. The combination of these technologies enabled improving productivity by optimizing consumption and facilitating access to real-time information. In this [...] Read more.
The evolution of applications in telecommunications, networking, computing, and embedded systems has led to the emergence of the Internet of Things and Artificial Intelligence. The combination of these technologies has improved productivity by optimizing consumption and facilitating access to real-time information. This work focuses on the Industry 4.0 and Smart City paradigms and proposes a new approach to monitoring and tracking water consumption using optical character recognition (OCR) together with artificial intelligence, in particular the YOLOv4 deep learning model. The goal of this work is to provide optimized results in real time. The recognition rate obtained with the proposed algorithms is around 98%. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Deep Federated Learning Models)
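The detection stage of such a pipeline can be sketched with a YOLOv4 network loaded through OpenCV's DNN module, as below. The config/weights file names, the image file, and the digit-class assumption are placeholders; the paper's actual pipeline, including its OCR stage, may differ.
```python
# Sketch: detect digit regions on a water-meter image with a YOLOv4 model
# loaded via OpenCV's DNN API. File names are hypothetical placeholders.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4-meter.cfg", "yolov4-meter.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1.0 / 255, swapRB=True)

image = cv2.imread("water_meter.jpg")
class_ids, scores, boxes = model.detect(image, confThreshold=0.5, nmsThreshold=0.4)

# Order detections left to right and concatenate the predicted digit classes
# to reconstruct the consumption reading (assumes classes 0-9 map to digits).
ordered = sorted(zip(boxes, class_ids), key=lambda item: item[0][0])
reading = "".join(str(int(cls)) for _, cls in ordered)
print("Estimated meter reading:", reading)
```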
25 pages, 3658 KiB  
Article
A Comprehensive Spark-Based Layer for Converting Relational Databases to NoSQL
by Manal A. Abdel-Fattah, Wael Mohamed and Sayed Abdelgaber
Big Data Cogn. Comput. 2022, 6(3), 71; https://doi.org/10.3390/bdcc6030071 - 27 Jun 2022
Cited by 1 | Viewed by 3763
Abstract
Currently, the continuous massive growth in the size, variety, and velocity of data is defined as big data. Relational databases have a limited ability to work with big data. Consequently, not only structured query language (NoSQL) databases were utilized to handle big data [...] Read more.
Currently, the continuous massive growth in the size, variety, and velocity of data is referred to as big data. Relational databases have a limited ability to work with big data. Consequently, not-only-SQL (NoSQL) databases have been utilized to handle big data because, unlike traditional relational databases, NoSQL represents data in diverse models and uses a variety of query languages. Therefore, using NoSQL has become essential, and many studies have attempted to propose different layers to convert relational databases to NoSQL; however, most of them target only one or two NoSQL models and evaluate their layers on a single node rather than in a distributed environment. This study proposes a Spark-based layer for mapping relational databases to NoSQL models, focusing on the document, column, and key–value NoSQL databases. The proposed Spark-based layer comprises two parts. The first part converts relational databases to document, column, and key–value databases and encompasses two phases: a metadata analyzer of relational databases, and Spark-based transformation and migration. The second part focuses on executing structured query language (SQL) queries on the NoSQL databases. The suggested layer was applied and compared with Unity, as it has similar components and features and supports sub-queries and join operations in a single-node environment. The experimental results show that the proposed layer outperformed Unity in terms of query execution time by a factor of three. In addition, the proposed layer was applied to multi-node clusters using different scenarios, and the results show that integrating the Spark cluster with NoSQL databases on multi-node clusters provided better read and write performance as the dataset size increased than using a single node. Full article
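The kind of transformation such a layer performs can be sketched as reading a relational table through JDBC into a Spark DataFrame, reshaping it into the target document model, and writing it to a NoSQL store. The JDBC URL, credentials, table/column names, and the use of the MongoDB Spark connector below are assumptions for illustration, not the authors' layer.
```python
# Sketch: migrate one relational table to a document store with Spark.
# Connection details and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdb-to-nosql-sketch").getOrCreate()

# 1) Read the relational table through JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://localhost:3306/shop")
          .option("dbtable", "orders")
          .option("user", "reader")
          .option("password", "secret")
          .load())

# 2) Reshape rows into the target document model (e.g., nest item fields).
docs = orders.selectExpr("order_id", "customer_id",
                         "struct(product_id, quantity, price) AS item")

# 3) Write to a NoSQL target; the MongoDB Spark connector is assumed here.
(docs.write.format("mongodb")
     .option("spark.mongodb.write.connection.uri",
             "mongodb://localhost:27017/shop.orders")
     .mode("append")
     .save())
```
A column-family or key–value target would replace step 3 with the corresponding connector while keeping the same read-and-reshape pattern.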
20 pages, 6876 KiB  
Article
DeepWings©: Automatic Wing Geometric Morphometrics Classification of Honey Bee (Apis mellifera) Subspecies Using Deep Learning for Detecting Landmarks
by Pedro João Rodrigues, Walter Gomes and Maria Alice Pinto
Big Data Cogn. Comput. 2022, 6(3), 70; https://doi.org/10.3390/bdcc6030070 - 27 Jun 2022
Cited by 11 | Viewed by 5536
Abstract
Honey bee classification by wing geometric morphometrics entails the first step of manual annotation of 19 landmarks in the forewing vein junctions. This is a time-consuming and error-prone endeavor, with implications for classification accuracy. Herein, we developed a software called DeepWings© that overcomes [...] Read more.
Honey bee classification by wing geometric morphometrics entails, as its first step, the manual annotation of 19 landmarks at the forewing vein junctions. This is a time-consuming and error-prone endeavor, with implications for classification accuracy. Herein, we developed a software tool called DeepWings© that overcomes this constraint in wing geometric morphometrics classification by automatically detecting the 19 landmarks on digital images of the right forewing. We used a database containing 7634 forewing images, including 1864 analyzed by F. Ruttner in the original delineation of 26 honey bee subspecies, to tune a convolutional neural network as a wing detector, a deep learning U-Net as a landmark segmenter, and a support vector machine as a subspecies classifier. The implemented MobileNet wing detector achieved a mAP of 0.975, and the landmark segmenter detected the 19 landmarks with 91.8% accuracy and an average positional precision of 0.943 relative to manually annotated landmarks. The subspecies classifier, in turn, presented an average accuracy of 86.6% for the 26 subspecies and 95.8% for a subset of five important subspecies. The final implementation of the system showed good speed performance, requiring only 14 s to process 10 images. DeepWings© is very user-friendly and is the first fully automated software, offered as a free web service, for honey bee classification from wing geometric morphometrics. DeepWings© can be used for honey bee breeding, conservation, and even scientific purposes, as it provides the coordinates of the landmarks in Excel format, facilitating the work of research teams using classical identification approaches and alternative analytical tools. Full article
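The final stage of the described pipeline, classifying subspecies from the detected landmark coordinates with a support vector machine, can be sketched as follows. The array shapes, preprocessing, and SVM hyperparameters are assumptions for illustration, not the DeepWings© implementation.
```python
# Sketch: classify subspecies from 19 (x, y) wing landmarks with an SVM.
# `landmarks` is an (n_wings, 19, 2) array and `labels` the subspecies names;
# both are hypothetical placeholders for output of a landmark detector.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def classify_subspecies(landmarks: np.ndarray, labels: np.ndarray):
    X = landmarks.reshape(len(landmarks), -1)          # flatten to 38 features
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    print("Cross-validated accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
    return clf.fit(X, labels)
```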