Information doi: 10.3390/info15030168
Authors: Jiacun Wang Guipeng Xi Xiwang Guo Shujin Qin Henry Han
The scheduling of disassembly lines is of great importance for achieving optimized productivity. In this paper, we address the Hybrid Disassembly Line Balancing Problem that combines linear and U-shaped disassembly lines, considers multi-skilled workers, and targets profit and carbon emissions. In contrast to common reinforcement learning approaches that typically employ weighting strategies to solve multi-objective problems, our approach innovatively incorporates non-dominated ranking directly into the reward function. The exploration of Pareto-frontier or better solutions is guided by comparing the performance of candidate solutions and dynamically adjusting rewards when repeated solutions occur. The experimental results show that the multi-objective Advantage Actor-Critic algorithm based on Pareto optimization performs best across six experimental cases of different scales, achieving superior metric values in 70% of the comparisons. In some of the experimental cases in this paper, the solutions produced by the multi-objective Advantage Actor-Critic algorithm show some advantages over other popular algorithms such as the Deep Deterministic Policy Gradient algorithm, the Soft Actor-Critic algorithm, and the Non-Dominated Sorting Genetic Algorithm II. This further corroborates the effectiveness of our proposed solution.
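As a rough illustration of how a non-dominated comparison can be folded into a reward signal (the authors' actual reward design is not reproduced here), the following Python sketch scores a candidate (profit, emissions) pair against an archive of earlier solutions and dampens the reward when the same solution keeps reappearing; all names and constants are illustrative.

```python
def dominates(a, b):
    """Return True if solution a Pareto-dominates b.

    Each solution is a (profit, emissions) tuple: profit is maximized,
    carbon emissions are minimized.
    """
    better_or_equal = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return better_or_equal and strictly_better


def pareto_reward(candidate, archive, repeat_count, penalty=0.1):
    """Illustrative reward: positive if the candidate is non-dominated by the
    archive of previously seen solutions, negative if it is dominated, with a
    decaying term when the same solution keeps reappearing."""
    if any(dominates(old, candidate) for old in archive):
        reward = -1.0
    else:
        reward = 1.0
    # dampen the reward for repeated solutions to encourage exploration
    reward -= penalty * repeat_count
    return reward


# usage sketch: archive holds (profit, emissions) of earlier episodes
archive = [(120.0, 35.0), (100.0, 30.0)]
print(pareto_reward((130.0, 35.0), archive, repeat_count=0))  # non-dominated -> positive
print(pareto_reward((90.0, 40.0), archive, repeat_count=2))   # dominated and repeated -> negative
```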
Information doi: 10.3390/info15030167
Authors: Marios Arampatzis Maria Pempetzoglou Athanasios Tsadiras
Effective inventory management is crucial for businesses to balance minimizing holding costs while optimizing ordering strategies. Monthly or sporadic orders over time may lead to high ordering or holding costs, respectively. In this study, we introduce two novel algorithms designed to optimize ordering replenishment quantities, minimizing total replenishment and holding costs over a planning horizon for both partially loaded and fully loaded trucks. The novelty of the first algorithm is that it extends the classical Wagner–Whitin approach by incorporating various additional cost elements, stock retention considerations, and warehouse capacity constraints, making it more suitable for real-world problems. The second algorithm presented in this study is a variation of the first algorithm, with its contribution being that it incorporates the requirement of several suppliers to receive order quantities corresponding only to fully loaded trucks. These two algorithms are implemented in Python, creating the software tool called “Inventory Cost Minimizing tool” (ICM). This tool takes relevant data inputs and outputs optimal order timing and quantities, minimizing total costs. This research offers practical and novel solutions for businesses seeking to streamline their inventory management processes and reduce overall expenses.
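For readers unfamiliar with the baseline the first algorithm extends, here is a minimal Python sketch of the classical Wagner–Whitin dynamic program; the paper's additional cost elements, stock retention, warehouse capacity, and truck-loading constraints are not modeled, and all variable names are illustrative.

```python
def wagner_whitin(demand, ordering_cost, holding_cost):
    """Classical Wagner-Whitin lot sizing: choose order periods that cover
    future demand so that total ordering + holding cost is minimized.

    demand[t]      -- demand in period t
    ordering_cost  -- fixed cost per order placed
    holding_cost   -- cost of holding one unit for one period
    Returns (minimum total cost, list of periods in which to order).
    """
    n = len(demand)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[t]: min cost to satisfy demand of periods 0..t-1
    best[0] = 0.0
    decision = [0] * (n + 1)

    for t in range(1, n + 1):
        for j in range(t):   # last order placed in period j covers periods j..t-1
            hold = sum(holding_cost * demand[k] * (k - j) for k in range(j, t))
            cost = best[j] + ordering_cost + hold
            if cost < best[t]:
                best[t], decision[t] = cost, j

    # backtrack the chosen order periods
    orders, t = [], n
    while t > 0:
        orders.append(decision[t])
        t = decision[t]
    return best[n], sorted(orders)


# toy example: 6-period demand, fixed ordering cost 100, unit holding cost 1
print(wagner_whitin([20, 50, 10, 40, 0, 30], ordering_cost=100, holding_cost=1))
```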
Information doi: 10.3390/info15030166
Authors: Zhao Xiong Jiang Wu
Malaria is one of the major global health threats. Microscopic examination has been designated as the “gold standard” for malaria detection by the World Health Organization. However, it heavily relies on the experience of doctors, resulting in long diagnosis times, low efficiency, and a high risk of missed or misdiagnosed cases. To alleviate the pressure on healthcare workers and achieve automated malaria detection, numerous target detection models have been applied to blood smear examination for malaria cells. This paper introduces the multi-level attention split network (MAS-Net), which improves overall detection performance by addressing the issues of information loss for small targets and the mismatch between the detection receptive field and target size. Specifically, we propose the split contextual attention structure (SPCot), which fully utilizes contextual information and avoids excessive channel compression operations, reducing information loss and improving the overall detection performance of malaria cells. In the shallow detection layer, we introduce the multi-scale receptive field detection head (MRFH), which better matches targets of different scales and provides a better detection receptive field, thus enhancing the performance of malaria cell detection. On the public NLM malaria dataset of Plasmodium vivax-infected human blood smears provided by the National Institutes of Health, the improved model achieves an average accuracy of 75.9%. Considering the practical application of the model, we introduce the Performance-aware Approximation of Global Channel Pruning (PAGCP) to compress the model size while sacrificing only a small amount of accuracy. Compared to other state-of-the-art (SOTA) methods, the proposed MAS-Net achieves competitive results.
Information doi: 10.3390/info15030165
Authors: Retno Kusumaningrum Selvi Fitria Khoerunnisa Khadijah Khadijah Muhammad Syafrudin
The mangrove ecosystem is crucial for addressing climate change and supporting marine life. To preserve this ecosystem, understanding community awareness is essential. While latent Dirichlet allocation (LDA) is commonly used for this, it has drawbacks such as high resource requirements and an inability to capture semantic nuances. We propose a technique using Sentence-BERT and K-Means Clustering for topic identification, addressing these drawbacks. Analyzing mangrove-related Twitter data in Indonesian from 1 September 2021 to 31 August 2022 revealed nine topics. The visualized tweet frequency indicates a growing public awareness of the mangrove ecosystem, showcasing collaborative efforts between the government and society. Our method proves effective and can be extended to other domains.
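A minimal sketch of the embedding-plus-clustering pipeline described above, assuming the sentence-transformers and scikit-learn packages; the checkpoint, the toy tweets, and the cluster count are placeholders (the study itself reports nine topics over a full Twitter corpus).

```python
# pip install sentence-transformers scikit-learn
from collections import Counter

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# assumed multilingual SBERT checkpoint; the paper's exact model may differ
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

tweets = [
    "penanaman mangrove bersama masyarakat pesisir",  # mangrove planting with coastal communities
    "mangrove melindungi pantai dari abrasi",         # mangroves protect the coast from erosion
    "wisata hutan mangrove di akhir pekan",           # weekend trip to a mangrove forest
    "sampah plastik merusak ekosistem mangrove",      # plastic waste damages the mangrove ecosystem
]

# encode tweets into normalized sentence embeddings
embeddings = model.encode(tweets, normalize_embeddings=True)

# the study reports nine topics over the full corpus; with this toy corpus we use two
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(embeddings)

# cluster sizes; per-cluster keywords can then be read off the most frequent terms
print(Counter(kmeans.labels_))
```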
Information doi: 10.3390/info15030164
Authors: Max Schrötter Andreas Niemann Bettina Schnor
Over the last few years, a plethora of papers presenting machine-learning-based approaches for intrusion detection have been published. However, the majority of those papers do not compare their results with a proper baseline of a signature-based intrusion detection system, thus violating good machine learning practices. In order to evaluate the pros and cons of the machine-learning-based approach, we replicated a research study that uses a deep neural network model for intrusion detection. The results of our replicated research study expose several systematic problems with the used datasets and evaluation methods. In our experiments, a signature-based intrusion detection system with a minimal setup was able to outperform the tested model even under small traffic changes. Testing the replicated neural network on a new dataset recorded in the same environment with the same attacks using the same tools showed that the accuracy of the neural network dropped to 54%. Furthermore, the often-claimed advantage of being able to detect zero-day attacks could not be seen in our experiments.
Information doi: 10.3390/info15030163
Authors: Grace-Mercure Bakanina Kissanga Hasan Zulfiqar Shenghan Gao Sophyani Banaamwini Yussif Biffon Manyura Momanyi Lin Ning Hao Lin Cheng-Bing Huang
Accurate prediction of subcellular localization of viral proteins is crucial for understanding their functions and developing effective antiviral drugs. However, this task poses a significant challenge, especially when relying on expensive and time-consuming classical biological experiments. In this study, we introduced a computational model called E-MuLA, based on a deep learning network that combines multiple local attention modules to enhance feature extraction from protein sequences. The superior performance of the E-MuLA has been demonstrated through extensive comparisons with LSTM, CNN, AdaBoost, decision trees, KNN, and other state-of-the-art methods. It is noteworthy that the E-MuLA achieved an accuracy of 94.87%, specificity of 98.81%, and sensitivity of 84.18%, indicating that E-MuLA has the potential to become an effective tool for predicting virus subcellular localization.
Information doi: 10.3390/info15030162
Authors: Leon Kopitar Iztok Fister Gregor Stiglic
Introduction: Type 2 diabetes mellitus is a major global health concern, but interpreting machine learning models for diagnosis remains challenging. This study investigates combining association rule mining with advanced natural language processing to improve both diagnostic accuracy and interpretability. This novel approach of using pretrained transformers for diabetes classification on tabular data has not been explored before. Methods: The study used the Pima Indians Diabetes dataset to investigate Type 2 diabetes mellitus. Python and Jupyter Notebook were employed for analysis, with the NiaARM framework for association rule mining. LightGBM and the dalex package were used for performance comparison and feature importance analysis, respectively. SHAP was used for local interpretability. OpenAI GPT version 3.5 was utilized for outcome prediction and interpretation. The source code is available on GitHub. Results: NiaARM generated 350 rules to predict diabetes. LightGBM performed better than the GPT-based model. A comparison of GPT and NiaARM rules showed disparities, prompting a similarity score analysis. LightGBM’s decision making leaned heavily on glucose, age, and BMI, as highlighted in feature importance rankings. Beeswarm plots demonstrated how feature values correlate with their influence on diagnosis outcomes. Discussion: Combining association rule mining with GPT for Type 2 diabetes mellitus classification yields limited effectiveness. Enhancements like preprocessing and hyperparameter tuning are required. Interpretation challenges and GPT’s dependency on provided rules indicate the necessity for prompt engineering and similarity score methods. Variations in feature importance rankings underscore the complexity of T2DM. Concerns regarding GPT’s reliability emphasize the importance of iterative approaches for improving prediction accuracy.
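The LightGBM-plus-SHAP portion of the pipeline can be sketched as follows; the file path, hyperparameters, and train/test split are assumptions, and the NiaARM and GPT components are omitted.

```python
# pip install lightgbm shap scikit-learn pandas
import pandas as pd
import lightgbm as lgb
import shap
from sklearn.model_selection import train_test_split

# Pima Indians Diabetes data: 8 clinical features and a binary Outcome column
df = pd.read_csv("diabetes.csv")          # path is an assumption
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# gradient-boosted baseline for comparison against the GPT-based classifier
clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))

# local interpretability with SHAP
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)      # beeswarm-style plot of feature influence
```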
Information doi: 10.3390/info15030161
Authors: Anastasios Fanariotis Theofanis Orphanoudakis Vassilis Fotopoulos
Having as a main objective the exploration of power efficiency of microcontrollers running machine learning models, this manuscript contrasts the performance of two types of state-of-the-art microcontrollers, namely ESP32 with an LX6 core and ESP32-S3 with an LX7 core, focusing on the impact of process acceleration technologies like cache memory and vectoring. The research employs experimental methods, where identical machine learning models are run on both microcontrollers under varying conditions, with particular attention to cache optimization and vector instruction utilization. Results indicate a notable difference in power efficiency between the two microcontrollers, directly linked to their respective process acceleration capabilities. The study concludes that while both microcontrollers show efficacy in running machine learning models, ESP32-S3 with an LX7 core demonstrates superior power efficiency, attributable to its advanced vector instruction set and optimized cache memory usage. These findings provide valuable insights for the design of power-efficient embedded systems supporting machine learning for a variety of applications, including IoT and wearable devices, ambient intelligence, and edge computing and pave the way for future research in optimizing machine learning models for low-power, embedded environments.
Information doi: 10.3390/info15030160
Authors: Khaled Rabieh Rasha Samir Marianne A. Azer
Rapid advances in technology and shifting tastes among motorists have reworked the contemporary automobile production sector. Driving is now much safer and more convenient than ever before thanks to a plethora of new technologies and apps. Nevertheless, millions of people are injured in car accidents every year and need emergency care, despite the fact that automobiles are networked and equipped with several sensors and radars for collision avoidance, and sadly, the fatality rate is growing. Vehicle and pedestrian collisions are still a serious problem, making it imperative to advance methods that prevent them. This paper refines our previous efficient VANET-based pedestrian safety system based on two-way communication between smart cars and the cell phones of vulnerable road users. We implemented the scheme using C and NS3 to simulate different traffic scenarios. Our objective is to measure the additional overhead required to protect vulnerable road users. We show that our proposed scheme adds only a small amount of additional overhead and successfully satisfies the stringent criteria of safety applications.
Information doi: 10.3390/info15030159
Authors: Tao Tang Yuting Cui Rui Feng Deliang Xiang
With the development of deep learning in the field of computer vision, convolutional neural network models and attention mechanisms have been widely applied to SAR image target recognition. Existing improvements to convolutional neural network attention for SAR image target recognition focus on spatial and channel information but lack research on the relationship and recognition mechanism between the two. In response to this issue, this article proposes a hybrid attention module and introduces a Mixed Attention (MA) mechanism module into the MobileNetV2 network. The proposed MA mechanism fully considers the comprehensive calculation of spatial attention (SPA), channel attention (CHA), and coordinate attention (CA). It comprehensively weights input feature maps to enhance the features of the regions of interest, thereby improving the recognition rate of vehicle targets in SAR images. The superiority of our algorithm was verified through experiments on the MSTAR dataset.
Information doi: 10.3390/info15030157
Authors: Yang Zhang Yuan Feng Shiqi Wang Zhicheng Tang Zhenduo Zhai Reid Viegut Lisa Webb Andrew Raedeke Yi Shang
Waterfowl population monitoring is essential for wetland conservation. Lately, deep learning techniques have shown promising advancements in detecting waterfowl in aerial images. In this paper, we present a performance evaluation of several popular supervised and semi-supervised deep learning models for waterfowl detection in aerial images using four new image datasets containing 197,642 annotations. The best-performing model, Faster R-CNN, achieved 95.38% accuracy in terms of mAP. Semi-supervised learning models outperformed supervised models when the same amount of labeled data was used for training. Additionally, we present a performance evaluation of several deep learning models on waterfowl classification in aerial images using a new real-bird classification dataset consisting of 6,986 examples and a new decoy classification dataset consisting of about 10,000 examples for each of 20 categories. The best model achieved an accuracy of 91.58% on the decoy dataset and 82.88% on the real-bird dataset.
Information doi: 10.3390/info15030158
Authors: Georgios Karantaidis Constantine Kotropoulos
The detection of computer-generated (CG) multimedia content has become of utmost importance due to the advances in digital image processing and computer graphics. Realistic CG images could be used for fraudulent purposes due to the deceiving recognition capabilities of human eyes. So, there is a need to deploy algorithmic tools for distinguishing CG images from natural ones within multimedia forensics. Here, an end-to-end framework is proposed to tackle the problem of distinguishing CG images from natural ones by utilizing supervised contrastive learning and arbitrary style transfer by means of a two-stage deep neural network architecture. This architecture enables discrimination by leveraging per-class embeddings and generating multiple training samples to increase model capacity without the need for a vast amount of initial data. Stochastic weight averaging (SWA) is also employed to improve the generalization and stability of the proposed framework. Extensive experiments are conducted to investigate the impact of various noise conditions on the classification accuracy and the proposed framework’s generalization ability. The conducted experiments demonstrate superior performance over the existing state-of-the-art methodologies on the public DSTok, Rahmouni, and LSCGB benchmark datasets. Hypothesis testing asserts that the improvements in detection accuracy are statistically significant.
Information doi: 10.3390/info15030156
Authors: Min-Joon Kim Thi-Thu-Huong Le
This study delves into the intricate relationship between fluctuations in the real exchange rate and the trade balance, situated within the framework of a ‘two-country’ trade theory model. Despite a wealth of prior research on the impact of exchange rates on international trade, the precise extent of this influence remains a contentious issue. To bridge this gap, our research adopts a pioneering approach, employing three distinct artificial intelligence-based influence measurement methods: Mean Decrease Impurity (MDI), Permutation Importance Measurement (PIM), and Shapley Additive Explanation (SHAP). These sophisticated techniques provide a nuanced and differentiated perspective, enabling specific and quantitative measurements of the real exchange rate’s impact on the trade balance. The outcomes derived from the application of these innovative methods shed light on the substantial contribution of the real exchange rate to the trade balance. Notably, the real exchange rate (RER) emerges as the second most influential factor within the ‘two-country’ trade model. This empirical evidence, drawn from a panel dataset of 78 nations over the period 1992–2021, addresses crucial gaps in the existing literature, offering a finer-grained understanding of how real exchange rates shape international trade dynamics. Importantly, our study implies that policymakers should recognize the pivotal role of the real exchange rate as a key determinant of trade flow.
Information doi: 10.3390/info15030155
Authors: Alessandro Pinheiro Abílio Oliveira Bráulio Alturas Mónica Cruz
The gaming industry has seen considerable expansion thanks to the ever-increasing and widespread consumption of digital games in different contexts of use and across all age groups. We are witnessing a commercial boom that is awakening the attention of researchers from different scientific areas to an interdisciplinary topic. Digital games consumption has inspired some studies investigating the use and adoption of these games and, in this context, we ask: “how has the use and adoption of digital games by adults been studied?”. We conducted a documental study with a meta-analysis approach to answer this question, considering the most relevant research papers published in the last fifteen years, according to a set of inclusion criteria. The planned objectives include identifying the main dimensions in studies about the use and adoption of digital games by adults, and the findings of this study delineate several dimensions as prospective latent variables for inclusion in future studies within acceptance models for digital games. Furthermore, our research illuminates the socialization dimension, particularly when amalgamated with other conceptual dimensions. This nuanced understanding underscores the intricate interplay between various factors influencing the acceptance and adoption of digital gaming technologies.
Information doi: 10.3390/info15030154
Authors: Nikola Anđelić Sandi Baressi Šegota
This investigation underscores the paramount imperative of discerning network intrusions as a pivotal measure to fortify digital systems and shield sensitive data from unauthorized access, manipulation, and potential compromise. The principal aim of this study is to leverage a publicly available dataset, employing a Genetic Programming Symbolic Classifier (GPSC) to derive symbolic expressions (SEs) endowed with the capacity for exceedingly precise network intrusion detection. In order to augment the classification precision of the SEs, a pioneering Random Hyperparameter Value Search (RHVS) methodology was conceptualized and implemented to discern the optimal combination of GPSC hyperparameter values. The GPSC underwent training via a robust five-fold cross-validation regimen, mitigating class imbalances within the initial dataset through the application of diverse oversampling techniques, thereby engendering balanced dataset iterations. Subsequent to the acquisition of SEs, the identification of the optimal set ensued, predicated upon metrics inclusive of accuracy, area under the receiver operating characteristics curve, precision, recall, and F1-score. The selected SEs were subsequently subjected to rigorous testing on the original imbalanced dataset. The empirical findings of this research underscore the efficacy of the proposed methodology, with the derived symbolic expressions attaining an impressive classification accuracy of 0.9945. Compared to the average state-of-the-art accuracy, this represents an improvement of approximately 3.78%. In summation, this investigation contributes salient insights into the efficacious deployment of GPSC and RHVS for the meticulous detection of network intrusions, thereby accentuating the potential for the establishment of resilient cybersecurity defenses.
Information doi: 10.3390/info15030153
Authors: Yohanes Yohanie Fridelin Panduman Nobuo Funabiki Evianita Dewi Fajrianti Shihao Fang Sritrusta Sukaridhoto
In this paper, we have developed the SEMAR (Smart Environmental Monitoring and Analytics in Real-Time) IoT application server platform for fast deployments of IoT application systems. It provides various integration capabilities for the collection, display, and analysis of sensor data on a single platform. Recently, Artificial Intelligence (AI) has become very popular and widely used in various applications, including IoT. To support this growth, it is essential to identify the current trends of applicable AI technologies in IoT applications and integrate them into SEMAR to enhance its capabilities. In this paper, we first provide a comprehensive review of IoT applications using AI techniques in the literature. They cover predictive analytics, image classification, object detection, text spotting, auditory perception, Natural Language Processing (NLP), and collaborative AI. Next, we identify the characteristics of each technique by considering the key parameters, such as software requirements, input/output (I/O) data types, processing methods, and computations. Third, we design the integration of AI techniques into SEMAR based on the findings. Finally, we discuss use cases of SEMAR for IoT applications with AI techniques. The implementation of the proposed design in SEMAR and its application to IoT systems will be addressed in future work.
Information doi: 10.3390/info15030152
Authors: Javier Domingo-Espiñeira Oscar Fraile-Martínez Cielo Garcia-Montero María Montero Andrea Varaona Francisco J. Lara-Abelenda Miguel A. Ortega Melchor Alvarez-Mon Miguel Angel Alvarez-Mon
Neurological disorders represent the primary cause of disability and the secondary cause of mortality globally. The incidence and prevalence of the most notable neurological disorders are growing rapidly. Analyzing their social and public perception on platforms like Twitter can have a huge impact on the patients, relatives, caregivers and professionals involved in the multidisciplinary management of neurological disorders. In this study, we collected and analyzed all tweets posted in English or Spanish, between 2007 and 2023, referring to headache disorders, dementia, epilepsy, multiple sclerosis, spinal cord injury or Parkinson’s disease using a search engine that has access to 100% of the publicly available tweets. The aim of our work was to deepen our understanding of the public perception of neurological disorders by addressing three major objectives: (1) analyzing the number and temporal evolution of both English and Spanish tweets discussing the most notable neurological disorders (dementias, Parkinson’s disease, multiple sclerosis, spinal cord injury, epilepsy and headache disorders); (2) determining the main thematic content of the Twitter posts and the interest they generated temporally by using topic modeling; and (3) analyzing the sentiments associated with the different topics that were previously collected. Our results show that dementias were, by far, the most common neurological disorders whose treatment was discussed on Twitter. The most discussed topics in the tweets included the impact of neurological diseases on patients and relatives, claims to increase public awareness, social support and research, activities to ameliorate disease development, and existent or potential treatments or approaches to neurological disorders. A significant number of the tweets showed negative emotions like fear, anger and sadness, while some also demonstrated positive emotions like joy. Thus, our study shows that Twitter is an important and active platform implicated in the dissemination and normalization of neurological disorders. However, the number of tweets discussing these different entities is quite inequitable, and a greater intervention and more accurate dissemination of information by different figures and professionals on social media could help to convey a better understanding of the current state, and to project the future state, of neurological diseases for the general public.
Information doi: 10.3390/info15030151
Authors: Florin Leon Marius Gavrilescu Sabina-Adriana Floria Alina Adriana Minea
This paper proposes a classification methodology aimed at identifying correlations between job ad requirements and transversal skill sets, with a focus on predicting the necessary skills for individual job descriptions using a deep learning model. The approach involves data collection, preprocessing, and labeling using ESCO (European Skills, Competences, and Occupations) taxonomy. Hierarchical classification and multi-label strategies are used for skill identification, while augmentation techniques address data imbalance, enhancing model robustness. A comparison between results obtained with English-specific and multi-language sentence embedding models reveals close accuracy. The experimental case studies detail neural network configurations, hyperparameters, and cross-validation results, highlighting the efficacy of the hierarchical approach and the suitability of the multi-language model for the diverse European job market. Thus, a new approach is proposed for the hierarchical classification of transversal skills from job ads.
Information doi: 10.3390/info15030150
Authors: Vasileios Thomopoulos Kostas Tsichlas
In this research, we present the first steps toward developing a data-driven agent-based model (ABM) specifically designed for simulating infectious disease dynamics in Greece. Amidst the ongoing COVID-19 pandemic caused by SARS-CoV-2, this research holds significant importance as it can offer valuable insights into disease transmission patterns and assist in devising effective intervention strategies. To the best of our knowledge, no similar study has been conducted in Greece. We constructed a prototype ABM that utilizes publicly accessible data to accurately represent the complex interactions and dynamics of disease spread in the Greek population. By incorporating demographic information and behavioral patterns, our model captures the specific characteristics of Greece, enabling accurate and context-specific simulations. By using our proposed ABM, we aim to assist policymakers in making informed decisions regarding disease control and prevention. Through the use of simulations, policymakers have the opportunity to explore different scenarios and predict the possible results of various intervention measures. These may include strategies like testing approaches, contact tracing, vaccination campaigns, and social distancing measures. Through these simulations, policymakers can assess the effectiveness and feasibility of these interventions, leading to the development of well-informed strategies aimed at reducing the impact of infectious diseases on the Greek population. This study is an initial exploration toward understanding disease transmission patterns and a first step towards formulating effective intervention strategies for Greece.
Information doi: 10.3390/info15030148
Authors: Zongshun Wang Ce Li Jialin Ma Zhiqiang Feng Limei Xiao
In this study, we introduce a novel framework for the semantic segmentation of point clouds in autonomous driving scenarios, termed PVI-Net. This framework uniquely integrates three different data perspectives—point clouds, voxels, and distance maps—executing feature extraction through three parallel branches. Throughout this process, we ingeniously design a point cloud–voxel cross-attention mechanism and a multi-perspective feature fusion strategy for point images. These strategies facilitate information interaction across different feature dimensions of perspectives, thereby optimizing the fusion of information from various viewpoints and significantly enhancing the overall performance of the model. The network employs a U-Net structure and residual connections, effectively merging and encoding information to improve the precision and efficiency of semantic segmentation. We validated the performance of PVI-Net on the SemanticKITTI and nuScenes datasets. The results demonstrate that PVI-Net surpasses most of the previous methods in various performance metrics.
Information doi: 10.3390/info15030149
Authors: Eike Blomeier Sebastian Schmidt Bernd Resch
In the early stages of a disaster caused by a natural hazard (e.g., flood), the amount of available and useful information is low. To fill this informational gap, emergency responders are increasingly using data from geo-social media to gain insights from eyewitnesses to build a better understanding of the situation and design effective responses. However, filtering relevant content for this purpose poses a challenge. This work thus presents a comparison of different machine learning models (Naïve Bayes, Random Forest, Support Vector Machine, Convolutional Neural Networks, BERT) for semantic relevance classification of flood-related, German-language Tweets. For this, we relied on a four-category training data set created with the help of experts from human aid organisations. We identified fine-tuned BERT as the most suitable model, averaging a precision of 71% with most of the misclassifications occurring across similar classes. We thus demonstrate that our methodology helps in identifying relevant information for more efficient disaster management.
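A hedged sketch of fine-tuning a German BERT checkpoint for four-class relevance classification with the Hugging Face Trainer; the checkpoint name, label scheme, and toy tweets are assumptions, not the study's exact setup.

```python
# pip install transformers torch
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# assumed German BERT checkpoint and four relevance classes, as in the training set
MODEL = "bert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=4)

texts = ["Hochwasser in der Altstadt, Keller laufen voll", "Schönes Wetter heute"]
labels = [0, 3]   # e.g. 0 = directly relevant, 3 = irrelevant (label scheme assumed)

enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class TweetDataset(torch.utils.data.Dataset):
    """Wraps the tokenized tweets and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="flood-bert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=TweetDataset(enc, labels),
)
trainer.train()
```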
Information doi: 10.3390/info15030147
Authors: Fukuharu Tanaka Teruhiro Mizumoto Hirozumi Yamaguchi
Advances in image analysis and deep learning technologies have expanded the use of floor plans, traditionally used for sales and rentals, to include 3D reconstruction and automated design. However, a typical floor plan does not provide detailed information, such as the type and number of outlets and locations affecting the placement of furniture and appliances. Electrical plans, providing details on electrical installations, are intricate due to overlapping symbols and lines and remain unutilized as house manufacturers independently manage them. This paper proposes an analysis method that extracts the house structure, room semantics, connectivities, and specifics of wall and ceiling sockets from electrical plans, achieving robustness to noise and overlaps by leveraging the unique features of symbols and lines. The experiments using 544 electrical plans show that our method achieved better accuracy (+3.6 pt) for recognizing room structures than the state-of-the-art method, 87.2% in identifying room semantics and 97.7% in detecting sockets.
Information doi: 10.3390/info15030146
Authors: Iman I. M. Abu Sulayman Peter Voege Abdelkader Ouda
The increasing significance of data analytics in modern information analysis is underpinned by vast amounts of user data. However, it is only feasible to amass sufficient data for various tasks in specific data-gathering contexts, which either have limited security information or are associated with older applications. There are numerous scenarios where a domain is too new, too specialized, or too secure, or where data are too sparsely available, to adequately support data analytics endeavors. In such cases, synthetic data generation becomes necessary to facilitate further analysis. To address this challenge, we have developed an Algorithm-based Data Generation (ADG) Engine that enables data generation without the need for initial data, relying instead on user behavior patterns, including both normal and abnormal behavior. The ADG Engine uses a structured database system to keep track of users across different types of activity and then uses all of this information to make the generated data as realistic as possible. Our efforts are particularly focused on data analytics, achieved by generating abnormalities within the data and allowing users to customize the ratio of normal to abnormal data. In situations where obtaining additional data through conventional means would be impractical or impossible, especially in the case of specific characteristics like anomaly percentages, algorithmically generated datasets provide a viable alternative. In this paper, we introduce the ADG Engine, which can create coherent datasets for multiple users engaged in different activities and across various platforms, entirely from scratch. The ADG Engine incorporates normal and abnormal ratios within each data platform through the application of core algorithms for time-based and numeric-based anomaly generation. The resulting abnormal percentage is compared against the expected values and ranges from 0.13 to 0.17 abnormal data instances in each column. Along with the normal/abnormal ratio, the results strongly suggest that the ADG Engine has successfully completed its primary task.
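The following sketch illustrates, in spirit only, how a configurable abnormal ratio can drive the generation of time-based and numeric anomalies from scratch; it is not the ADG Engine's actual algorithm, and all field names and thresholds are invented for the example.

```python
import random
from datetime import datetime, timedelta

def generate_login_events(n_events, abnormal_ratio=0.15, seed=42):
    """Generate synthetic login events with a configurable share of
    time-based anomalies (logins far outside working hours) and
    numeric anomalies (unusually long session durations)."""
    rng = random.Random(seed)
    base = datetime(2024, 1, 1, 9, 0)
    events = []
    for i in range(n_events):
        abnormal = rng.random() < abnormal_ratio
        if abnormal:
            hour_offset = rng.choice([rng.uniform(-8, -5), rng.uniform(14, 18)])  # night-time login
            duration = rng.uniform(300, 600)                                      # abnormally long session
        else:
            hour_offset = rng.uniform(0, 9)                                       # within working hours
            duration = rng.uniform(5, 60)
        events.append({
            "user_id": f"user_{rng.randint(1, 20)}",
            "timestamp": base + timedelta(days=i, hours=hour_offset),
            "session_minutes": round(duration, 1),
            "label": "abnormal" if abnormal else "normal",
        })
    return events

sample = generate_login_events(1000, abnormal_ratio=0.15)
print(sum(e["label"] == "abnormal" for e in sample) / len(sample))  # close to the requested ratio
```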
Information doi: 10.3390/info15030145
Authors: Syed As-Sadeq Tahfim Yan Chen
Severe and fatal crashes involving large trucks result in significant social and economic losses for human society. Unfortunately, the notably low proportion of severe and fatal injury crashes involving large trucks creates an imbalance in crash data. Models trained on imbalanced crash data are likely to produce erroneous results. Therefore, there is a need to explore novel sampling approaches for imbalanced crash data, and it is crucial to determine the appropriate combination of a machine learning model, sampling approach, and ratio. This study introduces a novel cluster-based under-sampling technique, utilizing the k-prototypes clustering algorithm. After initial cluster-based under-sampling, the consolidated cluster-based under-sampled data set was further resampled using three different sampling approaches (i.e., adaptive synthetic sampling (ADASYN), NearMiss-2, and the synthetic minority oversampling technique + Tomek links (SMOTETomek)). Later, four machine learning models (logistic regression (LR), random forest (RF), gradient-boosted decision trees (GBDT), and the multi-layer perceptron (MLP) neural network) were trained and evaluated using the geometric mean (G-Mean) and area under the receiver operating characteristic curve (AUC) scores. The findings suggest that cluster-based under-sampling coupled with the investigated sampling approaches improve the performance of the machine learning models developed on crash data significantly. In addition, the GBDT model combined with ADASYN or SMOTETomek is likely to yield better predictions than any model combined with NearMiss-2. Regarding changes in sampling ratios, increasing the sampling ratio with ADASYN and SMOTETomek is likely to improve the performance of models up to a certain level, whereas with NearMiss-2, performance is likely to drop significantly beyond a specific point. These findings provide valuable insights for selecting optimal strategies for treating the class imbalance issue in crash data.
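A rough sketch of the cluster-based under-sampling step, assuming the kmodes and numpy packages; the cluster count, sampling fraction, and the subsequent resampling and modelling chain are assumptions rather than the study's exact configuration.

```python
# pip install kmodes numpy
import numpy as np
from kmodes.kprototypes import KPrototypes

def cluster_undersample(X_majority, categorical_idx, n_clusters=5, frac=0.5, seed=0):
    """Cluster the majority class (non-severe crashes) with k-prototypes and keep a
    proportional random sample from every cluster, so that rare majority sub-groups
    survive the under-sampling.

    X_majority       -- numpy array of mixed numeric/categorical crash features
    categorical_idx  -- column indices of the categorical features
    frac             -- fraction of each cluster to retain (an assumption)
    """
    clusters = KPrototypes(n_clusters=n_clusters, random_state=seed).fit_predict(
        X_majority, categorical=categorical_idx)
    rng = np.random.default_rng(seed)
    kept = []
    for c in range(n_clusters):
        members = np.where(clusters == c)[0]
        n_keep = max(1, int(frac * len(members)))
        kept.extend(rng.choice(members, size=n_keep, replace=False))
    return X_majority[np.array(kept)]

# The under-sampled majority class is then recombined with the severe/fatal minority
# class and resampled again, e.g. with imblearn.over_sampling.ADASYN (NearMiss-2 or
# SMOTETomek could be swapped in), before training a gradient-boosted model and
# scoring it with imblearn.metrics.geometric_mean_score and sklearn.metrics.roc_auc_score.
```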
Information doi: 10.3390/info15030144
Authors: Md Saiful Islam Fei Liu
In the realm of artificial intelligence, knowledge graphs have become an active area of research. Relationships between entities are depicted through a structural framework in knowledge graphs. In this paper, we propose to build a domain-specific medicine dictionary (DSMD) based on the principles of knowledge graphs. Our dictionary is composed of structured triples, where each entity is defined as a concept, and these concepts are interconnected through relationships. This comprehensive dictionary boasts more than 348,000 triples, encompassing over 20,000 medicine brands and 1500 generic medicines. It presents an innovative method of storing and accessing medical data. Our dictionary facilitates various functionalities, including medicine brand information extraction, brand-specific queries, and two-word queries or question answering. We anticipate that our dictionary will serve a broad spectrum of users, catering to both human users, such as a diverse range of healthcare professionals, and AI applications.
Information doi: 10.3390/info15030142
Authors: Guilherme Ramos Milis Christophe Gay Marie-Cécile Alvarez-Herault Raphaël Caire
In the context of the increasingly necessary energy transition, the precise modeling of profiles for low-voltage (LV) network consumers is crucial to enhance hosting capacity. Typically, load curves for these consumers are estimated through measurement campaigns conducted by Distribution System Operators (DSOs) for a representative subset of customers or through the aggregation of load curves from household appliances within a residence. With the instrumentation of smart meters becoming more common, a new approach to modeling profiles for residential customers is proposed to make the most of the measurements from these meters. The disaggregation model estimates the load profile of customers on a low-voltage network by disaggregating the load curve measured at the secondary substation level. By utilizing only the maximum power measured by Linky smart meters, along with the load curve of the secondary substation, this model can estimate the daily profile of customers. For 48 secondary substations in our dataset, the model obtained an average symmetric mean absolute percentage error (SMAPE) of 4.91% in reconstructing the load curve of the secondary substation from the curves disaggregated by the model. This methodology allows for an estimation of the daily consumption behaviors of low-voltage customers. In this way, we can safely envision solutions that enhance the grid hosting capacity.
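For reference, one common definition of the SMAPE score used above can be written as a short function; the toy load values are invented.

```python
import numpy as np

def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (one common definition)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    denom = (np.abs(actual) + np.abs(predicted)) / 2.0
    return 100.0 * np.mean(np.abs(predicted - actual) / denom)

# substation load curve (kW) vs. the curve rebuilt from disaggregated customer profiles
print(smape([120, 135, 150, 160], [118, 140, 147, 158]))  # a few percent, comparable to the reported 4.91%
```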
Information doi: 10.3390/info15030143
Authors: Adil Redaoui Amina Belalia Kamel Belloulata
Deep network-based hashing has gained significant popularity in recent years, particularly in the field of image retrieval. However, most existing methods only focus on extracting semantic information from the final layer, disregarding valuable structural information that contains important semantic details, which are crucial for effective hash learning. On the one hand, structural information is important for capturing the spatial relationships between objects in an image. On the other hand, image retrieval tasks often require a more holistic representation of the image, which can be achieved by focusing on the semantic content. The trade-off between structural information and image retrieval accuracy in the context of image hashing and retrieval is a crucial consideration. Balancing these aspects is essential to ensure both accurate retrieval results and meaningful representation of the underlying image structure. To address this limitation and improve image retrieval accuracy, we propose a novel deep hashing method called Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF). Our approach involves extracting multiscale features from multiple convolutional layers and fusing them to generate more robust representations for efficient image retrieval. The experimental results demonstrate that our method surpasses the performance of state-of-the-art hashing techniques, with absolute increases of 11.1% and 8.3% in Mean Average Precision (MAP) on the CIFAR-10 and NUS-WIDE datasets, respectively.
Information doi: 10.3390/info15030141
Authors: He Hu Junhua Chen Jianhao Zhu Yunze Yang Han Zheng
With the rapid development of the economy, it is imperative to improve the quality of training for operational and managerial talents in the railway industry. To address issues such as efficiency, safety, and cost in railway industry practical training, it is crucial to establish more comprehensive, efficient, and high-value virtual–real integration simulation training resources. Therefore, taking the actual work process as the core driving force and utilizing advanced information technology and intelligent devices, this paper builds on a comprehensive training platform for the automatic control of physical trains combined with unmanned aerial vehicle (UAV) equipment. By integrating hardware and software construction, it designs and develops a comprehensive simulation training sand table system that incorporates functions such as training, demonstration, testing, and experiments. This system builds an integrated platform for training simulation functions, capable of simulating a railway centralized traffic control system and enabling railway dispatching simulation, driving simulation, and inspection simulation experiences. Additionally, it designs experimental processes at three levels (cognition, practical operation, and enhancement), tailored to the needs of talent development. The rail transportation training simulation sand table effectively reduces training costs and enhances the quality of practical ability training for railway operation management personnel, meeting the requirements for talent development in the railway and related industries under new circumstances.
Information doi: 10.3390/info15030140
Authors: Chuanbo Wang Amirreza Mahbod Isabella Ellinger Adrian Galdran Sandeep Gopalakrishnan Jeffrey Niezgoda Zeyun Yu
Wound care professionals provide proper diagnosis and treatment with heavy reliance on images and image documentation. Segmentation of wound boundaries in images is a key component of the care and diagnosis protocol since it is important to estimate the area of the wound and provide quantitative measurement for the treatment. Unfortunately, this process is very time-consuming and requires a high level of expertise, hence the need for automatic wound measurement methods. Recently, automatic wound segmentation methods based on deep learning have shown promising performance; yet, they heavily rely on large training datasets. A few wound image datasets were published including the Diabetic Foot Ulcer Challenge dataset, the Medetec wound dataset, and WoundDB. Existing public wound image datasets suffer from small size and a lack of annotation. There is a need to build a fully annotated dataset to benchmark wound segmentation methods. To address these issues, we propose the Foot Ulcer Segmentation Challenge (FUSeg), organized in conjunction with the 2021 International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). It contains 1210 pixel-wise annotated foot ulcer images collected over 2 years from 889 patients. The submitted algorithms are reviewed in this paper and the dataset can be accessed through the Foot Ulcer Segmentation Challenge website.
Information doi: 10.3390/info15030139
Authors: Qishun Mei Xuhui Li
To address the limitations of existing methods of short-text entity disambiguation, specifically in terms of their insufficient feature extraction and reliance on massive training samples, we propose an entity disambiguation model called COLBERT, which fuses LDA-based topic features and BERT-based semantic features, as well as using contrastive learning, to enhance the disambiguation process. Experiments on a publicly available Chinese short-text entity disambiguation dataset show that the proposed model achieves an F1-score of 84.0%, which outperforms the benchmark method by 0.6%. Moreover, our model achieves an F1-score of 74.5% with a limited number of training samples, which is 2.8% higher than the benchmark method. These results demonstrate that our model achieves better effectiveness and robustness and can reduce the burden of data annotation as well as training costs.
Information doi: 10.3390/info15030138
Authors: Diego Sánchez-Moreno Vivian F. López Batista María Dolores Muñoz Vicente Ángel Luis Sánchez Lázaro María N. Moreno-García
Information from social networks is currently being widely used in many application domains, although in the music recommendation area, its use is less common because of the limited availability of social data. However, most streaming platforms allow for establishing relationships between users that can be leveraged to address some drawbacks of recommender systems. In this work, we take advantage of the social network structure to improve recommendations for users with unusual preferences and new users, thus dealing with the gray-sheep and cold-start problems, respectively. Since collaborative filtering methods base the recommendations for a given user on the preferences of his/her most similar users, the scarcity of users with similar tastes to the gray-sheep users and the unawareness of the preferences of the new users usually lead to bad recommendations. These general problems of recommender systems are worsened in the music domain, where the popularity bias drawback is also present. In order to address these problems, we propose a user similarity metric based on the network structure as well as on user ratings. This metric significantly improves the recommendation reliability in those scenarios by capturing both homophily effects in implicit communities of users in the network and user similarity in terms of preferences.
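An illustrative way to blend structural and rating-based similarity is sketched below; the Jaccard/cosine combination and the alpha weight are assumptions for exposition, not the exact metric proposed in the paper.

```python
import math

def jaccard(neighbors_a, neighbors_b):
    """Structural similarity: overlap of the users' friend sets in the network."""
    a, b = set(neighbors_a), set(neighbors_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_ratings(ratings_a, ratings_b):
    """Rating similarity over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    na = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    nb = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (na * nb)

def hybrid_similarity(neighbors_a, neighbors_b, ratings_a, ratings_b, alpha=0.5):
    """Blend of network structure and preference similarity; alpha is a tunable weight."""
    return alpha * jaccard(neighbors_a, neighbors_b) + \
           (1 - alpha) * cosine_ratings(ratings_a, ratings_b)

# a new or gray-sheep user with few ratings still gets a usable similarity via shared friends
print(hybrid_similarity({"u2", "u3"}, {"u2", "u4"}, {"song1": 5}, {"song1": 4, "song2": 2}))
```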
Information doi: 10.3390/info15030137
Authors: Huda Lughbi Mourad Mars Khaled Almotairi
The pervasive reach of social media like the X platform, formerly known as Twitter, offers unique opportunities for real-time analysis of cyberattack developments. By parsing and classifying tweets related to cyberattacks, we can glean valuable insights into their type, location, impact, and potential mitigation strategies. However, with millions of daily tweets, manual analysis is inefficient and time-consuming. This paper proposes an interactive and automated dashboard powered by natural language processing to effectively address this challenge. First, we created the CybAttT dataset, which contains 36,071 manually labeled English cyberattack tweets. We experimented with different classification algorithms. Following that, the best model was deployed and integrated into the streaming pipeline for real-time classification. This dynamic dashboard makes use of four different visualization formats: a geographical map, a data table, informative tiles, and a bar chart. Users can readily access crucial information about attacks, including location, timing, and perpetrators, enabling a swift response and mitigation efforts. Our experimental results demonstrated the dashboard’s promising visualization capabilities, highlighting its potential as a valuable tool for organizations and individuals seeking an intuitive and comprehensive overview of cyberattack events.
Information doi: 10.3390/info15030136
Authors: Ive Botunac Jurica Bosna Maja Matetić
Investment decision-makers increasingly rely on modern digital technologies to enhance their strategies in today’s rapidly changing and complex market environment. This paper examines the impact of incorporating Long Short-term Memory (LSTM) models into traditional trading strategies. The core investigation revolves around whether strategies enhanced with LSTM technology perform better than traditional methods alone. Traditional trading strategies typically depend on analyzing current closing prices and various technical indicators to take trading action. However, by applying LSTM models, this study aims to forecast closing prices with greater accuracy, thereby improving trading performance. Our findings indicate that trading strategies that utilize LSTM models outperform traditional strategies. This improvement suggests a significant advantage in using LSTM models for market prediction and trading decision making. Acknowledging that no one-size-fits-all strategy works for every market condition or stock is crucial. As such, traders are encouraged to select and tailor their strategies based on thorough testing and analysis to best suit their needs and market conditions. This study contributes to a better understanding of how integrating LSTM models can enhance traditional trading strategies, offering a path toward more effective decision making in the unpredictable stock market.
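A minimal Keras sketch of the forecasting component, i.e., predicting the next closing price from a window of past closes; the window length, network size, and the synthetic price series are assumptions, and the trading rules built on top of the forecast are not shown.

```python
# pip install tensorflow numpy
import numpy as np
import tensorflow as tf

def make_windows(closes, window=30):
    """Turn a closing-price series into (past window, next close) training pairs."""
    X, y = [], []
    for i in range(len(closes) - window):
        X.append(closes[i:i + window])
        y.append(closes[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

closes = np.cumsum(np.random.randn(500)) + 100   # placeholder series; use real closing prices
X, y = make_windows(closes)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(X.shape[1], 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

next_close = model.predict(X[-1:], verbose=0)[0, 0]
# a strategy could then compare next_close to the current close before taking a position
print("forecast next close:", next_close)
```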
Information doi: 10.3390/info15030135
Authors: Thomas Kopalidis Vassilios Solachidis Nicholas Vretos Petros Daras
Recent technological developments have enabled computers to identify and categorize facial expressions to determine a person’s emotional state in an image or a video. This process, called “Facial Expression Recognition (FER)”, has become one of the most popular research areas in computer vision. In recent times, deep FER systems have primarily concentrated on addressing two significant challenges: the problem of overfitting due to limited training data availability, and the presence of expression-unrelated variations, including illumination, head pose, image resolution, and identity bias. In this paper, a comprehensive survey is provided on deep FER, encompassing algorithms and datasets that offer insights into these intrinsic problems. Initially, this paper presents a detailed timeline showcasing the evolution of methods and datasets in deep facial expression recognition (FER). This timeline illustrates the progression and development of the techniques and data resources used in FER. Then, a comprehensive review of FER methods is introduced, including the basic principles of FER (components such as preprocessing, feature extraction, and classification) from the pre-deep-learning era (traditional methods using handcrafted features, e.g., SVM and HOG) to the deep learning era. Moreover, a brief introduction is provided related to the benchmark datasets (there are two categories: controlled environments (lab) and uncontrolled environments (in the wild)) used to evaluate different FER methods, together with a comparison of different FER models. Existing deep neural networks and related training strategies designed for FER, based on static images and dynamic image sequences, are discussed. The remaining challenges and corresponding opportunities in FER and the future directions for designing robust deep FER systems are also pinpointed.
Information doi: 10.3390/info15030134
Authors: Youngkwang Kim Woochan Kim Jungwoo Yoon Sangkug Chung Daegeun Kim
This paper presents a practical contamination detection system for camera lenses using image analysis with deep learning. The proposed system can detect contamination in digital camera images through contamination learning utilizing deep learning, and it aims to prevent performance degradation of intelligent vision systems due to lens contamination in cameras. This system is based on the object detection algorithm YOLO (v5n, v5s, v5m, v5l, and v5x), which is trained with 4000 images captured under different lighting and background conditions. The trained models showed that the average precision improves as the model size increases, especially for YOLOv5x, which showed excellent efficiency in detecting droplet contamination within 23 ms. They also achieved a mAP@0.5 of 87.46%, a mAP@0.5:0.95 of 51.90%, a precision of 90.28%, a recall of 81.47%, and an F1 score of 85.64%. As a proof of concept, we demonstrated the identification and removal of contamination on camera lenses by integrating a contamination detection system and a transparent heater-based cleaning system. The proposed system is anticipated to be applied to autonomous driving systems, public safety surveillance cameras, environmental monitoring drones, etc., to increase operational safety and reliability.
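A hedged sketch of running a YOLOv5 model on a camera frame via torch.hub; the custom contamination weights are the authors' and are represented here only by a placeholder path.

```python
# pip install torch torchvision
import torch

# load the YOLOv5x architecture from the ultralytics hub; for contamination detection
# one would point it at custom-trained weights (the path below is a placeholder)
model = torch.hub.load("ultralytics/yolov5", "yolov5x", pretrained=True)
# model = torch.hub.load("ultralytics/yolov5", "custom", path="droplet_contamination.pt")

results = model("lens_image.jpg")      # any camera frame to be checked
results.print()                        # classes, confidences, and boxes
detections = results.pandas().xyxy[0]  # one row per detected droplet/contaminant
if len(detections):
    print("lens contamination detected, triggering cleaning cycle")
```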
Information doi: 10.3390/info15030133
Authors: May Alsaidi Nadim Obeid Nailah Al-Madi Hazem Hiary Ibrahim Aljarah
Autism spectrum disorder (ASD) is a developmental disorder that encompasses difficulties in communication (both verbal and non-verbal), social skills, and repetitive behaviors. The diagnosis of autism spectrum disorder typically involves specialized procedures and techniques, which can be time-consuming and expensive. The accuracy and efficiency of the diagnosis depend on the expertise of the specialists and the diagnostic methods employed. To address the growing need for early, rapid, cost-effective, and accurate diagnosis of autism spectrum disorder, there has been a search for advanced smart methods that can automatically classify the disorder. Machine learning offers sophisticated techniques for building automated classifiers that can be utilized by users and clinicians to enhance accuracy and efficiency in diagnosis. Eye-tracking scan paths have emerged as a tool increasingly used in autism spectrum disorder clinics. This methodology examines attentional processes by quantitatively measuring eye movements. Its precision, ease of use, and cost-effectiveness make it a promising platform for developing biomarkers for use in clinical trials for autism spectrum disorder. The detection of autism spectrum disorder can be achieved by observing the atypical visual attention patterns of children with the disorder compared to typically developing children. This study proposes a deep learning model, known as T-CNN-Autism Spectrum Disorder (T-CNN-ASD), that utilizes eye-tracking scans to classify participants into ASD and typical development (TD) groups. The proposed model consists of two hidden layers with 300 and 150 neurons, respectively, and underwent 10 rounds of cross-validation with a dropout rate of 20%. In the testing phase, the model achieved an accuracy of 95.59%, surpassing the accuracy of other machine learning algorithms such as random forest (RF), decision tree (DT), K-Nearest Neighbors (KNN), and multi-layer perceptron (MLP). Furthermore, the proposed model demonstrated superior performance when compared to the findings reported in previous studies. The results demonstrate that the proposed model can accurately classify children with ASD from those with TD without human intervention.
Information doi: 10.3390/info15030132
Authors: Shi-Yi Jin Dong-Hyun Seo Yeon-Jin Kim Yong-Eun Kim Samuel Woo Jin-Gyun Chung
To authenticate a controller area network (CAN) data frame, a message authentication code (MAC) must be sent along with the CAN frame, but there is no space reserved for the MAC in the CAN frame. Recently, difference-based compression (DBC) algorithms have been used to create a space inside the frame. DBC has the advantage of being very efficient, but its drawback is that, if an error occurs in one frame, the effects of that error propagate to subsequent frames. In this paper, a CAN data compression algorithm is proposed that compresses the current frame without relying on previous frames. Therefore, an error generated in one frame cannot be propagated to subsequent frames. In addition, a CAN signal grouping technique is proposed based on entropy analysis. To efficiently authenticate CAN frames, the length of the compressed data must be 4 bytes or less (4BL). Simulation shows that the 4BL-compression ratio of a Kia Sorento vehicle is 99.36% in the DBC method, but 100% in the proposed method. In an LS Mtron tractor, the 4BL-compression ratio is 98.58% in the DBC method, but 100% in the proposed method. In addition, the execution time of the proposed compression algorithm is only 27.39% of that of the DBC algorithm. The results show that the proposed algorithm has better compression characteristics for CAN security than the DBC algorithms.
Information doi: 10.3390/info15030131
Authors: Xie He Arash Habibi Lashkari Nikhill Vombatkere Dilli Prasad Sharma
Over the past few decades, researchers have devoted significant effort and attention to the authorship attribution field, as it plays an important role in software forensics analysis, plagiarism detection, security attack detection, and the protection of trade secrets, patent claims, copyright infringement, or cases of software theft. A survey of this field helps new researchers understand the state-of-the-art work on authorship attribution methods, identify and examine emerging methods for authorship attribution, and discuss their key concepts, associated challenges, and potential future work that could help newcomers in this field. This paper comprehensively surveys authorship attribution methods and their key classifications, used feature types, available datasets, model evaluation criteria and metrics, and challenges and limitations. In addition, we discuss potential future research directions of the authorship attribution field based on the insights and lessons learned from this survey work.
]]>Information doi: 10.3390/info15030130
Authors: Satoshi Warita Katsuhide Fujita
Recently, multi-agent systems have become widespread as essential technologies for various practical problems. An important problem in multi-agent systems is the collaborative automation of picking and delivery operations in warehouses. The warehouse commissioning task involves finding specified items in a warehouse and moving them to a specified location using robots. This task is defined as a spatial task-allocation problem (SPATAP) based on a Markov decision process (MDP). It is treated as a decentralized multi-agent system rather than a system that manages and optimizes agents in a centralized manner. Existing research on SPATAP, which models the environment as an MDP and applies Monte Carlo tree search, has shown that this approach is efficient. However, there has not been sufficient research into scenarios in which all agents are provided a common plan even though their actions are decided independently. Thus, previous studies have not considered cooperative behaviors among robots with different goals, and the setting in which each robot has a different goal has not been studied extensively. Regarding the cooperative element, the item-exchange approach has not been considered effectively in previous studies. Therefore, in this paper, we focus on the problem in which each robot is assigned a different task, aiming to optimize the percentage of items picked and delivered on time in social situations. We propose an action-planning method based on the Monte Carlo tree search and an item-exchange method between agents. We also develop a simulator to evaluate the proposed methods. The simulation results demonstrate that the achievement rate is improved in small- and medium-sized warehouses. However, the achievement rate did not improve in large warehouses because the average distance from the depot to the items increased.
]]>Information doi: 10.3390/info15030129
Authors: Xingchen Wang Peng Li
With the widespread adoption of cloud computing, the face verification process often requires the client to upload the face to an untrusted cloud server to obtain the verification results. Privacy leakage issues may arise if the client’s private information is not protected. This paper proposes a secure and anonymous face verification scheme using fully homomorphic encryption technology and SealPIR. Our scheme is a three-party solution that requires a third-party server trusted by the client. This scheme not only prevents the client’s facial data from being obtained by untrusted data servers but also prevents the data server from learning the index corresponding to the face that the client wants to verify. In a single-face verification process, the client only needs to perform one upload operation and one download operation, with a communication volume of 264 KB. We can complete a privacy-protected anonymous face verification process in 84.91 ms.
]]>Information doi: 10.3390/info15030128
Authors: Paolo Fantozzi Valentina Rotondi Matteo Rizzolli Paola Dalla Torre Maurizio Naldi
Moral features are essential components of TV series, helping the audience to engage with the story, exploring themes beyond sheer entertainment, reflecting current social issues, and leaving a long-lasting impact on the viewers. Their presence shows through the language employed in the plot description. Detecting them aids in understanding the series writers’ underlying message. In this paper, we propose an approach to detect moral features in TV series. We rely on the Moral Foundations Theory (MFT) framework to classify moral features and use the associated MFT dictionary to identify the words expressing those features. Our approach combines that dictionary with word embedding and similarity analysis through a deep learning SBERT (Sentence-Bidirectional Encoder Representations from Transformers) architecture to quantify the comparative prominence of moral features. We validate the approach by applying it to the definition of the MFT moral feature labels as appearing in general authoritative dictionaries. We apply our technique to the summaries of a selection of TV series representative of several genres and relate the results to the actual content of each series, showing the consistency of the results.
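A minimal sketch of the core scoring idea is given below, under stated assumptions: the seed words are a tiny illustrative stand-in for the MFT dictionary, and the SBERT checkpoint is simply a common public model, not necessarily the one used by the authors.
```python
# Score the comparative prominence of MFT moral foundations in a plot summary
# via SBERT embeddings and cosine similarity against foundation seed words.
from sentence_transformers import SentenceTransformer, util

MFT_SEEDS = {  # hypothetical, heavily truncated stand-in for the MFT dictionary
    "care/harm": ["kindness", "compassion", "cruelty", "suffering"],
    "fairness/cheating": ["justice", "equality", "fraud", "cheating"],
    "loyalty/betrayal": ["loyalty", "solidarity", "betrayal", "treason"],
}

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed public SBERT checkpoint

def foundation_scores(summary: str) -> dict:
    summary_emb = model.encode(summary, convert_to_tensor=True)
    scores = {}
    for foundation, words in MFT_SEEDS.items():
        word_embs = model.encode(words, convert_to_tensor=True)
        # Mean similarity between the summary and the foundation's seed words
        scores[foundation] = float(util.cos_sim(summary_emb, word_embs).mean())
    return scores

print(foundation_scores("A detective risks everything to protect a wrongly accused friend."))
```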
]]>Information doi: 10.3390/info15030127
Authors: Yu Ji Wenxu Yan Wenyuan Wang
With the increase in the use of high-frequency power electronic devices, the harmonics injected into the power grid show a trend of high-frequency development. The continuous rise of the supraharmonic emission level in the distribution network has become one of the power quality problems that needs to be solved urgently in the power grid. In this paper, an algorithm based on the Interpolation of the Self-convolutional Window All-phase Compressive Sampling Matching Pursuit (ISWApCoSaMP) is proposed. Firstly, the self-convolution operation is used for the maximum sidelobe decay (MSD) window, and then the compressed sampling matching pursuit model based on the All-phase is constructed, leading to the All-phase Compressive Sampling Matching Pursuit (ApCoSaMP). Finally, the four-spectrum-line interpolation is combined to utilize spectrum line information to improve the accuracy of signal parameter detection in the frequency domain. The introduced All-phase greatly improves the phase measurement accuracy because the initial phase of the supraharmonic signal is selected for phase estimation. In addition, the self-convolutional window and four-spectrum-line interpolation make full use of the information in the time and frequency domains, thus optimizing the measurement results of amplitude and frequency. The algorithm achieves high accuracy in the measurement results of simulated signals and accurately measures supraharmonics.
]]>Information doi: 10.3390/info15030126
Authors: Swati Kumari Vatsal Tulshyan Hitesh Tewari
Due to rising cyber threats, IoT devices’ security vulnerabilities are expanding. However, these devices cannot run complicated security algorithms locally due to hardware restrictions. Data must be transferred to cloud nodes for processing, giving attackers an entry point. This research investigates distributed computing on the edge, using AI-enabled IoT devices and container orchestration tools to process data in real time at the network edge. The purpose is to identify and mitigate DDoS assaults while minimizing CPU usage to improve security. It compares typical IoT devices with and without AI-enabled chips and container orchestration, and assesses their performance in running machine learning models with different cluster settings. The proposed architecture aims to empower IoT devices to process data locally, minimizing the reliance on cloud transmission and bolstering security in IoT environments. The results reflect the changes in the architecture. With the addition of AI-enabled IoT devices and container orchestration, there is a difference of 60% between the new architecture and the traditional architecture in which only Raspberry Pi devices were used.
]]>Information doi: 10.3390/info15030125
Authors: Sasha Petrenko Daniel B. Hier Mary A. Bone Tayo Obafemi-Ajayi Erik J. Timpson William E. Marsh Michael Speight Donald C. Wunsch
Biomedical datasets distill many mechanisms of human diseases, linking diseases to genes and phenotypes (signs and symptoms of disease), genetic mutations to altered protein structures, and altered proteins to changes in molecular functions and biological processes. It is desirable to gain new insights from these data, especially with regard to the uncovering of hierarchical structures relating disease variants. However, analysis to this end has proven difficult due to the complexity of the connections between multi-categorical symbolic data. This article proposes symbolic tree adaptive resonance theory (START), with additional supervised, dual-vigilance (DV-START), and distributed dual-vigilance (DDV-START) formulations, for the clustering of multi-categorical symbolic data from biomedical datasets by demonstrating its utility in clustering variants of Charcot–Marie–Tooth disease using genomic, phenotypic, and proteomic data.
]]>Information doi: 10.3390/info15030124
Authors: Vahid Safavi Arash Mohammadi Vaniar Najmeh Bazmohammadi Juan C. Vasquez Josep M. Guerrero
Predicting the remaining useful life (RUL) of lithium-ion (Li-ion) batteries is crucial to preventing system failures and enhancing operational performance. Knowing the RUL of a battery enables one to perform preventative maintenance or replace the battery before its useful life expires, which is vital in safety-critical applications. The prediction of the RUL of Li-ion batteries plays a critical role in their optimal utilization throughout their lifetime and supporting sustainable practices. This paper conducts a comparative analysis to assess the effectiveness of multiple machine learning (ML) models in predicting the capacity fade and RUL of Li-ion batteries. Three case studies are analyzed to assess the performances of the state-of-the-art ML models, considering two distinct datasets. These case studies are conducted under various operating conditions such as temperature, C-rate, state of charge (SOC), and depth of discharge (DOD) of the batteries in Cases 1 and 2, and a different set of features and charging policies for the second dataset in Case 3. Meanwhile, diverse extracted features from the initial cycles of the second dataset are considered in Case 3 to predict the RUL of Li-ion batteries in all cycles. In addition, a multi-feature multi-target (MFMT) feature mapping is introduced to investigate the performance of the developed ML models in predicting the battery capacity fade and RUL in the entire life cycle. Multiple ML models that are developed for the comparison analysis in the proposed methodology include Random Forest (RF), extreme gradient boosting (XGBoost), light gradient-boosting machine (LightGBM), multi-layer perceptron (MLP), long short-term memory (LSTM), and attention-LSTM. Furthermore, hyperparameter tuning is applied to improve the performance of the XGBoost and LightGBM models. The results demonstrate that the extreme gradient boosting with hyperparameter tuning (XGBoost-HT) model outperforms the other ML models in terms of the root-mean-squared error (RMSE) and mean absolute percentage error (MAPE) of the battery capacity fade and RUL for all cycles. The obtained RMSE and MAPE values for XGBoost-HT in terms of cycle life are 69 cycles and 6.5%, respectively, for the third case. In addition, the XGBoost-HT model handles the MFMT feature mapping within an acceptable range of RMSE and MAPE, compared to the rest of the developed ML models and similar benchmarks.
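A minimal sketch of the XGBoost-with-hyperparameter-tuning (XGBoost-HT) idea is given below; the feature columns and the search grid are illustrative assumptions rather than the paper's exact configuration.
```python
# Grid-search an XGBoost regressor on cycling features to predict RUL.
import pandas as pd
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

def train_rul_model(df: pd.DataFrame):
    features = ["temperature", "c_rate", "soc", "dod", "cycle"]   # assumed columns
    X, y = df[features], df["rul"]
    grid = {
        "n_estimators": [200, 500],
        "max_depth": [4, 6, 8],
        "learning_rate": [0.03, 0.1],
    }
    search = GridSearchCV(XGBRegressor(objective="reg:squarederror"),
                          grid, scoring="neg_root_mean_squared_error", cv=5)
    search.fit(X, y)
    return search.best_estimator_, -search.best_score_   # tuned model and CV RMSE
```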
]]>Information doi: 10.3390/info15030123
Authors: Ioana-Raluca Zaman Stefan Trausan-Matu
Neuropsychiatric disorders affect the lives of individuals from cognitive, emotional, and behavioral aspects, impact the quality of their lives, and even lead to death. Outside the medical area, these diseases have also started to be the subject of investigation in the field of Artificial Intelligence: especially Natural Language Processing (NLP) and Computer Vision. The usage of NLP techniques to understand medical symptoms eases the process of identifying and learning more about language-related aspects of neuropsychiatric conditions, leading to better diagnosis and treatment options. This survey shows the evolution of the detection of linguistic markers specific to a series of neuropsychiatric disorders and symptoms. For each disease or symptom, the article presents a medical description, specific linguistic markers, the results obtained using markers, and datasets. Furthermore, this paper offers a critical analysis of the work undertaken to date and suggests potential directions for future research in the field.
]]>Information doi: 10.3390/info15030122
Authors: Jawaher Alghamdi Yuqing Lin Suhuai Luo
The detection of fake news has emerged as a crucial area of research due to its potential impact on society. In this study, we propose a robust methodology for identifying fake news by leveraging diverse aspects of language representation and incorporating auxiliary information. Our approach is based on the utilisation of Bidirectional Encoder Representations from Transformers (BERT) to capture contextualised semantic knowledge. Additionally, we employ a multichannel Convolutional Neural Network (mCNN) integrated with stacked Bidirectional Gated Recurrent Units (sBiGRU) to jointly learn multi-aspect language representations. This enables our model to effectively identify valuable clues from news content while simultaneously incorporating content- and context-based cues, such as user posting behaviour, to enhance the detection of fake news. Through extensive experimentation on four widely used real-world datasets, our proposed framework demonstrates superior performance (↑3.59% (PolitiFact), ↑6.8% (GossipCop), ↑2.96% (FA-KES), and ↑12.51% (LIAR), considering both content-based features and additional auxiliary information) compared to existing state-of-the-art approaches, establishing its effectiveness in the challenging task of fake news detection.
]]>Information doi: 10.3390/info15030121
Authors: Jiahang Chen Jan Reitz Rebecca Richstein Kai-Uwe Schröder Jürgen Roßmann
Advancing digitalization is reaching the realm of lightweight construction and structural–mechanical components. Through the synergistic combination of distributed sensors and intelligent evaluation algorithms, traditional structures evolve into smart sensing systems. In this context, Structural Health Monitoring (SHM) plays a key role in managing potential risks to human safety and environmental integrity due to structural failures by providing analysis, localization, and records of the structure’s loading and damaging conditions. The establishment of networks between sensors and data-processing units via Internet of Things (IoT) technologies is an elementary prerequisite for the integration of SHM into smart sensing systems. However, this integration of SHM faces significant restrictions due to the scalability challenges of smart sensing systems and IoT-specific issues, including communication security and interoperability. To address these issues, this paper presents a comprehensive methodological framework aimed at facilitating the scalable integration of objects, ranging from components through systems to clusters, into SHM systems. Furthermore, we detail a prototypical implementation of the conceptually developed framework, demonstrating a structural component and its corresponding Digital Twin. Here, real-time capable deformation- and strain-based monitoring of the structure is achieved, showcasing the practical applicability of the proposed framework.
]]>Information doi: 10.3390/info15020120
Authors: Hellena Hempe Alexander Bigalke Mattias Paul Heinrich
Background: Degenerative spinal pathologies are highly prevalent among the elderly population. Timely diagnosis of osteoporotic fractures and other degenerative deformities enables proactive measures to mitigate the risk of severe back pain and disability. Methods: We explore the use of shape auto-encoders for vertebrae, advancing the state of the art through robust automatic segmentation models trained without fracture labels and recent geometric deep learning techniques. Our shape auto-encoders are pre-trained on a large set of vertebrae surface patches. This pre-training step addresses the label scarcity problem faced when learning the shape information of vertebrae for fracture detection from image intensities directly. We further propose a novel shape decoder architecture: the point-based shape decoder. Results: Employing segmentation masks that were generated using the TotalSegmentator, our proposed method achieves an AUC of 0.901 on the VerSe19 testset. This outperforms image-based and surface-based end-to-end trained models. Our results demonstrate that pre-training the models in an unsupervised manner enhances geometric methods like PointNet and DGCNN. Conclusion: Our findings emphasize the advantages of explicitly learning shape features for diagnosing osteoporotic vertebrae fractures. This approach improves the reliability of classification results and reduces the need for annotated labels.
]]>Information doi: 10.3390/info15020119
Authors: Martin Wynn Christian Weber
The development and implementation of information systems strategy in multi-national corporations (MNCs) faces particular challenges—cultural differences and variations in work values and practices across different countries, numerous technology landscapes and legacy issues, language and accounting particularities, and differing business models. This article builds upon the existing literature and in-depth interviews with eighteen industry practitioners employed in six MNCs to construct an operational model to address these challenges. The research design is based on an inductive, qualitative approach that develops an initial conceptual framework—derived from the literature—into an operational model, which is then applied and refined in a case study company. The final model consists of change components and process phases. Six change components are identified that drive and underpin IS strategy—business strategy, systems projects, technology infrastructure, process change, skills and competencies, and costs and benefits. Five core process phases are recognized—review, align, engage, execute, and control. The model is based on the interaction between these two dimensions—change components and process phases—and an action list is also developed to support the application of the model, which contributes to the theory and practice of information systems deployment in MNCs.
]]>Information doi: 10.3390/info15020118
Authors: Tao Feng Taining Chen Xiang Gong
This paper presents a formal security analysis of the ISA100.11a standard protocol using the Colored Petri Net (CPN) modeling approach. Firstly, we establish a security threat model for the ISA100.11a protocol and provide a detailed description and analysis of the identified security threats. Secondly, we use the CPN tool to model the protocol formally and conduct model checking and security analysis. Finally, we analyze and discuss the results of the model checking, which demonstrate that the ISA100.11a standard protocol may have vulnerabilities when certain security threats exist, and provide some suggestions to enhance the security of the protocol. This research provides a certain level of security assurance for the ISA100.11a standard protocol and serves as a reference for similar security research on protocols.
]]>Information doi: 10.3390/info15020117
Authors: Madhav Mukherjee Ngoc Thuy Le Yang-Wai Chow Willy Susilo
As the demand for cybersecurity experts in the industry grows, we face a widening shortage of skilled professionals. This pressing concern has spurred extensive research within academia and national bodies, who are striving to bridge this skills gap through refined educational frameworks, including the integration of innovative information applications like remote laboratories and virtual classrooms. Despite these initiatives, current higher education models for cybersecurity, while effective in some areas, fail to provide a holistic solution to the root causes of the skills gap. Our study conducts a thorough examination of established cybersecurity educational frameworks, with the goal of identifying crucial learning outcomes that can mitigate the factors contributing to this skills gap. Furthermore, we analyze six different educational models, each of which can uniquely leverage technologies such as virtual classrooms and online platforms and is suited to particular learning contexts, and we group these contexts into four distinct categories. This categorization introduces a holistic dimension of context awareness enriched by digital learning tools into the process, enhancing the alignment with desired learning outcomes, a consideration sparsely addressed in the existing literature. This thorough analysis further strengthens the framework for guiding education providers in selecting models that most effectively align with their targeted learning outcomes and points to practical uses for technologically enhanced environments. This review presents a roadmap for educators and institutions, offering insights into relevant teaching models, including the opportunities for the utilization of remote laboratories and virtual classrooms, and their contextual applications, thereby aiding curriculum designers in making strategic decisions.
]]>Information doi: 10.3390/info15020116
Authors: Kevin K. W. Ho Shaoyu Ye
The COVID-19 pandemic heightened concerns about health and safety, leading people to seek information to protect themselves from infection. Even before the pandemic, false health information was spreading on social media. We conducted a review of recent literature in health and social sciences and proposed a theoretical model to understand the factors influencing the spread of false health information. Our focus was on how false health information circulated before and during the pandemic, impacting people’s perceptions of believing information on social media. We identified four possible strategies to counteract the negative effects of false health information: prebunking, refuting, legislation, and media literacy. We argue that improving people’s social media literacy skills is among the most effective ways to address this issue. Our findings provide a basis for future research and the development of policies to minimize the impact of false health information on society.
]]>Information doi: 10.3390/info15020115
Authors: Philippe J. Giabbanelli Grace MacEwan
The Provincial Health Services Authority (PHSA) of British Columbia suggested that a paradigm shift from weight to well-being could address the unintended consequences of focusing on obesity and improve the outcomes of efforts to address the challenges facing both individuals and our healthcare system. In this paper, we jointly used artificial intelligence (AI) and participatory modeling to examine the possible consequences of this paradigm shift. Specifically, we created a conceptual map with 19 experts to understand how obesity and physical and mental well-being connect to each other and other factors. Three analyses were performed. First, we analyzed the factors that directly connect to obesity and well-being, both in terms of causes and consequences. Second, we created a reduced version of the map and examined the connections between categories of factors (e.g., food production, and physiology). Third, we explored the themes in the interviews when discussing either well-being or obesity. Our results show that obesity was viewed from a medical perspective as a problem, whereas well-being led to broad and diverse solution-oriented themes. In particular, we found that taking a well-being perspective can be more comprehensive without losing the relevance of the physiological aspects that an obesity-centric perspective focuses on.
]]>Information doi: 10.3390/info15020114
Authors: Yusuf Brima Ulf Krumnack Simone Pika Gunther Heidemann
Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data. By designing pretext tasks that exploit statistical regularities, SSL models can capture useful representations that are transferable to downstream tasks. Barlow Twins (BTs) is an SSL technique inspired by theories of redundancy reduction in human perception. In downstream tasks, BTs representations accelerate learning and transfer this learning across applications. This study applies BTs to speech data and evaluates the obtained representations on several downstream tasks, showing the applicability of the approach. However, limitations exist in disentangling key explanatory factors, with redundancy reduction and invariance alone being insufficient for factorization of learned latents into modular, compact, and informative codes. Our ablation study isolated gains from invariance constraints, but the gains were context-dependent. Overall, this work substantiates the potential of Barlow Twins for sample-efficient speech encoding. However, challenges remain in achieving fully hierarchical representations. The analysis methodology and insights presented in this paper pave a path for extensions incorporating further inductive priors and perceptual principles to further enhance the BTs self-supervision framework.
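For reference, a minimal sketch of the Barlow Twins objective is shown below: the cross-correlation matrix of two augmented views' embeddings is pushed toward the identity, giving an invariance term on the diagonal and a redundancy-reduction term off the diagonal; the off-diagonal weight lambda_ is the usual hyperparameter and its value here is illustrative.
```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambda_: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two augmented views of the same input."""
    n = z_a.size(0)
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)   # standardize each dimension
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                              # (dim, dim) cross-correlation
    diag = torch.diagonal(c)
    invariance = ((diag - 1.0) ** 2).sum()             # pull matched dimensions together
    redundancy = (c ** 2).sum() - (diag ** 2).sum()    # decorrelate distinct dimensions
    return invariance + lambda_ * redundancy
```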
]]>Information doi: 10.3390/info15020113
Authors: Bohdan Petryshyn Serhii Postupaiev Soufiane Ben Bari Armantas Ostreika
The development of autonomous driving models through reinforcement learning has gained significant traction. However, developing obstacle avoidance systems remains a challenge. Specifically, optimising path completion times while navigating obstacles is an underexplored research area. Amazon Web Services (AWS) DeepRacer emerges as a powerful infrastructure for engineering and analysing autonomous models, providing a robust foundation for addressing these complexities. This research investigates the feasibility of training end-to-end self-driving models focused on obstacle avoidance using reinforcement learning on the AWS DeepRacer autonomous race car platform. A comprehensive literature review of autonomous driving methodologies and machine learning model architectures is conducted, with a particular focus on object avoidance, followed by hands-on experimentation and the analysis of training data. Furthermore, the impacts of sensor choice, reward function, action space, and training time on the autonomous obstacle avoidance task are compared. The results of the best configuration experiment demonstrate a significant improvement in obstacle avoidance performance compared to the baseline configuration, with a 95.8% decrease in collision rate, while taking about 79% less time to complete the trial circuit.
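A hedged sketch of an obstacle-avoidance reward function in the AWS DeepRacer style is given below. It is modeled on the platform's documented object-avoidance example rather than on the paper's configuration; the parameter keys and distance thresholds should be treated as assumptions.
```python
# DeepRacer rewards are a single reward_function(params) callable; params keys
# such as all_wheels_on_track, closest_objects, objects_location,
# objects_left_of_center, is_left_of_center, x, y are assumed from the platform docs.
import math

def reward_function(params):
    reward_lane = 1.0 if params["all_wheels_on_track"] else 1e-3

    # Distance from the agent to the next object ahead on the track
    _, next_idx = params["closest_objects"]
    obj_x, obj_y = params["objects_location"][next_idx]
    dist = math.hypot(params["x"] - obj_x, params["y"] - obj_y)

    # Only penalize proximity when the agent shares a lane with that object
    same_lane = params["objects_left_of_center"][next_idx] == params["is_left_of_center"]
    reward_avoid = 1.0
    if same_lane:
        if dist < 0.3:
            reward_avoid = 1e-3
        elif dist < 0.5:
            reward_avoid = 0.2
        elif dist < 0.8:
            reward_avoid = 0.5

    # Weight staying on track against avoiding collisions (illustrative weights)
    return float(1.0 * reward_lane + 4.0 * reward_avoid)
```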
]]>Information doi: 10.3390/info15020112
Authors: Agostinho Agra Jose Maria Samuco
Given a social network modelled by a graph, the goal of the influence maximization problem is to find k vertices that maximize the number of active vertices through a process of diffusion. For this diffusion, the linear threshold model is considered. A new algorithm, called ClusterGreedy, is proposed to solve the influence maximization problem. The ClusterGreedy algorithm creates a partition of the original set of nodes into small subsets (the clusters), applies the SimpleGreedy algorithm to the subgraphs induced by each subset of nodes, and obtains the seed set from a combination of the seed set of each cluster by solving an integer linear program. This algorithm is further improved by exploring the submodularity property of the diffusion function. Experimental results show that the ClusterGreedy algorithm provides, on average, higher influence spread and lower running times than the SimpleGreedy algorithm on Watts–Strogatz random graphs.
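A minimal sketch of the greedy step that ClusterGreedy applies within each cluster is shown below: linear threshold (LT) spread is estimated by Monte Carlo simulation and the node with the largest marginal gain is added. Edge weights are assumed normalized so that incoming weights at each node sum to at most 1, and the trial count is illustrative.
```python
import random
import networkx as nx

def lt_spread(G: nx.DiGraph, seeds: set, trials: int = 200) -> float:
    """Estimate expected spread under the linear threshold model by simulation."""
    total = 0
    for _ in range(trials):
        thresholds = {v: random.random() for v in G}     # uniform random thresholds
        active, frontier = set(seeds), set(seeds)
        while frontier:
            new = set()
            for v in G:
                if v in active:
                    continue
                influence = sum(G[u][v].get("weight", 0.0)
                                for u in G.predecessors(v) if u in active)
                if influence >= thresholds[v]:
                    new.add(v)
            frontier = new
            active |= new
        total += len(active)
    return total / trials

def simple_greedy(G: nx.DiGraph, k: int) -> set:
    """Add, k times, the node with the largest estimated marginal gain."""
    seeds = set()
    for _ in range(k):
        best = max((v for v in G if v not in seeds),
                   key=lambda v: lt_spread(G, seeds | {v}))
        seeds.add(best)
    return seeds
```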
]]>Information doi: 10.3390/info15020111
Authors: Rébaï Soret Noemie Prea Vsevolod Peysakhovich
Attentional orienting is a crucial process in perceiving our environment and guiding human behavior. Recent studies have suggested a forward attentional bias, where faster reactions are observed to spatial cues indicating information appearing in the forward rather than the rear direction. This study investigated how body position affects attentional orienting, using a modified version of the Posner cueing task within a virtual reality environment. Participants, seated at a 90° angle or reclined at 45°, followed arrows directing their attention to one of four spatial positions where a spaceship would appear, visible either through transparent windows (front space) or in mirrors (rear space). Their task was to promptly identify the spaceship’s color as red or blue. The results indicate that participants reacted more swiftly when the cue correctly indicated the target’s location (valid cues) and when targets appeared in the front rather than the rear. Moreover, the “validity effect” (the advantage of valid over invalid cues) on early eye movements varied based on both the participant’s body position and the target’s location (front or rear). These findings suggest that body position may modulate the forward attentional bias, highlighting its relevance in attentional orienting. This study’s implications are further discussed within contexts like aviation and space exploration, emphasizing the necessity for precise and swift responses to stimuli across diverse spatial environments.
]]>Information doi: 10.3390/info15020110
Authors: Paolo Ciancarini Raffaele Giancarlo Gennaro Grimaudo
Digital transformation in the public sector provides digital services to the citizens aiming at increasing their quality of life, as well as the transparency and accountability of a public administration. Since adaptation to the citizens changing needs is central for its success, Agile methodologies seem best suited for the software development of digital services in that area. However, as well documented by an attempt to use Scrum for an important Public Administration in Italy, substantial modifications to standard Agile were needed, giving rise to a new proposal called improved Agile (in short, iAgile). Another notable example is the Scrum@IMI method developed by the City of Barcelona for the deployment of its digital services. However, given the importance of digital transformation in the public sector and the scarcity of efforts (documented in the scholarly literature) to effectively bring Agile within it, a strategically important contribution that Computer Science can offer is a general paradigm describing how to tailor Agile methodologies and, in particular, Scrum, for such a specific context. Our proposal, called Scrum@PA, addresses this strategic need. Based on it, a public administration has a technically sound avenue to follow to adopt Scrum rather than a generic set of guidelines as in the current state of the art. We show the validity of our proposal by describing how the quite successful Scrum@IMI approach can be derived from Scrum@PA. Although iAgile can also be derived from our paradigm, we have chosen Scrum@IMI as a pilot example since it is publicly available on GitHub.
]]>Information doi: 10.3390/info15020109
Authors: Saad Said Alqahtany Toqeer Ali Syed
In the domain of computer forensics, ensuring the integrity of operations like preservation, acquisition, analysis, and documentation is critical. Discrepancies in these processes can compromise evidence and lead to potential miscarriages of justice. To address this, we developed a generic methodology integrating each forensic transaction into an immutable blockchain entry, establishing transparency and authenticity from data preservation to final reporting. Our framework was designed to manage a wide range of forensic applications across different domains, including technology-focused areas such as the Internet of Things (IoT) and cloud computing, as well as sector-specific fields like healthcare. Central to our approach are smart contracts that seamlessly connect forensic applications to the blockchain via specialized APIs. Every action within the forensic process triggers a verifiable transaction on the blockchain, enabling a comprehensive and tamper-proof case presentation in court. Performance evaluations confirmed that our system operates with minimal overhead, ensuring that the integration bolsters the judicial process without hindering forensic investigations.
]]>Information doi: 10.3390/info15020108
Authors: Linhua Zhang Ning Xiong Wuyang Gao Peng Wu
With the exponential growth of remote sensing images in recent years, there has been a significant increase in demand for micro-target detection. Recently, effective detection methods for small targets have emerged; however, for micro-targets (even fewer pixels than small targets), most existing methods are not fully competent in feature extraction, target positioning, and rapid classification. This study proposes an enhanced detection method, especially for micro-targets, in which a combined loss function (consisting of NWD and CIOU) is used instead of a singular CIOU loss function. In addition, the lightweight Content-Aware Reassembly of Features (CARAFE) replaces the original bilinear interpolation upsampling algorithm, and a spatial pyramid structure is added into the network model’s small target layer. The proposed algorithm undergoes training and validation utilizing the benchmark dataset known as AI-TOD. Compared to speed-oriented YOLOv7-tiny, the mAP0.5 and mAP0.5:0.95 of our improved algorithm increased from 42.0% and 16.8% to 48.7% and 18.9%, representing improvements of 6.7% and 2.1%, respectively, while the detection speed was almost equal to that of YOLOv7-tiny. Furthermore, our method was also tested on a dataset of multi-scale targets, which contains small targets, medium targets, and large targets. The results demonstrated that mAP0.5:0.95 increased from “9.8%, 54.8%, and 68.2%” to “12.6%, 55.6%, and 70.1%” for detection across different scales, indicating improvements of 2.8%, 0.8%, and 1.9%, respectively. In summary, the presented method improves detection metrics for micro-targets in various scenarios while satisfying the requirements of detection speed in a real-time system.
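A hedged sketch of the combined loss idea follows: a Normalized Wasserstein Distance (NWD) term, which degrades more gracefully for tiny boxes than IoU-based terms, is mixed with a CIoU term. The constant C and the mixing weight alpha are illustrative, and complete_box_iou_loss requires a recent torchvision release (0.13 or later).
```python
import torch
from torchvision.ops import complete_box_iou_loss

def nwd(pred: torch.Tensor, target: torch.Tensor, C: float = 12.8) -> torch.Tensor:
    """NWD between (cx, cy, w, h) boxes modeled as axis-aligned Gaussians."""
    d = torch.stack([pred[:, 0] - target[:, 0],
                     pred[:, 1] - target[:, 1],
                     (pred[:, 2] - target[:, 2]) / 2,
                     (pred[:, 3] - target[:, 3]) / 2], dim=1)
    w2 = torch.sqrt((d ** 2).sum(dim=1))          # 2-D Wasserstein distance
    return torch.exp(-w2 / C)

def combined_loss(pred_cxcywh: torch.Tensor, target_cxcywh: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    def to_xyxy(b):
        cx, cy, w, h = b.unbind(dim=1)
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
    ciou = complete_box_iou_loss(to_xyxy(pred_cxcywh), to_xyxy(target_cxcywh),
                                 reduction="none")
    # Mix the NWD similarity (as a loss, 1 - NWD) with the CIoU loss
    return (alpha * (1 - nwd(pred_cxcywh, target_cxcywh)) + (1 - alpha) * ciou).mean()
```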
]]>Information doi: 10.3390/info15020107
Authors: Yanan Wu Yalin Yang May Yuan
Conventional spatiotemporal methods take frequentist or density-based approaches to map event clusters over time. While these methods discern hotspots of varying continuity in space and time, their findings overlook locations of routine occurrences where the geographic context may contribute to the regularity of event occurrences. Hence, this research aims to recognize the routine occurrences of point events and relate site characteristics and situation dynamics around these locations to explain the regular occurrences. We developed an algorithm, Location Analytics of Routine Occurrences (LARO), to determine an appropriate temporal unit based on event periodicity, seek locations of routine occurrences, and geographically contextualize these locations through spatial association mining. We demonstrated LARO in a case study with over 250,000 reported traffic accidents from 2010 to 2018 in Dallas, Texas, United States. LARO identified three distinctive locations, each exhibiting varying frequencies of traffic accidents at each weekly hour. The findings indicated that locations with routine traffic accidents are surrounded by high densities of stores, restaurants, entertainment, and businesses. The timing of traffic accidents showed a strong relationship with human activities around these points of interest. Besides the LARO algorithm, this study contributes to the understanding of previously overlooked periodicity in traffic accidents, emphasizing the association between periodic human activities and the occurrence of street-level traffic accidents. The proposed LARO algorithm is applicable to occurrences of point-based events, such as crime incidents or animal sightings.
]]>Information doi: 10.3390/info15020106
Authors: Samreen Mahmood Mehmood Chadhar Selena Firmin
Purpose: The purpose of this research paper was to analyse the counterstrategies to mitigate cybersecurity challenges using organisational learning loops amidst major crises in the Higher Education and Research Sector (HERS). The authors proposed the learning loop framework revealing several counterstrategies to mitigate cybersecurity issues in HERS. The counterstrategies are explored, and their implications for research and practice are discussed. Methodology: The qualitative methodology was adopted, and semi-structured interviews with cybersecurity experts and top managers were conducted. Results: This exploratory paper proposed the learning loop framework, identifying the introduction of new policies and procedures, changes to existing systems, partnerships with other companies, the integration of new software, improved employee learning, enhanced security, and the monitoring and evaluation of security measures as significant counterstrategies for ensuring a cyber-safe working environment in HERS. These counterstrategies will help to tackle cybersecurity issues in HERS, not only during the current major crisis but also in the future. Implications: The outcomes provide insightful implications for both theory and practice. This study proposes a learning framework that prioritises counterstrategies to mitigate cybersecurity challenges in HERS amidst a major crisis. The proposed model can help HERS be more efficient in mitigating cybersecurity issues in future crises. The counterstrategies can also be tested, adopted, and implemented by practitioners working in other sectors to mitigate cybersecurity issues during and after major crises. Future research can focus on addressing the shortcomings and limitations of the proposed learning framework adopted by HERS.
]]>Information doi: 10.3390/info15020105
Authors: Rafał Michalski Szymon Zaleski
Although there have been some studies on the success factors for IT software projects, there is still a lack of coherent research on the success factors for IT service projects. Therefore, this study aimed to identify and understand the factors and their relationships that contribute to the success of IT service projects. For this purpose, multivariate regressions and structural equation models (SEMs) were developed and analyzed. The regression models included six project management success criteria used as dependent variables (quality of the delivered product, scope realization and requirements, timeliness of delivery, delivery within budget, customer satisfaction, and provider satisfaction) and four independent variables (agile techniques and change management, organization and people, stakeholders and risk analysis, work environment), which had been identified through exploratory factor analysis. The results showed that not all success factors were relevant to all success criteria, and there were differences in their importance. An additional series of exploratory and confirmatory factor analyses along with appropriate statistical measures were employed to evaluate the quality of these four factors. The SEM approach was based on five latent constructs with a total of twenty components. The study suggests that investing in improving people’s knowledge and skills, using agile methodologies, creating a supportive work environment, and involving stakeholders in regular risk analysis are important for project management success. The results also suggest that the success factors for IT service projects depend on both traditional and agile approaches. The study extensively compared its findings with similar research and discussed common issues and differences in both the model structures and methodologies applied. The investigation utilized mathematical methods and techniques that are not commonly applied in the field of project management success modeling. The comprehensive methodology that was applied may be helpful to other researchers who are interested in this topic.
]]>Information doi: 10.3390/info15020104
Authors: Majdi Sukkar Madhu Shukla Dinesh Kumar Vassilis C. Gerogiannis Andreas Kanavos Biswaranjan Acharya
Effective collision risk reduction in autonomous vehicles relies on robust and straightforward pedestrian tracking. Challenges posed by occlusion and switching scenarios significantly impede the reliability of pedestrian tracking. In the current study, we strive to enhance both the reliability and the efficacy of pedestrian tracking in complex scenarios. Particularly, we introduce a new pedestrian tracking algorithm that leverages both the YOLOv8 (You Only Look Once) object detector technique and the StrongSORT algorithm, which is an advanced deep learning multi-object tracking (MOT) method. Our findings demonstrate that StrongSORT, an enhanced version of the DeepSORT MOT algorithm, substantially improves tracking accuracy through meticulous hyperparameter tuning. Overall, the experimental results reveal that the proposed algorithm is an effective and efficient method for pedestrian tracking, particularly in complex scenarios encountered in the MOT16 and MOT17 datasets. The combined use of YOLOv8 and StrongSORT contributes to enhanced tracking results, emphasizing the synergistic relationship between detection and tracking modules.
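A minimal sketch of the detection half of the pipeline is given below: YOLOv8 person detections are formatted as the (x1, y1, x2, y2, conf) rows a tracker such as StrongSORT typically consumes. The StrongSORT update call itself is omitted because its API varies between implementations, and the video path is hypothetical.
```python
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # pretrained COCO weights; class 0 is "person"

def detect_pedestrians(frame: np.ndarray, conf_thres: float = 0.4) -> np.ndarray:
    results = model(frame, classes=[0], conf=conf_thres, verbose=False)[0]
    boxes = results.boxes.xyxy.cpu().numpy()        # (N, 4) corner coordinates
    scores = results.boxes.conf.cpu().numpy()       # (N,) confidences
    return np.hstack([boxes, scores[:, None]])      # rows: x1, y1, x2, y2, conf

cap = cv2.VideoCapture("mot17_sequence.mp4")        # hypothetical input video
ok, frame = cap.read()
if ok:
    detections = detect_pedestrians(frame)          # hand these rows to the tracker
    print(detections.shape)
```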
]]>Information doi: 10.3390/info15020103
Authors: Xiaoqun Wang Xihui Chen Zhouyi Gu
Grasping the concerns of customers is paramount, serving as a foundation for both attracting and retaining a loyal customer base. While customer satisfaction has been extensively explored across diverse industries, there remains a dearth of insights into how distinct rural bed and breakfasts (RB&Bs) can effectively cater to the specific needs of their target audience. This research utilized latent semantic analysis and text regression techniques on online reviews, uncovering previously unrecognized factors contributing to RB&B customer satisfaction. Furthermore, the study demonstrates that certain factors wield distinct impacts on guest satisfaction within varying RB&B market segments. The implications of these findings extend to empowering RB&B owners with actionable insights to enhance the overall customer experience.
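A minimal sketch of the latent semantic analysis plus text regression pipeline is shown below, assuming a review table with a text column and a numeric rating column; the vocabulary size, number of latent topics, and regressor are illustrative choices.
```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import Ridge

def fit_satisfaction_model(reviews: pd.DataFrame):
    pipeline = make_pipeline(
        TfidfVectorizer(max_features=5000),
        TruncatedSVD(n_components=50, random_state=0),   # the LSA step
        Ridge(alpha=1.0),                                 # text regression to ratings
    )
    pipeline.fit(reviews["text"], reviews["rating"])
    return pipeline

# Usage: model = fit_satisfaction_model(df); model.predict(["Clean room, friendly host"])
```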
]]>Information doi: 10.3390/info15020102
Authors: Ehab Alkhateeb Ali Ghorbani Arash Habibi Lashkari
This research addresses a critical need in the ongoing battle against malware, particularly in the form of obfuscated malware, which presents a formidable challenge in the realm of cybersecurity. Developing effective antivirus (AV) solutions capable of combating packed malware remains a crucial endeavor. Packed malicious programs employ encryption and advanced techniques to obfuscate their payloads, rendering them elusive to AV scanners and security analysts. This research presents an innovative malware packer classifier specifically designed to adeptly identify packer families and detect unknown packers in real-world scenarios. To fortify packer identification performance, we have curated a meticulously crafted dataset comprising precisely packed samples, enabling comprehensive training and validation. Our approach employs a sophisticated feature engineering methodology, encompassing multiple layers of analysis to extract salient features used as input to the classifier. The proposed packer identifier demonstrates remarkable accuracy in distinguishing between known and unknown packers, while also ensuring operational efficiency. The results reveal an impressive accuracy rate of 99.60% in identifying known packers and 91% accuracy in detecting unknown packers. This novel research not only significantly advances the field of malware detection but also equips both cybersecurity practitioners and AV engines with a robust tool to effectively counter the persistent threat of packed malware.
]]>Information doi: 10.3390/info15020101
Authors: Ying-Qing Guo Meng Li Yang Yang Zhao-Dong Xu Wen-Han Xie
As typical intelligent devices, magnetorheological (MR) dampers have been widely applied in vibration control and mitigation. However, the inherent hysteresis characteristics of magnetic materials can cause significant time delays and fluctuations, affecting the controllability and damping performance of MR dampers. Most existing mathematical models have not considered the adverse effects of magnetic hysteresis characteristics, and this study aims to consider such effects in MR damper models. Based on the magnetic circuit analysis of MR dampers, the Jiles–Atherton (J-A) model is adopted to characterize the magnetic hysteresis properties. Then, a weight-adaptive particle swarm optimization (PSO) algorithm is introduced into the J-A model for efficient parameter identification, in which differential evolution and Cauchy variation are combined to improve the diversity of the population and the ability to escape local optima. The results obtained from the improved J-A model are compared with the experimental data under different working conditions, and the comparison shows that the proposed J-A model can accurately predict the damping performance of MR dampers with magnetic hysteresis characteristics.
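A minimal sketch of the parameter-identification loop is given below: a particle swarm with a linearly decaying inertia weight searches for J-A parameters that minimize the error against measured hysteresis data. The forward J-A model is abstracted behind a user-supplied simulate_ja function, and the differential evolution and Cauchy variation refinements from the paper are omitted for brevity.
```python
import numpy as np

def identify_ja_parameters(simulate_ja, measured, bounds, n_particles=30, iters=200):
    """simulate_ja(params) must return a curve comparable with `measured`."""
    lo, hi = np.array(bounds).T                    # bounds: [(min, max)] per parameter
    dim = len(bounds)
    pos = lo + np.random.rand(n_particles, dim) * (hi - lo)
    vel = np.zeros_like(pos)
    cost = np.array([np.mean((simulate_ja(p) - measured) ** 2) for p in pos])
    pbest, pbest_cost = pos.copy(), cost.copy()
    gbest = pos[cost.argmin()].copy()

    for t in range(iters):
        w = 0.9 - 0.5 * t / iters                  # adaptive (decaying) inertia weight
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        cost = np.array([np.mean((simulate_ja(p) - measured) ** 2) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest                                   # best-fitting J-A parameter vector
```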
]]>Information doi: 10.3390/info15020100
Authors: Nikolaos Zafeiropoulos Pavlos Bitilis George E. Tsekouras Konstantinos Kotis
In the realm of Parkinson’s Disease (PD) research, the integration of wearable sensor data with personal health records (PHR) has emerged as a pivotal avenue for patient alerting and monitoring. This study delves into the complex domain of PD patient care, with a specific emphasis on harnessing the potential of wearable sensors to capture, represent and semantically analyze crucial movement data and knowledge. The primary objective is to enhance the assessment of PD patients by establishing a robust foundation for personalized health insights through the development of Personal Health Knowledge Graphs (PHKGs) and the employment of personal health Graph Neural Networks (PHGNNs) that utilize PHKGs. More specifically, the aim is to formalize the representation of the related integrated data, i.e., unified sensor and PHR data, at a higher level of abstraction in a PHKG, in order to facilitate interoperability and to support rule-based high-level event recognition, such as a patient missing a dose or falling. This paper, extending our previous related work, presents the Wear4PDmove ontology in detail and evaluates the ontology within the development of an experimental PHKG. Furthermore, this paper focuses on the integration and evaluation of PHKG within the implementation of a Graph Neural Network (GNN). This work emphasizes the importance of integrating PD-related data for monitoring and alerting patients with appropriate notifications. These notifications offer health experts precise and timely information for the continuous evaluation of personal health-related events, ultimately contributing to enhanced patient care and well-informed medical decision-making. Finally, the paper concludes by proposing a novel approach for integrating personal health KGs and GNNs for PD monitoring and alerting solutions.
]]>Information doi: 10.3390/info15020099
Authors: Fahim Sufi
GPT (Generative Pre-trained Transformer) represents advanced language models that have significantly reshaped the academic writing landscape. These sophisticated language models offer invaluable support throughout all phases of research work, facilitating idea generation, enhancing drafting processes, and overcoming challenges like writer’s block. Their capabilities extend beyond conventional applications, contributing to critical analysis, data augmentation, and research design, thereby elevating the efficiency and quality of scholarly endeavors. Strategically narrowing its focus, this review explores alternative dimensions of GPT and LLM applications, specifically data augmentation and the generation of synthetic data for research. Employing a meticulous examination of 412 scholarly works, it distills a selection of 77 contributions addressing three critical research questions: (1) GPT on Generating Research data, (2) GPT on Data Analysis, and (3) GPT on Research Design. The systematic literature review adeptly highlights the central focus on data augmentation, encapsulating 48 pertinent scholarly contributions, and extends to the proactive role of GPT in critical analysis of research data and shaping research design. Pioneering a comprehensive classification framework for “GPT’s use on Research Data”, the study classifies existing literature into six categories and 14 sub-categories, providing profound insights into the multifaceted applications of GPT in research data. This study meticulously compares 54 pieces of literature, evaluating research domains, methodologies, and advantages and disadvantages, providing scholars with profound insights crucial for the seamless integration of GPT across diverse phases of their scholarly pursuits.
]]>Information doi: 10.3390/info15020098
Authors: Nadia Brancati Maria Frucci
To support pathologists in breast tumor diagnosis, deep learning plays a crucial role in the development of histological whole slide image (WSI) classification methods. However, automatic classification is challenging due to the high-resolution data and the scarcity of representative training data. To tackle these limitations, we propose a deep learning-based breast tumor gigapixel histological image multi-classifier integrated with a high-resolution data augmentation model to process the entire slide by exploring its local and global information and generating its different synthetic versions. The key idea is to perform the classification and augmentation in feature latent space, reducing the computational cost while preserving the class label of the input. We adopt a deep learning-based multi-classification method and evaluate the contribution given by a conditional generative adversarial network-based data augmentation model on the classifier’s performance for three tumor classes in the BRIGHT Challenge dataset. The proposed method has allowed us to achieve an average F1 equal to 69.5, considering only the WSI dataset of the Challenge. The results are comparable to those obtained by the Challenge winning method (71.6), also trained on the annotated tumor region dataset of the Challenge.
]]>Information doi: 10.3390/info15020097
Authors: Jie Zhang Qiao Wang Paul Mitchell Hamed Ahmadi
Due to an Editorial Office error [...]
]]>Information doi: 10.3390/info15020096
Authors: Miranda Bellezza Azzurra di Palma Andrea Frosini
Alzheimer’s disease (AD) is a neurodegenerative disorder that leads to the loss of cognitive functions due to the deterioration of brain tissue. Current diagnostic methods are often invasive or costly, limiting their widespread use. Developing non-invasive and cost-effective screening methods is crucial, especially for identifying patients with mild cognitive impairment (MCI) at risk of developing Alzheimer’s disease. This study employs a Machine Learning (ML) approach, specifically K-means clustering, on a subset of pixels common to all magnetic resonance imaging (MRI) images to rapidly classify subjects with AD and those with normal cognition (NC). In particular, we benefited from defining significant pixels, a narrow subset of points (in the range of 1.5% to 6% of the total) common to all MRI images and related to more intense degeneration of white or gray matter. We performed K-means clustering, with k = 2, on the significant pixels of AD and NC MRI images to separate subjects belonging to the two classes and detect the class centroids. Subsequently, we classified subjects with MCI using only the significant pixels. This approach enables quick classification of subjects with AD and NC, and more importantly, it predicts MCI-to-AD conversion with high accuracy and low computational cost, making it a rapid and effective diagnostic tool for real-time assessments.
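A minimal sketch of the clustering step is shown below, assuming the significant-pixel intensities have already been extracted into per-subject feature rows; which of the two clusters corresponds to AD is determined afterwards from the labeled subjects.
```python
import numpy as np
from sklearn.cluster import KMeans

def fit_ad_nc_clusters(X_ad_nc: np.ndarray) -> KMeans:
    """X_ad_nc: rows are subjects, columns are significant-pixel intensities."""
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_ad_nc)

def predict_mci_conversion(km: KMeans, X_mci: np.ndarray, ad_cluster: int) -> np.ndarray:
    """Flag an MCI subject as a likely converter if its nearest centroid is the AD one.

    ad_cluster: index of the centroid associated with the labeled AD subjects
    (e.g., by majority vote over their cluster assignments).
    """
    return (km.predict(X_mci) == ad_cluster).astype(int)
```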
]]>Information doi: 10.3390/info15020095
Authors: Huihui Zhu Hexiang Lin Shaojun Wu Wei Luo Hui Zhang Yuancheng Zhan Xiaoting Wang Aiqun Liu Leong Chuan Kwek
Integrated photonic chips leverage the recent developments in integrated circuit technology, along with the control and manipulation of light signals, to realize the integration of multiple optical components onto a single chip. By exploiting the power of light, integrated photonic chips offer numerous advantages over traditional optical and electronic systems, including miniaturization, high-speed data processing and improved energy efficiency. In this review, we survey the current status of quantum computation, optical neural networks and the realization of some algorithms on integrated optical chips.
]]>Information doi: 10.3390/info15020094
Authors: Louis Closson Christophe Cérin Didier Donsez Jean-Luc Baudouin
This paper aims to provide discernment toward establishing a general framework, dedicated to data analysis and forecasting in smart buildings. It constitutes an industrial return of experience from an industrialist specializing in IoT, supported by the academic world. With the necessary improvement of energy efficiency, discernment is paramount for facility managers to optimize daily operations and prioritize renovation work in the building sector. With the scale of buildings and the complexity of Heating, Ventilation, and Air Conditioning (HVAC) systems, the use of artificial intelligence is deemed the cheapest tool, holding the highest potential, even if it requires IoT sensors and a deluge of data to establish genuine models. However, the wide variety of buildings, users, and data hinders the development of industrial solutions, as specific studies often lack relevance to analyze other buildings, possibly with different types of data monitored. The relevance of the modeling can also disappear over time, as buildings are dynamic systems evolving with their use. In this paper, we propose to study the forecasting ability of the widely used Long Short-Term Memory (LSTM) network algorithm, which is well-designed for time series modeling, across an instrumented building. In this way, we assessed the consistency of its performance across several tasks, comparing against cases with no prediction, a comparison that is lacking in the literature. The insights provided let us examine the quality of AI models and the quality of data needed in forecasting tasks. Finally, we deduced that efficient models and smart choices about data allow meaningful insight into developing time series modeling frameworks for smart buildings. For reproducibility purposes, we also provide our raw data, which came from one “real” smart building, as well as significant information regarding this building. In summary, our research aims to develop a methodology for exploring, analyzing, and modeling data from the smart buildings sector. Based on our experiment on forecasting temperature sensor measurements, we found that a bigger AI model (1) does not always imply a longer training time and (2) can have little impact on accuracy, and that (3) the benefit of using more features depends on the data processing order. We also observed that providing more data is irrelevant without a deep understanding of the problem physics.
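As a minimal sketch of the forecasting setup discussed above, the code below windows a temperature series and fits a small LSTM to predict the next reading; the window length, layer width, and training settings are illustrative assumptions.
```python
import numpy as np
from tensorflow import keras

def make_windows(series: np.ndarray, window: int = 48):
    """Turn a 1-D sensor series into (samples, window, 1) inputs and next-step targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y

def build_forecaster(window: int = 48) -> keras.Model:
    model = keras.Sequential([
        keras.layers.Input(shape=(window, 1)),
        keras.layers.LSTM(64),
        keras.layers.Dense(1),          # next temperature reading
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Usage: X, y = make_windows(temperature_series); build_forecaster().fit(X, y, epochs=20)
```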
]]>Information doi: 10.3390/info15020093
Authors: Shifeng Chen Jialin Wang Ketai He
The popularization of the internet and the widespread use of smartphones have led to a rapid growth in the number of social media users. While information technology has brought convenience to people, it has also given rise to cyberbullying, which has a serious negative impact. The identity of online users is hidden, and due to the lack of supervision and the imperfections of relevant laws and policies, cyberbullying occurs from time to time, bringing serious mental harm and psychological trauma to the victims. The pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) has achieved good results in the field of natural language processing and can be used for cyberbullying detection. In this research, we construct a variety of traditional machine learning, deep learning and Chinese pre-trained language models as baselines, and propose a hybrid model for Chinese cyberbullying detection based on XLNet, a variant of BERT, and a deep Bi-LSTM. In addition, real cyberbullying remarks are collected to expand the Chinese offensive language dataset COLDATASET. The proposed model outperforms all baseline models on this dataset, improving by 4.29% over SVM (the best-performing traditional machine learning method), by 1.49% over GRU (the best-performing deep learning method), and by 1.13% over BERT.
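A minimal sketch of the hybrid architecture is given below: token embeddings from an XLNet encoder feed a deep bidirectional LSTM, and the pooled output is classified as bullying or not. The checkpoint name is an assumed publicly available Chinese XLNet, not necessarily the one used in the paper, and the hidden size and pooling choice are illustrative.
```python
import torch
import torch.nn as nn
from transformers import AutoModel

class XLNetBiLSTMClassifier(nn.Module):
    def __init__(self, checkpoint: str = "hfl/chinese-xlnet-base", hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.bilstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                              num_layers=2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 2)      # bullying vs. not bullying

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        token_states = self.encoder(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(token_states)          # deep Bi-LSTM over XLNet states
        pooled = lstm_out.mean(dim=1)                    # average over tokens
        return self.classifier(pooled)                   # class logits
```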
]]>Information doi: 10.3390/info15020092
Authors: Aniket Kumar Singh Bishal Lamichhane Suman Devkota Uttam Dhakal Chandra Dhakal
This study investigates self-assessment tendencies in Large Language Models (LLMs), examining if patterns resemble human cognitive biases like the Dunning–Kruger effect. LLMs, including GPT, BARD, Claude, and LLaMA, are evaluated using confidence scores on reasoning tasks. The models provide self-assessed confidence levels before and after responding to different questions. The results show cases where high confidence does not correlate with correctness, suggesting overconfidence. Conversely, low confidence despite accurate responses indicates potential underestimation. The confidence scores vary across problem categories and difficulties, reducing confidence for complex queries. GPT-4 displays consistent confidence, while LLaMA and Claude demonstrate more variations. Some of these patterns resemble the Dunning–Kruger effect, where incompetence leads to inflated self-evaluations. While not conclusively evident, these observations parallel this phenomenon and provide a foundation to further explore the alignment of competence and confidence in LLMs. As LLMs continue to expand their societal roles, further research into their self-assessment mechanisms is warranted to fully understand their capabilities and limitations.
]]>Information doi: 10.3390/info15020091
Authors: Marie-Therese Charlotte Evans Majid Latifi Mominul Ahsan Julfikar Haider
Keyword extraction from Knowledge Bases underpins the definition of relevancy in Digital Library search systems. However, it is the related task of Joint Relation Extraction that populates the Knowledge Bases from which results are retrieved. Recent work focuses on fine-tuned, Pre-trained Transformers. Yet, F1 scores for scientific literature achieve just 53.2, versus 69 in the general domain. The research demonstrates the failure of existing work to evidence the rationale for optimisations to fine-tuned classifiers. In contrast, emerging research subjectively adopts the common belief that Natural Language Processing techniques fail to derive context and shared knowledge. In fact, global context and shared knowledge account for just 10.4% and 11.2% of total relation misclassifications, respectively. In this work, the novel employment of semantic text analysis presents objective challenges for the Transformer-based classification of Joint Relation Extraction. This is the first known work to quantify that pipelined error propagation accounts for 45.3% of total relation misclassifications, the most pressing challenge in this domain. More specifically, Part-of-Speech tagging highlights the misclassification of complex noun phrases, accounting for 25.47% of relation misclassifications. Furthermore, this study identifies two limitations in the purported bidirectionality of the Bidirectional Encoder Representations from Transformers (BERT) Pre-trained Language Model. Firstly, there is a notable imbalance in the misclassification of right-to-left relations, which occurs at a rate double that of left-to-right relations. Additionally, a failure to recognise local context through determiners and prepositions contributes to 16.04% of misclassifications. Furthermore, it is highlighted that the annotation scheme of the singular dataset utilised in existing research, Scientific Entities, Relations and Coreferences (SciERC), is marred by ambiguity. Notably, two asymmetric relations within this dataset achieve recall rates of only 10% and 29%.
]]>Information doi: 10.3390/info15020090
Authors: Sidong Liu Cristián Castillo-Olea Shlomo Berkovsky
The past decade has witnessed an explosive growth in the development and use of artificial intelligence (AI) across diverse fields [...]
]]>Information doi: 10.3390/info15020089
Authors: Timotej Jagrič Aljaž Herman
This paper presents a broad study on the application of the BERT (Bidirectional Encoder Representations from Transformers) model for multiclass text classification, specifically focusing on categorizing business descriptions into one of 13 distinct industry categories. The study involved a detailed fine-tuning phase resulting in a consistent decrease in training loss, indicative of the model’s learning efficacy. Subsequent validation on a separate dataset revealed the model’s robust performance, with classification accuracies ranging from 83.5% to 92.6% across different industry classes. Our model showed a high overall accuracy of 88.23%, coupled with a robust F1 score of 0.88. These results highlight the model’s ability to capture and utilize the nuanced features of text data pertinent to various industries. The model has the capability to harness real-time web data, thereby enabling the utilization of the latest and most up-to-date information affecting the company’s product portfolio. Based on the model’s performance and its characteristics, we believe that the process of relative valuation can be drastically improved.
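As a rough sketch of such a classification setup (the checkpoint, label count, and example text below are placeholders, and the model is untrained for the task rather than the fine-tuned model from the study), industry prediction with a sequence classification head might look like this:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=13)

    desc = "The company manufactures industrial pumps and fluid handling equipment."
    inputs = tok(desc, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(int(logits.argmax(dim=-1)))  # predicted industry class index (meaningful only after fine-tuning)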
]]>Information doi: 10.3390/info15020088
Authors: Sokratis Tselegkaridis Theodosios Sapounidis
Utilizing Arduino development boards for learning microcontroller circuits is a prevalent practice across various educational levels. Nevertheless, the literature offers limited insights into the impact of these boards on student performance and attitudes. Therefore, this paper aims to investigate the performance of 58 university students in learning microcontroller circuits with modular boards designed for Arduino through a series of 4 exercises. Specifically, students’ performance is assessed through pre-tests and post-tests, in three learning units: (a) microcontroller, (b) coding, and (c) circuit. Additionally, the study captures students’ attitudes and measures their perceived usability of modular boards. For this purpose, the students completed a specially designed attitude questionnaire and the system usability scale (SUS) questionnaire. Statistical analysis is conducted using t-tests, ANOVA, and ANCOVA, along with bootstrapping. The findings reveal statistically significant differences between pre-tests and post-tests in all cases. Among the three learning units, the use of modular boards appears to have the most significant impact on coding. Based on students’ responses, the SUS results indicate that modular boards appear to be a quite usable approach for teaching microcontrollers. Finally, students generally express positive attitudes toward modular boards.
]]>Information doi: 10.3390/info15020087
Authors: D. Criado-Ramón L. G. B. Ruiz J. R. S. Iruela M. C. Pegalajar
This paper introduces the first completely unsupervised methodology for non-intrusive load monitoring that does not rely on any additional data, making it suitable for real-life applications. The methodology includes an algorithm to efficiently decompose the aggregated energy load from households into events and algorithms based on expert knowledge to assign each of these events to one of four types of appliances: fridge, dishwasher, microwave, and washer/dryer. The methodology was developed to work with smart meters that have a granularity of 1 min and was evaluated using the Reference Energy Disaggregation Dataset. The results show that the algorithm can disaggregate the refrigerator with high accuracy and demonstrate the usefulness of the proposed methodology for extracting relevant features from other appliances, such as the power use and duration of the heating cycles of a dishwasher.
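The event-decomposition step can be illustrated with a simple threshold on minute-to-minute power changes; the synthetic load, threshold value, and appliance rules hinted at in the comments are assumptions, not the paper's algorithm:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical aggregate household load at 1-minute resolution (watts):
    # a fridge cycling on/off on top of a noisy 60 W baseline.
    power = 60 + 5 * rng.standard_normal(1440)
    for start in range(0, 1440, 90):
        power[start:start + 30] += 120  # compressor runs ~30 min every 90 min

    threshold = 80.0  # minimum step (W) treated as a switching event; assumed value
    steps = np.diff(power)
    event_idx = np.where(np.abs(steps) > threshold)[0] + 1
    events = [(int(i), float(steps[i - 1])) for i in event_idx]  # (minute, power step)

    # Rising and falling edges of similar magnitude can then be paired and assigned
    # to appliances (fridge, dishwasher, microwave, washer/dryer) by expert rules on
    # step size, cycle duration, and periodicity.
    print(events[:6])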
]]>Information doi: 10.3390/info15020086
Authors: Jairo Fuentes Jose Aguilar Edwin Montoya Ángel Pinto
In this paper, we propose autonomous cycles of data analysis tasks for the automation of production chains aimed at improving the productivity of Micro, Small and Medium Enterprises (MSMEs) in the context of agroindustry. In the autonomous cycles of data analysis tasks, each task interacts with the others and has different functions, in order to reach the goal of the cycle. In this article, we identify three industrial-automation processes within the production chain in which autonomous cycles can be applied. The first cycle is responsible for identifying the type of input to be transformed—such as quantity, quality, time, and cost—based on information from the organization and its context. The second cycle selects the technological level used in the raw-material transformation, characterizing the platform of plant processing. The last cycle identifies the level of specialization of the generated product, such as the quality and value of the product. Finally, we apply the first autonomous cycle to define the type of input to be transformed in a coffee factory.
]]>Information doi: 10.3390/info15020085
Authors: Kalyan Chatterjee M. Raju N. Selvamuthukumaran M. Pramod B. Krishna Kumar Anjan Bandyopadhyay Saurav Mallik
According to global data on visual impairment from the World Health Organization in 2010, an estimated 285 million individuals, including 39 million who are blind, face visual impairments. These individuals use non-contact methods such as voice commands and hand gestures to interact with user interfaces. Recognizing the significance of hand gesture recognition for this vulnerable population and aiming to improve usability, this study employs a Generative Adversarial Network (GAN) coupled with Convolutional Neural Network (CNN) techniques, in a framework called HaCk, to generate a diverse set of hand gestures. Recognizing hand gestures with HaCk typically involves a two-step approach. First, the GAN is trained to generate synthetic hand gesture images, and then a separate CNN is employed to classify gestures in real-world data. The evaluation of HaCk is demonstrated through a comparative analysis using Leave-One-Out Cross-Validation (LOO CV) and Holdout Cross-Validation (Holdout CV) tests. These tests are crucial for assessing the model’s generalization, robustness, and suitability for practical applications. The experimental results reveal that the performance of HaCk surpasses that of other compared ML/DL models, including CNN, FTCNN, CDCGAN, GestureGAN, GGAN, MHG-CAN, and ASL models. Specifically, the improvement percentages for the LOO CV Test are 17.03%, 20.27%, 15.76%, 13.76%, 10.16%, 5.90%, and 15.90%, respectively. Similarly, for the Holdout CV Test, HaCk outperforms HU, ZM, GB, GB-ZM, GB-HU, CDCGAN, GestureGAN, GGAN, MHG-CAN, and ASL models, with improvement percentages of 56.87%, 15.91%, 13.97%, 24.81%, 23.52%, 17.72%, 15.72%, 12.12%, 7.94%, and 17.94%, respectively.
]]>Information doi: 10.3390/info15020084
Authors: Efstathios Konstantinos Anastasiadis Ioannis Antoniou
We extend network analysis to directed criminal networks in the context of asymmetric links. We computed selected centralities, centralizations and the assortativity of a drug trafficking network with 110 nodes and 295 edges. We also monitored the centralizations of eleven temporal networks corresponding to successive stages of investigation during the period 1994–1996. All indices reach local extrema at the stage of highest activity, extending previous results to directed networks. The sharpest changes (90%) are observed for betweenness and in-degree centralization. A notable difference between entropies is observed: the in-degree entropy reaches a global minimum at month 12, while the out-degree entropy reaches a global maximum. This confirms that at the stage of highest activity, incoming instructions are precise and focused, while outgoing instructions are diversified. These findings are expected to be useful for alerting the authorities to increasing criminal activity. The disruption simulations on the time-averaged network extend previous results on undirected networks to directed networks.
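For readers unfamiliar with these indices, the following sketch computes in-degree and betweenness centrality and the in/out-degree entropies on a placeholder directed graph; the real 110-node, 295-edge drug trafficking network is not reproduced here:

    import numpy as np
    import networkx as nx

    G = nx.gnp_random_graph(110, 0.025, directed=True, seed=1)  # stand-in for the criminal network

    in_deg = nx.in_degree_centrality(G)
    betw = nx.betweenness_centrality(G)

    def degree_entropy(degrees):
        """Shannon entropy (bits) of the empirical degree distribution."""
        counts = np.bincount(degrees)
        p = counts[counts > 0] / counts.sum()
        return float(-(p * np.log2(p)).sum())

    in_H = degree_entropy([d for _, d in G.in_degree()])
    out_H = degree_entropy([d for _, d in G.out_degree()])
    print(in_H, out_H)  # a low in-degree entropy with high out-degree entropy mirrors the pattern reported above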
]]>Information doi: 10.3390/info15020083
Authors: Dimitris Mpouziotas Jeries Besharat Ioannis G. Tsoulos Chrysostomos Stylios
AliAmvra is a project developed to explore and promote high-quality catches of the Amvrakikos Gulf (GP) to the wider region of Arta. In addition, this project aimed to implement an integrated plan of action to form a business identity with high added value and to deliver integrated business services adapted to the special characteristics of the area. The action plan for this project was to actively search for new markets, create a collective identity for the products, promote their quality and added value, participate in gastronomy and tasting exhibitions, carry out dissemination and publicity actions, and enhance the quality of the products and markets based on customer needs. The primary focus of this study is to observe and analyze the data retrieved from various tasting exhibitions of the AliAmvra project, with the goal of improving customer experience and product quality. An extensive analysis was conducted by collecting data through surveys carried out at the gastronomy events of the AliAmvra project. Our objective was to conduct two types of reviews, one focused on data analysis and the other on evaluating model-driven algorithms. Each review utilized a survey with its own structure, each serving a different purpose. In addition, our model review focused on developing a robust recommendation system with the collected data. The algorithms we evaluated were MLP (multi-layered perceptron), RBF (radial basis function), GenClass, NNC (neural network construction), and FC (feature construction), which were used to implement the recommendation system. We determined that FC (feature construction) performed best, presenting the lowest classification error rate of 24.87%, whereas the algorithm that performed worst on average was RBF (radial basis function). Our final objective was to showcase and expand the work put into the AliAmvra project through this analysis.
]]>Information doi: 10.3390/info15020082
Authors: Yu-Hung Chang Chien-Hung Liu Shingchern D. You
The dynamic flexible job-shop problem (DFJSP) is a realistic and challenging problem that many production plants face. As the product line becomes more complex, the machines may suddenly break down or resume service, so we need a dynamic scheduling framework to cope with the changing number of machines over time. This issue has rarely been addressed in the literature. In this paper, we propose an improved learning-to-dispatch (L2D) model to generate a reasonable and good schedule to minimize the makespan. We formulate a DFJSP as a disjunctive graph and use graph isomorphism networks (GINs) to embed the disjunctive graph into states for the agent to learn. The use of GINs enables the model to handle the dynamic number of machines and to effectively generalize to large-scale instances. The learning agent is a multi-layer feedforward network trained with a reinforcement learning algorithm, called proximal policy optimization. We trained the model on small-sized problems and tested it on various-sized problems. The experimental results show that our model outperforms the existing best priority dispatching rule algorithms, such as shortest processing time, most work remaining, flow due date per most work remaining, and most operations remaining. The results verify that the model has a good generalization capability and, thus, demonstrate its effectiveness.
]]>Information doi: 10.3390/info15020081
Authors: Azin Yazdi Amir Karimi Stylianos Mystakidis
This study applies bibliometric and network analysis methods to map the literature-based landscape of gamification in online distance learning. Two thousand four hundred and nineteen publications between 2000 and 2023 from the Scopus database were analyzed. Leading journals, influential articles, and the most critical topics on gamification in online training were identified. The co-authors’ analysis demonstrates a considerable rise in the number of nations evaluating research subjects, indicating increasing international cooperation. The main contributors are the United States, the United Kingdom, China, Spain, and Canada. The co-occurrence network analysis of keywords revealed six distinct research clusters: (i) the implementation of gamification in various learning contexts, (ii) investigating the application of gamification in student education to promote the use of electronic learning, (iii) utilizing artificial intelligence tools in online learning, (iv) exploring educational technologies, (v) developing strategies for creating a playful learning environment, and (vi) understanding children’s learning processes. Finally, an analysis of the most cited articles identified three research themes: (a) gamification-based learning platforms, (b) measurement of users’ appreciation and satisfaction, and (c) 3D virtual immersive learning environments. This study contributes to the subject discipline by informing researchers about the latest research trends in online education gamification and identifying promising research directions.
]]>Information doi: 10.3390/info15020080
Authors: Agnieszka Dutkowska-Zuk Joe Bourne Chengyuan An Xuan Gao Oktay Cetinkaya Peter Novitzky Gideon Ogunniye Rachel Cooper David De Roure Julie McCann Jeremy Watson Tim Watson Eleri Jones
This systematic literature review explores the scholarly debate around public perceptions and behaviors in the context of cybersecurity in connected places. It reveals that, while many articles highlight the importance of public perceptions and behaviors during a cyberattack, there is no unified consensus on how to influence them in order to minimize the attack’s impact and expedite recovery. Public perceptions can affect the success and sustainability of connected places; however, exactly how and to what extent remains unknown. We argue that more research is needed on the mechanisms to assess the influence of public perceptions and associated behaviors on threats to security in connected places. Furthermore, there is a need to investigate the models and tools currently being deployed by connected place design and management to understand and influence public perceptions and behaviors. Lastly, we identify the requirements to investigate the complex relationship between the public and connected place managers, define all stakeholders clearly, and explore the patterns between specific connected place cybersecurity incidents and the methods used to transform public perceptions.
]]>Information doi: 10.3390/info15020079
Authors: Christian Bonnici West Simon Grima
The richness and complexity of consent present challenges to those aiming to make related contributions to computer information systems (CIS). This paper aims to support consent-related research in CIS by simplifying the understanding of existing literature and facilitating the framing of future consent management research. First, it outlines existing consent management research and shows how it relates to the literature in law and ethics. Second, it presents some fundamental explanations and definitions that must be considered for further contributions to the consent management literature. Third, it identifies five types of consent-related stances often taken in the consent management literature and explains each in some detail. Fourth, it explains one of the identified types of stances (i.e., the disciplinary stance) by expanding on the links between consent as a legal construct and its ethical counterpart. Fifth, considering another of the identified types of stances (i.e., the theoretical stances normally adopted in the consent management literature), the paper presents the key requirements for legally and ethically effective consent management based on three prominent theories. Sixth, it presents the identified types of stances in a conceptual model, contending that the model is novel, relevant, understandable, and useful.
]]>Information doi: 10.3390/info15020078
Authors: Moutaz Alazab Salah Alhyari
Industry 4.0 has revolutionized manufacturing processes and facilities through the creation of smart and sustainable production facilities. Blockchain technology (BCT) has emerged as an invaluable asset within Industrial Revolution 4.0 (IR4.0), offering increased transparency, security, and traceability across supply chains. This systematic literature review explores the role of BCT in creating smart and sustainable manufacturing facilities, while examining its implications for supply chain management (SCM). Through a detailed examination of 82 research articles, this review highlights three areas where BCT can have a dramatic effect on smart and sustainable manufacturing: firstly, BCT can promote green production by supporting efficient resource use, waste reduction strategies, and eco-friendly production methods, allowing companies to implement smart and eco-friendly manufacturing practices through BCT solutions; secondly, BCT promotes intelligent manufacturing systems by facilitating real-time data sharing, predictive maintenance, and automated decision-making; and thirdly, BCT strengthens SCM by increasing visibility, traceability, and collaboration between supply chain partners. The review also highlights the potential limitations of BCT, such as scalability challenges and the need for standardized protocols. Future research should focus on addressing these limitations and further exploring the potential of BCT in IR4.0.
]]>Information doi: 10.3390/info15020077
Authors: Maryan Rizinski Andrej Jankov Vignesh Sankaradas Eugene Pinsky Igor Mishkovski Dimitar Trajanov
The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow, costly, and vendor-specific assignments. Therefore, we investigate recent natural language processing (NLP) advancements to automate the company classification process. In particular, we employ and evaluate various NLP-based models, including zero-shot learning, One-vs-Rest classification, multi-class classifiers, and ChatGPT-aided classification. We conduct a comprehensive comparison among these models to assess their effectiveness in the company classification task. The evaluation uses the Wharton Research Data Services (WRDS) dataset, consisting of textual descriptions of publicly traded companies. Our findings reveal that the RoBERTa and One-vs-Rest classifiers surpass the other methods, achieving F1 scores of 0.81 and 0.80 on the WRDS dataset, respectively. These results demonstrate that deep learning algorithms offer the potential to automate, standardize, and continuously update classification systems in an efficient and cost-effective way. In addition, we introduce several improvements to the multi-class classification techniques: (1) in the zero-shot methodology, we employ TF-IDF to enhance sector representation, yielding improved accuracy in comparison to standard zero-shot classifiers; (2) next, we use ChatGPT for dataset generation, revealing potential in scenarios where datasets of company descriptions are lacking; and (3) we also employ K-Fold to reduce noise in the WRDS dataset, followed by conducting experiments to assess the impact of noise reduction on the company classification results.
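As a toy illustration of the One-vs-Rest setup (the descriptions and sector labels below are invented stand-ins for the WRDS data, not the study's pipeline):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline

    docs = ["develops cloud software platforms", "operates retail grocery stores",
            "explores and produces crude oil", "designs semiconductor chips"]
    labels = ["Information Technology", "Consumer Staples", "Energy", "Information Technology"]

    # One binary classifier per sector over TF-IDF features of the company description.
    clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression(max_iter=1000)))
    clf.fit(docs, labels)
    print(clf.predict(["offers oil field drilling services"]))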
]]>Information doi: 10.3390/info15020076
Authors: Yupei Shu Xu Chen Xuan Di
This paper aims to use location-based social media data to infer the impact of the Russia–Ukraine war on human mobility. We examine the impact of the Russia–Ukraine war on changes in human mobility in terms of the spatial range of check-in locations using social media location data. Specifically, we collect users’ check-in location data from Twitter and analyze the average gyration of check-ins from a region across the timeline of major events associated with the war. Change-point detection is performed on these time-series check-ins to identify the timeline of abrupt changes, which are shown to be consistent with the timing of a series of sanctions and policies. We find that war-related events may contribute secondary impacts (e.g., the surge in gas prices) to users’ travel patterns. The impact of the Russia–Ukraine war on users’ travel patterns can differ based on their scope of activity. Our case study demonstrates that users’ gyration in Warsaw, Paris, and Berlin experienced a decrease of over 50% during periods of gas price surges. These changes in users’ gyration patterns were particularly noticeable in neighboring countries like Poland compared to the other three countries. The findings of this study can assist policymakers, regulators, and urban planners in evaluating the impact of the war and in adapting city planning after the war.
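A gyration measure of the kind analyzed here can be computed from a user's check-in coordinates as the root-mean-square distance to their centroid; the coordinates below are invented, and the flat-Earth approximation is an assumption adequate only at city scale:

    import numpy as np

    def radius_of_gyration(latlon):
        """Root-mean-square distance (km) of check-ins from their centroid,
        using an equirectangular approximation for small spatial extents."""
        lat, lon = np.radians(latlon[:, 0]), np.radians(latlon[:, 1])
        lat0, lon0 = lat.mean(), lon.mean()
        R = 6371.0  # Earth radius in km
        dx = R * np.cos(lat0) * (lon - lon0)
        dy = R * (lat - lat0)
        return float(np.sqrt(np.mean(dx**2 + dy**2)))

    # Hypothetical check-ins of one user around Warsaw.
    pts = np.array([[52.23, 21.01], [52.25, 21.05], [52.20, 20.98], [52.40, 21.20]])
    print(radius_of_gyration(pts))

Averaging this quantity per region and per time window yields the time series on which change-point detection can then be applied.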
]]>Information doi: 10.3390/info15020075
Authors: Jinjia Zhou Jian Yang
Compressive Sensing (CS) has emerged as a transformative technique in image compression, offering innovative solutions to challenges in efficient signal representation and acquisition. This paper provides a comprehensive exploration of the key components within the domain of CS applied to image and video compression. We delve into the fundamental principles of CS, highlighting its ability to efficiently capture and represent sparse signals. The sampling strategies employed in image compression applications are examined, emphasizing the role of CS in optimizing the acquisition of visual data. The measurement coding techniques leveraging the sparsity of signals are discussed, showcasing their impact on reducing data redundancy and storage requirements. Reconstruction algorithms play a pivotal role in CS, and this article reviews state-of-the-art methods, ensuring a high-fidelity reconstruction of visual information. Additionally, we explore the intricate optimization between the CS encoder and decoder, shedding light on advancements that enhance the efficiency and performance of compression techniques in different scenarios. Through a comprehensive analysis of these components, this review aims to provide a holistic understanding of the applications, challenges, and potential optimizations in employing CS for image and video compression tasks.
]]>Information doi: 10.3390/info15020074
Authors: Miloš Bogdanović Jelena Kocić Leonid Stoimenov
Language is a unique ability of human beings. Although relatively simple for humans, the ability to understand human language is a highly complex task for machines. For a machine to learn a particular language, it must understand not only the words and rules used in that language, but also the context of sentences and the meaning that words take on in a particular context. In the experimental development presented in this paper, the goal was the development of the language model SRBerta—a language model designed to understand the formal language of Serbian legal documents. SRBerta is the first of its kind, since it has been trained using Cyrillic legal texts contained within a dataset created specifically for this purpose. The main goal of SRBerta network development was to model the formal language of Serbian legislation. The training process was carried out using minimal resources (a single NVIDIA Quadro RTX 5000 GPU) and performed in two phases—base model training and fine-tuning. We present the structure of the model, the structure of the training datasets, the training process, and the evaluation results. Further, we explain the accuracy metric used in our case and demonstrate that SRBerta achieves a high level of accuracy for the task of masked language modeling in Serbian Cyrillic legal texts. Finally, the SRBerta model and training datasets are publicly available for scientific and commercial purposes.
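Masked-token evaluation of this kind can be sketched with the Hugging Face API; the checkpoint name, example sentence, and top-k scoring rule below are placeholders rather than the paper's released model and metric:

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    name = "roberta-base"  # assumed checkpoint; the published SRBerta identifier may differ
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    text = f"The court {tok.mask_token} the appeal."  # legal-style sentence with one masked token
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero(as_tuple=True)[1]
    top5 = logits[0, mask_pos].topk(5, dim=-1).indices[0]
    # One common accuracy convention: count a prediction as correct if the true token is in the top-k.
    print(tok.convert_ids_to_tokens(top5.tolist()))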
]]>Information doi: 10.3390/info15020073
Authors: Aaradh Nepal Francesco Perono Cacciafoco
During the Bronze Age, the inhabitants of regions of Crete, mainland Greece, and Cyprus inscribed their languages using, among other scripts, a writing system called Linear A. These symbols, mainly characterized by combinations of lines, have, since their discovery, remained a mystery. Not only is the corpus very small, but it is challenging to link Minoan, the language behind Linear A, to any known language. Most decipherment attempts involve using the phonetic values of Linear B, a grammatological offspring of Linear A, to ‘read’ Linear A. However, this yields meaningless words. Recently, novel approaches to deciphering the script have emerged which involve a computational component. In this paper, two such approaches are combined to account for the biases involved in provisionally assigning Linear B phonetic values to Linear A and to shed more light on the possible connections of Linear A with other scripts and languages from the region. Additionally, the limitations inherent in such approaches are discussed. Firstly, a feature-based similarity measure is used to compare Linear A with the Carian Alphabet and the Cypriot Syllabary. A few Linear A symbols are matched with symbols from the Carian Alphabet and the Cypriot Syllabary. Finally, using the derived phonetic values, Linear A is compared with Ancient Egyptian, Luwian, Hittite, Proto-Celtic, and Uralic using a consonantal approach. Some possible word matches are identified from each language.
]]>Information doi: 10.3390/info15020072
Authors: Filippo Orazi Simone Gasperini Stefano Lodi Claudio Sartori
Quantum computing has rapidly gained prominence for its unprecedented computational efficiency in solving specific problems when compared to classical computing counterparts. This surge in attention is particularly pronounced in the realm of quantum machine learning (QML) following a classical trend. Here we start with a comprehensive overview of the current state-of-the-art in Quantum Support Vector Machines (QSVMs). Subsequently, we analyze the limitations inherent in both annealing and gate-based techniques. To address these identified weaknesses, we propose a novel hybrid methodology that integrates aspects of both techniques, thereby mitigating several individual drawbacks while keeping the advantages. We provide a detailed presentation of the two components of our hybrid models, accompanied by the presentation of experimental results that corroborate the efficacy of the proposed architecture. These results pave the way for a more integrated paradigm in quantum machine learning and quantum computing at large, transcending traditional compartmentalization.
]]>Information doi: 10.3390/info15020071
Authors: Ivan Volaric Victor Sucic
One of the frequently used classes of sparse reconstruction algorithms is based on the iterative shrinkage/thresholding procedure, in which the thresholding parameter controls a trade-off between the algorithm’s accuracy and execution time. In order to avoid this trade-off, we propose using a fast intersection of confidence intervals method to adaptively control the threshold value throughout the iterations of the reconstruction algorithm. We have upgraded the two-step iterative shrinkage thresholding algorithm with such a procedure, improving its performance. The proposed algorithm, denoted as the FICI-TwIST, along with a few selected state-of-the-art sparse reconstruction algorithms, has been tested on the classical problem of image recovery by emphasizing the image sparsity in the discrete cosine and the discrete wavelet domains. Furthermore, we have derived a single wavelet transformation matrix which avoids wrapping effects, thereby achieving significantly faster execution times as compared to a more traditional function-based transformation. The obtained results indicate the competitive performance of the proposed algorithm, even in cases where all algorithm parameters have been individually fine-tuned for best performance.
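The trade-off controlled by the thresholding parameter can be seen in a plain ISTA implementation (a generic sketch of the shrinkage/thresholding idea, not the FICI-TwIST algorithm itself): a fixed lam must be tuned by hand, which is exactly what the adaptive intersection-of-confidence-intervals rule is designed to avoid.

    import numpy as np

    def soft_threshold(x, lam):
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    def ista(A, y, lam, n_iter=200):
        """Basic iterative shrinkage/thresholding for min 0.5*||Ax - y||^2 + lam*||x||_1."""
        L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 200))
    x_true = np.zeros(200)
    x_true[rng.choice(200, 8, replace=False)] = rng.standard_normal(8)
    y = A @ x_true
    print(np.linalg.norm(ista(A, y, lam=0.05) - x_true))  # reconstruction error for this choice of lam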
]]>Information doi: 10.3390/info15020070
Authors: Liu Yang Gang Wang Hongjun Wang
Aligned with global Sustainable Development Goals (SDGs) and multidisciplinary approaches integrating AI with sustainability, this research introduces an innovative AI framework for analyzing Modern French Poetry. It applies feature extraction techniques (TF-IDF and Doc2Vec) and machine learning algorithms (especially SVM) to create a model that objectively classifies poems by their stylistic and thematic attributes, transcending traditional subjective analyses. This work demonstrates AI’s potential in literary analysis and cultural exchange, highlighting the model’s capacity to facilitate cross-cultural understanding and enhance poetry education. The efficiency of the AI model, compared to traditional methods, shows promise in optimizing resources and reducing the environmental impact of education. Future research will refine the model’s technical aspects, ensuring effectiveness, equity, and personalization in education. Expanding the model’s scope to various poetic styles and genres will enhance its accuracy and generalizability. Additionally, efforts will focus on equitable implementation of the AI tool to support access to quality education. This research offers insights into AI’s role in advancing poetry education and contributing to sustainability goals. By overcoming the outlined limitations and integrating the model into educational platforms, it sets a path for impactful developments in computational poetry and educational technology.
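A minimal version of such a feature-extraction-plus-SVM pipeline might look as follows; the poem fragments, labels, and vector size are invented placeholders, not the study's corpus or configuration:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.svm import SVC

    # Toy poem excerpts with stylistic labels standing in for the real corpus.
    poems = [("dans le silence des gares un soir descend", "symbolist"),
             ("la ville electrique chante ses moteurs", "modernist"),
             ("les lilas pleurent sous la pluie ancienne", "symbolist"),
             ("acier beton vitesse des boulevards neufs", "modernist")]

    docs = [TaggedDocument(words=text.split(), tags=[i]) for i, (text, _) in enumerate(poems)]
    d2v = Doc2Vec(vector_size=50, min_count=1, epochs=60)
    d2v.build_vocab(docs)
    d2v.train(docs, total_examples=d2v.corpus_count, epochs=d2v.epochs)

    X = [d2v.dv[i] for i in range(len(poems))]          # one embedding per poem
    y = [label for _, label in poems]
    clf = SVC(kernel="linear").fit(X, y)
    print(clf.predict([d2v.infer_vector("un soir de pluie sur la gare".split())]))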
]]>Information doi: 10.3390/info15020069
Authors: Valerii Kozlovskyi Ivan Shvets Yurii Lysetskyi Mikolaj Karpinski Aigul Shaikhanova Gulmira Shangytbayeva
The classification of the natural and anthropogenic destabilizing factors of a telecommunications network as a complex system is presented herein. This research shows that to evaluate the parameters of a telecommunications network in the presence of destabilizing factors, it is necessary to modify classical linear methods to reduce their sensitivity to the incompleteness of a priori information. Using generalized linear models of multiple regression, a combined method was developed for assessing and predicting the survivability of a telecommunications network under conditions of uncertainty regarding the influence of destabilizing factors. The method consists of accumulating current information about the parameters and state of the network, the statistical analysis and processing of information, and the extraction of sufficient sample statistics. The basis of the developed method was balancing multiple correlation–regression analysis with the number of regression equations and the observed results. Various methods of estimating the mathematical expectation and correlation matrix of the observed results under conditions of random loss of part of the observed data (for example, removal of incomplete sample elements, mean substitution, pairwise deletion, and regression substitution) were analyzed. It was established that the obtained estimates become biased under conditions of a priori uncertainty about the statistics of the observed data. Given these circumstances, recommendations are given for the correct removal of sample elements and variables with missing values. It is shown that with significant unsteadiness of the parameters and state of the network under study and a noticeable imbalance between the number of regression equations and observed results, it is advisable to use stepwise regression methods.
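The contrast between these missing-data treatments can be illustrated on synthetic network measurements; the variables, loss rate, and distributions below are assumptions for illustration, not data or methods from the study:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    data = pd.DataFrame(rng.standard_normal((200, 3)), columns=["delay", "loss", "load"])
    data = data.mask(rng.random(data.shape) < 0.15)   # randomly delete 15% of observations

    # Listwise deletion: drop every row that contains a missing value.
    complete = data.dropna()
    mean_listwise, cov_listwise = complete.mean(), complete.cov()

    # Mean substitution: fill gaps with the column average before estimating.
    filled = data.fillna(data.mean())
    mean_fill, cov_fill = filled.mean(), filled.cov()

    # Pairwise estimation: pandas uses all available pairs for each covariance entry.
    cov_pairwise = data.cov()

    print(cov_listwise, cov_fill, cov_pairwise, sep="\n\n")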
]]>