Methods and Applications of Data Management and Analytics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 September 2024 | Viewed by 50953

Special Issue Editors


Guest Editor
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Interests: spatial-temporal data management; graph data analysis; big data analytics; stream processing and uncertain data management

Guest Editor
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Interests: graph data processing; distributed data processing; database systems

Special Issue Information

Dear Colleagues,

Data management and analysis have recently attracted widespread attention from academia and industry due to emerging technologies powered by, and contributing to, exponential data growth. The large volume, high velocity, and wide variety of data not only pose new challenges in efficiently managing and analyzing the data, but also bring opportunities to explore the potential value of data. Therefore, this Special Issue intends to present new methods and applications in the field of data management and analysis.

We invite the submission of original research contributions in areas including, but not limited to, big data processing, data mining, data engineering, data stream systems, data security and privacy, data quality management, database systems, database theory, semi-structured data management, graph data processing, spatial and temporal data processing, uncertain and probabilistic data management, AI for database systems, and novel applications in data science.

Prof. Dr. Wenjie Zhang
Dr. Zhengyi Yang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data management and analysis
  • big data processing
  • AI for database systems
  • data mining and models
  • data stream systems
  • data security and privacy
  • graph data processing
  • spatial and temporal data processing
  • uncertain and probabilistic data management
  • data quality management

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)


Research


30 pages, 1597 KiB  
Article
Efficiency Boosts in Human Mobility Data Privacy Risk Assessment: Advancements within the PRUDEnce Framework
by Fernanda O. Gomes, Roberto Pellungrini, Anna Monreale, Chiara Renso and Jean E. Martina
Appl. Sci. 2024, 14(17), 8014; https://doi.org/10.3390/app14178014 - 7 Sep 2024
Viewed by 354
Abstract
With the exponential growth of mobility data generated by IoT, social networks, and mobile devices, there is a pressing need to address privacy concerns. Our work proposes methods to reduce the computation of privacy risk evaluation on mobility datasets, focusing on reducing background knowledge configurations and matching functions, and enhancing code performance. Leveraging the unique characteristics of trajectory data, we aim to minimize the size of combination sets and directly evaluate risk for trajectories with distinct values. Additionally, we optimize efficiency by storing essential information in memory to eliminate unnecessary computations. These approaches offer a more efficient and effective means of identifying and addressing privacy risks associated with diverse mobility datasets.
(This article belongs to the Special Issue Methods and Applications of Data Management and Analytics)

21 pages, 2375 KiB  
Article
Mutation-Based Multivariate Time-Series Anomaly Generation on Latent Space with an Attention-Based Variational Recurrent Neural Network for Robust Anomaly Detection in an Industrial Control System
by Seungho Jeon, Kijong Koo, Daesung Moon and Jung Taek Seo
Appl. Sci. 2024, 14(17), 7714; https://doi.org/10.3390/app14177714 - 1 Sep 2024
Viewed by 842
Abstract
Anomaly detection involves identifying data that deviates from normal patterns. Two primary strategies are used: one-class classification and binary classification. In Industrial Control Systems (ICS), where anomalies can cause significant damage, timely and accurate detection is essential, often requiring analysis of time-series data. One-class classification is commonly used but tends to have a high false alarm rate. To address this, binary classification is explored, which can better differentiate between normal and anomalous data, though it struggles with class imbalance in ICS datasets. This paper proposes a mutation-based technique for generating ICS time-series anomalies. The method maps ICS time-series data into a latent space using a variational recurrent autoencoder, applies mutation operations, and reconstructs the time-series, introducing plausible anomalies that reflect multivariate correlations. Evaluations of ICS datasets show that these synthetic anomalies are visually and statistically credible. Training a binary classifier on data augmented with these anomalies effectively mitigates the class imbalance problem.

19 pages, 1199 KiB  
Article
Product Demand Prediction with Spatial Graph Neural Networks
by Jiale Li, Li Fan, Xuran Wang, Tiejiang Sun and Mengjie Zhou
Appl. Sci. 2024, 14(16), 6989; https://doi.org/10.3390/app14166989 - 9 Aug 2024
Viewed by 874
Abstract
In the rapidly evolving online marketplace, accurately predicting the demand for pre-owned items presents a significant challenge for sellers, impacting pricing strategies, product presentation, and marketing investments. Traditional demand prediction methods, while foundational, often fall short in addressing the dynamic and heterogeneous nature of e-commerce data, which encompasses textual descriptions, visual elements, geographic contexts, and temporal dynamics. This paper introduces a novel approach, named SGNN, that uses a Graph Neural Network (GNN) to enhance demand prediction accuracy by leveraging the spatial relationships inherent in online sales data. Drawing from the rich dataset provided in the fourth Kaggle competition, we construct a spatially aware graph representation of the marketplace, integrating advanced attention mechanisms to refine predictive accuracy. Our methodology defines the product demand prediction problem as a regression task on an attributed graph, capturing both local and global spatial dependencies that are fundamental to accurate prediction. Through attention-aware message propagation and node-level demand prediction, our model effectively addresses the multifaceted challenges of e-commerce demand prediction, demonstrating superior performance over traditional statistical methods, machine learning techniques, and even deep learning models. The experimental findings validate the effectiveness of our GNN-based approach, offering actionable insights for sellers navigating the complexities of the online marketplace. This research not only contributes to the academic discourse on e-commerce demand prediction but also provides a scalable and adaptable framework for future applications, paving the way for more informed and effective online sales strategies.

21 pages, 3153 KiB  
Article
Data Analytics for Optimizing and Predicting Employee Performance
by Laura Gabriela Tanasescu, Andreea Vines, Ana Ramona Bologa and Oana Vîrgolici
Appl. Sci. 2024, 14(8), 3254; https://doi.org/10.3390/app14083254 - 12 Apr 2024
Cited by 1 | Viewed by 2200
Abstract
The need to increase employee performance and productivity has become vital in most companies nowadays, considering the number of changes that processes and people have faced during recent years in many organizations. This becomes even more important as it can sustain the growth of the company, as well as its competitiveness. This work presents multiple methods, and comparisons between them, for the process of building a machine learning algorithm to predict performance scores for employees in one organization; these methods include pre-processing the data, selecting the best variables, building the best algorithms for the available data, and tuning their hyperparameters. The current research aims to arrive at a collection of practices that will determine the best predictions for the given variables, so that human opinion can become less influential in employee appraisal, increasing objectivity and overall productivity.

18 pages, 14737 KiB  
Article
EEG Emotion Classification Based on Graph Convolutional Network
by Zhiqiang Fan, Fangyue Chen, Xiaokai Xia and Yu Liu
Appl. Sci. 2024, 14(2), 726; https://doi.org/10.3390/app14020726 - 15 Jan 2024
Cited by 1 | Viewed by 1730
Abstract
EEG-based emotion recognition is a task that uses scalp-EEG data to classify the emotion states of humans. The study of EEG-based emotion recognition can contribute to a large spectrum of application fields including healthcare and human–computer interaction. Recent studies in neuroscience reveal that the brain regions and their interactions play an essential role in the processing of different stimuli and the generation of corresponding emotional states. Nevertheless, such regional interactions, which have been proven to be critical in recognizing emotions in neuroscience, are largely overlooked in existing machine learning or deep learning models, which focus on individual channels in brain signals. Motivated by this, in this paper, we present RGNet, a model that is designed to learn the regional-level representation of EEG signals for accurate emotion recognition. Specifically, after applying preprocessing and feature extraction techniques on raw signals, RGNet adopts a novel region-wise encoder to extract the features of channels located within each region as input to compute the regional-level features, enabling the model to effectively explore the regional functionality. A graph is then constructed by considering each region as a node and connections between regions as edges, upon which a graph convolutional network is designed with spectral filtering and a learned adjacency matrix. Instead of focusing on only the spatial proximity, it allows the model to capture more complex functional relationships. We conducted experiments from the perspective of region division strategies, region encoders and input feature types. Our model has achieved 98.64% and 99.33% for the DEAP and DREAMER datasets, respectively. The comparison studies show that RGNet outperforms the majority of the existing models for emotion recognition from EEG signals.

22 pages, 3643 KiB  
Article
A Study of Reciprocal Job Recommendation for College Graduates Integrating Semantic Keyword Matching and Social Networking
by Jinping Yao, Yunhong Xu and Jiaojiao Gao
Appl. Sci. 2023, 13(22), 12305; https://doi.org/10.3390/app132212305 - 14 Nov 2023
Viewed by 1346
Abstract
With the surge in college graduate numbers, a disparity has emerged where the supply of jobs falls short of demand, intensifying employment pressures annually. College graduates, due to their lack of historical employment data compared with job seekers in the broader society, encounter a ‘cold start’ issue in the job recommendation process. Additionally, the nature of job recommendations, which differs fundamentally from unilateral recommendations, requires consideration of reciprocity between both parties involved. This article introduces a new approach to job recommendations using college graduates as the object of study. In the screening stage, a semantic keyword iterative algorithm is applied to compute the similarity between the resume and recruitment texts. This algorithm enhances the intersectionality of keywords in the calculation process, maximizing the utilization of resume information to enhance the accuracy of text similarity calculations. The ranking phase utilizes in-school data to build a social network between college graduates and graduated students and solves the system’s cold-start problem using the social network to recommend jobs for college graduates where graduated students are employed. We introduce a dual-dimensional matching approach that incorporates both specialty and salary, building upon the amalgamated semantic keyword iterative algorithm and the social network job recommendation method, to enhance the reciprocity of job recommendations. The job recommendation method introduced herein outperforms other methods in terms of the average satisfaction rate (AR) and normalized discounted cumulative gain (NDCG), thereby confirming its superior ability to meet the job-seeking preferences of graduates and the recruitment criteria of employers. This job recommendation method offers effective assistance to graduates lacking employment experience and historical employment data, facilitating their search for more suitable job opportunities.

30 pages, 15684 KiB  
Article
Machine Learning-Based Label Quality Assurance for Object Detection Projects in Requirements Engineering
by Neven Pičuljan and Željka Car
Appl. Sci. 2023, 13(10), 6234; https://doi.org/10.3390/app13106234 - 19 May 2023
Cited by 1 | Viewed by 2811
Abstract
In recent years, the field of artificial intelligence has experienced significant growth, which has been primarily attributed to advancements in hardware and the efficient training of deep neural networks on graphics processing units. The development of high-quality artificial intelligence solutions necessitates a strong emphasis on data-centric approaches that involve the collection, labeling, and quality assurance of data and labels. These processes, however, are labor-intensive and often demand extensive human effort. Simultaneously, there exists an abundance of untapped data that could potentially be utilized to train models capable of addressing complex problems. These raw data, nevertheless, require refinement to become suitable for machine learning training. This study concentrates on the computer vision subdomain within artificial intelligence and explores data requirements within the context of requirements engineering. Among the various data requirement activities, label quality assurance is crucial. To address this problem, we propose a machine learning-based method for automatic label quality assurance, especially in the context of object detection use cases. Our approach aims to support both annotators and computer vision project stakeholders while reducing the time and resources needed to conduct label quality assurance activities. In our experiments, we trained a neural network on a small set of labeled data and achieved an accuracy of 82% in differentiating good and bad labels on a large set of labeled data. This demonstrates the potential of our approach in automating label quality assurance.

24 pages, 1814 KiB  
Article
Implicit Bias of Deep Learning in the Large Learning Rate Phase: A Data Separability Perspective
by Chunrui Liu, Wei Huang and Richard Yi Da Xu
Appl. Sci. 2023, 13(6), 3961; https://doi.org/10.3390/app13063961 - 20 Mar 2023
Cited by 2 | Viewed by 1930
Abstract
Previous literature on deep learning theory has focused on implicit bias with small learning rates. In this work, we explore the impact of data separability on the implicit bias of deep learning algorithms under the large learning rate. Using deep linear networks for binary classification with the logistic loss under the large learning rate regime, we characterize the implicit bias effect with data separability on training dynamics. From a data analytics perspective, we claim that depending on the separation conditions of data, the gradient descent iterates will converge to a flatter minimum in the large learning rate phase, which results in improved generalization. Our theory is rigorously proven under the assumption of degenerate data by overcoming the difficulty of the non-constant Hessian of logistic loss and confirmed by experiments on both experimental and non-degenerated datasets. Our results highlight the importance of data separability in training dynamics and the benefits of learning rate annealing schemes using an initial large learning rate.

Review


36 pages, 8665 KiB  
Review
Machine Learning Methods in Weather and Climate Applications: A Survey
by Liuyi Chen, Bocheng Han, Xuesong Wang, Jiazhen Zhao, Wenke Yang and Zhengyi Yang
Appl. Sci. 2023, 13(21), 12019; https://doi.org/10.3390/app132112019 - 3 Nov 2023
Cited by 17 | Viewed by 17052
Abstract
With the rapid development of artificial intelligence, machine learning is gradually becoming popular for predictions in all walks of life. In meteorology, it is gradually competing with traditional climate predictions dominated by physical models. This survey aims to consolidate the current understanding of Machine Learning (ML) applications in weather and climate prediction, a field of growing importance across multiple sectors, including agriculture and disaster management. Building upon an exhaustive review of more than 20 methods highlighted in existing literature, this survey pinpoints eight techniques that show particular promise for improving the accuracy of both short-term weather and medium-to-long-term climate forecasts. According to the survey, while ML demonstrates significant capabilities in short-term weather prediction, its application in medium-to-long-term climate forecasting remains limited, constrained by factors such as intricate climate variables and data limitations. Current literature tends to focus narrowly on either short-term weather or medium-to-long-term climate forecasting, often neglecting the relationship between the two, as well as modeling structure and recent advances. By providing an integrated analysis of models spanning different time scales, this survey aims to bridge these gaps, thereby serving as a meaningful guide for future interdisciplinary research in this rapidly evolving field.

33 pages, 1628 KiB  
Review
AI Fairness in Data Management and Analytics: A Review on Challenges, Methodologies and Applications
by Pu Chen, Linna Wu and Lei Wang
Appl. Sci. 2023, 13(18), 10258; https://doi.org/10.3390/app131810258 - 13 Sep 2023
Cited by 9 | Viewed by 19963
Abstract
This article provides a comprehensive overview of the fairness issues in artificial intelligence (AI) systems, delving into their background, definition, and development process. The article explores the fairness problem in AI through practical applications and current advances and focuses on bias analysis and fairness training as key research directions. The paper explains in detail the concept, implementation, characteristics, and use cases of each method. The paper explores strategies to reduce bias and improve fairness in AI systems, reviews challenges and solutions to real-world AI fairness applications, and proposes future research directions. In addition, this study provides an in-depth comparative analysis of the various approaches, utilizing cutting-edge research information to elucidate their different characteristics, strengths, and weaknesses. The results of the comparison provide guidance for future research. The paper concludes with an overview of existing challenges in practical applications and suggests priorities and solutions for future research. The conclusions provide insights for promoting fairness in AI systems. The information reviewed in this paper is drawn from reputable sources, including leading academic journals, prominent conference proceedings, and well-established online repositories dedicated to AI fairness. However, it is important to recognize that research nuances, sample sizes, and contextual factors may create limitations that affect the generalizability of the findings.
