Data Analysis and Data Mining for Knowledge Discovery

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 July 2025 | Viewed by 9332

Special Issue Editors


Guest Editor
Department of Electronics Engineering and Telecommunications, State University of Rio de Janeiro, Rio de Janeiro 205513, Brazil
Interests: intelligent systems; machine learning; embedded systems; swarm robotics

Guest Editor
Department of Systems Engineering and Computation, State University of Rio de Janeiro, Rio de Janeiro 205513, Brazil
Interests: intelligent systems; machine learning; network-on-chips

Special Issue Information

Dear Colleagues,

As data continue to be a vital asset across industries, the ability to analyze and derive actionable insights from vast amounts of data is paramount. This Special Issue will cover a broad range of topics, including advanced data mining algorithms, big data analytics, machine learning, data visualization, text and web mining, real-world applications, and data privacy and security. This Special Issue seeks to gather high-quality, original research articles and reviews that contribute to the development of novel data analysis and data mining approaches, addressing current challenges and highlighting future directions.

The scope of the Special Issue includes, but is not limited to, the following:

  • Advanced data mining algorithms: exploration of new algorithms and enhancements to existing methods for effective data mining.
  • Big data analytics: techniques and tools for handling and analyzing large-scale data sets, including distributed computing and cloud-based solutions.
  • Machine learning and artificial intelligence: integration of machine learning and AI in data analysis to improve predictive accuracy and decision making.
  • Data visualization: innovative methods for visualizing complex data to enhance interpretability and insights.
  • Text and web mining: approaches for extracting valuable information from unstructured text and web data.
  • Applications of data mining: case studies and practical applications of data mining in various fields such as healthcare, finance, marketing, cybersecurity, and social sciences.
  • Data privacy and security: strategies for ensuring data privacy and security in data mining processes.
  • Emerging technologies: impact of emerging technologies such as blockchain, IoT, and edge computing on data analysis and mining.

Prof. Dr. Nadia Nedjah
Prof. Dr. Luiza de Macedo Mourelle
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • big data analytics
  • machine learning
  • artificial intelligence
  • data visualization
  • text mining
  • data privacy
  • data security
  • emerging technologies
  • predictive analytics
  • cloud computing
  • internet of things
  • edge computing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (6 papers)


Research

22 pages, 4471 KiB  
Article
Optimized Navigation for SLAM Using Marker-Assisted Region Scanning, Path Finding, and Mapping Completion Control
by Luigi Maciel Ribeiro, Nadia Nedjah and Paulo Victor Rodrigues de Carvalho
Appl. Sci. 2025, 15(7), 3433; https://doi.org/10.3390/app15073433 - 21 Mar 2025
Viewed by 200
Abstract
This paper introduces Marker-Assisted Region Scanning for Simultaneous Localization and Mapping (MARS-SLAM), a novel approach to optimizing SLAM in unknown environments. Designed to enhance autonomous exploration in extreme conditions, MARS-SLAM ensures efficient navigation while providing a systematic method for verifying mapping completion. The approach leverages virtual markers to track unexplored regions, guiding the robot through an organized and comprehensive exploration process. Markers are placed at the LiDAR sensor’s range limit in free areas, maintaining a dynamic list of regions yet to be visited. Mapping is considered complete when no markers remain, signifying full coverage of the environment. Target marker selection is based on age (creation order) and distance (path length from the robot). The method was validated in three virtual environments of varying complexity, demonstrating superior performance compared to alternative navigation strategies, including predefined zigzag routes and routes generated by Ant Colony Optimization (ACO). Experimental results show that MARS-SLAM achieves complete and accurate mapping while significantly reducing the number of poses required. Specifically, it achieves a 64.39% reduction in poses compared to ACO and 71.07% compared to zigzag navigation, highlighting its efficiency in complex environments.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)

29 pages, 4378 KiB  
Article
Analysis of Sparse Trajectory Features Based on Mobile Device Location for User Group Classification Using Gaussian Mixture Model
by Yohei Kakimoto, Yuto Omae and Hirotaka Takahashi
Appl. Sci. 2025, 15(2), 982; https://doi.org/10.3390/app15020982 - 20 Jan 2025
Viewed by 892
Abstract
Location data collected from mobile devices via global positioning system often lack semantic information and can form sparse trajectories in space and time. This study investigates whether user age groups can be accurately classified solely from such sparse spatial–temporal trajectories. We propose a feature extraction method based on a Gaussian mixture model (GMM), which assigns representative points (RPs) by clustering the location data and aggregating user trajectories into these RPs. We then construct three machine learning (ML) models—support vector classifier (SVC), random forest (RF), and deep neural network (DNN)—using the GMM-based features and compare their performance with that of the improved DNN (IDNN), which is an existing feature extraction approach. In our experiments, we introduced a missing value ratio θ_th to quantify trajectory sparsity and analyzed the effect of trajectory sparsity on the classification accuracy and generalizability performance of the ML models. The results indicate that GMM-based features outperform IDNN-based features in both classification accuracy and generalization performance. Notably, the RF model achieved the highest accuracy, whereas the SVC model displayed stable generalizability. As the missing value ratio θ_th increases, the IDNN becomes more susceptible to overfitting, whereas the GMM-based approach preserves accuracy and robustness. These findings suggest that sparse trajectories can still offer meaningful classification performance with appropriate feature design and model selection even without semantic information. This approach holds promise for domains where large-scale, sparse trajectory data are common, including urban planning, marketing analysis, and public policy.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)

26 pages, 3035 KiB  
Article
Leveraging Machine Learning for Sophisticated Rental Value Predictions: A Case Study from Munich, Germany
by Wenjun Chen, Saber Farag, Usman Butt and Haider Al-Khateeb
Appl. Sci. 2024, 14(20), 9528; https://doi.org/10.3390/app14209528 - 18 Oct 2024
Cited by 1 | Viewed by 2892
Abstract
There has been very limited research conducted to predict rental prices in the German real estate market using an AI-based approach. From a general perspective, conventional approaches struggle to handle large amounts of data and fail to consider the numerous elements that affect rental prices. The absence of sophisticated, data-driven analytical tools further complicates this situation, impeding stakeholders, such as tenants, landlords, real estate agents, and the government, from obtaining the accurate insights necessary for making well-informed decisions in this area. This paper applies novel machine learning (ML) approaches, including ensemble techniques, neural networks, linear regression (LR), and tree-based algorithms, specifically designed for forecasting rental prices in Munich. To ensure accuracy and reliability, the performance of these models is evaluated using the R² score and root mean squared error (RMSE). The study provides two feature sets for model comparison, selected by particle swarm optimisation (PSO) and CatBoost. These two feature selection methods identify significant variables based on different mechanisms, such as seeking the optimal solution with an objective function and converting categorical features into target statistics (TSs) to address high-dimensional issues. These methods are ideal for this German dataset, which contains 49 features. Testing the performance of 10 ML algorithms on two sets helps validate the robustness and efficacy of the AI-based approach utilising the PyTorch framework. The findings illustrate that ML models combined with PyTorch-based neural networks (PNNs) demonstrate high accuracy compared to standalone ML models, regardless of feature changes. The improved performance indicates that utilising the PyTorch framework for predictive tasks is advantageous, as evidenced by a statistical significance test in terms of both R² and RMSE (p-values < 0.001).
The integration results display outstanding accuracy, averaging 90% across both feature sets. Particularly, the XGB model, which exhibited the lowest performance among all models in both sets, significantly improved from 0.8903 to 0.9097 in set 1 and from 0.8717 to 0.9022 in set 2 after being combined with the PNN. These results showcase the efficacy of using the PyTorch framework, enhancing the precision and reliability of the ML models in predicting the dynamic real estate market. Given that this study applies two feature sets and demonstrates consistent performance across sets with varying characteristics, the methodology may be applied to other locations. By offering accurate projections, it aids investors, renters, property managers, and regulators in facilitating better decision-making in the real estate sector.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)
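
As a reading aid for the entry above, the two evaluation metrics its abstract names, the R² score and the root mean squared error (RMSE), can be sketched in plain Python. This is only an illustration of the standard definitions; the function names and the toy rent values are assumptions made here and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly rents, for illustration only (not data from the study).
observed = [1000.0, 1500.0, 2000.0, 2500.0]
predicted = [1100.0, 1400.0, 2100.0, 2400.0]

print(rmse(observed, predicted))      # error in the same unit as the rents
print(r2_score(observed, predicted))  # share of variance explained
```

RMSE is expressed in the unit of the target variable, while R² is unitless, which is one reason the two are often reported together.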

22 pages, 9854 KiB  
Article
Leveraging LLMs for Efficient Topic Reviews
by Bady Gana, Andrés Leiva-Araos, Héctor Allende-Cid and José García
Appl. Sci. 2024, 14(17), 7675; https://doi.org/10.3390/app14177675 - 30 Aug 2024
Cited by 3 | Viewed by 2746
Abstract
This paper presents the topic review (TR), a novel semi-automatic framework designed to enhance the efficiency and accuracy of literature reviews. By leveraging the capabilities of large language models (LLMs), TR addresses the inefficiencies and error-proneness of traditional review methods, especially in rapidly evolving fields. The framework significantly improves literature review processes by integrating advanced text mining and machine learning techniques. Through a case study approach, TR offers a step-by-step methodology that begins with query generation and refinement, followed by semi-automated text mining to identify relevant articles. LLMs are then employed to extract and categorize key themes and concepts, facilitating an in-depth literature analysis. This approach demonstrates the transformative potential of natural language processing in literature reviews. With an average similarity of 69.56% between generated and indexed keywords, TR effectively manages the growing volume of scientific publications, providing researchers with robust strategies for complex text synthesis and advancing knowledge in various domains. An expert analysis highlights a positive Fleiss’ Kappa score, underscoring the significance and interpretability of the results.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)
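
For readers unfamiliar with the agreement statistic mentioned in the entry above, the following is a minimal sketch of the standard Fleiss' kappa formula in plain Python. The function name, the input layout, and the toy ratings are assumptions made for illustration; they do not reproduce the paper's expert analysis.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item must have the same number
    of raters, and at least two categories must actually be used."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Share of all assignments that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement for each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: 4 items, 3 raters, 2 categories.
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]

print(fleiss_kappa(unanimous))  # complete agreement
print(fleiss_kappa(split))      # agreement below chance level
```

A positive value indicates agreement above what chance alone would produce; values at or below zero indicate no agreement beyond chance.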

Other

36 pages, 594 KiB  
Systematic Review
AI-Driven Predictive Maintenance in Mining: A Systematic Literature Review on Fault Detection, Digital Twins, and Intelligent Asset Management
by Luis Rojas, Álvaro Peña and José Garcia
Appl. Sci. 2025, 15(6), 3337; https://doi.org/10.3390/app15063337 - 19 Mar 2025
Cited by 1 | Viewed by 1275
Abstract
The mining industry faces increasing challenges in maintaining high production levels while minimizing unplanned failures and operational costs. Critical assets, such as crushers, conveyor belts, mills, and ventilation systems, operate under extreme conditions, leading to accelerated wear and failure risks. Traditional maintenance strategies often fail to prevent unexpected downtimes, safety hazards, and economic losses. As a response, industries are integrating predictive monitoring technologies, including machine learning, the Internet of Things, and digital twins, to enhance early fault detection and optimize maintenance strategies. This Systematic Literature Review analyzes 166 high-impact studies from Scopus and Web of Science, identifying key trends in fault detection algorithms, hybrid AI models, and real-time monitoring techniques. The findings highlight the increasing adoption of deep learning, reinforcement learning, and digital twins for anomaly detection and process optimization. Additionally, AI-driven methods are improving sensor-based data acquisition and asset management, extending equipment lifecycles while reducing failures. Despite these advancements, challenges such as data standardization, model scalability, and system interoperability persist, requiring further research. Future work should focus on real-time AI applications, explainable models, and academia-industry collaboration to accelerate the implementation of intelligent maintenance solutions, ensuring greater reliability, efficiency, and sustainability in mining operations.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)

13 pages, 3698 KiB  
Tutorial
Detailed Examples of Figure Preparation in the Two Most Common Graph Layouts
by Izolda Gorgol and Hubert Salwa
Appl. Sci. 2025, 15(5), 2645; https://doi.org/10.3390/app15052645 - 1 Mar 2025
Viewed by 501
Abstract
Graphs are an excellent tool with applications in various branches of engineering. Graph layouts have emerged as a cornerstone in the visual representation and analysis of complex systems. They are indispensable in reducing complexity, optimizing designs, improving communication, and enhancing problem-solving capabilities. They transform abstract concepts and data into visual formats that are easier to interpret, analyze, and apply in real-world engineering challenges. Therefore, many graph layouts are designed for various purposes. It is not easy to choose the most appropriate one. There are a number of surveys on this subject, but they are descriptive ones. In this paper, we focus on the two most versatile—and therefore most widely used—layouts, namely Fruchterman–Reingold and ForceAtlas2, and show their possibilities in a visual way. We compare how the drawings appear while using various settings of the available options. This helps to choose an appropriate set of settings in practice.
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)