Data Mining and Machine Learning with Applications, 2nd Edition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: 31 August 2026 | Viewed by 16700

Special Issue Editors


Prof. Dr. Wei Fang
Guest Editor
Jiangsu Engineering Center of Network Monitoring, School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
Interests: data mining; big data analytics; knowledge discovery; cloud computing

Dr. Victor S. Sheng
Guest Editor
Department of Computer, Texas Tech University, Lubbock, TX 79409, USA
Interests: data science; machine learning; computational intelligence

Special Issue Information

Dear Colleagues,

With the emergence of big data and advances in computing services, artificial intelligence (AI) has attracted increasing attention around the world. AI plays an increasingly important role in daily life through applications such as machine learning, pattern recognition, computer vision, data mining, human–machine interfaces, information retrieval, and natural language processing. As a result, a growing number of researchers and engineers are working in the AI field or are about to enter it.

This Special Issue aims to bring together leading scientists working in deep learning and related areas of artificial intelligence, data mining, and machine learning with applications. Papers that use advanced mathematical methods and statistical approaches in these areas are particularly welcome.

Prof. Dr. Wei Fang
Dr. Victor S. Sheng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • data mining
  • time series data analytics
  • statistical machine learning
  • statistical classification
  • statistical inference
  • Bayesian methods
  • algorithms and architectures for big data searches, mining, and processing
  • deep learning
  • computer vision and image processing
  • evolutionary computation
  • knowledge discovery
  • industrial and medical applications
  • security applications
  • applications of unsupervised learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Research

23 pages, 2107 KB  
Article
A Deep-Learning-Based Method for High-Precision Real-Time Detection of Steel Surface Defects
by Guanying Song, Xiwen Wang, Gaoxia Fan, Hanquan Zhang, Guo Li, Zhenni Li, Jingyi Liu and Dong Xiao
Mathematics 2026, 14(4), 621; https://doi.org/10.3390/math14040621 - 10 Feb 2026
Cited by 1 | Viewed by 672
Abstract
Steel defects, stemming from issues like raw material imperfections and processing inconsistencies, present substantial challenges for the material’s effective use and subsequent manufacturing. Consequently, the real-time, accurate, and rapid detection of these defects is paramount in production, playing a vital role in cost reduction, efficiency enhancement, and resource conservation. To address these needs, this paper proposes a deep-learning-based image recognition method for defect detection using YOLOv7 (You Only Look Once), designated YOLOv7-SGS. This approach introduces a novel architecture, the YOLOv7-SGS network, which builds upon the standard YOLOv7. The enhancements include integrating a Shape-IoU model into the core backbone, innovatively incorporating an SGE attention mechanism, and refining the convolution algorithm with GSConv to boost model performance. The resulting YOLOv7-SGS model achieves an absolute 6% improvement in mAP@0.5 compared to the baseline model. Moreover, it attains a detection speed of 32 FPS, showcasing significant advantages and offering valuable insights for future research and practical applications.
(This article belongs to the Special Issue Data Mining and Machine Learning with Applications, 2nd Edition)
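
The mAP@0.5 figure quoted in the abstract above counts a predicted box as a match when its intersection over union (IoU) with a ground-truth box is at least 0.5. As a minimal illustration of that threshold only, and not of the authors' YOLOv7-SGS implementation, the following Python sketch computes plain IoU for two axis-aligned boxes. The function name `iou`, the `(x1, y1, x2, y2)` box layout, and the sample coordinates are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if the boxes overlap at all
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes give an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth boxes (not taken from the paper)
predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
score = iou(predicted, ground_truth)
is_match = score >= 0.5  # would this pair count as a match at the 0.5 threshold?
```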

23 pages, 4868 KB  
Article
Enhancing Predictive Accuracy in Medical Data Through Oversampling and Interpolation Techniques
by Alma Rocío Sagaceta-Mejía, Pedro Pablo González-Pérez, Julián Fresán-Figueroa and Máximo Eduardo Sánchez-Gutiérrez
Mathematics 2025, 13(24), 4032; https://doi.org/10.3390/math13244032 - 18 Dec 2025
Viewed by 701
Abstract
Class imbalance is a major challenge in supervised classification, often leading to biased predictions and limited generalization. This issue is particularly pronounced in medical diagnostics, where datasets typically contain far more negative than positive cases. In this study, we compare two oversampling strategies: the Synthetic Minority Oversampling Technique (SMOTE) and the Conditional Tabular Generative Adversarial Network (ctGAN). Using the benchmark Pima Indians Diabetes dataset, we generated balanced datasets through both methods and trained a multilayer perceptron classifier. Performance was evaluated with accuracy, precision, sensitivity, and F1 Score. The results show that both SMOTE and ctGAN improve classification on imbalanced data, with SMOTE consistently achieving superior sensitivity and F1 Score. These findings highlight the importance of selecting appropriate augmentation strategies to enhance the reliability and clinical usefulness of machine learning models in medical diagnostics.
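
For readers new to SMOTE, its core idea is to create each synthetic minority sample by interpolating between a minority point and one of its nearest minority-class neighbours. The sketch below shows that interpolation step in plain Python. It is a simplified illustration under stated assumptions, not the authors' pipeline: the function name `smote_sketch`, its parameters, and the toy points are hypothetical, and a production workflow would more likely rely on an established library.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


# Hypothetical two-feature minority class (toy values, not from any dataset)
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_points, n_new=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class.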

17 pages, 2226 KB  
Article
Multi-Aspect Sentiment Analysis of Arabic Café Reviews Using Machine and Deep Learning Approaches
by Hmood Al-Dossari and Munerah Altalasi
Mathematics 2025, 13(24), 3895; https://doi.org/10.3390/math13243895 - 5 Dec 2025
Viewed by 658
Abstract
Online reviews on platforms such as Google Maps strongly influence consumer decisions. However, aggregated ratings mask nuanced opinions about specific aspects such as food, drinks, service, lounge, and price. This study presents a multi-aspect sentiment analysis framework for Arabic café reviews. Specifically, we combine machine learning (Linear SVC, Naïve Bayes, Logistic Regression, Decision Tree, Random Forest) and a Convolutional Neural Network (CNN) to perform aspect identification and sentiment classification. A rigorous preprocessing and feature-engineering pipeline with TF-IDF and n-grams was implemented and statistically validated through bootstrap confidence intervals and Friedman–Nemenyi significance tests. Experimental results demonstrate that Linear SVC with optimized TF-IDF tri-grams achieved a macro-F1 of 0.89 for aspect identification and 0.71 for sentiment classification. Meanwhile, the CNN model yielded a comparable F1 of 0.89 for aspect identification and a higher 0.76 for sentiment classification. The findings highlight that effective feature representation and model selection can substantially improve Arabic opinion mining. The proposed framework provides a reliable foundation for analyzing Arabic user feedback on location-based platforms and supports more interpretable and data-driven business insights. These insights are essential to enhance personalized recommendations and business intelligence in the hospitality sector.
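
The macro-F1 scores reported above average the per-class F1 without weighting by class frequency, so rare aspects count as much as common ones. A minimal sketch of that computation follows. The function name `macro_f1` and the toy labels are assumptions for illustration and are not drawn from the study's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical aspect labels for four reviews
true_aspects = ["food", "food", "price", "service"]
pred_aspects = ["food", "price", "price", "service"]
score = macro_f1(true_aspects, pred_aspects)
```

In this toy case the per-class F1 values are 2/3 for "food", 2/3 for "price", and 1 for "service", so the macro average is 7/9.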

24 pages, 4004 KB  
Article
Graph-Attention-Regularized Deep Support Vector Data Description for Semi-Supervised Anomaly Detection: A Case Study in Automotive Quality Control
by Taha J. Alhindi
Mathematics 2025, 13(23), 3876; https://doi.org/10.3390/math13233876 - 3 Dec 2025
Cited by 1 | Viewed by 665
Abstract
This paper addresses semi-supervised anomaly detection in settings where only a small subset of normal data can be labeled. Such conditions arise, for example, in industrial quality control of windshield wiper noise, where expert labeling is costly and limited. Our objective is to learn a one-class decision boundary that leverages the geometry of unlabeled data while remaining robust to contamination and scarcity of labeled normals. We propose a graph-attention-regularized deep support vector data description (GAR-DSVDD) model that combines a deep one-class enclosure with a latent k-nearest-neighbor graph whose edges are weighted by similarity- and score-aware attention. The resulting loss integrates (i) a distance-based enclosure on labeled normals, (ii) a graph smoothness term on squared distances over the attention-weighted graph, and (iii) a center-pull regularizer on unlabeled samples to avoid over-smoothing and boundary drift. Experiments on a controlled simulated dataset and an industrial windshield wiper acoustics dataset show that GAR-DSVDD consistently improves the F1 score under scarce label conditions. On average, F1 increases from 0.78 to 0.84 on the simulated benchmark and from 0.63 to 0.86 on the industrial case study relative to the best competing baseline.

16 pages, 2065 KB  
Article
Comparative Analysis of Machine Learning Models for Predicting Student Success in Online Programming Courses: A Study Based on LMS Data and External Factors
by Felipe Emiliano Arévalo-Cordovilla and Marta Peña
Mathematics 2024, 12(20), 3272; https://doi.org/10.3390/math12203272 - 18 Oct 2024
Cited by 19 | Viewed by 8199
Abstract
Early prediction of student performance in online programming courses is essential for implementing timely interventions to enhance academic outcomes. This study aimed to predict academic success by comparing four machine learning models: Logistic Regression, Random Forest, Support Vector Machine (SVM), and Neural Network (Multilayer Perceptron, MLP). We analyzed data from the Moodle Learning Management System (LMS) and external factors of 591 students enrolled in online object-oriented programming courses at the Universidad Estatal de Milagro (UNEMI) between 2022 and 2023. The data were preprocessed to address class imbalance using the synthetic minority oversampling technique (SMOTE), and relevant features were selected based on Random Forest importance rankings. The models were trained and optimized using Grid Search with cross-validation. Logistic Regression achieved the highest Area Under the Receiver Operating Characteristic Curve (AUC-ROC) on the test set (0.9354), indicating strong generalization capability. SVM and Neural Network models performed adequately but were slightly outperformed by the simpler models. These findings suggest that integrating LMS data with external factors enhances early prediction of student success. Logistic Regression is a practical and interpretable tool for educational institutions to identify at-risk students, and to implement personalized interventions.

20 pages, 24161 KB  
Article
Deep Embedding Koopman Neural Operator-Based Nonlinear Flight Training Trajectory Prediction Approach
by Jing Lu, Jingjun Jiang and Yidan Bai
Mathematics 2024, 12(14), 2162; https://doi.org/10.3390/math12142162 - 10 Jul 2024
Cited by 5 | Viewed by 4475
Abstract
Accurate flight training trajectory prediction is a key task in automatic flight maneuver evaluation and flight operations quality assurance (FOQA), which is crucial for pilot training and aviation safety management. The task is extremely challenging due to the nonlinear chaos of trajectories, the unconstrained airspace maps, and the randomization of driving patterns. In this work, a deep learning model based on data-driven modern Koopman operator theory and dynamical system identification is proposed. The model does not require the manual selection of dictionaries and can automatically generate augmentation functions to achieve nonlinear trajectory space mapping. The model combines stacked neural networks to create a scalable depth approximator for approximating the finite-dimensional Koopman operator. In addition, the model uses finite-dimensional operator evolution to achieve end-to-end adaptive prediction. In particular, the model can gain some physical interpretability through operator visualization and generative dictionary functions, which can be used for downstream pattern recognition and anomaly detection tasks. Experiments show that the model performs well, particularly on flight training trajectory datasets.
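
For readers unfamiliar with the setting, the finite-dimensional Koopman approximation mentioned in the abstract is commonly written in the generic form below. This is the usual textbook formulation, not the authors' specific architecture; the symbols $F$, $\varphi$, $\psi$, and $K$ are notation assumed here for illustration.

```latex
\begin{align}
  x_{t+1} &= F(x_t)
    && \text{(nonlinear dynamics)} \\
  z_t &= \varphi(x_t)
    && \text{(learned embedding into a lifted space)} \\
  z_{t+1} &\approx K z_t
    && \text{(linear evolution under a finite-dimensional matrix } K\text{)} \\
  \hat{x}_{t+n} &= \psi\!\left(K^{n} \varphi(x_t)\right)
    && \text{(}n\text{-step prediction via a decoder } \psi\text{)}
\end{align}
```

The appeal of this form is that multi-step prediction reduces to repeated multiplication by $K$ in the lifted space, even though the original dynamics $F$ are nonlinear.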