90 Results Found

  • Article
  • Open Access
2 Citations
1,287 Views
19 Pages

23 December 2024

Synthetic oversampling methods for dealing with imbalanced classification problems have been widely studied. However, the current synthetic oversampling methods still cannot perform well when facing high-dimensional imbalanced financial data. The fai...

  • Article
  • Open Access
696 Views
35 Pages

14 October 2025

With the growing complexity of high-dimensional imbalanced datasets in critical fields such as medical diagnosis and bioinformatics, feature selection has become essential to reduce computational costs, alleviate model bias, and improve classificatio...

  • Article
  • Open Access
8 Citations
4,159 Views
19 Pages

An Evaluation of Feature Selection Robustness on Class Noisy Data

  • Simone Pau,
  • Alessandra Perniciano,
  • Barbara Pes and
  • Dario Rubattu

3 August 2023

With the increasing growth of data dimensionality, feature selection has become a crucial step in a variety of machine learning and data mining applications. In fact, it allows identifying the most important attributes of the task at hand, improving...

  • Article
  • Open Access
29 Citations
4,906 Views
26 Pages

27 June 2020

The training machine learning algorithm from an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples but with a massive number of features (high dimensionality). The high dimensional and imbalanced dat...

  • Systematic Review
  • Open Access
224 Views
20 Pages

29 January 2026

Integrating machine learning (ML) with Statistical Process Control (SPC) is important for Industry 4.0 environments. Contemporary manufacturing data exhibit high-dimensionality, autocorrelation, non-stationarity, and class imbalance, which challenge...

  • Article
  • Open Access
4 Citations
4,000 Views
20 Pages

An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data

  • Ismael Ramos-Pérez,
  • José Antonio Barbero-Aparicio,
  • Antonio Canepa-Oneto,
  • Álvar Arnaiz-González and
  • Jesús Maudes-Raedo

16 April 2024

The most common preprocessing techniques used to deal with datasets having high dimensionality and a low number of instances—or wide data—are feature reduction (FR), feature selection (FS), and resampling. This study explores the use of F...

  • Article
  • Open Access
1 Citation
1,960 Views
15 Pages

14 December 2024

In high-dimensional machine learning tasks, supervised feature extraction is essential for improving model performance, with Linear Discriminant Analysis (LDA) being a common approach. However, LDA tends to deliver suboptimal performance when dealing...

  • Article
  • Open Access
38 Citations
6,571 Views
16 Pages

21 July 2021

Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learn...

  • Article
  • Open Access
11 Citations
3,184 Views
13 Pages

30 June 2019

Machine learning plays an important role in ligand-based virtual screening. However, conventional machine learning approaches tend to be inefficient when dealing with such problems where the data are imbalanced and features describing the chemical ch...

  • Article
  • Open Access
16 Citations
5,186 Views
13 Pages

LICIC: Less Important Components for Imbalanced Multiclass Classification

  • Vincenzo Dentamaro,
  • Donato Impedovo and
  • Giuseppe Pirlo

9 December 2018

Multiclass classification in cancer diagnostics, using DNA or Gene Expression Signatures, but also classification of bacteria species fingerprints in MALDI-TOF mass spectrometry data, is challenging because of imbalanced data and the high number of d...

  • Article
  • Open Access
1 Citation
1,704 Views
19 Pages

A Novel SHAP-GAN Network for Interpretable Ovarian Cancer Diagnosis

  • Jingxun Cai,
  • Zne-Jung Lee,
  • Zhihxian Lin and
  • Ming-Ren Yang

6 March 2025

Ovarian cancer stands out as one of the most formidable adversaries in women’s health, largely due to its typically subtle and nonspecific early symptoms, which pose significant challenges to early detection and diagnosis. Although existing dia...

  • Article
  • Open Access
8 Citations
2,800 Views
24 Pages

15 July 2024

The complexity of cancer development involves intricate interactions among multiple biomarkers, such as gene-environment interactions. Utilizing microarray gene expression profile data for cancer classification is anticipated to be effective, thus dr...

  • Article
  • Open Access
1 Citation
1,819 Views
18 Pages

25 January 2025

Online social networks, as platforms for personal expression, have evolved into complex networks integrating political and social dimensions. This evolution has shifted the focus of network governance from addressing hacking activities to mitigating...

  • Article
  • Open Access
2,202 Views
19 Pages

15 November 2024

Although imbalanced data have been studied for many years, the problem of data imbalance is still a major problem in the development of machine learning and artificial intelligence. The development of deep learning and artificial intelligence has fur...

  • Article
  • Open Access
5 Citations
3,393 Views
19 Pages

18 June 2023

Imbalanced learning problems often occur in application scenarios and are additionally an important research direction in the field of machine learning. Traditional classifiers are substantially less effective for datasets with an imbalanced distribu...

  • Article
  • Open Access
1 Citation
17,233 Views
73 Pages

Detecting pump-and-dump schemes involving cryptoassets with high-frequency data is challenging due to imbalanced datasets and the early occurrence of unusual trading volumes. To address these issues, we propose constructing synthetic balanced dataset...

  • Article
  • Open Access
34 Citations
9,343 Views
16 Pages

AutoEncoder and LightGBM for Credit Card Fraud Detection Problems

  • Haichao Du,
  • Li Lv,
  • An Guo and
  • Hongliang Wang

6 April 2023

This paper proposes a method called autoencoder with probabilistic LightGBM (AED-LGB) for detecting credit card frauds. This deep learning-based AED-LGB algorithm first extracts low-dimensional feature data from high-dimensional bank credit card feat...

  • Article
  • Open Access
4 Citations
2,434 Views
15 Pages

Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping

  • Armando Bolívar,
  • Vicente García,
  • Roberto Alejo,
  • Rogelio Florencia-Juárez and
  • J. Salvador Sánchez

4 July 2024

An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid applicat...

  • Article
  • Open Access
9 Citations
4,349 Views
29 Pages

The quality of machine learning models can suffer when inappropriate data is used, which is especially prevalent in high-dimensional and imbalanced data sets. Data preparation and preprocessing can mitigate some problems and can thus result in better...

  • Article
  • Open Access
61 Citations
7,915 Views
16 Pages

22 October 2021

This paper proposes a method, called autoencoder with probabilistic random forest (AE-PRF), for detecting credit card frauds. The proposed AE-PRF method first utilizes the autoencoder to extract features of low-dimensionality from credit card transac...

  • Article
  • Open Access
2,353 Views
20 Pages

Multivariate Time Series Anomaly Detection Based on Inverted Transformer with Multivariate Memory Gate

  • Yuan Ma,
  • Weiwei Liu,
  • Changming Xu,
  • Luyi Bai,
  • Ende Zhang and
  • Junwei Wang

8 September 2025

In the industrial IoT, it is vital to detect anomalies in multivariate time series, yet it faces numerous challenges, including highly imbalanced datasets, complex and high-dimensional data, and large disparities across variables. Despite the recent...

  • Article
  • Open Access
367 Views
32 Pages

12 November 2025

For classification problems, an imbalanced dataset can seriously reduce the learning efficiency in machine learning. In order to solve this problem, many scholars have proposed a series of methods mainly from the data and algorithm levels. At the dat...

  • Systematic Review
  • Open Access
939 Views
30 Pages

Machine Learning and Ensemble Methods for Cardiovascular Disease Prediction: A Systematic Review of Approaches, Performance Trends, and Research Challenges

  • Ghazala Gul,
  • Imtiaz Ali Korejo,
  • Dil Nawaz Hakro,
  • Haitham Alqahtani,
  • Abdullah Abbasi,
  • Muhammad Babar,
  • Osama Al Rahbi and
  • Najma Imtiaz Ali

Knowledge discovery helps mitigate the shortcomings of classical machine learning, especially those so-called imbalanced, high-dimensional, and noisy data challenges. Adaptive combination of multiple models, voting and other data fusion strategies, a...

  • Article
  • Open Access
12 Citations
3,898 Views
17 Pages

23 March 2023

With the rapid increase in the number of cyber-attacks, detecting and preventing malicious behavior has become more important than ever before. In this study, we propose a method for detecting and classifying malicious behavior in host process data u...

  • Article
  • Open Access
2,072 Views
24 Pages

Fault detection and remaining useful life (RUL) prediction are critical tasks in self-healing network (SHN) environments and industrial cyber–physical systems. These domains demand intelligent systems capable of handling dynamic, high-dimension...

  • Article
  • Open Access
19 Citations
8,704 Views
31 Pages

A Survey of Methods for Addressing Imbalance Data Problems in Agriculture Applications

  • Tajul Miftahushudur,
  • Halil Mertkan Sahin,
  • Bruce Grieve and
  • Hujun Yin

29 January 2025

This survey explores recent advances in addressing class imbalance issues for developing machine learning models in precision agriculture, with a focus on techniques used for plant disease detection, soil management, and crop classification. We exami...

  • Article
  • Open Access
93 Citations
6,907 Views
21 Pages

LSTM and Bat-Based RUSBoost Approach for Electricity Theft Detection

  • Muhammad Adil,
  • Nadeem Javaid,
  • Umar Qasim,
  • Ibrar Ullah,
  • Muhammad Shafiq and
  • Jin-Ghoo Choi

25 June 2020

The electrical losses in power systems are divided into non-technical losses (NTLs) and technical losses (TLs). NTL is more harmful than TL because it includes electricity theft, faulty meters and billing errors. It is one of the major concerns in th...

  • Article
  • Open Access
6 Citations
1,504 Views
46 Pages

24 May 2025

The landscape of 5G communication introduces heightened risks from malicious attacks, posing significant threats to network security and availability. The unique characteristics of 5G networks, while enabling advanced communication, present challenge...

  • Article
  • Open Access
4 Citations
2,820 Views
20 Pages

In the semiconductor manufacturing industry, achieving high yields constitutes one of the pivotal factors for sustaining market competitiveness. When confronting the substantial volume of high-dimensional, non-linear, and imbalanced data generated du...

  • Article
  • Open Access
23 Citations
2,688 Views
25 Pages

4 March 2024

Casting defects in turbine blades can significantly reduce an aero-engine’s service life and cause secondary damage to the blades when exposed to harsh environments. Therefore, casting defect detection plays a crucial role in enhancing aircraft...

  • Article
  • Open Access
1 Citation
2,774 Views
19 Pages

23 November 2022

Purpose: High-involvement experience products (HIEP) are generally characterized by a high value and difficult purchasing decision for customers, and a wrong decision will bring large losses to consumers, severely affecting their trust in enterprises...

  • Article
  • Open Access
3 Citations
1,595 Views
19 Pages

28 May 2025

Laser-induced breakdown spectroscopy (LIBS) is a rapid, cost-effective technique for elemental analysis that enables real-time measurements with minimal sample preparation. However, LIBS datasets are often high-dimensional and imbalanced, limiting th...

  • Article
  • Open Access
50 Citations
10,705 Views
21 Pages

6 March 2017

Sentiment analysis has played a primary role in text classification. It is an undoubted fact that some years ago, textual information was spreading in manageable rates; however, nowadays, such information has overcome even the most ambiguous expectat...

  • Article
  • Open Access
295 Views
36 Pages

17 December 2025

Hyperspectral imagery (HSI), as a core data carrier in remote sensing, plays a crucial role in many fields. Still, it also faces numerous challenges, including the curse of dimensionality, noise interference, and small samples. These problems severel...

  • Article
  • Open Access
11 Citations
3,927 Views
14 Pages

In this study, four models—logistic regression (LR), random forest (RF), linear support vector machine (SVM), and radial basis function (RBF)-SVM—were compared for their accuracy in determining mortality caused by road traffic injuries. They were tes...

  • Article
  • Open Access
6 Citations
3,653 Views
27 Pages

14 April 2023

Cyber-security systems collect information from multiple security sensors to detect network intrusions and their models. As attacks become more complex and security systems diversify, the data used by intrusion-detection systems becomes more dimensio...

  • Article
  • Open Access
1 Citation
1,601 Views
25 Pages

Continual Learning for Intrusion Detection Under Evolving Network Threats

  • Chaoqun Guo,
  • Xihan Li,
  • Jubao Cheng,
  • Shunjie Yang and
  • Huiquan Gong

4 October 2025

In the face of ever-evolving cyber threats, modern intrusion detection systems (IDS) must achieve long-term adaptability without sacrificing performance on previously encountered attacks. Traditional IDS approaches often rely on static training assum...

  • Article
  • Open Access
3 Citations
1,814 Views
27 Pages

HQRNN-FD: A Hybrid Quantum Recurrent Neural Network for Fraud Detection

  • Yao-Chong Li,
  • Yi-Fan Zhang,
  • Rui-Qing Xu,
  • Ri-Gui Zhou and
  • Yi-Lin Dong

27 August 2025

Detecting financial fraud is a critical aspect of modern intelligent financial systems. Despite the advances brought by deep learning in predictive accuracy, challenges persist—particularly in capturing complex, high-dimensional nonlinear featu...

  • Article
  • Open Access
4 Citations
4,463 Views
22 Pages

Vibration Analysis Using Multi-Layer Perceptron Neural Networks for Rotor Imbalance Detection in Quadrotor UAV

  • Ba Tarfi Salem Abdullah Salem,
  • Mohd Na’im Abdullah,
  • Faizal Mustapha,
  • Nur Shahirah Atifah Kanirai and
  • Mazli Mustapha

30 January 2025

Rotor imbalance in quadrotor UAVs poses a critical challenge, compromising flight stability, increasing maintenance demands, and reducing overall operational efficiency. Traditional vibration analysis methods, such as Fast Fourier Transform (FFT) and...

  • Article
  • Open Access
3 Citations
2,147 Views
22 Pages

29 September 2024

Hyperspectral small target detection (HSTD) is a promising pixel-level detection task. However, due to the low contrast and imbalanced number between the target and the background spatially and the high dimensions spectrally, it is a challenging one....

  • Article
  • Open Access
1,201 Views
21 Pages

Corporate bankruptcy prediction has become increasingly critical amid economic uncertainty. This study proposes a novel two-stage machine learning approach to enhance bankruptcy prediction accuracy, applied to Tokyo Stock Exchange-listed companies. F...

  • Article
  • Open Access
1 Citation
1,386 Views
26 Pages

23 February 2025

Predictive health management (PHM) plays a pivotal role in the maintenance of contemporary industrial systems, with the evaluation of the state of health (SOH) and the prediction of remaining useful life (RUL) constituting its central objectives. Nev...

  • Article
  • Open Access
9 Citations
2,323 Views
13 Pages

11 June 2024

To address the significant challenges in determining the single-well production of tight gas and shale gas after hydraulic fracturing, artificial intelligence (AI) methods were implemented. Machine learning (ML) algorithms such as random forest (RF),...

  • Article
  • Open Access
2 Citations
2,535 Views
30 Pages

12 August 2025

Insurance fraud detection is a significant challenge due to increasing fraudulent claims, class imbalance, and the increasing complexity of fraudulent behaviour. Traditional machine learning models often struggle to generalize effectively when applie...

  • Article
  • Open Access
252 Views
29 Pages

10 December 2025

In this paper, we propose the Adaptive Volcano Support Vector Machine (AVSVM)—a novel classification model inspired by the dynamic behavior of volcanic eruptions—for the purpose of enhancing malware detection. Unlike conventional SVMs tha...

  • Systematic Review
  • Open Access
27 Citations
5,005 Views
25 Pages

18 August 2023

In light of the escalating ubiquity of the Internet, the proliferation of cyber-attacks, coupled with their intricate and surreptitious nature, has significantly imperiled network security. Traditional machine learning methodologies inherently exhibi...

  • Article
  • Open Access
6 Citations
3,004 Views
21 Pages

WGAN-DL-IDS: An Efficient Framework for Intrusion Detection System Using WGAN, Random Forest, and Deep Learning Approaches

  • Shehla Gul,
  • Sobia Arshad,
  • Sanay Muhammad Umar Saeed,
  • Adeel Akram and
  • Muhammad Awais Azam

27 December 2024

The rise in cyber security issues has caused significant harm to tech world and thus society in recent years. Intrusion detection systems (IDSs) are crucial for the detection and the mitigation of the increasing risk of cyber attacks. False and disre...

  • Article
  • Open Access
4 Citations
2,427 Views
17 Pages

Early detection of drought stress in greenhouse tomato (Solanum lycopersicum) is an important issue. Real-time and nondestructive assessment of plant water status is possible by spectroscopy. However, spectral data often suffer from the problems of c...

  • Article
  • Open Access
1,432 Views
22 Pages

19 June 2025

Against the backdrop of global climate change and increasing ecological pressure, the refined monitoring of forest resources and accurate tree species identification have become essential tasks for sustainable forest management. Hyperspectral remote...

  • Article
  • Open Access
188 Views
26 Pages

An Unsupervised Cloud-Centric Intrusion Diagnosis Framework Using Autoencoder and Density-Based Learning

  • Suresh K. S,
  • Thenmozhi Elumalai,
  • Radhakrishnan Rajamani,
  • Anubhav Kumar,
  • Balamurugan Balusamy,
  • Sumendra Yogarayan and
  • Kaliyaperumal Prabu

19 January 2026

Cloud computing environments generate high-dimensional, large-scale, and highly dynamic network traffic, making intrusion diagnosis challenging due to evolving attack patterns, severe traffic imbalance, and limited availability of labeled data. To ad...
