MDPI - Publisher of Open Access Journals

26 pages, 5861 KB

Open Access Article

Robust Industrial Surface Defect Detection Using Statistical Feature Extraction and Capsule Network Architectures

by Azeddine Mjahad and Alfredo Rosado-Muñoz

Sensors 2025, 25(19), 6063; https://doi.org/10.3390/s25196063 - 2 Oct 2025

Viewed by 159

Automated quality control is critical in modern manufacturing, especially for metallic cast components, where fast and accurate surface defect detection is required. This study evaluates classical Machine Learning (ML) algorithms using extracted statistical parameters and deep learning (DL) architectures including ResNet50, Capsule Networks, and a 3D Convolutional Neural Network (CNN3D) using 3D image inputs. Using the Dataset Original, ML models with the selected parameters achieved high performance: RF reached 99.4 ± 0.2% precision and 99.4 ± 0.2% sensitivity, GB 96.0 ± 0.2% precision and 96.0 ± 0.2% sensitivity. ResNet50 trained with extracted parameters reached 98.0 ± 1.5% accuracy and 98.2 ± 1.7% F1-score. Capsule-based architectures achieved the best results, with ConvCapsuleLayer reaching 98.7 ± 0.2% accuracy and 100.0 ± 0.0% precision for the normal class, and 98.9 ± 0.2% F1-score for the affected class. CNN3D applied on 3D image inputs reached 88.61 ± 1.01% accuracy and 90.14 ± 0.95% F1-score. Using the Dataset Expanded with ML and PCA-selected features, Random Forest achieved 99.4 ± 0.2% precision and 99.4 ± 0.2% sensitivity, K-Nearest Neighbors 99.2 ± 0.0% precision and 99.2 ± 0.0% sensitivity, and SVM 99.2 ± 0.0% precision and 99.2 ± 0.0% sensitivity, demonstrating consistent high performance. All models were evaluated using repeated train-test splits to calculate averages of standard metrics (accuracy, precision, recall, F1-score), and processing times were measured, showing very low per-image execution times (as low as $3.69 \times 10^{-4}$ s/image), supporting potential real-time industrial application. These results indicate that combining statistical descriptors with ML and DL architectures provides a robust and scalable solution for automated, non-destructive surface defect detection, with high accuracy and reliability across both the original and expanded datasets.

(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)

12 pages, 1170 KB

Open Access Article

Demographic, Morphological, and Histopathological Characteristics of Melanoma and Nevi: Insights from Statistical Analysis and Machine Learning Models

by Blagjica Lazarova, Gordana Petrushevska, Zdenka Stojanovska and Stephen C. Mullins

Diagnostics 2025, 15(19), 2499; https://doi.org/10.3390/diagnostics15192499 - 1 Oct 2025

Viewed by 276

Abstract

Background: Early and accurate differentiation between melanomas and benign nevi is essential for making proper clinical decisions. This study aimed to identify clinical, morphological, and histopathological variables most strongly associated with melanoma, using both statistical and machine learning approaches. Methods: This study evaluated 184 melanocytic lesions using clinical, morphological, and histopathological parameters. Univariable analyses were performed in XLStat statistical software, version 2014.5.03, while multivariable machine learning models were developed in Jamovi (version 2.4). Five supervised algorithms (random forest, partial least squares, elastic net regression, conditional inference trees, and k-nearest neighbors) were compared using repeated cross-validation, with performance evaluated by accuracy, Kappa, sensitivity, specificity, F1 score, and calibration. Results: Univariable analysis identified significant differences between melanomas and nevi in age, horizontal diameter, gender, lesion location, and selected histopathological features (cytological and extracellular matrix changes, epidermal interactions). However, several associations weakened in multivariable analysis due to collinearity and overlapping effects. Using glmnet, the most influential independent predictors were cytological changes, horizontal diameter, epidermal interactions, and extracellular matrix features, alongside age, gender, and lesion location. The model achieved high discrimination (AUC = 0.97, 95% CI: 0.93–0.99) and accuracy (training: 95.3%; test: 92.6%), confirming robustness. Conclusions: Structured demographic, morphological, and histopathological data—particularly age, lesion size, cytological and extracellular matrix changes, and epidermal interactions—can effectively support classification of melanocytic lesions. 
Machine learning approaches (the glmnet model in our study) provide a reliable framework to evaluate such predictors and offer practical diagnostic support in dermatopathology.

(This article belongs to the Special Issue Artificial Intelligence in Dermatology)

21 pages, 596 KB

Open Access Article

Exploiting the Feature Space Structures of KNN and OPF Algorithms for Identification of Incipient Faults in Power Transformers

by André Gifalli, Marco Akio Ikeshoji, Danilo Sinkiti Gastaldello, Victor Hideki Saito Yamaguchi, Welson Bassi, Talita Mazon, Floriano Torres Neto, Pedro da Costa Junior and André Nunes de Souza

Mach. Learn. Knowl. Extr. 2025, 7(3), 102; https://doi.org/10.3390/make7030102 - 18 Sep 2025

Viewed by 487

Abstract

Power transformers represent critical assets within the electrical power system, and their unexpected failures may result in substantial financial losses for both utilities and consumers. Dissolved Gas Analysis (DGA) is a well-established diagnostic method extensively employed to detect incipient faults in power transformers. Although several conventional and machine learning techniques have been applied to DGA, most of them focus only on fault classification and lack the capability to provide predictive scenarios that would enable proactive maintenance planning. In this context, the present study introduces a novel approach to DGA interpretation, which highlights the trends and progression of faults by exploring the feature space through the k-Nearest Neighbors (KNN) and Optimum-Path Forest (OPF) algorithms. To improve accuracy, the following strategies were implemented: statistical filtering based on the normal distribution to eliminate outliers from the dataset; augmentation of gas-related features; and feature selection using optimization algorithms such as Cuckoo Search and Genetic Algorithms. The approach was validated using data from several transformers, with fault diagnoses cross-checked against inspection reports provided by the utility company. The findings indicate that the proposed method offers valuable insights into the progression, proximity, and classification of faults with satisfactory accuracy, thereby supporting its recommendation as a complementary tool for diagnosing incipient transformer faults.
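
The outlier-filtering step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the threshold or the exact procedure, so the 3-sigma cutoff, the per-feature z-score test, and the function name `filter_outliers` are all assumptions made for illustration only.

```python
from statistics import mean, stdev

def filter_outliers(samples, z_max=3.0):
    """Keep rows whose every feature lies within z_max standard
    deviations of that feature's mean (assumed 3-sigma rule)."""
    columns = list(zip(*samples))          # one tuple per feature
    mus = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]
    kept = []
    for row in samples:
        # A constant feature (sigma == 0) can never flag an outlier.
        inside = all(
            sigma == 0 or abs(x - mu) / sigma <= z_max
            for x, mu, sigma in zip(row, mus, sigmas)
        )
        if inside:
            kept.append(row)
    return kept
```

With a small dataset a single extreme value inflates the standard deviation, so a fixed z-score cutoff is only meaningful once enough samples are available.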

15 pages, 612 KB

Open Access Article

Comparison of Supervised Machine Learning Models to Logistic Regression Model Using Tooth-Related Factors to Predict the Outcome of Nonsurgical Periodontal Treatment

by Ali J. B. Al-Sharqi, Mohammed Taha Ahmed Baban, Nada K. Imran, Sarhang S. Gul and Ali A. Abdulkareem

Diagnostics 2025, 15(18), 2333; https://doi.org/10.3390/diagnostics15182333 - 15 Sep 2025

Viewed by 339

Abstract

Background/Objectives: Conventional logistic regression is widely used in the field of dentistry, specifically for prediction purposes in longitudinal studies. This study aimed to compare the validity of different supervised machine learning (ML) models with that of the conventional logistic regression (LR) model in predicting the outcomes of nonsurgical periodontal treatment (NSPT). Methods: Patients diagnosed with periodontitis received full periodontal charting, including bleeding on probing (BoP), probing pocket depth (PPD), and clinical attachment loss (CAL). Furthermore, the tooth type, tooth location, tooth surface, arch type, and gingival phenotype were also collected as site-specific predictors. Later, root surface debridement was provided and treatment outcomes were evaluated after 3 months. Site-specific predictors were used to train five ML models, including random forest (RF), decision tree (DT), support vector classifier (SVC), K-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB), to develop predictive models. Results: Site-specific predictors of 1108 examined sites were used, and the overall prediction accuracy of the conventional LR model was 70.4%, with PPD statistically significantly associated with the outcome of NSPT (odds ratio = 0.577, p = 0.001). Among the ML models examined, only GNB and SVC showed prediction accuracy comparable to the LR model (71.0% and 70.4%, respectively), whereas the prediction accuracies of KNN, RF, and DT were 65.0%, 62.0%, and 61.0%, respectively. Similarly, baseline PPD was shown to be the most important predictor by both the RF and DT models. Conclusions: The evidence suggests that supervised ML models do not outperform the LR model in predicting the outcomes of NSPT. A larger sample size and more predictors of periodontitis are necessary to enhance the accuracy of ML models over the LR model in predicting the outcomes of NSPT.

(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)

25 pages, 6705 KB

Open Access Article

Machine Learning-Enhanced Monitoring and Assessment of Urban Drinking Water Quality in North Bhubaneswar, Odisha, India

by Kshyana Prava Samal, Rakesh Ranjan Thakur, Alok Kumar Panda, Debabrata Nandi, Alok Kumar Pati, Kumarjeeb Pegu and Bojan Đurin

Limnol. Rev. 2025, 25(3), 44; https://doi.org/10.3390/limnolrev25030044 - 12 Sep 2025

Viewed by 1371

Abstract

Access to clean drinking water is crucial for any region’s social and economic growth. However, rapid urbanization and industrialization have significantly deteriorated water quality, posing severe pollution threats from domestic, agricultural, and industrial sources. This study presents an innovative framework for assessing water quality in North Bhubaneswar, integrating the Water Quality Index (WQI) with statistical analysis, geospatial technologies, and machine learning models. The WQI, calculated using the Weighted Arithmetic Index method, provides a single composite value representing overall water quality based on several key physicochemical parameters. To evaluate potable water quality across 21 wards in the northern zone, several key parameters were monitored, including pH, electrical conductivity (EC), dissolved oxygen (DO), hardness, chloride, total dissolved solids (TDS), and biochemical oxygen demand (BOD). According to the Weighted Arithmetic WQI, overall water quality ranged from excellent to good. Furthermore, Principal Component Analysis (PCA) revealed a strong positive correlation (r > 0.6) between pH, conductivity, hardness, and alkalinity. To enhance the accuracy and reliability of water quality assessment, multiple machine learning models, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), were applied to classify water quality based on these parameters. Among them, the Decision Tree (DT) and Random Forest (RF) models demonstrated the highest precision (91.8% and 92.7%, respectively) and overall accuracy (91.7%), making them the most effective at predicting water quality within a framework that integrates WQI, machine learning, and statistics.
The study emphasizes the importance of continuous water quality monitoring and offers data-driven recommendations to ensure sustainable access to clean drinking water in North Bhubaneswar.
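
The Weighted Arithmetic WQI named in this abstract is commonly computed as a weighted mean of per-parameter quality ratings, with unit weights inversely proportional to the permissible limits. The sketch below follows that common formulation; the abstract does not list the limits or ideal values the authors used, so every number in the usage note and the function name `weighted_arithmetic_wqi` are illustrative assumptions.

```python
def weighted_arithmetic_wqi(measured, standard, ideal=None):
    """Weighted Arithmetic WQI (common textbook formulation).

    measured : {parameter: observed value}
    standard : {parameter: permissible limit S_i}
    ideal    : {parameter: ideal value V_0}, defaulting to 0
               (pH is usually given an ideal value of 7.0)
    """
    ideal = ideal or {}
    # Proportionality constant K so that the unit weights sum to 1.
    k = 1.0 / sum(1.0 / standard[name] for name in measured)
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in measured.items():
        s_i = standard[name]
        v_0 = ideal.get(name, 0.0)
        q_i = 100.0 * (value - v_0) / (s_i - v_0)   # quality rating
        w_i = k / s_i                                # unit weight
        weighted_sum += w_i * q_i
        weight_total += w_i
    return weighted_sum / weight_total
```

For example, with hypothetical limits `{"pH": 8.5, "TDS": 500.0, "chloride": 250.0}` and an ideal pH of 7.0, a sample sitting exactly at every limit scores 100, and lower scores indicate better water quality.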

19 pages, 3431 KB

Open Access Article

Computed Tomography Radiomics and Machine Learning for Prediction of Histology-Based Hepatic Steatosis Scores

by Winston T. Chu, Hui Wang, Marcelo A. Castro, Venkatesh Mani, C. Paul Morris, Thomas C. Friedrich, David H. O’Connor, Courtney L. Finch, Ji Hyun Lee, Philip J. Sayre, Gabriella Worwa, Anya Crane, Jens H. Kuhn, Ian Crozier, Jeffrey Solomon and Claudia Calcagno

Diagnostics 2025, 15(18), 2310; https://doi.org/10.3390/diagnostics15182310 - 11 Sep 2025

Viewed by 665

Abstract

Background/Objective: Computed tomography (CT) can be used to non-invasively assess the health of the liver; however, radiologist evaluation and simple thresholding alone are insufficient for diagnosis of hepatic steatosis, necessitating biopsies. This study explored CT radiomics and machine learning to enable non-invasive, objective, and quantitative prediction of steatosis severity across the macaque liver. Methods: In this retrospective study, CT images of 42 crab-eating macaques (age [yr] = 6.1 ± 1.7; sex [male/female] = 26/16) with varying degrees of hepatic steatosis were analyzed, and the results were compared to histology-based steatosis scores of livers from the same animals. After extracting radiomic features, a thorough array of statistical analyses, feature selection techniques, and machine learning models were applied to identify a distinct radiomic signature of histologically defined hepatic steatosis. Results: We identified 12 radiomic features that correlated with steatosis scores, and hierarchical clustering based on radiomic attributes alone revealed clusters roughly aligning with steatosis severity groups. The k-nearest neighbors model architecture best predicted histopathologic steatosis scores in both classification and regression tasks (area under the receiver operating characteristic curve [AUC ROC] = 0.89 ± 0.09; root-mean-square error [RMSE] = 0.60 ± 0.10). Feature analyses identified seven key radiomic features (six first-order features and one gray-level co-occurrence matrix feature) that were most important when predicting steatosis. Conclusions: We identified a CT radiomic signature of steatosis and demonstrated that histology-based steatosis scores can be predicted non-invasively and objectively using machine learning and CT radiomics as a potential alternative to invasive core biopsies. 
Given the strong similarities in liver structure, liver function, and hepatic steatosis pathophysiology between macaques and humans, these findings have the potential to translate to humans.

(This article belongs to the Special Issue Artificial Intelligence-Driven Radiomics in Medical Diagnosis)

14 pages, 3047 KB

Open Access Article

Modeling the Seasonal and Spatial Dynamics of Epigeic Fauna in the Context of Vineyard Landscape Use Using Machine Learning

by Vladimír Langraf and Kornélia Petrovičová

Agronomy 2025, 15(9), 2117; https://doi.org/10.3390/agronomy15092117 - 3 Sep 2025

Viewed by 434

Abstract

Epigeic groups play a key ecological role in vineyards, as they represent a significant component of soil and surface communities that directly affect the functioning of the agroecosystem. They act as predators, decomposers of organic matter, and important regulators of pest populations, thereby contributing to the natural biological protection of the vineyard. Between 2021 and 2023, we monitored the impact of different types of vineyard landscape habitats on the spatial distribution and abundance of epigeic fauna. Over the study period, 57,964 individuals were recorded, with the highest abundance observed in 2023 and the lowest in 2022. Redundancy analysis confirmed a significant impact of habitat type on community composition, especially in semi-intensive and intensive vineyards, meadows, and abandoned sites, with the differences being statistically significant in all monitored habitats. The interannual changes indicated a significant decrease in biodiversity in 2022, followed by a significant increase in 2023, indicating a positive effect of changing management practices and natural succession on restoring ecological stability. The K-nearest neighbor (KNN) prediction model classified individual years based on the number of individuals and taxa with an accuracy of 97%, with 2021 characterized by lower biodiversity, 2022 by a transitional state, and 2023 by a higher number of taxa and higher abundance. The findings highlight the sensitivity of epigeic fauna communities to management and environmental changes and confirm that the application of gentle agri-environmental measures can significantly contribute to the maintenance and restoration of biodiversity in agricultural landscapes.

(This article belongs to the Section Pest and Disease Management)

23 pages, 2991 KB

Open Access Article

Enhancing Alzheimer’s Diagnosis with Machine Learning on EEG: A Spectral Feature-Based Comparative Analysis

by Yeliz Senkaya, Cetin Kurnaz and Ferdi Ozbilgin

Diagnostics 2025, 15(17), 2190; https://doi.org/10.3390/diagnostics15172190 - 29 Aug 2025

Viewed by 1011

Abstract

Background/Objectives: Alzheimer’s disease (AD) is a devastating neurodegenerative disorder that progressively impairs cognitive, neurological, and behavioral functions, severely affecting quality of life. The current diagnostic process relies on expert interpretation of extensive clinical assessments, often leading to delays that reduce the effectiveness of early interventions. Given the lack of a definitive cure, accelerating and improving diagnosis is critical to slowing disease progression. Electroencephalography (EEG), a widely used non-invasive technique, captures AD-related brain activity alterations, yet extracting meaningful features from EEG signals remains a significant challenge. This study introduces a machine learning (ML)-driven approach to enhance AD diagnosis using EEG data. Methods: EEG recordings from 36 AD patients, 23 Frontotemporal Dementia (FTD) patients, and 29 healthy individuals (HC) were analyzed. EEG signals were processed within the 0.5–45 Hz frequency range using the Welch method to compute the Power Spectral Density (PSD). From both the time-domain signals and the corresponding PSD, a total of 342 statistical and spectral features were extracted. The resulting feature set was then partitioned into training and test datasets while preserving the distribution of class labels. Feature selection was performed on the training set using Spearman and Pearson correlation analyses to identify the most informative features. To enhance classification performance, hyperparameter tuning was conducted using Bayesian optimization. Subsequently, classification was carried out using Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN) with the optimized hyperparameters. Results: The SVM classifier achieved a notable accuracy of 96.01%, outperforming previously reported methods.
Conclusions: These results demonstrate the potential of machine learning-based EEG analysis as an effective approach for the early diagnosis of Alzheimer’s Disease, enabling timely clinical intervention and ultimately contributing to improved patient outcomes.
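
The label-preserving partition described in the Methods can be sketched as a stratified split: each class is shuffled and divided separately so that the test set keeps roughly the same class proportions as the full dataset. The abstract does not state the split ratio or the tooling used, so the 20% test fraction, the seed, and the function name `stratified_split` are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with per-class proportions
    preserved. The 20% test fraction is an assumed value."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Applied to the cohort sizes reported above (36 AD, 23 FTD, 29 HC), this yields 7, 5, and 6 test subjects per class. Feature selection should then use only the training indices, as the abstract describes, to avoid leaking test information.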

(This article belongs to the Special Issue Artificial Intelligence in Brain Diseases)

24 pages, 4431 KB

Open Access Article

Fault Classification in Power Transformers Using Dissolved Gas Analysis and Optimized Machine Learning Algorithms

by Vuyani M. N. Dladla and Bonginkosi A. Thango

Machines 2025, 13(8), 742; https://doi.org/10.3390/machines13080742 - 20 Aug 2025

Viewed by 604

Abstract

Power transformers are critical assets in electrical power systems, yet their fault diagnosis often relies on conventional dissolved gas analysis (DGA) methods such as the Duval Pentagon and Triangle, Key Gas, and Rogers Ratio methods. Even though these methods are commonly used, they present limitations in classification accuracy, concurrent fault identification, and manual sample handling. In this study, a framework of optimized machine learning algorithms that integrates Chi-squared statistical feature selection with Random Search hyperparameter optimization algorithms was developed to enhance transformer fault classification accuracy using DGA data, thereby addressing the limitations of conventional methods and improving diagnostic precision. Utilizing the R2024b MATLAB Classification Learner App, five optimized machine learning algorithms were trained and tested using 282 transformer oil samples with varying DGA gas concentrations obtained from industrial transformers, the IEC TC10 database, and the literature. The optimized and assessed models are Linear Discriminant, Naïve Bayes, Decision Trees, Support Vector Machine, Neural Networks, k-Nearest Neighbor, and the Ensemble Algorithm. From the proposed models, the best performing algorithm, Optimized k-Nearest Neighbor, achieved an overall performance accuracy of 92.478%, followed by the Optimized Neural Network at 89.823%. To assess their performance against the conventional methods, the same dataset used for the optimized machine learning algorithms was used to evaluate the performance of the Duval Triangle and Duval Pentagon methods using VAISALA DGA software version 1.1.0; the proposed models outperformed the conventional methods, which could only achieve a classification accuracy of 35.757% and 30.818%, respectively. 
This study concludes that the application of the proposed optimized machine learning algorithms can enhance the classification accuracy of DGA-based faults in power transformers, supporting more reliable diagnostics and proactive maintenance strategies.
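
For context on the conventional baseline, the classic Duval Triangle places a sample using the relative percentages of three gases (CH4, C2H4, and C2H2). The sketch below computes only those coordinates; the fault-zone boundaries are deliberately left out, and the function name and the handling of an all-zero sample are illustrative assumptions rather than anything taken from the study or from the VAISALA software.

```python
def duval_triangle_coordinates(ch4_ppm, c2h4_ppm, c2h2_ppm):
    """Relative percentages (%CH4, %C2H4, %C2H2) used as Duval Triangle
    coordinates. Fault-zone lookup is intentionally omitted."""
    total = ch4_ppm + c2h4_ppm + c2h2_ppm
    if total == 0:
        # Assumed behavior: no coordinates can be defined without gas.
        raise ValueError("all three gas concentrations are zero")
    return (
        100.0 * ch4_ppm / total,
        100.0 * c2h4_ppm / total,
        100.0 * c2h2_ppm / total,
    )
```

A sample with 50, 30, and 20 ppm of the three gases maps to the point (50%, 30%, 20%); a zone table would then translate that point into a fault type.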

(This article belongs to the Section Electrical Machines and Drives)

23 pages, 811 KB

Open Access Article

Efficient Dynamic Emotion Recognition from Facial Expressions Using Statistical Spatio-Temporal Geometric Features

by Yacine Yaddaden

Big Data Cogn. Comput. 2025, 9(8), 213; https://doi.org/10.3390/bdcc9080213 - 19 Aug 2025

Viewed by 934

Abstract

Automatic Facial Expression Recognition (AFER) is a key component of affective computing, enabling machines to recognize and interpret human emotions across various applications such as human–computer interaction, healthcare, entertainment, and social robotics. Dynamic AFER systems, which exploit image sequences, can capture the temporal evolution of facial expressions but often suffer from high computational costs, limiting their suitability for real-time use. In this paper, we propose an efficient dynamic AFER approach based on a novel spatio-temporal representation. Facial landmarks are extracted, and all possible Euclidean distances are computed to model the spatial structure. To capture temporal variations, three statistical metrics are applied to each distance sequence. A feature selection stage based on the Extremely Randomized Trees (ExtRa-Trees) algorithm is then performed to reduce dimensionality and enhance classification performance. Finally, the emotions are classified using a linear multi-class Support Vector Machine (SVM) and compared against the k-Nearest Neighbors (k-NN) method. The proposed approach is evaluated on three benchmark datasets: CK+, MUG, and MMI, achieving recognition rates of 94.65%, 93.98%, and 75.59%, respectively. Our results demonstrate that the proposed method achieves a strong balance between accuracy and computational efficiency, making it well-suited for real-time facial expression recognition applications.
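
The spatio-temporal representation described in this abstract can be sketched as follows: compute every pairwise landmark distance in each frame, then summarize each distance's time series with three statistics. The abstract does not say which three metrics are used, so the mean, population standard deviation, and range chosen here are assumptions, as is the function name `spatio_temporal_features`.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def spatio_temporal_features(frames):
    """frames: list of frames, each a list of (x, y) landmarks in a
    fixed order. Returns three statistics per landmark pair
    (assumed here: mean, standard deviation, range)."""
    n_points = len(frames[0])
    features = []
    for i, j in combinations(range(n_points), 2):
        # Distance between landmarks i and j, tracked across frames.
        series = [dist(frame[i], frame[j]) for frame in frames]
        features.extend([mean(series), pstdev(series), max(series) - min(series)])
    return features
```

With n landmarks the vector has 3·n(n−1)/2 entries, which grows quadratically and is presumably why the abstract follows this step with feature selection.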

(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

18 pages, 1501 KB

Open Access Article

Application of Fractal Radiomics and Machine Learning for Differentiation of Non-Small Cell Lung Cancer Subtypes on PET/MR Images

by Ewelina Bębas, Konrad Pauk, Jolanta Pauk, Kristina Daunoravičienė, Małgorzata Mojsak, Marcin Hładuński, Małgorzata Domino and Marta Borowska

J. Clin. Med. 2025, 14(16), 5776; https://doi.org/10.3390/jcm14165776 - 15 Aug 2025

Cited by 1 | Viewed by 632

Abstract

Objectives: Non-small cell lung cancer (NSCLC), the most prevalent type of lung cancer, includes subtypes such as adenocarcinoma (ADC) and squamous cell carcinoma (SCC), which require distinct management approaches. Accurately differentiating NSCLC subtypes based on diagnostic imaging remains challenging. However, the extraction of radiomic features from magnetic resonance (MR) images, such as first-order statistics (FOS), second-order statistics (SOS), and fractal dimension texture analysis (FDTA) features, supports the development of quantitative NSCLC assessments. Methods: This study aims to evaluate whether the integration of FDTA features with FOS and SOS texture features in MR image analysis improves machine learning classification of NSCLC into ADC and SCC subtypes. The study was conducted on 274 MR images, comprising ADC (n = 122) and SCC (n = 152) cases. From the segmented MR images, 93 texture features were extracted. The random forest algorithm was used to identify informative features from both FOS/SOS and combined FOS/SOS/FDTA datasets. Subsequently, the k-nearest neighbors (kNN) algorithm was applied to classify MR images as ADC or SCC. Results: The highest performance (accuracy = 0.78, precision = 0.81, AUC = 0.89) was achieved using 37 texture features selected from the combined FOS/SOS/FDTA dataset. Conclusions: Incorporating fractal descriptors into the texture-based classification of lung MR images enhances the differentiation of NSCLC subtypes.
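
The final classification step relies on k-nearest neighbors, which labels a sample by majority vote among its closest training samples in feature space. The sketch below is a generic version of that idea; the abstract does not report the value of k or the distance metric, so k = 3, Euclidean distance, and the function name `knn_predict` are assumptions, and features would normally be scaled before distances are computed.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`
    (Euclidean distance; k = 3 is an assumed default)."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In the setting of this study each training vector would hold the selected texture features of one MR image, and the returned label would be ADC or SCC.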

(This article belongs to the Section Oncology)

23 pages, 13439 KB

Open Access Article

Precision Identification of Irrigated Areas in Semi-Arid Regions Using Optical-Radar Time-Series Features and Ensemble Machine Learning

by Weifeng Li, Changlai Xiao, Xiujuan Liang, Weifei Yang, Jiang Zhang, Rongkun Dai, Yuhan La, Le Kang and Deyu Zhao

Hydrology 2025, 12(8), 214; https://doi.org/10.3390/hydrology12080214 - 14 Aug 2025

Viewed by 674

Abstract

To address limitations in remote sensing irrigation monitoring (insufficient resolution, single-source constraints, poor terrain adaptability), this study developed a high-precision identification framework for Jianping County, China, a semi-arid region. We integrated Sentinel-1 SAR (VV/VH), Sentinel-2 multispectral, and MOD11A1 land surface temperature data. Savitzky–Golay (S-G) filtering was used to reconstruct time-series datasets for NDVI, SAVI, TVDI, and VV/VH backscatter coefficients. Irrigation mapping employed random forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) algorithms. The key results are as follows: (1) RF achieved superior performance with overall accuracies of 91.00% (2022), 88.33% (2023), and 87.78% (2024), and Kappa coefficients of 86.37%, 80.96%, and 80.40%, showing minimal deviation (0.66–3.44%) from statistical data; (2) SAVI and VH exhibited high irrigation sensitivity, with peak differences between irrigated and non-irrigated areas reaching 0.48 units (SAVI, July–August) and 2.78 dB (VH); (3) cropland extraction accuracy showed <3% discrepancy versus governmental statistics. The “Multi-temporal Feature Fusion + S-G Filtering + RF Optimization” framework provides an effective solution for precision irrigation monitoring in complex semi-arid environments.
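
Two of the vegetation indices named in this abstract have simple closed forms in terms of near-infrared and red reflectance. The sketch below uses the standard definitions; the soil adjustment factor L = 0.5 is the commonly used default and is an assumption here, since the abstract does not report the value the authors chose.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index with soil adjustment factor L
    (L = 0.5 is a common default, assumed here)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
```

For reflectances of 0.5 (NIR) and 0.1 (red), NDVI is about 0.667 and SAVI about 0.545; the extra L term damps the influence of bare soil, which matters in sparsely vegetated semi-arid scenes.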

16 pages, 1719 KB

Open Access Article

Geographical Origin Classification of Oolong Tea Using an Electronic Nose: Application of Machine Learning and Gray Relational Analysis

by Sushant Kaushal, Priya Rana, Chao-Chin Chung and Ho-Hsien Chen

Chemosensors 2025, 13(8), 295; https://doi.org/10.3390/chemosensors13080295 - 8 Aug 2025

Viewed by 639

Abstract

Taiwan accounts for 90% of the total oolong tea production and enjoys a good global reputation for its quality. In recent years, oolong tea from neighboring countries has been imported into Taiwan and sold as Taiwanese oolong at high prices. This study aimed to rapidly classify oolong tea from four geographical origins (Taiwan, Vietnam, China, and Indonesia) using an electronic nose (E-nose) combined with machine learning. Color measurements were also conducted to support the classification. The E-nose was used to analyze the aroma profiles of the tea samples. Five machine learning models were developed using 70% of the dataset for training and tested on the remaining 30%: linear discriminant analysis (LDA), support vector machine (SVM), K-nearest neighbor (KNN), artificial neural network (ANN), and random forest (RF). Gray relational analysis (GRA) was applied to measure the relationship between sensor responses and reference tea origins. Multivariate analysis of variance (MANOVA) indicated a statistically significant effect of tea origin on color parameters, as confirmed by both Pillai’s trace and Wilks’ Lambda (Λ) tests (p = 0.000 < 0.05). Among the tested models, LDA and ANN achieved the highest overall classification accuracy (98.33%), with ANN performing best at discriminating Taiwanese oolong tea (98.89% accuracy). GRA yielded higher gray relational grade (GRG) values for Taiwanese tea samples than for other origins and identified sensors S4, S6, and S14 as the dominant contributors. In conclusion, the E-nose combined with machine learning provides a rapid, non-destructive, and effective approach for geographical origin classification of oolong tea.
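To make the gray relational grade (GRG) mentioned in this abstract concrete, here is a minimal sketch of the standard gray relational analysis calculation with a distinguishing coefficient of 0.5. It assumes the sequences are already normalized to a comparable scale; the function name and the toy values are hypothetical, and the abstract does not state which normalization or coefficient the authors used.

```python
def gray_relational_grades(reference, candidates, rho=0.5):
    """Return one gray relational grade per candidate sequence.

    reference  : list of floats (the reference sequence)
    candidates : list of sequences, each the same length as reference
    rho        : distinguishing coefficient, conventionally 0.5
    """
    if not candidates:
        raise ValueError("at least one candidate sequence is required")
    for seq in candidates:
        if len(seq) != len(reference):
            raise ValueError("every candidate must match the reference length")

    # Absolute deviation of each candidate from the reference at each point.
    deltas = [[abs(r - x) for r, x in zip(reference, seq)] for seq in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)

    # All sequences coincide with the reference: every grade is 1.
    if d_max == 0:
        return [1.0] * len(candidates)

    grades = []
    for row in deltas:
        # Gray relational coefficient at each point, then its mean (the grade).
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical normalized responses: the first candidate matches the reference.
reference = [1.0, 1.0, 1.0]
candidates = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(gray_relational_grades(reference, candidates))
```

A grade closer to 1 indicates that a sequence tracks the reference more closely, which is the sense in which a sensor can be called a dominant contributor.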

(This article belongs to the Special Issue Applications of Electronic Nose (E-Nose) and Electronic Tongue (E-Tongue) in Food Quality)

19 pages, 4537 KB

Open Access Article

Learning the Value of Place: Machine Learning Models for Real Estate Appraisal in Istanbul’s Diverse Urban Landscape

by Ahmet Hilmi Erciyes, Toygun Atasoy, Abdurrahman Tursun and Sibel Canaz Sevgen

Buildings 2025, 15(15), 2773; https://doi.org/10.3390/buildings15152773 - 6 Aug 2025

Viewed by 1095

Abstract

The prediction of real estate values is vital for taxation, transactions, mortgages, and urban policy development. When the dataset is very large, values can be predicted more accurately by combining statistical and advanced methods. In metropolitan cities like İstanbul, where the real estate data are vast and complex, mass appraisal methods supported by Machine Learning offer a scalable and consistent alternative. This study employs six algorithms, Artificial Neural Network (ANN), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Support Vector Regression (SVR), Random Forest (RF), and Semi-Log Regression (SLR), to estimate real estate values on both the Asian and European sides of İstanbul. In total, 168,099 residential properties from both sides of the Bosphorus were used, along with 30 of their features. The results show that RF yielded the best performance in Beşiktaş, while XGBoost performed best in Üsküdar. ANN also produced competitive results, although slightly less accurate than those of XGBoost and RF. In contrast, the traditional SVR and SLR models underperformed, especially in terms of R² and RMSE values. With its large-scale dataset, its focus on İstanbul, one of the largest metropolitan areas, and its use of multiple ML algorithms, this study stands as a comprehensive and practical contribution to the field of automated real estate valuation.
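The two metrics used in this abstract to compare the models, R² and RMSE, can be sketched in a few lines using their standard definitions. The function name and the sample values below are hypothetical and unrelated to the study's data.

```python
import math


def r2_and_rmse(actual, predicted):
    """Return (R squared, RMSE) for paired lists of actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    mean_actual = sum(actual) / n

    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all actual values are equal")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical property values (arbitrary units) and model predictions.
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

A higher R² and a lower RMSE both indicate a better fit, so a model that underperforms on both, as reported for SVR and SLR, is worse on explained variance and on typical error size.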

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

28 pages, 10147 KB

Open Access Article

Construction of Analogy Indicator System and Machine-Learning-Based Optimization of Analogy Methods for Oilfield Development Projects

by Muzhen Zhang, Zhanxiang Lei, Chengyun Yan, Baoquan Zeng, Fei Huang, Tailai Qu, Bin Wang and Li Fu

Energies 2025, 18(15), 4076; https://doi.org/10.3390/en18154076 - 1 Aug 2025

Viewed by 443

Abstract

Oil and gas development is characterized by high technical complexity, strong interdisciplinarity, long investment cycles, and significant uncertainty. To meet the need for quick evaluation of overseas oilfield projects with limited data and experience, this study develops an analogy indicator system and tests multiple machine-learning algorithms on two analogy tasks to identify the optimal method. Using an initial set of basic indicators and a database of 1436 oilfield samples, a combined subjective–objective weighting strategy that integrates statistical methods with expert judgment is used to select, classify, and assign weights to the indicators. This process results in 26 key indicators for practical analogy analysis. Single-indicator and whole-asset analogy experiments are then performed with five standard machine-learning algorithms: support vector machine (SVM), random forest (RF), backpropagation neural network (BP), k-nearest neighbor (KNN), and decision tree (DT). Results show that, in medium-high permeability sandstone oilfields, SVM achieves classification accuracies of 86% and 95% on the two analogy tasks, respectively, greatly surpassing the other methods. These results demonstrate the effectiveness of the proposed indicator system and methodology, providing efficient and objective technical support for evaluating and making decisions on overseas oilfield development projects.

(This article belongs to the Section H1: Petroleum Engineering)

Search Results (344)
