MDPI - Publisher of Open Access Journals

24 pages, 2719 KB

Open Access Article

Enhancing Road Freight Price Forecasting Using Gradient Boosting Ensemble Supervised Machine Learning Algorithm

by Artur Budzyński and Maria Cieśla

Mathematics 2025, 13(18), 2964; https://doi.org/10.3390/math13182964 - 12 Sep 2025

Viewed by 493

For effective logistics planning and pricing strategies, it is essential to predict road freight transportation costs accurately. Using a real-world dataset with 45,569 freight offers and 52 different variables, including financial, logistical, geographical, and temporal characteristics, this study presents a data-driven method for forecasting transport prices. To create a strong predictive model, the approach combines hyperparameter optimization, evolutionary feature selection, and extensive feature engineering. Because gradient boosting works well for modelling intricate, nonlinear relationships, it was used as the main algorithm. Temporal dependencies were maintained through a nested cross-validation framework with a time-series split, which improved the generalizability of the model. With a mean absolute percentage error (MAPE) of 6.27%, the model showed excellent predictive accuracy. Key predictive factors included total transport distance, load and delivery quantities, temperature constraints, and aggregated categorical features such as route and vehicle type. The results confirm that evolutionary algorithms are capable of efficiently optimizing model parameters, as well as feature subsets, greatly enhancing interpretability and performance. In the freight logistics industry, this method offers useful insights for operational and dynamic pricing decision-making. This model may be expanded in future research to include external data sources and investigate its suitability for use in various geographic locations and modes of transportation. Full article
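The time-series split and the MAPE metric mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes an expanding-window scheme and the usual MAPE definition; the function names `expanding_window_splits` and `mape` are hypothetical and are not taken from the article, whose implementation is not described here. The inner loop of a nested cross-validation (hyperparameter tuning inside each training window) is omitted.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Every test fold lies strictly after its training window, so no future
    observation is used to predict the past. Assumed scheme, for illustration.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any remainder left by the integer division.
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


if __name__ == "__main__":
    for train_idx, test_idx in expanding_window_splits(10, 4):
        print(train_idx, test_idx)
```

In a nested setup, each training window would itself be split in the same time-ordered way to choose hyperparameters before scoring on the outer test fold.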

(This article belongs to the Special Issue Evolutionary Machine Learning for Real-World Applications)

22 pages, 11023 KB

Open Access Article

Comparing Satellite-Derived and Model-Based Surface Soil Moisture for Spring Barley Yield Prediction in Central Europe

by Felix Reuß, Mariette Vreugdenhil, Emanuel Bueechi and Wolfgang Wagner

Remote Sens. 2025, 17(8), 1394; https://doi.org/10.3390/rs17081394 - 14 Apr 2025

Viewed by 1245

Abstract

Surface soil moisture (SSM) has proven to be an important variable for the yield prediction of main crops like maize and wheat, but its value for spring barley, the third most cultivated crop in Europe, has not yet been evaluated. This study assesses how much of spring barley yield variability can be explained by the commonly used model- and satellite-based global SSM products ERA5 SWVL1 and H SAF. A Feed Forward Neural Network, SSM time series, and reference yield data are used to predict spring barley yield at NUTS level for Austria, Czechia, and Germany. A random train-test split is used to assess the explained variability, and a cross-validation at the NUTS level is used for the spatial evaluation. The results indicate the following: (1) ERA5 SWVL1 achieved an R² of 0.37 and H SAF an R² of 0.33; (2) both products achieved the lowest RMSE and MAE in Czechia, whereas high RMSE and MAE values were observed in Eastern Germany; (3) ERA5 SWVL1 performed better in areas with low sensitivity for microwaves, like the Alpine region, but both products achieved similar results in 80% of the NUTS regions. These findings contribute to better utilization of SSM and more accurate yield predictions for spring barley and similar crops. Full article

(This article belongs to the Special Issue Advances in Remote Sensing for Crop Monitoring and Food Security)

28 pages, 4913 KB

Open Access Article

Modeling of Predictive Maintenance Systems for Laser-Welders in Continuous Galvanizing Lines Based on Machine Learning with Welder Control Data

by Jin-Seong Choi, So-Won Choi and Eul-Bum Lee

Sustainability 2023, 15(9), 7676; https://doi.org/10.3390/su15097676 - 7 May 2023

Cited by 11 | Viewed by 4365

Abstract

This study aimed to develop a predictive maintenance model using machine learning (ML) techniques to automatically detect equipment failures before line shutdowns due to equipment malfunctions, explicitly focusing on laser welders in the continuous galvanizing lines (CGLs) of a steel plant in Korea. The study selected an auto-encoder (AE) as the base model, because it can be trained on normal data, and a long short-term memory (LSTM) model, because it suits time series data such as equipment operation data. Here, a laser welder predictive maintenance model (LW-PMM) based on the LSTM-AE algorithm was developed by combining the technical advantages of both algorithms. Approximately 1500 types of data were collected, and approximately 200 were selected through preprocessing. The training and testing datasets were split at a ratio of 8:2, and the model parameters were optimized using 10-fold cross-validation. The performance evaluation of the LW-PMM resulted in an accuracy rate of 97.3%, a precision rate of 79.8%, a recall rate of 100%, and an F1-score of 88.8%. The precision of 79.8% compared to the 100% recall value indicated that although the model predicted all failures in the equipment as failures, 20.2% of them were duplicate values, which can be interpreted as roughly one in five failure signals not being an actual failure. As a result of the application to an actual CGL operation site, equipment abnormalities were detected for the first time 27 h before failure, resulting in a reduction of 18 h compared with the existing process. This study is unique because it started as a proof of concept (POC) and was validated in a production setting as a pilot system for the predictive maintenance of laser welders. We expect this study to be expanded and applied to steel production processes, contributing to digital transformation and innovation in the steel industry. Full article
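The relationship between the precision, recall, and F1-score quoted in this abstract can be checked with a short calculation. The sketch below uses the standard harmonic-mean definition of F1; the function name `f1_score` is chosen here for illustration and is not from the article.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall (both as fractions)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    precision, recall = 0.798, 1.0          # values quoted in the abstract
    print(round(100 * f1_score(precision, recall), 1))
    # Share of raised alarms that were not true failures: 1 - precision
    print(round(100 * (1 - precision), 1))
```

With a recall of 100%, the F1-score is limited only by precision, which is why it sits between the two quoted values.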

(This article belongs to the Special Issue Industry 4.0 Digital Transformation for Intelligent Construction, Operation and Maintenance)

18 pages, 3260 KB

Open Access Article

Deep Learning Approach with LSTM for Daily Streamflow Prediction in a Semi-Arid Area: A Case Study of Oum Er-Rbia River Basin, Morocco

by Karima Nifa, Abdelghani Boudhar, Hamza Ouatiki, Haytam Elyoussfi, Bouchra Bargam and Abdelghani Chehbouni

Water 2023, 15(2), 262; https://doi.org/10.3390/w15020262 - 8 Jan 2023

Cited by 39 | Viewed by 6954

Abstract

Daily hydrological modelling is among the most challenging tasks in water resource management, particularly in terms of streamflow prediction in semi-arid areas. Various methods have been applied to deal with this complex phenomenon, but data-driven models have recently gained ground, given their ability to solve prediction problems in time series. In this study, we have employed the Long Short-Term Memory (LSTM) network to simulate the daily streamflow over the Ait Ouchene watershed (AIO) in the Oum Er-Rbia river basin in Morocco, based on a temporal sequence of in situ and remotely sensed hydroclimatic data ranging from 2001 to 2010. The analysis adopted in this work is based on the three-dimensional input required by the LSTM model: (1) the input samples, for which three splitting approaches were used (70% of the dataset as training, splitting the data by hydrological year, and cross-validation); (2) the sequence length; and (3) the input features, using two different scenarios. The prediction results demonstrate that the LSTM performs poorly using the default data input scenario, whereas the best results during testing were found with a sequence length of 30 days using approach 3 (R² = 0.58). In addition, the LSTM fed with the lagged data input scenario using the Forward Feature Selection (FFS) method provides high accuracy using approach 2 (R² = 0.84) with a sequence length of 20 days. In conclusion, in applications related to water resources management where data are limited, the deep learning technique can achieve high predictive accuracy, which can be enhanced with the right subset of features selected using FFS. Full article
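The three-dimensional input (samples, sequence length, features) described in this abstract can be sketched as a sliding-window transformation. This is an illustrative assumption about how such windows are typically built, not the authors' code; `make_windows` is a hypothetical helper, and pairing each window with the target on its last day is a choice made here for simplicity.

```python
def make_windows(features, targets, seq_len):
    """Turn a daily series into (samples, seq_len, n_features) style windows.

    features: list of per-day feature lists, ordered in time
    targets:  list of per-day target values (e.g. streamflow), same length
    Returns X (list of windows) and y (target on the last day of each window).
    """
    X, y = [], []
    for end in range(seq_len, len(features) + 1):
        X.append(features[end - seq_len:end])   # seq_len consecutive days
        y.append(targets[end - 1])              # value to predict
    return X, y


if __name__ == "__main__":
    days = [[float(d), float(d) * 2] for d in range(6)]   # 2 made-up features
    flow = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]           # made-up targets
    X, y = make_windows(days, flow, seq_len=3)
    print(len(X), len(X[0]), len(X[0][0]))  # samples, sequence length, features
```

Changing `seq_len` (20 or 30 days in the abstract) changes only the second dimension; the number of usable samples shrinks accordingly.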

(This article belongs to the Section Water Resources Management, Policy and Governance)

15 pages, 825 KB

Open Access Article

De Novo Prediction of Drug Targets and Candidates by Chemical Similarity-Guided Network-Based Inference

by Carlos Vigil-Vásquez and Andreas Schüller

Int. J. Mol. Sci. 2022, 23(17), 9666; https://doi.org/10.3390/ijms23179666 - 26 Aug 2022

Cited by 3 | Viewed by 3224

Abstract

Identifying drug–target interactions is a crucial step in discovering novel drugs and for drug repositioning. Network-based methods have shown great potential thanks to the straightforward integration of information from different sources and the possibility of extracting novel information from the graph topology. However, despite recent advances, there is still an urgent need for efficient and robust prediction methods. Here, we present SimSpread, a novel method that combines network-based inference with chemical similarity. This method employs a tripartite drug–drug–target network constructed from protein–ligand interaction annotations and drug–drug chemical similarity on which a resource-spreading algorithm predicts potential biological targets for both known or failed drugs and novel compounds. We describe small molecules as vectors of similarity indices to other compounds, thereby providing a flexible means to explore diverse molecular representations. We show that our proposed method achieves high prediction performance through multiple cross-validation and time-split validation procedures over a series of datasets. In addition, we demonstrate that our method performed a balanced exploration of both chemical ligand space (scaffold hopping) and biological target space (target hopping). Our results suggest robust and balanced performance, and our method may be useful for predicting drug targets, virtual screening, and drug repositioning. Full article

(This article belongs to the Special Issue Computational Methods in Drug Design)

16 pages, 4936 KB

Open Access Article

Application of Feature Selection Based on Multilayer GA in Stock Prediction

by Xiaoning Li, Qiancheng Yu, Chen Tang, Zekun Lu and Yufan Yang

Symmetry 2022, 14(7), 1415; https://doi.org/10.3390/sym14071415 - 10 Jul 2022

Cited by 8 | Viewed by 2412

Abstract

This paper proposes a feature selection model based on a multilayer genetic algorithm (GA) to select the features of a high stock dividend (HSD) and eliminate the relatively redundant features in the optimal solution by using layer-by-layer information transfer and two-dimensionality reduction methods. Combining the ensemble model and time-series split cross-validation (TSCV) indicator as the fitness function solves the problem of selecting the fitness function for each layer. The symmetry character of the model is fully utilized in the two-dimensionality reduction processes, according to the change in data dimensions and the unbalanced characteristics of the HSD, setting the corresponding TSCV indicators. We built seven ensemble prediction models for actual stock trading data for comparison experiments. The results show that the feature selection model based on multilayer GA can effectively eliminate the relatively redundant features after dimensionality reduction and significantly improve the balancing accuracy, precision and AUC performance of the seven ensemble learning models. Finally, adversarial validation is used to analyze the differences in the balanced accuracy of the training and test sets caused by the inconsistent distribution of the data sets. Full article

(This article belongs to the Special Issue Machine Learning and Data Analysis)

24 pages, 8746 KB

Open Access Article

Machine Learning-Based Intelligent Prediction of Elastic Modulus of Rocks at Thar Coalfield

by Niaz Muhammad Shahani, Xigui Zheng, Xiaowei Guo and Xin Wei

Sustainability 2022, 14(6), 3689; https://doi.org/10.3390/su14063689 - 21 Mar 2022

Cited by 36 | Viewed by 4307

Abstract

Elastic modulus (E) is a key parameter in predicting the ability of a material to withstand pressure and plays a critical role in the design of rock engineering projects. E has broad applications in the stability of structures in mining, petroleum, geotechnical engineering, etc. E can be determined directly by conducting laboratory tests, which are time consuming and require high-quality core samples and costly modern instruments. Thus, devising an indirect estimation method of E has promising prospects. In this study, six novel machine learning (ML)-based intelligent regression models, namely, light gradient boosting machine (LightGBM), support vector machine (SVM), Catboost, gradient boosted tree regressor (GBRT), random forest (RF), and extreme gradient boosting (XGBoost), were developed to predict the impacts of four input parameters, namely, wet density (ρ_wet) in g/cm³, moisture (%), dry density (ρ_d) in g/cm³, and Brazilian tensile strength (BTS) in MPa, on output E (GPa). The strength of association between every input and the output was systematically measured using a series of fundamental statistical tools to identify the most dominant and important input parameters. The actual dataset of E was split into 70% for training and 30% for testing for each model. To enhance the performance of each developed model, an iterative 5-fold cross-validation method was used. Based on the results of the study, the XGBoost model outperformed the other developed models on the test data, with a coefficient of determination (R²) of 0.999, mean absolute error (MAE) of 0.0015, mean square error (MSE) of 0.0008, root mean square error (RMSE) of 0.0089, and a20-index of 0.996. In addition, GBRT and RF also showed high accuracy in predicting E, with R² values of 0.988 and 0.989, respectively, but they can be used conditionally. Based on sensitivity analysis, all parameters were positively correlated, while BTS was the most influential parameter in predicting E. Using an ML-based intelligent approach, this study was able to provide alternative elucidations for predicting E with appropriate accuracy and run time at Thar coalfield, Pakistan. Full article
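The evaluation metrics named in this abstract can be written out in a few lines of Python. This is a sketch of the common textbook definitions, not the authors' code; in particular, the a20-index is assumed here to be the fraction of samples whose actual-to-predicted ratio lies between 0.8 and 1.2, and `regression_metrics` is a hypothetical helper name. By definition, RMSE is the square root of MSE.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, R² and a20-index for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    # Assumed definition: share of samples with actual/predicted in [0.8, 1.2]
    within = sum(1 for a, p in zip(actual, predicted) if 0.8 <= a / p <= 1.2)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse),
            "r2": r2, "a20": within / n}


if __name__ == "__main__":
    # Made-up numbers, for illustration only
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.0]))
```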

(This article belongs to the Special Issue Advances in Rock Mechanics and Geotechnical Engineering)

25 pages, 1110 KB

Open Access Article

On Comparing Cross-Validated Forecasting Models with a Novel Fuzzy-TOPSIS Metric: A COVID-19 Case Study

by Dalton Garcia Borges de Souza, Erivelton Antonio dos Santos, Francisco Tarcísio Alves Júnior and Mariá Cristina Vasconcelos Nascimento

Sustainability 2021, 13(24), 13599; https://doi.org/10.3390/su132413599 - 9 Dec 2021

Cited by 5 | Viewed by 2817

Abstract

Time series cross-validation is a technique to select forecasting models. Despite the sophistication of cross-validation over single test/training splits, traditional and independent metrics, such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), are commonly used to assess the model’s accuracy. However, what if decision-makers have different models fitting expectations to each moment of a time series? What if the precision of the forecasted values is also important? This is the case of predicting COVID-19 in Amapá, a Brazilian state in the Amazon rainforest. Due to the lack of hospital capacities, a model that promptly and precisely responds to notable ups and downs in the number of cases may be more desired than average models that only have good performances in more frequent and calm circumstances. In line with this, this paper proposes a hybridization of the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) and fuzzy sets to create a similarity metric, the closeness coefficient (CC), that enables relative comparisons of forecasting models under heterogeneous fitting expectations and also considers volatility in the predictions. We present a case study using three parametric and three machine learning models commonly used to forecast COVID-19 numbers. The results indicate that the introduced fuzzy similarity metric is a more informative performance assessment metric, especially when using time series cross-validation. Full article
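For readers unfamiliar with the closeness coefficient (CC), the following sketch shows how it is computed in classical (crisp) TOPSIS. It deliberately leaves out the fuzzy-set extension proposed in the article, so it should be read as background on the base technique only; the function name, the weights, and the example error values are assumptions made for illustration.

```python
import math


def topsis_closeness(matrix, weights, benefit):
    """Closeness coefficients for crisp TOPSIS.

    matrix:  one row per alternative (e.g. forecasting model), one column per criterion
    weights: criterion weights
    benefit: True if higher is better for that criterion, False for costs (e.g. errors)
    Assumes no criterion column is all zeros and that alternatives are not all identical.
    """
    n_crit = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


if __name__ == "__main__":
    # Two hypothetical models scored on two error metrics (lower is better)
    print(topsis_closeness([[1.0, 1.0], [2.0, 2.0]], [0.5, 0.5], [False, False]))
```

A CC close to 1 means the alternative is near the ideal on every criterion; a CC close to 0 means it is near the worst case.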

(This article belongs to the Topic Industrial Engineering and Management)

23 pages, 19939 KB

Open Access Article

Computer-Aided Intracranial EEG Signal Identification Method Based on a Multi-Branch Deep Learning Fusion Model and Clinical Validation

by Yiping Wang, Yang Dai, Zimo Liu, Jinjie Guo, Gongpeng Cao, Mowei Ouyang, Da Liu, Yongzhi Shan, Guixia Kang and Guoguang Zhao

Brain Sci. 2021, 11(5), 615; https://doi.org/10.3390/brainsci11050615 - 11 May 2021

Cited by 29 | Viewed by 4637

Abstract

Surgical intervention or the control of drug-refractory epilepsy requires accurate analysis of invasive inspection intracranial EEG (iEEG) data. A multi-branch deep learning fusion model is proposed to identify epileptogenic signals from the epileptogenic area of the brain. The classical approach extracts multi-domain signal wave features to construct a time-series feature sequence and then abstracts it through the bi-directional long short-term memory attention machine (Bi-LSTM-AM) classifier. The deep learning approach uses raw time-series signals to build a one-dimensional convolutional neural network (1D-CNN) to achieve end-to-end deep feature extraction and signal detection. These two branches are integrated to obtain deep fusion features and results. Resampling is employed to split the imbalanced epileptogenic and non-epileptogenic samples into balanced subsets for clinical validation. The model is validated over two publicly available benchmark iEEG databases to verify its effectiveness on a private, large-scale, clinical stereo EEG database. The model achieves high sensitivity (97.78%), accuracy (97.60%), and specificity (97.42%) on the Bern–Barcelona database, surpassing the performance of existing state-of-the-art techniques. It is then demonstrated on a clinical dataset with an average intra-subject accuracy of 92.53% and cross-subject accuracy of 88.03%. The results suggest that the proposed method is a valuable and extremely robust approach to help researchers and clinicians develop an automated method to identify the source of iEEG signals. Full article

(This article belongs to the Special Issue Neuroinformatics and Signal Processing)

10 pages, 653 KB

Open Access Article

Prediction of Somatotype from Bioimpedance Analysis in Elite Youth Soccer Players

by Francesco Campa, Catarina N. Matias, Pantelis T. Nikolaidis, Henry Lukaski, Jacopo Talluri and Stefania Toselli

Int. J. Environ. Res. Public Health 2020, 17(21), 8176; https://doi.org/10.3390/ijerph17218176 - 5 Nov 2020

Cited by 10 | Viewed by 3712

Abstract

The accurate body composition assessment comprises several variables, causing it to be a time consuming evaluation as well as requiring different and sometimes costly measurement instruments. The aim of this study was to develop new equations for the somatotype prediction, reducing the number of normal measurements required by the Heath and Carter approach. A group of 173 male soccer players (age, 13.6 ± 2.2 years, mean ± standard deviation; body mass index, BMI, 19.9 ± 2.5 kg/m²), members of the academy of a professional Italian soccer team participating in the first division (Serie A), participated in this study. Bioelectrical impedance analysis (BIA) was performed using the single frequency of 50 kHz and fat-free mass (FFM) was calculated using a BIA specific, impedance based equation. Somatotype components were estimated according to the Heath-Carter method. The participants were randomly split into development (n = 117) and validation groups (n = 56). New anthropometric and BIA based models were developed (endomorphy = −1.953 − 0.011 × stature²/resistance + 0.135 × BMI + 0.232 × triceps skinfold, R² = 0.86, SEE = 0.28; mesomorphy = 6.848 + 0.138 × phase angle + 0.232 × contracted arm circumference + 0.166 × calf circumference − 0.093 × stature, R² = 0.87, SEE = 0.40; ectomorphy = −5.592 − 38.237 × FFM/stature + 0.123 × stature, R² = 0.86, SEE = 0.37). Cross validation revealed R² of 0.84, 0.80, and 0.87 for endomorphy, mesomorphy, and ectomorphy, respectively. The new proposed equations allow for the integration of the somatotype assessment into BIA, reducing the number of collected measurements, the instruments used, and the time normally required to obtain a complete body composition analysis. Full article
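The three prediction equations quoted in this abstract translate directly into code. The coefficients below are copied from the abstract; the units in the comments (stature and circumferences in cm, resistance in ohm, skinfold in mm, phase angle in degrees, FFM in kg) are assumptions, because the abstract does not state them, and the function names are chosen here for illustration.

```python
def endomorphy(stature, resistance, bmi, triceps_skinfold):
    """Endomorphy from the abstract's equation (units assumed: cm, ohm, kg/m², mm)."""
    return (-1.953 - 0.011 * stature ** 2 / resistance
            + 0.135 * bmi + 0.232 * triceps_skinfold)


def mesomorphy(phase_angle, arm_circumference, calf_circumference, stature):
    """Mesomorphy from the abstract's equation (units assumed: degrees, cm, cm, cm)."""
    return (6.848 + 0.138 * phase_angle + 0.232 * arm_circumference
            + 0.166 * calf_circumference - 0.093 * stature)


def ectomorphy(ffm, stature):
    """Ectomorphy from the abstract's equation (units assumed: kg, cm)."""
    return -5.592 - 38.237 * ffm / stature + 0.123 * stature


if __name__ == "__main__":
    # Made-up input values, for illustration only
    print(endomorphy(160.0, 500.0, 20.0, 10.0))
    print(mesomorphy(6.0, 25.0, 32.0, 160.0))
    print(ectomorphy(40.0, 160.0))
```

The full article should be consulted for the measurement protocol and units before applying these equations to real data.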

(This article belongs to the Special Issue Training Load and Performance Monitoring, Recovery, Wellbeing, Illness and Injury Prevention)

12 pages, 1492 KB

Open Access Article

Using Machine Learning to Predict ICU Transfer in Hospitalized COVID-19 Patients

by Fu-Yuan Cheng, Himanshu Joshi, Pranai Tandon, Robert Freeman, David L Reich, Madhu Mazumdar, Roopa Kohli-Seth, Matthew A. Levin, Prem Timsina and Arash Kia

J. Clin. Med. 2020, 9(6), 1668; https://doi.org/10.3390/jcm9061668 - 1 Jun 2020

Cited by 146 | Viewed by 13717

Abstract

Objectives: Approximately 20–30% of patients with COVID-19 require hospitalization, and 5–12% may require critical care in an intensive care unit (ICU). A rapid surge in cases of severe COVID-19 will lead to a corresponding surge in demand for ICU care. Because of constraints on resources, frontline healthcare workers may be unable to provide the frequent monitoring and assessment required for all patients at high risk of clinical deterioration. We developed a machine learning-based risk prioritization tool that predicts ICU transfer within 24 h, seeking to facilitate efficient use of care providers’ efforts and help hospitals plan their flow of operations. Methods: A retrospective cohort was comprised of non-ICU COVID-19 admissions at a large acute care health system between 26 February and 18 April 2020. Time series data, including vital signs, nursing assessments, laboratory data, and electrocardiograms, were used as input variables for training a random forest (RF) model. The cohort was randomly split (70:30) into training and test sets. The RF model was trained using 10-fold cross-validation on the training set, and its predictive performance on the test set was then evaluated. Results: The cohort consisted of 1987 unique patients diagnosed with COVID-19 and admitted to non-ICU units of the hospital. The median time to ICU transfer was 2.45 days from the time of admission. Compared to actual admissions, the tool had 72.8% (95% CI: 63.2–81.1%) sensitivity, 76.3% (95% CI: 74.7–77.9%) specificity, 76.2% (95% CI: 74.6–77.7%) accuracy, and 79.9% (95% CI: 75.2–84.6%) area under the receiver operating characteristics curve. Conclusions: A ML-based prediction model can be used as a screening tool to identify patients at risk of imminent ICU transfer within 24 h. This tool could improve the management of hospital resources and patient-throughput planning, thus delivering more effective care to patients hospitalized with COVID-19. Full article
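Sensitivity, specificity, and accuracy, as reported in this abstract, all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts in the example are invented for illustration and are not the study's data, and `screening_metrics` is a hypothetical helper name.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: transferred patients flagged by the tool
    fn: transferred patients the tool missed
    tn: non-transferred patients correctly not flagged
    fp: non-transferred patients flagged anyway
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Invented counts, for illustration only
    print(screening_metrics(tp=8, fn=2, tn=80, fp=10))
```

When transfers are rare, accuracy tracks specificity closely, which is consistent with the nearly identical specificity and accuracy figures quoted above.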

(This article belongs to the Special Issue COVID-19: From Pathophysiology to Clinical Practice)

Search Results (11)
